Feature #60
closed
provide download statistics
Added by Florian Effenberger over 10 years ago.
Updated almost 10 years ago.
Description
We need comprehensive and detailed download statistics, based on MirrorBrain's logs.
The data required is at least:
- geography
- platform/OS
- architecture/CPU
- version
- language
Here are the details from the initial request:
- automate weekly graphical download statistics (source is in git: dev-tools/download-stats.pl)
- the current script is impossibly slow and consumes vast amounts of memory
- we essentially need to inline merge_results into parse_log and store a cache of the results of each file next to it - so that we do this expensive analysis per log file just once
- this should let us build incremental updates reasonably easily
- add random cookie generation to track real users vs. automatic IPs
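The per-file caching idea from the request could look roughly like the sketch below. It is written in Python rather than as a patch to download-stats.pl, and everything in it is an assumption for illustration: the log layout (request path as the 7th whitespace-separated field, as in Apache's common log format), the `.stats.json` cache suffix, and the function names `cached_parse` and `total_counts`. Only `parse_log` borrows its name from the existing script, and its body here is hypothetical.

```python
import json
import os
from collections import Counter


def parse_log(path):
    """Count downloads per requested path in one access log.

    Assumption: the request path is the 7th whitespace-separated field,
    as in Apache's common/combined log format.
    """
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) > 6:
                counts[fields[6]] += 1
    return counts


def cached_parse(path):
    """Return the counts for one log file, reusing a JSON cache stored next to it."""
    cache_path = path + ".stats.json"  # hypothetical cache file name
    # Reuse the cache only if it is at least as new as the log itself.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, encoding="utf-8") as fh:
            return Counter(json.load(fh))
    counts = parse_log(path)
    with open(cache_path, "w", encoding="utf-8") as fh:
        json.dump(counts, fh)
    return counts


def total_counts(log_paths):
    """Merge per-file results; only new or changed logs are re-parsed."""
    total = Counter()
    for path in log_paths:
        total.update(cached_parse(path))
    return total
```

With this layout a weekly run only pays the parsing cost for logs that appeared or changed since the previous run, and memory use is bounded by one log's counters plus the running total.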
I will give feedback soon - we have one approach in the pipeline right now that might do, will keep you posted
- Description updated (diff)
- Assignee deleted (Christian Lohmaier)
Status update:
Main problem with current tools is that they are incredibly slow and suck huge amounts of RAM.
A third-party tool might solve that - we're currently investigating whether that can output in detail what we need, especially in spreadsheet form.
Depending on the outcome, we either have little to do ourselves, or we need to:
1. Fix the current scripting
2. Come up with something completely different (e.g. via Piwik or Alex' Django approach)
- Assignee set to Christian Lohmaier
- Assignee changed from Christian Lohmaier to Florian Effenberger
- Start date deleted (2014-01-14)
Some feedback from FOSDEM: MirrorBrain's mod_stats might do it. It already provides a framework for direct database logging for each download, plus a script to parse afterwards if needed (which, however, is slow). The module itself needs some more development work, mostly to make it run with the flexible RegExps rather than hardcoded file names. http://mirrorbrain.org/download-statistics/ might be helpful as well.
Still in evaluation which method is best, but that might be a promising approach.
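To illustrate what "flexible RegExps rather than hardcoded file names" could mean in practice, here is a minimal Python sketch. It is not mod_stats code, and the file naming scheme it matches (for example `Product_4.2.0_Linux_x86-64_deb_langpack_de.tar.gz`) is a made-up assumption, as are the names `FILENAME_RE` and `classify`; the real names on the mirrors may differ and the pattern would need adjusting.

```python
import re

# Hypothetical naming scheme:
#   <product>_<version>_<platform>_<arch>[_<pkgtype>][_langpack_<lang>].<ext>
FILENAME_RE = re.compile(
    r"^(?P<product>[A-Za-z]+)_"
    r"(?P<version>\d+(?:\.\d+)+)_"
    r"(?P<platform>[A-Za-z]+)_"
    r"(?P<arch>x86(?:-64)?)"
    r"(?:_[a-z]+)?"  # optional package type, e.g. "_deb"
    r"(?:_(?:langpack|helppack)_(?P<language>[A-Za-z-]+))?"
    r"\.[A-Za-z0-9.]+$"
)


def classify(filename):
    """Split a download file name into the fields the statistics need.

    Returns a dict with product, version, platform, arch and language,
    or None if the name does not follow the assumed scheme.
    """
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    # Files without a language pack suffix carry no language information.
    info["language"] = info["language"] or "unknown"
    return info
```

A single pattern with named groups yields version, platform, architecture and language in one pass, so supporting a new release only requires that its file names follow the scheme, not a code change.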
- Assignee changed from Florian Effenberger to Alexander Werner
- Priority changed from Normal to High
- Category set to Webserver
Sent one more e-mail to the toolkit author who previously offered help. If I get no reply within 10 days, let's put this on Alex' or Cloph's table, with a concrete ETA, to finally solve it.
So, we should go on implementing on our own. Let's discuss details soon.
Alex, you mentioned it's on your near-term agenda. Any details or ETA?
- Status changed from Feedback to In Progress
At the moment I am defining data models, deciding whether to use regexps or a dedicated log parser, etc. ETA is at least one week if I do nothing else.
Thanks for the update! Given you have some more items on the table, the ETA is realistically rather 2-4 weeks, so as not to block other tasks. :-)
Thanks for your work here!
- % Done changed from 0 to 70
Currently waiting for the release of Django 1.7; tdc should be ready on the same day.
- Status changed from In Progress to Closed