Feature #60

closed

provide download statistics

Added by Florian Effenberger over 10 years ago. Updated almost 10 years ago.

Status:
Closed
Priority:
High
Category:
Webserver
Target version:
-
Start date:
Due date:
% Done:

70%

Tags:

Description

We need comprehensive and detailed download statistics, based on MirrorBrain's logs.
At a minimum, the following data is required:

  • geography
  • platform/OS
  • architecture/CPU
  • version
  • language

Here are the details from the initial request:

  • automate weekly graphical download statistics (source is in git: dev-tools/download-stats.pl)
  • the current script is impossibly slow and consumes vast amounts of memory
  • we essentially need to inline merge_results into parse_log and store a cache of the results of each file next to it - so that we do this expensive analysis per log file just once
  • this should let us build incremental updates reasonably easily
  • add random cookie generation to track real users vs. automatic IPs
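
The per-file caching idea from the list above could be sketched roughly as follows. This is a minimal Python illustration, not the actual dev-tools/download-stats.pl code: the names `parse_log`, `cached_counts` and `merge_logs`, the `.stats.json` cache suffix, and the assumption that the requested path is the 7th whitespace-separated field (as in Apache's combined log format) are all hypothetical.

```python
import json
import os
from collections import Counter


def parse_log(path):
    """Count downloads per requested path in one log file."""
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            fields = line.split()
            # Assumption: field 7 is the request path (Apache combined format).
            if len(fields) > 6:
                counts[fields[6]] += 1
    return counts


def cached_counts(path):
    """Return the counts for one log, reusing a cache stored next to it."""
    cache_path = path + ".stats.json"  # hypothetical cache file name
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(path)):
        with open(cache_path, encoding="utf-8") as handle:
            return Counter(json.load(handle))
    # Cache is missing or older than the log: do the expensive parse once.
    counts = parse_log(path)
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(counts, handle)
    return counts


def merge_logs(paths):
    """Merge the (cached) per-file results into one overall total."""
    total = Counter()
    for path in paths:
        total.update(cached_counts(path))
    return total
```

With this shape, an incremental update only pays the parsing cost for log files that have no up-to-date cache yet; all older files are read back from their small cache files.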
Actions #1

Updated by Florian Effenberger over 10 years ago

I will give feedback soon - we have one approach in the pipeline right now that might do, will keep you posted

Actions #2

Updated by Florian Effenberger over 10 years ago

  • Description updated (diff)
Actions #3

Updated by Florian Effenberger over 10 years ago

  • Assignee deleted (Christian Lohmaier)

Status update:
Main problem with current tools is that they are incredibly slow and suck huge amounts of RAM.
A third-party tool might solve that - we're currently investigating whether that can output in detail what we need, especially in spreadsheet form.

Depending on the outcome, we either need to do very little ourselves, or:

1. Fix the current scripting
2. Come up with something completely different (e.g. via Piwik or Alex' Django approach)

Actions #4

Updated by Florian Effenberger over 10 years ago

  • Assignee set to Christian Lohmaier
Actions #5

Updated by Florian Effenberger about 10 years ago

  • Assignee changed from Christian Lohmaier to Florian Effenberger
  • Start date deleted (2014-01-14)
Actions #6

Updated by Florian Effenberger about 10 years ago

Some feedback from FOSDEM: MirrorBrain's mod_stats might do it. It already provides a framework for direct database logging of each download, plus a script to parse logs afterwards if needed (which, however, is slow). The module itself needs some more development work, mostly to make it work with flexible regular expressions rather than hardcoded file names. http://mirrorbrain.org/download-statistics/ might be helpful as well.

We are still evaluating which method is best, but this looks like a promising approach.
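
To illustrate what matching on flexible regular expressions instead of hardcoded file names could look like, here is a small Python sketch. It is not mod_stats code; the naming scheme `<product>_<version>_<platform>_<arch>[_<lang>].<ext>`, the pattern `FILENAME_RE`, and the helper `classify` are assumptions made up for this example.

```python
import re

# Hypothetical naming scheme:
#   <product>_<version>_<platform>_<arch>[_<lang>].<ext>
FILENAME_RE = re.compile(
    r"^(?P<product>[A-Za-z]+)_"
    r"(?P<version>\d+(?:\.\d+)+)_"
    r"(?P<platform>[A-Za-z]+)_"
    r"(?P<arch>x86(?:-64)?)"
    r"(?:_(?P<language>[a-z]{2}(?:-[A-Z]{2})?))?"
    r"\.(?P<ext>[A-Za-z0-9.]+)$"
)


def classify(filename):
    """Return the named parts of a download file name, or None if no match."""
    match = FILENAME_RE.match(filename)
    return match.groupdict() if match else None


info = classify("Product_4.2.0_Linux_x86-64_de.tar.gz")
print(info["version"], info["platform"], info["arch"], info["language"])
```

A single pattern like this yields version, platform, architecture and language in one pass, so new releases need no code change as long as they follow the same naming scheme.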

Actions #7

Updated by Alexander Werner about 10 years ago

  • Assignee changed from Florian Effenberger to Alexander Werner
  • Priority changed from Normal to High
Actions #8

Updated by Alexander Werner about 10 years ago

  • Category set to Webserver
Actions #9

Updated by Florian Effenberger about 10 years ago

Sent one more e-mail to the toolkit author who previously offered help. If I get no reply within 10 days, let's put this on Alex' or Cloph's table, with a concrete ETA, to finally solve it.

Actions #10

Updated by Florian Effenberger about 10 years ago

So, we should go ahead and implement this on our own. Let's discuss details soon.

Actions #11

Updated by Florian Effenberger about 10 years ago

Alex, you mentioned it's on your near-term agenda. Any details or ETA?

Actions #12

Updated by Alexander Werner about 10 years ago

  • Status changed from Feedback to In Progress

At the moment I am defining the data models, deciding whether to use regexps or a dedicated log parser, etc. ETA is at least one week if I do nothing else.
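
As a rough idea of what a data model for the required dimensions might look like, here is a plain-Python sketch. It does not reflect the models actually being defined; `DownloadKey`, its field names, and `count_by` are hypothetical and simply mirror the dimensions listed in the description (geography, platform/OS, architecture/CPU, version, language).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DownloadKey:
    """One combination of the dimensions the statistics should cover."""
    country: str
    platform: str
    arch: str
    version: str
    language: str


def count_by(counts, field):
    """Collapse per-key download counts onto a single dimension."""
    totals = Counter()
    for key, number in counts.items():
        totals[getattr(key, field)] += number
    return totals


counts = Counter()
counts[DownloadKey("DE", "Linux", "x86-64", "4.2.0", "de")] += 3
counts[DownloadKey("FR", "Win", "x86", "4.2.0", "fr")] += 2
print(count_by(counts, "version"))
```

Keeping one counter per full key and collapsing it on demand means each dimension can be reported separately without re-reading the logs.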

Actions #13

Updated by Florian Effenberger about 10 years ago

Thanks for the update! Given you have some other items on the
table, a realistic ETA is rather 2-4 weeks, so as not to block
other tasks. :-)

Thanks for your work here!

Actions #14

Updated by Alexander Werner about 10 years ago

  • % Done changed from 0 to 70

Currently waiting for the release of Django 1.7; tdc should be ready on the same day.

Actions #15

Updated by Alexander Werner almost 10 years ago

  • Status changed from In Progress to Closed