Task #2981

Help Online: Optionally compress static pages

Added by Guilhem Moulin about 5 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Category:
-
Target version:
Team - Q1/2020
Start date:
Due date:
% Done:

0%

Tags:
Documentation

Description

Help online packages are rather large, at 1.8GiB per release. Xapian indices (for full-text search, cf. #2555), including spelling/stemming suggestions, add another 1.9GiB per release on top of that, which raises some concerns in terms of scalability and sustainability from an infrastructure perspective.

6.4 has 164735 files right now; here are the extensions weighing ≥4MiB:

ext   #files  avg. size   tot size
----  ------  ---------  ---------
svg     4341    1.16kiB    4.91MiB
ods      329   20.53kiB    6.59MiB
png      724   13.88kiB    9.81MiB
js       204  276.81kiB   55.96MiB
html  159061    9.43kiB 1464.93MiB

Some of these files (html, js, svg, css) have a fairly high compression ratio. In fact all modern browsers send Accept-Encoding: gzip headers in their requests, causing the HTTPd to compress the payload on the fly, which the client then decompresses on reception. This saves traffic, but not space. (It also causes the HTTPd some overhead due to the extra processing.)

Instead, I would like to store these files gzipped on the server. Aside from saving space, this has a number of advantages:
  • compression is done once and for all on Olivier Hallot's workstation, meaning less work to be done on the HTTPd side (hence faster processing time);
  • since compression isn't done on the fly, one can safely use more aggressive options (compression level) without risk of DoS'ing ourselves; and
  • the HTTPd can safely add a Content-Length header to the response (this is not possible for pipelined compression since the server doesn't know the size of the payload by the time it writes the header part).
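
To get a feel for the space saving before touching any file, the totals can be estimated with a dry run. The following is only a sketch: the function name estimate_gzip_savings and its optional level argument are made up for illustration, and it compresses to a pipe so nothing on disk is modified. It prints two numbers, total bytes before and after compression.

```shell
#!/bin/sh
# Hypothetical helper (not part of the build): report total bytes before and
# after gzip for the compressible extensions under a directory.
# Files are compressed to a pipe only; nothing on disk is changed.
estimate_gzip_savings() {
    dir=$1
    level=${2:-6}   # 6 is gzip(1)'s default level; 9 is the most aggressive
    find "$dir" -type f \
        \( -name '*.css' -o -name '*.html' -o -name '*.js' -o -name '*.svg' \) \
        -exec sh -c '
            level=$1; shift
            for f in "$@"; do
                orig=$(wc -c < "$f")
                comp=$(gzip -n -c "-$level" < "$f" | wc -c)
                echo "$orig $comp"
            done' sh "$level" {} + |
    awk '{ o += $1; c += $2 } END { printf "%d %d\n", o, c }'
}
```

For instance, estimate_gzip_savings /path/to/6.4 9 would show what the most aggressive level buys compared to the default.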

For the few browsers not supporting gzip or not sending Accept-Encoding: gzip in the request, the requested file, stored compressed on the server, would be decompressed on the fly by the server, and the decompressed payload is served as is (without Content-Length header). So pretty much the opposite of what's performed right now.
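
The decision described above can be modelled in a few lines of shell. This is purely an illustration of the logic, not the server configuration: serve_precompressed is a hypothetical name, and the Accept-Encoding matching is simplified (it looks for a gzip token and ignores q-values such as gzip;q=0).

```shell
#!/bin/sh
# Hypothetical model of the negotiation; a real HTTPd does this natively.
serve_precompressed() {
    gz_file=$1          # path of the stored .gz file
    accept_encoding=$2  # value of the request's Accept-Encoding header
    case ",$(printf '%s' "$accept_encoding" | tr -d ' ')," in
        *,gzip,*|*,gzip\;*)
            # Client accepts gzip: send the stored bytes untouched.
            # The size is known up front, so Content-Length can be set.
            printf 'Content-Encoding: gzip\r\n'
            printf 'Content-Length: %d\r\n\r\n' "$(wc -c < "$gz_file")"
            cat "$gz_file" ;;
        *)
            # Fallback: decompress on the fly, without Content-Length.
            printf '\r\n'
            gzip -dc < "$gz_file" ;;
    esac
}
```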

Concretely, what I request is a flag to optionally run

find /path/to/6.4 -type f \
\( -name "*.css" -o -name "*.html" -o -name "*.js" -o -name "*.svg" \) \
\! -size -128c \
-print0 | xargs -r0 gzip -n

after a successful build. (Symbolic links require some extra care: if the target has been compressed, the link should be removed and replaced by a link with a .gz suffix that targets the .gz counterpart.)
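
The symbolic link handling could look roughly like the sketch below, to be run after the gzip pass. The function name relink_compressed is hypothetical, and the sketch assumes that file names contain no newlines and that links point directly at regular files (chains of links are not handled).

```shell
#!/bin/sh
# Hypothetical post-processing step: re-point links whose target was gzipped.
relink_compressed() {
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        # Resolve relative targets against the link's own directory.
        case $target in
            /*) resolved=$target ;;
            *)  resolved=$(dirname "$link")/$target ;;
        esac
        # Only act when the original target is gone and a .gz counterpart exists.
        if [ ! -e "$resolved" ] && [ -f "$resolved.gz" ]; then
            ln -s "$target.gz" "$link.gz" && rm "$link"
        fi
    done
}
```

Links whose target was left uncompressed (for instance because it is below the size threshold) are kept as they are.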

I.e., compress these files with gzip(1)'s default options, but only when they are at least 128 bytes in size.

Maybe the list of extensions to compress and the compression threshold (128 bytes) could be specified by the flag.
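
As a sketch of what such a parametrized step might look like (the function name compress_static and its argument order are assumptions, not an existing build option), the find expression can be assembled from a list of extensions and a threshold in bytes:

```shell
#!/bin/sh
# Hypothetical wrapper: compress_static DIR THRESHOLD_BYTES EXT [EXT...]
compress_static() {
    dir=$1 threshold=$2
    shift 2
    n=$#
    # Append "-o -name *.EXT" for each extension after the original arguments.
    for ext in "$@"; do
        set -- "$@" -o -name "*.$ext"
    done
    shift "$n"   # drop the bare extension names
    shift        # drop the leading -o
    # Same pipeline as in the description: skip files below the threshold.
    find "$dir" -type f \( "$@" \) ! -size "-${threshold}c" -print0 |
        xargs -r0 gzip -n
}
```

The command from the description would then correspond to compress_static /path/to/6.4 128 css html js svg. At least one extension has to be given.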

I'll take care of the server configuration. (In fact I already have a PoC for 6.4.) That requires a new location{} block, and since we already had to add one for 6.4 (for #2555), it would be best for the infra team if that flag were added to 6.4 as well.
