Tuesday, May 31, 2011

Web server on-demand compression

Today we enabled mod_deflate on Apache, which compresses responses with deflate (the primary algorithm used in ZIP and gzip). This allows our web pages to be compressed before they are sent to the client. Virtually all web browsers now support compression, so it seems absurd not to use it. Web pages, being text, are highly compressible, often shrinking by 70-80% with lossless LZ77/LZSS derivatives like deflate. With fast CPUs, the compression overhead is extremely low. Decompression on the client end is of course even more rapid than compression. The reduction in transfer time far outweighs the cost of compression.
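To get a feel for how well markup compresses, here is a minimal sketch using Python's zlib module, which implements deflate. The sample page is made up for illustration and is more repetitive than most real pages, so the saving it reports will likely be higher than what you would see on a live site.

```python
import zlib

# Hypothetical sample page: repetitive markup, as HTML tends to be.
rows = "".join(
    f'<tr class="row"><td>Item {i}</td><td>Description of item {i}</td></tr>\n'
    for i in range(200)
)
html = f"<html><body><table>\n{rows}</table></body></html>".encode("utf-8")

# zlib.compress produces a zlib-wrapped deflate stream.
# Level 6 is the usual middle ground between speed and size.
compressed = zlib.compress(html, 6)

saving = 100.0 * (1 - len(compressed) / len(html))
print(f"original:   {len(html)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"saving:     {saving:.1f}%")

# Lossless: decompressing gives back exactly the original bytes.
restored = zlib.decompress(compressed)
print("round trip ok:", restored == html)
```

Running the same few lines against a saved copy of one of your own pages gives a more honest number than the synthetic table above.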

Want to test to see if your favorite web site uses compression, and how much time and bandwidth are saved? Some web browsers will tell you, but I discovered this little online tool that may make it easy for you: http://www.gidnetwork.com/tools/gzip-test.php .
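If you would rather check from a script, something like the following sketch should work. It uses only Python's standard library; the function names check_compression and saving_percent are my own, and it only asks for gzip, so a server that offers some other encoding will show up as uncompressed.

```python
import gzip
import urllib.request


def saving_percent(compressed_size: int, original_size: int) -> float:
    """Percentage of bytes saved on the wire (0.0 for an empty body)."""
    if original_size == 0:
        return 0.0
    return 100.0 * (1 - compressed_size / original_size)


def check_compression(url: str) -> dict:
    """Request url while advertising gzip support and report what came back."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=10) as response:
        encoding = (response.headers.get("Content-Encoding") or "").lower()
        raw = response.read()  # urllib does not decompress the body for us

    # Only gzip is handled here; anything else is treated as uncompressed.
    decoded = gzip.decompress(raw) if encoding == "gzip" else raw
    return {
        "encoding": encoding or "identity",
        "bytes_on_wire": len(raw),
        "bytes_decoded": len(decoded),
        "saving_percent": saving_percent(len(raw), len(decoded)),
    }
```

Calling check_compression("http://example.com/") (a placeholder URL) returns a small dictionary with the encoding the server chose and the sizes before and after decompression.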

Testing a few sites, it seems that the IIS community (our server ran IIS before we switched to Linux) has still not widely adopted compression, despite IIS supporting it as of v6.0 [ref]. There could be a technical or backwards compatibility reason for this, though I'm not aware of one.

Of course, Apache has supported compression for many years via mod_deflate. I believe it was the first major server to implement it. It took a little time for all browsers to support it, but browsers that don't are simply sent the content uncompressed (plaintext). A browser indicates its support for compression in the request headers it sends to the web server, specifically the Accept-Encoding header [see your browser headers].
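The negotiation described above can be sketched in a few lines of Python. This is a simplified illustration, not how mod_deflate is written: the function names are hypothetical, only gzip is considered, and wildcard entries such as "*" in Accept-Encoding are ignored.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding header value allows gzip."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() != "gzip":
            continue
        # An explicit quality of zero ("gzip;q=0") means "do not send gzip".
        params = params.replace(" ", "").lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def build_response(body: bytes, accept_encoding: str):
    """Compress body only when the client said it can handle gzip."""
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


page = b"<p>Hello, compressed world.</p>" * 50
modern_headers, modern_body = build_response(page, "gzip, deflate")
legacy_headers, legacy_body = build_response(page, "")
print(modern_headers)
print(legacy_headers)
```

A client that sends no Accept-Encoding value gets the original bytes back with no Content-Encoding header, which is the fallback behavior described above.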

It is true that some WAN connections compress data themselves. Telephony modems, for instance, long ago implemented on-the-fly streaming compression and error correction (using things like Hamming codes). Most modern cable and ADSL modems support compression and error correction as well. However, this compression happens only on the link between you and your ISP, meaning the data spends most of its network life in uncompressed form, slowing the transfer.

One good side-effect of compression on the server end is that it turns plaintext into compressed data. This is not security, since anyone who captures the traffic can decompress it, but the content is no longer readable at a glance. This can help deter casual sniffing a little even for unencrypted connections. Another benefit is that the browser seems to wait until the page has transferred in its entirety before rendering it, making it appear to snap onto the screen. Although most web browsers implement a load delay to achieve this same effect, you'll still often see annoying page position adjustments as content is rendered while it comes in.

Note that when enabling mod_deflate or compression in IIS, you want to limit it to MIME types that are compressible. While the server *should* handle pre-compressed data like ZIP archives and most image formats fine, it will not be able to reduce their size, so it is only wasting CPU cycles. In cases where the compressed data is larger than the original, I believe it emits an error indicating such and sends the original content. However, I should note that during the few hours I had mod_deflate enabled for all files, I got a report that downloaded ZIP archives were corrupt. I couldn't reproduce this on my own systems, and I'm not sure what browser the client was using. Nevertheless, one should be cautious.
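As a rough illustration of why the MIME type filter matters, here is a small Python sketch. The list of types is my own guess at a sensible starting point, not anything mod_deflate or IIS ships with, and seeded random bytes stand in for an already-compressed payload such as a ZIP archive.

```python
import random
import zlib

# Assumed starting list; adjust it to whatever your site actually serves.
COMPRESSIBLE_TYPES = {
    "text/html",
    "text/plain",
    "text/css",
    "text/xml",
    "application/javascript",
    "application/json",
}


def should_compress(content_type: str) -> bool:
    """Decide from the Content-Type value whether compression is worth trying."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in COMPRESSIBLE_TYPES


# Stand-in for pre-compressed data: random bytes have no redundancy left.
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(10_000))
recompressed = zlib.compress(already_compressed)

print("text/html; charset=utf-8 ->", should_compress("text/html; charset=utf-8"))
print("application/zip          ->", should_compress("application/zip"))
print("input bytes: ", len(already_compressed))
print("output bytes:", len(recompressed))
```

The second pass spends CPU time and returns slightly more bytes than it was given, which is the waste the filter is meant to avoid.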

In the end, if your favorite site isn't using compression -- encourage them to do so. Why in the world wouldn't they?
