gziptest.sh part 2: multi-threaded compression benchmarks
After the Part 1 compression tests, I further improved my gziptest.sh script and added three more compression tools to the test: zip, lbzip2 and p7zip. Update: For a more recent live production compression benchmark comparing pigz and gzip on a dual Xeon E5520 with 8 physical cpu cores and 16 cpu threads, read here.
The full list of tested compression methods:
- zip v3.0
- gzip v1.3.5 – http://www.gzip.org/
- bzip2 v1.03 – http://bzip.org/
- pigz v2.1.6 – multi-threaded version of gzip http://www.zlib.net/pigz
- pbzip2 v1.1.6 – multi-threaded version of bzip2 http://compression.ca/pbzip2
- lbzip2 – multi-threaded version of bzip2 https://github.com/kjn/lbzip2
- lzip v1.1.3 rc1 – based on LZMA compression algorithm http://www.nongnu.org/lzip/lzip.html
- plzip v0.80 rc1 – multi-threaded version of lzip http://www.nongnu.org/lzip/plzip.html
- p7zip – Linux version of 7-zip, multi-threaded for bzip2 and 7z (LZMA) only http://p7zip.sourceforge.net
I ran two comparison tests and measured compression and decompression times, along with cpu and memory usage and compression ratios (compressed file size as a percentage of the original file size, so lower is better). This time I tested on a live CentOS 6.0 32bit VPS server to get real numbers:
1). on a vB MySQL sql backup file (626,613 KB) for compression levels 1 to 3
2). on a tar archive of the /usr directory on the server, usr.tar (1,536,030 KB) for compression levels 1 to 5
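As a rough illustration of how these measurements relate, here is a minimal shell sketch. It is not the actual gziptest.sh code. The function names (ratio_pct, speedup, compressed_size) are hypothetical helpers, and compressed_size assumes a gzip-style command line (a -1 to -9 level flag plus -c to write to stdout), which zip and p7zip do not share.

```shell
# Compressed size as a percentage of the original size (lower is better).
ratio_pct() {
    # $1 = original size in bytes, $2 = compressed size in bytes
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.1f\n", c * 100 / o }'
}

# How many times faster a tool is than a baseline, from elapsed seconds.
speedup() {
    # $1 = baseline seconds, $2 = tool seconds
    awk -v b="$1" -v t="$2" 'BEGIN { printf "%.2f\n", b / t }'
}

# Size in bytes of FILE when compressed by TOOL at LEVEL.
# Writes to stdout via -c, so the original file is left untouched.
compressed_size() {
    # $1 = tool (gzip, bzip2, pigz, ...), $2 = level (1-9), $3 = file
    "$1" "-$2" -c "$3" | wc -c | tr -d ' '
}
```

For example, with a hypothetical dump.sql, `ratio_pct "$(wc -c < dump.sql)" "$(compressed_size gzip 3 dump.sql)"` would print the level 3 gzip ratio in the same "percent of original" sense used in the results.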
Test Server configuration:
- Xeon X3450 VPS – 4 cpu thread allocation (2 physical + 2 virtual) equal share
- 1GB DDR3 guaranteed and 2GB DDR3 burstable
- 30GB EXT4 disk space
- CentOS 6.0 32bit OpenVZ based
MySQL sql backup file gziptest.sh results:
- For compression speed, pigz is hands down the fastest. Multi-threaded gzip compression gave a 1.9x to 2.9x speed up compared to gzip. But its compression ratio, like that of single threaded zip and gzip, was the worst, producing the largest compressed files. Level 3 compression ratio was ~23%.
- Bzip2, pbzip2 and lbzip2 all have much better compression ratios than gzip, but at the expense of much slower compression times. At level 3, the compression ratio was 23% for gzip vs 14.4% for bzip2, pbzip2 and lbzip2.
- Comparing bzip2 against its two multi-threaded alternatives, pbzip2 and lbzip2: lbzip2 is definitely the fastest at bzip2 compression but also has one of the highest peak memory consumption levels, second only to plzip (multi-threaded lzip), which had the highest memory consumption. At compression level 3, lbzip2 was 5.55x faster than bzip2 and 1.68x faster than pbzip2, while pbzip2 was 3.29x faster than bzip2.
- Note, lbzip2 used a lot more memory for decompression, to the extent of throwing errors (lbzip2: xalloc: Not enough space) when decompressing with the default lbzip2 -d, which would run 4 threads. I had to reduce decompression to 2 threads via lbzip2 -d -n2 for the above tests. This led to slightly slower decompression times for lbzip2 compared to pbzip2 at levels 1 and 2, but at level 3 lbzip2 was still 1.61x faster than pbzip2 for combined compression + decompression time, despite only using 2 cpu threads for decompression. I didn’t have any problems on my Virtualbox 1GB server using the default of all 4 threads for lbzip2. It could be related to this OpenVZ VPS counting virtual memory as actually used memory, which could lead to much higher memory consumption?
- Lzip and plzip used the most memory but also produced the best compression ratios at 8.1% and 9.1% respectively. That’s nearly 1/3 the compressed size of gzip compressed files or nearly 1/2 the compressed size of bzip2 compressed files. Plzip, the multi-threaded lzip, is 3.16x faster than single threaded lzip but loses some ground in compression ratio at level 3: plzip 9.1% vs lzip 8.1%.
- For the Linux version of 7zip, p7zip, I ran two tests, one each for the bzip2 and 7z (LZMA) compression formats. The version I used should have multi-threaded support for both bzip2 and LZMA (7z) files, but from the cpu utilisation numbers you can clearly see LZMA compression only hit 99%, maxing out 1 cpu thread, and the compression times, decompression times and compression ratios were about the same for levels 1 and 2. Only at level 3 did the compression ratio improve from 15% to 9.1%, while compression times remained the same at ~41 seconds and decompression times halved. If you need good compression ratios but lower memory consumption, then p7zip LZMA (7z) at level 3 would be a good choice, matching plzip level 3 at a 9.1% compression ratio but 20% slower for compression and twice as slow for decompression.
- For the p7zip bzip2 tests, you can see the multi-threaded implementation at work, with cpu utilisation hitting ~360% on the 4 cpu thread VPS server. However, p7zip bzip2 compression was slower than pbzip2 and lbzip2, by 1.61x and 2.73x respectively, though it was 2.03x faster than single threaded bzip2 at level 3 compression. Memory usage wise, p7zip bzip2 used a lot less memory than pbzip2 and lbzip2, especially for decompression peak memory usage: p7zip bzip2 38MB vs pbzip2 403MB vs lbzip2 348MB.
- For LZMA (7z) compression in general, there are limitations on what it can be used for, although compressing mysql database sql file backups should be fine I guess. As per the 7z Wiki page quoted below:
1). The 7z format does not store UNIX owner/group permissions, and hence can be inappropriate for backup/archival purposes. A workaround for this is to convert data to a tar bitstream before compressing with 7z. But it is worth noting that GNU tar (common in many UNIX environments) can also compress with the LZMA algorithm natively, without the use of 7z, and that in this case the suggested file extension for the archive is “.tar.lzma” (or just “.tlz”), and not “.tar.7z”.
2). The 7z format does not allow extraction of some “broken files” — that is (for example) if one has the first segment of a series of 7z files, 7z cannot give the start of the files within the archive — it must wait until all segments are downloaded. The format 7z also lacks recovery records, which might be a problem when limited file corruption has occurred.
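The tar workaround mentioned in the quote can be sketched roughly as follows. This is only an illustration: tar_stream_to is a hypothetical helper name, and it assumes the compressor reads from stdin and writes to stdout, as gzip, bzip2 and lzip style tools do when given -c.

```shell
# Pack a directory with tar (which records owner, group and permissions)
# and pipe the tar stream into a compressor reading stdin / writing stdout.
tar_stream_to() {
    # $1 = directory, $2 = output file, remaining args = compressor command
    dir=$1
    out=$2
    shift 2
    tar -cf - "$dir" | "$@" > "$out"
}
```

A call such as `tar_stream_to /usr usr.tar.lz plzip -3 -c` would produce a tar archive compressed with plzip. 7za takes the archive name as an argument instead of writing to stdout, so it does not fit this helper; the p7zip man page suggests the form `tar cf - directory | 7za a -si directory.tar.7z` for that case.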
MySQL sql backup file gziptest.sh Summary:
- So for fast compression speeds at the expense of compression ratio, the choice is pigz.
- If you want better compression ratios with modest memory usage, choose bzip2. If you have more memory at hand, then the faster alternative to bzip2 would be lbzip2.
- If you want the best compression ratios and have adequate memory resources, then plzip would be the pick, with resulting compressed file sizes at 1/3 of gzip and zip compressed file sizes. Even gzip -9 max compression level only gives you a ~19% compression ratio compared to plzip 9.1%. I will definitely look at adding plzip and lbzip2 as additional compression options for my mysqlmybackup.sh script.
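One way the picks above might be wired into a backup script is sketched below. This is not code from mysqlmybackup.sh. The function name pick_compressor, the priority names (speed, ratio, balanced) and the fallback order when a preferred tool is not installed are all assumptions made for illustration.

```shell
# Print the first installed tool for a given priority; fail if none is found.
pick_compressor() {
    case "$1" in
        speed) candidates="pigz gzip" ;;             # fastest, largest files
        ratio) candidates="plzip lzip" ;;            # smallest files, most memory
        *)     candidates="lbzip2 pbzip2 bzip2" ;;   # balanced: bzip2 family
    esac
    for tool in $candidates; do
        # command -v succeeds only if the tool is found on this system
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    return 1
}
```

Usage could look like `tool=$(pick_compressor ratio) && "$tool" -3 dump.sql`, where dump.sql is a placeholder file name. On a memory constrained VPS like the one tested here, it may be safer to put bzip2 ahead of lbzip2 in the balanced list.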
The next page shows results for the tar archive of the /usr directory, and the last pages show the raw benchmark results.