c440ac9c20
1 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Ondrej Dubaj
|
fabbdc87f0 |
Add support for IBM Z hardware-accelerated deflate
Future versions of IBM Z mainframes will provide the DFLTCC instruction, which implements the deflate algorithm in hardware, with estimated compression and decompression performance orders of magnitude faster than the current zlib and a compression ratio comparable with that of level 1.

This patch adds DFLTCC support to zlib. To enable it, use the following build commands:

    $ CFLAGS=-DDFLTCC ./configure
    $ make OBJA=dfltcc.o PIC_OBJA=dfltcc.lo

When built like this, zlib compresses in hardware on level 1 and in software on all other levels. Decompression always happens in hardware. To enable DFLTCC compression for levels 1-6 (i.e. to use it by default), either add -DDFLTCC_LEVEL_MASK=0x7e at compile time or set the environment variable DFLTCC_LEVEL_MASK to 0x7e at run time.

Two DFLTCC compression calls produce the same results only when both are made on machines of the same generation and the respective buffers have the same offset relative to the start of the page. Therefore, care should be taken when using hardware compression where reproducible results are desired. One such use case, reproducible software builds, is handled explicitly: when the SOURCE_DATE_EPOCH environment variable is set, hardware compression is disabled.

DFLTCC does not support every single zlib feature, in particular:

* inflate(Z_BLOCK) and inflate(Z_TREES)
* inflateMark()
* inflatePrime()
* deflateParams() after the first deflate() call

When used, these functions either switch to software or, if that is not possible, fail gracefully.

This patch tries to add DFLTCC support in the least intrusive way. All SystemZ-specific code was placed into a separate file, but unfortunately there is still a noticeable amount of changes in the main zlib code. Below is a summary of those changes.

DFLTCC takes as arguments a parameter block, an input buffer, an output buffer and a window.
Since DFLTCC requires the parameter block to be doubleword-aligned, and it is reasonable to allocate it alongside the deflate and inflate states, the ZALLOC_STATE, ZFREE_STATE and ZCOPY_STATE macros were introduced to encapsulate the allocation details. The same is true for the window, for which the ZALLOC_WINDOW and TRY_FREE_WINDOW macros were introduced.

While the software and hardware window formats match for inflate, this is not the case for deflate. Therefore, deflateSetDictionary and deflateGetDictionary need special handling, which is triggered using the new DEFLATE_SET_DICTIONARY_HOOK and DEFLATE_GET_DICTIONARY_HOOK macros.

deflateResetKeep() and inflateResetKeep() now update the DFLTCC parameter block, which is allocated alongside the zlib state, using the new DEFLATE_RESET_KEEP_HOOK and INFLATE_RESET_KEEP_HOOK macros.

To make unsupported deflateParams(), inflatePrime() and inflateMark() calls fail gracefully, the new DEFLATE_PARAMS_HOOK, INFLATE_PRIME_HOOK and INFLATE_MARK_HOOK macros were introduced.

The algorithm implemented in hardware has a different compression ratio than the one implemented in software. For deflateBound() to return correct results for the hardware implementation, the new DEFLATE_BOUND_ADJUST_COMPLEN and DEFLATE_NEED_CONSERVATIVE_BOUND macros were introduced.

Actual compression and decompression are handled by the new DEFLATE_HOOK and INFLATE_TYPEDO_HOOK macros. Since inflation with DFLTCC manages the window on its own, calling updatewindow() is suppressed using the new INFLATE_NEED_UPDATEWINDOW() macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32 checksums; therefore, whenever it is used, software checksumming needs to be suppressed using the new DEFLATE_NEED_CHECKSUM and INFLATE_NEED_CHECKSUM macros.

DFLTCC will refuse to write an End-of-block Symbol if there is no input data, thus in some cases it is necessary to do this manually.
To achieve this, send_bits, bi_reverse, bi_windup and flush_pending were promoted from local to ZLIB_INTERNAL. Furthermore, since block and stream termination must be handled in software as well, the block_state enum was moved to deflate.h.

Since the first call to dfltcc_inflate already needs the window, and it might not be allocated yet, inflate_ensure_window was factored out of updatewindow and made ZLIB_INTERNAL. |
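
The commit message describes DFLTCC_LEVEL_MASK only by example (0x7e for levels 1-6) and says that SOURCE_DATE_EPOCH disables hardware compression. The sketch below shows one way such a mask could be interpreted. It assumes that bit N of the mask enables hardware compression for level N and that the default mask is 0x2 (level 1 only); both are inferred from the message, not taken from the patch. All function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: the default enables hardware compression on level 1 only. */
#define SKETCH_DEFAULT_LEVEL_MASK 0x2UL

/* Assumption: bit N of the mask enables hardware compression for level N. */
int level_uses_hardware(unsigned long mask, int level)
{
    assert(level >= 0 && level <= 9);
    return (int)((mask >> level) & 1UL);
}

/* Pick the mask from the two environment values described in the message.
 * A NULL argument means "variable not set". */
unsigned long effective_level_mask(const char *source_date_epoch,
                                   const char *env_mask)
{
    if (source_date_epoch != NULL)
        return 0UL; /* reproducible build: no hardware compression */
    if (env_mask != NULL && *env_mask != '\0')
        return strtoul(env_mask, NULL, 0); /* base 0 accepts "0x7e" */
    return SKETCH_DEFAULT_LEVEL_MASK;
}

/* Convenience wrapper that reads the real environment. */
unsigned long level_mask_from_environment(void)
{
    return effective_level_mask(getenv("SOURCE_DATE_EPOCH"),
                                getenv("DFLTCC_LEVEL_MASK"));
}
```

With this reading, 0x7e (binary 0111 1110) selects levels 1 through 6 and leaves levels 0 and 7-9 in software.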
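
The message states that the parameter block must be doubleword-aligned and is allocated alongside the zlib state. A minimal sketch of that idea follows, assuming a doubleword is 8 bytes and that malloc returns memory aligned to at least 8 bytes. The struct types, their sizes and the function names are hypothetical stand-ins, not the definitions behind ZALLOC_STATE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_DOUBLEWORD 8u /* assumed doubleword size in bytes */

/* Round n up to the next multiple of align (a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical stand-ins; the sizes are illustrative only. */
struct sketch_state { unsigned char bytes[13]; };
struct sketch_param_block { unsigned char bytes[64]; };

/* One allocation holds the state, padding, and the parameter block. */
void *alloc_state_with_param_block(void)
{
    size_t offset = align_up(sizeof(struct sketch_state), SKETCH_DOUBLEWORD);
    return malloc(offset + sizeof(struct sketch_param_block));
}

/* The parameter block starts at the first doubleword boundary
 * after the state. */
struct sketch_param_block *param_block_of(void *state)
{
    size_t offset = align_up(sizeof(struct sketch_state), SKETCH_DOUBLEWORD);
    return (struct sketch_param_block *)((char *)state + offset);
}
```

Keeping both objects in one allocation means a single free releases them together, which is presumably why the allocation details are hidden behind macros.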
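
Most of the changes listed above follow one pattern: the generic code calls a hook macro that expands to nothing in a normal build and to hardware-specific logic when DFLTCC is enabled. The sketch below illustrates that pattern for a deflateParams()-style call. Every SKETCH_ and sketch_ name is hypothetical, and the macro signature is not the one used by DEFLATE_PARAMS_HOOK in the patch.

```c
#include <assert.h>

#define SKETCH_OK 0
#define SKETCH_STREAM_ERROR (-2)

/* Hypothetical hardware back end: refuses a parameter change once
 * compression has started. */
int sketch_hw_can_set_params(int deflate_already_called)
{
    return !deflate_already_called;
}

#ifdef SKETCH_USE_HW
/* Hardware build: the hook may make the caller return early. */
#define SKETCH_PARAMS_HOOK(started) \
    do { \
        if (!sketch_hw_can_set_params(started)) \
            return SKETCH_STREAM_ERROR; \
    } while (0)
#else
/* Generic build: the hook does nothing. */
#define SKETCH_PARAMS_HOOK(started) do { (void)(started); } while (0)
#endif

/* Generic code path with a single hook call at the top. */
int sketch_deflate_params(int started, int *level, int new_level)
{
    SKETCH_PARAMS_HOOK(started);
    *level = new_level;
    return SKETCH_OK;
}
```

Compiled without SKETCH_USE_HW, sketch_deflate_params always succeeds. Compiled with it, a call made after compression has started returns SKETCH_STREAM_ERROR and leaves the level unchanged.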