To Zip or not to Zip: effective resource usage for real-time compression
Abstract
Real-time compression for primary storage is quickly becoming widespread as data continues to grow exponentially, but adding compression to the data path consumes scarce CPU and memory resources on the storage system. Our work aims to mitigate this cost by introducing methods that quickly and accurately identify the data that will yield significant space savings when compressed. The first level of filtering operates at the data set level (e.g., a volume or file system), where we estimate the overall compressibility of the data at rest. Depending on the outcome, we may enable or disable compression for the entire data set, or employ a second, finer-grained level of filtering. This second-level filter examines data online, as it is written to the storage system, and determines its compressibility. The first-level filtering runs in mere minutes while providing mathematically proven guarantees on its estimates. Besides helping to select which volumes to compress, it has been released as a public tool. The tool lets potential customers determine how effective compression would be on their data and aids in capacity planning. The second-level filtering has shown significant CPU savings (up to 35%) while maintaining compression savings (within 2%).
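The two-level scheme described in the abstract can be illustrated with a small sketch. It is not the authors' published tool or their actual heuristics. It rests on these assumptions:

* zlib is used as the compressor.
* Data is split into fixed 4 KiB chunks.
* A Hoeffding-style bound stands in for the "mathematically proven guarantees" of the first level.
* A byte-entropy check on a prefix of each write stands in for the online second-level filter.

All function names, parameters, and thresholds (`estimate_ratio`, `should_compress_write`, `volume_policy`, `ENTROPY_LIMIT`, and so on) are hypothetical.

```python
import math
import random
import zlib

# Hypothetical parameters; the source does not specify chunk size,
# compressor, or thresholds.
CHUNK_SIZE = 4096      # bytes per sampled chunk
ENTROPY_LIMIT = 7.5    # bits/byte above which a write is assumed incompressible
PREFIX_BYTES = 1024    # how much of each write the online filter inspects


def required_samples(eps: float, delta: float) -> int:
    """Hoeffding bound: i.i.d. samples in [0, 1] needed so the sample mean is
    within eps of the true mean with probability at least 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def chunk_ratio(chunk: bytes) -> float:
    """Compressed/original size of one chunk, capped at 1.0 on the assumption
    that chunks which do not shrink are stored uncompressed."""
    return min(len(zlib.compress(chunk, 6)), len(chunk)) / len(chunk)


def estimate_ratio(data: bytes, eps: float = 0.05, delta: float = 0.05,
                   seed: int = 0) -> float:
    """First level: estimate a data set's compression ratio (lower is better)
    from a uniform random sample of fixed-size chunks, drawn with replacement.
    A trailing partial chunk is ignored."""
    n_chunks = len(data) // CHUNK_SIZE
    if n_chunks == 0:
        raise ValueError("data set is smaller than one chunk")

    def chunk(i: int) -> bytes:
        return data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]

    m = required_samples(eps, delta)
    if m >= n_chunks:  # small data set: compressing every chunk is cheaper
        return sum(chunk_ratio(chunk(i)) for i in range(n_chunks)) / n_chunks
    rng = random.Random(seed)
    return sum(chunk_ratio(chunk(rng.randrange(n_chunks))) for _ in range(m)) / m


def byte_entropy(buf: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    if not buf:
        return 0.0
    counts = [0] * 256
    for b in buf:
        counts[b] += 1
    n = len(buf)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def should_compress_write(buf: bytes) -> bool:
    """Second level: cheap online check on a prefix of each incoming write.
    Near-uniform byte distributions (e.g. already-compressed or encrypted
    data) are passed through without compression."""
    return byte_entropy(buf[:PREFIX_BYTES]) < ENTROPY_LIMIT


def volume_policy(ratio: float, enable_below: float = 0.5,
                  disable_above: float = 0.9) -> str:
    """Map a first-level estimate to a per-volume decision (hypothetical
    thresholds): compress everything, skip compression, or filter per write."""
    if ratio <= enable_below:
        return "compress"
    if ratio >= disable_above:
        return "skip"
    return "filter"
```

In this sketch, the number of sampled chunks depends only on `eps` and `delta`, not on the size of the data set. With the defaults, that is 738 chunks. This property is what lets a sampling estimator of this kind finish quickly even on very large volumes. The per-write check inspects only a short prefix, which keeps its CPU cost well below that of actually compressing the write.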
Similar Resources
Lossless Compression Techniques for Maskless Lithography Data
Future lithography systems must produce more dense chips with smaller feature sizes, while maintaining the throughput of one wafer per sixty seconds per layer achieved by today’s optical lithography systems. To achieve this throughput with a direct-write maskless lithography system, using 25 nm pixels for 50 nm feature sizes, requires data rates of about 10 Tb/s. In a previous paper, we present...
Wrap&Zip decompression
The Edgebreaker compression [16, 11] is guaranteed to encode any unlabeled triangulated planar graph of t triangles with at most 1.84t bits. It stores the graph as a CLERS string: a sequence of t symbols from the set {C,L,E,R,S}, each represented by a 1, 2 or 3 bit code. We show here that, in practice, the string can be further compressed to between 0.91t and 1.26t bits using an entropy code. Th...
Advanced low-complexity compression for maskless lithography data
A direct-write maskless lithography system using 25 nm pixels for 50 nm feature sizes requires data rates of about 10 Tb/s to maintain the throughput of one wafer per minute per layer achieved by today’s optical lithography systems. In a previous paper, we presented an architecture that achieves this data rate contingent on 25 to 1 compression of lithography data, and on implementation of a real-time deco...
Effective compression algorithms for pulsed thermography data
Two compression algorithms were developed for the image sequences generated by pulsed-transient thermography for non-destructive testing. The first algorithm allows the reproduction quality of the original measurement data to be balanced against the compression ratio. This algorithm comprises a dedicated space/time mapping (STM) method and an image compression algorithm (JPEG2000). The second algor...
The ZPAQ Compression Algorithm
ZPAQ is a tool for creating compressed archives and encrypted user-level incremental backups with rollback capability. It deduplicates any new or modified files by splitting them into fragments along content-dependent boundaries and comparing their cryptographic hashes to previously stored fragments. Unmatched fragments are grouped by file type and packed into blocks and either stored or compre...