To Zip or not to Zip: effective resource usage for real-time compression

Authors

  • Danny Harnik
  • Ronen I. Kat
  • Oded Margalit
  • Dmitry Sotnikov
  • Avishay Traeger
Abstract

Real-time compression for primary storage is quickly becoming widespread as data continues to grow exponentially, but adding compression on the data path consumes scarce CPU and memory resources on the storage system. Our work aims to mitigate this cost by introducing methods to quickly and accurately identify the data that will yield significant space savings when compressed. The first level of filtering that we employ is at the data set level (e.g., volume or file system), where we estimate the overall compressibility of the data at rest. According to the outcome, we may choose to enable or disable compression for the entire data set, or to employ a second level of finer-grained filtering. The second filtering scheme examines data being written to the storage system in an online manner and determines its compressibility. The first-level filtering runs in mere minutes while providing mathematically proven guarantees on its estimates. In addition to aiding in selecting which volumes to compress, it has been released as a public tool, allowing potential customers to determine the effectiveness of compression on their data and to aid in capacity planning. The second-level filtering has shown significant CPU savings (up to 35%) while maintaining compression savings (within 2%).
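The two filtering levels described above can be illustrated with a minimal sketch. The abstract does not spell out the estimators, so everything here is an assumption made for illustration: fixed-size chunks, `zlib` as the compressor, a Hoeffding bound as the source of the accuracy guarantee, and byte entropy of a short prefix as the online heuristic. The names `estimate_ratio`, `byte_entropy`, and `should_compress` are hypothetical and are not taken from the paper or its public tool.

```python
import math
import random
import zlib
from collections import Counter


def estimate_ratio(data: bytes, chunk_size: int = 4096, samples: int = 200,
                   delta: float = 0.01, seed: int = 0):
    """First level (assumed design): sample random chunks and compress each.

    Returns (estimate, error). Each per-chunk ratio lies in [0, 1], so
    Hoeffding's inequality gives |estimate - true mean ratio| <= error
    with probability at least 1 - delta.
    """
    n_chunks = len(data) // chunk_size
    if n_chunks == 0:
        raise ValueError("data is smaller than one chunk")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        start = rng.randrange(n_chunks) * chunk_size
        chunk = data[start:start + chunk_size]
        # Clamp to 1.0: an incompressible chunk would be stored uncompressed.
        total += min(1.0, len(zlib.compress(chunk)) / chunk_size)
    error = math.sqrt(math.log(2.0 / delta) / (2.0 * samples))
    return total / samples, error


def byte_entropy(block: bytes) -> float:
    """Empirical Shannon entropy of the block, in bits per byte (0 to 8)."""
    counts = Counter(block)
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def should_compress(block: bytes, prefix: int = 1024,
                    max_entropy: float = 7.0) -> bool:
    """Second level (assumed heuristic): skip writes whose prefix looks random."""
    sample = block[:prefix]
    if not sample:
        return False
    return byte_entropy(sample) < max_entropy


if __name__ == "__main__":
    text_like = b"the quick brown fox jumps over the lazy dog " * 5000
    estimate, error = estimate_ratio(text_like)
    print(f"estimated ratio {estimate:.3f} +/- {error:.3f}")
    print("compress this write?", should_compress(text_like[:8192]))
```

The error term depends only on the number of samples and the chosen confidence level, not on the size of the data set, which is why an estimator of this kind can finish quickly even on large volumes. The entropy threshold of 7.0 bits per byte is an arbitrary illustrative value and would need tuning against real workloads.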

Related articles

Lossless Compression Techniques for Maskless Lithography Data

Future lithography systems must produce more dense chips with smaller feature sizes, while maintaining the throughput of one wafer per sixty seconds per layer achieved by today’s optical lithography systems. To achieve this throughput with a direct-write maskless lithography system, using 25 nm pixels for 50 nm feature sizes, requires data rates of about 10 Tb/s. In a previous paper, we present...

Wrap&Zip decompression

The Edgebreaker compression [16, 11] is guaranteed to encode any unlabeled triangulated planar graph of t triangles with at most 1.84t bits. It stores the graph as a CLERS string, a sequence of t symbols from the set {C,L,E,R,S}, each represented by a 1, 2 or 3 bit code. We show here that, in practice, the string can be further compressed to between 0.91t and 1.26t bits using an entropy code. Th...

Advanced low-complexity compression for maskless lithography data

A direct-write maskless lithography system using 25nm for 50nm feature sizes requires data rates of about 10 Tb/s to maintain a throughput of one wafer per minute per layer achieved by today’s optical lithography systems. In a previous paper, we presented an architecture that achieves this data rate contingent on 25 to 1 compression of lithography data, and on implementation of a real-time deco...

Effective compression algorithms for pulsed thermography data

Two compression algorithms for the image sequences generated by pulsed-transient thermography for non-destructive testing were developed. The first algorithm allows the reproduction quality of the original measurement data to be balanced against the compression ratio. This algorithm comprises a dedicated space/time mapping (STM) method and an image compression algorithm (JPEG2000). The second algor...

The ZPAQ Compression Algorithm

ZPAQ is a tool for creating compressed archives and encrypted user-level incremental backups with rollback capability. It deduplicates any new or modified files by splitting them into fragments along content-dependent boundaries and comparing their cryptographic hashes to previously stored fragments. Unmatched fragments are grouped by file type and packed into blocks and either stored or compre...

Publication date: 2013