Guidelines for Using Compare-by-hash

Authors

  • Val Henson
  • Richard Henderson
Abstract

Recently, a new technique called compare-by-hash has become popular. Compare-by-hash is a method of content-based addressing in which data is identified only by the cryptographic hash of its contents. Hash collisions are ignored, with the justification that they occur less often than many kinds of hardware errors. Compare-by-hash is a powerful, versatile tool in the software architect’s bag of tricks, but it is also poorly understood and frequently misused. The consequences of misuse range from significant performance degradation to permanent, unrecoverable data corruption or loss. The proper use of compare-by-hash is a subject of debate[10, 29], but recent results in the field of cryptographic hash function analysis, including the breaking of MD5[28] and SHA-0[12] and the weakening of SHA-1[3], have clarified when compare-by-hash is appropriate. In short, compare-by-hash is appropriate when it provides some benefit (performance, code simplicity, etc.), when the system can survive intentionally generated hash collisions, and when hashes can be thrown away and regenerated at any time. In this paper, we propose and explain some simple guidelines to help software architects decide when to use compare-by-hash.
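The abstract contrasts two approaches. One trusts a matching hash as proof of identical data. The other can survive a collision, and it treats hashes as disposable keys that can be regenerated. Both can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not the authors' design. `BlockStore`, `sha256_key`, `rebuild_index`, and `birthday_collision_estimate` are illustrative names, and SHA-256 is an assumed choice of hash. The collision estimate uses the standard birthday approximation p ≈ n²/2^(b+1), which assumes inputs are not chosen by an attacker.

```python
import hashlib
from typing import Callable, Dict, Optional


def sha256_key(data: bytes) -> str:
    """Default block identifier: hex SHA-256 digest of the contents (assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class BlockStore:
    """Toy content-addressed block store (hypothetical illustration).

    verify=False: pure compare-by-hash. A matching key is taken to mean
    identical contents, so a collision silently aliases two different blocks.
    verify=True: on a key match the stored bytes are compared as well, so a
    collision raises an error instead of corrupting data.
    """

    def __init__(self, verify: bool = True,
                 hash_fn: Callable[[bytes], str] = sha256_key) -> None:
        self.verify = verify
        self.hash_fn = hash_fn
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = self.hash_fn(data)
        existing = self._blocks.get(key)
        if existing is None:
            self._blocks[key] = data  # first block seen with this key
        elif self.verify and existing != data:
            raise ValueError(f"hash collision detected for key {key!r}")
        # Otherwise the block is treated as a duplicate and not stored again.
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def rebuild_index(self, hash_fn: Optional[Callable[[bytes], str]] = None) -> None:
        """Discard every key and regenerate it from the stored data.

        This works because each key is a pure function of the block contents;
        a different hash_fn can be swapped in, but callers holding old keys
        would need to obtain new ones.
        """
        if hash_fn is not None:
            self.hash_fn = hash_fn
        blocks = list(self._blocks.values())
        self._blocks = {}
        for data in blocks:
            self.put(data)


def birthday_collision_estimate(n_blocks: int, hash_bits: int) -> float:
    """Approximate chance of any accidental collision among n random blocks.

    Uses p ~= n**2 / 2**(b + 1), valid only while p is small, and assumes the
    inputs are not chosen by an attacker.
    """
    return n_blocks * n_blocks / 2 ** (hash_bits + 1)
```

With `verify=True`, the extra cost is one byte comparison per duplicate. However, the check needs both copies on hand. When compare-by-hash is used specifically to avoid reading or transferring the data, that check is unavailable. The birthday estimate also gives no protection against deliberately constructed collisions.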

Similar resources

An Improved Hash Function Based on the Tillich-Zémor Hash Function

Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable, and its collision resistance follows from a hardness assumption on a mathematical problem. It is also secure against the known attacks. It is the most secure variant of the Tillich-Zémor hash function to date.

Compressed Image Hashing using Minimum Magnitude CSLBP

Image hashing tolerates compression, enhancement, and other signal-processing operations on digital images, which are usually considered acceptable manipulations. Cryptographic hash functions, by contrast, are sensitive to even a single-bit change in an image. An image hash combines important quality features in quantized form. In this paper, we propose a novel image hashing algorithm for authentication which i...

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to authors’ rights, plagiarism detection is one of the critical problems in text mining and attracts many researchers. It is considered a serious issue in higher-education institutions. Existing language-independent tools do not yield reliable results because they ignore the special features of each language. Considering the paucit...

An Analysis of Compare-by-hash

Recent research has produced a new and perhaps dangerous technique for uniquely identifying blocks that I will call compare-by-hash. Using this technique, we decide whether two blocks are identical to each other by comparing their hash values, using a collision-resistant hash such as SHA-1[5]. If the hash values match, we assume the blocks are identical without further ado. Users of compare-by-...

Hash challenges: Stretching the limits of compare-by-hash in distributed data deduplication

We propose a technique for reducing communication overheads when sending data across a network. Our technique, called hash challenges, leverages existing deduplication solutions based on compare-by-hash by being able to determine redundant data chunks by exchanging substantially less meta-data. Hash challenges can be used directly on any existing compare-by-hash protocol, with no relevant addit...

Publication date: 2004