Chapter 3 Normalized Information Distance

Authors

  • Paul M. B. Vitányi
  • Frank J. Balbach
  • Rudi L. Cilibrasi
  • Ming Li
Abstract

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation. The typical data mining algorithm uses explicitly given features of the data to assess their similarity and discover patterns among them. It also comes with many parameters for the user to tune to specific needs according to the domain at hand.
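
As a rough illustration of the first realization mentioned in the abstract, the sketch below approximates Kolmogorov complexity with the length of a compressed string and combines those lengths using the commonly cited normalized compression distance formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). The abstract itself does not give this formula or name a compressor; the choice of zlib and the helper names `compressed_size` and `ncd` are assumptions made here for illustration only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input.

    Assumption: zlib stands in for an ideal compressor; any real
    compressor only gives an upper-bound style approximation of
    Kolmogorov complexity.
    """
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their information;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

With a real compressor the value can fall slightly outside the interval [0, 1], and the result depends on the compressor and on input length, so a sketch like this is best read as a relative score for comparing pairs rather than an exact distance.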




Publication date: 2008