Study of Near Duplicate Content: Identification of Categories Generating Maximum Duplicate URL in Results

Similar resources

Study of Near Duplicate Content: Identification of Categories Generating Maximum Duplicate URL in Results

The study of near-duplicate content identification involves identifying the search categories that generate the same URL more than once in a query result. These categories need to be identified so that results can be improved by removing duplicate URLs. Repeating the same URL in the results irritates the user and also lowers the priority of other URLs. These URLs are displayed on the second or third page, which users do not ...
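As a rough illustration of the idea in this abstract, the sketch below counts repeated URLs per search category and removes the repeats. It is not the paper's method, which the truncated abstract does not specify: the function names, the sample categories, and the example URLs are all hypothetical, and URLs are compared by exact string match, which is an assumption.

```python
from collections import Counter


def duplicate_url_counts(results_by_category):
    """For each category, count result entries that repeat an already listed URL.

    Assumption: two results are duplicates only if their URL strings are identical.
    """
    counts = {}
    for category, urls in results_by_category.items():
        occurrences = Counter(urls)
        # Every occurrence beyond the first one is counted as a duplicate.
        counts[category] = sum(n - 1 for n in occurrences.values() if n > 1)
    return counts


def remove_duplicate_urls(urls):
    """Keep the first occurrence of each URL and preserve the original order."""
    return list(dict.fromkeys(urls))


# Hypothetical sample data, for illustration only.
results_by_category = {
    "news": ["http://a.example/1", "http://a.example/1", "http://b.example/2"],
    "shopping": ["http://c.example/1", "http://d.example/2"],
}

counts = duplicate_url_counts(results_by_category)
# Category with the most duplicate URLs in this sample.
worst_category = max(counts, key=counts.get)
cleaned_news = remove_duplicate_urls(results_by_category["news"])
```

Exact string matching only catches identical URLs; near-duplicate pages served under different URLs would need a content-based comparison, which this sketch does not attempt.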

Content-Based Keyframe Clustering Using Near Duplicate Keyframe Identification

In this paper, the authors propose an effective content-based clustering method for keyframes of news video stories using the Near-Duplicate Keyframe (NDK) identification concept. Initially, the authors investigate the near-duplicate relationship, a content-based visual similarity across keyframes, using the NDK identification algorithm presented. The authors assi...

Identification of MIR-Flickr Near-duplicate Images - A Benchmark Collection for Near-duplicate Detection

There are many contexts where the automated detection of near-duplicate images is important, for example the detection of copyright infringement or images of child abuse. There are many published methods for the detection of similar and near-duplicate images; however, it is still uncommon for methods to be objectively compared with each other, probably because of a lack of any good framework in ...

A Survey of Duplicate And Near Duplicate Techniques

The World Wide Web consists of more than 50 billion pages online. The advent of the World Wide Web caused a dramatic increase in the usage of the Internet. The World Wide Web is a broadcast medium where a wide range of information can be obtained at a low cost. A great deal of the Web is replicate or near-replicate content. Documents may be served in different formats: HTML, PDF, and Text for diff...

Near-duplicate detection for eRulemaking

U.S. regulatory agencies are required to solicit, consider, and respond to public comments before issuing regulations. In recent years, agencies have begun to accept comments via both email and Web forms. The transition from paper to electronic comments makes it much easier for individuals to customize “form” letters, which they do, creating “near-duplicate” comments that express the same viewp...

Journal

Journal title: International Journal of Computer Applications

Year: 2017

ISSN: 0975-8887

DOI: 10.5120/ijca2017913526