Identifying and Indexing Near-Duplicate Images Using Optimizing Technique in Web Search
Authors
Abstract
The World Wide Web is growing rapidly, and duplicate content appears in many domains. Duplicate images are a particular concern: they are uploaded to the internet in areas such as food products, document images, medical images, and textiles, so identifying them is important. Near-duplicates can be close copies of an image or can differ slightly in visual content. In large image collections, duplicate images cause redundancy and raise copyright-infringement problems. This paper proposes a methodology for identifying and indexing near-duplicate images on the web and for optimizing the results. First, the search image is obtained from the user and enhanced. Local invariant features are then extracted from the search image using SURF (Speeded-Up Robust Features). Next, the similarity between the extracted features of the images is measured using the sim-hash algorithm, and near-duplicate images are indexed against the user's search image using Locality-Sensitive Hashing (LSH). Finally, the results are optimized using Particle Swarm Optimization (PSO). We demonstrate that our identification and indexing approach is highly effective for collections of up to a few hundred thousand images.
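The similarity and indexing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each image has already been reduced to a list of string tokens (for example, quantized SURF descriptors, a step not shown here), computes a 64-bit sim-hash fingerprint per image, and indexes fingerprints with a simple banding scheme in the spirit of LSH. The names `simhash`, `BandIndex`, the token strings, and the parameters `bands` and `max_distance` are all hypothetical; the PSO stage is omitted.

```python
import hashlib
from collections import defaultdict

BITS = 64  # fingerprint width (an assumption, not taken from the paper)


def _hash64(token: str) -> int:
    """Stable 64-bit hash of one feature token."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(tokens, bits=BITS) -> int:
    """Sim-hash: per-bit majority vote over the hashes of all tokens."""
    counts = [0] * bits
    for tok in tokens:
        h = _hash64(tok)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


class BandIndex:
    """Banded index: images sharing any full band become candidates."""

    def __init__(self, bands=4, bits=BITS):
        assert bits % bands == 0
        self.bands = bands
        self.width = bits // bands
        self.buckets = defaultdict(set)
        self.fingerprints = {}

    def _keys(self, fp):
        mask = (1 << self.width) - 1
        return [(b, (fp >> (b * self.width)) & mask) for b in range(self.bands)]

    def add(self, image_id, fp):
        self.fingerprints[image_id] = fp
        for key in self._keys(fp):
            self.buckets[key].add(image_id)

    def query(self, fp, max_distance=3):
        """Return sorted (distance, image_id) pairs within max_distance."""
        candidates = set()
        for key in self._keys(fp):
            candidates |= self.buckets.get(key, set())
        hits = ((hamming(fp, self.fingerprints[i]), i) for i in candidates)
        return sorted((d, i) for d, i in hits if d <= max_distance)
```

With 4 bands and `max_distance=3`, any stored fingerprint within 3 bits of the query must agree with it on at least one whole band, so the candidate lookup cannot miss a match at that threshold. Larger thresholds would need more bands or a different scheme.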
Similar resources
A Comparing between the impacts of text based indexing and folksonomy on ranking of images search via Google search engine
Background and Aim: The purpose of this study was to compare the impact of text based indexing and folksonomy in image retrieval via Google search engine. Methods: This study used experimental method. The sample is 30 images extracted from the book “Gray anatomy”. The research was carried out in 4 stages; in the first stage, images were uploaded to an “Instagram” account so the images are tagge...
Duplicate Web Pages Detection with the Support of 2d Table Approach
Duplicate and near duplicate web pages are stopping the process of search engine. As a consequence of duplicate and near duplicates, the common issue for the search engines is raising the indexed storage pages. This high storage memory will slow down the process which automatically increases the serving cost. Finally, the duplication will be raised while gathering the required data from the var...
Optimization of Search Results with Duplicate Page Elimination using Usage Data
The performance and scalability of search engines are greatly affected by the presence of enormous amount of duplicate data on the World Wide Web. The flooded search results containing a large number of identical or near identical web pages affect the search efficiency and seek time of the users to find the desired information within the search results. When navigating through the results, the ...
Identifying and Filtering Near-Duplicate Documents
The mathematical concept of document resemblance captures well the informal notion of syntactic similarity. The resemblance can be estimated using a fixed size “sketch” for each document. For a large collection of documents (say hundreds of millions) the size of this sketch is of the order of a few hundred bytes per document. However, for efficient large scale web indexing it is not necessary t...
Exploring Web Attributes Related to Image Accessibility and their Impact on Search Engine Indexing
The purpose of this study is to analyze how search engines index web content inserted in image attributes for alternative/complementary texts that favor web page accessibility. The study discussed the importance of optimizing websites to improve search tool indexing and explored how these engines index image attributes. We conducted empirical observations of tests carried out in a controlled en...
Publication date: 2016