Web documents

Clustering for Ontology Evolution

2005

George Tsatsaronis, Reetta Pitkänen, Michalis Vazirgiannis

The Semantic Web initiative aims at automating the embedding of semantics in Web pages so that richer information retrieval, data integration, and improved navigation can be supported. Domain ontologies are used in this direction, providing a way to semantically characterize Web documents if a mapping of the documents to the ontology concepts can be achieved. Given such an ontology, the main problem ...

Focused Crawling: A Means to Acquire Biological Data from the Web

2007

Ari Pirkola

Experience paper. The World Wide Web contains billions of publicly available documents (pages), and it grows and changes rapidly. Web search engines, such as Google and AltaVista, provide access to indexable Web documents. An important part of a search engine is a Web crawler, whose function is to collect Web pages for the search engine. Due to the Web’s immense size and dynamic nature, no crawler is ...

Classification of Web Documents Using Concept Extraction from Ontologies

2007

Marina Litvak, Mark Last, Slava Kisilevich

In this paper, we deal with the problem of analyzing and classifying web documents in a given domain by information filtering agents. We present an ontology-based web content mining methodology whose main stages include creating an ontology for the specified domain, collecting a training set of labeled documents, and building a classification model in this domain using the constructed onto...

Semantic-Sensitive Web Information Retrieval Model for HTML Documents

Journal: CoRR, 2012

Youssef Bassil, Paul Semaan

With the advent of the Internet, a new era of digital information exchange has begun. Currently, the Internet encompasses more than five billion online sites, and this number is increasing exponentially every day. Fundamentally, Information Retrieval (IR) is the science and practice of storing documents and retrieving information from within these documents. Mathematically, IR systems are at the...

An Implementation of Web Personalization Using Web Mining Techniques

2013

V. Shanmuga Priya, S. Sakthivel

Web mining is a class of data mining. Data mining emerged to relieve the “data rich but information poor” dilemma. Web mining is a variation of this field that distills an untapped source of abundantly available free textual information. The importance of web mining is growing along with the massive volumes of data generated on the web in day-to-day life. In general, web data always arri...

Structural Investigation of Websites’ Ranking Systems in View of Information Retrieval

Journal: International Journal of Information Science and Management

J. Mehrad, Ph.D., President of the Regional Information Center for Science and Technology; A. Shemrani, M.S., Research Department of System Design and Operations, Regional Information Center for Science and Technology

For years, search engines have been considered one of the most frequently used information-seeking tools on the web. Efficiency, ease of use, and search quality are the main factors in preferring one search engine over another. Search quality is assessed based on the concept of PageRank, which is applied to scoring web documents. The structures designed for linkage among the pages of a site...
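
The abstract above refers to PageRank-style scoring of web documents. As a rough illustration only, here is a minimal Python sketch of the standard PageRank power iteration. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the choice to spread the rank of pages without out-links uniformly over all pages are assumptions of this sketch, not details taken from the article.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Sketch of PageRank by power iteration (hypothetical helper).

    links: dict mapping each page to the list of pages it links to.
    Assumption: dangling pages (no out-links) share their rank uniformly.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Random-jump ("teleportation") share received by every page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Rank held by dangling pages is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n
        # Each page splits its damped rank equally among its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

On the three-page graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"]}`, page `c`, which is linked from both other pages, ends up with the highest score, and the scores always sum to 1.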

Efficient Algorithm for Mining on Bio Medical Data for Ranking the Web Pages

2017

Information on the Internet is growing in volume through many different sources. Extracting tuples from HTML pages has been an important issue in various web applications such as web data integration, e-commerce market monitoring, and mashups that repurpose and selectively combine existing web data services. Data mining is the process of analyzing data from different perspectives and...

Semi-Automatic Semantic Annotations for Web Documents

2005

Nadzeya Kiyavitskaya, Nicola Zeni, James R. Cordy, Luisa Mich, John Mylopoulos

Semantic annotation of web documents is the only way to make the Semantic Web vision a reality. Considering the scale and dynamics of the World Wide Web, the largest knowledge base ever built, it becomes clear that we cannot afford to annotate web documents manually. In this work we propose a generic, domain-independent architecture for semi-automatic semantic annotation, based on the lightweigh...

A Survey of Duplicate And Near Duplicate Techniques

2014

Rahul Mahajan, Rajeev Bedi

The World Wide Web consists of more than 50 billion pages online. The advent of the World Wide Web caused a dramatic increase in the usage of the Internet. The World Wide Web is a broadcast medium where a wide range of information can be obtained at a low cost. A great deal of the Web is duplicate or near-duplicate content. Documents may be served in different formats: HTML, PDF, and Text for diff...
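
One widely used way to detect near-duplicate pages, which may or may not be among the techniques this survey covers, is to compare documents by the Jaccard similarity of their word shingles (overlapping sequences of k consecutive words). The sketch below assumes k = 4 and a similarity threshold of 0.8; the helper names `shingles`, `jaccard`, and `near_duplicate` are hypothetical.

```python
import re


def shingles(text, k=4):
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    # Lowercase and keep only word characters, so case and punctuation are ignored.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Texts shorter than k words become a single (short) shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(doc1, doc2, k=4, threshold=0.8):
    """Flag two documents as near-duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold
```

For example, two 13-word sentences that differ only in their final word share 9 of their 11 distinct 4-word shingles (Jaccard 9/11, about 0.82), so they are flagged as near-duplicates at this threshold. At Web scale, exact set comparison is usually replaced by a compact approximation such as MinHash, which this sketch does not implement.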

An Improved Approach to Ranking Web Documents

Journal: JIPS, 2013

Pooja Gupta, Sandeep K. Singh, Divakar Yadav, A. K. Sharma

Ranking the thousands of web documents retrieved in response to a user query is a challenging task. For this purpose, search engines apply various ranking mechanisms to the apparently relevant result documents to decide the order in which they should be displayed. Existing ranking mechanisms determine the position of a web page based on the number and popularity of the lin...
