Cohesiveness Relationships to Empower Keyword Search on Tree Data on the Web
Abstract
Keyword search has been for several years the most popular technique for retrieving information over semistructured data on the web. The reasons for this unprecedented success are well known and twofold: (1) the user does not need to master a complex query language to specify her requests for data, and (2) she does not need to have any knowledge of the structure of the data sources. However, these advantages come with two drawbacks: (1) as a result of the imprecision of keyword queries, there is usually a huge number of candidate results, of which only very few match the user's intent. Unfortunately, the existing semantics are ad hoc and generally fail to "guess" the user's intent. (2) As the number of keywords and the size of the data grow, the existing approaches do not scale satisfactorily. In this paper, we focus on keyword search on tree data and we introduce keyword queries which can express cohesiveness relationships. Intuitively, a cohesiveness relationship on keywords indicates that the instances of these keywords in a query result should form a cohesive whole, where instances of the other keywords do not interpolate. Cohesive keyword queries also allow keyword repetition and nesting of cohesiveness relationships. Most importantly, despite their increased expressiveness, they enjoy both advantages of plain keyword search. We provide formal semantics for cohesive keyword queries on tree data, which rank query results on the proximity of the keyword instances. We design a stack-based algorithm which builds a lattice of keyword partitions to efficiently compute keyword queries and further leverages cohesiveness relationships to significantly reduce the dimensionality of the lattice. We implemented our approach and ran extensive experiments to measure the effectiveness of keyword queries and the efficiency and scalability of our algorithm. Our results demonstrate that our approach outperforms previous filtering semantics and that our algorithm scales smoothly, achieving interactive response times on queries of 20 frequent keywords on large datasets.
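The intuition that other keywords "do not interpolate" can be illustrated with a minimal sketch. This is one possible reading, not the formal semantics of the paper: it assumes that tree nodes are identified by Dewey codes (tuples of child positions from the root), that a query result maps each keyword to a single instance, and that a group of keywords is cohesive when no instance of an outside keyword lies in the subtree rooted at the lowest common ancestor of the group's instances. The names `lca`, `is_descendant_or_self`, and `is_cohesive` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a node is identified by its Dewey code, e.g. (0, 1, 2).
Dewey = Tuple[int, ...]


def lca(nodes: List[Dewey]) -> Dewey:
    """Lowest common ancestor, i.e. the longest common prefix of the codes."""
    prefix = nodes[0]
    for node in nodes[1:]:
        i = 0
        while i < min(len(prefix), len(node)) and prefix[i] == node[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def is_descendant_or_self(node: Dewey, ancestor: Dewey) -> bool:
    """True if `ancestor` is a prefix of `node`."""
    return node[:len(ancestor)] == ancestor


def is_cohesive(result: Dict[str, Dewey], group: List[str]) -> bool:
    """Assumed reading of cohesiveness: no instance of a keyword outside
    `group` falls under the LCA of the instances of the keywords in `group`."""
    root = lca([result[kw] for kw in group])
    return all(
        not is_descendant_or_self(pos, root)
        for kw, pos in result.items()
        if kw not in group
    )


# Hypothetical result: "john" and "smith" sit under node (0, 1), "xml" elsewhere.
tight = {"john": (0, 1, 0), "smith": (0, 1, 1), "xml": (0, 2)}
# Here "xml" lies between the two name keywords, under their LCA (0,).
loose = {"john": (0, 1, 0), "smith": (0, 2, 0), "xml": (0, 1, 1)}

print(is_cohesive(tight, ["john", "smith"]))
print(is_cohesive(loose, ["john", "smith"]))
```

Under this reading, the first result keeps the two name keywords together as a unit, while the second is rejected because an instance of another keyword falls inside the subtree that joins them.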
Similar resources
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
Cohesive Keyword Search on Tree Data
Keyword search is the most popular querying technique on semistructured data. Keyword queries are simple and convenient. However, as a consequence of their imprecision, there is usually a huge number of candidate results of which only very few match the user’s intent. Unfortunately, the existing semantics for keyword queries are ad-hoc and they generally fail to “guess” the user intent. Therefo...
An Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
A Structure-Based Search Engine for Phylogenetic Databases
Phylogenetic trees are essential for understanding the relationships among organisms or taxa. Many of the current techniques for searching phylogenetic repositories allow the user to perform a keyword-type search or an aligned sequence data search, or to browse a hierarchical list of taxa. Here we describe a new search engine that allows the user to present an example phylogeny, or a query tree...
Fuzzy retrieval of encrypted data by multi-purpose data-structures
The growing amount of information that has arisen from emerging technologies has caused organizations to face challenges in maintaining and managing their information. Expanding hardware and human resources, and outsourcing data management and maintenance to an external organization in the form of cloud storage services, are two common approaches to overcome these challenges; the first approach costs of...
Journal title:
- CoRR
Volume: abs/1508.04957; issue: -
Pages: -
Publication date: 2015