Probabilistic Ranking of Database Query Results
Abstract
• The first step is to divide the attributes of a table into specified attributes (those specified by the query) and unspecified attributes (those not specified by the query).
• Every tuple in the database is treated as a document, and correlations between the attribute values within a tuple are estimated. In Information Retrieval (IR), correlations are often ignored because the data spaces are high-dimensional and sparsely populated, but in relational data spaces the attribute values are strongly correlated.
• The authors make a limited independence assumption: the specified attributes are assumed to be independent of one another, and so are the unspecified attributes, but correlations between specified and unspecified attributes are allowed.
• Two kinds of scores are used to rank the tuples (documents): a global score, which captures the global importance of the unspecified attribute values, and a conditional score, which captures the strength of the correlations between specified and unspecified attribute values.
• In the preprocessing phase, two lists, the global list and the conditional list, are computed. They hold the global and conditional scores of the attribute values for the tuples and are stored as auxiliary tables in the database.
• Instead of precomputing the top-K results for every possible query, a ranked list of tuples is computed for every atomic query. The Threshold Algorithm, a well-known top-K algorithm, is adapted to rank the tuples at query time using the precomputed scores.
• The adaptation is efficient because of the limited independence assumption, which is novel to this paper.
• Processing involves two modules: (i) the index module (the preprocessing step), which constructs the global and conditional lists, and (ii) the List Merge algorithm, which merges the lists associated with the query's attribute values at query time.
• Extensive experiments are carried out on the Internet Movie Database and the MSN HomeAdvisor database.
• The results are compared with those of a competing query-ranking method.
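Estimating correlations between attribute values can be illustrated with simple co-occurrence counts. The sketch below is an assumption for illustration only: it estimates p(x | y), the fraction of tuples containing value y that also contain value x, from the table alone. The paper's own estimators are not reproduced, and the attribute names (`City`, `View`), the toy rows and the helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(rows):
    """Count attribute values and co-occurring value pairs over a list of tuples.

    Each tuple is a dict {attribute: value}; a value is keyed as the pair
    (attribute, value) so equal values of different attributes stay distinct.
    """
    value_counts = Counter()
    pair_counts = Counter()
    for row in rows:
        items = sorted(row.items())
        value_counts.update(items)
        # Each unordered pair of values in the same tuple co-occurs once.
        pair_counts.update(combinations(items, 2))
    return value_counts, pair_counts


def conditional_prob(x, y, value_counts, pair_counts):
    """Estimate p(x | y): the fraction of tuples containing y that also contain x."""
    if value_counts[y] == 0:
        return 0.0
    return pair_counts[tuple(sorted((x, y)))] / value_counts[y]


# Hypothetical toy table (made-up data).
rows = [
    {"City": "Seattle", "View": "waterfront"},
    {"City": "Seattle", "View": "waterfront"},
    {"City": "Seattle", "View": "none"},
    {"City": "Dallas", "View": "none"},
]
values, pairs = cooccurrence_stats(rows)
print(conditional_prob(("City", "Seattle"), ("View", "waterfront"), values, pairs))  # 1.0
print(conditional_prob(("City", "Seattle"), ("View", "none"), values, pairs))  # 0.5
```

In this toy table every waterfront tuple is in Seattle, so p(City=Seattle | View=waterfront) is 1.0; in sparse IR-style data such dependencies are usually ignored, which is the contrast the abstract draws.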
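The two-part score can be sketched as follows, assuming the score of a tuple is the product of a global part (one factor per unspecified attribute value) and a conditional part (one factor per specified/unspecified value pair). The factor tables, attribute names and numbers are hypothetical, and treating a missing entry as a neutral factor of 1.0 is also an assumption. The shape of the products mirrors the limited independence assumption: no factor couples two specified values or two unspecified values.

```python
from math import prod

# Hypothetical precomputed factors (made-up numbers).
GLOBAL_SCORE = {
    ("Type", "house"): 1.2,
    ("Type", "condo"): 0.8,
    ("View", "waterfront"): 2.0,
    ("View", "none"): 0.9,
}
CONDITIONAL_SCORE = {
    (("City", "Seattle"), ("Type", "house")): 1.1,
    (("City", "Seattle"), ("Type", "condo")): 1.0,
    (("City", "Seattle"), ("View", "waterfront")): 1.5,
    (("City", "Seattle"), ("View", "none")): 0.7,
}


def split_attributes(row, query):
    """Partition a tuple's values into specified (in the query) and unspecified."""
    specified = [(a, v) for a, v in row.items() if a in query]
    unspecified = [(a, v) for a, v in row.items() if a not in query]
    return specified, unspecified


def score(row, query):
    specified, unspecified = split_attributes(row, query)
    # Global part: unspecified values are treated as independent of each other.
    global_part = prod(GLOBAL_SCORE.get(y, 1.0) for y in unspecified)
    # Conditional part: correlations only across the specified/unspecified boundary.
    conditional_part = prod(
        CONDITIONAL_SCORE.get((x, y), 1.0) for x in specified for y in unspecified
    )
    return global_part * conditional_part


query = {"City": "Seattle"}
t1 = {"City": "Seattle", "Type": "house", "View": "waterfront"}
t2 = {"City": "Seattle", "Type": "condo", "View": "none"}
ranked = sorted([t1, t2], key=lambda t: score(t, query), reverse=True)
print(ranked[0] is t1)  # True
```

Both tuples satisfy the query `City = Seattle`, so they tie on the specified attribute; the unspecified values (`Type`, `View`) and their correlation with `Seattle` are what separate them.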
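The index module can be sketched as building one ranked list per atomic query, that is, per single condition `attribute = value`. This is an illustrative assumption rather than the paper's exact list layout: each list here holds only the tuples that satisfy its condition, sorted by a hypothetical per-tuple score `score_fn`. In the described system such lists are stored as auxiliary tables in the database, not in memory.

```python
from collections import defaultdict


def build_atomic_lists(rows, score_fn):
    """Build, for every atomic query (attribute, value), a list of
    (score, tuple_id) pairs over the matching tuples, best score first."""
    lists = defaultdict(list)
    for tid, row in rows.items():
        for attr, value in row.items():
            lists[(attr, value)].append((score_fn(tid, attr, value), tid))
    for entries in lists.values():
        # Descending score; ties broken by tuple id for a deterministic order.
        entries.sort(key=lambda e: (-e[0], e[1]))
    return dict(lists)


# Hypothetical table and scores (made-up data).
rows = {
    1: {"City": "Seattle", "View": "waterfront"},
    2: {"City": "Seattle", "View": "none"},
    3: {"City": "Dallas", "View": "none"},
}
tuple_scores = {1: 0.9, 2: 0.4, 3: 0.7}
index = build_atomic_lists(rows, lambda tid, attr, value: tuple_scores[tid])
print(index[("City", "Seattle")])  # [(0.9, 1), (0.4, 2)]
print(index[("View", "none")])  # [(0.7, 3), (0.4, 2)]
```

The number of lists grows with the number of distinct attribute values, not with the number of possible queries, which is why precomputing per atomic query is cheaper than precomputing top-K answers for every query.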
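The Threshold Algorithm that the paper adapts can be illustrated in its standard, unadapted form (Fagin et al.). The sketch below is not the paper's List Merge algorithm: it assumes each query condition contributes one list sorted by descending score plus a random-access lookup, and that the overall score is the product of the per-list scores, which is monotone as long as scores are non-negative. The function and parameter names and the toy data are hypothetical.

```python
import heapq
from math import prod


def threshold_algorithm(sorted_lists, random_access, k):
    """Top-k retrieval with the Threshold Algorithm and product aggregation.

    sorted_lists: one list per query condition of (score, tuple_id) pairs,
        sorted by descending score; scores must be non-negative.
    random_access: one dict per condition mapping tuple_id -> score; a tuple
        missing from a dict is treated as having score 0 there.
    Returns up to k (aggregate_score, tuple_id) pairs, best first.
    """
    if k <= 0:
        return []
    top = []  # min-heap holding the k best (aggregate, tuple_id) seen so far
    seen = set()
    depth = 0
    while True:
        frontier = []  # score at the current depth of each list
        progressed = False
        for lst in sorted_lists:
            if depth >= len(lst):
                frontier.append(0.0)  # exhausted: any unseen tuple scores 0 here
                continue
            progressed = True
            s, tid = lst[depth]
            frontier.append(s)
            if tid in seen:
                continue
            seen.add(tid)
            # Random access: look up the tuple's score in every list.
            agg = prod(ra.get(tid, 0.0) for ra in random_access)
            if len(top) < k:
                heapq.heappush(top, (agg, tid))
            elif agg > top[0][0]:
                heapq.heapreplace(top, (agg, tid))
        # No unseen tuple can score above the product of the frontier scores.
        threshold = prod(frontier)
        depth += 1
        if not progressed or (len(top) == k and top[0][0] >= threshold):
            break
    return sorted(top, reverse=True)


# Hypothetical per-condition lists (made-up scores).
lists = [
    [(0.9, "a"), (0.8, "b"), (0.3, "c")],
    [(0.9, "c"), (0.7, "a"), (0.6, "b")],
]
access = [{tid: s for s, tid in lst} for lst in lists]
print([tid for _, tid in threshold_algorithm(lists, access, 2)])  # ['a', 'b']
```

The early stop is what makes TA attractive: once the k-th best aggregate seen so far reaches the threshold, no unseen tuple can beat it, so the remaining list entries never need to be read. Tuple ids must be mutually comparable, since the heap breaks score ties on them.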
Similar sources
A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing
Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...
Probabilistic Ranking of Database Query Results
We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present result...
Approximate Lifted Inference in Probabilistic Databases
This paper proposes a new approach for approximate evaluation of #P-hard queries over probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enume...
Approximate Lifted Inference with Probabilistic Databases
This paper proposes a new approach for approximate evaluation of #P-hard queries with probabilistic databases. In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enume...
A Probabilistic Framework for Vague Queries and Imprecise Information in Databases
A probabilistic learning model for vague queries and missing or imprecise information in databases is described. Instead of retrieving only a set of answers, our approach yields a ranking of objects from the database in response to a query. By using relevance judgements from the user about the objects retrieved, the ranking for the actual query as well as the overall retrieval quality of the sy...
Building Ranked Mashups of Unstructured Sources with Uncertain Information
Mashups are situational applications that join multiple sources to better meet the information needs of Web users. Web sources can be huge databases behind query interfaces, which triggers the need of ranking mashup results based on some user preferences. We present MashRank, a mashup authoring and processing system building on concepts from rank-aware processing, probabilistic databases, and i...
Publication date: 2004