C3D+P: A summarization method for interactive entity resolution
Abstract
Entity resolution is a fundamental task in data integration. Recent studies of this problem, including active learning, crowdsourcing, and pay-as-you-go approaches, have started to involve human users in the loop to carry out interactive entity resolution, that is, to invite human users to judge whether two entity descriptions refer to the same real-world entity. Such judgment requires tool support, particularly when entity descriptions contain a large number of features (i.e., property-value pairs). To facilitate judgment, in this article we propose to select from entity descriptions a subset of critical features as a summary to be shown to and judged by human users. The preferred features are those that reflect the most commonalities shared by, and the most conflicts between, the two entities, and that carry the largest amount of characteristic and diverse information about them. The selected features are then grouped and ordered to improve readability and further speed up judgment. Experimental results demonstrate that summaries generated by our method help users judge 3.57–3.78 times faster than entire entity descriptions do, without significantly hurting the accuracy of judgment. The accuracy achieved by our method is also higher than that achieved by existing summarization methods.
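The abstract does not specify the scoring functions or the optimization procedure behind the method. The following Python sketch therefore only illustrates the general idea of selecting, grouping, and ordering features for a pair of entities, under assumptions of our own:

- **Categories.** A feature present in both descriptions is treated as "common". A feature whose property appears in both descriptions but whose value appears in only one is treated as "conflict". This is a simplification that ignores multi-valued properties.
- **Characteristic information.** This is approximated by an IDF-style weight computed from hypothetical corpus frequencies.
- **Diversity.** This is encouraged by a greedy penalty on repeated properties.

The function name `summarize_pair`, the category weights, and the penalty are hypothetical illustrations, not the C3D+P formulation.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Feature = Tuple[str, str]  # a (property, value) pair


def summarize_pair(
    a: Set[Feature],
    b: Set[Feature],
    freq: Dict[Feature, int],
    n_entities: int,
    k: int,
) -> List[Tuple[str, Feature]]:
    """Greedily pick up to k features to show a user judging whether a and b match.

    Hypothetical scoring (an assumption, not the paper's): an IDF-style weight
    computed over a corpus of n_entities, doubled for common or conflicting
    features, and divided by (1 + number of already-selected features with the
    same property) to favor diverse summaries.
    """
    props_a = {p for p, _ in a}
    props_b = {p for p, _ in b}

    def category(f: Feature) -> str:
        if f in a and f in b:
            return "common"  # identical property-value pair in both entities
        if f[0] in props_a and f[0] in props_b:
            return "conflict"  # shared property, value present in only one entity
        return "other"

    weight = {"common": 2.0, "conflict": 2.0, "other": 1.0}

    def base_score(f: Feature) -> float:
        # Rarer features carry more characteristic information
        # (assumes freq[f] <= n_entities, so the log is non-negative).
        idf = math.log((n_entities + 1) / (freq.get(f, 0) + 1))
        return weight[category(f)] * idf

    remaining = sorted(a | b)
    selected: List[Feature] = []
    used_props: Counter = Counter()
    while remaining and len(selected) < k:
        # Marginal gain shrinks as more features of the same property are chosen;
        # ties are broken deterministically by comparing the feature tuples.
        best = max(remaining, key=lambda f: (base_score(f) / (1 + used_props[f[0]]), f))
        selected.append(best)
        used_props[best[0]] += 1
        remaining.remove(best)

    # Presentation step: group by category (common, then conflict, then other)
    # and order by property and value within each group.
    rank = {"common": 0, "conflict": 1, "other": 2}
    selected.sort(key=lambda f: (rank[category(f)], f))
    return [(category(f), f) for f in selected]


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    e1 = {("name", "J. Smith"), ("born", "1912"), ("type", "Person"), ("field", "CS")}
    e2 = {("name", "J. Smith"), ("born", "1913"), ("type", "Person")}
    counts = {("type", "Person"): 90, ("name", "J. Smith"): 2,
              ("born", "1912"): 5, ("born", "1913"): 5, ("field", "CS"): 30}
    for cat, (prop, value) in summarize_pair(e1, e2, counts, n_entities=100, k=3):
        print(cat, prop, value)
    # common name J. Smith
    # conflict born 1912
    # conflict born 1913
```

In this toy run, the rare shared name and the two conflicting birth years outrank the very common `type` feature. If `("field", "CS")` were much rarer in the corpus, the diversity penalty on a second `born` value would let it displace `("born", "1912")`. A real system would replace these heuristics with whatever objective and solver the method actually prescribes.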
Similar resources
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation has become a vital and appealing task in Natural Language Processing subfields such as text summarization, question answering, text generation and machine translation. Existing methods, such as entity-based and graph-based models, focus on how nouns and noun phrases change roles across consecutive sentences within a short part of a text. They even have limitations in global coheren...
The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution
This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or unlinked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...
Summarization of Broadcast News Video through Link Analysis of Named Entities
This paper describes the use of connections between named entities for summarization of broadcast news. We first extract named entities from a transcript of a news story, and find related entities nearby. In the context of a query, a link graph of relevant entities is rendered in an interactive display, allowing the user to manipulate, browse and examine the components, including the ability to...
Graph Hybrid Summarization
One solution for processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. To generate a better-quality summary for a given attributed graph, both structural and attribute similarities must be considered. There are two measures, density and entropy, to evaluate the quality of structural and at...
Journal title:
- J. Web Sem.
Volume 35, Issue
Pages -
Publication date: 2015