Metadata for Name Disambiguation and Collocation
Abstract
Searching for the names of persons, families, and organizations is often difficult in online databases because different persons or organizations frequently share the same name and because a single person’s or organization’s name may appear in different forms across online documents. Databases and search engines can use metadata to address both name ambiguity and name variation. This article describes the challenges names pose in information retrieval and some emerging name metadata databases that can help ameliorate these problems. Effective name disambiguation and collocation increase search precision and recall and can improve the assessment of scholarly work.
Similar resources
Lexical Concept Acquisition From Collocation Map
This paper introduces an algorithm for automatically acquiring the conceptual structure of each word from corpus. The concept of a word is defined within the probabilistic framework. A variation of Belief Net named as Collocation Map is used to compute the probabilities. The Belief Net captures the conditional independences of words, which is obtained from the cooccurrence relations. The comput...
"I Cannot Tell What the Dickens His Name Is": Name Disambiguation in Institutional Repositories
Introduction: Authors who publish under more than one form of their name, multiple authors with the same name, and incomplete author information can all create challenges for repository staff when entering metadata. Unless properly addressed, these variations and duplications can result in search and retrieval errors for users. Name disambiguation, the process of identifying, merging, and making...
New Techniques for Disambiguation in Natural Language and Their Application to Biological Text
We study the problems of disambiguation in natural language, focusing on the problem of gene vs. protein name disambiguation in biological text and also considering the problem of context-sensitive spelling error correction. We introduce a new family of classifiers based on ordering and weighting the feature vectors obtained from word counts and word co-occurrence in the text, and inspect severa...
One Sense per Collocation and Genre/Topic Variations
This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre a...
A Network Analysis Model for Disambiguation of Names in Lists
In research and application, social networks are increasingly extracted from relationships inferred by name collocations in text-based documents. Despite the fact that names represent real entities, names are not unique identifiers and it is often unclear when two name observations correspond to the same underlying entity. One confounder stems from ambiguity, in which the same name correctly re...
Journal title:
- Future Internet
Volume 2
Publication date: 2010