Metadata for Name Disambiguation and Collocation

Author

  • Jeffrey Beall
Abstract

Searching names of persons, families, and organizations is often difficult in online databases because different persons or organizations frequently share the same name and because a single person’s or organization’s name may appear in different forms in various online documents. Databases and search engines can use metadata as a tool to solve the problem of name ambiguity and name variation in online databases. This article describes the challenges names pose in information retrieval and some emerging name metadata databases that can help ameliorate the problems. Effective name disambiguation and collocation increase search precision and recall and can improve assessment of scholarly work.
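As a rough illustration of the idea in the abstract, and not the article's own method, name metadata can be pictured as authority records: each real-world entity gets one identifier, and every known form of its name points to that identifier. The minimal Python sketch below assumes a hypothetical `AuthorityRecord` structure, invented identifiers (`id-001`, `id-002`), and made-up example names; none of these come from the article or from a real name registry.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityRecord:
    """One real-world person or organization (hypothetical structure)."""
    identifier: str                 # invented ID, not from a real registry
    preferred_name: str
    variant_names: set[str] = field(default_factory=set)

    def all_names(self) -> set[str]:
        # The preferred form plus every known variant form.
        return {self.preferred_name, *self.variant_names}


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.casefold().split())


def build_index(records: list[AuthorityRecord]) -> dict[str, list[AuthorityRecord]]:
    """Map every normalized name form to the records that use it."""
    index: dict[str, list[AuthorityRecord]] = {}
    for record in records:
        for name in record.all_names():
            index.setdefault(normalize(name), []).append(record)
    return index


# Illustrative, invented records: two different people share the form "J. Smith".
records = [
    AuthorityRecord("id-001", "Smith, Jane", {"J. Smith", "Jane A. Smith"}),
    AuthorityRecord("id-002", "Smith, John", {"J. Smith"}),
]
index = build_index(records)

# Collocation: any variant form leads to the same identifier.
collocated = [r.identifier for r in index[normalize("jane a. smith")]]

# Disambiguation: an ambiguous form returns every candidate identity,
# so a search interface can ask the user to choose between them.
candidates = sorted(r.identifier for r in index[normalize("J. Smith")])
```

In this sketch, collocation helps recall because a search on one form of a name can retrieve works filed under the others, and disambiguation helps precision because results can be limited to a single identifier instead of every person who shares the name string.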


Related sources

Lexical Concept Acquisition From Collocation Map

This paper introduces an algorithm for automatically acquiring the conceptual structure of each word from a corpus. The concept of a word is defined within the probabilistic framework. A variation of Belief Net called Collocation Map is used to compute the probabilities. The Belief Net captures the conditional independences of words, which are obtained from the co-occurrence relations. The comput...


"I Cannot Tell What the Dickens His Name Is": Name Disambiguation in Institutional Repositories

Introduction: Authors who publish under more than one form of their name, multiple authors with the same name, and incomplete author information can all create challenges for repository staff when entering metadata. Unless properly addressed, these variations and duplications can result in search and retrieval errors for users. Name disambiguation, the process of identifying, merging, and making...


New Techniques for Disambiguation in Natural Language and Their Application to Biological Text

We study the problems of disambiguation in natural language, focusing on the problem of gene vs. protein name disambiguation in biological text and also considering the problem of context-sensitive spelling error correction. We introduce a new family of classifiers based on ordering and weighting the feature vectors obtained from word counts and word co-occurrence in the text, and inspect severa...


One Sense per Collocation and Genre/Topic Variations

This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre a...


A Network Analysis Model for Disambiguation of Names in Lists

In research and application, social networks are increasingly extracted from relationships inferred by name collocations in text-based documents. Despite the fact that names represent real entities, names are not unique identifiers and it is often unclear when two name observations correspond to the same underlying entity. One confounder stems from ambiguity, in which the same name correctly re...



Journal title:
  • Future Internet

Volume 2

Publication date: 2010