Identifying Poorly-Defined Concepts in WordNet with Graph Metrics

Authors

  • John P. McCrae
  • Narumol Prangnawarat

Abstract

Princeton WordNet is the most widely used lexical resource in natural language processing (NLP) and continues to provide a gold-standard model of semantics. However, the resource still has significant quality issues, and these affect the performance of all NLP systems built on it. One major issue is that many nodes are insufficiently defined, and new links need to be added to improve NLP performance. We combine graph-based metrics with measures of ambiguity to predict which synsets are difficult for word sense disambiguation, a major NLP task that depends on good lexical information. We show that this method allows us to find poorly defined nodes with 89.9% precision, which would help manual annotators focus on improving the parts of the WordNet graph most in need.
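The abstract does not say which graph metrics are used or how they are combined with ambiguity. As a rough illustration only, the minimal sketch below assumes node degree as the graph metric, mean lemma polysemy as the ambiguity measure, and a hand-picked ratio and threshold for flagging synsets. The synset ids, edges, and gold labels are hypothetical toy data, not real WordNet, and this is not the authors' model. It also shows how a precision figure such as the one reported is computed: the fraction of flagged synsets that are truly poorly defined.

```python
from collections import Counter

# Hypothetical toy fragment of a WordNet-like graph (NOT real WordNet data).
# Each edge is an undirected semantic link (e.g. hypernymy) between synset ids.
EDGES = [
    ("bank.n.01", "slope.n.01"),
    ("bank.n.02", "financial_institution.n.01"),
    ("financial_institution.n.01", "institution.n.01"),
    ("institution.n.01", "organization.n.01"),
    ("slope.n.01", "geological_formation.n.01"),
    ("bank.n.02", "deposit.n.01"),
]

# Hypothetical synset -> lemmas mapping, used to measure lexical ambiguity.
SYNSET_LEMMAS = {
    "bank.n.01": ["bank"],
    "bank.n.02": ["bank", "depository_financial_institution"],
    "bank.n.03": ["bank"],  # isolated synset: no links at all
    "slope.n.01": ["slope", "incline"],
    "financial_institution.n.01": ["financial_institution"],
    "institution.n.01": ["institution"],
    "organization.n.01": ["organization"],
    "geological_formation.n.01": ["geological_formation"],
    "deposit.n.01": ["deposit"],
}


def degrees(edges, synsets):
    """Graph metric (assumed): number of links per synset; isolated synsets get 0."""
    deg = {s: 0 for s in synsets}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def mean_ambiguity(synset_lemmas):
    """Ambiguity measure (assumed): mean number of synsets each lemma of a synset belongs to."""
    polysemy = Counter(lemma for lemmas in synset_lemmas.values() for lemma in lemmas)
    return {
        s: sum(polysemy[lemma] for lemma in lemmas) / len(lemmas)
        for s, lemmas in synset_lemmas.items()
    }


def difficulty_scores(edges, synset_lemmas):
    """Assumed combination rule: high ambiguity and few links give a higher score."""
    deg = degrees(edges, synset_lemmas)
    amb = mean_ambiguity(synset_lemmas)
    return {s: amb[s] / (1 + deg[s]) for s in synset_lemmas}


def precision(predicted, gold):
    """Fraction of flagged synsets that are actually poorly defined."""
    return len(predicted & gold) / len(predicted) if predicted else 0.0


scores = difficulty_scores(EDGES, SYNSET_LEMMAS)
flagged = {s for s, v in scores.items() if v >= 1.0}  # hypothetical threshold
gold = {"bank.n.02", "bank.n.03"}  # hypothetical gold labels
print(sorted(flagged), precision(flagged, gold))  # ['bank.n.01', 'bank.n.03'] 0.5
```

Here the isolated, highly ambiguous `bank.n.03` gets the top score (3.0), matching the intuition in the abstract that under-linked, ambiguous synsets are the hardest to disambiguate. A real system would use richer metrics and a learned combination rather than a fixed ratio.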

Similar resources

An Approximation Approach for Semantic Queries of Naïve Users by a New Query Language

This paper focuses on querying semi-structured data, such as RDF data, using a query language proposed for non-expert users who lack knowledge of the data's structure. This language is inspired by semantic regular path queries. The problem arises when the user specifies concepts that are not in the structure, as approximation approaches, operations based on query modifications and c...

Voting Theory for Concept Detection

This paper explores the issue of detecting concepts for ontology learning from text. We investigate various metrics from graph theory and propose various voting schemes based on these metrics. The idea is rooted in social choice theory, and our objective is to mimic consensus in automatic learning methods and increase the confidence in concept extraction through the identification of the b...

Opportunistic Search with Semantic Fisheye Views EPFL Technical Report: IC/2004/42

Search goals are often too complex or poorly defined to be solved in a single query. While refining their search goals, users are likely to apply a variety of strategies, such as searching for more general or more specific concepts in reaction to the information and structures they encounter in the results. This is called opportunistic search. In this paper we describe how semantic fisheye view...

Lexical Chains on WordNet and Extensions

Lexical chains between two concepts are sequences of semantically related words interconnected via semantic relations. This paper presents a new approach for the automatic construction of lexical chains on knowledge bases. Experiments were performed building lexical chains on WordNet, Extended WordNet, and Extended WordNet Knowledge Base. The research addresses the problems of lexical chains ra...

Domain-Specific Knowledge Acquisition Using WordNet

This paper presents a method that acquires new concepts and connections associated with user-selected seed concepts, and adds them to the WordNet linguistic knowledge structure. New domain knowledge can be acquired around some seed concepts that a user considers important. The knowledge we seek to acquire relates to one or more of these concepts, and consists of new concepts not defined in Word...

Publication date: 2016