Using Wikipedia for Hierarchical Finer Categorization

نویسنده

  • Aasish Pappu
چکیده

Wikipedia is one of the largest growing structured resources on the Web and can be used as a training corpus in natural language processing applications. In this work, we present a method to categorize named entities under the hierarchical fine-grained categories provided by the Wikipedia taxonomy. Such a categorization can be further used to extract semantic relations among these named entities. More specifically, we examine instances of different kinds of Named Entities picked from Wikipedia articles categorized under 55 categories. We employ a Maximum Entropy based method to perform supervised learning that learns from local context of a named entity as well as a higher-level context such as hypernyms/hyponyms from Wikipedia and WordNet.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Categorization of Wikipedia Articles with Spectral Clustering

The article reports application of clustering algorithms for creating hierarchical groups within Wikipedia articles. We evaluate three spectral clustering algorithms based on datasets constructed with usage of Wikipedia categories. Selected algorithm has been implemented in the system that categorize Wikipedia search results in the fly.

متن کامل

Knowledge categorization affects popularity and quality of Wikipedia articles

The existence of a shared classification system is essential to knowledge production, transfer, and sharing. Studies of knowledge classification, however, rarely consider the fact that knowledge categories exist within hierarchical information systems designed to facilitate knowledge search and discovery. This neglect is problematic whenever information about categorical membership is itself us...

متن کامل

Coarse to Fine: Diffusing Categories in Wikipedia

Automatic taxonomy construction aims to build a categorization system without human efforts. Traditional textual pattern based methods extract hyponymy relation in raw texts. However, these methods usually yield low precision and recall. In this paper, we propose a method to automatically find diffusing attributes to a category from Wikipedia infoboxes. We use the diffusing attribute to diffuse...

متن کامل

Self-Organizing Map Representation for Clustering Wikipedia Search Results

The article presents an approach to automated organization of textual data. The experiments have been performed on selected sub-set of Wikipedia. The Vector Space Model representation based on terms has been used to build groups of similar articles extracted from Kohonen Self-Organizing Maps with DBSCAN clustering. To warrant efficiency of the data processing, we performed linear dimensionality...

متن کامل

Hierarchical Categorization of Open Source Software by Online Profiles

The large amounts of freely available open source software over the Internet are fundamentally changing the traditional paradigms of software development. Efficient categorization of the massive projects for retrieving relevant software is of vital importance for Internet-based software development such as solution searching, best practices learning and so on. Many previous works have been cond...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009