Measuring Semantic Relatedness using Wikipedia Signed Network
Authors
Abstract
Identifying the semantic relatedness of two words is an important task in information retrieval, natural language processing, and text mining. However, because a single word can carry many meanings, the semantic relatedness of two words remains hard to evaluate precisely from limited corpora. Wikipedia is now a huge wiki-based encyclopedia on the internet and has become a valuable resource for research. Wikipedia articles, written through the live collaboration of user editors, contain a high volume of reference links, URL identifiers for concepts, and a complete revision history. Moreover, each Wikipedia article represents an individual concept and also contains other concepts, in the form of hyperlinks to other articles embedded in its content. We therefore believe that the semantic relatedness between two words can be found through the semantic relatedness between two Wikipedia articles. We propose an Editor-Contribution-based Rank (ECR) algorithm that ranks the concepts in an article's content across all of its revisions and takes the ranked concepts as a vector representing the article. We classify four types of relationship, in which the behaviors of addition and deletion map to appropriate and inappropriate concepts. ECR also extends the concept semantics through the editor-concept network, and it ranks the concepts according to the mutual signed-reinforcement relationship between the concepts and the editors. The results show that our method yields a marked performance improvement, increasing the correlation coefficient by 4% to 23% over previous methods that calculate the relatedness between two articles.
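The abstract does not give the ECR update rule, so the following is only an illustrative sketch of what a mutual signed-reinforcement ranking over an editor-concept network could look like. It assumes a HITS-style iteration in which an addition counts as a +1 edge and a deletion as a -1 edge; the four relationship types mentioned above are not modeled. The function names (`rank_concepts`, `cosine`), the edge encoding, and the use of cosine similarity to compare article vectors are all assumptions made for this example, not the authors' method.

```python
import math
from collections import defaultdict


def _normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def rank_concepts(edits, iterations=20):
    """Rank concepts from signed editor-concept edges.

    edits: list of (editor, concept, sign) tuples, where sign is
    +1 if the editor added the concept and -1 if the editor deleted it
    (hypothetical encoding chosen for this sketch).
    """
    editors = {e for e, _, _ in edits}
    concepts = {c for _, c, _ in edits}
    editor_score = {e: 1.0 for e in editors}
    concept_score = {c: 0.0 for c in concepts}

    for _ in range(iterations):
        # Concepts gain from additions and lose from deletions,
        # weighted by the current score of the acting editor.
        acc = defaultdict(float)
        for e, c, s in edits:
            acc[c] += s * editor_score[e]
        concept_score = _normalize({c: acc[c] for c in concepts})

        # Editors are rewarded for adding high-scoring concepts
        # and for deleting low-scoring ones.
        acc = defaultdict(float)
        for e, c, s in edits:
            acc[e] += s * concept_score[c]
        editor_score = _normalize({e: acc[e] for e in editors})

    return concept_score


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


# Toy revision history for one article (invented data).
edits = [
    ("alice", "graph", +1),
    ("bob", "graph", +1),
    ("alice", "spam", -1),
    ("carol", "spam", +1),
]
scores = rank_concepts(edits)
```

In this toy history, the concept that two editors added ends up with a positive score, while the concept that one editor added and another deleted ends up negative. Two articles ranked this way could then be compared with `cosine` on their concept vectors, again as an assumed stand-in for whatever relatedness measure the paper uses.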
Related resources
A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia
Determining semantic relatedness between words or concepts is a fundamental process in many Natural Language Processing applications. Approaches for this task typically make use of knowledge resources such as WordNet and Wikipedia. However, these approaches only make use of a limited number of features extracted from these resources, without investigating the usefulness of combining various diffe...
An API for Measuring the Relatedness of Words in Wikipedia
We present an API for computing the semantic relatedness of words in Wikipedia.
Measuring of Semantic Relatedness between Words based on Wikipedia Links
A novel technique for measuring semantic relatedness between words based on the link structure of Wikipedia is presented. The method uses only Wikipedia's link information, which spares researchers burdensome text processing. During the relatedness computation, the positive effects of Wikipedia's two-directional links and four link types are taken into account. Using a widel...
Knowledge Derived From Wikipedia For Computing Semantic Relatedness
Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts,...
Building Semantic Networks from Plain Text and Wikipedia with Application to Semantic Relatedness and Noun Compound Paraphrasing
The construction of suitable and scalable representations of semantic knowledge is a core challenge in Semantic Computing. Manually created resources such as WordNet have been shown to be useful for many AI and NLP tasks, but they are inherently restricted in their coverage and scalability. In addition, they have been challenged by simple distributional models on very large corpora, questioning...
Journal title:
- J. Inf. Sci. Eng.
Volume 29 Issue
Pages -
Publication date 2013