cosine similarity measure

Using Parallel Corpora to enrich Multilingual Lexical Resources

2002

Dominic Widdows Beate Dorow Chiu-Ki Chan

This paper describes the use of a bilingual vector model for the automatic discovery of German translations of English terms. The model is built by analysing co-occurrence patterns in a parallel corpus of English and German medical abstracts, a method also used for Cross-Lingual Information Retrieval. The model generates candidate German translations of English words using the cosine similarity m...


DAMSEL: The DSTO/Macquarie System for Entity-Linking

2009

Matthew Honnibal Robert Dale

This paper describes the DSTO/Macquarie University System for Entity Linking (DAMSEL), which competed in the 2009 Text Analysis Conference Knowledge Base Population task. The system achieves 73.5% accuracy. For a given named entity mention, the system selects a set of candidate entities from the knowledge base and selects the most likely candidate based on the similarity between the document...


Neutrosophic Sets and Systems, Vol. 13, 2016

2017

Kalyan Mondal Surapati Pramanik Florentin Smarandache

The purpose of this study is to propose a new similarity measure, namely the rough variational coefficient similarity measure, under the rough neutrosophic environment. The weighted rough variational coefficient similarity measure has also been defined. The weighted rough variational coefficient similarity measures between the rough ideal alternative and each alternative are calculated to find t...


Distributional Similarity of Words with Different Frequencies

2013

Christian Wartena

Distributional semantics tries to characterize the meaning of words by the contexts in which they occur. Similarity of words hence can be derived from the similarity of contexts. Contexts of a word are usually vectors of words appearing near to that word in a corpus. It was observed in previous research that similarity measures for the context vectors of two words depend on the frequency of the...
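As an illustration of the idea summarized above, the following is a minimal sketch and not the method of the listed paper. It assumes whitespace-tokenized text, a symmetric window of two words, and raw co-occurrence counts; the helper names `context_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (dict-like)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for empty context vectors
    return dot / (norm_u * norm_v)


# Toy corpus, assumed purely for illustration.
tokens = "the cat sat on the mat while the dog sat on the rug".split()
cat_vec = context_vector(tokens, "cat")
dog_vec = context_vector(tokens, "dog")
similarity = cosine_similarity(cat_vec, dog_vec)
```

In this toy corpus "cat" and "dog" share three of their context words, so the similarity is high; real context vectors are typically weighted (for example with PMI or tf-idf) rather than raw counts.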


Improving Search and Exploration in Tag Spaces Using Automated Tag Clustering

Journal: J. Web Eng. 2014

Joni Radelaar Aart-Jan Boor Damir Vandic Jan-Willem van Dam Flavius Frasincar

In recent years we have experienced an increase in the usage of tags to describe resources. However, the free nature of tagging presents some challenges regarding the search and exploration of tag spaces. In order to deal with these challenges we propose the Semantic Tag Clustering Search (STCS) framework. The framework first groups syntactic variations using several measures based on the Leven...


Exploring Vector Spaces for Semantic Relations

2017

Kata Gábor Haïfa Zargayouna Isabelle Tellier Davide Buscaldi Thierry Charnois

Word embeddings are used with success for a variety of tasks involving lexical semantic similarities between individual words. Using unsupervised methods and just cosine similarity, encouraging results were obtained for analogical similarities. In this paper, we explore the potential of pre-trained word embeddings to identify generic types of semantic relations in an unsupervised experiment. We...


Comparison Clustering using Cosine and Fuzzy set based Similarity Measures of Text Documents

Journal: CoRR 2015

Manan Mohan Goyal Neha Agrawal Manoj Kumar Sarma Nayan Jyoti Kalita

Keeping in consideration the high demand for clustering, this paper focuses on understanding and implementing K-means clustering using two different similarity measures. We have tried to cluster the documents using two different measures rather than clustering them with Euclidean distance. Also, a comparison is drawn based on the accuracy of clustering between the fuzzy and cosine similarity measures. The ...


Yet Another Matcher

2017

Fabien Duchateau Remi Coletta Zohra Bellahsene Renée Miller

Discovering correspondences between schema elements is a crucial task for data integration. Most matching tools are semi-automatic, e.g. an expert must tune some parameters (thresholds, weights, etc.). They mainly use several methods to combine and aggregate similarity measures. However, the quality of their results often decreases when one needs to integrate a new similarity measure or when matchin...


A Similarity Measure for Collaborative Filtering with Implicit Feedback

2007

Tong-Queue Lee Young Park Yong-Tae Park

Collaborative Filtering (CF) is a widely accepted method of creating recommender systems. CF is based on the similarities among users or items. Measures of similarity including the Pearson Correlation Coefficient and the Cosine Similarity work quite well for explicit ratings, but do not capture real similarity from the ratings derived from implicit feedback. This paper identifies some problems t...
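For reference, the two standard measures named above can be sketched as follows. This is a generic illustration, not the measure proposed in the listed paper; it assumes two users' ratings over the same co-rated items given as equal-length lists, and the function names and sample ratings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation, computed as the cosine of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Explicit ratings on a 1-5 scale for four co-rated items (made-up values).
alice = [5, 4, 1, 1]
bob = [4, 5, 2, 1]

sim_cosine = cosine(alice, bob)
sim_pearson = pearson(alice, bob)
```

Pearson differs from cosine only in subtracting each user's mean rating first, which removes the effect of users who rate everything high or low.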


Leveraging word embeddings for spoken document summarization

2015

Kuan-Yu Chen Shih-Hung Liu Hsin-Min Wang Berlin Chen Hsin-Hsi Chen

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express the most important theme of the document, has been an active area of research and experimentation. On the other hand, word embedding has emerged as a newly fa...
