text similarity

The Evaluation of Sentence Similarity Measures

2008

Palakorn Achananuparp Xiaohua Hu Xiajiong Shen

The ability to accurately judge the similarity between natural language sentences is critical to the performance of several applications such as text mining, question answering, and text summarization. Given two sentences, an effective similarity measure should be able to determine whether the sentences are semantically equivalent or not, taking into account the variability of natural language ...

A technical study and analysis on fuzzy similarity based models for text classification

Journal: CoRR 2012

Shalini Puri Sona Kaushik

Efficient and effective text document classification, which assigns text documents to mutually exclusive categories, is becoming a challenging and much-needed area of research. Fuzzy similarity provides a way to find the similarity of features among various documents. In this paper, a technical review on various fuzzy simil...

Comparative Study of Verse Similarity for Multi-lingual Representations of the Qur’an

2016

A. Basharat

Text similarity is a subject that has received great attention in recent years. However, the application of text similarity tools to Semitic languages such as Arabic faces unique challenges. Moreover, the increasing number of texts being made available online, not only in native languages but also in translation, adds further challenge to identifying similar portions of texts across different d...

Textual Entailment as a Directional Relation

Journal: Journal of Research and Practice in Information Technology 2009

Doina Tatar Gabriela Serban Czibula Andreea Diana Mihis Rada Mihalcea

This paper presents three methods for solving the problem of textual entailment, obtained from an equal number of text-to-text similarity metrics. The first method starts with the directional measure of text-to-text similarity presented in Corley and Mihalcea (2005), and integrates word sense disambiguation and several heuristics. The second method exploits the relations between the cosine dire...

Tibetan-Chinese Cross Language Text Similarity Calculation Based on LDA Topic Model

2015

Sun Yuan Zhao Qian Pablo Gamallo Otero

Topic model building is the basis and the most critical module of cross-language topic detection and tracking. Topic models can also be applied to cross-language text similarity calculation, where they improve the efficiency and speed of calculation by reducing the dimensionality of the texts. In this paper, we use the LDA model in cross-language text similarity computation to obtain Tibetan-Chinese c...

Learning bilingual word embeddings with (almost) no bilingual data

2017

Mikel Artetxe Gorka Labaka Eneko Agirre

Most methods to learn bilingual word embeddings rely on large parallel corpora, which are difficult to obtain for most language pairs. This has motivated an active research line to relax this requirement, with methods that use document-aligned corpora or bilingual dictionaries of a few thousand words instead. In this work, we further reduce the need for bilingual resources using a very simple sel...

SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation

2016

Eneko Agirre Carmen Banea Daniel M. Cer Mona T. Diab Aitor Gonzalez-Agirre Rada Mihalcea German Rigau Janyce Wiebe

Semantic Textual Similarity (STS) seeks to measure the degree of semantic equivalence between two snippets of text. Similarity is expressed on an ordinal scale that spans from semantic equivalence to complete unrelatedness. Intermediate values capture specifically defined levels of partial similarity. While prior evaluations constrained themselves to just monolingual snippets of text, the 2016 ...

Learning Text Pair Similarity with Context-sensitive Autoencoders

2016

Hadi Amiri Philip Resnik Jordan L. Boyd-Graber Hal Daumé

We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs. Our model outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach that merely ranks input...

A Web-based Kernel Function for Matching Short Text Snippets

2005

Mehran Sahami Tim Heilman

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in common between two short text snippets. We address this problem by introducing a novel method for measuring the similarity between short text snippets (even those without any overlapping terms) by levera...
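
The failure mode described in this abstract can be illustrated with a minimal sketch of plain bag-of-words cosine similarity. This is not the authors' web-based kernel; the function name `cosine_tf`, the whitespace tokenizer, and the example snippets are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def cosine_tf(a: str, b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts.

    Assumption: lowercased whitespace tokenization, no stemming or weighting.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical snippets: related in meaning, but sharing no terms.
print(cosine_tf("ai", "artificial intelligence"))            # 0.0
print(cosine_tf("cheap flights paris", "paris flights deals"))
```

Because the first pair has no term in common, the dot product is zero and the score is 0.0 regardless of how closely the snippets are related, which is the gap a measure for short snippets has to close.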

Correlation Coefficient Based Average Textual Similarity Model for Information Retrieval System in Wide Area Networks

2015

Jaswinder Singh Parvinder Singh Yogesh Chaba

In wide area networks, retrieving the relevant text is a challenging task for information retrieval because most information requests are text based. The focus of the paper is on similarity measurement, performance evaluation, and the design of information retrieval techniques using four similarity functions: Jaccard, Cosine, Dice, and Overlap. The performance evaluation of these simil...
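
The four similarity functions named in this abstract have standard set-based definitions, sketched below over token sets. This is a minimal illustration and not the paper's model: the whitespace tokenizer, the function names, and the binary (set-based) form of cosine are assumptions.

```python
import math


def tokens(text: str) -> set[str]:
    """Assumed tokenizer: lowercase, split on whitespace, keep unique terms."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|"""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: set[str], b: set[str]) -> float:
    """2 |A ∩ B| / (|A| + |B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / min(|A|, |B|)"""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def cosine(a: set[str], b: set[str]) -> float:
    """Binary-vector cosine: |A ∩ B| / sqrt(|A| * |B|)"""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Hypothetical query and document.
q = tokens("the cat sat on the mat")
d = tokens("the cat lay on the rug")
for name, fn in [("jaccard", jaccard), ("dice", dice),
                 ("overlap", overlap), ("cosine", cosine)]:
    print(name, round(fn(q, d), 3))
```

All four share the same numerator, the number of common terms, and differ only in how they normalize it, so for the same pair Jaccard never scores higher than Dice, and Overlap is the most lenient when one text is much shorter than the other.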
