text similarity

Kernels and Similarity Measures for Text Classification

2006

André T. Martins, Mário A. T. Figueiredo, Pedro M. Q. Aguiar

Measuring similarity between two strings is a fundamental step in text classification and other problems of information retrieval. Recently, kernel-based methods have been proposed for this task; since kernels are inner products in a feature space, they naturally induce similarity measures. Information-theoretic (dis)similarities have also been the subject of recent research. This paper describ...
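
The remark that kernels, being inner products in a feature space, induce similarity measures can be illustrated with a minimal sketch. It assumes a character n-gram (spectrum) kernel and cosine-style normalization; the truncated abstract does not say which kernels the paper uses, so the kernel choice and the names `ngram_counts`, `spectrum_kernel`, and `kernel_similarity` are illustrative assumptions, not the authors' method.

```python
from collections import Counter
from math import sqrt


def ngram_counts(text, n=2):
    """Map a string to its feature vector: counts of character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(x, y, n=2):
    """Inner product of the n-gram count vectors of x and y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    return sum(cx[g] * cy[g] for g in cx.keys() & cy.keys())


def kernel_similarity(x, y, n=2):
    """Normalized kernel k(x, y) / sqrt(k(x, x) * k(y, y)), in [0, 1]."""
    kxx = spectrum_kernel(x, x, n)
    kyy = spectrum_kernel(y, y, n)
    if kxx == 0 or kyy == 0:
        # A string shorter than n has no n-grams; treat it as dissimilar.
        return 0.0
    return spectrum_kernel(x, y, n) / sqrt(kxx * kyy)
```

Under these assumptions, identical strings score 1 and strings with no shared n-grams score 0; any other positive semidefinite kernel could be substituted for `spectrum_kernel` without changing the normalization step.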

Text Classification by Aggregation of SVD Eigenvectors

2012

Panagiotis Symeonidis, Ivaylo Kehayov, Yannis Manolopoulos

Text classification is a process in which documents are categorized, usually by topic, place, readability, etc. For text classification by topic, a well-known method is Singular Value Decomposition. For text classification by readability, the “Flesch Reading Ease index” calculates the readability level of a document (e.g. easy, medium, advanced). In this paper, we propose Singular Valu...

Short Answer Grading Using String Similarity And Corpus-Based Similarity

2012

Wael H. Gomaa, Aly A. Fahmy

Most automatic scoring systems use pattern-based approaches that require a lot of hard and tedious work. These systems work in a supervised manner where predefined patterns and scoring rules are generated. This paper presents a different, unsupervised approach that deals with students’ answers holistically using text-to-text similarity. Different String-based and Corpus-based similarity measures were tes...

Semantic Cosine Similarity

2012

Faisal Rahutomo, Teruaki Kitasuka, Masayoshi Aritsugi

Cosine similarity is a widely implemented metric in information retrieval and related studies. This metric models a text as a vector of terms, and the similarity between two texts is derived from the cosine of the angle between their term vectors. Cosine similarity, however, still cannot fully capture the semantic meaning of the text. This paper proposes an enhancement of cosine similarity measurement...
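
The plain (non-semantic) cosine similarity described above can be sketched as follows. This is a minimal illustration assuming whitespace tokenization and raw term counts; the helper names `term_vector` and `cosine_similarity` are hypothetical, and the semantic enhancement the paper proposes is not shown because the abstract is truncated.

```python
from collections import Counter
from math import sqrt


def term_vector(text):
    """Model a text as a vector of term counts (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between the term vectors of texts a and b."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no terms; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

With this sketch, "big house" and "large house" share only the term "house", so the synonyms "big" and "large" contribute nothing to the score. That is the kind of limitation the abstract points to when it says cosine similarity does not capture semantic meaning.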

The Study and Review of Paraphrase Detection Techniques in Machine Learning

2017

Darshana S. Bhole, Sandip S. Patil

Abstract: Paraphrase detection is the process of computing the semantic similarity between sentences that are not lexicographically similar. Although a number of metrics have been proposed in the literature to quantify textual similarity for the English language, they address the detection of monolingual text-to-text lexical similarity. Existing system for Indian Language paraphrase detection uses lexica...

TextFlow: A Text Similarity Measure based on Continuous Sequences

2017

Yassine Mrabet, Halil Kilicoglu, Dina Demner-Fushman

Text similarity measures are used in multiple tasks such as plagiarism detection, information ranking, and recognition of paraphrases and textual entailment. While recent advances in deep learning have further highlighted the relevance of sequential models in natural language generation, existing similarity measures do not fully exploit the sequential nature of language. Examples of such similarity m...

Improving Semantic Similarity for Pairs of Short Biomedical Texts with Concept Definitions and Ontology Structure

2014

Olivia Sanchez Graillet

Finding semantic similarity between short biomedical texts, such as article abstracts or experiment descriptions, may provide important information for health researchers. This paper presents a method for calculating text similarity in the biomedical context. The method implements a pairwise concept semantic similarity measure that uses concept definitions and ontology structure. The respective...

Scalable Ordinal Embedding to Model Text Similarity

2017

Jesse Anderton

Practitioners of Machine Learning and related fields commonly seek out embeddings of object collections into some Euclidean space. These embeddings are useful for dimensionality reduction, for data visualization, as concrete representations of abstract notions of similarity for similarity search, or as features for some downstream learning task such as web search or sentiment analysis. A wide a...

Corpus-Based methods for Short Text Similarity

2011

Prajol Shrestha

This paper presents corpus-based methods for finding similarity between short texts (sentences, paragraphs, ...), a task with many applications in the field of NLP. Previous work on this problem has been based on supervised methods or has used external resources such as WordNet, the British National Corpus, etc. Our focus is on unsupervised corpus-based methods. We present a new method, based ...

Partial similarity of objects and text sequences

2007

Alexander M. Bronstein, Michael M. Bronstein, Alfred M. Bruckstein, Ron Kimmel

Similarity is one of the most important abstract concepts in the human perception of the world. In computer vision, numerous applications deal with comparing objects observed in a scene with some a priori known patterns. Often, it happens that while two objects are not similar, they have large similar parts, that is, they are partially similar. Here, we present a novel approach to quantify this...
