similarity classifier

Experiencing the Shotgun Distance for Time Series Analysis

Journal: Trans. MLDM 2014

Patrick Schäfer

Similarity search is a core functionality in many data mining algorithms. Over the past decade algorithms were designed to mostly work with human assistance to extract characteristic, aligned patterns of equal length and scaling. We propose the shotgun distance similarity measure that extracts, scales, and aligns segments from a query to a sample time series. This greatly simplifies the time se...
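The abstract is truncated, so the exact definition of the shotgun distance is not available here. As a rough illustration of what "extracts, scales, and aligns segments from a query to a sample" could mean, the sketch below cuts the query into disjoint windows, z-normalizes each window (scaling), slides it over the sample to find the best-matching position (alignment), and sums the best squared distances. The function names, the disjoint windowing, and the squared-error cost are assumptions made for illustration, not the paper's stated algorithm.

```python
import math


def z_normalize(segment):
    """Scale a segment to zero mean and unit variance (all zeros if it is flat)."""
    mean = sum(segment) / len(segment)
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / len(segment))
    if std == 0:
        return [0.0] * len(segment)
    return [(v - mean) / std for v in segment]


def segment_alignment_distance(query, sample, window):
    """Sum, over disjoint query windows, of the best squared distance to any
    sliding window of the sample. Hypothetical helper, not the paper's code."""
    if window <= 0 or len(sample) < window or len(query) < window:
        raise ValueError("window must be positive and fit in both series")
    total = 0.0
    # Extract disjoint segments from the query.
    for start in range(0, len(query) - window + 1, window):
        q = z_normalize(query[start:start + window])
        # Align: try every offset in the sample and keep the closest match.
        best = min(
            sum((a - b) ** 2
                for a, b in zip(q, z_normalize(sample[pos:pos + window])))
            for pos in range(len(sample) - window + 1)
        )
        total += best
    return total
```

Because every window is normalized before comparison, a query whose segments appear in the sample at a different offset and amplitude still yields a distance close to zero.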


Baseline Results for the CLEF 2008 Medical Automatic Annotation Task

2008

Mark Oliver Güld, Thomas Martin Deserno

This work reports baseline results for the CLEF 2008 Medical Automatic Annotation Task (MAAT) by applying a classifier with a fixed parameter set to all tasks 2005 – 2008. The classifier performs a weighted combination of three distance and similarity measures operating on global image features: Scaled-down representations of the images are compared via metrics that model the typical variabilit...


Automatic detection of plagiarized spoken responses

2014

Keelan Evanini, Xinhao Wang

This paper addresses the task of automatically detecting plagiarized responses in the context of a test of spoken English proficiency for non-native speakers. A corpus of spoken responses containing plagiarized content was collected from a high-stakes assessment of English proficiency for non-native speakers, and several text-to-text similarity metrics were implemented to compare these response...


Using kNN Model-based Approach for Automatic Text Categorization

2003

Gongde Guo, Hui Wang, David Bell, Yaxin Bi, Kieran Greer

An investigation has been conducted on two well-known similarity-based learning approaches to text categorization: the k-nearest neighbor (k-NN) classifier and the Rocchio classifier. After identifying the weaknesses and strengths of each technique, a new classifier called the kNN model-based classifier (kNNModel) has been proposed. It combines the strengths of both k-NN and Rocchio. A text categor...
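For readers unfamiliar with the two baselines named in this abstract, here is a minimal sketch of textbook k-NN and a simplified, centroid-only Rocchio classifier over sparse term-weight vectors compared by cosine similarity. It does not implement the proposed kNNModel; the function names, the dict-based document representation, and the choice of cosine similarity are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(train, query, k=3):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(train, key=lambda pair: cosine(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Build one centroid (mean vector) per class; negative examples are ignored."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in train:
        counts[label] += 1
        for term, weight in vec.items():
            sums[label][term] += weight
    return {label: {t: w / counts[label] for t, w in terms.items()}
            for label, terms in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda label: cosine(centroids[label], query))
```

k-NN keeps every training document and pays the cost at query time, while the centroid approach compresses each class into a single prototype; that trade-off is the usual motivation for hybrids of the two.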


IBEnt: Chemical Entity Mentions in Patents using ChEBI

2017

Andre Lamurias, Luis F. Campos, Francisco M. Couto

This article presents our approach to the CEMP task of BioCreative V.5, which consisted of using our system, IBEnt, to identify chemical entity mentions in patents through machine learning and semantic similarity techniques. The features used combine the results of a CRF classifier, two lexical matching methods (FiGO and MER), and semantic similarity measures on the ChEBI ontology. We also tested th...


Identifying Quora question pairs having the same intent

2017

Shashi Shankar, Aniket Shenoy

This paper presents a system that uses a combination of multiple text similarity measures of varying complexity to classify Quora question pairs as duplicate or different. The solution uses a support vector classifier trained on precomputed features ranging from longest common substrings and subsequences to word similarity based on lexical and semantic resources. The scope of t...
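Two of the features mentioned above, longest common substring and longest common subsequence, can be computed with standard dynamic programming. The sketch below works at the word level on lowercased, whitespace-split tokens; the tokenization, the function names, and returning raw lengths rather than normalized scores are assumptions, since the abstract does not say how the authors computed them.

```python
def longest_common_substring(a, b):
    """Length of the longest run of consecutive items shared by sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the run that ended at the previous item of both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence(a, b):
    """Length of the longest in-order (not necessarily contiguous) shared subsequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def pair_features(question1, question2):
    """Hypothetical feature vector for one question pair (word-level lengths)."""
    t1 = question1.lower().split()
    t2 = question2.lower().split()
    return [longest_common_substring(t1, t2), longest_common_subsequence(t1, t2)]
```

A vector like this would be one slice of the input to a classifier; the lexical and semantic word-similarity features the abstract mentions would be appended alongside it.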


A Comparative Study of Centroid-Based, Neighborhood-Based and Statistical Approaches for Effective Document Categorization

2002

Vincent Tam, Ardi Santoso, Rudy Setiono

Associating documents to relevant categories is critical for effective document retrieval. Here, we compare the well-known k-Nearest Neighborhood (kNN) algorithm, the centroid-based classifier and the Highest Average Similarity over Retrieved Documents (HASRD) algorithm, for effective document categorization. We use various measures such as the micro and macro F1 values to evaluate their perfor...
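The micro and macro F1 values mentioned in this abstract differ in how per-class results are pooled: macro F1 averages the per-class F1 scores, while micro F1 pools true positives, false positives, and false negatives across classes before computing a single score. The following is a minimal sketch for the single-label case; the function names are illustrative, and the paper's own evaluation setup (for example multi-label documents) may differ.

```python
from collections import Counter


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Macro: unweighted mean of per-class F1, so rare classes count equally.
    macro = sum(f1_from_counts(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent classes dominate.
    micro = f1_from_counts(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

In the single-label setting, micro F1 equals plain accuracy, which is why macro F1 is usually reported alongside it when class sizes are uneven.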


Predicting the Operon Structure of Bacillus subtilis Using Operon Length, Intergene Distance, and Gene Expression Information

Journal: Pacific Symposium on Biocomputing 2004

Michiel J. L. de Hoon, Seiya Imoto, Kazuo Kobayashi, Naotake Ogasawara, Satoru Miyano

We predict the operon structure of the Bacillus subtilis genome using the average operon length, the distance between genes in base pairs, and the similarity in gene expression measured in time course and gene disruptant experiments. By expressing the operon prediction for each method as a Bayesian probability, we are able to combine the four prediction methods into a Bayesian classifier in a s...


Relevancy contemplation in medical data analytics and ranking of feature selection algorithms

Journal: ETRI Journal 2022

This article performs a detailed data scrutiny on a chronic kidney disease (CKD) dataset to select efficient instances and relevant features. Data relevancy is investigated using feature extraction, hybrid outlier detection, and handling of missing values. Instances that do not influence the target are removed using data envelopment analysis to enable reduction of rows. Column reduction is achieved by ranking attributes through se...


Supplement to pursuit tracks chase

2015

Birgit Träuble

We use a support vector machine (SVM) to obtain a classifier that best discriminates ES and CS1. The SVM requires a similarity matrix $S$ as input, which describes the pairwise distance between all samples. In our case, the samples are the ES and CS1 targets $x_n$. We compute the similarity between two samples as the squared Euclidean distance $S(x_m, x_n) = \sum_f \sum_i \sum_j \left(x_m(f, i, j) - x_n(f, i, j)\right)^2$, where $x(f, i, j)$ ...
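The pairwise matrix described above can be sketched in a few lines. The example assumes each sample is a nested list indexed as x[f][i][j] and that all samples share the same shape; the meaning of the indices f, i, and j is cut off in the excerpt, so they are treated here as opaque array axes. Function names are illustrative.

```python
def squared_distance(xm, xn):
    """Sum over f, i, j of (xm[f][i][j] - xn[f][i][j]) ** 2."""
    total = 0.0
    for plane_m, plane_n in zip(xm, xn):            # index f
        for row_m, row_n in zip(plane_m, plane_n):  # index i
            for a, b in zip(row_m, row_n):          # index j
                total += (a - b) ** 2
    return total


def distance_matrix(samples):
    """Symmetric matrix S with S[m][n] = squared distance between samples m and n."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for k in range(m + 1, n):
            d = squared_distance(samples[m], samples[k])
            matrix[m][k] = d
            matrix[k][m] = d  # the distance is symmetric, so fill both halves
    return matrix
```

The diagonal stays at zero because every sample has zero distance to itself. How the matrix is turned into an SVM kernel is not stated in the excerpt, so that step is left out.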
