feature weighting

Cleaning for Web Mining through Feature Weighting

2003

Lan Yi

Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, and copyright notices. Such irrelevant information (which we call Web page noise) in Web pages can seriously harm Web mining, e.g., clustering and classification. In this paper, we propose a novel feature weighting tec...


A dynamic approach to feature weighting

2002

B. Arslan F. Ricci N. Mirzadeh A. Venturini

The main objective of a personalized recommender system is to filter and present (recommend) to the user the most appropriate items according to his preferences. In many Case Based Recommendation systems, this goal is achieved by using weighted similarity measures. Thus, weighting the features, i.e. describing the items to be recommended, is a key issue in such systems. In this paper, we propos...
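As a rough illustration of the weighted similarity measures this abstract mentions, the sketch below ranks items by a weighted average of per-feature similarities. The feature names, the local similarity rules, and the helper names `weighted_similarity` and `recommend` are assumptions made for the example; the abstract does not specify the measure the authors use.

```python
def weighted_similarity(query, item, weights):
    """Weighted average of per-feature similarities, each in [0, 1].

    All three arguments are dicts keyed by (hypothetical) feature names.
    Numeric features are assumed to be pre-scaled to [0, 1]; any other
    value is compared for equality.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for name, w in weights.items():
        q, x = query[name], item[name]
        if isinstance(q, (int, float)) and isinstance(x, (int, float)):
            local = 1.0 - abs(q - x)          # closer values are more similar
        else:
            local = 1.0 if q == x else 0.0    # exact match for symbolic values
        score += w * local
    return score / total


def recommend(query, items, weights, top_n=3):
    """Return the top_n items most similar to the query under the weights."""
    ranked = sorted(
        items,
        key=lambda it: weighted_similarity(query, it, weights),
        reverse=True,
    )
    return ranked[:top_n]


query = {"price": 0.2, "category": "hotel"}
items = [
    {"price": 0.9, "category": "hotel"},
    {"price": 0.2, "category": "camping"},
]
# Emphasizing category ranks the hotel first; emphasizing price ranks camping first.
by_category = recommend(query, items, {"price": 1.0, "category": 3.0})
by_price = recommend(query, items, {"price": 3.0, "category": 1.0})
```

The same items come back in a different order depending only on the weights, which is why the choice of weights matters so much in such systems.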


Dimensions For Distinguishing Feature Weighting Methods

1995

Dietrich Wettschereck David W. Aha

Many case-based reasoning algorithms retrieve cases using a derivative of the k-nearest neighbor (k-NN) classifier, whose similarity function is sensitive to irrelevant, interacting, and noisy features. Many proposed methods for reducing this sensitivity parameterize k-NN's similarity function with feature weights. We focus on methods that automatically assign weight settings using little or no ...
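To make the idea of parameterizing k-NN's similarity function with feature weights concrete, here is a minimal sketch using a weighted Euclidean distance. The function names and the toy data are assumptions for illustration, and the weights are set by hand rather than learned by any of the methods the paper surveys.

```python
import math
from collections import Counter


def weighted_distance(a, b, weights):
    """Euclidean distance in which feature i is scaled by weights[i]."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def knn_predict(train, query, weights, k=3):
    """Majority vote among the k stored cases closest to the query.

    train is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(
        train, key=lambda case: weighted_distance(case[0], query, weights)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Feature 0 carries the class signal; feature 1 is treated as irrelevant noise.
train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
query = (0.3, 1.0)

unweighted = knn_predict(train, query, weights=(1.0, 1.0), k=1)
weighted = knn_predict(train, query, weights=(1.0, 0.0), k=1)
```

With equal weights the noisy second feature pulls the query toward the wrong case; setting its weight to zero restores the prediction based on the informative feature.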


A dynamic approach to feature weighting

2003

B. Arslan F. Ricci N. Mirzadeh A. Venturini

The main objective of a personalized recommender system is to filter and present (recommend) to the user the most appropriate items according to his preferences. In many Case Based Recommendation systems, this goal is achieved by using weighted similarity measures. Thus, weighting the features, i.e. describing the items to be recommended, is a key issue in such systems. In this paper, we propose...


The Utility of Feature Weighting in Nearest-Neighbor

1997

Ron Kohavi Yeogirl Yun

Nearest-neighbor algorithms are known to depend heavily on their distance metric. In this paper, we investigate the use of a weighted Euclidean metric in which the weight for each feature comes from a small set of options. We describe Diet, an algorithm that directs search through a space of discrete weights using cross-validation error as its evaluation function. Although a large set of possib...
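The sketch below illustrates the general setup described here: a weighted Euclidean metric, weights drawn from a small discrete set, and cross-validation error as the evaluation function. It is not the Diet algorithm. The exhaustive enumeration is a stand-in chosen for brevity, and the option set `(0.0, 0.5, 1.0)`, the leave-one-out 1-NN evaluation, and all function names are assumptions.

```python
import itertools
import math


def weighted_distance(a, b, weights):
    """Euclidean distance in which feature i is scaled by weights[i]."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def loo_error(data, weights):
    """Leave-one-out error rate of 1-NN under the given feature weights."""
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        _, predicted = min(
            rest, key=lambda case: weighted_distance(case[0], x, weights)
        )
        errors += predicted != label
    return errors / len(data)


def search_weights(data, options=(0.0, 0.5, 1.0)):
    """Try every combination of discrete weights; keep the lowest-error one."""
    n_features = len(data[0][0])
    best = None
    for weights in itertools.product(options, repeat=n_features):
        if not any(weights):
            continue  # all-zero weights make every distance zero
        err = loo_error(data, weights)
        if best is None or err < best[1]:
            best = (weights, err)
    return best


# Feature 0 separates the classes; feature 1 is noise.
data = [
    ((0.0, 0.0), "a"),
    ((0.2, 1.0), "a"),
    ((1.0, 0.1), "b"),
    ((0.8, 0.9), "b"),
]
uniform_error = loo_error(data, (1.0, 1.0))
best_weights, best_error = search_weights(data)
```

Exhaustive enumeration grows exponentially with the number of features, so it is only practical on toy problems like this one; a directed search through the weight space is needed for anything larger.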


Feature Weighting for Lazy Learning Algorithms

1998

David W. Aha

Learning algorithms differ in the degree to which they process their inputs prior to their use in performance tasks. Many algorithms eagerly compile input samples and use only the compilations to make decisions. Others are lazy: they perform less precompilation and use the input samples to guide decision making. The performance of many lazy learners significantly degrades when samples are defined ...


The Utility of Feature Weighting in Nearest-Neighbor

2011

Ron Kohavi Yeogirl Yun

Nearest-neighbor algorithms are known to depend heavily on their distance metric. In this paper, we investigate the use of a weighted Euclidean metric in which the weight for each feature comes from a small set of options. We describe Diet, an algorithm that directs search through a space of discrete weights using cross-validation error as its evaluation function. Although a large set of possib...


Feature Weighting Strategies in Sentiment Analysis

2012

Olena Kummer Jacques Savoy

In this paper we propose an adaptation of the Kullback-Leibler divergence score for the task of sentiment and opinion classification at the sentence level. We propose to use the obtained score with the SVM model using different thresholds for pruning the feature set. We argue that the pruning of the feature set for the task of sentiment analysis (SA) may be detrimental to classifier performance o...
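The abstract does not give the authors' adapted score, so the following is only a generic sketch of a Kullback-Leibler style term score followed by threshold pruning. It scores each term by its contribution P(t|pos) · log(P(t|pos) / P(t|neg)) to the divergence between the positive and negative term distributions. The add-one smoothing, the two-class setup, and the function names are assumptions.

```python
import math
from collections import Counter


def kl_term_scores(pos_docs, neg_docs):
    """Score each term by P(t|pos) * log(P(t|pos) / P(t|neg)).

    Documents are lists of tokens. Add-one smoothing keeps both
    probabilities positive. Positive scores lean toward the positive
    class, negative scores toward the negative class.
    """
    pos = Counter(t for doc in pos_docs for t in doc)
    neg = Counter(t for doc in neg_docs for t in doc)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    scores = {}
    for t in vocab:
        p = (pos[t] + 1) / pos_total
        q = (neg[t] + 1) / neg_total
        scores[t] = p * math.log(p / q)
    return scores


def prune(scores, threshold):
    """Keep only terms whose absolute score reaches the threshold."""
    return {t: s for t, s in scores.items() if abs(s) >= threshold}


pos_docs = [["good", "movie"], ["good", "plot"]]
neg_docs = [["bad", "movie"], ["bad", "plot"]]
scores = kl_term_scores(pos_docs, neg_docs)
kept = prune(scores, threshold=0.1)
```

Terms that occur equally often in both classes score zero and are the first to be pruned. Raising the threshold shrinks the feature set further, which is the trade-off the abstract warns about.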


Tweets Language Identification using Feature Weighting

2014

Juglar Díaz Zamora Adrian Fonseca Bruzón Reynier Ortega Bueno

This paper describes the language identification method presented at the Twitter Language Identification Workshop (TweetLID-2014). The proposed method represents tweets by weighted character-level trigrams. We employed three different weighting schemes used in Text Categorization to obtain a numerical value that represents the relation between trigrams and languages. For each language, we add up th...
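A minimal sketch of the general idea follows: tweets are split into character trigrams, each trigram gets a per-language weight, and the weights are summed per language. The weighting used here (the share of a trigram's training occurrences that fall in each language) is an assumption and not one of the paper's three schemes. Because the abstract is truncated, picking the language with the highest sum is also an assumption.

```python
from collections import Counter, defaultdict


def char_trigrams(text):
    """Lowercase the text, pad it with spaces, and slide a 3-character window."""
    text = f" {text.lower()} "
    return [text[i:i + 3] for i in range(len(text) - 2)]


def train_weights(samples):
    """Build weights[trigram][language] from (text, language) pairs.

    Each weight is the fraction of the trigram's occurrences seen in
    that language (an illustrative scheme, not the paper's).
    """
    counts = defaultdict(Counter)
    for text, lang in samples:
        for tri in char_trigrams(text):
            counts[tri][lang] += 1
    weights = {}
    for tri, per_lang in counts.items():
        total = sum(per_lang.values())
        weights[tri] = {lang: n / total for lang, n in per_lang.items()}
    return weights


def identify(text, weights):
    """Sum trigram weights per language; return the best one, or None."""
    scores = Counter()
    for tri in char_trigrams(text):
        for lang, w in weights.get(tri, {}).items():
            scores[lang] += w
    return scores.most_common(1)[0][0] if scores else None


samples = [
    ("the cat is here", "en"),
    ("this is the house", "en"),
    ("el gato está aquí", "es"),
    ("esta es la casa", "es"),
]
weights = train_weights(samples)
english = identify("the house is here", weights)
spanish = identify("la casa está aquí", weights)
unknown = identify("zzz", weights)
```

Trigrams never seen in training contribute nothing, so a text made up entirely of unseen trigrams yields no prediction.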


A New Algorithm for Term Weighting in Text Summarization Process

2006

Reza Zaefarian Jawed Siddiqi Babak Akhgar Ghasem Zaefarian

This paper examines the importance of a good weighting methodology in information retrieval, that is, the method that determines the most useful features of a document or query representation. Good weighting methodologies are considered more important than the feature selection process. Many information retrieval systems regard feature weighting as being of minor importance as ...
