Search results for: term frequency and inverse document frequency tf idf

Number of results: 16977020

Journal: JASIST 2006
David J. Newman, Sharon Block

vector space model for text data (Salton & McGill, 1983). In this model, each document in a corpus is represented by a term-frequency vector whose elements are the number of occurrences of each word in the vocabulary. Collectively, the set of these term-frequency vectors forms the document–word matrix representation of the corpus. All the methods we consider have this document–word matrix repr...
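A minimal sketch of the document–word matrix described above, using only the Python standard library. The function name `document_word_matrix`, the lowercased whitespace tokenization, and the two toy documents are illustrative assumptions, not part of the cited paper; real pipelines usually add proper tokenization and a sparse representation.

```python
from collections import Counter


def document_word_matrix(docs):
    """Build a dense document-word count matrix (hypothetical helper).

    Row d is document d's term-frequency vector: entry [d][j] is the number
    of times vocabulary word vocab[j] occurs in document d.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    # A sorted vocabulary gives a stable, reproducible column order.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for toks in tokenized:
        row = [0] * len(vocab)
        for word, count in Counter(toks).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix


vocab, X = document_word_matrix(["the cat sat", "the cat saw the dog"])
print(vocab)  # ['cat', 'dog', 'sat', 'saw', 'the']
print(X)      # [[1, 0, 1, 0, 1], [1, 1, 0, 1, 2]]
```

Most entries in such a matrix are zero for realistic vocabularies, which is why sparse storage is the usual design choice at scale.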

Journal: International Journal of Advanced Computer Science and Applications 2020

Journal: Symmetry 2022

Text classification is a major task in NLP (Natural Language Processing) and has been the focus of attention for years. News, as a branch of text, is characterized by a complex structure, a large amount of information, and long length, which in turn leads to decreased classification accuracy. To improve the classification of Chinese news texts, we present a model based on multi-level semantic features. First, add category correlation coeffici...

1995
Kenneth Church, William Gale

Low-frequency words tend to be rich in content, and vice versa. But not all equally frequent words are equally meaningful. We will use inverse document frequency (IDF), a quantity borrowed from Information Retrieval, to distinguish words like somewhat and boycott. Both somewhat and boycott appeared approximately 1000 times in a corpus of 1989 Associated Press articles, but boycott is a better k...
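To make the contrast above concrete, here is a minimal sketch of IDF using a common definition, idf(w) = log2(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w. The base-2 logarithm, the helper name `idf`, and the eight-document corpus are assumptions for illustration; the corpus is hypothetical and is not the Associated Press data. Both words occur four times in total, but "somewhat" is spread across four documents while "boycott" is concentrated in one, so "boycott" receives the higher IDF.

```python
import math


def idf(docs, word):
    """Inverse document frequency, log2(N / df) (hypothetical helper).

    N is the number of documents; df is how many documents contain `word`.
    """
    n_docs = len(docs)
    df = sum(1 for doc in docs if word in doc.lower().split())
    if df == 0:
        raise ValueError(f"{word!r} does not occur in any document")
    return math.log2(n_docs / df)


# Hypothetical toy corpus: each word occurs 4 times overall, but with
# very different spread across documents.
corpus = [
    "somewhat tired today",
    "a somewhat late train",
    "somewhat cold weather",
    "somewhat quiet market",
    "boycott boycott the boycott boycott",
    "nothing to report",
    "markets closed early",
    "rain all day",
]

print(idf(corpus, "somewhat"))  # 1.0  (df = 4, N = 8)
print(idf(corpus, "boycott"))   # 3.0  (df = 1, N = 8)
```

Because IDF depends only on how many documents contain a word, not on how often it appears overall, it separates "bursty" content words from evenly spread function-like words even when raw frequencies match.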

Journal: Urban rail transit 2023

With the promotion of the major national strategy of becoming a transportation power, super-large-scale metro operation networks have become an inevitable trend, and safety issues are increasingly prominent. Metro dispatch logs and accident reports were taken as the research object; hazard sources were efficiently and accurately identified, risk chains were mined, and their evolution mechanism was revealed. Firstly, a lexicon was constructed to improve ...

Journal: CoRR 2015
S. Amarappa, S. V. Sathyanarayana

Named Entity Recognition and Classification (NERC) is the process of identifying proper nouns in text and classifying them into predefined categories such as person names, locations, organizations, dates, and times. NERC in Kannada is an essential and challenging task. The aim of this work is to develop a novel model for NERC, based on Multinomial Naïve Bayes (MNB) Clas...

2014
Suraj Maharjan, Prasha Shrestha, Thamar Solorio

Author profiling, being an important problem in forensics, security, marketing, and literary research, needs to be accurate. With massive amounts of online text readily available on which we might need to perform author profiling, building a fast system is as important as building an accurate system, but this can be challenging. However, the use of distributed computing techniques like MapRedu...

2009
Aviv Segev, Quan Z. Sheng

Ontologies have become the de-facto modeling tool of choice, employed in a variety of applications and prominently in the Semantic Web. Nevertheless, ontology construction remains a daunting task. Ontological bootstrapping, which aims at automatically generating concepts and their relations in a given domain, is a promising technique for ontology construction. Bootstrapping an ontology based on...

2015
Octavia-Maria Sulea, Daniel Dichiu

In this paper we describe our approach to solving the PAN Author Profiling task. We introduce a novel way of computing the type/token ratio of an author and show that, although strong correlations have been observed between high extroversion and low type/token ratios in the past, this ratio is not necessarily a strong indicator of extroversion. Since the text of a person is influenced by all ...
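For reference, the conventional type/token ratio is the number of distinct word types divided by the total number of tokens. The sketch below computes only that standard ratio; it is not the novel variant the abstract introduces, whose details are not given here. The helper name `type_token_ratio` and the lowercased whitespace tokenization are assumptions.

```python
def type_token_ratio(text):
    """Standard type/token ratio: distinct types / total tokens.

    Hypothetical helper; tokenization is naive lowercased whitespace
    splitting, which is an assumption rather than any paper's method.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)


# "to be or not to be": 6 tokens, 4 distinct types -> 4/6
ratio = type_token_ratio("to be or not to be")
print(ratio)
```

The plain ratio tends to fall as texts get longer, since common words keep repeating, so comparisons between authors are usually made on samples of similar length.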

2015
A. Muthusamy

This survey paper explains the extraction and retrieval of personal name aliases from the web using various techniques, with the help of web crawls. The existing methods help to improve the depth of knowledge relevant to the alias extraction and retrieval process. It also describes how the aliases are ranked, then page counts on the web, word co-occurrence using anchor text and techniques l...
