Search results for: term frequency and inverse document frequency tf idf

Number of results: 16977020

2007
Padmini Srinivasan Miguel E. Ruiz Wai Lam

We propose a model that assists in understanding indexing on the World Wide Web (WWW). This model specifies key features of indexing strategies that are currently being used. We also present an experiment assessing the validity of Inverse Document Frequency (IDF) as a term weighting strategy for WWW documents. The experiment indicates that IDF scores are not stable in the heterogeneous and dynami...

Journal: International Journal of Emerging Technologies in Learning (ijet) 2023

With the help of natural language processing and machine learning, we can analyze the information in online recruitment texts posted by employer companies and dig out the requirement features of these employment positions; this is meaningful work with practical value. However, existing methods for analyzing are too simple to withstand the mass data on the Internet, so this paper aims to study the analysis and forecast of position requirement...

Journal: Sustainability 2023

The occurrence of fatal traffic accidents often causes serious casualties and property losses, endangering travel safety. This work uses statistical data on road accidents in Shenzhen from 2018 to 2022 as the basis to determine the characteristic patterns and main influencing factors of accidents. The accident descriptions are also analyzed using an analysis method based on Term Frequency-Inverse Document Frequency (TF-IDF) mini...

Journal: International journal on information and communication technology 2022

Delivery of justice with the help of artificial intelligence is a current research interest. Machine learning and natural language processing (NLP) can classify types of sexual harassment experiences into quid pro quo (QPQ) and hostile work environments (HWE). However, imbalanced data are often present in the classes for classification on specific datasets. Data imbalance can cause a decrease in the classifier's performance ...

Journal: Sustainability 2021

With the development of Web2.0 and mobile Internet, urban residents, a new type of “sensor”, provide us with massive amounts of volunteered geographic information (VGI). Quantifying the spatial patterns of VGI plays an increasingly important role in understanding functions. Using social media activity data, this article developed a method to automatically extract and identify functional zones. The method is put forward ...

2003
Taeho Jo

Documents are unstructured data consisting of natural language. A document surrogate is the structured data converted from original documents so that computer systems can process them. A document surrogate is usually represented as a list of words. Because not all words in a document reflect its content, it is necessary to select the important words related to its content from among them. Such important w...

Journal: Applied sciences 2021

The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information that is embedded in the infodemic affects people’s ability to have access to safety and to follow proper procedures to mitigate risks. This research aims to target the falsehood part of the infodemic, which prominently proliferates in news articles and medical publications. Here, we present NeoNet, a novel supervised machine learning ...

2006
Huaigu Cao Faisal Farooq Venu Govindaraju

The tasks of indexing and retrieval are especially challenging for the erroneous output of handwriting recognition (HR) systems. This paper proposes an approach to indexing and retrieving degraded documents with very low recognition rates. We present a modified version of the popular Vector Model in information retrieval (IR). Our model incorporates top n candidates from an HR system into the ...

Journal: Journal of Documentation 2004
Stephen E. Robertson

The term weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon’s Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory appr...
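As a rough illustration of the TF*IDF weighting this abstract refers to, the following minimal Python sketch computes one common textbook form, idf(t) = log(N / n_t), and multiplies it by the raw term count. The function names, the three-document toy corpus, and the choice of natural logarithm with no smoothing are assumptions made for this example only; they are not taken from the paper, which discusses several variants.

```python
import math
from collections import Counter


def idf(docs):
    """Return {term: log(N / n_t)}, where N is the number of documents
    and n_t is the number of documents that contain the term."""
    n_docs = len(docs)
    # Count each term once per document, regardless of repetitions.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(doc, idf_weights):
    """Weight each term in one document by raw count times IDF."""
    tf = Counter(doc)
    return {term: count * idf_weights.get(term, 0.0) for term, count in tf.items()}


# Hypothetical toy corpus, already tokenized into lowercase words.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

weights = idf(docs)
scores = tf_idf(docs[0], weights)
```

In this toy corpus a word that occurs in every document, such as "the", receives an IDF of zero, while a word confined to a single document, such as "mat", receives the highest weight in the first document.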

2005
Gabriel L. Somlo

Abstract of dissertation: Agents for Personalized Client-Side Information Gathering from the Web. We present the design, implementation, and evaluation of a personalized Web information gathering agent, intended to address several shortcomings of today’s centralized search engines. The potential privacy issues are addressed by a standalone client-side implementation, placing the agent under its users’ adm...
