A Similarity Measure for Text Document Using Term Cardinality

Related resources

A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure

Author profiling is a text classification technique used to predict the profiles of the authors of unknown texts by analyzing their writing styles. Author profiles are characteristics of the authors such as gender, age, native language, country and educational background. Existing approaches to author profiling suffer from problems such as high dimensionality of features and fail to capture th...

Soft Cardinality: A Parameterized Similarity Function for Text Comparison

We present an approach for the construction of text similarity functions using a parameterized resemblance coefficient in combination with a softened cardinality function called soft cardinality. Our approach provides a consistent and recursive model, varying levels of granularity from sentences to characters. Therefore, our model was used to compare sentences divided into words, and in turn, w...

Automatic Term Extraction and Document Similarity in Special Text Corpora

This paper confirms that the performance of a state-of-the-art automatic term extraction method on a computer science corpus is similar to previously published performance data on a medical corpus. The extracted terms are then used to estimate the similarity of papers in the computer science corpus using the standard Vector Space Model. The precision of retrieval using a term-based representati...

Text-Line Extraction and Character Recognition of Document Headlines With Graphical Designs Using Complementary Similarity Measure

A method for recognizing characters on graphical designs is proposed. A new projection feature that separates text-line regions from backgrounds, and adaptive thresholding in displacement matching are introduced. Experimental results for newspaper headlines with graphical designs show a recognition rate of 97.7 percent.

Similarity Measures for Text Document Clustering

Clustering is a useful technique that organizes a large quantity of unordered text documents into a small number of meaningful and coherent clusters, thereby providing a basis for intuitive and informative navigation and browsing mechanisms. Partitional clustering algorithms have been recognized as more suitable than hierarchical clustering schemes for processing large datasets....

Journal

Journal title: IOP Conference Series: Materials Science and Engineering

Year: 2021

ISSN: 1757-899X

DOI: 10.1088/1757-899x/1012/1/012059