Encoding Words into String Vectors for Word Categorization

نویسنده

  • Taeho Jo
چکیده

In this research, we propose the string vector based K Nearest Neighbor as the approach to the word categorization. In the previous works on the text categorization, it was successful to encode texts into string vectors, by preventing the demerits from encoding them into numerical vectors; it provides the motivation for doing this research. In this research, we encode words into string vectors instead of texts, define the semantic operation between the string vectors, and modify the K Nearest Neighbor into the string vector based version as the approach to the word categorization. As the benefits from this research, we expect the improved performance by avoiding problems in encoding texts or words into numerical vectors and more compact representations than numerical vectors. Hence, the goal of this research is to implement the word categorization system with its better performance and more compact representation of words.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using String Vector based KNN for Keyword Extraction

In this research, we propose the string vector based KNN as the approach to the keyword extraction. The keyword extraction may be viewed as an instance of word classification, encoding words into numerical vectors may cause the main problems, such as the huge dimensionality, the sparse distribution and the poor transparency, and the problems were solved by encoding texts into string vectors in ...

متن کامل

Representation of Texts into String Vectors for Text Categorization

In this study, we propose a method for encoding documents into string vectors, instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. The usual method of encoding documents therefore causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based appr...

متن کامل

Inverted Index based Modified Version of KNN for Text Categorization

This research proposes a new strategy where documents are encoded into string vectors and modified version of KNN to be adaptable to string vectors for text categorization. Traditionally, when KNN are used for pattern classification, raw data should be encoded into numerical vectors. This encoding may be difficult, depending on a given application area of pattern classification. For example, in...

متن کامل

String Vector based KNN for Index Optimization

In this research, we propose the string vector based KNN as the approach to the index optimization. The task may be viewed into an instance of word classification and the problems in encoding words or texts into numerical vectors were solved by encoding texts into string vectors in the previous works on text mining tasks. Influence by the previous works, we encode words into string vectors, as ...

متن کامل

String Vector based AHC as Approach to Word Clustering

In this research, we propose the string vector based AHC (Agglomerative Hierarchical Clustering) algorithm as the approach to the word clustering. In the previous works on text clustering, it was successful to encode texts into string vectors by improving the performance of text clustering; it provided the motivation of doing this research. In this research, we encode words into string vectors,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016