String Vector based AHC as Approach to Word Clustering
Abstract
In this research, we propose a string vector based version of the AHC (Agglomerative Hierarchical Clustering) algorithm as an approach to word clustering. In previous work on text clustering, encoding texts into string vectors improved clustering performance, which motivated this research. Here, we encode words into string vectors, define a semantic operation on string vectors, and modify the AHC algorithm into its string vector based version. We expect two benefits: improved performance and more compact representations of words. The goal of this research is therefore to implement a word clustering system that delivers these benefits.
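The three steps named in the abstract (encoding words as string vectors, a semantic operation on string vectors, and an AHC variant built on that operation) can be illustrated with a minimal sketch. The abstract does not give the definitions, so everything below is an assumption made for illustration: the toy vectors in `STRING_VECTORS`, the pairwise table `ELEMENT_SIM`, the position-wise averaging used as the semantic similarity, and the average-link merge rule. It is not the paper's implementation.

```python
from itertools import combinations

# Hypothetical toy encoding: each word is a fixed-length string vector.
# The elements (here, made-up text identifiers) are an assumption.
STRING_VECTORS = {
    "bank": ("d1", "d2", "d3"),
    "loan": ("d1", "d3", "d2"),
    "river": ("d4", "d5", "d6"),
    "stream": ("d4", "d6", "d5"),
}

# Hypothetical similarities between distinct string elements.
# Pairs that are not listed are treated as having similarity 0.
ELEMENT_SIM = {
    frozenset(("d2", "d3")): 0.5,
    frozenset(("d5", "d6")): 0.5,
}


def element_similarity(a, b):
    """Similarity between two string elements (1.0 when identical)."""
    if a == b:
        return 1.0
    return ELEMENT_SIM.get(frozenset((a, b)), 0.0)


def semantic_similarity(u, v):
    """Assumed semantic operation: mean of position-wise element similarities."""
    if len(u) != len(v):
        raise ValueError("string vectors must have the same length")
    return sum(element_similarity(a, b) for a, b in zip(u, v)) / len(u)


def cluster_similarity(c1, c2, vectors):
    """Average-link similarity between two clusters of words."""
    total = sum(
        semantic_similarity(vectors[a], vectors[b]) for a in c1 for b in c2
    )
    return total / (len(c1) * len(c2))


def string_vector_ahc(vectors, num_clusters):
    """Merge the most similar pair of clusters until num_clusters remain."""
    clusters = [[word] for word in vectors]
    while len(clusters) > num_clusters:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(
                clusters[p[0]], clusters[p[1]], vectors
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters


if __name__ == "__main__":
    print(string_vector_ahc(STRING_VECTORS, 2))
```

On this toy data the two remaining clusters group "bank" with "loan" and "river" with "stream". The point of the sketch is that the clustering loop never touches numerical feature vectors: it only needs a similarity function defined directly on string vectors.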
Related resources
Fuzzy Clustering Approach Using Data Fusion Theory and its Application To Automatic Isolated Word Recognition
In this paper, utilization of clustering algorithms for data fusion in decision level is proposed. The results of automatic isolated word recognition, which are derived from speech spectrograph and Linear Predictive Coding (LPC) analysis, are combined with each other by using fuzzy clustering algorithms, especially fuzzy k-means and fuzzy vector quantization. Experimental results show that the...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
Assessment of Clustering Methods for Predicting Permeability in a Heterogeneous Carbonate Reservoir
Permeability, the ability of rocks to flow hydrocarbons, is directly determined from core. Due to high cost associated with coring, many techniques have been suggested to predict permeability from the easy-to-obtain and frequent properties of reservoirs such as log derived porosity. This study was carried out to put clustering methods (dynamic clustering (DC), ascending hierarchical clustering ...
UPMC at MediaEval 2016 Retrieving Diverse Social Images Task
In the MediaEval 2016 Retrieving Diverse Social Images Task, we proposed a general framework based on agglomerative hierarchical clustering (AHC). We tested the provided credibility descriptors as a vector input for our AHC. The results on devset showed that this vector based on the credibility descriptors is the best feature, but unfortunately that is not confirmed on testset. To merge several...
Encoding Words into String Vectors for Word Categorization
In this research, we propose the string vector based K Nearest Neighbor as the approach to the word categorization. In the previous works on the text categorization, it was successful to encode texts into string vectors, by preventing the demerits from encoding them into numerical vectors; it provides the motivation for doing this research. In this research, we encode words into string vectors ...
Publication date: 2016