OPTICS on Text Data: Experiments and Test Results

Authors

  • Deepak P
  • Shourya Roy
Abstract

Clustering, and text clustering in particular, has recently attracted a lot of attention in data mining. Conventional techniques such as K-means involve parameters that can’t be easily estimated. Density-based clustering algorithms, which have significant advantages, have emerged and drawn much of that attention. OPTICS [1] is the latest and most sophisticated technique in this direction, and has been shown to be considerably tolerant of changes in parameter values. To the best of our knowledge, this is the first study on the applicability of OPTICS to text data. We perform a variety of experiments towards this end using various feature selection techniques (which, as we show, assume greater significance in the context of density-based clustering), quantify our results by way of explanations, and list conclusions.

Similar resources

Optimization and Application of OPTICS Algorithm on Text Clustering

Text clustering is of great importance in data mining, information fusion, artificial intelligence and some other fields. Many methods in the literature can be used to classify text. Most of them require some parameters, such as the number of categories, which must be assigned in advance or estimated during the classification process. However, it is difficult to determine these quantities in...

The relationship between Iranian EFL learners’ gender and reading comprehension of three different types of text

The present study investigated the relationship between the reading comprehension of three types of text and the gender of Iranian EFL learners. To this end, several reading passages with the same length and readability were selected based on which a reading comprehension test was constructed on three different text types namely essay, history, and short story. After determining the validity an...

The Effect of Post-text Written Corrective Feedback on Written Grammatical Accuracy: Iranian intermediate EFL learners

The main role and responsibility of second language writing teachers is to help learners to write with minimal errors. To do so, teachers need to provide students with appropriate types of feedback. In this research, the researchers examined the effect of post-text written corrective feedback on written grammatical accuracy of Iranian intermediate EFL learners. In the first phase, Nelson Profic...

Impact of Density and Distribution of Unfamiliar Lexical Items on Iranian EFL Learners’ Successful Reading Comprehension Achievement

Density and distribution of Unfamiliar Lexical Items (ULIs) appear to influence learners’ Reading Comprehension Achievement (RCA). This study concerns the impact of these two variables on Iranian EFL learners’ RCA. For this, two groups of students were timetabled for the experiments designed to assess learners’ RCA. To determine the participants’ levels of proficiency a Quick Proficiency Test was fi...

Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application

The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...

Publication date: 2006