Word Distributions for Thematic Segmentation in a Support Vector Machine Approach
Authors
Abstract
We investigate the appropriateness of using a technique based on support vector machines for identifying the thematic structure of text streams. The thematic segmentation task is modeled as a binary classification problem, where the two classes correspond to the presence or the absence of a thematic boundary. Experiments are conducted with this approach using features based on word distributions across the text. We provide empirical evidence that our approach is robust by showing good performance on three different data sets. In particular, substantial improvement is obtained over previously published results of word-distribution-based systems when evaluation is done on a corpus of recorded and transcribed multi-party dialogs.
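The binary framing described in the abstract can be illustrated with a minimal sketch. It treats every gap between consecutive sentences as a candidate boundary and attaches a small word-distribution feature vector and a +1/-1 label to it. The function names (`cosine`, `boundary_features`, `build_examples`), the two features (cosine similarity and shared-vocabulary count), the window size, and the toy sentences are all assumptions made for illustration; the abstract does not specify the paper's actual feature set.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def boundary_features(sentences, window=2):
    """Return one feature vector per candidate boundary (the gap after sentence i).

    Features (illustrative assumption, not the paper's feature set):
      [cosine similarity of word counts left vs. right of the gap,
       number of distinct words shared by both sides]
    """
    tokens = [s.lower().split() for s in sentences]
    feats = []
    for i in range(len(tokens) - 1):
        left = Counter(w for t in tokens[max(0, i - window + 1): i + 1] for w in t)
        right = Counter(w for t in tokens[i + 1: i + 1 + window] for w in t)
        shared = len(set(left) & set(right))
        feats.append([cosine(left, right), float(shared)])
    return feats


def build_examples(sentences, boundaries, window=2):
    """Pair each candidate gap with a label: +1 = thematic boundary, -1 = no boundary."""
    feats = boundary_features(sentences, window)
    return [(f, 1 if i in boundaries else -1) for i, f in enumerate(feats)]


# Toy data (hypothetical): the topic changes after the second sentence (gap index 1).
sentences = [
    "the cat chased the mouse",
    "the mouse hid from the cat",
    "stocks fell as markets closed",
    "markets and stocks recovered later",
]
examples = build_examples(sentences, boundaries={1})
for features, label in examples:
    print(label, features)
```

The resulting `(features, label)` pairs are in the shape that any binary SVM trainer expects. Training the classifier itself is left out of the sketch.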
Similar resources
Multiple Sclerosis Lesions Segmentation in Magnetic Resonance Imaging using Ensemble Support Vector Machine (ESVM)
Background: Multiple sclerosis (MS) is an immune-mediated disorder of the central nervous system (CNS) that destroys myelin sheaths and results in plaque (lesion) formation in the brain. From the clinical point of view, investigating and monitoring information such as position, volume, number, and changes of these plaques are integral parts of the controlling process this dise...
MODELING OF FLOW NUMBER OF ASPHALT MIXTURES USING A MULTI–KERNEL BASED SUPPORT VECTOR MACHINE APPROACH
The flow number of asphalt–aggregate mixtures has been proposed as an explanatory factor for assessing the rutting potential of asphalt mixtures. This study proposes a multiple–kernel based support vector machine (MK–SVM) approach for modeling the flow number of asphalt mixtures. The MK–SVM approach consists of weighted least squares–support vector machine (WLS–SVM) integrating two kernel funct...
A New Play-off Approach in League Championship Algorithm for Solving Large-Scale Support Vector Machine Problems
There are numerous methods for solving large-scale problems, some of which are very flexible and efficient in both linear and non-linear cases. The league championship algorithm is one such algorithm that may be used for these problems. In the current paper, a new play-off approach is adapted to the league championship algorithm for solving large-scale problems. The proposed algori...
Toward a Thorough Approach to Predicting Klinkenberg Permeability in a Tight Gas Reservoir: A Comparative Study
Klinkenberg permeability is an important parameter in tight gas reservoirs. There are conventional methods for determining it, but these methods depend on core permeability. Cores are few in number, but well logs are usually accessible for all wells and provide continuous information. In this regard, regression methods have been used to achieve reliable relations between log readings and Klinke...
Chinese Word Segmentation with Conditional Support Vector Inspired Markov Models
Character-based tagging has achieved great success in Chinese Word Segmentation (CWS). This paper proposes a new approach to improve CWS tagging accuracy by using a structured support vector machine (SVM) together with an unlabeled text corpus. First, character N-grams in the unlabeled text corpus are mapped into a low-dimensional space by the SOM algorithm. Then new features extracted from the...
Journal title:
Volume, issue
Pages -
Publication date: 2006