text domain

Improving Domain Dictionary-based Text Categorization Using Self-partition Model

Journal: Int. J. Comput. Proc. Oriental Lang. 2005

Wenliang Chen, Jingbo Zhu, Muhua Zhu, Li Zhang, Tianshun Yao

In this paper, we present a novel model for improving the performance of Domain Dictionary-based text categorization. The proposed model is named the Self-Partition Model (SPM). SPM can group the candidate words into predefined clusters, which are generated according to the structure of the Domain Dictionary. Using these learned clusters as features, we propose a novel text representation. The e...

Estimation of Sobolev embedding constant on a domain dividable into bounded convex domains

2017

Makoto Mizuguchi, Kazuaki Tanaka, Kouta Sekine, Shin’ichi Oishi

This paper is concerned with an explicit value of the embedding constant from [Formula: see text] to [Formula: see text] for a domain [Formula: see text] ([Formula: see text]), where [Formula: see text]. We previously proposed a formula for estimating the embedding constant on bounded and unbounded Lipschitz domains by estimating the norm of Stein's extension operator. Although this formula can...

Domain specific concept ontologies and text summarization as hierarchical fuzzy logic ranking indicator on malay text corpus

Journal: Indonesian Journal of Electrical Engineering and Computer Science 2019

A method for ontology and knowledge-base assisted text mining for diabetes discussion forum

2015

Ahmad Issa

WMG, University of Warwick, April 2015.

Social media offers researchers a vast amount of unstructured text as a source for discovering hidden knowledge and insights. However, social media poses new challenges to text mining and knowledge discovery due to its short length, temporal nature and informal language. In order to identify the main requirements for analysing unstructured te...

iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

2009

Benjamin Adrian, Jörn Hees, Ludger van Elst, Andreas Dengel

Due to the huge amount of text data on the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes the incorporation of domain ontologies into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies with formalized and structured domain knowledge for extracting domain-relev...

A public domain speech-to-text system

1999

Mark Ordowski, Neeraj Deshmukh, Aravind Ganapathiraju, Jonathan Hamaker, Joseph Picone

The lack of freely available state-of-the-art Speech-to-Text (STT) software has been a major hindrance to the development of new audio information processing technology. The high cost of the infrastructure required to conduct state-of-the-art speech recognition research prevents many small research groups from evaluating new ideas on large-scale tasks. In this paper, we present the core componen...

Domain Specific Text Processing for Speech Synthesis

2001

Viveka Heyman

In Text-to-Speech (TTS) synthesis there are words and expressions that pose problems because some semantic knowledge is required to determine how they should be read out. This work implements a domain filter, a pre-processing module that supports the TTS system by analysing text belonging to a certain semantic domain and rewriting problematic expressions so that they are read out better. The fi...

Cross-domain Text Classification using Wikipedia

Journal: IEEE Intelligent Informatics Bulletin 2008

Pu Wang, Carlotta Domeniconi, Jian Hu

Traditional approaches to document classification require labeled data in order to construct reliable and accurate classifiers. Unfortunately, labeled data are seldom available and often too expensive to obtain, especially for large domains and fast-evolving scenarios. Given a learning task for which training data are not available, abundant labeled data may exist for a different but related ...

Domain Based Punjabi Text Document Clustering

2012

Saurabh Sharma, Vishal Gupta

Text clustering is a text mining technique that groups similar documents into a single cluster using a similarity measure while separating dissimilar documents. Popular text clustering algorithms treat a document as a conglomeration of words; the syntactic or semantic relations between words are not taken into consideration. Many different algorithms ...

Domain adaptation for text dependent speaker verification

2014

Hagai Aronowitz, Asaf Rendel

We recently investigated the use of state-of-the-art text-dependent speaker verification algorithms for user authentication and obtained satisfactory results, mainly by using a fair amount of text-dependent development data from the target domain. In this work, we investigate the ability to build high-accuracy text-dependent systems using no data at all from the target domain. Instead of usin...