arabic text classification

Search results for: arabic text classification

Number of results: 727070

Arabic Dialect Identification Using a Parallel Multidialectal Corpus

2015

Shervin Malmasi, Eshrag Refaee, Mark Dras

We present a study on sentence-level Arabic Dialect Identification using the newly developed Multidialectal Parallel Corpus of Arabic (MPCA) – the first experiments on such data. Using a set of surface features based on characters and words, we conduct three experiments with a linear Support Vector Machine classifier and a meta-classifier using stacked generalization – a method not previously a...

A Comparative Study for Arabic Text Classification Based on BOW and Mixed Words Representations

Journal: IJCI. International Journal of Computers and Information 2016

OSMAN ― A Novel Arabic Readability Metric

2016

Mahmoud El-Haj, Paul Rayson

We present OSMAN (Open Source Metric for Measuring Arabic Narratives), a novel open-source Arabic readability metric and tool. It allows researchers to calculate readability for Arabic text with and without diacritics. OSMAN is a modified version of conventional readability formulas such as Flesch and Fog. In our work we introduce a novel approach towards counting short, long and stress syll...

POS Tagging of Dialectal Arabic: A Minimally Supervised Approach

2005

Kevin Duh, Katrin Kirchhoff

Natural language processing technology for the dialects of Arabic is still in its infancy, due to the problem of obtaining large amounts of text data for spoken Arabic. In this paper we describe the development of a part-of-speech (POS) tagger for Egyptian Colloquial Arabic. We adopt a minimally supervised approach that only requires raw text data from several varieties of Arabic and a morpholo...

An Evaluation of Methods for Arabic Character Recognition

2015

A. Lawgali

Off-line recognition of text plays a significant role in several applications such as the automatic sorting of postal mail or editing old documents. The recognition of Arabic handwriting characters is a difficult task owing to the similar appearance of some different characters. Most researchers have presented methods that recognise isolated characters. However, recognition of all shapes of Ara...

Automatic Arabic Text Summarization Approaches

Journal: International Journal of Computer Applications 2017

A Hybrid Approach for Building Arabic Diacritizer

2009

Khaled Shaalan, Hitham Mohamed Abo Bakr, Ibrahim Ziedan

Modern Standard Arabic is usually written without diacritics. This makes Arabic text processing difficult. Diacritization helps clarify the meaning of words and disambiguate any vague spellings or pronunciations, as some Arabic words are spelled the same but differ in meaning. In this paper, we address the issue of adding diacritics to undiacritized Arabic text using a hybrid ...

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Journal: International Journal of Information, Security and Systems Management

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Query Translation and Expansion for Searching Normal and OCR-Degraded Arabic Text

2009

Tarek Elghazaly, Aly Fahmy

This paper provides a novel model for English/Arabic Query Translation to search Arabic text, and then expands the Arabic query to handle Arabic OCR-Degraded Text. This includes detection and translation of word collocations, translating single words, transliterating names, and disambiguating translation and transliteration through different approaches. It also expands the query with the expect...

Text Classification and Multilinguism: Getting at Words via N-grams of Characters

2002

Ismaïl Biskri, Sylvain Delisle

Genuine numerical multilingual text classification is almost impossible if only words are treated as the privileged unit of information. Although text tokenization (in which words are considered as tokens) is relatively easy in English or French, it is much more difficult for other languages such as German or Arabic. Moreover, stemming, typically used to normalize and reduce the size of the lex...
