Ensemble Classifier for Hindi Hostile Content Detection

Abstract

Detection of hostile content in social media posts (Facebook™, Twitter, etc.) is a demanding task in the field of Natural Language Processing (NLP). The daily growth of different electronic media has opened up new challenges in language understanding, and the task becomes more difficult for regional languages. An AI-based solution is required to identify hostile content on a large scale. Though a satisfactory amount of research has been carried out for the English language, finding hostile content in regional languages is still in progress due to the unavailability of suitable datasets and tools. In terms of number of speakers, Hindi ranks third in the world and first in the Indian Subcontinent. The objective of this article is to design a hostile content detection system for Hindi using coarse-grained (binary) classification and fine-grained (multi-class, multi-label) classification. We noted that baseline learning methods with pre-trained models perform differently. Using the Constraint 2021 dataset, this research proposes a Bidirectional Encoder Representations from Transformers (BERT) based contextual embedding technique, concatenated with emoji2vec embeddings, to classify posts in Devanagari script as hostile or non-hostile. Additionally, for the fine-grained tasks, where hostile posts are sub-categorized as defamation, fake, hate, and offensive, we develop an Ensemble Classifier with varying baseline learning methods and pre-trained models. With an F1-score of 0.9721, our proposed Indic-BERT+emoji model is found to outperform the other existing models on the coarse-grained task. We have also observed that our proposed Ensemble Classifier gives better results than the baseline models on the fine-grained tasks, with F1-scores of 0.43, 0.82, 0.58, and 0.62 for the defamation, fake, hate, and offensive classes, respectively. The code and data are available at https://github.com/skarifahmed/hostile.
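The two ideas named in the abstract, concatenating a contextual text embedding with emoji embeddings and combining several models for the multi-label task, can be illustrated with a minimal sketch. The abstract does not say how the emoji vectors are pooled or how the ensemble combines its members, so the mean pooling, the soft-voting rule, the 0.5 threshold, and the function names `concat_features` and `soft_vote` are all assumptions made for illustration, not the authors' implementation. The toy vectors stand in for real BERT and emoji2vec outputs.

```python
from statistics import fmean

# Fine-grained hostile classes named in the abstract.
LABELS = ["defamation", "fake", "hate", "offensive"]


def concat_features(text_vec, emoji_vecs, emoji_dim):
    """Append the mean emoji vector of a post to its text embedding.

    Assumption: emoji vectors are mean-pooled; posts without emoji
    get a zero vector of length emoji_dim.
    """
    if emoji_vecs:
        emoji_mean = [fmean(column) for column in zip(*emoji_vecs)]
    else:
        emoji_mean = [0.0] * emoji_dim
    return list(text_vec) + emoji_mean


def soft_vote(prob_rows, threshold=0.5):
    """Average per-label probabilities across models, then threshold.

    Assumption: soft voting with an independent threshold per label,
    so a post can receive zero, one, or several labels (multi-label).
    """
    averaged = [fmean(column) for column in zip(*prob_rows)]
    return [label for label, p in zip(LABELS, averaged) if p >= threshold]


if __name__ == "__main__":
    # Toy 3-dimensional "text" vector and two 2-dimensional "emoji" vectors.
    print(concat_features([0.1, 0.2, 0.3], [[1.0, 0.0], [0.0, 1.0]], 2))

    # Hypothetical per-label probabilities from three base models.
    model_outputs = [
        [0.25, 0.875, 0.75, 0.125],
        [0.5, 0.75, 0.5, 0.25],
        [0.0, 0.625, 0.625, 0.125],
    ]
    print(soft_vote(model_outputs))
```

Thresholding each label independently is what makes the output multi-label; a multi-class setup would instead pick the single highest-scoring label.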

Similar resources

Improving Accuracy in Intrusion Detection Systems Using Classifier Ensemble and Clustering

Recently, as technology has developed, the number of network-based services has been increasing, and sensitive information of users is shared through the Internet. Accordingly, large-scale malicious attacks on computer networks could cause severe disruption to network services, so cybersecurity has become a major concern for networks. An intrusion detection system (IDS) could be cons...

Fault Detection of Bearings Using a Rule-based Classifier Ensemble and Genetic Algorithm

This paper proposes a reduct construction method based on discernibility matrix simplification. The method works with genetic algorithm. To identify potential problems and prevent complete failure of bearings, a new method based on rule-based classifier ensemble is presented. Genetic algorithm is used for feature reduction. The generated rules of the reducts are used to build the candidate base...

Ensemble Classifier for Epileptic Seizure Detection for Imperfect EEG Data

Brain status information is captured by physiological electroencephalogram (EEG) signals, which are extensively used to study different brain activities. This study investigates the use of a new ensemble classifier to detect an epileptic seizure from compressed and noisy EEG signals. This noise-aware signal combination (NSC) ensemble classifier combines four classification models based on their...

Classifier Ensemble Framework: a Diversity Based Approach

Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...

A Classifier Ensemble of Binary Classifier Ensembles

This paper proposes an innovative combinational algorithm to improve performance in multiclass classification domains. Because a more accurate classifier yields better classification performance, researchers in computing communities have tended to improve the accuracy of classifiers. Although obtaining a more accurate classifier is often the aim, there is an alternative option t...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3591353