Machine Learning Framework for Hate Speech Detection in Iraqi Dialect YouTube Comments

Safaa Hameed; Asia Mehdi

Authors

Safaa Hameed, College of Computer Science and Information Technology, University of Kerbala, Kerbala, Iraq
Asia Mehdi, College of Computer Science and Information Technology, University of Kerbala

Keywords:

Hate Speech, Iraqi Dialect, YouTube comments, FastText, GNN, RNN, BiLSTM

Abstract

Social media platforms such as Facebook, YouTube, and Twitter have grown remarkably, and the type of data and information shared on them has changed dramatically. Because users of all ages can readily access these platforms, this growth has also facilitated the spread of hate speech and amplified its impact on society. Researchers have therefore sought to develop a range of strategies and models to detect and mitigate this growing threat.

Although hate speech detection in English text using Natural Language Processing (NLP) approaches has advanced significantly, research on Arabic, and especially on the Iraqi dialect, is still lacking. This research aims to identify hate speech in the Iraqi dialect by building a dataset of more than 150,000 comments collected from YouTube videos about Iraqi topics that have sparked public debate. The collected comments were prepared and processed in several steps, including manual cleaning. The comments were then divided into four major semantic classes: hate speech, abusive, offensive, and normal.

The effectiveness of several machine learning models in processing texts written in the Iraqi dialect was evaluated: Graph Neural Networks (GNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), the Arabic Bidirectional Encoder Representations from Transformers (AraBERT) model, Bidirectional Long Short-Term Memory networks (BiLSTM), and the FastText model. The results showed that these models performed differently on content in the Iraqi dialect. FastText achieved the highest Accuracy, Precision, Recall, and F1-Score, recording 96.1% both in the training phase and in predicting previously unseen comments. Therefore, despite its simplicity, the FastText model offers a practical solution for classifying hate speech in different Arabic dialects.
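As an illustration of how the four reported metrics can be computed for a four-class labeling scheme like the one described above, the following is a minimal sketch in plain Python. It assumes macro averaging over the classes; the abstract does not state which averaging the authors used, and the function name `macro_metrics` and the label strings are hypothetical choices made for this example only.

```python
from collections import Counter

# Class names taken from the abstract; the exact label strings are an assumption.
LABELS = ["hate speech", "abusive", "offensive", "normal"]


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption of this sketch, not a detail
    confirmed by the source.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    total = len(y_true)
    return {
        "accuracy": sum(tp.values()) / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels for demonstration only; not data from the study.
    truth = ["normal", "abusive", "offensive", "hate speech", "normal"]
    preds = ["normal", "abusive", "abusive", "hate speech", "normal"]
    print(macro_metrics(truth, preds))
```

With imbalanced classes, macro averaging weights every class equally, so it can differ noticeably from accuracy or from a support-weighted average; which variant a reported figure corresponds to should be checked against the full paper.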
