Twitter Cyberbullying Detection Based on Feature Extraction and Graph Representation

Hiba Jabbar Aleqabie

Authors

Hiba Jabbar Aleqabie, University of Kerbala, College of Computer Science and Information Technology

Keywords:

cyberbullying, cyberbullying detection, NLP, machine learning, graph mining

Abstract

Cyberbullying has become a severe problem as a result of the widespread use of social media platforms. It refers to the use of digital means to intentionally hurt, harass, or intimidate a person or group of people, and it can take place on a variety of social media sites, including Facebook, Twitter, Instagram, and Snapchat. This form of bullying can have several negative effects on individuals, including psychological distress, social isolation, academic problems, and even physical harm. In this research, an approach is proposed for detecting cyberbullying on Twitter. The approach involves extracting features from tweets and constructing a graph representation of the Twitter network, with the emphasis placed on the feature-extraction techniques. Three groups of features were employed: standard textual features, namely term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW); word-embedding features (Word2Vec); and features derived from the graph representation of the tweets. To find the most accurate classification results, mutual information and chi-squared (Chi2) feature selection techniques were applied. Based on the extracted features, tweets are classified as cyberbullying or non-cyberbullying using machine learning algorithms. The experimental findings show that the approach identifies cyberbullying tweets with high accuracy: the Random Forest model with positive features achieved an accuracy of 0.98 (98%).
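The abstract names bag-of-words, TF-IDF, and chi-squared / mutual-information feature selection without giving their exact formulation. The sketch below, in plain Python with no third-party libraries, shows one common way these steps fit together on a hypothetical four-tweet toy corpus. The whitespace tokenizer, the (count / length) * log(N / df) TF-IDF variant, the presence-based 2x2 chi-squared and mutual-information scores, and all function names are illustrative assumptions, not the author's implementation; the Word2Vec features, the graph-based features, and the Random Forest classifier are not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (simplifying assumption: no stemming, no emoji handling)."""
    return text.lower().split()


def bag_of_words(docs):
    """Build a sorted vocabulary and one raw term-count (BOW) vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(bow_vectors):
    """Re-weight BOW counts as (count / doc length) * log(N / df), one common TF-IDF variant."""
    n_docs = len(bow_vectors)
    n_terms = len(bow_vectors[0])
    # Document frequency: number of documents containing each term at least once.
    df = [sum(1 for vec in bow_vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in bow_vectors:
        length = sum(vec) or 1  # guard against empty documents
        weighted.append([(vec[j] / length) * math.log(n_docs / df[j]) for j in range(n_terms)])
    return weighted


def chi2_scores(bow_vectors, labels):
    """Chi-squared statistic of the 2x2 table: term present/absent vs. label 1/0."""
    n = len(labels)
    scores = []
    for j in range(len(bow_vectors[0])):
        a = sum(1 for v, y in zip(bow_vectors, labels) if v[j] > 0 and y == 1)
        b = sum(1 for v, y in zip(bow_vectors, labels) if v[j] > 0 and y == 0)
        c = sum(1 for v, y in zip(bow_vectors, labels) if v[j] == 0 and y == 1)
        d = n - a - b - c  # term absent and label 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores.append(n * (a * d - b * c) ** 2 / denom if denom else 0.0)
    return scores


def mutual_information(bow_vectors, labels):
    """Mutual information (in nats) between term presence and the binary label."""
    n = len(labels)
    scores = []
    for j in range(len(bow_vectors[0])):
        mi = 0.0
        for present in (True, False):
            p_x = sum(1 for v in bow_vectors if (v[j] > 0) == present) / n
            for cls in (0, 1):
                joint = sum(
                    1 for v, y in zip(bow_vectors, labels) if (v[j] > 0) == present and y == cls
                ) / n
                if joint == 0:
                    continue  # 0 * log(0) is taken as 0
                p_y = sum(1 for y in labels if y == cls) / n
                mi += joint * math.log(joint / (p_x * p_y))
        scores.append(mi)
    return scores


def select_top_k(scores, vocab, k):
    """Return the k terms with the highest scores (ties keep vocabulary order)."""
    ranked = sorted(range(len(vocab)), key=lambda j: scores[j], reverse=True)
    return [vocab[j] for j in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical toy tweets; label 1 = cyberbullying, 0 = non-cyberbullying.
    tweets = [
        "you are stupid and ugly",
        "nobody likes you loser",
        "great game last night",
        "have a great day friends",
    ]
    labels = [1, 1, 0, 0]
    vocab, bow = bag_of_words(tweets)
    weights = tf_idf(bow)
    print(round(weights[0][vocab.index("stupid")], 4))  # -> 0.2773
    print(select_top_k(chi2_scores(bow, labels), vocab, 2))  # -> ['great', 'you']
    print(select_top_k(mutual_information(bow, labels), vocab, 2))  # -> ['great', 'you']
```

On this toy data both selectors rank "you" and "great" highest because each appears in every tweet of one class and in none of the other. In a real pipeline the selected columns of the TF-IDF (or BOW) matrix would then be passed to a classifier such as Random Forest.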
