Twitter Cyberbullying Detection Based on Feature Extraction and Graph Representation
Keywords: cyberbullying, cyberbullying detection, NLP, machine learning, graph mining
Cyberbullying has become a severe problem as a result of the extensive use of social media platforms. It refers to the use of digital means to intentionally hurt, harass, or intimidate a person or group of people. Cyberbullying can take place on a variety of social media sites, including Facebook, Twitter, Instagram, and Snapchat. This form of bullying can have several negative impacts on individuals, including psychological distress, social isolation, academic problems, and even physical harm. This research proposes an approach for detecting cyberbullying on Twitter. The approach involves extracting features from tweets and constructing a graph representation of the Twitter network, with emphasis placed on the feature extraction techniques. Three families of features were employed: standard textual features, namely term frequency-inverse document frequency (TF-IDF) and bag-of-words (BOW); word embedding features (Word2Vec); and features derived from the graph representation of the tweets. To obtain the most accurate classification results, mutual information and Chi2 feature selection techniques were applied. Based on the extracted features, tweets are classified as cyberbullying or non-cyberbullying using machine learning algorithms. Our experimental findings show that the approach identifies cyberbullying tweets with high accuracy. Specifically, the Random Forest model with positive features achieved an accuracy of 0.98.
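As an illustration of the textual feature step described above, the following is a minimal pure-Python sketch of bag-of-words counts, TF-IDF weighting, and a chi-squared score for term selection. It is not the authors' implementation: the toy tweets, the labels, the smoothed IDF formula, and the helper names (`tokenize`, `bag_of_words`, `tf_idf`, `chi2_scores`) are all assumptions made for the example, and the chi-squared statistic is computed on a 2x2 presence/absence table, which may differ from the variant used in the experiments.

```python
import math
from collections import Counter

# Hypothetical toy data, invented for illustration (1 = cyberbullying, 0 = not).
tweets = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "great game last night",
    "love this sunny weather",
]
labels = [1, 1, 0, 0]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count Counter per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def tf_idf(vocab, counts):
    """Weight raw term counts by a smoothed inverse document frequency."""
    n_docs = len(counts)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    return [{t: c[t] * idf[t] for t in c} for c in counts]


def chi2_scores(vocab, counts, labels):
    """Chi-squared statistic of term presence against a binary class label."""
    n = len(labels)
    scores = {}
    for t in vocab:
        # 2x2 contingency table: term present/absent vs. class 1/0.
        a = sum(1 for c, y in zip(counts, labels) if t in c and y == 1)
        b = sum(1 for c, y in zip(counts, labels) if t in c and y == 0)
        c_ = sum(1 for c, y in zip(counts, labels) if t not in c and y == 1)
        d = sum(1 for c, y in zip(counts, labels) if t not in c and y == 0)
        denom = (a + b) * (c_ + d) * (a + c_) * (b + d)
        scores[t] = n * (a * d - b * c_) ** 2 / denom if denom else 0.0
    return scores


vocab, counts = bag_of_words(tweets)
weights = tf_idf(vocab, counts)
scores = chi2_scores(vocab, counts, labels)

# Keep the k highest-scoring terms as the selected feature set.
k = 3
selected = sorted(vocab, key=scores.get, reverse=True)[:k]
```

In this sketch, terms that occur only in one class receive the highest chi-squared scores, so they are retained by the selection step, while the TF-IDF weights down-weight terms that appear in many tweets. The selected features would then be passed to a classifier such as Random Forest.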