Identification of Targets in Disinformation News Articles Using Supervised Machine Learning

Published in the proceedings of Advanced Data Mining and Applications, 2025

Recommended citation: Sadam Hussain, Akmal Khattak, Rabeeh Abbasi, Tony Russell-Rose, Venkata Chinthalapati, "Identification of Targets in Disinformation News Articles Using Supervised Machine Learning." In the proceedings of Advanced Data Mining and Applications, 2025.


Fake news, or disinformation, spreads widely among communities worldwide because social media platforms such as Facebook, Twitter, and Instagram have become part of daily life. Disinformation news is designed to mislead the public and turn it against particular entities, such as countries, groups of people, or religions. It is frequently disseminated by individuals, organizations, covert agencies, or nations to target particular governments and organizations and damage their international standing. Previous research has concentrated on classification problems such as identifying fake news and detecting fake profiles. Within this field, locating the hidden targets of disinformation is a popular topic of investigation. In the proposed work, we use the EU Disinfo Lab dataset to identify the targets within disinformation news articles. The targets are identified using content features with four n-gram settings: unigram, bigram, unigram with bigram, and unigram with trigram. The proposed model is trained using supervised machine learning techniques such as the Linear Support Vector Classifier (LSVC) and Logistic Regression (LR), as well as three ensemble methods: Random Forest (RF), Passive Aggressive (PA), and eXtreme Gradient Boosting Classifier (XGB). For nine classes, the LSVC performed best on all four n-gram settings. For three classes, it also performed best except on the unigram with bigram and unigram with trigram features, where it ranked second after LR. The highest accuracy, 77%, was obtained with content features using unigram with bigram and using unigram with trigram.
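
As a minimal illustration of the four n-gram settings named in the abstract, the sketch below counts word n-grams in plain Python. The function name `ngram_counts`, the `FEATURE_SETS` mapping, the whitespace tokenizer, and the sample sentence are illustrative assumptions and are not taken from the paper. In particular, reading "unigram with trigram" as n-gram orders 1 and 3 only is one possible interpretation; it could also mean the full range from 1 to 3.

```python
from collections import Counter


def ngram_counts(text, orders):
    """Count word n-grams of the given orders in a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in orders:
        # Slide a window of n tokens across the text.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# The four settings from the abstract, expressed as n-gram orders.
# Assumption: "unigram with trigram" means orders 1 and 3 (no bigrams).
FEATURE_SETS = {
    "unigram": (1,),
    "bigram": (2,),
    "unigram+bigram": (1, 2),
    "unigram+trigram": (1, 3),
}

# Hypothetical sample text, used only to show the shape of the features.
sample = "the agency targets the agency"
features = {
    name: ngram_counts(sample, orders)
    for name, orders in FEATURE_SETS.items()
}
```

Each entry of `features` is a bag of n-gram counts that could be vectorized and passed to a classifier such as a linear SVM or logistic regression; the weighting scheme and preprocessing used in the paper are not stated in the abstract, so none are assumed here.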