Abstract
In the digital age, people increasingly seek a presence across a variety of social media platforms. Nigeria, for instance, is a multilingual country in which speakers of major languages, including Hausa, Igbo, and Yoruba, seek a presence on social media such as Twitter. This creates a research challenge for existing sentiment analysis (SA) algorithms, owing to the complex nature of the text data and the filtration strategy adopted. This research therefore aims to use a text-filtering approach to improve the accuracy of the current model. The study uses the African Language Bidirectional Encoder Representations from Transformers (AfriBERTa) language model, which was created especially for African languages, and filters the data by eliminating terms that are common to several sentiment classes. The model's performance is compared across the chosen languages on both filtered and unfiltered datasets. Accuracy rises from 0.69 (unfiltered) to 0.75 (filtered) for Pidgin, from 0.75 to 0.79 for Hausa, and from 0.75 to 0.80 for Yoruba, while for Igbo it drops slightly from 0.77 to 0.76. These results show that the filtration strategy generally improves accuracy, precision, recall, and F1-score, with Igbo accuracy as the exception. This implies that customized data preprocessing approaches of this kind are essential for effective sentiment analysis in a variety of linguistic contexts, because the proposed technique helps improve sentiment classification. In addition to emphasizing the value of context-specific approaches in SA, this research lays the groundwork for future developments in multilingual sentiment analysis, which could prove useful in fields such as public opinion analysis and market research.
Keywords: Text filtering approach, Sentiment analysis, Twitter, Multiple languages, AfriBERTa
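
The filtering idea summarized in the abstract, removing terms that are common to several sentiment classes, can be illustrated with a minimal Python sketch. The abstract does not specify the exact filtering criterion, so the rule used here (drop any whitespace-separated token that appears in at least `min_classes` distinct classes) is an assumption, and the function names `shared_terms` and `filter_texts` are hypothetical. The toy sentences are in English only for readability.

```python
from collections import defaultdict


def shared_terms(texts, labels, min_classes=2):
    """Return tokens that occur in at least `min_classes` distinct sentiment classes.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    classes_by_term = defaultdict(set)
    for text, label in zip(texts, labels):
        for term in text.lower().split():
            classes_by_term[term].add(label)
    return {
        term
        for term, classes in classes_by_term.items()
        if len(classes) >= min_classes
    }


def filter_texts(texts, labels, min_classes=2):
    """Remove class-shared tokens from every text (illustrative rule only)."""
    drop = shared_terms(texts, labels, min_classes)
    return [
        " ".join(word for word in text.split() if word.lower() not in drop)
        for text in texts
    ]


# Toy data: "the", "food", and "is" each appear in more than one class.
texts = ["the food is good", "the food is bad", "the service is okay"]
labels = ["positive", "negative", "neutral"]

filtered = filter_texts(texts, labels)
print(filtered)
```

In this sketch only tokens that are unique to a single class survive, so the filtered texts would then be passed to the classifier in place of the raw tweets. A real pipeline would likely need a language-aware tokenizer and a less aggressive threshold.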