ArWordVec: efficient word embedding models for Arabic tweets

Published in Soft Computing, 2020

Recommended citation: Mohammed Fouad, Ahmed Mahany, Naif Aljohani, Rabeeh Abbasi, Saeed-Ul Hassan, "ArWordVec: efficient word embedding models for Arabic tweets." Soft Computing, 2020. https://doi.org/10.1007/s00500-019-04153-6


Understanding, processing and using human natural language is one of the major advances in artificial intelligence today. It has been achieved by combining natural language processing (NLP) techniques with deep learning approaches and architectures. In recent years, distributed word representations have replaced the traditional bag-of-words approach very effectively in many NLP tasks. In this paper, we present the detailed steps for building a set of efficient word embedding models, called ArWordVec, that are generated from a huge repository of Arabic tweets. In addition, we introduce a new method for measuring Arabic word similarity and use it to evaluate the generated ArWordVec models. The experimental results show that the ArWordVec models outperform the recently available models on Arabic Twitter data in the word similarity task. Furthermore, two large Arabic tweet datasets are used to examine the performance of the proposed models in the multi-class sentiment analysis task. The results show that the proposed models are very efficient and help achieve a classification accuracy exceeding 73.86% with a high average F1 value of 74.15.
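To illustrate why distributed representations help with word similarity, the following minimal sketch compares bag-of-words style one-hot vectors with dense embeddings using cosine similarity. The three-dimensional vectors, the three-word vocabulary, and the helper names (`one_hot`, `most_similar`) are invented for illustration; they are not taken from the ArWordVec models. Plain cosine similarity is only a common baseline here, not the new similarity measure introduced in the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: values made up for this example).
HAPPY = "سعيد"
JOYFUL = "فرحان"
SAD = "حزين"
toy_embeddings = {
    HAPPY: [0.9, 0.1, 0.2],
    JOYFUL: [0.8, 0.2, 0.1],
    SAD: [-0.7, 0.3, 0.1],
}
vocab = list(toy_embeddings)


def one_hot(word):
    """Bag-of-words style vector: one dimension per vocabulary word."""
    return [1.0 if w == word else 0.0 for w in vocab]


def most_similar(word, embeddings):
    """Return the other vocabulary word whose embedding is closest to `word`."""
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


# One-hot vectors of two different words are always orthogonal,
# so they carry no information about how related the words are.
one_hot_score = cosine_similarity(one_hot(HAPPY), one_hot(JOYFUL))

# Dense embeddings can place related words close together.
dense_related = cosine_similarity(toy_embeddings[HAPPY], toy_embeddings[JOYFUL])
dense_unrelated = cosine_similarity(toy_embeddings[HAPPY], toy_embeddings[SAD])
```

With real pretrained vectors, the dictionary above would be replaced by a lookup into the loaded model. The comparison logic stays the same.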