INTWEEMS: A Framework for Incremental Clustering of Tweet Streams

Published in Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, 2015

Recommended citation: Muhammad Minhas, Rabeeh Abbasi, Naif Aljohani, Aiiad Albeshri, Mubashar Mushtaq, "INTWEEMS: A Framework for Incremental Clustering of Tweet Streams." In Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services, 2015. http://doi.acm.org/10.1145/2837185.2843853

Access paper here

Twitter is a popular micro-blogging service for sharing short messages called tweets. Tweets provide public opinion on various topics. Currently, Twitter presents search results as a flat list, sorted either by popularity or by recency. This presentation makes it hard to identify the diverse latent topics covered by the tweets. One way to better understand the tweets is to cluster them so that each cluster represents a latent topic. Doing so requires clustering algorithms that can handle streaming data and map new data into existing clusters. To address this, we propose a framework called INTWEEMS (INcremental clustering of TWEEt streaMS), which clusters tweets in real time, incrementally assigns new tweets to existing clusters, and provides a visualization of the clusters that helps in identifying latent topics and sub-topics within the tweets. This paper describes the INTWEEMS framework and its implementation.
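
The idea of mapping each new tweet into an existing cluster can be illustrated with a minimal sketch. The abstract does not specify the algorithm INTWEEMS uses, so everything below is an assumption made for illustration: the `IncrementalClusterer` class, the Jaccard similarity over word tokens, and the `threshold` parameter are hypothetical and stand in for whatever similarity measure and assignment rule the framework actually applies.

```python
import re


def tokenize(text):
    """Lowercase a tweet and return its set of tokens (hashtags and mentions kept)."""
    return set(re.findall(r"[#@]?\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; not the INTWEEMS algorithm itself."""

    def __init__(self, threshold=0.2):
        # Assumed parameter: minimum similarity needed to join an existing cluster.
        self.threshold = threshold
        self.clusters = []  # each cluster: {"tokens": set, "tweets": list}

    def add(self, tweet):
        """Assign one incoming tweet to a cluster and return the cluster index."""
        tokens = tokenize(tweet)
        best_index, best_score = None, 0.0
        # Find the existing cluster most similar to the new tweet.
        for index, cluster in enumerate(self.clusters):
            score = jaccard(tokens, cluster["tokens"])
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None and best_score >= self.threshold:
            # Similar enough: merge the tweet into that cluster.
            cluster = self.clusters[best_index]
            cluster["tweets"].append(tweet)
            cluster["tokens"] |= tokens
            return best_index
        # Otherwise the tweet starts a new cluster (a new candidate topic).
        self.clusters.append({"tokens": set(tokens), "tweets": [tweet]})
        return len(self.clusters) - 1


clusterer = IncrementalClusterer(threshold=0.2)
for tweet in [
    "Heavy rain and flooding in Jeddah today",
    "Flooding in Jeddah after heavy rain",
    "New phone launch event announced",
]:
    print(clusterer.add(tweet), tweet)
```

Because each tweet is processed once and existing clusters are only extended, earlier tweets never need to be re-clustered when new ones arrive, which is the property that makes this style of clustering suitable for streams. A real system would likely also need term weighting, stop-word handling, and a way to retire stale clusters.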