Filtering spam in Weibo using ensemble imbalanced classification and knowledge expansion


Weibo has become an important information-sharing platform in daily life in China. Many applications use Weibo data to analyze hot topics and opinion evolution patterns and to gain insight into user behavior. However, spam messages degrade the performance of these applications and must therefore be filtered out.

In this paper, we propose a unified spam detection approach that uses external knowledge sources to expand keyword features and applies an ensemble under-sampling strategy to handle the class-imbalance problem. Experimental results on Weibo data demonstrate the effectiveness and robustness of our approach.
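
As a rough illustration of the two ideas named above, the following Python sketch shows one way keyword expansion and ensemble under-sampling could fit together. It is not the implementation evaluated in this paper. All function names (expand_keywords, balanced_subsets, train_keyword_scorer, ensemble_predict) and the related_terms lookup table are hypothetical, and the toy token-count scorer merely stands in for a real base classifier. Samples are assumed to be (tokens, label) pairs with label 1 for spam and 0 for non-spam.

```python
import random
from collections import Counter


def expand_keywords(tokens, related_terms):
    """Append related terms from an external lookup table (hypothetical
    stand-in for a knowledge source) to a token list, without duplicates."""
    expanded = list(tokens)
    for tok in tokens:
        for extra in related_terms.get(tok, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Build n_subsets training sets. Each one keeps every minority sample
    and adds an equally sized random draw (without replacement) from the
    majority class, so every subset is class-balanced."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))
    return [rng.sample(majority, k) + list(minority) for _ in range(n_subsets)]


def train_keyword_scorer(samples):
    """Toy base learner (an assumption, not the paper's classifier):
    each token gets +1 per spam message and -1 per non-spam message
    in which it appears."""
    weights = Counter()
    for tokens, label in samples:
        for tok in set(tokens):
            weights[tok] += 1 if label == 1 else -1
    return weights


def predict(weights, tokens):
    """Label a message as spam (1) when its summed token weight is positive."""
    score = sum(weights[tok] for tok in set(tokens))
    return 1 if score > 0 else 0


def ensemble_predict(models, tokens):
    """Majority vote over the base learners trained on the balanced subsets."""
    votes = sum(predict(m, tokens) for m in models)
    return 1 if votes * 2 > len(models) else 0
```

In this sketch, each message's tokens would first pass through expand_keywords, one scorer would then be trained per subset returned by balanced_subsets, and ensemble_predict would combine their votes. Because every subset reuses all minority samples but draws a different slice of the majority class, the ensemble sees most of the majority data without any single learner being trained on a skewed set.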