Given a labeled graph containing fraudulent and legitimate nodes, which nodes group together? How can we use the riskiness of node groups to infer a future label for new members of a group? This paper focuses on social security fraud, where companies are linked to the resources they use and share. The primary goal in social security fraud detection is to identify companies that intentionally fail to pay their contributions to the government.
We aim to detect fraudulent companies by (1) propagating a time-dependent exposure score for each node based on its relationships to known fraud in the network, (2) deriving cliques of companies and resources, and labeling these cliques in terms of their fraud and bankruptcy involvement, and (3) characterizing each company using a combination of intrinsic and relational features and its membership in suspicious cliques. We show that clique-based features boost the performance of traditional relational models.
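
As an illustration of step (1), the following minimal sketch propagates an exposure score from known fraudulent companies through a company-resource graph. It is not the implementation used in this paper. It assumes a random walk with restart (in the style of personalized PageRank) and an exponential time decay on edge weights. The function name `exposure_scores`, the parameters `decay`, `restart`, and `iters`, and the toy node names are all hypothetical.

```python
import math


def exposure_scores(edges, fraud, now, decay=0.1, restart=0.15, iters=50):
    """Random walk with restart on a company-resource graph.

    edges: iterable of (company, resource, timestamp); node names must be unique
    fraud: set of companies already labeled fraudulent (the restart seeds)
    now:   reference time used to age the edges
    """
    # Assumption: links lose influence exponentially with age.
    weights = {}
    for company, resource, t in edges:
        w = math.exp(-decay * (now - t))
        weights.setdefault(company, {})[resource] = w
        weights.setdefault(resource, {})[company] = w

    nodes = list(weights)
    seeds = [n for n in nodes if n in fraud]
    if not seeds:
        # No known fraud in the graph: nothing to propagate.
        return {n: 0.0 for n in nodes}

    # The walker restarts uniformly at known fraudulent companies.
    restart_vec = {n: (1.0 / len(seeds) if n in fraud else 0.0) for n in nodes}
    score = dict(restart_vec)

    for _ in range(iters):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for n in nodes:
            total = sum(weights[n].values())
            for m, w in weights[n].items():
                # Pass on score in proportion to the time-decayed edge weight.
                nxt[m] += (1.0 - restart) * score[n] * w / total
        score = nxt
    return score


# Toy example: c_a shares an address with a fraudulent company,
# while c_b is connected to the fraud only indirectly, through c_a.
edges = [
    ("c_fraud", "r_addr", 9),
    ("c_a", "r_addr", 9),
    ("c_b", "r_buyer", 9),
    ("c_a", "r_buyer", 1),
]
scores = exposure_scores(edges, fraud={"c_fraud"}, now=10)
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

In this toy graph, `c_a` ends up with a higher exposure score than `c_b` because it sits closer to the known fraud. Scores of this kind could then serve as one relational feature alongside the clique-based features of steps (2) and (3).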