How can we find useful patterns and anomalies in large scale real-world data with multiple attributes? For example, network intrusion logs, with (source-ip, target-ip, port-number, timestamp)? Tensors are suitable for modeling these multi-dimensional data, and widely used for the analysis of social networks, web data, network traffic, and in many other settings. However, current tensor decomposition methods do not scale for tensors with millions and billions of rows, columns and `fibers’, that often appear in real datasets.
In this paper, we propose HaTen2, a scalable distributed suite of tensor decomposition algorithms running on the MapReduce platform. By carefully reordering the operations, and exploiting the sparsity of real world tensors, HaTen2 dramatically reduces the intermediate data, and the number of jobs. As a result, using HaTen2, we analyze big real-world tensors that can not be handled by the current state of the art, and discover hidden concepts.