Mining billion-scale tensors: algorithms and discoveries
| Published in | The VLDB Journal, Vol. 25, no. 4, pp. 519–544 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.08.2016 |
| ISSN | 1066-8888; 0949-877X |
| DOI | 10.1007/s00778-016-0427-4 |
| Summary: | How can we analyze large-scale real-world data with various attributes? Much real-world data (e.g., network traffic logs, web data, social networks, knowledge bases, and sensor streams) with multiple attributes is represented as multi-dimensional arrays, called tensors. Tensor decompositions are widely used to analyze such tensors in many data mining applications: detecting malicious attackers in network traffic logs (with source IP, destination IP, port number, and timestamp), finding telemarketers in a phone call history (with sender, receiver, and date), and identifying interesting concepts in a knowledge base (with subject, object, and relation). However, current tensor decomposition methods do not scale to large, sparse real-world tensors with millions of rows, columns, and fibers. In this paper, we propose HaTen2, a distributed method for large-scale tensor decompositions that runs on the MapReduce framework. Our careful design and implementation of HaTen2 dramatically reduce the size of the intermediate data and the number of jobs, achieving high scalability compared with the state-of-the-art method. Thanks to HaTen2, we analyze big real-world sparse tensors that cannot be handled by the current state of the art, and discover hidden concepts. |
|---|---|
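To make concrete what a tensor decomposition computes, the following is a minimal single-machine NumPy sketch of a CP/PARAFAC decomposition via alternating least squares (ALS) on a small dense tensor. It is an illustration of the general technique only, not the paper's distributed HaTen2 implementation, which runs these computations as MapReduce jobs over sparse billion-scale tensors.

```python
# Toy CP/PARAFAC decomposition via alternating least squares (ALS).
# Single-machine, dense, NumPy-only: an illustration of the technique,
# not the distributed MapReduce implementation described in the paper.
import numpy as np

def unfold(T, mode):
    """Matricize T along `mode`: rows index that mode, columns the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J x R)."""
    r = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, r)

def cp_als(T, rank, iters=200, seed=0):
    """Fit T[i,j,k] ~ sum_r A[i,r]*B[j,r]*C[k,r] for a 3-way tensor T."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(iters):
        # Each factor update is an exact least-squares solve against the
        # matching unfolding: unfold(T, 0) = A @ khatri_rao(B, C).T, etc.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

if __name__ == "__main__":
    # Build an exact rank-2 tensor, then recover a rank-2 model of it.
    rng = np.random.default_rng(42)
    A0, B0, C0 = (rng.standard_normal((s, 2)) for s in (4, 5, 6))
    T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    A, B, C = cp_als(T, rank=2)
    That = np.einsum('ir,jr,kr->ijk', A, B, C)
    rel_err = np.linalg.norm(T - That) / np.linalg.norm(T)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

The rows of the recovered factor matrices play the role of the "hidden concepts" the abstract mentions: for a (source IP, destination IP, timestamp) traffic tensor, each rank-one component groups sources, destinations, and time slots that co-occur.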