Mining billion-scale tensors: algorithms and discoveries
| Published in | The VLDB Journal, Vol. 25, no. 4, pp. 519–544 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.08.2016 |
| ISSN | 1066-8888; 0949-877X |
| DOI | 10.1007/s00778-016-0427-4 |
| Summary: | How can we analyze large-scale real-world data with various attributes? Much real-world data (e.g., network traffic logs, web data, social networks, knowledge bases, and sensor streams) with multiple attributes is represented as multi-dimensional arrays, called tensors. Tensor decompositions are widely used to analyze such tensors in many data mining applications: detecting malicious attackers in network traffic logs (with source IP, destination IP, port number, and timestamp), finding telemarketers in a phone call history (with sender, receiver, and date), and identifying interesting concepts in a knowledge base (with subject, object, and relation). However, current tensor decomposition methods do not scale to large, sparse real-world tensors with millions of rows, columns, and fibers. In this paper, we propose HaTen2, a distributed method for large-scale tensor decompositions that runs on the MapReduce framework. Our careful design and implementation of HaTen2 dramatically reduce the size of the intermediate data and the number of jobs, achieving high scalability compared with the state-of-the-art method. Thanks to HaTen2, we analyze big real-world sparse tensors that cannot be handled by the current state of the art, and discover hidden concepts. |
|---|---|
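To make concrete what a tensor decomposition computes, the following is a minimal single-machine NumPy sketch of a CP/PARAFAC decomposition via alternating least squares (ALS) on a small dense tensor. It is an illustration of the general technique only, not the paper's distributed HaTen2 implementation, which runs these computations as MapReduce jobs over sparse billion-scale tensors.

```python
# Toy CP/PARAFAC decomposition via alternating least squares (ALS).
# Single-machine, dense, NumPy-only: an illustration of the technique,
# not the distributed MapReduce implementation described in the paper.
import numpy as np

def unfold(T, mode):
    """Matricize T along `mode`: rows index that mode, columns the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product of A (I x R) and B (J x R) -> (I*J x R)."""
    r = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, r)

def cp_als(T, rank, iters=200, seed=0):
    """Fit T[i,j,k] ~ sum_r A[i,r]*B[j,r]*C[k,r] for a 3-way tensor T."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(iters):
        # Each factor update is an exact least-squares solve against the
        # matching unfolding: unfold(T, 0) = A @ khatri_rao(B, C).T, etc.
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

if __name__ == "__main__":
    # Build an exact rank-2 tensor, then recover a rank-2 model of it.
    rng = np.random.default_rng(42)
    A0, B0, C0 = (rng.standard_normal((s, 2)) for s in (4, 5, 6))
    T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
    A, B, C = cp_als(T, rank=2)
    That = np.einsum('ir,jr,kr->ijk', A, B, C)
    rel_err = np.linalg.norm(T - That) / np.linalg.norm(T)
    print(f"relative reconstruction error: {rel_err:.2e}")
```

The rows of the recovered factor matrices play the role of the "hidden concepts" the abstract mentions: for a (source IP, destination IP, timestamp) traffic tensor, each rank-one component groups sources, destinations, and time slots that co-occur.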