Big Data Analytics and Knowledge Discovery 20th International Conference, DaWaK 2018, Regensburg, Germany, September 3-6, 2018, Proceedings

This book constitutes the refereed proceedings of the 20th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2018, held in Regensburg, Germany, in September 2018. The 13 revised full papers and 17 short papers presented were carefully reviewed and selected from 76 submissi...

Bibliographic Details
Main Authors	International Conference on Data Warehousing and Knowledge Discovery, Ordonez, Carlos, Bellatreche, Ladjel
Format	eBook Book
Language	English
Published	Cham Springer Nature 2018 Springer Springer International Publishing AG
Edition	1
Series	Lecture Notes in Computer Science
Subjects	Big data > Congresses; Computer science; Data mining > Congresses; Database management > Congresses; Special computer methods
ISBN	3319985396 9783319985398 9783319985381 3319985388

Table of Contents:

Intro -- Preface -- Organization -- Smart Aging: Topics, Applications, Technologies, and Agenda (Abstract of Keynote Speaker) -- Contents -- Graph Analytics -- Graph BI & Analytics: Current State and Future Challenges -- 1 Introduction -- 2 Graph Data Modeling -- 2.1 Graph Models -- 2.2 Graph Management -- 3 Graph Analytics -- 3.1 OLAP on Graphs -- 3.2 Graph Mining -- 3.3 Graph Processing -- 4 Future Research Directions -- 5 Conclusion -- References -- Community Detection in Who-calls-Whom Social Networks -- 1 Introduction -- 2 Related Work -- 3 Proposed Methodology -- 3.1 Graph Mining and Community Detection -- 3.2 Clustering Evaluation Methods -- 4 Implementation Methodology -- 4.1 Dataset -- 4.2 System Architecture -- 4.3 Queries -- 5 Performance Evaluation -- 5.1 Runtime Evaluation -- 5.2 Community Detection and Evaluation -- 6 Conclusions -- References -- FedS: Towards Traversing Federated RDF Graphs -- 1 Introduction -- 2 Motivation Scenario - Cancer Genomics -- 3 Preliminaries -- 4 Related Work -- 5 FedS -- 6 Results and Discussion -- 7 Conclusion and Future Work -- References -- Case Studies -- Adversarial Spiral Learning Approach to Strain Analysis for Bridge Damage Detection -- 1 Introduction -- 2 Bridge Analysis: The Influence Line -- 3 The Spiral Learning Approach -- 4 Training and Evaluation Data -- 5 Experimental Results -- 6 Discussion -- 7 Conclusion -- References -- CoRe: Generating a Computationally Representative Road Skeleton - Integrating AADT with Road Structure -- 1 Introduction -- 2 Preliminaries -- 3 Analysis of Annual Average Daily Traffic (AADT) Data -- 4 Skeleton Network: Knowledge Based Network Extraction -- 4.1 Edge Priority Computation -- 4.2 Skeleton Network Generation -- 5 Evaluation of Skeleton Network with AADT Data -- 6 CoRe Network: Integration of Skeleton Network and AADT Data
7 Efficient Path Computation: A Utility of CoRe Network -- 8 Conclusion -- References -- E-Commerce Product Recommendation Using Historical Purchases and Clickstream Data -- 1 Introduction -- 1.1 Observations and Assumptions -- 1.2 Paper Contributions -- 1.3 Paper Outline -- 2 Related Work -- 3 Proposed HPCRec System -- 3.1 FN: Frequency Normalization -- 3.2 CSSM: Clickstream Sequence Similarity Measurement -- 3.3 TWFI: Transaction-Based Weighted Frequent Item -- 4 Evaluation and Comparative Analysis -- 4.1 An Example for Handling Infrequent User Cases -- 4.2 Experimental Design -- 4.3 Experimental Results -- 5 Conclusions and Future Work -- References -- Effective Classification of Ground Transportation Modes for Urban Data Mining in Smart Cities -- Abstract -- 1 Introduction -- 2 Related Works -- 3 Our Proposed Classification System -- 3.1 Dataset Collection Module -- 3.2 Trip Segmentation Module -- 3.3 Feature Extraction Module -- 3.4 Model Construction Module -- 3.5 Data Classification Module -- 4 Evaluation -- 5 Conclusions -- Acknowledgements -- References -- Location Prediction Using Sentiments of Twitter Users -- Abstract -- 1 Introduction -- 2 Related Works -- 3 Proposed Approach -- 3.1 Label Tweets with Location Categories -- 3.2 Build Training and Test Datasets -- 3.3 Sentiment Analysis -- 3.4 Location Prediction Using SLLDA -- 3.5 Additional Constraints -- 4 Experimental Results -- 4.1 Dataset Description -- 4.2 Experiments and Results -- 4.2.1 Comparison with Baseline Model -- 4.2.2 Comparison with Different Window Sizes -- 4.2.3 Comparison for Different Radii -- 5 Conclusion -- References -- Classification and Clustering -- A Clustering Model for Uncertain Preferences Based on Belief Functions -- 1 Introduction -- 2 Basic Notions -- 2.1 Preference Order -- 2.2 Dissimilarity Between Orders -- 2.3 Belief Functions
2.4 Distance on Belief Functions -- 3 Preference Model Under Uncertainty -- 3.1 Problem Setting -- 3.2 Preference Model on Belief Functions -- 4 Contribution: Agent Clustering Based on Their Preferences -- 4.1 Representation of Agents -- 4.2 Modeling of Mass Functions -- 4.3 Dissimilarity Between Different Agents -- 4.4 Unsupervised Classifier: Ek-NN -- 5 Experiments -- 5.1 Evaluation Criteria -- 5.2 Certain Preferences -- 5.3 Uncertain Preferences -- 6 Conclusion and Perspectives -- References -- A Novel Committee-Based Clustering Method -- 1 Introduction -- 2 Related Work -- 3 The 3-Stage Ensemble Clustering Method -- 4 Experiments and Results -- 5 Conclusion and Future Work -- References -- KMN - Removing Noise from K-Means Clustering Results -- 1 Introduction -- 1.1 Related Work -- 1.2 Contributions -- 2 The Algorithm -- 2.1 Finding the Voronoi Intersections -- 2.2 The MDL-Criterion -- 2.3 Finding the Voronoi Adjacencies -- 2.4 Pseudo Code -- 2.5 Performance on Running Example -- 3 Resilience Regarding k -- 4 Experiments -- 5 Outlook and Conclusion -- References -- Subset Labeled LDA: A Topic Model for Extreme Multi-label Classification -- 1 Introduction -- 2 Background and Related Work -- 2.1 Extreme Classification Methods -- 2.2 LDA and LLDA -- 3 Subset LLDA -- 3.1 Time Complexity -- 4 Empirical Evaluation -- 4.1 Results -- 5 Conclusions and Future Work -- References -- Third Party Data Clustering Over Encrypted Data Without Data Owner Participation: Introducing the Encrypted Distance Matrix -- 1 Introduction -- 2 Previous Work -- 3 Preliminaries -- 3.1 Homomorphic Encryption: Liu's Scheme -- 3.2 Order Preserving Encryption -- 4 Encrypted Distance Matrix (EDM) Generation -- 5 Secure Nearest Neighbour Clustering -- 5.1 Data Owner Data Preparation -- 5.2 Third Party Clustering: SNNC Algorithm -- 6 Evaluation
6.1 Data Owner Data Preparation Run Time Complexity -- 6.2 Clustering Efficiency -- 6.3 Clustering Accuracy -- 6.4 Security Analysis -- 7 Conclusion -- References -- Pre-processing -- An Efficient Prototype Selection Algorithm Based on Spatial Abstraction -- 1 Introduction -- 2 Related Works -- 3 Notation -- 4 The PSSA Algorithm -- 5 Experiments -- 6 Conclusion -- References -- Web Usage Data Cleaning -- Abstract -- 1 Introduction -- 2 Cleaning Methods Analysis -- 2.1 Data and Formalism -- 2.2 Related Methods -- 2.3 Limitation Analysis -- 3 Rule-Based Cleaning Approach -- 3.1 Targeted Contribution Concept and Method -- 3.2 Logging Structure Features Identification (Step 1) -- 3.3 Cleaning Rule (Step 2) -- 3.4 Rule-Based Cleaning Algorithm (Step 3) -- 4 Experimentation and Results Discussion -- 4.1 Experimental Data and Validation Reference -- 4.2 Results Analysis -- 4.3 Results Evaluation -- 5 Conclusion -- References -- Anonymization of Multiple and Personalized Sensitive Attributes -- 1 Introduction -- 2 Literature Review -- 3 Preliminaries and Problem Statement -- 4 The Proposed (k, p)-anonymity Framework -- 4.1 Data Pre-processing Phase -- 4.2 Clustering Phase -- 4.3 Anonymization Phase -- 5 Experimental Evaluation -- 5.1 Information Loss -- 5.2 Runtime -- 6 Conclusions -- References -- TRANS-AM: Discovery Method of Optimal Input Vectors Corresponding to Objective Variables -- 1 Introduction -- 2 Related Work -- 3 TRANS-AM: Proposed Method -- 3.1 Notation -- 3.2 Split Input Space with Regression Tree -- 3.3 -Satisfactory Instance -- 3.4 Feature Transformation -- 4 Numerical Simulation and Evaluation -- 4.1 Experimental Setting -- 4.2 Result and Consideration -- 5 Conclusion -- References -- Sequences -- Discovering Periodic Patterns Common to Multiple Sequences -- 1 Introduction -- 2 Related Work -- 3 Definitions and Problem Statement
4 The MPFPS Algorithm -- 4.1 A Detailed Example -- 5 Experimental Evaluation -- 5.1 Influence of minRa and maxStd -- 5.2 Influence of maxPr -- 5.3 Memory Used for Different Parameter Values -- 5.4 Discussion -- 6 Conclusion -- References -- Discovering Tight Space-Time Sequences -- 1 Introduction -- 2 Related Works -- 3 Formalization -- 4 STSM -- 4.1 STS Definitions -- 4.2 General Principle -- 4.3 Toy Example -- 5 Experimental Evaluation -- 5.1 Dataset -- 5.2 Exploratory Analysis -- 5.3 Discussions -- 6 Conclusion -- References -- Cloud and Database Systems -- CloudDBGuard: Enabling Sorting and Searching on Encrypted Data in NoSQL Cloud Databases -- 1 Introduction -- 2 Background and Related Work -- 2.1 Motivating Example -- 2.2 The Data Model of Wide Column Stores -- 2.3 Property-Preserving Encryption -- 2.4 Onion Layer Model -- 3 The CloudDBGuard Framework -- 3.1 API -- 3.2 Selective Encryption -- 3.3 Separation of Duties -- 3.4 Table Profiles -- 3.5 Unification of Data Models -- 4 Benchmark -- 5 Conclusion and Future Work -- References -- Query Processing on Large Graphs: Scalability Through Partitioning -- 1 Motivation -- 2 Related Work -- 3 Graphs, Queries, Plan Generation, and Partitioning -- 4 Partitioned Approach to Query Processing -- 4.1 PGQP System Architecture -- 4.2 Correctness of the Approach -- 5 Metrics and Heuristics for Partitioned Query Evaluation -- 5.1 Number of Query Plan Start/Continuation Nodes in a Partition -- 5.2 Total Number of Connected Components -- 5.3 Quantitative Measures for Evaluating the Heuristics -- 6 Implementation Summary -- 6.1 Management of Partial Results -- 7 Experimental Analysis -- 7.1 Evaluation of Start Node Heuristics -- 7.2 Evaluation of Connected Components Heuristics -- 8 Conclusions -- References -- Querying Heterogeneous Data in Graph-Oriented NoSQL Systems -- Abstract -- 1 Introduction
2 Problem Modeling