Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13-17, 2021, Proceedings, Part IV

The multi-volume set LNAI 12975 through 12979 constitutes the refereed proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2021, which was held September 13-17, 2021. The conference was originally planned to take place in Bilbao, Spain, but...

Bibliographic Details
Main Authors	Dong, Yuxiao, Kourtellis, Nicolas, Hammer, Barbara, Lozano, Jose A
Format	eBook
Language	English
Published	Cham, Springer International Publishing AG, 2021
Edition	1
Series	Lecture Notes in Computer Science
Subjects	Machine learning-Congresses
ISBN	9783030865139 3030865134

Table of Contents:

Intro -- Preface -- Organization -- Contents - Part IV -- Anomaly Detection and Malware -- Anomaly Detection: How to Artificially Increase Your F1-Score with a Biased Evaluation Protocol -- 1 Introduction -- 2 Related Work -- 3 Issues When Using F1-Score and AVPR Metrics -- 3.1 Formalism and Problem Statement -- 3.2 Definition of the Metrics -- 3.3 Evaluation Protocols: Theory vs Practice -- 3.4 Metrics Sensitivity to the Contamination Rate of the Test Set -- 3.5 How to Artificially Increase Your F1-Score and AVPR -- 3.6 F1-Score Cannot Compare Datasets Difficulty -- 4 Call for Action -- 4.1 Use AUC -- 4.2 Do Not Waste Anomalous Samples -- 5 Conclusion -- References -- Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data -- 1 Introduction -- 2 Related Work -- 3 Definitions and Notation -- 4 System Architecture -- 4.1 Subspace Searching Module -- 4.2 Discord Mining Module -- 4.3 Discussion -- 5 Evaluation -- 5.1 Alternative Approaches -- 5.2 Synthetic Data -- 5.3 Real-World Transactional Data -- 6 Conclusion -- References -- AIMED-RL: Exploring Adversarial Malware Examples with Reinforcement Learning -- 1 Introduction -- 2 Related Work -- 2.1 Reinforcement Learning -- 2.2 Further Approaches -- 3 AIMED-RL -- 3.1 Framework and Notation -- 3.2 Experimental Setting -- 3.3 Environment -- 4 Experimental Results -- 4.1 Diversity of Perturbations -- 4.2 Evasion Rate -- 5 Availability -- 6 Conclusion -- References -- Learning Explainable Representations of Malware Behavior -- 1 Introduction -- 2 Related Work -- 3 Problem Setting and Operating Environment -- 3.1 Network Events -- 3.2 Identification of Threats -- 3.3 Data Collection and Quantitative Analysis -- 4 Models -- 4.1 Architectures -- 4.2 Unsupervised Pre-training -- 5 Experiments -- 5.1 Hyperparameter Optimization -- 5.2 Malware-Classification Performance
5.3 Indicators of Compromise -- 6 Conclusion -- References -- Strategic Mitigation Against Wireless Attacks on Autonomous Platoons -- 1 Introduction -- 1.1 Related Work -- 2 Message Falsification Attacks Against Platoons -- 2.1 Vehicular Platoon Control Policy -- 2.2 Attack Model -- 2.3 Attack Detection Algorithm -- 3 Security Game-Based Mitigation Framework -- 3.1 Numerical Example -- 4 Simulation Setup -- 5 Simulation Results and Discussion -- 5.1 Realistic Driving Scenario -- 6 Conclusion -- References -- DeFraudNet: An End-to-End Weak Supervision Framework to Detect Fraud in Online Food Delivery -- 1 Introduction -- 2 Related Work -- 3 The Framework: DeFraudNet -- 3.1 Problem Definition -- 3.2 Fraud Detection Pipeline -- 4 Data and Feature Processing -- 4.1 Dataset -- 4.2 Feature Engineering -- 5 Label Generation -- 5.1 Generating Noisy Labels Using LFs -- 5.2 Snorkel Generative Model -- 5.3 Class-Specific Autoencoders for Denoising -- 6 Discriminator Models -- 6.1 Multi Layer Perceptron -- 6.2 LSTM Sequence Model -- 7 Deployment and Serving Infrastructure -- 8 Ablation Experiments -- 8.1 Setup and Baseline -- 8.2 Experiments -- 9 Conclusion -- References -- Spatio-Temporal Data -- Time Series Forecasting with Gaussian Processes Needs Priors -- 1 Introduction -- 2 Gaussian Processes -- 2.1 Kernel Compositions -- 2.2 The Composition -- 2.3 Training Strategy -- 2.4 MAP Estimation -- 2.5 Forecasting -- 3 Experiments -- 4 Dealing with Multiple Seasonalities -- 5 Code and Replicability -- 6 Conclusions -- References -- Task Embedding Temporal Convolution Networks for Transfer Learning Problems in Renewable Power Time Series Forecast -- 1 Introduction -- 2 Related Work -- 3 Proposed Method -- 3.1 Definition of MTL, TL, and Zero-Shot Learning -- 3.2 Proposed Method -- 4 Experimental Evaluation of the Task-Temporal Convolution Network
4.1 GermanSolarFarm and EuropeWindFarm Dataset -- 4.2 Evaluation Measures -- 4.3 MTL Experiment -- 4.4 Zero-Shot Learning Experiment -- 4.5 Inductive TL Experiment -- 5 Conclusion and Future Work -- References -- Generating Multi-type Temporal Sequences to Mitigate Class-Imbalanced Problem -- 1 Introduction -- 2 Related Work -- 2.1 GAN for Sequence Data -- 2.2 RL for GANs with Sequences of Discrete Tokens -- 2.3 Gumbel-Softmax Distribution for GANs with Sequences of Discrete Tokens -- 3 Methodology -- 3.1 Definitions -- 3.2 RL and Policy Improvement to Train GAN -- 3.3 An Approximation with Gumbel-Softmax Distribution -- 4 Data Experiments -- 4.1 Synthetic Dataset -- 4.2 Evaluation Metric -- 4.3 Experiment Setup -- 4.4 Experiment Results -- 5 Conclusions -- References -- Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network -- 1 Introduction -- 2 Related Work -- 2.1 Hand Pose and Gesture Representation -- 2.2 Hand Gesture Recognition -- 3 Problem Formulation -- 3.1 Definition -- 3.2 Embedding Representation for Skeletal Data -- 4 Our Model -- 4.1 Spatio-Temporal Feature Encoder -- 4.2 Attention Scorer -- 4.3 Network-Based Classifier -- 5 Experiments -- 5.1 Datasets and Preprocessing -- 5.2 Experimental Set-Ups and Baselines -- 5.3 Comparison Results on Publicly-Available Datasets -- 5.4 Comparisons Results on TaiChi2021 -- 5.5 Ablation Study -- 6 Conclusion -- References -- E-commerce and Finance -- Smurf-Based Anti-money Laundering in Time-Evolving Transaction Networks -- 1 Introduction -- 2 Related Work -- 3 Dataset Description -- 4 Extraction of Smurf-Like Motifs from Transaction Graph -- 4.1 Proposed Pipeline -- 4.2 Results -- 5 Conclusion -- References -- Spatio-Temporal Multi-graph Networks for Demand Forecasting in Online Marketplaces -- 1 Introduction -- 2 Prior Work -- 3 Proposed Method -- 3.1 Problem Formulation
3.2 Graph Construction -- 3.3 Graph Neural Networks -- 3.4 Sequential Model -- 4 Experimental Results -- 4.1 Implementation Details -- 4.2 Comparison with Baseline -- 4.3 Demand Forecasting for Multi-seller Products and Cold Start Offers -- 5 Conclusion -- References -- The Limit Order Book Recreation Model (LOBRM): An Extended Analysis -- 1 Introduction -- 2 Background and Related Work -- 2.1 The Limit Order Book (LOB) -- 2.2 Generating Synthetic LOB Data -- 3 Model Formulation -- 3.1 Motivation -- 3.2 Problem Description -- 3.3 Formalized Workflow of LOBRM -- 4 Experiment and Empirical Analysis -- 4.1 Data Preprocessing -- 4.2 Model Comparison -- 4.3 Ablation Study -- 4.4 Superiority of Sparse Encoding for TAQ -- 4.5 Is the Model Well-Trained? -- 5 Conclusion -- References -- Taking over the Stock Market: Adversarial Perturbations Against Algorithmic Traders -- 1 Introduction -- 2 Background -- 2.1 Algorithmic Trading -- 2.2 Adversarial Learning -- 3 Problem Description -- 3.1 Trading Setup -- 3.2 Threat Model -- 4 Proposed Attack -- 5 Evaluation Setup -- 5.1 Dataset -- 5.2 Feature Extraction -- 5.3 Models -- 5.4 Evaluation -- 6 White-Box Attack -- 7 Black-Box Attack -- 8 Mitigation -- 9 Conclusions -- References -- Continuous-Action Reinforcement Learning for Portfolio Allocation of a Life Insurance Company -- 1 Introduction -- 2 Problem Definition -- 2.1 Formalization -- 2.2 Implementation Details -- 2.3 Optimization Problem -- 3 Solution -- 3.1 Structural and Parametric Constraints -- 4 Experimental Evaluation -- 4.1 Three Assets Scenario -- 4.2 Six Assets Scenario -- 5 Related Work -- 6 Conclusions -- References -- XRR: Explainable Risk Ranking for Financial Reports -- 1 Introduction -- 2 Methodology -- 2.1 Definitions and Problem Formulation -- 2.2 Post-event Return Volatility -- 2.3 Multilevel Explanation Structure
2.4 Pairwise Deep Ranking -- 3 Experiments -- 3.1 Data Description -- 3.2 Experimental Settings -- 3.3 Pre-trained Word Embedding -- 3.4 Compared Methods -- 3.5 Experimental Results -- 3.6 Fine-Grained Analysis -- 3.7 Different Risk Measure Analysis -- 4 Discussions on Explainability -- 4.1 Financial Sentiment Terms Analysis -- 4.2 Financial Sentiment Sentences Analysis -- 5 Conclusion -- References -- Healthcare and Medical Applications (including Covid) -- Self-disclosure on Twitter During the COVID-19 Pandemic: A Network Perspective -- 1 Introduction -- 2 Dataset -- 3 Self-disclosure Measurements -- 3.1 Measurement Scale -- 3.2 Manual Annotations -- 3.3 Label Generation -- 4 Analysis -- 4.1 Self-disclosure Assortativity in Twitter Reply Networks -- 4.2 Persistent Groups and Self-disclosure -- 4.3 Characterizing Sensitive Disclosures in Temporally Persistent Social Connections -- 5 Discussion -- 6 Related Work -- 7 Conclusion -- References -- COVID Edge-Net: Automated COVID-19 Lung Lesion Edge Detection in Chest CT Images -- 1 Introduction -- 2 Related Works -- 2.1 COVID-19 Segmentation -- 2.2 Edge Detection -- 3 Methodology -- 3.1 Task Definition -- 3.2 Overview of COVID Edge-Net -- 3.3 The Edge Detection Backbone -- 3.4 Multi-scale Residual Dual Attention (MSRDA) Module -- 3.5 Canny Operator Module -- 3.6 Global Loss Function -- 4 Experiments and Discussions -- 4.1 Experimental Settings -- 4.2 Comparison with State-of-the-Arts -- 4.3 Ablation Study -- 4.4 Additional Experiments -- 5 Conclusions -- References -- Improving Ambulance Dispatching with Machine Learning and Simulation -- 1 Introduction -- 2 Related Work -- 3 The Data Set: Historic Dispatch Decisions -- 3.1 Feature Engineering -- 4 Capturing the Dispatch Policy with a Decision Tree -- 4.1 Performance Analysis of the Learned Decision Tree and Policy
4.2 The Penalty-Based Closest-Idle Policy