Data Science and Big Data

Bibliographic Details
Main Authors Pedrycz, Witold; Chen, Shyi-Ming
Format eBook
Language English
Published Cham Springer International Publishing AG 2017
Edition 1
ISBN 9783319534732
3319534734

Table of Contents:
  • Intro -- Preface -- Contents -- Fundamentals -- Large-Scale Clustering Algorithms -- 1 Introduction -- 2 Notation -- 3 Standard Clustering Approaches -- 3.1 Spectral Clustering -- 3.2 K-Means -- 4 Fixed-Size Kernel Spectral Clustering (FSKSC) -- 4.1 Related Work -- 4.2 KSC Overview -- 4.3 Fixed-Size KSC Approach -- 4.4 Computational Complexity -- 5 Regularized Stochastic K-Means (RSKM) -- 5.1 Related Work -- 5.2 Generalities -- 5.3 l2-Regularization -- 5.4 l1-Regularization -- 5.5 Influence of Outliers -- 5.6 Theoretical Guarantees -- 6 Experiments -- 7 Conclusions -- References -- On High Dimensional Searching Spaces and Learning Methods -- 1 Introduction -- 1.1 Classification and Clustering -- 2 Membership Function -- 2.1 Challenges on Learning Methods -- 2.2 Bounded Fuzzy Possibilistic Method (BFPM) -- 2.3 Numerical Example -- 3 Similarity Functions -- 3.1 Challenges on Similarity Functions -- 3.2 Weighted Feature Distances -- 4 Data Types -- 4.1 Data Objects Taxonomies -- 4.2 Complex and Advanced Objects -- 4.3 Outlier and Outstanding Objects -- 5 Experimental Results -- 6 Conclusion -- References -- 3 Enhanced Over_Sampling Techniques for Imbalanced Big Data Set Classification -- Abstract -- 1 Introduction -- 1.1 Basics of Data Mining -- 1.2 Classification -- 1.3 Clustering -- 2 Mapreduce Framework and Classification of Imbalanced Data Sets -- 2.1 MapReduce Framework -- 2.2 Classification of Imbalanced Datasets -- 3 Methodology -- 3.1 Architecture -- 3.1.1 Input Pre-processing and Similarity Based Parallel Clustering of Streaming Data -- 3.1.2 Enhanced Over_Sampling Techniques for Imbalanced Dataset -- 3.2 Sampling Design and Assumptions -- 3.3 Evaluation Parameters -- 4 Conceptual Framework -- 4.1 Pre-processing and Efficient Parallel Clustering Architecture -- 4.2 Conceptual Flow of Experimentation -- 4.3 Data Sets
  • 4.4 Methods of Data Computations -- 5 Conclusion -- References -- Online Anomaly Detection in Big Data: The First Line of Defense Against Intruders -- 1 Introduction -- 2 Data Abstraction Methods -- 3 Burst Learning -- 3.1 Estimating the Number of PCs -- 3.2 Anomaly Detection Using Hotelling's Statistic -- 3.3 PCA in "Big Data" Using NIPALS -- 3.4 Model Parameter Learning -- 4 Online Anomaly Detection -- 4.1 Batch Detection -- 4.2 Page's Test -- 4.3 Shiryaev's Test -- 4.4 Tagging Multiple Anomalies -- 5 Simulation Results -- 6 Conclusions -- References -- Developing Modified Classifier for Big Data Paradigm: An Approach Through Bio-Inspired Soft Computing -- 1 Introduction -- 1.1 Motivation and Background -- 2 Similar Works -- 3 Proposed Model -- 4 Discussion -- 5 Conclusion -- References -- 6 Unified Framework for Control of Machine Learning Tasks Towards Effective and Efficient Processing of Big Data -- Abstract -- 1 Introduction -- 2 Fundamentals of Machine Learning -- 3 Framework for Control of Machine Learning Tasks -- 3.1 Key Features -- 3.2 Justification -- 4 Experimental Studies -- 4.1 Measure of Learnability -- 4.2 Measure of Data Variability -- 5 Conclusion -- References -- 7 An Efficient Approach for Mining High Utility Itemsets Over Data Streams -- Abstract -- 1 Introduction -- 2 Related Work -- 3 Mining High Utility Itemsets in a Data Stream -- 3.1 The Algorithm HUIStream+ -- 3.2 The Algorithm HUIStream− -- 3.3 High Utility Itemset Generation -- 4 Experimental Results -- 5 Conclusion -- References -- Event Detection in Location-Based Social Networks -- 1 Introduction -- 2 Problem Definition -- 3 Background -- 3.1 DBSCAN -- 3.2 Mixture Models -- 4 Event Detection Techniques -- 4.1 Tweet-SCAN: A Data Mining Approach -- 4.2 Warble: A Machine Learning Approach -- 5 Experimental Setup and Results
  • 5.1 "La Mercé": A Dataset for Local Event Detection -- 5.2 Detection Performance Metrics -- 5.3 Assessment -- 6 Conclusions and Future Work -- 6.1 Conclusions -- 6.2 Future Work -- References -- Applications -- 9 Using Computational Intelligence for the Safety Assessment of Oil and Gas Pipelines: A Survey -- Abstract -- 1 Introduction -- 2 Safety Assessment in Oil and Gas Pipelines -- 2.1 Big Data Processing -- 2.2 Defect Detection -- 2.3 Determination of Defect Size -- 2.4 Assessment of Defect Severity -- 2.5 Repair Management -- 3 Computational Intelligence -- 3.1 Data Mining -- 3.1.1 K-Nearest Neighbor (KNN) -- 3.1.2 Support Vector Machine (SVM) -- 3.2 Artificial Neural Networks -- 3.3 Hybrid Neuro-Fuzzy Systems -- 4 Pipeline Safety Assessment Using Intelligent Techniques -- 4.1 Data Mining-Based Techniques -- 4.2 Neural Network-Based Techniques -- 4.3 Hybrid Neuro-Fuzzy Systems-Based Techniques -- 5 Conclusion -- References -- Big Data for Effective Management of Smart Grids -- 1 Introduction -- 2 Smart Grids and Smart Micro-Grids -- 3 Big Data Properties of Smart Grid -- 4 Research Lines and Contribution -- 4.1 Interoperability and Standardization -- 4.2 Big Data Storages -- 4.3 Big Data Analytic -- 4.4 Research Projects Networked with Companies -- 5 Conclusion -- References -- Distributed Machine Learning on Smart-Gateway Network Towards Real-Time Indoor Data Analytics -- 1 Introduction -- 1.1 Computational Intelligence -- 1.2 Distributed Machine Learning -- 1.3 Indoor Positioning -- 1.4 Network Intrusion Detection -- 1.5 Chapter Organizations -- 2 Distributed Data Analytics Platform on Smart Gateways -- 2.1 Smart Home Management System -- 2.2 Distributed Computation Platform -- 3 Distributed Machine Learning Based Indoor Positioning Data Analytics -- 3.1 Problem Formulation -- 3.2 Indoor Positioning by Distributed SVM
  • 3.3 Indoor Positioning by Distributed-neural-network -- 4 Distributed Machine Learning Based Network Intrusion Detection System -- 4.1 Problem Formulation and Analysis -- 4.2 Experimental Results -- 5 Conclusion -- References -- 12 Predicting Spatiotemporal Impacts of Weather on Power Systems Using Big Data Science -- Abstract -- 1 Introduction -- 2 Background -- 2.1 Power System Operation, Generation, Outage and Asset Management -- 2.2 Weather Data Parameters and Sources -- 2.3 Spatio-Temporal Correlation of Data -- 3 Weather Impact on Power System -- 3.1 Weather Impact on Outages -- 3.2 Renewable Generation -- 4 Predictive Data Analytics -- 4.1 Regression -- 4.1.1 Unstructured Regression -- 4.1.2 Structured Regression (Probabilistic Graphical Models) -- Probabilistic Graphical Models -- Conditional Random Fields -- 4.2 Gaussian Conditional Random Fields (GCRF) -- 4.2.1 Continuous Conditional Random Fields Model -- 4.2.2 Association and Interaction Potentials in the GCRF Model -- 4.2.3 Gaussian Canonical Form -- 4.2.4 Learning and Inference -- 4.2.5 GCRF Extensions -- 5 Applications and Results -- 5.1 Insulation Coordination -- 5.1.1 Introduction -- 5.1.2 Modeling -- Risk Based Insulation Coordination -- Lightning Hazard -- Prediction of Vulnerability -- Economic Impact -- 5.1.3 Test Setup and Results -- 5.2 Solar Generation Forecast -- 5.2.1 Introduction -- 5.2.2 Modeling -- Solar Generation Versus Solar Irradiance -- Temporal Correlation Modeling -- Spatial Correlation Modeling -- 5.2.3 Test Setup and Results -- 6 Conclusions -- References -- Index