Guide to Intelligent Data Science: How to Intelligently Make Use of Real Data

Making use of data is not anymore a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it seems at least in principle possible to solve any problem. However, successful data science projects result from the intelligent application of...

Bibliographic Details
Main Authors Berthold, Michael R, Borgelt, Christian, Höppner, Frank, Klawonn, Frank, Silipo, Rosaria
Format eBook Book
Language English
Published Cham Springer Nature 2020
Springer
Springer International Publishing AG
Springer International Publishing
Edition 2
Series Texts in Computer Science
ISBN 9783030455743
3030455742
9783030455736
3030455734
ISSN 1868-0941
1868-095X
DOI 10.1007/978-3-030-45574-3

Abstract Making use of data is not anymore a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it seems at least in principle possible to solve any problem. However, successful data science projects result from the intelligent application of: human intuition in combination with computational power; sound background knowledge with computer-aided modelling; and critical reflection of the obtained insights and results. Substantially updating the previous edition, then entitled Guide to Intelligent Data Analysis, this core textbook continues to provide a hands-on instructional approach to many data science techniques, and explains how these are used to solve real world problems. The work balances the practical aspects of applying and using data science techniques with the theoretical and algorithmic underpinnings from mathematics and statistics. Major updates on techniques and subject coverage (including deep learning) are included. Topics and features: guides the reader through the process of data science, following the interdependent steps of project understanding, data understanding, data blending and transformation, modeling, as well as deployment and monitoring; includes numerous examples using the open source KNIME Analytics Platform, together with an introductory appendix; provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms; integrates illustrations and case-study-style examples to support pedagogical exposition; supplies further tools and information at an associated website. This practical and systematic textbook/reference is a "need-to-have" tool for graduate and advanced undergraduate students and essential reading for all professionals who face data science problems. Moreover, it is a "need to use, need to keep" resource following one's exploration of the subject.
Author Höppner, Frank
Silipo, Rosaria
Klawonn, Frank
Borgelt, Christian
Berthold, Michael R
Author_xml – sequence: 1
  fullname: Berthold, Michael R
– sequence: 2
  fullname: Borgelt, Christian
– sequence: 3
  fullname: Höppner, Frank
– sequence: 4
  fullname: Klawonn, Frank
– sequence: 5
  fullname: Silipo, Rosaria
BackLink https://cir.nii.ac.jp/crid/1130858596795074717$$DView record in CiNii
ContentType eBook
Book
Copyright Springer Nature Switzerland AG 2020
Copyright_xml – notice: Springer Nature Switzerland AG 2020
DBID I4C
RYH
DEWEY 004
DOI 10.1007/978-3-030-45574-3
DatabaseName Casalini Torrossa eBooks Institutional Catalogue
CiNii Complete
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9783030455743
3030455742
EISSN 1868-095X
Edition 2
2nd ed. 2020
2nd Edition 2020
Editor Höppner, Frank
Silipo, Rosaria
Klawonn, Frank
Borgelt, Christian
Editor_xml – sequence: 1
  fullname: Borgelt, Christian
– sequence: 2
  fullname: Höppner, Frank
– sequence: 3
  fullname: Klawonn, Frank
– sequence: 4
  fullname: Silipo, Rosaria
ExternalDocumentID 9783030455743
159964
EBC6284441
EBC30766836
BD00498359
5352404
GroupedDBID 38.
AABBV
ACGCR
AEHEY
AEJLV
AEJNW
AEKFX
AIYYB
ALMA_UNASSIGNED_HOLDINGS
AVCSZ
AZTDL
BBABE
CYNQG
CZZ
DACMV
ESBCR
I4C
IEZ
OAOFD
OPOMJ
SBO
TPJZQ
Z5O
Z7R
Z7U
Z7W
Z7X
Z7Z
Z81
Z83
Z84
Z85
Z87
Z88
AAJYQ
AATVQ
ABBUY
ABCYT
ACDTA
ACDUY
AHNNE
ATJMZ
RYH
ABZKH
Z7Y
ID FETCH-LOGICAL-a16783-fafa1401cc701d28a73b7adc2c144b59719220e212532eeeddd26097c112374e3
ISBN 9783030455743
3030455742
9783030455736
3030455734
ISSN 1868-0941
IngestDate Fri Nov 08 04:32:11 EST 2024
Tue Jul 29 20:38:53 EDT 2025
Fri May 30 23:00:50 EDT 2025
Fri May 30 21:57:31 EDT 2025
Thu Jun 26 22:08:35 EDT 2025
Tue Sep 09 06:57:10 EDT 2025
IsPeerReviewed false
IsScholarly false
LCCallNum_Ident QA76.9.D343
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-a16783-fafa1401cc701d28a73b7adc2c144b59719220e212532eeeddd26097c112374e3
Notes Includes bibliographical references (p. 409) and index
OCLC 1183957807
PQID EBC30766836
PageCount 427
ParticipantIDs askewsholts_vlebooks_9783030455743
springer_books_10_1007_978_3_030_45574_3
proquest_ebookcentral_EBC6284441
proquest_ebookcentral_EBC30766836
nii_cinii_1130858596795074717
casalini_monographs_5352404
PublicationCentury 2000
PublicationDate 2020
c2020
20200807
2020-08-06
PublicationDateYYYYMMDD 2020-01-01
2020-08-07
2020-08-06
PublicationDate_xml – year: 2020
  text: 2020
PublicationDecade 2020
PublicationPlace Cham
PublicationPlace_xml – name: Netherlands
– name: Cham
PublicationSeriesTitle Texts in Computer Science
PublicationSeriesTitleAlternate Texts in Computer Science (formerly: Graduate Texts Comp. Sc.)
PublicationYear 2020
Publisher Springer Nature
Springer
Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer Nature
– name: Springer
– name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hazzan, Orit
Gries, David
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Gries
  fullname: Gries, David
– sequence: 2
  givenname: Orit
  surname: Hazzan
  fullname: Hazzan, Orit
SSID ssj0002423153
ssj0000615341
Score 2.1823115
Snippet Making use of data is not anymore a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it...
SourceID askewsholts
springer
proquest
nii
casalini
SourceType Aggregation Database
Publisher
SubjectTerms Artificial intelligence
Big Data/Analytics
Computer Science
Data mining
Data Mining and Knowledge Discovery
Data processing Computer science
Machine Learning
Mathematical statistics
Mathematical statistics -- Data processing
Subtitle How to Intelligently Make Use of Real Data
TableOfContents Intro -- Guide to Intelligent Data Science -- Preface -- Contents -- Symbols -- 1 Introduction -- 1.1 Motivation -- 1.1.1 Data and Knowledge -- 1.1.2 Tycho Brahe and Johannes Kepler -- 1.1.3 Intelligent Data Science -- 1.2 The Data Science Process -- 1.3 Methods, Tasks, and Tools -- 1.4 How to Read This Book -- References -- 2 Practical Data Science: An Example -- 2.1 The Setup -- 2.2 Data Understanding and Pattern Finding -- 2.3 Explanation Finding -- 2.4 Predicting the Future -- 2.5 Concluding Remarks -- 3 Project Understanding -- 3.1 Determine the Project Objective -- 3.2 Assess the Situation -- 3.3 Determine Analysis Goals -- 3.4 Further Reading -- References -- 4 Data Understanding -- 4.1 Attribute Understanding -- 4.2 Data Quality -- 4.3 Data Visualization -- 4.4 Correlation Analysis -- 4.5 Outlier Detection -- 4.5.1 Outlier Detection for Single Attributes -- 4.5.2 Outlier Detection for Multidimensional Data -- 4.6 Missing Values -- 4.7 A Checklist for Data Understanding -- 4.8 Data Understanding in Practice -- 4.8.1 Visualizing the Iris Data -- References -- 5 Principles of Modeling -- 5.1 Model Classes -- 5.2 Fitting Criteria and Score Functions -- 5.3 Algorithms for Model Fitting -- 5.3.1 Closed-Form Solutions -- 5.3.2 Gradient Method -- 5.4 Types of Errors -- 5.5 Model Validation -- 5.5.1 Training and Test Data -- 5.5.2 Cross-Validation -- 5.5.3 Bootstrapping -- 5.6 Model Errors and Validation in Practice -- 5.6.1 Scoring Models for Classification -- 5.7 Further Reading -- References -- 6 Data Preparation -- 6.1 Select Data -- 6.1.1 Feature Selection -- 6.2 Clean Data -- 6.2.1 Improve Data Quality -- 6.2.2 Missing Values -- 6.3 Construct Data -- 6.3.1 Provide Operability -- 6.4 Complex Data Types -- 6.5 Data Integration -- 6.5.1 Vertical Data Integration -- 6.5.2 Horizontal Data Integration -- 6.6 Data Preparation in Practice -- 6.6.1 Removing Empty or Almost Empty Attributes and Records in a Data Set -- 6.7 Further Reading -- References -- 7 Finding Patterns -- 7.1 Hierarchical Clustering -- 7.2 Notion of (Dis-)Similarity -- 7.3 Prototype- and Model-Based Clustering -- 7.3.1 Overview -- 7.4 Density-Based Clustering -- 7.4.1 Overview -- 7.5 Self-organizing Maps -- 7.5.1 Overview -- 7.6 Frequent Pattern Mining and Association Rules -- 7.6.1 Overview -- 7.6.2 Construction -- 7.7 Deviation Analysis -- 7.7.1 Overview -- 7.7.2 Construction -- 7.8 Finding Patterns in Practice -- 7.8.1 Hierarchical Clustering -- 7.9 Further Reading -- References -- 8 Finding Explanations -- 8.1 Decision Trees -- 8.1.1 Overview -- 8.2 Bayes Classifiers -- 8.2.1 Overview -- 8.2.2 Construction -- 8.3 Regression -- 8.3.1 Overview -- 8.4 Rule learning -- 8.4.1 Propositional Rules -- 8.4.1.1 Extracting Rules from Decision Trees -- 8.4.1.2 Extracting Propositional Rules -- 8.5 Finding Explanations in Practice -- 8.5.1 Decision Trees -- 8.6 Further Reading -- References -- 9 Finding Predictors -- 9.1 Nearest-Neighbor Predictors -- 9.1.1 Overview -- 9.2 Artificial Neural Networks -- 9.2.1 Overview -- 9.3 Deep Learning -- 9.3.1 Recurrent Neural Networks and Long-Short Term Memory Units -- 9.4 Support Vector Machines -- 9.5 Ensemble Methods -- 9.5.1 Overview -- 9.5.2 Construction -- 9.5.3 Variations and Issues -- 9.5.3.1 Tree Ensembles and Random Forests (Bagging) -- 9.6 Finding Predictors in Practice -- 9.6.1 k Nearest Neighbor (kNN) -- 9.7 Further Reading -- References -- 10 Deployment and Model Management -- 10.1 Model Deployment -- 10.1.1 Interactive Applications -- 10.1.2 Model Scoring as a Service -- 10.1.3 Model Representation Standards -- 10.1.4 Frequent Causes for Deployment Failures -- 10.2 Model Management -- 10.2.1 Model Updating and Retraining -- 10.3 Model Deployment and Management in Practice -- 10.3.1 Deployment to a Dashboard -- References -- A Statistics -- A.1 Terms and Notation -- A.2 Descriptive Statistics -- A.2.1 Tabular Representations -- A.3 Probability Theory -- A.3.1 Probability -- A.3.1.1 Intuitive Notions of Probability -- A.3.1.2 The Formal Definition of Probability -- A.3.2 Basic Methods and Theorems -- A.3.2.1 Combinatorial Methods -- A.3.2.2 Geometric Probabilities -- A.3.2.3 Conditional Probability and Independent Events -- A.3.2.4 Total Probability and Bayes' Rule -- A.3.2.5 Bernoulli's Law of Large Numbers -- A.3.3 Random Variables -- A.3.3.1 Real-Valued Random Variables -- A.3.3.2 Discrete Random Variables -- A.3.3.3 Continuous Random Variables -- A.3.3.4 Random Vectors -- A.4 Inferential Statistics -- A.4.1 Random Samples -- A.4.2 Parameter Estimation -- A.4.2.1 Point Estimation -- A.4.2.2 Point Estimation Examples -- A.4.2.3 Maximum Likelihood Estimation -- A.4.2.4 Maximum Likelihood Estimation Example -- A.4.2.5 Maximum A Posteriori Estimation -- A.4.2.6 Maximum A Posteriori Estimation Example -- A.4.2.7 Interval Estimation -- A.4.2.8 Interval Estimation Examples -- A.4.3 Hypothesis Testing -- A.4.3.1 Error Types and Significance Level -- A.4.3.2 Parameter Test -- A.4.3.3 Parameter Test Example -- A.4.3.4 Power of a Hypothesis Test -- A.4.3.5 Goodness-of-Fit Test -- A.4.3.6 Goodness-of-Fit Test Example -- A.4.3.7 (In)Dependence Test -- B KNIME -- B.1 Installation and Overview -- B.2 Building Workflows -- B.3 Example Workflow -- References -- Index
Title Guide to Intelligent Data Science
URI http://digital.casalini.it/9783030455743
https://cir.nii.ac.jp/crid/1130858596795074717
https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=30766836
https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=6284441
http://link.springer.com/10.1007/978-3-030-45574-3
https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9783030455743
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV3fb9MwELZYeWEv_BZhDBnEA1KVqYmTOOGNjaJpgj2NaW-WnbgiWmmlJkXa_vp9lzhpWkAIXqI2vTrOfY7z3dl3x9i71ISY96z2dYSHPNIm9LXMU98UIjZRIfAjxTt_PU9Ov0VnV_HVpipmE11Sm6P89rdxJf-DKs4BV4qS_Qdk-0ZxAp-BL45AGMcd8tt_dZn612XR1Lwo-4ya9Zg2e467MB2y9L9TubgtmfnN-Ie-tuN168FfUVZhF53monVW5CZtCw0fEft02-o7h4HzD4STHf9A5x_cshtFs0Aat7v-f51FhxsnIOmTaOSLzSuj38h3_IlMDLC4bI_tSZmO2P2P07Mvl72bi-gaJlWKqumu2aZjHPShW2x2-X63rrnP9nV1jfkeN19XRB50pSlmFFxgUZZbdsHOUnbDEC4esRFFjTxm9-ziCXvY1crgbup8ys4byHi95AM4OOmeO8j4Bw7AdiTmN5wA4wCML2ecAGv-9Ixdfp5enJz6rpaFrwPwAeHP9EyTMZvnEsM_TLUURuoiD3OYtAZmHah2OLFgErEILZhLUcDUzGQOQixkZMVzNlosF_YF4zPQ1CKwuSD-Z2yhkwCSKVFzmZk089jbgc7Uz3mz7l6pgdIj4bGDTpUKj0WbH71SlO0nmkQeO4R2VV7SMQDPoeXjDM3HVHAhkB570-ldNa27rcRqenyCt0aSpCLxGP-jTAJWBCLusfcdZqrtY5dFG31VQqG3qumuEi__0qMD9mAz-l-xUb1a20Pwxdq8dmPyDtB0Wsg
linkProvider Library Specific Holdings
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=Guide+to+intelligent+data+science+%3A+how+to+intelligently+make+use+of+real+data&rft.au=Berthold%2C+M.+%28Michael%29&rft.date=2020-01-01&rft.pub=Springer&rft.isbn=9783030455736&rft_id=info:doi/10.1007%2F978-3-030-45574-3&rft.externalDocID=BD00498359
thumbnail_m http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fvle.dmmserver.com%2Fmedia%2F640%2F97830304%2F9783030455743.jpg
thumbnail_s http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fmedia.springernature.com%2Fw306%2Fspringer-static%2Fcover-hires%2Fbook%2F978-3-030-45574-3