Guide to Intelligent Data Science: How to Intelligently Make Use of Real Data
Main Authors | Berthold, Michael R; Borgelt, Christian; Höppner, Frank; Klawonn, Frank; Silipo, Rosaria |
---|---|
Format | eBook Book |
Language | English |
Published | Cham: Springer Nature, 2020 (Springer; Springer International Publishing AG; Springer International Publishing) |
Edition | 2 |
Series | Texts in Computer Science |
Subjects | |
ISBN | 9783030455743 3030455742 9783030455736 3030455734 |
ISSN | 1868-0941 1868-095X |
DOI | 10.1007/978-3-030-45574-3 |
Abstract | Making use of data is not anymore a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it seems at least in principle possible to solve any problem. However, successful data science projects result from the intelligent application of: human intuition in combination with computational power; sound background knowledge with computer-aided modelling; and critical reflection of the obtained insights and results. Substantially updating the previous edition, then entitled Guide to Intelligent Data Analysis, this core textbook continues to provide a hands-on instructional approach to many data science techniques, and explains how these are used to solve real world problems. The work balances the practical aspects of applying and using data science techniques with the theoretical and algorithmic underpinnings from mathematics and statistics. Major updates on techniques and subject coverage (including deep learning) are included. Topics and features: guides the reader through the process of data science, following the interdependent steps of project understanding, data understanding, data blending and transformation, modeling, as well as deployment and monitoring; includes numerous examples using the open source KNIME Analytics Platform, together with an introductory appendix; provides a review of the basics of classical statistics that support and justify many data analysis methods, and a glossary of statistical terms; integrates illustrations and case-study-style examples to support pedagogical exposition; supplies further tools and information at an associated website. This practical and systematic textbook/reference is a "need-to-have" tool for graduate and advanced undergraduate students and essential reading for all professionals who face data science problems. Moreover, it is a "need to use, need to keep" resource following one's exploration of the subject. |
---|---|
Author | Höppner, Frank Silipo, Rosaria Klawonn, Frank Borgelt, Christian Berthold, Michael R |
Author_xml | – sequence: 1 fullname: Berthold, Michael R – sequence: 2 fullname: Borgelt, Christian – sequence: 3 fullname: Höppner, Frank – sequence: 4 fullname: Klawonn, Frank – sequence: 5 fullname: Silipo, Rosaria |
BackLink | https://cir.nii.ac.jp/crid/1130858596795074717$$DView record in CiNii |
ContentType | eBook Book |
Copyright | Springer Nature Switzerland AG 2020 |
Copyright_xml | – notice: Springer Nature Switzerland AG 2020 |
DBID | I4C RYH |
DEWEY | 004 |
DOI | 10.1007/978-3-030-45574-3 |
DatabaseName | Casalini Torrossa eBooks Institutional Catalogue CiNii Complete |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9783030455743 3030455742 |
EISSN | 1868-095X |
Edition | 2 2nd ed. 2020 2nd Edition 2020 |
Editor | Höppner, Frank Silipo, Rosaria Klawonn, Frank Borgelt, Christian |
Editor_xml | – sequence: 1 fullname: Borgelt, Christian – sequence: 2 fullname: Höppner, Frank – sequence: 3 fullname: Klawonn, Frank – sequence: 4 fullname: Silipo, Rosaria |
ExternalDocumentID | 9783030455743 159964 EBC6284441 EBC30766836 BD00498359 5352404 |
GroupedDBID | 38. AABBV ACGCR AEHEY AEJLV AEJNW AEKFX AIYYB ALMA_UNASSIGNED_HOLDINGS AVCSZ AZTDL BBABE CYNQG CZZ DACMV ESBCR I4C IEZ OAOFD OPOMJ SBO TPJZQ Z5O Z7R Z7U Z7W Z7X Z7Z Z81 Z83 Z84 Z85 Z87 Z88 AAJYQ AATVQ ABBUY ABCYT ACDTA ACDUY AHNNE ATJMZ RYH ABZKH Z7Y |
ID | FETCH-LOGICAL-a16783-fafa1401cc701d28a73b7adc2c144b59719220e212532eeeddd26097c112374e3 |
ISBN | 9783030455743 3030455742 9783030455736 3030455734 |
ISSN | 1868-0941 |
IngestDate | Fri Nov 08 04:32:11 EST 2024 Tue Jul 29 20:38:53 EDT 2025 Fri May 30 23:00:50 EDT 2025 Fri May 30 21:57:31 EDT 2025 Thu Jun 26 22:08:35 EDT 2025 Tue Sep 09 06:57:10 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
LCCallNum_Ident | QA76.9.D343 |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-a16783-fafa1401cc701d28a73b7adc2c144b59719220e212532eeeddd26097c112374e3 |
Notes | Includes bibliographical references (p. 409) and index |
OCLC | 1183957807 |
PQID | EBC30766836 |
PageCount | 427 |
ParticipantIDs | askewsholts_vlebooks_9783030455743 springer_books_10_1007_978_3_030_45574_3 proquest_ebookcentral_EBC6284441 proquest_ebookcentral_EBC30766836 nii_cinii_1130858596795074717 casalini_monographs_5352404 |
PublicationCentury | 2000 |
PublicationDate | 2020 c2020 20200807 2020-08-06 |
PublicationDateYYYYMMDD | 2020-01-01 2020-08-07 2020-08-06 |
PublicationDate_xml | – year: 2020 text: 2020 |
PublicationDecade | 2020 |
PublicationPlace | Cham |
PublicationPlace_xml | – name: Netherlands – name: Cham |
PublicationSeriesTitle | Texts in Computer Science |
PublicationSeriesTitleAlternate | Texts in Computer Science (formerly: Graduate Texts Comp. Sc.) |
PublicationYear | 2020 |
Publisher | Springer Nature Springer Springer International Publishing AG Springer International Publishing |
Publisher_xml | – name: Springer Nature – name: Springer – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Hazzan, Orit Gries, David |
RelatedPersons_xml | – sequence: 1 givenname: David surname: Gries fullname: Gries, David – sequence: 2 givenname: Orit surname: Hazzan fullname: Hazzan, Orit |
SSID | ssj0002423153 ssj0000615341 |
Score | 2.1823115 |
Snippet | Making use of data is not anymore a niche project but central to almost every project. With access to massive compute resources and vast amounts of data, it... |
SourceID | askewsholts springer proquest nii casalini |
SourceType | Aggregation Database Publisher |
SubjectTerms | Artificial intelligence Big Data/Analytics Computer Science Data mining Data Mining and Knowledge Discovery Data processing Computer science Machine Learning Mathematical statistics Mathematical statistics -- Data processing |
Subtitle | How to Intelligently Make Use of Real Data |
TableOfContents | Intro -- Guide to Intelligent Data Science -- Preface -- Contents -- Symbols -- 1 Introduction -- 1.1 Motivation -- 1.1.1 Data and Knowledge -- 1.1.2 Tycho Brahe and Johannes Kepler -- 1.1.3 Intelligent Data Science -- 1.2 The Data Science Process -- 1.3 Methods, Tasks, and Tools -- 1.4 How to Read This Book -- References -- 2 Practical Data Science: An Example -- 2.1 The Setup -- 2.2 Data Understanding and Pattern Finding -- 2.3 Explanation Finding -- 2.4 Predicting the Future -- 2.5 Concluding Remarks -- 3 Project Understanding -- 3.1 Determine the Project Objective -- 3.2 Assess the Situation -- 3.3 Determine Analysis Goals -- 3.4 Further Reading -- References -- 4 Data Understanding -- 4.1 Attribute Understanding -- 4.2 Data Quality -- 4.3 Data Visualization -- 4.4 Correlation Analysis -- 4.5 Outlier Detection -- 4.5.1 Outlier Detection for Single Attributes -- 4.5.2 Outlier Detection for Multidimensional Data -- 4.6 Missing Values -- 4.7 A Checklist for Data Understanding -- 4.8 Data Understanding in Practice -- 4.8.1 Visualizing the Iris Data -- References -- 5 Principles of Modeling -- 5.1 Model Classes -- 5.2 Fitting Criteria and Score Functions -- 5.3 Algorithms for Model Fitting -- 5.3.1 Closed-Form Solutions -- 5.3.2 Gradient Method -- 5.4 Types of Errors -- 5.5 Model Validation -- 5.5.1 Training and Test Data -- 5.5.2 Cross-Validation -- 5.5.3 Bootstrapping -- 5.6 Model Errors and Validation in Practice -- 5.6.1 Scoring Models for Classification -- 5.7 Further Reading -- References -- 6 Data Preparation -- 6.1 Select Data -- 6.1.1 Feature Selection -- 6.2 Clean Data -- 6.2.1 Improve Data Quality -- 6.2.2 Missing Values -- 6.3 Construct Data -- 6.3.1 Provide Operability -- 6.4 Complex Data Types -- 6.5 Data Integration -- 6.5.1 Vertical Data Integration -- 6.5.2 Horizontal Data Integration -- 6.6 Data Preparation in Practice -- 6.6.1 Removing Empty or Almost Empty Attributes and Records in a Data Set -- 6.7 Further Reading -- References -- 7 Finding Patterns -- 7.1 Hierarchical Clustering -- 7.2 Notion of (Dis-)Similarity -- 7.3 Prototype- and Model-Based Clustering -- 7.3.1 Overview -- 7.4 Density-Based Clustering -- 7.4.1 Overview -- 7.5 Self-organizing Maps -- 7.5.1 Overview -- 7.6 Frequent Pattern Mining and Association Rules -- 7.6.1 Overview -- 7.6.2 Construction -- 7.7 Deviation Analysis -- 7.7.1 Overview -- 7.7.2 Construction -- 7.8 Finding Patterns in Practice -- 7.8.1 Hierarchical Clustering -- 7.9 Further Reading -- References -- 8 Finding Explanations -- 8.1 Decision Trees -- 8.1.1 Overview -- 8.2 Bayes Classifiers -- 8.2.1 Overview -- 8.2.2 Construction -- 8.3 Regression -- 8.3.1 Overview -- 8.4 Rule learning -- 8.4.1 Propositional Rules -- 8.4.1.1 Extracting Rules from Decision Trees -- 8.4.1.2 Extracting Propositional Rules -- 8.5 Finding Explanations in Practice -- 8.5.1 Decision Trees -- 8.6 Further Reading -- References -- 9 Finding Predictors -- 9.1 Nearest-Neighbor Predictors -- 9.1.1 Overview -- 9.2 Artificial Neural Networks -- 9.2.1 Overview -- 9.3 Deep Learning -- 9.3.1 Recurrent Neural Networks and Long-Short Term Memory Units -- 9.4 Support Vector Machines -- 9.5 Ensemble Methods -- 9.5.1 Overview -- 9.5.2 Construction -- 9.5.3 Variations and Issues -- 9.5.3.1 Tree Ensembles and Random Forests (Bagging) -- 9.6 Finding Predictors in Practice -- 9.6.1 k Nearest Neighbor (kNN) -- 9.7 Further Reading -- References -- 10 Deployment and Model Management -- 10.1 Model Deployment -- 10.1.1 Interactive Applications -- 10.1.2 Model Scoring as a Service -- 10.1.3 Model Representation Standards -- 10.1.4 Frequent Causes for Deployment Failures -- 10.2 Model Management -- 10.2.1 Model Updating and Retraining -- 10.3 Model Deployment and Management in Practice -- 10.3.1 Deployment to a Dashboard -- References -- A Statistics -- A.1 Terms and Notation -- A.2 Descriptive Statistics -- A.2.1 Tabular Representations -- A.3 Probability Theory -- A.3.1 Probability -- A.3.1.1 Intuitive Notions of Probability -- A.3.1.2 The Formal Definition of Probability -- A.3.2 Basic Methods and Theorems -- A.3.2.1 Combinatorial Methods -- A.3.2.2 Geometric Probabilities -- A.3.2.3 Conditional Probability and Independent Events -- A.3.2.4 Total Probability and Bayes' Rule -- A.3.2.5 Bernoulli's Law of Large Numbers -- A.3.3 Random Variables -- A.3.3.1 Real-Valued Random Variables -- A.3.3.2 Discrete Random Variables -- A.3.3.3 Continuous Random Variables -- A.3.3.4 Random Vectors -- A.4 Inferential Statistics -- A.4.1 Random Samples -- A.4.2 Parameter Estimation -- A.4.2.1 Point Estimation -- A.4.2.2 Point Estimation Examples -- A.4.2.3 Maximum Likelihood Estimation -- A.4.2.4 Maximum Likelihood Estimation Example -- A.4.2.5 Maximum A Posteriori Estimation -- A.4.2.6 Maximum A Posteriori Estimation Example -- A.4.2.7 Interval Estimation -- A.4.2.8 Interval Estimation Examples -- A.4.3 Hypothesis Testing -- A.4.3.1 Error Types and Significance Level -- A.4.3.2 Parameter Test -- A.4.3.3 Parameter Test Example -- A.4.3.4 Power of a Hypothesis Test -- A.4.3.5 Goodness-of-Fit Test -- A.4.3.6 Goodness-of-Fit Test Example -- A.4.3.7 (In)Dependence Test -- B KNIME -- B.1 Installation and Overview -- B.2 Building Workflows -- B.3 Example Workflow -- References -- Index |
Title | Guide to Intelligent Data Science |
URI | http://digital.casalini.it/9783030455743 https://cir.nii.ac.jp/crid/1130858596795074717 https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=30766836 https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=6284441 http://link.springer.com/10.1007/978-3-030-45574-3 https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9783030455743 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Library Specific Holdings |