Statistics for Data Scientists

Bibliographic Details
Published in: SAS for R Users, pp. 159-182
Main Author: Ohri, Ajay
Format: Book Chapter
Language: English
Published: United States: John Wiley & Sons, Incorporated, 2019
ISBN: 1119256410, 9781119256410
DOI: 10.1002/9781119256441.ch11

Summary: Data scientists need to study statistics as well as computer science to be effective in their analytical journey. Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g. observational errors, sampling variation). A frequency distribution is an overview of all distinct values of a variable and the number of times each occurs; that is, it shows how frequencies are distributed over values. Unlike descriptive analytics, which describes data from the past, predictive analytics deals with forecasts for the future. An algorithm is a set of rules to be followed in calculations or other problem-solving operations, especially by a computer. SAS provides many data mining algorithms in SAS Enterprise Miner, and R has the caret package, which offers many such modeling functions.
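
The descriptive measures named in the summary (mean, standard deviation, frequency distribution) can be illustrated with a minimal sketch. It is written in Python using only the standard library, not in SAS or R as the chapter itself uses, and the sample values are hypothetical, made up for illustration.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical sample (made up for illustration): purchases per customer
sample = [2, 3, 3, 5, 2, 3, 4, 5, 3, 2]

# Descriptive statistics: summarize the sample with two indexes
sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)

# Frequency distribution: each distinct value and the number of times it occurs
frequency = dict(sorted(Counter(sample).items()))

print(f"mean = {sample_mean:.2f}, sd = {sample_sd:.2f}")
for value, count in frequency.items():
    print(f"{value}: {count}")
```

The counts in the frequency distribution always add up to the sample size, which is a quick sanity check when building such a table by hand.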