Statistics for Data Scientists
| Published in | SAS for R Users, pp. 159-182 |
|---|---|
| Format | Book Chapter |
| Language | English |
| Published | United States: John Wiley & Sons, Incorporated, 2019 |
| ISBN | 1119256410 9781119256410 |
| DOI | 10.1002/9781119256441.ch11 |
| Summary: | Data scientists need to study statistics as well as computer science to be effective in their analytical journey. Two main families of statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). A frequency distribution is an overview of all distinct values of a variable and the number of times each occurs; that is, it shows how frequencies are distributed over values. Unlike descriptive analytics, which describes data from the past, predictive analytics deals with forecasts for the future. An algorithm is a set of rules to be followed in calculations or other problem-solving operations, especially by a computer. SAS offers many data mining algorithms in SAS Enterprise Miner, and R's caret package provides many comparable modeling functions. |
|---|---|
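
Illustrative note: the summary's definitions of a frequency distribution and of descriptive indexes such as the mean and standard deviation can be sketched in a few lines. The chapter itself works in SAS and R; Python is used here only as a neutral illustration, and the sample values and the helper name `frequency_distribution` are hypothetical, not taken from the chapter.

```python
from collections import Counter
from statistics import mean, stdev


def frequency_distribution(values):
    """Map each distinct value to the number of times it occurs."""
    return dict(Counter(values))


# Hypothetical sample data, invented purely for illustration.
sample = [2, 3, 3, 5, 5, 5, 7]

# Frequency distribution: distinct values and their counts.
freq = frequency_distribution(sample)

# Descriptive statistics that summarize the same sample.
sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)

print(freq)
print(sample_mean, sample_sd)
```

The counts in `freq` always sum to the number of observations, which is a quick sanity check on any frequency table.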