An introduction to Random Forests for modeling tasks - theory and practice in python with public health case studies

Bibliographic Details
Published in	European Journal of Public Health, Vol. 31, Supplement 3
Main Authors	Assouline, D, Le Pogam, M-A, Pittet, V
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 20.10.2021
Subjects	Algorithms Classification Clustering Data analysis Decision trees Grants Handles Languages Mathematical models Nominal measurement Outliers (statistics) Public health Statistical analysis Statistical models
ISSN	1101-1262, 1464-360X
DOI	10.1093/eurpub/ckab164.570

Summary:	Random Forests (RFs) are a powerful, versatile, and easy-to-use method, which makes them an excellent benchmark for many types of analysis, notably in the public health domain. RFs offer outstanding performance with minimal effort from the user (given their low sensitivity to their hyperparameters) and can be used for many different modeling tasks (classification, regression, clustering, outlier detection). In practice, RFs are readily available through libraries in Python and R, allowing users to benefit from their capabilities with minimal coding knowledge. Beyond raw predictive performance, RFs have practical advantages over many classical statistical models: they do not require any normalization of the data; they handle very large datasets (in number of observations or variables) and all kinds of data types (e.g., binary, categorical, continuous); and they handle outliers and are insensitive to multicollinearity among the input variables. Their main limitation is that the final model they build is less straightforward to interpret. However, they offer additional tools, such as variable importance and proximity metrics, that improve the understanding of their results and can provide insights that traditional models cannot. Finally, because RFs are built from decision trees, they have additional capabilities, notably the possibility to extract prediction intervals and to handle imbalanced data efficiently with a variant of the algorithm called Balanced Random Forests.
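Several of the practical points in the summary (no normalization, out-of-bag evaluation, variable importance, proximity) can be illustrated in a few lines. The following is a minimal sketch, assuming scikit-learn and NumPy are installed and using synthetic data from `make_classification` in place of a real public health dataset. The features are deliberately put on very different scales without any normalization, since tree splits depend only on the ordering of values. The helper `rf_proximity` is hypothetical, written here for illustration: it computes the simple proximity (fraction of trees in which two samples fall in the same leaf) over all trees, whereas some implementations restrict it to out-of-bag samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real dataset (assumption: no real data is used here).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=1, random_state=0)
# Rescale features to very different ranges; no normalization is applied afterwards.
X = X * np.array([1.0, 10.0, 100.0, 0.01, 1000.0, 1.0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# oob_score=True evaluates each tree on the samples left out of its bootstrap.
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

print(f"OOB accuracy:  {rf.oob_score_:.3f}")
print(f"Test accuracy: {rf.score(X_test, y_test):.3f}")

# Impurity-based importance comes from the training fit; permutation
# importance measures the accuracy drop on held-out data when a feature is shuffled.
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
for i in np.argsort(perm.importances_mean)[::-1]:
    print(f"feature {i}: impurity={rf.feature_importances_[i]:.3f} "
          f"permutation={perm.importances_mean[i]:.3f}")


def rf_proximity(forest, X):
    """Hypothetical helper: proximity(i, j) = share of trees where i and j land in the same leaf."""
    leaves = forest.apply(X)  # leaf index per sample and tree, shape (n_samples, n_trees)
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]  # True where both samples share leaf t
    return prox / n_trees


P = rf_proximity(rf, X_test)
print("Proximity matrix shape:", P.shape)
```

The proximity matrix can feed clustering or outlier scoring (samples with low proximity to the rest of their class are candidate outliers). For the imbalanced-data case, the imbalanced-learn package provides `BalancedRandomForestClassifier`, which undersamples the majority class in each bootstrap sample; scikit-learn's `class_weight="balanced_subsample"` is a different, reweighting-based approach and should not be confused with Balanced Random Forests.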