A Statistical Method for Determining Importance of Variables in an Information System

Bibliographic Details
Published in	Lecture Notes in Computer Science, pp. 557-566
Main Authors	Rudnicki, Witold R.; Kierczak, Marcin; Koronacki, Jacek; Komorowski, Jan
Format	Book Chapter; Conference Proceeding
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg, 2006
Series	Lecture Notes in Computer Science
Subjects	Apparent Importance, Applied sciences, Artificial intelligence, Attribute Importance, Bootstrap Sample, Computer science; control theory; systems, Decision Attribute, Exact sciences and technology, Learning and adaptive systems, Random Forest, Rough set theory, Randomization, Statistical analysis, Probabilistic approach, Information system, Hierarchical classification, Iterative method, Supervised classification, Aggregate model
ISBN	3540476938 9783540476931 3540498427 9783540498421
ISSN	0302-9743 1611-3349
DOI	10.1007/11908029_58

Summary:	A new method for estimation of attributes’ importance for supervised classification, based on the random forest approach, is presented. Essentially, an iterative scheme is applied, with each step consisting of several runs of the random forest program. Each run is performed on a suitably modified data set: values of each attribute found unimportant at earlier steps are randomly permuted between objects. At each step, apparent importance of an attribute is calculated and the attribute is declared unimportant if its importance is not uniformly better than that of the attributes earlier found unimportant. The procedure is repeated until only attributes scoring better than the randomized ones are retained. Statistical significance of the results so obtained is verified. This method has been applied to 12 data sets of biological origin. The method was shown to be more reliable than that based on standard application of a random forest to assess attributes’ importance.
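
The iterative scheme described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `select_attributes`, `importance_fn`, `runs` and `seed` are hypothetical. The scorer `abs_correlation_importance` is a simple stand-in for a random-forest importance measure, used only to keep the sketch self-contained. Two details that the summary does not specify are assumed here: the set of unimportant attributes is seeded with the single weakest attribute on the unmodified data, and "uniformly better" is read as "strictly higher than every permuted attribute in every run". The statistical significance check mentioned in the summary is omitted.

```python
import random


def abs_correlation_importance(rows, labels):
    """Stand-in scorer (NOT a random forest): |Pearson r| of each column vs. the label."""
    n = len(rows)
    mean_y = sum(labels) / n
    var_y = sum((y - mean_y) ** 2 for y in labels)
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean_x = sum(col) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, labels))
        var_x = sum((x - mean_x) ** 2 for x in col)
        # Constant columns carry no information and score zero.
        scores.append(abs(cov) / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0)
    return scores


def select_attributes(rows, labels, importance_fn, runs=5, seed=0):
    """Return indices of attributes that outscore the permuted ones in every run."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])

    # Assumption: start with the weakest attribute on the unmodified data.
    first = importance_fn(rows, labels)
    unimportant = {min(range(n_attrs), key=first.__getitem__)}

    while True:
        run_scores = []
        for _ in range(runs):
            # Copy the data and permute each unimportant attribute between objects.
            data = [list(r) for r in rows]
            for j in unimportant:
                col = [r[j] for r in data]
                rng.shuffle(col)
                for r, v in zip(data, col):
                    r[j] = v
            run_scores.append(importance_fn(data, labels))

        # Assumption: "uniformly better" means beating the best permuted
        # attribute in every single run.
        newly = {
            j
            for j in range(n_attrs)
            if j not in unimportant
            and not all(s[j] > max(s[k] for k in unimportant) for s in run_scores)
        }
        if not newly:
            return sorted(set(range(n_attrs)) - unimportant)
        unimportant |= newly
```

Here `rows` is a list of equal-length numeric attribute vectors and `labels` is a list of numeric class labels. Any function with the same signature as `abs_correlation_importance`, for instance a wrapper around a random forest's importance scores, could be passed as `importance_fn`. The loop always terminates because the unimportant set only grows.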