Extending attribute-oriented induction algorithm for major values and numeric values

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 27, no. 2, pp. 187-202
Main Author: Hsu, Chung-Chian
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.08.2004
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2004.01.002

Summary: Attribute-oriented induction (AOI) uses concept hierarchies to discover hidden patterns in large amounts of data and presents the resulting concise patterns as a general description of the original data. It is an effective data analysis and data reduction technique. Researchers have recently proposed several extensions of the original method, but problems remain. When an attribute has major values, the traditional approach cannot preserve and present these major patterns. In addition, the construction of concept hierarchies for numeric attributes is sometimes subjective, and generalizing border values near the cutting points of a discretization can easily lead to misinterpretation. This paper proposes an extended AOI, which generalizes the traditional approach by introducing an additional major-values threshold and thereby preserves and presents major values. Moreover, we suggest an alternative for processing numeric attributes: computing and presenting the average and deviation of the aggregated tuples, which avoids both the subjective construction of a numeric concept hierarchy and the generalization of border values. Experiments on a synthetic data set and a real data set show that the proposed methods are feasible and can induce more precise rules from the raw data.
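
The two ideas in the summary can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the `major_threshold` parameter, the one-level dictionary used as a concept hierarchy, and the sample data are all assumptions made for illustration. The first function keeps a categorical value ungeneralized when its share of the tuples reaches the threshold; the second reports the count, mean, and population standard deviation of a numeric attribute per generalized group instead of binning it.

```python
from collections import Counter
from statistics import mean, pstdev


def generalize_with_major_values(values, hierarchy, major_threshold):
    """Climb one level of a concept hierarchy, except for 'major' values.

    A value is treated as major (hypothetical rule) when its share of all
    tuples is at least major_threshold; major values are kept as they are.
    """
    counts = Counter(values)
    total = len(values)
    major = {v for v, c in counts.items() if c / total >= major_threshold}
    # Values missing from the hierarchy fall back to the most general concept.
    return [v if v in major else hierarchy.get(v, "ANY") for v in values]


def summarize_numeric(rows, key_fields, numeric_field):
    """Merge rows sharing a generalized key; report count, mean, deviation."""
    groups = {}
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        groups.setdefault(key, []).append(row[numeric_field])
    return {k: (len(v), mean(v), pstdev(v)) for k, v in groups.items()}


# Illustrative data only (not taken from the paper's experiments).
cities = ["Taipei"] * 6 + ["Tainan", "Kaohsiung", "Tokyo", "Osaka"]
hierarchy = {
    "Taipei": "Taiwan", "Tainan": "Taiwan", "Kaohsiung": "Taiwan",
    "Tokyo": "Japan", "Osaka": "Japan",
}
ages = [20, 22, 24, 26, 28, 30, 40, 50, 31, 33]

regions = generalize_with_major_values(cities, hierarchy, major_threshold=0.5)
rows = [{"region": r, "age": a} for r, a in zip(regions, ages)]
summary = summarize_numeric(rows, ["region"], "age")
```

In this sketch "Taipei" accounts for 60% of the tuples, so it survives as its own group while the remaining cities are generalized to their countries, and each group's ages are described by a mean and deviation rather than by an age-range label.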