Features of Big Data and sparsest solution in high confidence set

Bibliographic Details
Published in: Past, Present, and Future of Statistical Science, pp. 531-548
Format: Book Chapter
Language: English
Published: Chapman and Hall/CRC, 2014
DOI: 10.1201/b16720-50

Summary: This chapter summarizes some of the unique features of Big Data analysis. These features are shared neither by low-dimensional data nor by small samples. Big Data pose new computational challenges and hold great promise for understanding population heterogeneity, as in personalized medicine or services. High dimensionality introduces spurious correlations, incidental endogeneity, noise accumulation, and measurement error. These features are distinctive, and statistical procedures should be designed with them in mind. To illustrate, a method called the sparsest solution in a high-confidence set is introduced, which is generally applicable to high-dimensional statistical inference. This method, whose properties are briefly examined, is natural: the information about the parameters contained in the data is summarized by high-confidence sets, and taking the sparsest solution is a way to deal with the noise accumulation issue.
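
Illustration (not from the chapter): the claim that high dimensionality introduces spurious correlations can be checked with a small simulation. The sketch below is a minimal example under assumed settings: the function names, the sample size of 60, and the Gaussian design are hypothetical choices made here for illustration. It draws a response and p predictors that are all mutually independent, then reports the largest absolute sample correlation between the response and any predictor. Because every true correlation is zero, anything the function returns is spurious.

```python
import math
import random


def sample_correlation(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n, p, seed=0):
    """Largest |sample correlation| between a response and p predictors
    that are generated independently of it (hypothetical helper)."""
    rng = random.Random(seed)
    # Response drawn first, so runs with the same seed share the same y.
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        # Each predictor is pure noise, independent of y by construction.
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, abs(sample_correlation(x, y)))
    return best


if __name__ == "__main__":
    # Assumed settings: n = 60 observations, increasing dimension p.
    for p in (10, 100, 1000):
        print(p, round(max_spurious_correlation(n=60, p=p), 3))
```

With a fixed seed, the predictors for a smaller p are a prefix of those for a larger p, so the reported maximum can only grow as p increases. That is the point of the exercise: with enough unrelated variables, some will look strongly associated with the response purely by chance.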