Features of Big Data and sparsest solution in high confidence set
| Published in | Past, Present, and Future of Statistical Science, pp. 531-548 |
|---|---|
| Format | Book Chapter |
| Language | English |
| Published | Chapman and Hall/CRC, 2014 |
| DOI | 10.1201/b16720-50 |
Summary: This chapter summarizes some of the unique features of Big Data analysis, features shared neither by low-dimensional data nor by small samples. Big Data pose new computational challenges and hold great promise for understanding population heterogeneity, as in personalized medicine or services. High dimensionality introduces spurious correlations, incidental endogeneity, noise accumulation, and measurement error. These features are distinctive, and statistical procedures should be designed with them in mind. To illustrate, the chapter introduces a method called the sparsest solution in a high-confidence set, which is generally applicable to high-dimensional statistical inference, and briefly examines its properties. The method is natural: the information about the parameters contained in the data is summarized by high-confidence sets, and choosing the sparsest solution within such a set is a way to deal with the noise accumulation issue.
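The summary states the idea only in words: summarize the data by a set that contains the true parameter with high probability, then pick the sparsest member of that set. The following is a minimal Python sketch of one well-known instance, assuming a linear model y = Xβ + ε, taking the Dantzig selector's constraint set {β : ||X^T(y - Xβ)||_∞ ≤ λ} as the high-confidence set, and using the L1 norm as a convex stand-in for sparsity. The choice of set, the tuning value of λ, the simulated data, and the helper name `dantzig_selector` are assumptions made for illustration, not necessarily the chapter's own formulation.

```python
import numpy as np
from scipy.optimize import linprog


def dantzig_selector(X, y, lam):
    """Hypothetical helper: L1-minimal beta in {beta : ||X^T (y - X beta)||_inf <= lam}.

    Solved as a linear program over z = [beta, u], with |beta_j| <= u_j,
    so that minimizing sum(u) minimizes ||beta||_1.
    """
    n, p = X.shape
    G = X.T @ X   # p x p Gram matrix
    b = X.T @ y   # inner products of each predictor with the response
    I = np.eye(p)
    Z = np.zeros((p, p))
    c = np.concatenate([np.zeros(p), np.ones(p)])  # objective: sum(u)
    A_ub = np.vstack([
        np.hstack([I, -I]),    #  beta - u <= 0
        np.hstack([-I, -I]),   # -beta - u <= 0
        np.hstack([-G, Z]),    #  b - G beta <= lam
        np.hstack([G, Z]),     #  G beta - b <= lam
    ])
    b_ub = np.concatenate([np.zeros(2 * p), lam - b, lam + b])
    bounds = [(None, None)] * p + [(0, None)] * p  # beta free, u >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Simulated example with more predictors than observations (p > n).
rng = np.random.default_rng(0)
n, p, sigma = 50, 100, 0.5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + sigma * rng.standard_normal(n)

# Assumed, deliberately generous tuning so that beta_true is likely inside the set.
lam = 1.5 * sigma * np.sqrt(2 * n * np.log(p))
beta_hat = dantzig_selector(X, y, lam)
print("selected predictors:", np.flatnonzero(np.abs(beta_hat) > 0.1))
```

Whenever the noise is small enough that the true β lies inside the set, the returned estimate has an L1 norm no larger than that of the true β. A larger λ enlarges the set and yields a sparser but more heavily shrunk solution, which is the trade-off the tuning parameter controls.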