The Application of high-dimensional Data Classification by Random Forest based on Hadoop Cloud Computing Platform
| Published in | Chemical engineering transactions Vol. 51 |
|---|---|
| Main Author | C. Li |
| Format | Journal Article |
| Language | English |
| Published | AIDIC Servizi S.r.l, 01.01.2016 |
| Online Access | Get full text |
| ISSN | 2283-9216 |
| DOI | 10.3303/CET1651065 |
| Abstract | High-dimensional data poses a number of challenges, such as sparse features, redundant features and high computational complexity. Random forest is an ensemble classification method composed of numerous weak classifiers. It can overcome a number of practical problems, such as small sample size, overfitting, nonlinearity, the curse of dimensionality and local minima, and it shows good prospects for high-dimensional data classification. To improve classification accuracy and computational efficiency, a novel classification method based on the Hadoop cloud computing platform is proposed. First, the Bagging algorithm is applied to the data set to obtain different data subsets. Second, the random forest is built by training decision trees under the MapReduce architecture. Finally, the data sets are classified by the random forest. Three high-dimensional data sets are used in the experiments. The experimental results show that the classification accuracy of the proposed method is higher than that of a stand-alone random forest, and that computational efficiency is improved significantly. |
| Author | C. Li | 
    
| Open Access | Yes |
| Peer Reviewed | Yes |
| Scholarly | Yes |
| Open Access Link | https://doaj.org/article/65498689ce8a40bf8c05f22e25095ebf |
    
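
The three steps named in the abstract (Bagging to obtain data subsets, training one tree per subset in a map phase, and classifying by the resulting forest) can be illustrated with a minimal single-machine sketch. This is not the paper's implementation: the record gives no code, so every name below (`bootstrap_sample`, `train_stump`, `train_forest`, `classify`) is hypothetical, Python's built-in `map()` merely stands in for Hadoop mappers, a one-split decision stump stands in for a full decision tree, and majority voting is assumed as the way the forest combines tree outputs.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Bagging step: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(task):
    """Map step (assumed): fit a one-split tree on one bootstrap subset.

    A decision stump is used instead of a full decision tree to keep the
    sketch short. Returns (feature, threshold, left_label, right_label).
    """
    subset, feature_ids = task
    best = None
    for f in feature_ids:
        values = sorted({x[f] for x, _ in subset})
        # Candidate thresholds are midpoints between consecutive values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for x, y in subset if x[f] <= threshold]
            right = [y for x, y in subset if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:
        # No split possible: always predict the subset's majority label.
        label = Counter(y for _, y in subset).most_common(1)[0][0]
        return (feature_ids[0], 0.0, label, label)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left_label, right_label = stump
    return left_label if x[f] <= threshold else right_label


def train_forest(rows, n_trees, n_features_per_tree, seed=0):
    """Build one (bootstrap subset, random feature subset) task per tree."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    tasks = [
        (bootstrap_sample(rows, rng),
         rng.sample(range(n_features), n_features_per_tree))
        for _ in range(n_trees)
    ]
    # Built-in map() stands in for distributing tasks to Hadoop mappers.
    return list(map(train_stump, tasks))


def classify(forest, x):
    """Reduce step (assumed): majority vote over all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


# Toy data: 6 features, class 0 near 0.0-1.0, class 1 near 5.0-6.0.
rows = [([0.1 * i + 0.01 * j for j in range(6)], 0) for i in range(10)]
rows += [([5 + 0.1 * i + 0.01 * j for j in range(6)], 1) for i in range(10)]

forest = train_forest(rows, n_trees=15, n_features_per_tree=3)
print(classify(forest, [0.5] * 6), classify(forest, [5.5] * 6))
```

Because each training task depends only on its own bootstrap subset, the tasks are independent of one another, which is the property that would let a MapReduce job train the trees in parallel.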