The Application of high-dimensional Data Classification by Random Forest based on Hadoop Cloud Computing Platform

Bibliographic Details
Published in Chemical engineering transactions Vol. 51
Main Author C. Li
Format Journal Article
Language English
Published AIDIC Servizi S.r.l 01.01.2016
ISSN 2283-9216
DOI 10.3303/CET1651065

Abstract High-dimensional data involves a number of uncertain factors, such as sparse features, repeated features and high computational complexity. The random forest algorithm is an ensemble classification method composed of numerous weak classifiers. It can overcome a number of practical problems, such as small sample size, overfitting, nonlinearity, the curse of dimensionality and local minima, and it has good application prospects in high-dimensional data classification. To improve classification accuracy and computational efficiency, a novel classification method based on the Hadoop cloud computing platform is proposed. First, the Bagging algorithm is applied to the data set to obtain different data subsets. Second, the random forest is built by training the decision trees under the MapReduce architecture. Finally, the data sets are classified by the random forest. In the experiments, three high-dimensional data sets are used as test subjects. The experimental results show that the classification accuracy of the proposed method is higher than that of a stand-alone random forest, and that computational efficiency is improved significantly.
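The three steps named in the abstract (Bagging, tree training in a map phase, classification by the forest) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's implementation and not Hadoop code: the function names (`bootstrap`, `train_stump`, `map_train`, `reduce_vote`, `make_toy_rows`) are hypothetical, one-split decision stumps stand in for full decision trees, the toy data is invented, and a plain Python list comprehension plays the role of the parallel map tasks.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def bootstrap(rows, rng):
    """Bagging step: draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]


def train_stump(rows, rng, n_features):
    """Fit a one-split stump on a random feature subset (stand-in for a tree)."""
    dims = len(rows[0][0])
    best = None
    for f in rng.sample(range(dims), min(n_features, dims)):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not right:
                continue  # this threshold does not split the sample
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every sampled feature is constant in this sample
        label = majority([y for _, y in rows])
        return (0, float("inf"), label, label)
    return best[1:]  # (feature index, threshold, left label, right label)


def map_train(seed, rows, n_features):
    """Assumed 'map' task: one bootstrap sample in, one trained learner out."""
    rng = random.Random(seed)
    return train_stump(bootstrap(rows, rng), rng, n_features)


def reduce_vote(forest, x):
    """Assumed 'reduce' step: majority vote of all learners for one sample."""
    return majority([l if x[f] <= t else r for f, t, l, r in forest])


def make_toy_rows(dims=4):
    """Invented toy data: class 0 lies near 0, class 1 lies near 9."""
    low = [([i / 10 + j / 100 for j in range(dims)], 0) for i in range(10)]
    high = [([9 + i / 10 + j / 100 for j in range(dims)], 1) for i in range(10)]
    return low + high


rows = make_toy_rows()
# On a cluster each call would run as a separate map task; here they run in turn.
forest = [map_train(seed, rows, n_features=2) for seed in range(25)]
print(reduce_vote(forest, [-1.0] * 4), reduce_vote(forest, [20.0] * 4))
```

Seeding each task separately keeps every learner reproducible regardless of which worker runs it, which is one plausible way to make a distributed forest deterministic; the record itself does not say how the paper handles this.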
Open Access Yes
Peer Reviewed Yes
Open Access Link https://doaj.org/article/65498689ce8a40bf8c05f22e25095ebf