Random forest regressor applied in prediction of percentages of calibers in mango production

Bibliographic Details
Published in: Information Processing in Agriculture, Vol. 12, no. 3, pp. 370-383
Main Authors: Ramos Collin, Bernard Roger; de Lima Alves Xavier, Danilo; Amaral, Thiago Magalhães; Castro Silva, Ana Cristina G.; dos Santos Costa, Daniel; Amaral, Fernanda Magalhães; Oliva, Jefferson Tales
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.09.2025
ISSN: 2214-3173
DOI: 10.1016/j.inpa.2024.12.002

Summary: The importance of identifying the caliber in advance lies in knowing the exact quantity of mangoes, by weight, that a given crop season (the complete mango cycle from growth to fruit harvest) will provide. This study uses the Random Forest method to predict the percentage distribution of the calibers of four mango varieties from Brazil’s largest exporter and producer. The proposed approach was conducted in the following steps: data collection; data preprocessing; predictive model building; and model evaluation. The data correspond to three crop seasons: 2019, 2020, and 2021. Each row corresponds to a plot with the percentage of a given caliber at the end of a crop season. The dataset has 5503 rows, with 37.33 %, 31.47 %, 22.76 %, and 8.44 % corresponding to the Keitt, Tommy Atkins, Kent, and Palmer varieties, respectively. The variables are Productivity, Nitrogen (N), Number of plants (units), Plants/hectare, Month of floral induction, Zinc (Zn), Sulfur (S), Boron (B), Caliber, and Percentage of caliber. Python was used to preprocess the data, perform exploratory analysis, and develop the Random Forest Regressor models, with the code written in Visual Studio Code. Python libraries used in the study include pandas for data handling and SciPy for removing outliers to reduce bias in the data. The YellowBrick library was used for the feature selection process. Four regression models were created using Random Forest (RF), one for each mango variety in the dataset. The models showed satisfactory results for Kent, Keitt, Tommy Atkins, and Palmer mangoes, with R² values of 87.29 %, 74.37 %, 87.69 %, and 62.75 %, respectively. During the feature selection step, nitrogen (N) was found to be highly important in all the models, highlighting the role of this element in fruit formation.
With the resulting models, it is possible to predict the percentage distribution of mango calibers in each growing area 6 months in advance, using as input data that characterize each area and information on leaf nutrients.
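
The workflow described in the summary (outlier removal, then one Random Forest regressor per variety, then evaluation by R²) might look roughly like the sketch below. It is not the authors' code: the data are synthetic, the column names (`productivity`, `nitrogen`, `plants_per_hectare`, `caliber`, `pct_caliber`), the z-score threshold of 3, the 75/25 split, and the hyperparameters are all assumptions made for illustration. Feature importance is read from scikit-learn's `feature_importances_` rather than from YellowBrick, which the study used for feature selection.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; names and values are illustrative, not from the study.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "variety": rng.choice(["Keitt", "Tommy Atkins", "Kent", "Palmer"], size=n),
    "productivity": rng.normal(30, 5, n),
    "nitrogen": rng.normal(12, 2, n),
    "plants_per_hectare": rng.normal(400, 50, n),
    "caliber": rng.integers(5, 13, n),
})
# Fabricated target so the example has a learnable signal.
df["pct_caliber"] = 2.0 * df["nitrogen"] - 1.5 * df["caliber"] + rng.normal(0, 1, n)

FEATURES = ["productivity", "nitrogen", "plants_per_hectare", "caliber"]
TARGET = "pct_caliber"


def drop_outliers(frame, columns, threshold=3.0):
    """Keep rows whose absolute z-score is below the threshold in every column."""
    z = np.abs(stats.zscore(frame[columns].to_numpy(dtype=float), axis=0))
    return frame[(z < threshold).all(axis=1)]


def fit_per_variety(frame):
    """Fit one RandomForestRegressor per variety and score it on held-out rows."""
    results = {}
    for variety, group in frame.groupby("variety"):
        clean = drop_outliers(group, FEATURES + [TARGET])
        X_train, X_test, y_train, y_test = train_test_split(
            clean[FEATURES], clean[TARGET], test_size=0.25, random_state=0
        )
        model = RandomForestRegressor(n_estimators=100, random_state=0)
        model.fit(X_train, y_train)
        results[variety] = {
            "model": model,
            "r2": r2_score(y_test, model.predict(X_test)),
            "importances": dict(zip(FEATURES, model.feature_importances_)),
        }
    return results


results = fit_per_variety(df)
for variety, info in results.items():
    top = max(info["importances"], key=info["importances"].get)
    print(f"{variety}: R2={info['r2']:.3f}, most important feature={top}")
```

Training a separate model per variety, as the summary describes, lets each forest learn variety-specific relationships at the cost of fewer training rows per model, which may be one reason the smallest group in the study (Palmer) has the lowest reported R².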