Ensemble Learning Model for Industrial Policy Classification Using Automated Hyperparameter Optimization

Bibliographic Details
Published in: Electronics (Basel), Vol. 14, No. 20, p. 3974
Main Author: Jang, Hee-Seon
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 10.10.2025
ISSN: 2079-9292
DOI: 10.3390/electronics14203974

Summary: The Global Trade Alert (GTA) website, managed by the United Nations, releases a large number of industrial policy (IP) announcements daily. Recently, leading nations including the United States and China have increasingly turned to IPs to protect and promote their domestic corporate interests. They use both offensive and defensive tools such as tariffs, trade barriers, investment restrictions, and financial support measures. To evaluate how these policy announcements may affect national interests, many countries have implemented logistic regression models to automatically classify them as either IP or non-IP. This study proposes ensemble models—widely recognized for their superior performance in binary classification—as a more effective alternative. The random forest model (a bagging technique) and boosting methods (gradient boosting, XGBoost, and LightGBM) are proposed, and their performance is compared with that of logistic regression. For evaluation, a dataset of 2000 randomly selected policy documents was compiled and labeled by domain experts. Following data preprocessing, hyperparameter optimization was performed using the Optuna library in Python 3.10. To enhance model robustness, cross-validation was applied, and performance was evaluated using key metrics such as accuracy, precision, and recall. The analytical results demonstrate that ensemble models consistently outperform logistic regression in both baseline (default hyperparameters) and optimized configurations. Compared to logistic regression, LightGBM and random forest showed baseline accuracy improvements of 3.5% and 3.8%, respectively, with hyperparameter optimization yielding additional performance gains of 2.4–3.3% across ensemble methods. In particular, the analysis based on alternative performance indicators confirmed that the LightGBM and random forest models yielded the most reliable predictions.
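The comparison pipeline the summary describes—baseline cross-validated accuracy for logistic regression and ensemble models, followed by hyperparameter optimization—can be sketched as follows. This is a minimal illustration, not the paper's code: the 2000 expert-labeled GTA policy documents are replaced by a synthetic binary dataset, and the Optuna tuning step is stood in for by scikit-learn's `RandomizedSearchCV`; the XGBoost and LightGBM models are omitted for brevity.

```python
# Sketch of the study's model comparison under stated assumptions:
# synthetic data stands in for the labeled GTA documents, and a
# randomized search stands in for Optuna-based tuning.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

# Stand-in for the 2000 preprocessed, expert-labeled documents (IP vs. non-IP).
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

# Baseline: default hyperparameters, 5-fold cross-validated accuracy.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}
baseline = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Optimized configuration: search a small illustrative hyperparameter space
# (the paper uses Optuna's sampler over a richer space).
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    {"n_estimators": [100, 300], "max_depth": [None, 10, 30]},
    n_iter=4, cv=5, scoring="accuracy", random_state=42,
)
search.fit(X, y)

print(baseline)
print(search.best_params_, search.best_score_)
```

On real data the same loop would be preceded by text vectorization of the policy announcements, and precision/recall would be reported alongside accuracy via `scoring=["accuracy", "precision", "recall"]`-style multi-metric evaluation.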