PWFS: A scalable parallel Python module for wrapper feature selection

Bibliographic Details
Published in Journal of Innovative Engineering and Natural Science, Vol. 5, no. 2, pp. 704-719
Main Authors Eren, Hakan Alp; Okyay, Savaş; Adar, Nihat
Format Journal Article
Language English
Published 31.07.2025
ISSN 2791-7630
DOI 10.61112/jiens.1639780

Summary: In machine learning, feature selection is a crucial step that can significantly affect the performance of predictive models. Although various time-efficient algorithms exist, the only method guaranteed to find the optimal feature subset is exhaustive search, which carries an enormous computational load: beyond a certain feature count, a lifetime would not be enough to complete it. This study proposes a generic, scalable, open-source parallel Python module that finds the best wrapper feature subset in an optimized execution time, especially for reasonable feature counts. This parallel wrapper feature selection module, PWFS, is independent of the machine learning algorithm and works with user-defined methods. The framework aims to maximize the benefit on the machine learning side by improving parallel performance and efficiency. The system design is built on efficient message-passing communication, in which the framework distributes the computational load equally among the parallel agents via feature masking. The module is validated on two workstations, one of which is hyper-threading capable. An overall performance gain of 19.77% is achieved with hyper-threading. Various scenarios and experiments yield different speedups and efficiencies of up to 96.74%, validating the flexible design of the proposed parallel framework. The source code of the module is available at https://github.com/haeren/parallel-feature-selector and https://pypi.org/project/parallel-feature-selector/.
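The idea of splitting an exhaustive wrapper search equally among parallel agents via feature masking can be illustrated with a minimal sketch. This is not the PWFS API: every function name below is hypothetical, the scoring function is a toy stand-in for a user-defined evaluation method, and the workers are simulated one after another in a single process rather than communicating by message passing. It assumes each subset is encoded as an integer bitmask and that masks are dealt round-robin to the workers.

```python
from typing import Callable, List, Tuple

ScoreFn = Callable[[List[int]], float]


def mask_to_indices(mask: int, n_features: int) -> List[int]:
    """Return the feature indices whose bit is set in the mask."""
    return [i for i in range(n_features) if (mask >> i) & 1]


def masks_for_rank(rank: int, n_workers: int, n_features: int) -> range:
    """Deal the non-empty masks 1 .. 2**n - 1 round-robin to the workers.

    Every mask goes to exactly one rank, and the shares differ by at most one.
    """
    return range(1 + rank, 1 << n_features, n_workers)


def best_for_rank(rank: int, n_workers: int, n_features: int,
                  score: ScoreFn) -> Tuple[float, int]:
    """Evaluate this rank's share and return its local (best score, best mask)."""
    best_score, best_mask = float("-inf"), 0
    for mask in masks_for_rank(rank, n_workers, n_features):
        current = score(mask_to_indices(mask, n_features))
        if current > best_score:
            best_score, best_mask = current, mask
    return best_score, best_mask


def exhaustive_search(n_features: int, n_workers: int,
                      score: ScoreFn) -> Tuple[float, List[int]]:
    """Combine the local results of all ranks into the global best subset.

    Assumption: in a message-passing setup each rank would run in its own
    process and send its local result to a root; here the ranks are simply
    looped over sequentially.
    """
    partials = [best_for_rank(r, n_workers, n_features, score)
                for r in range(n_workers)]
    best_score, best_mask = max(partials)
    return best_score, mask_to_indices(best_mask, n_features)


def toy_score(indices: List[int]) -> float:
    """Hypothetical wrapper score: features 0 and 2 help, each feature costs 0.1."""
    return len(set(indices) & {0, 2}) - 0.1 * len(indices)


if __name__ == "__main__":
    print(exhaustive_search(n_features=4, n_workers=3, score=toy_score))
```

In a real wrapper setting, `toy_score` would be replaced by a routine that trains and validates a model on the masked feature columns. The number of masks grows as 2^n - 1, which is why the search is practical only for moderate feature counts even when the load is shared.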