Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs

Bibliographic Details
Published in: 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), pp. 206-211
Main Authors: Kazi, Saif; Vakharia, Priyesh; Shah, Parth; Gupta, Riya; Tailor, Yash; Mantry, Palak; Rathod, Jash
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2022
DOI: 10.1109/CDMA54072.2022.00039

Summary: Data preprocessing is an important prerequisite for data mining and machine learning. In this paper, we introduce Preprocessy, a Python framework that provides customisable data preprocessing pipelines for processing structured data. Preprocessy pipelines come with sane defaults and the framework also provides low-level functions to build custom pipelines. The paper gives a brief overview of the features and the high-level APIs of Preprocessy along with a performance comparison against Scikit-learn and Pandas on two datasets. Preprocessy provides functions for handling missing data and outliers, data normalisation, feature selection and data sampling. The goal of Preprocessy is to be easy to use, flexible and performant. Preprocessy helps beginners and experts alike by making data preprocessing an easier and faster task.
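
Illustration: the steps named in the summary (missing data, outliers, normalisation, sampling) can be sketched in plain pandas. This is not Preprocessy's API, which this record does not describe. The function name `preprocess`, its parameters, and the specific choices (median fill, 1.5 x IQR clipping, min-max scaling, random split) are assumptions made for illustration only, and feature selection is left out for brevity.

```python
import pandas as pd


def preprocess(df, target, test_size=0.2, seed=0):
    """Hypothetical helper (not part of Preprocessy): a minimal pandas pipeline."""
    df = df.copy()
    features = [c for c in df.columns if c != target]
    numeric = df[features].select_dtypes(include="number").columns

    # 1. Missing data: fill numeric gaps with the column median (assumed strategy).
    df[numeric] = df[numeric].fillna(df[numeric].median())

    # 2. Outliers: clip each numeric column to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1 = df[numeric].quantile(0.25)
    q3 = df[numeric].quantile(0.75)
    iqr = q3 - q1
    df[numeric] = df[numeric].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

    # 3. Normalisation: min-max scale to [0, 1]; constant columns are left at 0.
    col_min = df[numeric].min()
    span = (df[numeric].max() - col_min).replace(0, 1)
    df[numeric] = (df[numeric] - col_min) / span

    # 4. Sampling: random train/test split.
    test = df.sample(frac=test_size, random_state=seed)
    train = df.drop(test.index)
    return train, test


if __name__ == "__main__":
    data = pd.DataFrame({
        "x": [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0],
        "y": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    })
    train, test = preprocess(data, target="y")
    print(len(train), len(test))
```

Each step is a single vectorised pandas call, so a custom pipeline can reorder or swap steps freely; a framework with defaults would presumably bundle such choices behind one call.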