Preprocessy: A Customisable Data Preprocessing Framework with High-Level APIs

Bibliographic Details
Published in	2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), pp. 206-211
Main Authors	Kazi, Saif; Vakharia, Priyesh; Shah, Parth; Gupta, Riya; Tailor, Yash; Mantry, Palak; Rathod, Jash
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2022
Subjects	Data mining; Data preprocessing; data preprocessing pipelines; Data science; Feature extraction; Machine learning; Multiaccess communication; Pipelines; python
DOI	10.1109/CDMA54072.2022.00039

Summary:	Data preprocessing is an important prerequisite for data mining and machine learning. In this paper, we introduce Preprocessy, a Python framework that provides customisable data preprocessing pipelines for processing structured data. Preprocessy pipelines come with sane defaults and the framework also provides low-level functions to build custom pipelines. The paper gives a brief overview of the features and the high-level APIs of Preprocessy along with a performance comparison against Scikit-learn and Pandas on two datasets. Preprocessy provides functions for handling missing data and outliers, data normalisation, feature selection and data sampling. The goal of Preprocessy is to be easy to use, flexible and performant. Preprocessy helps beginners and experts alike by making data preprocessing an easier and faster task.
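The summary describes pipelines that ship with defaults yet can be rebuilt from lower-level functions. The sketch below illustrates that general idea only; it is not Preprocessy's API. Every name in it (`fill_missing`, `clip_outliers`, `min_max_scale`, `DEFAULT_STEPS`, `run_pipeline`) is hypothetical, and it uses only the Python standard library on a table represented as a dict of columns, with `None` marking a missing value.

```python
from statistics import mean, quantiles


def fill_missing(table):
    """Replace None in each column with the mean of the observed values."""
    out = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        fill = mean(observed) if observed else 0.0
        out[name] = [fill if v is None else v for v in values]
    return out


def clip_outliers(table, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds.

    Assumes each column has at least two values and no None entries.
    """
    out = {}
    for name, values in table.items():
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - k * iqr, q3 + k * iqr
        out[name] = [min(max(v, low), high) for v in values]
    return out


def min_max_scale(table):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    out = {}
    for name, values in table.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        out[name] = [(v - lo) / span if span else 0.0 for v in values]
    return out


# Hypothetical default pipeline: impute, then clip, then normalise.
DEFAULT_STEPS = (fill_missing, clip_outliers, min_max_scale)


def run_pipeline(table, steps=DEFAULT_STEPS):
    """Apply each step in order; pass a custom `steps` sequence to override."""
    for step in steps:
        table = step(table)
    return table


raw = {"age": [20.0, None, 30.0, 40.0, 1000.0]}
defaults_result = run_pipeline(raw)                      # default pipeline
custom_result = run_pipeline(raw, steps=[fill_missing])  # custom pipeline
```

Calling `run_pipeline(raw)` uses the defaults, while passing `steps` swaps in a custom sequence built from the same low-level functions. Each step returns a new table, so the input is left unmodified.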