Data mining algorithm for pre-processing biopharmaceutical drug product manufacturing records

Bibliographic Details
Published in: Computers & chemical engineering, Vol. 124, pp. 253-269
Main Authors: Casola, Gioele; Siegmund, Christian; Mattern, Markus; Sugiyama, Hirokazu
Format: Journal Article
Language: English
Published: Elsevier Ltd, 08.05.2019
ISSN: 0098-1354, 1873-4375
DOI: 10.1016/j.compchemeng.2018.12.001

Summary:
•The preprocessing algorithm removes the noise from a continuously measured dataset.
•The dataset is visualized as a DNA strand and the process recipe as a gene sequence.
•The real-time integration of the algorithm in the operations assures data integrity.
•The outcome is noise-free, structured data suitable for making decisions.
•A new depiction of root causes enables fast and quantitative decision-making.

The quality of data plays a crucial role in providing a reliable decision-making process when improving processes and operations under uncertainty. We present a data mining-based algorithm for robustly pre-processing the manufacturing records of biopharmaceutical batch processes. The algorithm can identify the time intervals in which the process is in commercial operation and can characterize process failures automatically. An approximate string-matching algorithm, a decision tree classifier, and constrained clustering are applied to sequence the raw data, classify the noise, and identify each individual batch; finally, process failures are characterized. In a case study, the algorithm was applied to the records of the “cleaning- and sterilizing-in-place” process, which is essential in the manufacturing environment. The algorithm was trained on the outcome of state-of-the-art manual pre-processing and, when applied, reduced the execution time of the activity to 11.7% while maintaining high data quality and integrity.
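
The summary mentions approximate string matching as the step that sequences the raw data against the process recipe. The record does not describe the authors' implementation, so the following is only a minimal sketch of the general technique, assuming each logged process step has already been encoded as a single character. The function name `approx_find`, the recipe string `"RCSD"`, and the sample record are hypothetical and serve illustration only. The sketch uses Sellers' dynamic-programming variant of the Levenshtein recurrence, which allows a match to start at any position in the record.

```python
def approx_find(pattern: str, text: str, max_dist: int) -> list[tuple[int, int]]:
    """Return (end_index, distance) for every position in `text` where some
    substring ending there is within `max_dist` edits of `pattern`."""
    m = len(pattern)
    # prev[i] = edit distance between pattern[:i] and the best-matching
    # substring of text ending at the previous position.
    # Before any text is read, matching pattern[:i] costs i edits.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        # curr[0] stays 0: a match may start anywhere in the text.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the record
                curr[i - 1] + 1,     # recipe step missing from the record
            )
        if curr[m] <= max_dist:
            hits.append((j, curr[m]))
        prev = curr
    return hits


if __name__ == "__main__":
    # Hypothetical encoding: R = rinse, C = clean, S = sterilize, D = dry,
    # x = unrelated (noise) event. Not taken from the article.
    recipe = "RCSD"
    record = "xxRCSDxxRCxDxx"
    for end, dist in approx_find(recipe, record, max_dist=1):
        print(f"candidate batch ending at index {end}, edit distance {dist}")
```

With `max_dist=0` only exact occurrences of the recipe are reported; raising the threshold also admits candidates that deviate from the recipe, which a later classification step could then label as noise or as a process failure.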