Data mining algorithm for pre-processing biopharmaceutical drug product manufacturing records

Bibliographic Details
Published in	Computers & Chemical Engineering, Vol. 124, pp. 253-269
Main Authors	Casola, Gioele; Siegmund, Christian; Mattern, Markus; Sugiyama, Hirokazu
Format	Journal Article
Language	English
Published	Elsevier Ltd 08.05.2019
Subjects	GMP; cGMP; FDA; QbD; RCA; Ishikawa fishbone diagram; Language recognition; Noise Filtering; Supervised machine learning; Semi-supervised machine learning; CIP/SIP; ETS; sETS; eETS; MSO; DP; WF; DT; PN
ISSN	0098-1354 1873-4375
DOI	10.1016/j.compchemeng.2018.12.001

Summary:
• The preprocessing algorithm removes noise from a continuously measured dataset.
• The dataset is visualized as a DNA strand and the process recipe as a gene sequence.
• Real-time integration of the algorithm into operations assures data integrity.
• The outcome is noise-free, structured data suitable for decision-making.
• A new depiction of root causes enables fast and quantitative decision-making.

The quality of data plays a crucial role in reliable decision-making when improving processes and operations under uncertainty. We present a data mining-based algorithm for robustly pre-processing the manufacturing records of biopharmaceutical batch processes. The algorithm can identify the time intervals in which the process is in commercial operation and can characterize process failures automatically. An approximate string-matching algorithm, a decision tree classifier, and constrained clustering are applied to sequence the raw data, classify the noise, and identify each individual batch; finally, process failures are characterized. In a case study, the algorithm was applied to the records of the “cleaning- and sterilizing-in-place” process, which is essential in the manufacturing environment. The algorithm was trained on the outcome of state-of-the-art manual pre-processing and, when applied, reduced the execution time of the activity to 11.7% while maintaining high data quality and integrity.
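
The summary mentions approximate string matching as the step that sequences the raw data against the process recipe. The record does not give the authors' implementation, so the following is only a minimal sketch of the general idea, under several assumptions: each process step is encoded as a single letter, the recipe is a short string, and batches are searched with fixed-length windows scored by Python's standard `difflib.SequenceMatcher`. The function name `find_batches`, the step letters, and the threshold value are all hypothetical.

```python
from difflib import SequenceMatcher


def find_batches(stream: str, recipe: str, threshold: float = 0.7):
    """Locate non-overlapping windows of `stream` that resemble `recipe`.

    Illustrative only: fixed-length windows cannot follow inserted or
    dropped steps, which a full approximate-matching method would handle.
    Returns a list of (start, end, similarity) tuples sorted by start.
    """
    n = len(recipe)
    candidates = []
    for start in range(len(stream) - n + 1):
        window = stream[start:start + n]
        score = SequenceMatcher(None, recipe, window, autojunk=False).ratio()
        if score >= threshold:
            candidates.append((score, start))

    # Prefer the best-scoring windows; break ties by earliest start.
    candidates.sort(key=lambda item: (-item[0], item[1]))

    chosen = []
    for score, start in candidates:
        end = start + n
        # Keep a window only if it does not overlap one already chosen.
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, score))
    return sorted(chosen)


# Hypothetical encoding: R = rinse, W = wash, S = sterilize, D = dry,
# N = noise (idle or unrelated records), X = an unexpected step.
batches = find_batches("NNRWSDNNRWXDNN", "RWSD")
print(batches)  # [(2, 6, 1.0), (8, 12, 0.75)]
```

In this toy stream the first window matches the recipe exactly, while the second matches only approximately because of the unexpected step; a lower similarity of this kind is the sort of signal that could flag a batch for failure characterization.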