Building extensible frameworks for data processing: The case of MDP, Modular toolkit for Data Processing


Bibliographic Details
Published in: Journal of computational science, Vol. 4, no. 5, pp. 345-351
Main Authors: Wilbert, Niko; Zito, Tiziano; Schuppner, Rike-Benjamin; Jędrzejewski-Szmek, Zbigniew; Wiskott, Laurenz; Berkes, Pietro
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.09.2013
ISSN: 1877-7503, 1877-7511
DOI: 10.1016/j.jocs.2011.10.005

More Information
Summary:
► MDP is an open-source library for scientific data processing in Python.
► It provides a framework for recurring tasks, such as combining multiple algorithms.
► MDP includes a flexible extension mechanism that allows, for instance, convenient parallelization of selected algorithms.

Data processing is a ubiquitous task in scientific research, and much energy is spent on the development of appropriate algorithms. It is thus relatively easy to find software implementations of the most common methods. On the other hand, when building concrete applications, developers are often confronted with several additional chores that need to be carried out besides the individual processing steps. These include, for example, training and executing a sequence of several algorithms, writing code that can be executed in parallel on several processors, or producing a visual description of the application. The Modular toolkit for Data Processing (MDP) is an open-source Python library that provides an implementation of several widespread algorithms and offers a unified framework for combining them into more complex data processing architectures. In this paper we concentrate on some of the newer features of MDP, focusing on the choices made to automate repetitive tasks for users and developers. In particular, we describe the support for parallel computing and how this is implemented via a flexible extension mechanism. We also briefly discuss the support for algorithms that require bi-directional data flow.