Automating Biomedical Data Science Through Tree-Based Pipeline Optimization
Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of...
Published in | Applications of Evolutionary Computation pp. 123 - 137 |
---|---|
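Main Authors | Olson, Randal S.; Urbanowicz, Ryan J.; Andrews, Peter C.; Lavender, Nicole A.; Kidd, La Creis; Moore, Jason H. |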
Format | Book Chapter |
Language | English |
Published | Cham: Springer International Publishing, 2016 |
Series | Lecture Notes in Computer Science |
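Subjects | Data science; Genetic programming; Hyperparameter optimization; Machine learning; Pipeline optimization |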
ISBN | 3319312030 9783319312033 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-319-31204-0_9 |
Abstract | Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. |
---|---|
Author | Andrews, Peter C.; Kidd, La Creis; Lavender, Nicole A.; Moore, Jason H.; Olson, Randal S.; Urbanowicz, Ryan J. |
Author_xml | – sequence: 1 givenname: Randal S. surname: Olson fullname: Olson, Randal S. email: olsonran@upenn.edu organization: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA – sequence: 2 givenname: Ryan J. surname: Urbanowicz fullname: Urbanowicz, Ryan J. organization: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA – sequence: 3 givenname: Peter C. surname: Andrews fullname: Andrews, Peter C. organization: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA – sequence: 4 givenname: Nicole A. surname: Lavender fullname: Lavender, Nicole A. organization: University of Louisville, Louisville, USA – sequence: 5 givenname: La Creis surname: Kidd fullname: Kidd, La Creis organization: University of Louisville, Louisville, USA – sequence: 6 givenname: Jason H. surname: Moore fullname: Moore, Jason H. organization: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, USA |
ContentType | Book Chapter |
Copyright | Springer International Publishing Switzerland 2016 |
Copyright_xml | – notice: Springer International Publishing Switzerland 2016 |
DOI | 10.1007/978-3-319-31204-0_9 |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 3319312049 9783319312040 |
EISSN | 1611-3349 |
Editor | Burelli, Paolo; Squillero, Giovanni |
Editor_xml | – sequence: 1 givenname: Giovanni surname: Squillero fullname: Squillero, Giovanni email: giovanni.squillero@polito.it – sequence: 2 givenname: Paolo surname: Burelli fullname: Burelli, Paolo email: pabu@create.aau.dk |
EndPage | 137 |
ISBN | 3319312030 9783319312033 |
ISSN | 0302-9743 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
LinkModel | OpenURL |
PageCount | 15 |
ParticipantIDs | springer_books_10_1007_978_3_319_31204_0_9 |
PublicationCentury | 2000 |
PublicationDate | 2016 |
PublicationDateYYYYMMDD | 2016-01-01 |
PublicationDate_xml | – year: 2016 text: 2016 |
PublicationDecade | 2010 |
PublicationPlace | Cham |
PublicationPlace_xml | – name: Cham |
PublicationSeriesSubtitle | Theoretical Computer Science and General Issues |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 19th European Conference, EvoApplications 2016, Porto, Portugal, March 30 -- April 1, 2016, Proceedings, Part I |
PublicationTitle | Applications of Evolutionary Computation |
PublicationYear | 2016 |
Publisher | Springer International Publishing |
Publisher_xml | – name: Springer International Publishing |
RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Steffen, Bernhard; Pandu Rangan, C.; Kanade, Takeo; Kittler, Josef; Weikum, Gerhard; Hutchison, David; Tygar, Doug |
RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David organization: Lancaster University, Lancaster, United Kingdom – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo organization: Carnegie Mellon University, Pittsburgh, USA – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef organization: University of Surrey, Guildford, United Kingdom – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. organization: Cornell University, Ithaca, USA – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann organization: CNB H 104.2, ETH Zürich, Zürich, Switzerland – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. organization: Stanford, USA – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni organization: Weizmann Institute of Science, Rehovot, Israel – sequence: 8 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. organization: Indian Institute of Technology Madr, Chennai, India – sequence: 9 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard organization: Fakultät Informatik, TU Dortmund, Dortmund, Germany – sequence: 10 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri organization: Los Angeles, USA – sequence: 11 givenname: Doug surname: Tygar fullname: Tygar, Doug organization: University of California, Berkeley, USA – sequence: 12 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard organization: Max Planck Institute for Informatic, Saarbrücken, Germany |
SourceID | springer |
SourceType | Publisher |
StartPage | 123 |
SubjectTerms | Data science; Genetic programming; Hyperparameter optimization; Machine learning; Pipeline optimization |
Title | Automating Biomedical Data Science Through Tree-Based Pipeline Optimization |
URI | http://link.springer.com/10.1007/978-3-319-31204-0_9 |