SeAlM: A Query Cache Optimization Technique for Next Generation Sequence Alignment
| Published in | IEEE ... International Conference on Data Mining workshops, pp. 958-965 |
|---|---|
| Main Authors | Stene, Evan; Banaei-Kashani, Farnoush |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.11.2019 |
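| Subjects | alignment; Bioinformatics; Genomics; Indexes; next generation sequencing; query cache optimization; read mapping; Sequential analysis; Sociology; Statistics |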
| ISSN | 2375-9259 |
| DOI | 10.1109/ICDMW.2019.00139 |
| Abstract | Genetic data from next-generation sequencing (NGS) technology is being produced at an ever-increasing rate, already outpacing the well-known Moore's Law. At this pace of NGS data generation, new methods are needed to enable rapid sequence analysis at the enormous scale required. The need is compounded by the falling cost of sequencing, which is making large-scale genome studies spanning entire populations routine. A key step in the genomic data analysis pipeline, and often the most time-consuming one, is read mapping, also called alignment. This paper introduces Sequence Alignment Memorizer (SeAlM), a technique that reduces the number of redundant alignments to enable population-scale workloads. SeAlM uses a novel method for reordering alignment queries from multiple sources to create batches that are more likely to contain redundant queries, which can be de-duplicated before alignment, and it orders those batches so that queries can be cached more effectively. We show that our technique can improve the average alignment throughput by 6.5% for a single human sample and by 13.6% to 18.8% for a population of 10 human subjects, depending on the type of genetic data used. |
|---|---|
| Author | Stene, Evan; Banaei-Kashani, Farnoush |
| Author_xml | 1. Stene, Evan (University of Colorado, Denver); 2. Banaei-Kashani, Farnoush (University of Colorado, Denver) |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICDMW.2019.00139 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Discipline | Statistics; Computer Science |
| EISBN | 1728148960; 9781728148960 |
| EISSN | 2375-9259 |
| EndPage | 965 |
| ExternalDocumentID | 8955601 |
| Genre | orig-research |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 02:33:46 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| PageCount | 8 |
| ParticipantIDs | ieee_primary_8955601 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Nov. |
| PublicationDateYYYYMMDD | 2019-11-01 |
| PublicationDate_xml | – month: 11 year: 2019 text: 2019-Nov. |
| PublicationDecade | 2010 |
| PublicationTitle | IEEE ... International Conference on Data Mining workshops |
| PublicationTitleAbbrev | ICDMW |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0001934992 |
| Score | 1.706546 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 958 |
| SubjectTerms | alignment; Bioinformatics; Genomics; Indexes; next generation sequencing; query cache optimization; read mapping; Sequential analysis; Sociology; Statistics |
| Title | SeAlM: A Query Cache Optimization Technique for Next Generation Sequence Alignment |
| URI | https://ieeexplore.ieee.org/document/8955601 |
| hasFullText | 1 |
| inHoldings | 1 |
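
The batching, de-duplication, and caching idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the record does not describe SeAlM's actual reordering method, so a plain lexicographic sort stands in for it here, and `toy_align`, `LRUCache`, and `align_with_cache` are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict


def toy_align(read):
    """Hypothetical stand-in for a real aligner call."""
    return f"aligned:{read}"


class LRUCache:
    """Small least-recently-used cache keyed by read sequence."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


def align_with_cache(sources, align=toy_align, batch_size=4, cache_size=1024):
    # Assumption: sorting the pooled reads is a placeholder reordering that
    # brings identical reads from different sources next to each other.
    pooled = sorted(read for source in sources for read in source)
    cache = LRUCache(cache_size)
    results, aligner_calls = {}, 0
    for start in range(0, len(pooled), batch_size):
        # De-duplicate within the batch while preserving order.
        batch = dict.fromkeys(pooled[start:start + batch_size])
        for read in batch:
            result = cache.get(read)
            if result is None:  # cache miss: run the aligner
                result = align(read)
                aligner_calls += 1
                cache.put(read, result)
            results[read] = result
    return results, aligner_calls


sources = [["ACGT", "ACGT", "TTGA"], ["ACGT", "GGCC", "TTGA"]]
results, aligner_calls = align_with_cache(sources, batch_size=2)
```

In this toy run, six queries from two sources need only three aligner calls: duplicates inside a batch are dropped before alignment, and a duplicate that falls into the next batch is served from the cache.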