FastQuery: A Parallel Indexing System for Scientific Data
    
| Published in | 2011 IEEE International Conference on Cluster Computing, pp. 455-464 |
|---|---|
| Main Authors | Chou, J.; Kesheng Wu; Prabhat, P. |
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.09.2011 |
| Online Access | https://ieeexplore.ieee.org/document/6061134 |
| ISBN | 9781457713552 1457713551  | 
| ISSN | 1552-5244 | 
| DOI | 10.1109/CLUSTER.2011.86 | 
| Abstract | Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve accesses to these datasets by augmenting the user data with indexes and other secondary information. However, a challenge is that the indexes assume the relational data model but the scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of the emerging many-core architectures, we also develop a parallel strategy for indexing using threading technology. This approach complements our on-going MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model. We also conducted a detailed performance study using these scientific datasets. The results show that FastQuery speeds up the query time by a factor of 2.5x to 50x, and it reduces the indexing time by a factor of 16 on 24 cores. |
| Author details | 1. Chou, J. (jchou@lbl.gov); 2. Kesheng Wu (kwu@lbl.gov); 3. Prabhat, P. (prabhat@lbl.gov) |
| Discipline | Computer Science |
| EISBN | 0769545165 9780769545165 |
| Genre | orig-research |
| PageCount | 10 |
| Subjects | Arrays; Data models; Indexing; Layout; Libraries |