FastQuery: A Parallel Indexing System for Scientific Data

Bibliographic Details
Published in: 2011 IEEE International Conference on Cluster Computing, pp. 455-464
Main Authors: Chou, J.; Wu, K.; Prabhat, P.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.09.2011
ISBN: 9781457713552; 1457713551
ISSN: 1552-5244
DOI: 10.1109/CLUSTER.2011.86

Abstract: Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies such as FastBit can significantly improve access to these datasets by augmenting the user data with indexes and other secondary information. However, these indexes assume the relational data model, whereas scientific data generally follows the array data model. To match the two data models, we design a generic mapping mechanism and implement an efficient input and output interface for reading and writing the data and their corresponding indexes. To take advantage of emerging many-core architectures, we also develop a parallel indexing strategy based on threading. This approach complements our ongoing MPI-based parallelization efforts. We demonstrate the flexibility of our software by applying it to two of the most commonly used scientific data formats, HDF5 and NetCDF. We present two case studies using data from a particle accelerator model and a global climate model, along with a detailed performance study on these scientific datasets. The results show that FastQuery speeds up queries by a factor of 2.5 to 50 and reduces indexing time by a factor of 16 on 24 cores.
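
The abstract's two central ideas, viewing array-model data as the single column of "rows" that a bitmap index expects and building that index in parallel with threads, can be illustrated with a minimal Python sketch. It is not FastQuery's or FastBit's API: the function names (`unravel`, `build_partition`, `build_index`, `query_at_least`) are hypothetical, the bitmaps are uncompressed Python integers rather than the compressed bitmaps FastBit uses, and the sketch assumes a row-major (C-order) array layout and a query threshold that falls exactly on a bin edge.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def unravel(flat_index, shape):
    """Map a row-major (C-order) flat index back to N-d array coordinates."""
    coords = []
    for dim in reversed(shape):
        flat_index, rem = divmod(flat_index, dim)
        coords.append(rem)
    return tuple(reversed(coords))


def build_partition(values, start, edges):
    """Index one contiguous slice of the flattened array (hypothetical helper).

    Bit i of bitmaps[b] is set when element (start + i) falls in bin b,
    where bin b is the number of edges <= value.
    """
    bitmaps = [0] * (len(edges) + 1)
    for i, v in enumerate(values):
        bitmaps[bisect_right(edges, v)] |= 1 << i
    return start, bitmaps


def build_index(flat, edges, num_threads=4):
    """Split the flattened array into slices, index each slice in a worker
    thread, then shift every partial bitmap by its slice offset and OR them."""
    n = len(flat)
    chunk = max(1, -(-n // num_threads))  # ceiling division, at least 1
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(build_partition, flat[s:s + chunk], s, edges)
                   for s in range(0, n, chunk)]
        parts = [f.result() for f in futures]
    merged = [0] * (len(edges) + 1)
    for start, bitmaps in parts:
        for b, bm in enumerate(bitmaps):
            merged[b] |= bm << start
    return merged


def query_at_least(index, edges, j, shape):
    """Return coordinates of all elements with value >= edges[j].

    No candidate check is needed because the threshold is a bin edge.
    """
    hits = 0
    for bm in index[j + 1:]:
        hits |= bm
    coords = []
    while hits:
        low = hits & -hits                  # isolate the lowest set bit
        coords.append(unravel(low.bit_length() - 1, shape))
        hits ^= low                         # clear that bit
    return coords


if __name__ == "__main__":
    shape = (2, 3)
    grid = [[0.1, 0.7, 0.4],
            [0.9, 0.2, 0.6]]
    flat = [v for row in grid for v in row]  # row-major flattening
    edges = [0.25, 0.5, 0.75]
    index = build_index(flat, edges, num_threads=2)
    print(query_at_least(index, edges, 1, shape))  # [(0, 1), (1, 0), (1, 2)]
```

Because each thread writes only the bitmaps for its own contiguous slice, no locking is needed, and the merge step only has to shift each partial bitmap by its slice offset. Under CPython's global interpreter lock this sketch shows how the work is partitioned and merged, not the multi-core speedup the abstract reports.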

Author contacts
Chou, J. (jchou@lbl.gov)
Wu, K. (kwu@lbl.gov)
Prabhat, P. (prabhat@lbl.gov)

Discipline: Computer Science
EISBN: 9780769545165; 0769545165
Genre: orig-research
Page count: 10
Subjects: Arrays; Data models; Indexing; Layout; Libraries
URI: https://ieeexplore.ieee.org/document/6061134