Combining Vertex-Centric Graph Processing with SPARQL for Large-Scale RDF Data Analytics
Modern applications require sophisticated analytics on RDF graphs that combine structural queries with generic graph computations. Existing systems support either declarative SPARQL queries, or generic graph processing, but not both. We bridge the gap by introducing Spartex, a versatile framework fo...
| Published in | IEEE transactions on parallel and distributed systems Vol. 28; no. 12; pp. 3374 - 3388 |
|---|---|
| Main Authors | Abdelaziz, Ibrahim; Harbi, Razen; Salihoglu, Semih; Kalnis, Panos |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE, 01.12.2017; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN | 1045-9219 (print); 1558-2183 (electronic) |
| DOI | 10.1109/TPDS.2017.2720174 |
| Abstract | Modern applications require sophisticated analytics on RDF graphs that combine structural queries with generic graph computations. Existing systems support either declarative SPARQL queries or generic graph processing, but not both. We bridge the gap by introducing Spartex, a versatile framework for complex RDF analytics. Spartex extends SPARQL to seamlessly combine generic graph algorithms (e.g., PageRank, Shortest Paths) with SPARQL queries. Spartex builds on existing vertex-centric graph processing frameworks, such as GraphLab or Pregel. It implements a generic SPARQL operator as a vertex-centric program that interprets SPARQL queries and executes them efficiently using a built-in optimizer. In addition, any graph algorithm implemented in the underlying vertex-centric framework can be executed in Spartex. We present various scenarios where our framework significantly simplifies the implementation of complex RDF data analytics programs. We demonstrate that Spartex scales to datasets with billions of edges, and show that our core SPARQL engine is at least as fast as the state-of-the-art specialized RDF engines. For complex analytical tasks that combine generic graph processing with SPARQL, Spartex is at least an order of magnitude faster than existing alternatives. |
|---|---|
| Author | Abdelaziz, Ibrahim; Salihoglu, Semih; Kalnis, Panos; Harbi, Razen |
| Author_xml | 1. Abdelaziz, Ibrahim (brahim.abdelaziz@kaust.edu.sa), King Abdullah Univ. of Sci. & Technol., Thuwal, Saudi Arabia; 2. Harbi, Razen (razen.harbi@aramco.com), Saudi Aramco, Thuwal, Saudi Arabia; 3. Salihoglu, Semih (semih.salihoglu@uwaterloo.ca), Univ. of Waterloo, Waterloo, ON, Canada; 4. Kalnis, Panos (panos.kalnis@kaust.edu.sa), King Abdullah Univ. of Sci. & Technol., Thuwal, Saudi Arabia |
| CODEN | ITDSEO |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2017 |
| Discipline | Engineering; Computer Science |
| Genre | orig-research |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
| ORCID | 0000-0003-1449-5115 |
| PageCount | 15 |
| PublicationTitleAbbrev | TPDS |
| SubjectTerms | Algorithm design and analysis; Algorithms; Analytics; Data analysis; Filtering algorithms; graph analytics; Graphical models; Matched filters; Mathematical analysis; Pattern matching; Query processing; RDF data; Resource description framework; Search engines; SPARQL; State of the art; Task complexity; vertex-centric |
| URI | https://ieeexplore.ieee.org/document/7959641; https://www.proquest.com/docview/2174467364 |
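
The abstract describes combining a structural SPARQL selection with a generic vertex-centric algorithm such as PageRank. The following is a minimal, hypothetical sketch of that idea in plain Python; it is not Spartex's API or query syntax. The toy triples, the `match_predicate` helper (a stand-in for a single triple pattern `?s <p> ?o`), and the `pagerank_vertex_centric` function are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "alice"),
    ("alice", "likes", "pizza"),
]


def match_predicate(triples, predicate):
    """Stand-in for the triple pattern `?s <predicate> ?o`: keep matching edges."""
    return [(s, o) for s, p, o in triples if p == predicate]


def pagerank_vertex_centric(edges, supersteps=20, damping=0.85):
    """Pregel-style PageRank sketch.

    In each superstep every vertex sends rank / out_degree to its
    out-neighbours, then each vertex recomputes its rank from the messages
    it received. Dangling vertices are not redistributed in this sketch.
    """
    out = defaultdict(list)
    vertices = set()
    for s, o in edges:
        out[s].append(o)
        vertices.update((s, o))
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(supersteps):
        inbox = defaultdict(float)  # messages delivered in this superstep
        for v in vertices:
            for w in out[v]:
                inbox[w] += rank[v] / len(out[v])
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in vertices}
    return rank


# Select the "follows" subgraph first, then rank only its vertices.
ranks = pagerank_vertex_centric(match_predicate(TRIPLES, "follows"))
```

Because the selection runs before the ranking, `pizza` never enters the graph that PageRank sees; a system like the one described would presumably express both steps in one query rather than in two separate function calls.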