Mining Semantic Relations in Data References to Understand the Roles of Research Data in Academic Literature

Bibliographic Details
Published in IEEE/ACM Joint Conference on Digital Libraries (Online) pp. 215 - 227
Main Authors Fan, Lizhou, Lafia, Sara, Wofford, Morgan, Thomer, Andrea, Yakel, Elizabeth, Hemphill, Libby
Format Conference Proceeding
Language English
Published IEEE 01.06.2023
Subjects
ISSN 2575-8152
DOI 10.1109/JCDL57899.2023.00039

Abstract Research data serves important roles in scientific discovery and academic innovation. To appropriately assign credit for data work and to measure the value of research data, it is essential to articulate how data are actually used in research. We leveraged a combination of computational methods and human analysis to characterize different types of data use by mining semantic relations from the phrases where data are referenced in academic literature. In particular, we investigated references to data in the bibliography of a large social science data archive, the Inter-university Consortium for Political and Social Research (ICPSR). After retrieving and extracting semantic relations as subject-relation-object triples, we used rule-based methods to classify them. We then annotated samples from 11 frequent classes of data reference triples and found that they vary primarily along two dimensions of data use: proximity and function. Proximity describes the distance between the author and the data they reference (e.g., direct or indirect engagement). Function describes the role that data plays in each reference (e.g., describing interaction or providing context). These semantic relationships between authors and data reveal the ways data are used in scientific publications. Evidence of the variety of ways data are used can help stakeholders in research data curation and stewardship - including data providers, data curators, and data users - recognize the myriad ways that their investments in data sharing are realized.
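The rule-based classification step described in the abstract can be illustrated with a minimal sketch. The keyword sets, class labels, and example triples below are hypothetical and chosen only for illustration; this record does not give the authors' actual rules or their 11 classes. The sketch assumes each triple has already been extracted and simply assigns it a value on the two dimensions the abstract names, proximity and function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A subject-relation-object triple extracted from a data reference."""
    subject: str
    relation: str
    obj: str


# Hypothetical keyword rules (assumptions, not the paper's rule set).
FIRST_PERSON_SUBJECTS = {"we", "i", "our study", "the authors"}
INTERACTION_RELATIONS = {"use", "used", "analyze", "analyzed", "obtained"}


def classify(triple: Triple) -> tuple[str, str]:
    """Return (proximity, function) labels for one triple."""
    subject = triple.subject.strip().lower()
    relation = triple.relation.strip().lower()
    # Proximity: does the author engage with the data directly?
    proximity = "direct" if subject in FIRST_PERSON_SUBJECTS else "indirect"
    # Function: does the reference describe interaction or provide context?
    function = "interaction" if relation in INTERACTION_RELATIONS else "context"
    return proximity, function


if __name__ == "__main__":
    # Invented example triples, for illustration only.
    examples = [
        Triple("We", "used", "the survey data"),
        Triple("Prior work", "reported", "estimates from archived data"),
    ]
    for t in examples:
        print(t.subject, "|", t.relation, "|", t.obj, "->", classify(t))
```

Exact-match keyword sets keep the sketch short; a practical rule set would probably need lemmatization and pattern matching over the relation phrase.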
Author_xml – sequence: 1
  givenname: Lizhou
  surname: Fan
  fullname: Fan, Lizhou
  email: lizhouf@umich.edu
  organization: School of Information, University of Michigan, Ann Arbor, Michigan, USA
– sequence: 2
  givenname: Sara
  surname: Lafia
  fullname: Lafia, Sara
  email: slafia@umich.edu
  organization: Inter-university Consortium for Political and Social Research, University of Michigan, Ann Arbor, Michigan, USA
– sequence: 3
  givenname: Morgan
  surname: Wofford
  fullname: Wofford, Morgan
  email: mwofford@umich.edu
  organization: School of Information, University of Michigan, Ann Arbor, Michigan, USA
– sequence: 4
  givenname: Andrea
  surname: Thomer
  fullname: Thomer, Andrea
  email: athomer@arizona.edu
  organization: School of Information, University of Michigan, Ann Arbor, Michigan, USA
– sequence: 5
  givenname: Elizabeth
  surname: Yakel
  fullname: Yakel, Elizabeth
  email: yakel@umich.edu
  organization: School of Information, University of Michigan, Ann Arbor, Michigan, USA
– sequence: 6
  givenname: Libby
  surname: Hemphill
  fullname: Hemphill, Libby
  email: libbyh@umich.edu
  organization: Inter-university Consortium for Political and Social Research, University of Michigan, Ann Arbor, Michigan, USA
CODEN IEEPAD
ContentType Conference Proceeding
Discipline Library & Information Science
EISBN 9798350399318
EISSN 2575-8152
EndPage 227
ExternalDocumentID 10266038
Genre orig-research
GrantInformation_xml – fundername: National Science Foundation
  grantid: 1930645, 2121789
  funderid: 10.13039/1000000010
PageCount 13
PublicationDate 2023-June
PublicationTitle IEEE/ACM Joint Conference on Digital Libraries (Online)
PublicationTitleAbbrev JCDL
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 215
SubjectTerms Bibliographies
Data analysis
information extraction
knowledge discovery
Libraries
Particle measurements
research data management
semantic triples
Semantics
Social sciences
Technological innovation
text mining
URI https://ieeexplore.ieee.org/document/10266038