LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

Bibliographic Details
Published in arXiv.org
Main Authors Stubbemann, Maximilian; Stumme, Gerd
Format Paper; Journal Article
Language English
Published Ithaca: Cornell University Library, arXiv.org, 3 September 2021
ISSN 2331-8422
DOI 10.48550/arxiv.2109.01479


Abstract The automatic verification of document authorship is important in various settings. Researchers, for example, are judged and compared by the number and impact of their publications, and public figures are confronted with their posts on social media platforms. It is therefore important that authorship information in frequently used web services and platforms is correct. The question of whether a given document was written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only a few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for online databases and knowledge graphs in the scholarly domain, where the authorship of scientific publications has to be verified, often with only abstracts and titles available. To this end, we present our novel approach LG4AV, which combines language models and graph neural networks for authorship verification. By feeding the available texts directly into a pre-trained transformer architecture, our model does not need hand-crafted stylometric features, which are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By incorporating a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process. For example, scientific authors are more likely to write about topics that are addressed by their co-authors, and Twitter users tend to post about the same subjects as the people they follow. We experimentally evaluate our model and study to what extent the inclusion of co-authorships enhances verification decisions in bibliometric environments.
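The general idea in the abstract, scoring a text against an author whose representation also reflects their co-authors, can be illustrated with a toy sketch. This is not the LG4AV architecture: the text vector is a hand-written stand-in for a transformer embedding, the mean aggregation is a generic graph-neural-network-style step, and all names (`mean_neighbor_aggregate`, `verification_score`, the authors `a`, `b`, `c`) are hypothetical.

```python
import math


def mean_neighbor_aggregate(features, adjacency):
    """One round of mean aggregation over each author and their co-authors.

    Assumption: a generic GNN-style neighborhood average, not the
    aggregation used in the paper.
    """
    aggregated = {}
    for author, vec in features.items():
        group = [vec] + [features[n] for n in adjacency.get(author, [])]
        aggregated[author] = [sum(col) / len(group) for col in zip(*group)]
    return aggregated


def verification_score(text_vec, author_vec):
    """Sigmoid of the dot product between a text vector and an author vector."""
    z = sum(t * a for t, a in zip(text_vec, author_vec))
    return 1.0 / (1.0 + math.exp(-z))


# Toy author features (hypothetical); in practice these would be learned.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
# Co-authorship graph: a and b have written together, c is isolated.
adjacency = {"a": ["b"], "b": ["a"], "c": []}

# Stand-in for a language-model embedding of a title or abstract that is
# about author b's topic.
text_vec = [0.0, 2.0]

aggregated = mean_neighbor_aggregate(features, adjacency)
score_without_graph = verification_score(text_vec, features["a"])
score_with_graph = verification_score(text_vec, aggregated["a"])
```

In this toy setup, author `a` scores higher on a text about a co-author's topic once the co-author's features are mixed in, which mirrors the intuition stated in the abstract that authors tend to write about topics addressed by their co-authors.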
Published paper https://doi.org/10.1007/978-3-031-01333-1_25 (access to full text may be restricted)
Paper in arXiv https://doi.org/10.48550/arXiv.2109.01479
Copyright 2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Genre Working Paper/Pre-Print
Subject Terms Authorship; Bibliometrics; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Documents; Graph neural networks; Knowledge representation; Neural networks; Scientific papers; Verification; Web services
SummonAdditionalLinks – databaseName: ProQuest Technology Collection
  dbid: 8FG
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwfV3NS8MwFA-6IXjzk02n5OA1W9ukTeNFRNyGTBE_xm4lXwVhdHWd4p_vS5bpQfAUSKGHl-T93u99InRBAdWksiVJaBkRZoUmeZrmJOWuEN2aVGSuwPn-IRu_srtZOgsOtyakVW50olfUZqGdj3wA0CwY0ANKr-p34qZGuehqGKGxjdpxAljrKsWHox8fS5JxsJjpOpjpW3cN5PLr7bMPPEf0gRy4BK623_qjij2-DPdQ-1HWdrmPtmx1gHZ8WqZuDtHTZMSup5cYXq3ykxzwJPgXsRtiNm-wrAweuabT2LXZkHNYfF53g8Eaxc4BBssUrlkZnHNH6Hl4-3IzJmEKApFAEolVAkhqVMYKLCfAamayRCci4yJOeSJjJrmi8LC45pExmpawUcaxSo0CJqPpMWpVi8p2EAYJZpHOM5ZrMJsoLNQwbeEPYFRJTruo40VR1Os-F4WTUuGl1EW9jXSKcMeb4vdETv7_fIp2E5cJ4sIwtIdaq-WHPQMoX6lzf17f6yqbtA
  priority: 102
  providerName: ProQuest
Title LG4AV: Combining Language Models and Graph Neural Networks for Author Verification
URI https://www.proquest.com/docview/2569482833
https://arxiv.org/abs/2109.01479
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwdV07T8MwED61ZWFBIEAFSuWBNZD4EcdsBbWpUFtQgapbFDuOhIQKagpi4rdzdlIxIJZYspwbzo_7vrPvDuCCoVXLtS0Dysow4FaZIBEiCYR0gei2ECp2Ac7TWTx-5ndLsWwB2cbC5Ouvl886P7CurpCPqEsE8VK1oU2pI1fp_bK-nPSpuJrxv-MQY_quP0ertxejfdhrgB4Z1DNzAC27OoT5JOWDxTXBXah9ZQYyafyFxBUle60IEnuSuiTSxKXNQAGz-p12RRBdEufQwmaBy6ZsnG1H8DgaPt2Og6aqQZAj6QusVkg6wzLSiITQ9vIipoaqWKpISJpHPJea4UaRRoZFYViJHWUUaVFoZCaGHUNn9bayXSAIVuLQJDFPDMIghg0ruLEoAUFSLtkJdL0qsvc6b0XmtJR5LZ1Ab6udrFmzVYbyFEcCxtjp_3-ewS51rzrclQrrQWez_rDnaJY3ug_tZJT2YedmOHuY9_1M4Xf6PfwB2RSOuA
linkProvider Cornell University
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwtV1LS8NAEB5qi-jNJ1ar7kGPaZvs5iWIiNqHTYtoLT0ZNpsNCCWtTX39J3-ks9tGD4K3nhY2sIeZycx88wQ4oWjVeCQTw6JJ3WDSF4Zn255hu6oRXca276gG527PaT2y26E9LMBX3gujyipznagVdTwWKkZeQ9PsM4QHlF5MXgy1NUplV_MVGnOx6MjPd4Rs2Xn7Gvl7almNm_5Vy1hsFTA4gi5DRj6CvnpiRuiJoO1jsWMJy3dc37Rdi5uMuxFFQXWFW49jQRO8SEwzsuMIkYGg-OoKlBilVBUQeo3mT0THclz0z-k8daoHhdX49OP5rYqoyq8iFFHlYiV99Ufxa2vW2IDSHZ_I6SYUZLoFq7oIVGTbcB802eXgjKCOiPTeCBIsoplErUwbZYSnMWmqEddEDfXgIzx0FXlG0PclKtyGxwCFOlmEAnfgYQnU2YViOk7lHhDkl1MXnsM8gU4axYPGTEh8AV047tIy7GlShJP5VI1QUSnUVCpDJadOuPijsvCX__v_fz6GtVa_G4RBu9c5gHVL1aCoBBCtQHE2fZWH6ETMoiPNOwJPyxWVb0-_1Wo
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=LG4AV%3A+Combining+Language+Models+and+Graph+Neural+Networks+for+Author+Verification&rft.jtitle=arXiv.org&rft.au=Stubbemann%2C+Maximilian&rft.au=Stumme%2C+Gerd&rft.date=2021-09-03&rft.pub=Cornell+University+Library%2C+arXiv.org&rft.eissn=2331-8422&rft_id=info:doi/10.48550%2Farxiv.2109.01479