LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

Bibliographic Details
Published in	arXiv.org
Main Authors	Stubbemann, Maximilian; Stumme, Gerd
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 03.09.2021
Subjects	Authorship; Bibliometrics; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Documents; Graph neural networks; Knowledge representation; Neural networks; Scientific papers; Verification; Web services
ISSN	2331-8422
DOI	10.48550/arxiv.2109.01479

Abstract	The automatic verification of document authorship is important in various settings. Researchers, for example, are judged and compared by the number and impact of their publications, and public figures are confronted with their posts on social media platforms. Therefore, it is important that authorship information in frequently used web services and platforms is correct. The question of whether a given document was written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only a few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for online databases and knowledge graphs in the scholarly domain. Here, authorships of scientific publications have to be verified, often with just abstracts and titles available. To this end, we present our novel approach LG4AV, which combines language models and graph neural networks for authorship verification. By directly feeding the available texts into a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features, which are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By incorporating a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process. For example, scientific authors are more likely to write about topics that are addressed by their co-authors, and Twitter users tend to post about the same subjects as people they follow. We experimentally evaluate our model and study to what extent the inclusion of co-authorships enhances verification decisions in bibliometric environments.
Published version	https://doi.org/10.1007/978-3-031-01333-1_25 (access to full text may be restricted)
Copyright	2021. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://arxiv.org/licenses/nonexclusive-distrib/1.0
Genre	Working Paper/Pre-Print
OpenAccessLink	https://arxiv.org/abs/2109.01479