LG4AV: Combining Language Models and Graph Neural Networks for Author Verification

Bibliographic Details
Published in	arXiv.org
Main Authors	Stubbemann, Maximilian; Stumme, Gerd
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 03.09.2021
Subjects	Authorship; Bibliometrics; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Documents; Graph neural networks; Knowledge representation; Neural networks; Scientific papers; Verification; Web services
ISSN	2331-8422
DOI	10.48550/arxiv.2109.01479

More Information
Summary:	The automatic verification of document authorship is important in various settings. Researchers, for example, are judged and compared by the number and impact of their publications, and public figures are confronted with their posts on social media platforms. It is therefore important that authorship information in frequently used web services and platforms is correct. The question of whether a given document was written by a given author is commonly referred to as authorship verification (AV). While AV is a widely investigated problem in general, only a few works consider settings where the documents are short and written in a rather uniform style. This makes most approaches impractical for online databases and knowledge graphs in the scholarly domain, where authorships of scientific publications have to be verified, often with just abstracts and titles available. To this end, we present our novel approach LG4AV, which combines language models and graph neural networks for authorship verification. By feeding the available texts directly into a pre-trained transformer architecture, our model does not need any hand-crafted stylometric features, which are not meaningful in scenarios where the writing style is, at least to some extent, standardized. By incorporating a graph neural network structure, our model can benefit from relations between authors that are meaningful with respect to the verification process. For example, scientific authors are more likely to write about topics that are addressed by their co-authors, and Twitter users tend to post about the same subjects as the people they follow. We experimentally evaluate our model and study to what extent the inclusion of co-authorships enhances verification decisions in bibliometric environments.
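
The summary's central idea, that an author's co-authors carry useful signal for verification, can be illustrated with a small toy sketch. This is not the LG4AV architecture: the bag-of-words `embed` function merely stands in for a pre-trained transformer, `propagate` is a single round of plain mean aggregation rather than a trained graph neural network, and all names, documents, and co-author links are hypothetical.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Stand-in for a pre-trained text encoder: a normalised bag-of-words vector."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def author_profiles(known_docs, vocab):
    """Represent each author by the mean embedding of their known documents."""
    return {a: mean([embed(d, vocab) for d in docs]) for a, docs in known_docs.items()}

def propagate(profiles, coauthors):
    """One round of mean aggregation over each author and their co-authors."""
    out = {}
    for author, vec in profiles.items():
        neighbours = [profiles[b] for b in coauthors.get(author, []) if b in profiles]
        out[author] = mean([vec] + neighbours)
    return out

def verify(doc, author, profiles, vocab):
    """Similarity score between a candidate document and an author profile."""
    return cosine(embed(doc, vocab), profiles[author])

# Hypothetical toy data: titles only, as in the short-text setting of the summary.
known_docs = {
    "alice": ["graph neural networks for node classification"],
    "bob": ["formal concept analysis of lattices"],
    "carol": ["stylometric features for poetry"],
}
coauthors = {"alice": ["bob"], "bob": ["alice"], "carol": []}
vocab = sorted({w for docs in known_docs.values() for d in docs for w in d.split()})

plain = author_profiles(known_docs, vocab)   # text only
smoothed = propagate(plain, coauthors)       # text plus co-author information

candidate = "concept lattices for graph data"
print("alice, text only: ", round(verify(candidate, "alice", plain, vocab), 3))
print("alice, with graph:", round(verify(candidate, "alice", smoothed, vocab), 3))
```

In this toy setting the candidate title overlaps with the topics of alice's co-author bob, so the graph-smoothed profile scores it higher for alice than the text-only profile does. An author without co-authors, such as carol, keeps the same profile.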