GFTE: Graph-Based Financial Table Extraction

Bibliographic Details
Published in Pattern Recognition. ICPR International Workshops and Challenges, Vol. 12662; pp. 644-658
Main Authors Li, Yiren; Huang, Zheng; Yan, Junchi; Zhou, Yi; Ye, Fan; Liu, Xianhui
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2021
Springer International Publishing
Series Lecture Notes in Computer Science
Subjects Deep learning; Document analysis; Document image processing
ISBN 9783030687892
3030687899
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-030-68790-8_50

Abstract Tabular data is a crucial form of information expression: it organizes data in a standard structure for easy retrieval and comparison. However, in the financial industry and many other fields, tables are often disclosed in unstructured digital files, e.g. Portable Document Format (PDF) files and images, from which they are difficult to extract directly. In this paper, to facilitate deep-learning-based table extraction from unstructured digital files, we publish a standard Chinese dataset named FinTab, which contains more than 1,600 financial tables of diverse kinds and their corresponding structure representations in JSON. In addition, we propose a novel graph-based convolutional neural network model named GFTE as a baseline for future comparison. GFTE integrates image, position, and textual features for precise edge prediction and achieves good overall results (https://github.com/Irene323/GFTE).
Author_xml – sequence: 1
  givenname: Yiren
  orcidid: 0000-0002-8684-628X
  surname: Li
  fullname: Li, Yiren
  email: irene716@sjtu.edu.cn
– sequence: 2
  givenname: Zheng
  surname: Huang
  fullname: Huang, Zheng
– sequence: 3
  givenname: Junchi
  surname: Yan
  fullname: Yan, Junchi
– sequence: 4
  givenname: Yi
  surname: Zhou
  fullname: Zhou, Yi
– sequence: 5
  givenname: Fan
  surname: Ye
  fullname: Ye, Fan
– sequence: 6
  givenname: Xianhui
  surname: Liu
  fullname: Liu, Xianhui
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2021
Discipline Applied Sciences
Computer Science
EISBN 9783030687908
3030687902
EISSN 1611-3349
Editor Del Bimbo, Alberto
Bertini, Marco
Vezzani, Roberto
Sclaroff, Stan
Mei, Tao
Farinella, Giovanni Maria
Cucchiara, Rita
Escalante, Hugo Jair
EndPage 658
IsPeerReviewed false
IsScholarly false
LCCallNum TA1634
OCLC 1241066755
PageCount 15
PublicationCentury 2000
PublicationDate 2021
PublicationDateYYYYMMDD 2021-01-01
PublicationDate_xml – year: 2021
  text: 2021
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Image Processing, Computer Vision, Pattern Recognition, and Graphics
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle Virtual Event, January 10-15, 2021, Proceedings, Part II
PublicationTitle Pattern Recognition. ICPR International Workshops and Challenges
PublicationYear 2021
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  orcidid: 0000-0001-8816-2693
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  surname: Yung
  fullname: Yung, Moti
StartPage 644
SubjectTerms Deep learning
Document analysis
Document image processing
Title GFTE: Graph-Based Financial Table Extraction
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6508429&ppg=659
http://link.springer.com/10.1007/978-3-030-68790-8_50
Volume 12662