Extracting the Main Content of Web Pages Using the First Impression Area

Bibliographic Details
Published in IEEE Access Vol. 10; pp. 129958-129969
Main Authors Jung, Geunseong; Han, Sungjae; Kim, Hansung; Kim, Kwanguk; Cha, Jaehyuk
Format Journal Article
Language English
Published Piscataway IEEE 01.01.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 2169-3536
DOI 10.1109/ACCESS.2022.3229080

Abstract Extracting the main content from a web page is essential in various applications such as web crawlers and browser reader modes. Existing extraction methods using text-based algorithms and features for English text can be ineffective for non-English web pages. This study proposes a main content extraction method that obtains visual and structural features from the rendered web page. Our method uses the first impression area (FIA), a part of a web page that users initially view. In this area, websites have applied many techniques that enable users to find the main content easily. Using the non-textual properties in the FIA, our method selects three points with high content area density and expands the area from each point until it meets several structural and visual-based conditions. We evaluated our method, browsers' (Mozilla Firefox and Google Chrome) reader modes, and existing main content extraction methods on multilingual datasets using two measures: Longest Common Subsequences and matched text blocks. The results showed that our method performed better than other methods in both English (up to 46%, matched text blocks F0.5) and non-English (up to 42%, matched text blocks F0.5) web pages.
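
The abstract names two evaluation ingredients, Longest Common Subsequences and an F0.5 score, without giving their exact definitions. The following is a minimal sketch of how such a score could be computed, not the authors' evaluation code. It assumes word-level tokens, precision defined as LCS length divided by the extracted length, and recall defined as LCS length divided by the reference length; the function names `lcs_length`, `f_beta`, and `lcs_scores` are hypothetical.

```python
from typing import Sequence, Tuple


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # DP row for the previous token of `a`
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 weights precision above recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def lcs_scores(extracted: Sequence[str], reference: Sequence[str],
               beta: float = 0.5) -> Tuple[float, float, float]:
    """Assumed definitions: P = LCS/len(extracted), R = LCS/len(reference)."""
    if not extracted or not reference:
        return 0.0, 0.0, 0.0
    common = lcs_length(extracted, reference)
    precision = common / len(extracted)
    recall = common / len(reference)
    return precision, recall, f_beta(precision, recall, beta)


if __name__ == "__main__":
    extracted = "menu the main article text".split()
    reference = "the main article text continues here".split()
    print(lcs_scores(extracted, reference))
```

With beta set to 0.5, an extractor that pulls in extra boilerplate (lower precision) is penalized more heavily than one that misses part of the main content (lower recall).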
Author Affiliations
Jung, Geunseong (ORCID 0000-0003-1722-4214): Department of Computer Science, Hanyang University, Seoul, South Korea
Han, Sungjae: JEI Group, Seoul, South Korea
Kim, Hansung: Department of Sociology, Hanyang University, Seoul, South Korea
Kim, Kwanguk (ORCID 0000-0002-4184-2058): Department of Computer Science, Hanyang University, Seoul, South Korea
Cha, Jaehyuk: Department of Computer Science, Hanyang University, Seoul, South Korea
Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Funding Ministry of Science and ICT, South Korea (funder ID 10.13039/501100014188); grants 2016R1A2B4016591 and 2018R1A5A7059549
License CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode)
Subjects Algorithms
Block detection
Boilerplate removal
Feature extraction
Main content extraction
Web content extraction
Web mining
Web segmentation
Websites