Extracting the Main Content of Web Pages Using the First Impression Area
| Published in | IEEE Access Vol. 10; pp. 129958–129969 |
|---|---|
| Main Authors | Jung, Geunseong; Han, Sungjae; Kim, Hansung; Kim, Kwanguk; Cha, Jaehyuk |
| Format | Journal Article |
| Language | English |
| Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.01.2022 |
| ISSN | 2169-3536 |
| DOI | 10.1109/ACCESS.2022.3229080 |
| Abstract | Extracting the main content from a web page is essential in various applications such as web crawlers and browser reader modes. Existing extraction methods using text-based algorithms and features for English text can be ineffective for non-English web pages. This study proposes a main content extraction method that obtains visual and structural features from the rendered web page. Our method uses the first impression area (FIA), a part of a web page that users initially view. In this area, websites have applied many techniques that enable users to find the main content easily. Using the non-textual properties in the FIA, our method selects three points with high content area density and expands the area from each point until it meets several structural and visual-based conditions. We evaluated our method, browsers' (Mozilla Firefox and Google Chrome) reader modes, and existing main content extraction methods on multilingual datasets using two measures: Longest Common Subsequences and matched text blocks. The results showed that our method performed better than other methods in both English (up to 46%, matched text blocks $F_{0.5}$) and non-English (up to 42%, matched text blocks $F_{0.5}$) web pages. |
| Author | Cha, Jaehyuk; Han, Sungjae; Kim, Kwanguk; Kim, Hansung; Jung, Geunseong |
| Author affiliations | 1. Jung, Geunseong (ORCID 0000-0003-1722-4214), Department of Computer Science, Hanyang University, Seoul, South Korea; 2. Han, Sungjae, JEI Group, Seoul, South Korea; 3. Kim, Hansung, Department of Sociology, Hanyang University, Seoul, South Korea; 4. Kim, Kwanguk (ORCID 0000-0002-4184-2058), Department of Computer Science, Hanyang University, Seoul, South Korea; 5. Cha, Jaehyuk, Department of Computer Science, Hanyang University, Seoul, South Korea |
| CODEN | IAECCG |
| CitedBy_id | crossref_primary_10_9728_dcs_2023_24_4_691 crossref_primary_10_1016_j_softx_2023_101501 |
| ContentType | Journal Article |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| Discipline | Engineering |
| EISSN | 2169-3536 |
| EndPage | 129969 |
| Genre | orig-research |
| GrantInformation | Ministry of Science and ICT, South Korea (grant IDs 2016R1A2B4016591 and 2018R1A5A7059549; funder ID 10.13039/501100014188) |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| License | https://creativecommons.org/licenses/by/4.0/legalcode cc-by |
| ORCID | 0000-0003-1722-4214 0000-0002-4184-2058 0000-0003-4869-901X 0000-0001-9387-7319 |
| OpenAccessLink | https://ieeexplore.ieee.org/ielx7/6287639/6514899/09984637.pdf |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-01-01 |
| PublicationDecade | 2020 |
| PublicationPlace | Piscataway |
| PublicationTitle | IEEE Access |
| PublicationTitleAbbrev | Access |
| PublicationYear | 2022 |
| Publisher | IEEE (The Institute of Electrical and Electronics Engineers, Inc.) |
| StartPage | 129958 |
| SubjectTerms | Algorithms; Block detection; Boilerplate removal; Feature extraction; Main content extraction; Web content extraction; Web mining; Web segmentation; Websites |
| Title | Extracting the Main Content of Web Pages Using the First Impression Area |
| URI | https://ieeexplore.ieee.org/document/9984637 https://www.proquest.com/docview/2756561657 https://ieeexplore.ieee.org/ielx7/6287639/6514899/09984637.pdf https://doaj.org/article/312866263b8f44c8b6b8849767f67de3 |
| Volume | 10 |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Extracting+the+Main+Content+of+Web+Pages+Using+the+First+Impression+Area&rft.jtitle=IEEE+access&rft.au=Jung%2C+Geunseong&rft.au=Han%2C+Sungjae&rft.au=Kim%2C+Hansung&rft.au=Kim%2C+Kwanguk&rft.date=2022-01-01&rft.issn=2169-3536&rft.eissn=2169-3536&rft.volume=10&rft.spage=129958&rft.epage=129969&rft_id=info:doi/10.1109%2FACCESS.2022.3229080&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_ACCESS_2022_3229080 |
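The abstract reports results with two measures, Longest Common Subsequences and matched text blocks, summarized as $F_{0.5}$, where $F_\beta = (1+\beta^2)PR / (\beta^2 P + R)$ and $\beta = 0.5$ weights precision more than recall. The following is a minimal sketch of such metrics, not the paper's evaluation code: it assumes whitespace tokenization for the LCS score (character-level tokens such as `list(text)` may suit languages written without spaces better) and treats a block as matched when its whitespace-normalized text equals an unmatched gold block. The function names are hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence (O(len(a) * len(b)) DP, two rows)."""
    if not a or not b:
        return 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall; beta < 1 favours precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def lcs_scores(extracted: str, gold: str, beta: float = 0.5) -> tuple:
    """Token-level LCS precision, recall and F-beta (assumed: whitespace tokens)."""
    ext, ref = extracted.split(), gold.split()
    common = lcs_length(ext, ref)
    p = common / len(ext) if ext else 0.0
    r = common / len(ref) if ref else 0.0
    return p, r, f_beta(p, r, beta)


def block_scores(extracted_blocks, gold_blocks, beta: float = 0.5) -> tuple:
    """Matched-text-block precision, recall and F-beta (assumed: exact normalized match)."""
    def norm(s: str) -> str:
        return " ".join(s.split())

    ext = [norm(b) for b in extracted_blocks if b.strip()]
    gold = [norm(b) for b in gold_blocks if b.strip()]
    unmatched = list(gold)  # each gold block may be matched at most once
    matched = 0
    for blk in ext:
        if blk in unmatched:
            unmatched.remove(blk)
            matched += 1
    p = matched / len(ext) if ext else 0.0
    r = matched / len(gold) if gold else 0.0
    return p, r, f_beta(p, r, beta)
```

For example, `lcs_scores("the cat sat on mat", "the cat sat on the mat")` gives precision 1.0 and recall 5/6, so $F_{0.5} = 25/26$. With $\beta = 0.5$, an extractor that returns less boilerplate (higher precision) is rewarded more than one that recovers slightly more of the main content.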
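The abstract only outlines the FIA procedure: select three points with high content area density inside the first impression area, then expand a region from each point until structural and visual conditions are met. The sketch below is a hypothetical illustration of that idea, not the authors' algorithm. The `Box` tree, the 100-px density grid, the 1000×700 FIA in the usage note, and both stopping rules (`max_area_ratio`, `min_density`) are assumptions introduced here; the paper's actual conditions are not given in this record.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)
class Box:
    """Hypothetical rendered element: a bounding box in a DOM-like tree."""
    name: str
    x: float
    y: float
    w: float
    h: float
    is_content: bool = False  # assumed flag marking text/image leaves
    children: list[Box] = field(default_factory=list)
    parent: Optional[Box] = field(default=None, repr=False)

    def add(self, child: Box) -> Box:
        child.parent = self
        self.children.append(child)
        return child

    def area(self) -> float:
        return self.w * self.h

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def leaves(box: Box) -> Iterator[Box]:
    if not box.children:
        yield box
    for child in box.children:
        yield from leaves(child)


def overlap(box: Box, x: float, y: float, w: float, h: float) -> float:
    """Intersection area of a box with an axis-aligned rectangle."""
    dx = min(box.x + box.w, x + w) - max(box.x, x)
    dy = min(box.y + box.h, y + h) - max(box.y, y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def seed_points(root: Box, fia_w: int, fia_h: int, cell: int = 100, k: int = 3):
    """Centres of the k FIA grid cells with the highest content-area density."""
    content = [b for b in leaves(root) if b.is_content]
    scored = []
    for gy in range(0, fia_h, cell):
        for gx in range(0, fia_w, cell):
            covered = sum(overlap(b, gx, gy, cell, cell) for b in content)
            scored.append((covered / (cell * cell), gx + cell / 2, gy + cell / 2))
    scored.sort(key=lambda t: t[0], reverse=True)  # stable: ties keep scan order
    return [(x, y) for density, x, y in scored[:k] if density > 0]


def content_density(box: Box) -> float:
    covered = sum(b.area() for b in leaves(box) if b.is_content)
    return covered / box.area() if box.area() else 0.0


def deepest_at(box: Box, px: float, py: float) -> Box:
    """Deepest descendant containing the point (assumes `box` itself contains it)."""
    for child in box.children:
        if child.contains(px, py):
            return deepest_at(child, px, py)
    return box


def expand(root: Box, px: float, py: float,
           max_area_ratio: float = 0.6, min_density: float = 0.3) -> Box:
    """Climb from the seed toward the root while the assumed rules still allow it."""
    node = deepest_at(root, px, py)
    while node.parent is not None:
        parent = node.parent
        if parent.area() > max_area_ratio * root.area():
            break  # assumed rule: region would cover too much of the page
        if content_density(parent) < min_density:
            break  # assumed rule: parent is mostly non-content space
        node = parent
    return node


def demo_page() -> Box:
    """Hypothetical page: header, main column with an article, and a sidebar ad."""
    body = Box("body", 0, 0, 1000, 2000)
    body.add(Box("header", 0, 0, 1000, 100)).add(Box("logo", 10, 10, 80, 80))
    article = body.add(Box("main", 0, 100, 700, 1900)).add(Box("article", 20, 120, 660, 1000))
    article.add(Box("p1", 20, 120, 660, 300, is_content=True))
    article.add(Box("p2", 20, 420, 660, 300, is_content=True))
    article.add(Box("img", 20, 720, 660, 400, is_content=True))
    body.add(Box("sidebar", 700, 100, 300, 1900)).add(Box("ad", 710, 110, 280, 250))
    return body
```

With a 1000×700 FIA, `seed_points(demo_page(), 1000, 700)` returns three seeds inside `p1`, and `expand` climbs from each one to `article`, stopping before `main` because `main` exceeds the assumed area limit. Climbing an element tree keeps every candidate region aligned with the page's own structure, while the area and density checks stand in for the visual conditions.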