Efficient URL and URI Compression


Bibliographic Details
Published in: Proceedings - International Conference on Computer Communications and Networks, pp. 1-9
Main Authors: Savins, Felix; Saric, Kevin; Ramachandran, Gowri Sankar; Jurdak, Raja
Format: Conference Proceeding
Language: English
Published: IEEE, 29.07.2024
ISSN: 2637-9430
DOI: 10.1109/ICCCN61486.2024.10637589

Abstract: Web applications use Universal Resource Identifiers (URIs), interchangeably referred to as Uniform Resource Locators (URLs), to locate resources such as files and web pages on the Internet. Messaging services, firewalls, content distribution frameworks, event logs, databases and datasets store countless URIs. Due to the proliferation of the Internet, the number of URIs has increased rapidly, demanding significant storage. The literature presents several compression schemes for efficiently storing files on hard disks. However, existing compression schemes are designed for generic content, resulting in sub-optimal storage efficiency for standalone URIs. This paper presents a compression scheme specifically designed for URIs. Our contribution is three-fold: a) an empirical analysis of existing compression schemes for storing URIs, b) the design of a novel URI-focused compression scheme that improves on existing schemes, and c) an adaptation of the well-known Huffman coding scheme to URIs, using Natural Language Processing (NLP) to create a custom compression dictionary. Evaluation results using five million standalone URI strings show that our novel compression scheme improves storage efficiency by 18%. Furthermore, our customized Huffman coding compression scheme outperforms the standard content-agnostic Huffman technique. Compared to the standard Huffman coding technique, our compression scheme reduces the storage space of a single instance of all URIs in existence (estimated at more than 130 trillion) by more than 1.2 petabytes (PB). Considering only unique instances of URIs, this bare minimum of 1.2 PB of hard disk savings is worth approximately USD 24,000; in practice, savings many orders of magnitude larger may be possible.
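The abstract mentions adapting Huffman coding to URIs with a custom dictionary. The sketch below is a minimal illustration of that general idea, not the authors' method: it trains a Huffman code on symbol frequencies from a small URI corpus, where a symbol is either a single character or an entry from a hand-picked token list. That list (`HYPOTHETICAL_TOKENS`) is an assumption standing in for the NLP-derived dictionary the paper builds, and the function names `tokenize`, `build_huffman_code`, `encode` and `decode` are illustrative only.

```python
import heapq
from collections import Counter
from itertools import count

# HYPOTHETICAL token list standing in for an NLP-derived URI dictionary.
HYPOTHETICAL_TOKENS = ["https://", "http://", "www.", ".com", ".org", "/"]


def tokenize(uri, tokens=HYPOTHETICAL_TOKENS):
    """Greedy longest-match split of a URI into dictionary tokens or single characters."""
    ordered = sorted(tokens, key=len, reverse=True)
    symbols, i = [], 0
    while i < len(uri):
        for tok in ordered:
            if uri.startswith(tok, i):
                symbols.append(tok)
                i += len(tok)
                break
        else:
            # No dictionary token matches at this position: emit one character.
            symbols.append(uri[i])
            i += 1
    return symbols


def build_huffman_code(corpus, tokens=HYPOTHETICAL_TOKENS):
    """Build a prefix-free code from symbol frequencies in a URI training corpus."""
    freq = Counter()
    for uri in corpus:
        freq.update(tokenize(uri, tokens))
    if not freq:
        raise ValueError("training corpus is empty")
    tie = count()  # tie-breaker so heapq never compares tree nodes directly
    heap = [(f, next(tie), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    # Repeatedly merge the two least frequent subtrees into one.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a dictionary token or a single character
            codes[node] = prefix or "0"  # a one-symbol alphabet still needs 1 bit

    walk(heap[0][2], "")
    return codes


def encode(uri, codes, tokens=HYPOTHETICAL_TOKENS):
    """Encode a URI as a bit string; raises KeyError for symbols unseen in training."""
    return "".join(codes[sym] for sym in tokenize(uri, tokens))


def decode(bits, codes):
    """Decode a bit string; this works because no codeword is a prefix of another."""
    reverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


if __name__ == "__main__":
    corpus = ["https://www.example.com/index.html", "https://www.example.org/a/b"]
    codes = build_huffman_code(corpus)
    uri = "https://www.example.com/a"
    bits = encode(uri, codes)
    print(len(bits), "bits vs", 8 * len(uri), "bits as 8-bit ASCII")
    assert decode(bits, codes) == uri
```

Because the code is trained on URI text, frequent multi-character substrings such as `https://` become single symbols with short codewords, which is the intuition behind a URI-specific dictionary. A character that never appears in the training corpus makes `encode` raise `KeyError`; a practical scheme would need an escape symbol or fallback, which this sketch omits.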

Author Details:
Savins, Felix (felix.savins@connect.qut.edu.au)
Saric, Kevin (kevin.saric@hdr.qut.edu.au)
Ramachandran, Gowri Sankar (g.ramachandran@qut.edu.au)
Jurdak, Raja (r.jurdak@qut.edu.au)
Affiliation (all authors): Queensland University of Technology, School of Computer Science, Brisbane, Australia
EISBN: 9798350384611
Conference Abbreviation: ICCCN
Subject Terms: Algorithm, Analysis, Dictionaries, Firewalls (computing), Hard disks, Internet, Natural language processing, Networking, Storage Optimization, Uniform resource locators, URI Compression, URL Compression, Web pages
IEEE Xplore: https://ieeexplore.ieee.org/document/10637589