Efficient URL and URI Compression
| Published in | Proceedings - International Conference on Computer Communications and Networks pp. 1 - 9 |
|---|---|
| Main Authors | Savins, Felix; Saric, Kevin; Ramachandran, Gowri Sankar; Jurdak, Raja |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 29.07.2024 |
| ISSN | 2637-9430 |
| DOI | 10.1109/ICCCN61486.2024.10637589 |
| Abstract | Web applications use Uniform Resource Identifiers (URIs), interchangeably referred to as Uniform Resource Locators (URLs), to locate resources such as files and web pages on the Internet. Messaging services, firewalls, content distribution frameworks, event logs, databases and datasets store countless URIs. With the proliferation of the Internet, the number of URIs has increased rapidly, demanding significant storage. Several compression schemes exist in the literature for efficiently storing files on hard disks. However, existing compression schemes are designed for generic content, resulting in sub-optimal storage efficiency for standalone URIs. This paper presents a compression scheme specifically designed for URIs. Our contribution is three-fold: a) an empirical analysis of existing compression schemes for storing URIs, b) a design for a novel URI-focused compression scheme that improves on existing schemes, and c) an adaptation of the well-known Huffman coding scheme to URIs using Natural Language Processing (NLP) to create a custom compression dictionary. Evaluation results using five million standalone URI strings show that our novel compression scheme improves storage efficiency by 18%. Furthermore, our customized Huffman coding compression scheme outperforms the standard content-agnostic Huffman technique. Our compression scheme reduces the storage space of a single instance of all URIs in existence (estimated to be more than 130 trillion) by more than 1.2 petabytes (PB) compared to the standard Huffman coding technique. Counting only unique instances of URIs, this minimum saving of 1.2 PB of hard disk space is worth approximately USD 24,000; in practice, savings many orders of magnitude larger may be possible. |
|---|---|
| Author | Ramachandran, Gowri Sankar; Savins, Felix; Jurdak, Raja; Saric, Kevin |
| Author details | 1. Savins, Felix (felix.savins@connect.qut.edu.au); 2. Saric, Kevin (kevin.saric@hdr.qut.edu.au); 3. Ramachandran, Gowri Sankar (g.ramachandran@qut.edu.au); 4. Jurdak, Raja (r.jurdak@qut.edu.au). All authors: Queensland University of Technology, School of Computer Science, Brisbane, Australia |
| ContentType | Conference Proceeding |
| Discipline | Engineering |
| EISBN | 9798350384611 |
| EISSN | 2637-9430 |
| EndPage | 9 |
| ExternalDocumentID | 10637589 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 9 |
| PublicationDate | 2024-July-29 |
| PublicationTitle | Proceedings - International Conference on Computer Communications and Networks |
| PublicationTitleAbbrev | ICCCN |
| PublicationYear | 2024 |
| Publisher | IEEE |
| StartPage | 1 |
| SubjectTerms | Algorithm Analysis; Dictionaries; Firewalls (computing); Hard disks; Internet; Natural language processing; Networking; Storage Optimization; Uniform resource locators; URI Compression; URL Compression; Web pages |
| Title | Efficient URL and URI Compression |
| URI | https://ieeexplore.ieee.org/document/10637589 |
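
The abstract above describes adapting Huffman coding to URIs with a custom compression dictionary. As a rough illustration of that general idea, and not the authors' scheme (this record does not describe their dictionary or how NLP is used to build it), the Python sketch below builds a Huffman code over a symbol alphabet that mixes single characters with a few multi-character tokens. The token list `TOKENS`, the helper names `tokenize`, `huffman_code`, `encode` and `decode`, and the sample URIs are all assumptions made for this example.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical dictionary of frequent URI substrings, hand-picked for this
# sketch. The paper builds its dictionary with NLP; that is not reproduced here.
TOKENS = ["https://", "http://", "www.", ".com", ".org", ".html"]


def tokenize(uri, tokens=TOKENS):
    """Greedy longest-match split into dictionary tokens and single characters."""
    ordered = sorted(tokens, key=len, reverse=True)
    out, i = [], 0
    while i < len(uri):
        # Fall back to the single character when no token matches at position i.
        match = next((t for t in ordered if uri.startswith(t, i)), uri[i])
        out.append(match)
        i += len(match)
    return out


def huffman_code(freqs):
    """Return a prefix code {symbol: bitstring} for a non-empty {symbol: count}."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing one bit to each side.
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the codewords of a sequence of symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {b: s for s, b in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


# Illustrative sample only; not data from the paper.
uris = [
    "https://www.example.com/index.html",
    "https://www.example.org/a/b.html",
    "http://example.com/",
]
char_code = huffman_code(Counter("".join(uris)))
streams = [tokenize(u) for u in uris]
token_code = huffman_code(Counter(s for stream in streams for s in stream))
char_bits = sum(len(encode(u, char_code)) for u in uris)
token_bits = sum(len(encode(s, token_code)) for s in streams)
print(f"character alphabet: {char_bits} bits, token alphabet: {token_bits} bits")
```

The printed bit counts depend entirely on the sample and on the chosen tokens, and they exclude the cost of storing the code table, so they say nothing about the 18% figure reported in the abstract. For scale, dividing the abstract's own numbers gives 1.2 PB over 130 trillion URIs, or roughly 9 bytes saved per URI.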