Accelerated preprocessing in task of searching substrings in a string
| Published in | Advanced engineering research (Rostov-na-Donu, Russia) Vol. 19; no. 3; pp. 290 - 300 |
|---|---|
| Main Authors | Mazurenko, A. V.; Boldyrikhin, N. V. |
| Format | Journal Article |
| Language | English, Russian |
| Published | Don State Technical University, 04.10.2019 |
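| Subjects | Aho-Corasick algorithm; error function; information search; prefix tree; string searching; suffix array; transition function |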
| ISSN | 1992-5980, 1992-6006, 2687-1653 |
| DOI | 10.23947/1992-5980-2019-19-3-290-300 |
| Abstract | Introduction. The rapid development of search systems such as Yandex and Google has made the task of searching for substrings in a string highly relevant, and approaches to solving it are being actively investigated. The task arises in building database management systems that support associative search. It also applies to information security problems and to the development of antivirus programs, where substring search algorithms are used for signature-based detection. Materials and Methods. The solution is based on the Aho-Corasick algorithm, a standard technique for searching for substrings in a string, combined with a new approach to preprocessing. Research Results. It is shown that the transition function and the suffix links can be constructed through suffix arrays and special mappings. The relationship between the prefix tree and suffix arrays was investigated, which led to a fundamentally new method of constructing the transition and error (failure) functions. The results make it possible to substantially reduce the time spent on preprocessing a set of pattern strings when an integer alphabet is used. The paper presents eight algorithms, evaluates them, and compares the results with previously known ones. Two theorems and eight lemmas are proved. Two examples illustrating the practical application of the developed preprocessing procedure are given. Discussion and Conclusions. The proposed preprocessing procedure is based on the relationship between the suffix array built from a set of pattern strings and the construction of the transition and error functions at the initial stages of the Aho-Corasick algorithm. This approach differs from the traditional one and requires algorithms that build a suffix array in linear time. The paper thus describes algorithms that significantly reduce the preprocessing time for a set of pattern strings, for a certain type of alphabet, compared with the known approach proposed in the Aho-Corasick algorithm. The results can be used in antivirus programs that search for signatures of malicious data objects in the memory of a computer system. In addition, this approach to substring search will significantly speed up database management systems that use associative search. |
| Author | Mazurenko, A. V.; Boldyrikhin, N. V. |
| Author_xml | 1. Mazurenko, A. V. (ORCID 0000-0001-9541-3374; DDoS-GUARD LLC); 2. Boldyrikhin, N. V. (ORCID 0000-0002-9896-9543; Don State Technical University) |
| Cites_doi | 10.1145/1217856.1217858 10.1093/bib/bbt081 10.46298/dmtcs.597 10.23947/1992-5980-2018-18-2-246-255 |
| ContentType | Journal Article |
| Discipline | Engineering |
| EISSN | 1992-6006, 2687-1653 |
| EndPage | 300 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 3 |
| Language | English, Russian |
| License | https://vestnik.donstu.ru/jour/about/editorialPolicies#openAccessPolicy cc-by |
| ORCID | 0000-0002-9896-9543 0000-0001-9541-3374 |
| OpenAccessLink | https://doaj.org/article/87233a64533a4167bfbb0e109567d924 |
| PageCount | 11 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-10-04 |
| PublicationDecade | 2010 |
| PublicationTitle | Advanced engineering research (Rostov-na-Donu, Russia) |
| PublicationYear | 2019 |
| Publisher | Don State Technical University |
| StartPage | 290 |
| SubjectTerms | Aho-Corasick algorithm; error function; information search; prefix tree; string searching; suffix array; transition function |
| Title | Accelerated preprocessing in task of searching substrings in a string |
| URI | https://www.vestnik-donstu.ru/jour/article/download/1537/1495 https://doaj.org/article/87233a64533a4167bfbb0e109567d924 |
| UnpaywallVersion | publishedVersion |
| Volume | 19 |
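
The abstract contrasts the paper's suffix-array-based preprocessing with the traditional Aho-Corasick construction of the transition and error (failure) functions. For orientation, the traditional construction can be sketched roughly as follows. This is a minimal textbook-style sketch, not the method proposed in the paper: the names `build_automaton` and `search` are illustrative assumptions, and the suffix-array technique itself is not reproduced here.

```python
from collections import deque


def build_automaton(patterns):
    """Classical Aho-Corasick preprocessing (illustrative baseline only).

    Returns three parallel lists indexed by state number:
      goto[s]  - dict mapping a symbol to the next trie state
      fail[s]  - state of the longest proper suffix of s that is a trie prefix
      out[s]   - indices of the patterns that end at state s
    """
    goto, fail, out = [{}], [0], [[]]

    # Phase 1: build the prefix tree (trie) of all patterns.
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Phase 2: breadth-first traversal to fill in the failure function.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
            queue.append(nxt)
    return goto, fail, out


def search(text, patterns):
    """Return (start_position, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

For example, `search("ushers", ["he", "she", "his", "hers"])` reports `she` at position 1 and both `he` and `hers` at position 2. The breadth-first pass in phase 2 is the preprocessing step that, according to the abstract, the paper replaces with a construction driven by a linear-time suffix array.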