Accelerated preprocessing in task of searching substrings in a string

Bibliographic Details
Published in Advanced engineering research (Rostov-na-Donu, Russia), Vol. 19, no. 3, pp. 290-300
Main Authors Mazurenko, A. V.; Boldyrikhin, N. V.
Format Journal Article
Language English, Russian
Publisher Don State Technical University
Publication date 2019-10-04
ISSN 1992-5980
EISSN 1992-6006, 2687-1653
DOI 10.23947/1992-5980-2019-19-3-290-300

Abstract Introduction. The rapid development of systems such as Yandex and Google has made the task of searching for substrings in a string highly relevant, and approaches to solving it are being actively investigated. The task arises in building database management systems that support associative search. It also applies to information security problems and to antivirus software, where substring search algorithms are used for signature-based detection. Materials and Methods. The solution is based on the Aho-Corasick algorithm, a standard technique for searching substrings in a string, combined with a new approach to preprocessing. Research Results. It is shown that the transition function and the suffix links can be constructed from suffix arrays and special mappings. Investigating the relationship between the prefix tree and suffix arrays led to a fundamentally new method of constructing the transition and error (failure) functions. These results substantially reduce the time spent on preprocessing a set of pattern strings over an integer alphabet. The paper presents eight algorithms, evaluates them, and compares the results with previously known ones. Two theorems and eight lemmas are proved, and two examples illustrate features of the practical application of the proposed preprocessing procedure. Discussion and Conclusions. The proposed preprocessing procedure is based on the connection between the suffix array built over the set of pattern strings and the construction of the transition and error functions in the initial stages of the Aho-Corasick algorithm. Unlike the traditional approach, it requires an algorithm that builds the suffix array in linear time.
Thus, the paper describes algorithms that, for an integer alphabet, significantly reduce the preprocessing time for a set of pattern strings compared with the approach proposed in the original Aho-Corasick algorithm. The results can be used in antivirus programs that search the memory of a computer system for signatures of malicious data objects. In addition, this approach to substring search will significantly speed up database management systems that use associative search.
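The abstract contrasts the proposed method with the traditional Aho-Corasick preprocessing, in which the prefix tree (the transition function) is built first and the error function is then computed by a breadth-first traversal. As a point of reference, here is a minimal Python sketch of that classic construction and of the search phase. It is not the authors' suffix-array-based procedure, which the abstract does not describe in enough detail to reproduce. The function names `build_automaton` and `search` are hypothetical, patterns may be strings or sequences of integers (an integer alphabet), and the sketch assumes no pattern is empty.

```python
from collections import deque
from collections.abc import Hashable, Sequence

# Hypothetical helpers illustrating the classic Aho-Corasick preprocessing,
# not the suffix-array-based method proposed in the paper.


def build_automaton(patterns: Sequence[Sequence[Hashable]]):
    """Build the trie edges (goto), the error/failure function, and outputs."""
    goto: list[dict[Hashable, int]] = [{}]  # goto[v][c]: child of trie node v on symbol c
    fail: list[int] = [0]                   # fail[v]: error (failure) link of node v
    out: list[list[int]] = [[]]             # out[v]: indices of patterns recognized at v

    # Step 1: insert every pattern into the prefix tree; node 0 is the root.
    for idx, pattern in enumerate(patterns):
        v = 0
        for c in pattern:
            if c not in goto[v]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[v][c] = len(goto) - 1
            v = goto[v][c]
        out[v].append(idx)

    # Step 2: breadth-first traversal. fail[v] is the node for the longest
    # proper suffix of v's string that is also a prefix of some pattern.
    queue = deque(goto[0].values())  # depth-1 nodes keep fail = 0 (the root)
    while queue:
        u = queue.popleft()
        for c, v in goto[u].items():
            f = fail[u]
            while f and c not in goto[f]:
                f = fail[f]
            fail[v] = goto[f].get(c, 0)
            # Patterns ending at fail[v] also end at v.
            out[v] = out[v] + out[fail[v]]
            queue.append(v)
    return goto, fail, out


def search(text: Sequence[Hashable],
           patterns: Sequence[Sequence[Hashable]]) -> list[tuple[int, int]]:
    """Return (start position, pattern index) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    matches: list[tuple[int, int]] = []
    v = 0
    for i, c in enumerate(text):
        # Follow error links until a node has an edge on c, or the root is reached.
        while v and c not in goto[v]:
            v = fail[v]
        v = goto[v].get(c, 0)
        for idx in out[v]:
            matches.append((i - len(patterns[idx]) + 1, idx))
    return matches


print(search("ushers", ["he", "she", "his", "hers"]))  # [(1, 1), (2, 0), (2, 3)]
print(search([1, 2, 3], [(1, 2), (2, 3)]))            # [(0, 0), (1, 1)]
```

Here the per-node dictionaries hold only trie edges, so the full transition function is evaluated lazily by following error links during the search; precomputing a complete transition table for every symbol is a common alternative when the alphabet is small.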
Author affiliations
Mazurenko, A. V. (ORCID 0000-0001-9541-3374), DDoS-GUARD LLC
Boldyrikhin, N. V. (ORCID 0000-0002-9896-9543), Don State Technical University
Discipline Engineering
Access Open access; peer reviewed
Open access link https://doaj.org/article/87233a64533a4167bfbb0e109567d924
License CC BY (https://vestnik.donstu.ru/jour/about/editorialPolicies#openAccessPolicy)
Page count 11
Subject terms Aho-Corasick algorithm; error function; information search; prefix tree; string searching; suffix array; transition function
Full text https://www.vestnik-donstu.ru/jour/article/download/1537/1495