Formally Verified Suffix Array Construction

Bibliographic Details
Published in: Journal of Automated Reasoning, Vol. 69, no. 3, p. 21
Main Authors: Cheung, Louis; Moffat, Alistair; Rizkallah, Christine
Format: Journal Article
Language: English
Published: Dordrecht, Springer Netherlands, 01.09.2025
Springer Nature B.V
ISSN: 0168-7433, 1573-0670
DOI: 10.1007/s10817-025-09735-8

Summary: Suffix arrays are a data structure with numerous real-world applications. They are extensively used in text retrieval and data compression applications, including query suggestion mechanisms in web search, and in bioinformatics tools for DNA sequencing and matching. This wide applicability means that algorithms for constructing suffix arrays are of great practical importance. The SA-IS algorithm is an efficient but conceptually complex suffix array construction technique, and implementing it requires a deep understanding of its underlying theory. As a critical step towards developing a provably correct and efficient implementation, we have developed the SA-IS algorithm in Isabelle/HOL and formally verified that it is equivalent to a mathematical functional specification of suffix arrays, a task that required verifying a wide range of underlying properties of strings and suffixes. We also used Isabelle’s code extraction facilities to extract an executable Haskell implementation of SA-IS. Although the extracted code is inefficient because it uses lists and natural numbers rather than arrays and machine words, it demonstrates that our verified HOL implementation of SA-IS can be refined to an executable implementation in its current form.
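
To make the notion of a "functional specification of suffix arrays" concrete, the following minimal Python sketch states what a suffix array is: the start positions of all suffixes of a text, listed in lexicographic order of those suffixes. It is a naive sort-based illustration, not the SA-IS algorithm and not the Isabelle/HOL specification from the article; the function names `naive_suffix_array` and `is_suffix_array` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def naive_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text`,
    ordered lexicographically by the suffix they begin.

    Naive sort-based construction (not SA-IS): it materialises every
    suffix as a sort key, so it is only suitable for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def is_suffix_array(text: str, sa: Sequence[int]) -> bool:
    """Check the two defining properties of a suffix array (assumed
    informal reading of the specification, not the paper's definition):

    1. `sa` is a permutation of 0 .. len(text) - 1.
    2. The suffixes it indexes are in strictly increasing order.
    """
    n = len(text)
    if sorted(sa) != list(range(n)):
        return False
    return all(text[sa[k]:] < text[sa[k + 1]:] for k in range(n - 1))


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order:
    # a(5), ana(3), anana(1), banana(0), na(4), nana(2)
    print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

A checker such as `is_suffix_array` reflects the general idea of verifying a construction algorithm against a specification: any candidate output, however it was computed, can be compared against the defining properties rather than against another implementation.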