Differential direct coding: a compression algorithm for nucleotide sequence data

Bibliographic Details
Published in Database : the journal of biological databases and curation Vol. 2009; p. bap013
Main Author Vey, Gregory
Format Journal Article
Language English
Published England Oxford University Press 01.01.2009
ISSN 1758-0463
DOI 10.1093/database/bap013


Abstract While modern hardware can provide vast amounts of inexpensive storage for biological databases, the compression of nucleotide sequence data is still of paramount importance in order to facilitate fast search and retrieval operations through a reduction in disk traffic. This issue becomes even more important in light of the recent increase of very large data sets, such as metagenomes. In this article, I propose the Differential Direct Coding algorithm, a general-purpose nucleotide compression protocol that can differentiate between sequence data and auxiliary data by supporting the inclusion of supplementary symbols that are not members of the set of expected nucleotide bases, thereby offering reconciliation between sequence-specific and general-purpose compression strategies. This algorithm permits a sequence to contain a rich lexicon of auxiliary symbols that can represent wildcards, annotation data and special subsequences, such as functional domains or special repeats. In particular, the representation of special subsequences can be incorporated to provide structure-based coding that increases the overall degree of compression. Moreover, supporting a robust set of symbols removes the requirement of wildcard elimination and restoration phases, resulting in a complexity of O(n) for execution time, making this algorithm suitable for very large data sets. Because this algorithm compresses data on the basis of triplets, it is highly amenable to interpretation as a polypeptide at decompression time. Also, an encoded sequence may be further compressed using other existing algorithms, like gzip, thereby maximizing the final degree of compression. Overall, the Differential Direct Coding algorithm can offer a beneficial impact on disk traffic for database queries and other disk-intensive operations.
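The triplet-based coding with auxiliary symbols described in the abstract can be illustrated with a minimal sketch. The abstract does not give the published code table, so the byte layout below is an assumption made only for illustration: values 0 to 63 stand for the 64 nucleotide triplets, and values 128 and above carry a single non-triplet ASCII symbol (a wildcard such as N, or a leftover base). The names `encode`, `decode`, `TRIPLET_TO_CODE` and `AUX_OFFSET` are hypothetical and are not taken from the article.

```python
import gzip

BASES = "ACGT"

# Assumed code table (not the published one): each triplet maps to a byte 0..63.
TRIPLET_TO_CODE = {
    a + b + c: 16 * i + 4 * j + k
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODE_TO_TRIPLET = {code: triplet for triplet, code in TRIPLET_TO_CODE.items()}

# Assumed convention: bytes >= 128 hold one auxiliary ASCII symbol verbatim.
AUX_OFFSET = 128


def encode(seq: str) -> bytes:
    """Single left-to-right pass: one byte per triplet, one byte per other symbol."""
    out = bytearray()
    i = 0
    while i < len(seq):
        code = TRIPLET_TO_CODE.get(seq[i:i + 3])
        if code is not None:
            out.append(code)  # three bases stored in one byte
            i += 3
        else:
            symbol = ord(seq[i])
            if symbol >= 128:
                raise ValueError("only ASCII symbols are supported in this sketch")
            out.append(AUX_OFFSET + symbol)  # wildcard or leftover base
            i += 1
    return bytes(out)


def decode(data: bytes) -> str:
    """Invert encode(); bytes 64..127 are unused in this sketch."""
    parts = []
    for byte in data:
        if byte < 64:
            parts.append(CODE_TO_TRIPLET[byte])
        elif byte >= AUX_OFFSET:
            parts.append(chr(byte - AUX_OFFSET))
        else:
            raise ValueError(f"unassigned byte value: {byte}")
    return "".join(parts)


if __name__ == "__main__":
    sequence = "ACGTNNACG"
    packed = encode(sequence)
    assert decode(packed) == sequence
    # The encoded bytes can be handed to a general-purpose compressor afterwards.
    print(len(sequence), len(packed), len(gzip.compress(packed)))
```

Because wildcards are encoded in place rather than stripped and restored in separate phases, the sketch makes a single pass over the input, which is consistent with the O(n) behaviour the abstract claims. It does not attempt the structure-based coding of special subsequences.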
Author Vey, Gregory
AuthorAffiliation Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo ON, Canada N2L 3C5
BackLink https://www.ncbi.nlm.nih.gov/pubmed/20157486 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_3390_e21111074
crossref_primary_10_1093_gigascience_giac079
crossref_primary_10_1007_s11227_016_1753_4
crossref_primary_10_1007_s13222_012_0098_2
crossref_primary_10_1136_amiajnl_2013_001694
crossref_primary_10_1093_gigascience_giaa119
ContentType Journal Article
Copyright The Author(s) 2009. Published by Oxford University Press. This is Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Discipline Biology
EISSN 1758-0463
EndPage bap013
ExternalDocumentID 10.1093/database/bap013
PMC2797453
2696197311
20157486
10_1093_database_bap013
Genre Journal Article
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License http://creativecommons.org/licenses/by-nc/2.5/uk
cc-by
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.1093/database/bap013
PMID 20157486
PublicationCentury 2000
PublicationDate 2009-01-01
PublicationDecade 2000
PublicationPlace England
PublicationTitle Database : the journal of biological databases and curation
PublicationTitleAlternate Database (Oxford)
PublicationYear 2009
Publisher Oxford University Press
StartPage bap013
SubjectTerms Algorithms
Genetics
Original
Studies