Differential direct coding: a compression algorithm for nucleotide sequence data

Bibliographic Details
Published in	Database : the journal of biological databases and curation Vol. 2009; p. bap013
Main Author	Vey, Gregory
Format	Journal Article
Language	English
Published	Oxford, England: Oxford University Press, 01.01.2009
Subjects	Algorithms Genetics Original Studies
ISSN	1758-0463
DOI	10.1093/database/bap013

Abstract	While modern hardware can provide vast amounts of inexpensive storage for biological databases, the compression of nucleotide sequence data is still of paramount importance in order to facilitate fast search and retrieval operations through a reduction in disk traffic. This issue becomes even more important in light of the recent increase of very large data sets, such as metagenomes. In this article, I propose the Differential Direct Coding algorithm, a general-purpose nucleotide compression protocol that can differentiate between sequence data and auxiliary data by supporting the inclusion of supplementary symbols that are not members of the set of expected nucleotide bases, thereby offering reconciliation between sequence-specific and general-purpose compression strategies. This algorithm permits a sequence to contain a rich lexicon of auxiliary symbols that can represent wildcards, annotation data and special subsequences, such as functional domains or special repeats. In particular, the representation of special subsequences can be incorporated to provide structure-based coding that increases the overall degree of compression. Moreover, supporting a robust set of symbols removes the requirement of wildcard elimination and restoration phases, resulting in a complexity of O(n) for execution time, making this algorithm suitable for very large data sets. Because this algorithm compresses data on the basis of triplets, it is highly amenable to interpretation as a polypeptide at decompression time. Also, an encoded sequence may be further compressed using other existing algorithms, like gzip, thereby maximizing the final degree of compression. Overall, the Differential Direct Coding algorithm can offer a beneficial impact on disk traffic for database queries and other disk-intensive operations.
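The abstract describes triplet-based coding with room for auxiliary symbols, but this record does not reproduce the article's code table. The following is therefore only a minimal sketch under stated assumptions, not the published encoding: each triplet over the bases A, C, G and T is assumed to map to one byte in the range 128 to 191, and any other character (wildcards such as N, annotation text) is assumed to pass through as a single 7-bit ASCII byte. The names `TRIPLET_TO_CODE`, `encode` and `decode` are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# Assumed layout (not taken from the article): the 64 possible triplets
# occupy byte values 128..191, so the high bit marks "sequence data".
TRIPLET_TO_CODE = {
    "".join(t): 128 + i for i, t in enumerate(product(BASES, repeat=3))
}
CODE_TO_TRIPLET = {code: triplet for triplet, code in TRIPLET_TO_CODE.items()}


def encode(seq: str) -> bytes:
    """Single left-to-right pass: one byte per triplet, else one byte per symbol."""
    out = bytearray()
    i = 0
    while i < len(seq):
        code = TRIPLET_TO_CODE.get(seq[i:i + 3])
        if code is not None:
            # Three expected bases collapse into a single byte.
            out.append(code)
            i += 3
        else:
            # Auxiliary symbol (or a base next to one): store it verbatim.
            value = ord(seq[i])
            if value >= 128:
                raise ValueError("auxiliary symbols must be 7-bit ASCII")
            out.append(value)
            i += 1
    return bytes(out)


def decode(data: bytes) -> str:
    """Invert encode(): high-bit bytes expand to triplets, others are literals."""
    parts = []
    for b in data:
        if b >= 128:
            if b not in CODE_TO_TRIPLET:
                raise ValueError(f"unknown triplet code: {b}")
            parts.append(CODE_TO_TRIPLET[b])
        else:
            parts.append(chr(b))
    return "".join(parts)
```

Because every input position is visited once and no wildcard-removal pass is needed, the sketch runs in time linear in the sequence length, which is the property the abstract emphasises. The resulting bytes could then be passed to a general-purpose compressor, for example `gzip.compress(encode(seq))` from the Python standard library, in line with the abstract's remark about further compression.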
AuthorAffiliation	Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo ON, Canada N2L 3C5
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/20157486
Copyright	The Author(s) 2009. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Discipline	Biology
EISSN	1758-0463
EndPage	bap013
ExternalDocumentID	10.1093/database/bap013 PMC2797453 2696197311 20157486 10_1093_database_bap013
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	http://creativecommons.org/licenses/by-nc/2.5/uk This is Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. cc-by
OpenAccessLink	http://journals.scholarsportal.info/openUrl.xqy?doi=10.1093/database/bap013
PMID	20157486
PublicationTitleAlternate	Database (Oxford)
URI	https://www.ncbi.nlm.nih.gov/pubmed/20157486 https://www.proquest.com/docview/1022142309 https://www.proquest.com/docview/1859588371 https://www.proquest.com/docview/21442526 https://pubmed.ncbi.nlm.nih.gov/PMC2797453 https://academic.oup.com/database/article-pdf/doi/10.1093/database/bap013/16728567/bap013.pdf