Identification and characterization of pseudogenes in the rice gene complement

Bibliographic Details
Published in	BMC Genomics, Vol. 10, no. 1, p. 317
Main Authors	Thibaud-Nissen, Françoise; Ouyang, Shu; Buell, C. Robin
Format	Journal Article
Language	English
Published	London: BioMed Central Ltd, 16.07.2009
Subjects	Animal Genetics and Genomics; Automation; Biomedical and Life Sciences; DNA, Plant - genetics; Gene expression; Genes, Plant; Genetic aspects; Genetics; Genome, Plant; Genomes; Genomics; Life Sciences; Microarrays; Microbial Genetics and Genomics; Mutation; Open Reading Frames; Oryza - genetics; Physiological aspects; Plant genetics; Plant Genetics and Genomics; Proteomics; Pseudogenes; Research Article; Rice; Sequence Alignment; Sequence Analysis, DNA; Standard deviation; United States; GOSlim Term; Massively Parallel Signature Sequencing Data; Massively Parallel Signature Sequencing; Duplicate Region; Paralogous Family
ISSN	1471-2164
DOI	10.1186/1471-2164-10-317

Summary:	Background: The Osa1 Genome Annotation of rice (Oryza sativa L. ssp. japonica cv. Nipponbare) is the product of a semi-automated pipeline that does not explicitly predict pseudogenes. As such, it is likely to mis-annotate pseudogenes as functional genes. A total of 22,033 gene models within Osa1 Release 5 were investigated as potential pseudogenes because they exhibit at least one feature potentially indicative of pseudogenes: lack of transcript support, a short coding region, a long untranslated region, or, for genes residing within a segmentally duplicated region, lack of a paralog or a significantly shorter corresponding paralog. Results: A total of 1,439 pseudogenes, identified among genes with pseudogene features, were characterized by similarity to fully supported gene models and the presence of frameshifts or premature translational stop codons. A significant difference in the length of duplicated genes within segmentally duplicated regions was the optimal indicator of pseudogenization. Among the 816 pseudogenes for which a probable origin could be determined, 75% originated from gene duplication events while 25% were the result of retrotransposition events. A total of 12% of the pseudogenes were expressed. Finally, the F-box protein, BTB/POZ protein, terpene synthase, chalcone synthase and cytochrome P450 families were found to harbor large numbers of pseudogenes. Conclusion: These pseudogenes still have a detectable open reading frame and are thus distinct from pseudogenes detected within intergenic regions, which typically lack definable open reading frames. Families containing the highest number of pseudogenes are fast-evolving families involved in ubiquitination and secondary metabolism.