Journal Article A pipeline for the systematic identification of non-redundant full-ORF cDNAs for polymorphic and evolutionary divergent genomes: Application to the ascidian Ciona intestinalis

Gilchrist, Michael J.; Sobral, Daniel; Khoueiry, Pierre; Daian, Fabrice; Laporte, Batiste; Patrushev, Ilya; Matsumoto, Jun; Dewar, Ken; Hastings, Kenneth E M; Satou, Yutaka; Lemaire, Patrick; Rothbächer, Ute

404 (2), pp. 149-163, 2015-08-15, Elsevier Inc.
Genome-wide resources, such as collections of cDNA clones encoding complete proteins (full-ORF clones), are crucial tools for studying the evolution of gene function and genetic interactions. Non-model organisms, in particular marine organisms, provide a rich source of functional diversity. Marine organism genomes are, however, frequently highly polymorphic and encode proteins that diverge significantly from those of well-annotated model genomes. The construction of full-ORF clone collections from non-model organisms is hindered by the difficulty of accurately predicting the N-terminal ends of proteins and of distinguishing recent paralogs from highly polymorphic alleles. We report a computational strategy that overcomes these difficulties and allows accurate gene-level clustering of transcript data, followed by the automated identification of full-ORFs with correct 5' and 3' ends. It is robust to polymorphism, includes paralog calling, and does not require evolutionary proximity to well-annotated model organisms. We developed this pipeline for the ascidian Ciona intestinalis, a highly polymorphic member of the divergent sister group of the vertebrates that is emerging as a powerful model organism for studying chordate gene function, gene regulatory networks, and the molecular mechanisms underlying human pathologies. Using this pipeline, we have generated the first full-ORF collection for a highly polymorphic marine invertebrate. It contains 19,163 full-ORF cDNA clones covering 60% of Ciona coding genes, as well as full-ORF orthologs for approximately half of curated human disease-associated genes.
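The abstract does not spell out the pipeline's rules for calling a full-ORF. One common heuristic for judging whether a cDNA carries a complete 5' end is to require an in-frame stop codon upstream of the start ATG, so that the ATG cannot be an internal methionine of a truncated transcript. The Python sketch below illustrates only that heuristic, under stated assumptions: forward strand only, the standard genetic code, and the longest ATG-to-stop ORF taken as the coding frame. The names `longest_orf` and `OrfCall` are hypothetical and do not describe the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Standard-code stop codons (assumption: no alternative genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class OrfCall:
    start: int            # 0-based index of the A in the start ATG
    end: int              # 0-based exclusive index just past the stop codon
    frame: int            # reading frame 0, 1 or 2 on the forward strand
    upstream_stop: bool   # True if an in-frame stop lies 5' of the ATG

    @property
    def length(self) -> int:
        return self.end - self.start


def longest_orf(seq: str) -> Optional[OrfCall]:
    """Return the longest ATG-to-stop ORF on the forward strand, or None.

    Hypothetical heuristic: the 5' end is considered supported when an
    in-frame stop codon occurs upstream of the chosen ATG.
    """
    seq = seq.upper()
    best: Optional[OrfCall] = None
    for frame in range(3):
        seen_stop = False                # any in-frame stop seen so far
        orf_start: Optional[int] = None  # first ATG since the last stop
        start_has_upstream_stop = False
        # Step codon by codon; stop before an incomplete trailing codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                if orf_start is not None:
                    call = OrfCall(orf_start, i + 3, frame,
                                   start_has_upstream_stop)
                    if best is None or call.length > best.length:
                        best = call
                orf_start = None
                seen_stop = True
            elif codon == "ATG" and orf_start is None:
                orf_start = i
                start_has_upstream_stop = seen_stop
    return best
```

For example, `longest_orf("TAGGCCATGAAATGACC")` returns an ORF spanning positions 6 to 15 in frame 0 with `upstream_stop=True`, because the TAG at position 0 is in frame with the ATG. An ORF lacking such a stop (as in `"ATGAAATAA"`) would be flagged as possibly 5'-truncated. A real pipeline handling polymorphic, divergent transcripts would need more evidence than this single rule.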
