The Genome of the Sea Urchin Strongylocentrotus purpuratus
Sea Urchin Genome Sequencing Consortium:
Erica Sodergren,1,2
George M. Weinstock,1,2*
Eric H Davidson,3
R. Andrew Cameron,3
Richard A. Gibbs,1,2
Robert C. Angerer,4
Lynne M. Angerer,4
Maria Ina Arnone,5
David R. Burgess,6
Robert D. Burke,7
James A. Coffman,8
Michael Dean,9
Maurice R. Elphick,10
Charles A. Ettensohn,11
Kathy R. Foltz,12
Amro Hamdoun,13
Richard O. Hynes,14
William H. Klein,15
William Marzluff,16
David R. McClay,17
Robert L. Morris,18
Arcady Mushegian,19,20
Jonathan P. Rast,21
L. Courtney Smith,22
Michael C. Thorndyke,23
Victor D. Vacquier,24
Gary M. Wessel,25
Greg Wray,26
Lan Zhang,1,2
Christine G. Elsik,27
Olga Ermolaeva,28
Wratko Hlavina,28
Gretchen Hofmann,29
Paul Kitts,28
Melissa J. Landrum,28
Aaron J. Mackey,30
Donna Maglott,28
Georgia Panopoulou,31
Albert J. Poustka,31
Kim Pruitt,28
Victor Sapojnikov,28
Xingzhi Song,1,2
Alexandre Souvorov,28
Victor Solovyev,32
Zheng Wei,4
Charles A. Whittaker,33
Kim Worley,1,2
K. James Durbin,1,2
Yufeng Shen,1,2
Olivier Fedrigo,26
David Garfield,26
Ralph Haygood,17
Alexander Primus,26
Rahul Satija,26
Tonya Severson,26
Manuel L. Gonzalez-Garay,1,2
Andrew R. Jackson,1,2
Aleksandar Milosavljevic,1,2
Mark Tong,1,2
Christopher E. Killian,34
Brian T. Livingston,35
Fred H. Wilt,34
Nikki Adams,35
Robert Bellé,36,37
Seth Carbonneau,8
Rocky Cheung,16
Patrick Cormier,36,37
Bertrand Cosson,36,37
Jenifer Croce,17
Antonio Fernandez-Guerra,38,39
Anne-Marie Genevière,38,39
Manisha Goel,19
Hemant Kelkar,40
Julia Morales,36,37
Odile Mulner-Lorillon,36,37
Anthony J. Robertson,8
Jared V. Goldstone,41
Bryan Cole,13
David Epel,13
Bert Gold,9
Mark E. Hahn,42
Meredith Howard-Ashby,3
Mark Scally,9
John J. Stegeman,41
Erin L. Allgood,18
Jonah Cool,18
Kyle M. Judkins,18
Shawn S. McCafferty,18
Ashlan M. Musante,18
Robert A. Obar,42
Amanda P. Rawson,18
Blair J. Rossetti,18
Ian R. Gibbons,43
Matthew P. Hoffman,6
Andrew Leone,6
Sorin Istrail,44
Stefan C. Materna,3
Manoj P. Samanta,45,46
Viktor Stolc,45
Waraporn Tongprasit,45
Qiang Tu,3
Karl-Frederik Bergeron,47
Bruce P. Brandhorst,48
James Whittle,49
Kevin Berney,3
David J. Bottjer,50
Cristina Calestani,51
Kevin Peterson,52
Elly Chow,53
Qiu Autumn Yuan,53
Eran Elhaik,54
Dan Graur,54
Justin T. Reese,27
Ian Bosdet,55
Shin Heesun,55
Marco A. Marra,55
Jacqueline Schein,55
Michele K. Anderson,56
Virginia Brockton,22
Katherine M. Buckley,22
Avis H. Cohen,57
Sebastian D. Fugmann,58
Taku Hibino,21
Mariano Loza-Coll,21
Audrey J. Majeske,22
Cynthia Messier,21
Sham V. Nair,59
Zeev Pancer,60
David P. Terwilliger,22
Cavit Agca,61
Enrique Arboleda,5
Nansheng Chen,48
Allison M. Churcher,62
F. Hallböök,63
Glen W. Humphrey,64
Mohammed M. Idris,5
Takae Kiyama,15
Shuguang Liang,15
Dan Mellott,60
Xiuqian Mu,15
Greg Murray,46
Robert P. Olinski,63
Florian Raible,65,66
Matthew Rowe,10
John S. Taylor,62
Kristin Tessmar-Raible,65
D. Wang,62
Karen H. Wilson,23
Shunsuke Yaguchi,7
Terry Gaasterland,24
Blanca E. Galindo,67
Herath J. Gunaratne,24
Celina Juliano,25
Masashi Kinukawa,24
Gary W. Moy,24
Anna T. Neill,24
Mamoru Nomura,24
Michael Raisch,12
Anna Reade,12
Michelle M. Roux,12
Jia L. Song,25
Yi-Hsien Su,3
Ian K. Townley,12
Ekaterina Voronina,25
Julian L. Wong,25
Gabriele Amore,5
Margherita Branno,5
Euan R. Brown,5
Vincenzo Cavalieri,68
Véronique Duboc,69
Louise Duloquin,69
Constantin Flytzanis,70,71
Christian Gache,69
François Lapraz,69
Thierry Lepage,69
Annamaria Locascio,5
Pedro Martinez,72,73
Giorgio Matassi,74
Valeria Matranga,75
Ryan Range,69
Francesca Rizzo,5
Eric Röttinger,69
Wendy Beane,17
Cynthia Bradham,17
Christine Byrum,17,76
Tom Glenn,17
Sofia Hussain,77
Gerard Manning,78
Esther Miranda,17
Rebecca Thomason,17,76
Katherine Walton,17
Athula Wikramanayke,76
Shu-Yu Wu,17
Ronghui Xu,76
C. Titus Brown,3
Lili Chen,3
Rachel F. Gray,3
Pei Yun Lee,3
Jongmin Nam,3
Paola Oliveri,3
Joel Smith,3
Donna Muzny,1,2
Stephanie Bell,1,2
Joseph Chacko,1,2
Andrew Cree,1,2
Stacey Curry,1,2
Clay Davis,1,2
Huyen Dinh,1,2
Shannon Dugan-Rocha,1,2
Jerry Fowler,1,2
Rachel Gill,1,2
Cerrissa Hamilton,1,2
Judith Hernandez,1,2
Sandra Hines,1,2
Jennifer Hume,1,2
LaRonda Jackson,1,2
Angela Jolivet,1,2
Christie Kovar,1,2
Sandra Lee,1,2
Lora Lewis,1,2
George Miner,1,2
Margaret Morgan,1,2
Lynne V. Nazareth,1,2
Geoffrey Okwuonu,1,2
David Parker,1,2
Ling-Ling Pu,1,2
Rachel Thorn,1,2
Rita Wright,1,2
Abstract:
We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.
1 Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
2 Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
3 Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA.
4 National Institute of Dental and Craniofacial Research, National Institutes of Health (NIH), Bethesda, MD 20892, USA.
5 Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy.
6 Department of Biology, Boston College, Chestnut Hill, MA 02467, USA.
7 Department of Biology, Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada, V8W 3N5.
8 Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA.
9 Human Genetics Section, Laboratory of Genomic Diversity, National Cancer Institute-Frederick, Frederick, MD 21702, USA.
10 School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, UK.
11 Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.
12 Department of Molecular, Cellular and Developmental Biology and the Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA 93106-9610, USA.
13 Hopkins Marine Station, Stanford University, Pacific Grove, CA 93950, USA.
14 Howard Hughes Medical Institute, Center for Cancer Research, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA.
15 Departments of Biochemistry and Molecular Biology, University of Texas, M.D.Anderson Cancer Center, Houston, TX, 77030, USA.
16 Molecular Biology and Biotechnology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
17 Department of Biology, Duke University, Durham, NC 27708, USA.
18 Department of Biology, Wheaton College, Norton, MA 02766, USA.
19 Stowers Institute for Medical Research, Kansas City, MO 64110, USA.
20 Department of Microbiology, Kansas University Medical Center, Kansas City, KS 66160, USA.
21 Sunnybrook Research Institute and Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada, M4N 3M5.
22 Department of Biological Sciences, George Washington University, Washington, DC 20052, USA.
23 Royal Swedish Academy of Sciences, Kristineberg Marine Research Station, Fiskebackskil, 450 34, Sweden.
24 Marine Biology, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093-0202, USA.
25 Department of Molecular and Cellular Biology and Biochemistry, Brown University Providence, RI 02912, USA.
26 Department of Biology and Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA.
27 Department of Animal Science, Texas A&M University, College Station, TX 77843, USA.
28 National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA.
29 Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA.
30 Penn Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA.
31 Evolution and Development Group, Max-Planck Institut fuer Molekulare Genetik, 14195 Berlin, Germany.
32 Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK.
33 Center for Cancer Research, MIT, Cambridge, MA 02139, USA.
34 Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720-3200, USA.
35 Biology Department, California Polytechnic State University, San Luis Obispo, CA 93407, USA.
36 Université Pierre et Marie Curie-Paris6, UMR 7150, Equipe Cycle Cellulaire et Développement, Station Biologique de Roscoff, 29682 Roscoff Cedex, France.
37 CNRS, UMR 7150, Station Biologique de Roscoff, 29682 Roscoff Cedex, France.
38 CNRS, UMR7628, Banyuls-sur-Mer, F-66650, France.
39 Université Pierre et Marie Curie-Paris 6, UMR7628, Banyuls-sur-Mer, F-66650, France.
40 Center for Bioinformatics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
41 Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA.
42 Tethys Research, LLC, 2115 Union Street, Bangor, Maine 04401, USA.
43 Department of Molecular, Cellular, and Developmental Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
44 Center for Computational Molecular Biology, and Computer Science Department, Brown University, Providence, RI 02912, USA.
45 Genome Research Facility, National Aeronautics and Space Administration, Ames Research Center, Moffett Field, CA 94035, USA.
46 Systemix Institute, Cupertino, CA 95014, USA.
47 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada, V5A 1S6.
48 Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada, V5A 1S6.
49 Department of Biology, Center for Cancer Research, MIT, Cambridge, MA 02139, USA.
50 Department of Earth Sciences, University of Southern California, Los Angeles, CA 90089-0740, USA.
51 Department of Biology, University of Central Florida, Orlando, FL 32816-2368, USA.
52 Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA.
53 Center for Computational Regulatory Genomics, Beckman Institute, California Institute of Technology, Pasadena, CA 91125, USA.
54 Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA.
55 Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada, V5Z 4E6.
56 Department of Immunology, University of Toronto, Toronto, Ontario, Canada, M4N 3M5.
57 Department of Biology and the Institute of Systems Research, University of Maryland, College Park, MD 20742, USA.
58 Laboratory of Cellular and Molecular Biology, National Institute on Aging, NIH, Baltimore, MD 21224, USA.
59 Department of Biological Sciences, Macquarie University, Sydney NSW 2109, Australia.
60 Center of Marine Biotechnology, UMBI, Columbus Center, Baltimore, MD 21202, USA.
61 Department of Cell Biology and Anatomy, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA.
62 Department of Biology, University of Victoria, Victoria, BC, Canada, V8W 2Y2.
63 Department of Neuroscience, Uppsala University, Uppsala, Sweden.
64 Laboratory of Cellular and Molecular Biophysics, National Institute of Child Health and Development, NIH, Bethesda, MD 20895, USA.
65 Developmental Unit, EMBL, 69117 Heidelberg, Germany.
66 Computational Unit, EMBL, 69117 Heidelberg, Germany.
67 Biotechnology Institute, Universidad Nacional Autónoma de Mexico (UNAM), Cuernavaca, Morelos, Mexico 62250.
68 Department of Cellular and Developmental Biology "Alberto Monroy," University of Palermo, 90146 Palermo, Italy.
69 Laboratoire de Biologie du Développement (UMR 7009), CNRS and Université Pierre et Marie Curie (Paris 6), Observatoire Océanologique, 06230 Villefranche-sur-Mer, France.
70 Department of Biology, University of Patras, Patras, Greece.
71 Department of Molecular and Cellular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
72 Departament de Genetica, Universitat de Barcelona, 08028 Barcelona, Spain.
73 Institució Catalana de Recerca i Estudis Avancats (ICREA), Barcelona, Spain.
74 Institut Jacques Monod, CNR-UMR 7592, 75005 Paris, France.
75 Consiglio Nazionale delle Ricerche, Istituto di Biomedicina e Immunologia Molecolare "Alberto Monroy," 90146 Palermo, Italy.
76 Department of Zoology, University of Hawaii at Manoa, Honolulu, HI 96822, USA.
77 Department of Biology, University of South Florida, Tampa, FL 33618, USA.
78 Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, CA 92186, USA.

Present address: GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, PA 19426, USA.

Present address: Massachusetts General Hospital Cancer Center, Charlestown, MA 02129, USA. 
* To whom correspondence should be addressed. E-mail: gwstock{at}bcm.tmc.edu
The genome of the sea urchin was sequenced primarily because of the remarkable usefulness of the echinoderm embryo as a research model system for modern molecular, evolutionary, and cell biology. The sea urchin is the first animal with a sequenced genome that (i) is a free-living, motile marine invertebrate; (ii) has a bilaterally organized embryo but a radial adult body plan; (iii) has the endoskeleton and water vascular system found only in echinoderms; and (iv) has a nonadaptive immune system that is unique in the enormous complexity of its receptor repertoire. Sea urchins are remarkably long-lived, with life spans of Strongylocentrotid species extending to over a century [see supporting online material (SOM)], and highly fecund, producing millions of gametes each year; and Strongylocentrotus purpuratus is a pivotal component of subtidal marine ecology and an important fishery catch in several areas of the world, including the United States. Although the sea urchin has been a research model in developmental biology for a century and a half, for most of that time few were aware of one of the most important characteristics of sea urchins, a character that directly enhances their significance for genomic analysis: Echinoderms (and their sister phylum, the hemichordates) are the closest known relatives of the chordates (Fig. 1 and SOM). A description of the echinoderm body plan, as well as aspects of the life-style, longevity, polymorphic gene pool, and characteristics that make the sea urchin so valuable as a research organism, is presented in the SOM.
Fig. 1. The phylogenetic position of the sea urchin relative to other model systems and humans. The chordates are shown on the darker blue background overlapping the deuterostomes as a whole on a lighter blue background. Organisms for which genome projects have been initiated or finished are shown across the top.
The last common ancestors of the deuterostomal groups at the branch points shown in Fig. 1 are of Precambrian antiquity [>540 million years ago (Ma)], according to protein molecular phylogeny. Stem group echinoderms appear in the Lower Cambrian fossil assemblages dating to 520 Ma. Cambrian echinoderms came in many distinct forms, but from their first appearance, the fossil record illustrates certain distinctive features that are still present: their water vascular system, including rows of tube feet protruding through holes in the ambulacral grooves, and their calcite endoskeleton (mainly, a certain form of CaCO3), which displays the specific three-dimensional structure known as "stereom." The species sequenced, Strongylocentrotus purpuratus, commonly known as the "California purple sea urchin," is a representative of the thin-spined "modern" group of regularly developing sea urchins (euechinoids). These evolved to become the dominant echinoid form after the great Permian-Triassic extinction 250 Ma.
We present here a description of the S. purpuratus genome and gene products. The genome provides a wealth of discoveries about the biology of the sea urchin, Echinodermata, and the deuterostomes. Among the key findings are the following:
- The sea urchin is estimated to have 23,300 genes with representatives of nearly all vertebrate gene families, although often the families are not as large as in vertebrates.
- Some genes thought to be vertebrate-specific were found in the sea urchin (deuterostome-specific); others were identified in sea urchin but not the chordate lineage, which suggests loss in the vertebrates.
- Expansion of some gene families occurred apparently independently in the sea urchin and vertebrates.
- The sea urchin has a diverse and sophisticated immune system mediated by an astonishingly large repertoire of innate pathogen recognition proteins.
- An extensive defensome was identified.
- The sea urchin has orthologs of genes associated with vision, hearing, balance, and chemosensation in vertebrates, which suggests hitherto unknown sensory capabilities.
- Distinct genes for biomineralization exist in the sea urchin and vertebrates.
- Orthologs of many human disease-associated genes were found in the sea urchin.
Sequencing and Annotation of the S. purpuratus Genome
Sequencing and assembly. Sperm from a single male was used to prepare DNA for all libraries (tables S1 and S2) and whole-genome shotgun (WGS) sequencing. The overall approach was based on the "combined strategy" used for the rat genome (1), where WGS sequencing to six times coverage was combined with two times sequence coverage of BAC clones from a minimal tiling path (MTP) (fig. S1). The use of BACs provided a framework for localizing the assembly process, which aided in the assembly of repeated sequences and solved problems associated with the high heterozygosity of the sea urchin genome, without our resorting to extremely high coverage sequencing.
Several different assemblies were produced during the course of the project (see SOM for details). The Sea Urchin Genome Project (SUGP) was the first to produce both intermediate WGS assemblies and a final combined assembly. This was especially useful, not only for the early availability of an assembly for analysis, but also because WGS contigs were used to fill gaps between BACs in the combined assembly. The pure WGS assembly [version (v) 0.5; GenBank accession number range AAGJ01000001 to AAGJ01320773; also referred to as NCBI build 1.1] was produced and released in April 2005. The final combined BAC-WGS assembly was released in July 2006 as v 2.1 and submitted to GenBank (accession number range AAGJ02000001 to AAGJ02220581).
A second innovation in the SUGP was the use of the clone-array pooled shotgun sequencing (CAPSS) strategy (2) for BAC sequencing (fig. S2). The MTP consisted of 8248 BACs, and rather than prepare separate random libraries from each of these, the CAPSS strategy involved BAC shotgun sequencing from pools of clones and then deconvoluting the reads to the individual BACs. This allowed the BAC sequencing to be performed in 1/5th the time and at 1/10th the cost.
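The deconvolution step can be illustrated with a toy model. The sketch below assumes a simplified two-dimensional clone array in which each clone is pooled once by row and once by column, and it stands in for assembled reads with abstract "sequence tags"; the function names and the toy data are hypothetical and do not describe the actual SUGP pipeline. A tag seen in exactly one row pool and one column pool is assigned to the clone at that intersection, whereas a tag shared by several clones (for example, a repeat) stays unresolved.

```python
def build_pools(array):
    """Pool the sequence tags of every clone by row and by column.

    `array` maps (row, col) -> set of sequence tags carried by that clone.
    """
    row_pools, col_pools = {}, {}
    for (row, col), tags in array.items():
        row_pools.setdefault(row, set()).update(tags)
        col_pools.setdefault(col, set()).update(tags)
    return row_pools, col_pools


def deconvolute(row_pools, col_pools):
    """Assign each tag to the clone at its unique row/column intersection."""
    hits = {}
    for row, row_tags in row_pools.items():
        for col, col_tags in col_pools.items():
            # A tag present in both pools may come from the clone at (row, col).
            for tag in row_tags & col_tags:
                hits.setdefault(tag, []).append((row, col))
    # Keep only tags that resolve to exactly one array position.
    return {tag: where[0] for tag, where in hits.items() if len(where) == 1}


# Hypothetical 2 x 2 array; each clone carries one distinguishing tag.
toy_array = {
    (0, 0): {"tagA"},
    (0, 1): {"tagB"},
    (1, 0): {"tagC"},
    (1, 1): {"tagD"},
}
rows, cols = build_pools(toy_array)
resolved = deconvolute(rows, cols)
print(resolved)
```

In this simplified layout, an n-by-n array needs 2n pooled libraries rather than n² individual ones, which is the kind of saving pooling is meant to deliver; the array dimensions actually used in the project are not given here.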
The principal new challenge in the SUGP was the high heterozygosity in the outbred animal that was sequenced. It was known that single-copy DNA in the sea urchin varied by as much as 4 to 5% [single nucleotide polymorphism (SNP) plus insertion/deletion (indel)], which is much greater than human (0.5%) (3). Moreover, alignment of WGS reads to the early v 0.1 WGS assembly revealed at least one SNP per 100 bases, as well as a comparable frequency of indel variants. This average frequency of a mismatch per 50 bases or higher prevented merging by the assembly module in Atlas, the Phrap assembler, and also made it difficult to determine if reads were from duplicated but diverged sections of the genome or heterozygous homologs. This challenge was met by adding components to Atlas to handle local regions of heterozygosity and to take advantage of the BAC data, because each BAC sequence represented a single haplotype (see SOM). High heterozygosity has been seen in the past with the Ciona genomes (4, 5) and is likely to be the norm in the future as fewer inbred organisms are sequenced. Moreover, the CAPSS approach makes BAC sequencing more manageable for large genomes. Thus, the sea urchin project may serve as a paradigm for future difficult endeavors.
Combining the BAC-derived sequence with the WGS sequence generated a high-quality draft with 4 to 5% redundancy that covered more than 90% of the genome while sequencing to a level of 8x base coverage (table S2). The assembly size of 814 Mb is in good agreement with the previous estimate of genome size, 800 Mb ± 5% (6). The assembly is a mosaic of the two haplotypes, but it was possible to determine the phase of the BACs on the basis of how many mismatches neighboring BACs had in their overlap regions. This information will be used to create a future version of the genome in which the individual haplotypes are resolved.
Gene predictions. The v 0.5 WGS assembly displayed sufficient sequence continuity (a contig N50 of 9.1 kb) and higher-order organization (a scaffold N50 of 65.6 kb) to allow gene predictions to be produced and the annotation process to begin even while the BAC component was being sequenced. We generated an official gene set (OGS), consisting of 28,900 gene models, by merging four different sets of gene predictions with the GLEAN program (7) (see SOM for details). One of these gene sets, produced from the Ensembl gene prediction software, was created for both v 0.5 and v 2.0 assemblies.
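For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch computes it under the usual definition: the length L such that contigs (or scaffolds) of length L or longer contain at least half of all assembled bases. The function name and the toy lengths are illustrative only and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembled bases is the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Toy example: total = 30, half = 15; 10 + 6 = 16 reaches it, so N50 = 6.
print(n50([2, 3, 4, 5, 6, 10]))
```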
To estimate the number of genes in the S. purpuratus genome, we began with the 28,900 gene models in the OGS and reduced this by the 5% redundancy found by mapping to the v 2.0 assembly, then increased it by a few percent for the new genes observed in the Ensembl set from the v 2.0 assembly compared with v 0.5. From manual analysis of well-characterized gene sets (e.g., ciliary, cell cycle control, and RNA metabolism genes), we estimated that, in addition to redundancy, another 25% of the genes in the OGS were fragments, pseudogenes, or otherwise not valid. Finally, whole-genome tiling microarray analysis (see below) showed 10% of the transcriptionally active regions (long open reading frames, not small RNAs) were not represented by genes in the OGS. Taken together, this analysis gave an estimate of about 23,300 genes for S. purpuratus. Information on all annotated genes can be found at (8).
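The arithmetic behind this estimate can be sketched as follows. The text does not give the exact formula, so two assumptions are made here and flagged in the code: the corrections are applied multiplicatively in the order described, and "a few percent" is taken as 3%. Under those assumptions the result lands close to the reported figure of about 23,300.

```python
def estimate_gene_count(
    ogs_models=28_900,      # gene models in the OGS (from the text)
    redundancy=0.05,        # redundancy found by mapping to v 2.0 (from the text)
    new_in_v2=0.03,         # ASSUMPTION: "a few percent" taken as 3%
    invalid=0.25,           # fragments, pseudogenes, otherwise not valid (from the text)
    missing_from_ogs=0.10,  # active regions not represented in the OGS (from the text)
):
    """Combine the corrections multiplicatively (an assumed reading of the text)."""
    estimate = ogs_models
    estimate *= 1 - redundancy        # remove redundant models
    estimate *= 1 + new_in_v2         # add genes newly seen in v 2.0
    estimate *= 1 - invalid           # remove invalid models
    estimate *= 1 + missing_from_ogs  # add genes missed by the OGS
    return estimate


print(round(estimate_gene_count()))
```

Treating the final 10% as a fraction of the true gene count instead (dividing by 0.9 rather than multiplying by 1.1) would give a somewhat higher figure, so the sketch reproduces the order of magnitude rather than a unique derivation.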
The overall trends in gene structure were similar to those seen in the human genome. The statistics of the Ensembl predictions from the WGS assembly revealed an average of 8.3 exons and 7.3 introns per transcript (see SOM). The average gene length was 7.7 kb with an average primary transcript length of 8.9 kb. A broad distribution of all exon lengths peaked at around 100 to 115 nucleotides, whereas that for introns peaked at around 750 nucleotides. The smaller average intron size relative to humans' was consistent with the trend that intron size is correlated with genome size.
Annotation process. Manual annotation and analysis of the OGS was performed by a group of over 200 international volunteers, primarily from the sea urchin research community. To facilitate and to centralize the annotation efforts, an annotation database and a shared Web browser, Genboree (9), were established at the BCM-HGSC. These tools enabled integrated and collaborative analysis of both precomputed and experimental information (see SOM). A variety of precomputed information for each predicted gene model was made available to the annotators in the browser, including expressed sequence tag (EST) data, the four unmerged gene prediction sets, and transcription data from whole-genome tiling microarray with embryonic RNA (see below) (10). Additional resources available to the community are listed in table S4.
Over 9000 gene models were manually curated by the consortium with 159 novel models (gene models not represented in the OGS) added to the official set. If we assume no bias in the curated gene models, the number of novel models added may imply that the official set contains >98% of the protein-coding genes.
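The ">98%" figure follows from a simple proportion, sketched below. The curated count is given only as "over 9000", so 9000 is used here as a rounded assumption; a larger true count would push the completeness estimate slightly higher.

```python
def ogs_completeness(curated_models, novel_models):
    """Fraction of curated genes that were already present in the OGS.

    Assumes, as the text does, that the curated models are an unbiased
    sample of all protein-coding genes.
    """
    return 1 - novel_models / curated_models


# ASSUMPTION: 9000 stands in for "over 9000" curated models.
completeness = ogs_completeness(curated_models=9000, novel_models=159)
print(f"{completeness:.1%}")
```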
Genome features. A window on the genetic landscape is scaffold-centric in S. purpuratus, because linkage and cytogenetic maps are not available. The 36.9% GC content of the genome is uniformly low: the average GC content assessed by domains is consistent (36.8%), and the distribution is tight (see SOM). Genes from the OGS show no tendency to occupy regions of higher- or lower-than-average GC content. In fact, nearly all genes lie in regions of 35 to 39% GC.
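A GC-content profile of this kind can be computed along the following lines. This is a minimal sketch that assumes fixed-size, non-overlapping windows; how domains were actually delimited is described in the SOM and is not reproduced here, and the function names and toy sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Windows made up only of ambiguous bases (e.g. N) have no defined GC content.
    return gc / total if total else float("nan")


def windowed_gc(seq, window):
    """GC fraction of each full, non-overlapping window along seq."""
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence: three windows of four bases each.
print(windowed_gc("AAAAGGGGATAT", 4))
```

A tight distribution of the per-window values around the genome-wide mean is what the text describes as a uniformly low GC content.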
The Echinoderm Genome in the Context of Metazoan Evolution
The sea urchin genetic tool kit lends evolutionary perspective to the gene catalogs that characterize the superclades of the bilaterian animals. The distribution of highly conserved protein domains and sequence motifs provides a view of the expansion and contraction of gene families, as well as an insight into changes in protein function. Examples are enumerated in Table 1, which presents a global overview of gene variety obtained by comparing sequences identified in Interpro, and Table 2, which shows the distribution of specific Pfam database domains associated with selected aspects of cell physiology, including sequences identified in the cnidarian Nematostella vectensis (11). The Interpro data suggest that about one-third of the 50 most prevalent domains in the sea urchin gene models are not in the 50 most abundant families in the other representative genomes (mouse, tunicate, fruit fly, and nematode), and thus, they constitute expansions that are specific at least to sea urchins, if not to the complex of echinoderms and hemichordates. Two of the most abundant domains make up 3% of the total and mark genes that are involved in the innate immune response. Others define proteins associated with apoptosis and cell death regulation, as well as proteins that serve as downstream effectors in the Toll-interleukin 1 (IL-1) receptor (TIR) cascade. The quinoprotein amine dehydrogenase domain seen in the sea urchin set is 10 times as abundant as in other representative genomes and may be used in the systems of quinone-containing pigments known to occur in these marine animals. The large number of nucleosomal histone domains found agrees with the long-established sea urchin-specific expansion of histone genes. In summary, the distribution of proteins among these conserved families shows the trend of expansion and shrinkage of the preexisting protein families, rather than frequent gene innovation or loss. Gene family sizes in the sea urchin are more closely correlated with what is seen in deuterostomes than what is seen in the protostomes.
Table 1. Unique aspects of gene family distribution in sea urchin: Selected examples of the frequency of Interpro domains in the proteome of selected species. ID is the identification number used in the Interpro database; the second column shows the name given to the domain or motif family in the database. Species abbreviations: Sp, Strongylocentrotus purpuratus; Mm, Mus musculus; Ci, Ciona intestinalis; Dm, Drosophila melanogaster; Ce, Caenorhabditis elegans.
| ID |
Name |
Species, total number (percentage of total matches)
|
| Sp |
Mm |
Ci |
Dm |
| |