Research Article: Evolution

Genomic Survey of Premetazoans Shows Deep Conservation of Cytoplasmic Tyrosine Kinases and Multiple Radiations of Receptor Tyrosine Kinases


Science Signaling  01 May 2012:
Vol. 5, Issue 222, pp. ra35
DOI: 10.1126/scisignal.2002733


The evolution of multicellular metazoans from a unicellular ancestor is one of the most important advances in the history of life. Protein tyrosine kinases play important roles in cell-to-cell communication, cell adhesion, and differentiation in metazoans; thus, elucidating their origins and early evolution is crucial for understanding the origin of metazoans. Although tyrosine kinases exist in choanoflagellates, few data are available about their existence in other premetazoan lineages. To unravel the origin of tyrosine kinases, we performed a genomic and polymerase chain reaction (PCR)–based survey of the genes that encode tyrosine kinases in the two described filasterean species, Capsaspora owczarzaki and Ministeria vibrans, the closest relatives to the Metazoa and Choanoflagellata clades. We present 103 tyrosine kinase–encoding genes identified in the whole genome sequence of C. owczarzaki and 15 tyrosine kinase–encoding genes cloned by PCR from M. vibrans. Through detailed phylogenetic analysis, comparison of the organizations of the protein domains, and resequencing and revision of tyrosine kinase sequences previously found in some whole genome sequences, we demonstrate that the basic repertoire of metazoan cytoplasmic tyrosine kinases was established before the divergence of filastereans from the Metazoa and Choanoflagellata clades. In contrast, the receptor tyrosine kinases diversified extensively in each of the filasterean, choanoflagellate, and metazoan clades. This difference in the divergence patterns between cytoplasmic tyrosine kinases and receptor tyrosine kinases suggests that receptor tyrosine kinases that had been used for receiving environmental cues were subsequently recruited as a communication tool between cells at the onset of metazoan multicellularity.


Multicellularity evolved several times in eukaryotes (1–3). Of particular interest is the multicellular system of metazoans, composed of highly specialized cells that perform coordinated interaction and communication, which leads to various highly complex and mobile forms. Although it has been hypothesized that the evolution of metazoan multicellularity was driven by the emergence of genes whose products were involved in cell adhesion, cell differentiation, and cell-to-cell communication, it is now evident that several of these key genetic modifications occurred well before the origin of animals (2, 4–8).

Many multicellular-specific functions, including cell-to-cell communication and control of cell proliferation and differentiation, are served by protein tyrosine kinases (TKs or PTKs) (9–11). TKs are divided into two types: receptor TKs (RTKs) and nonreceptor or cytoplasmic TKs (CTKs). RTKs mostly receive their specific ligands through their extracellular domains and initiate signal transduction cascades that are mediated by phosphorylated tyrosine residues, whereas CTKs act within cells and transmit the phosphotyrosine signals initiated by receptors (9, 10). One of the most remarkable features of TKs is their high degree of structural diversity. Most TKs are made up of multiple protein domains and motifs other than the catalytic (or kinase) domain (KD). Such divergent architectures are considered to have been generated by gene duplication and domain shuffling (9, 10, 12). The extracellular regions of RTKs, which are composed of various domains and motifs, are thought to bind to specific ligands directly, and their expression is mostly restricted to specific cell types in the organism (10). Moreover, KDs show an increased versatility (that is, they are frequently combined with other domains), especially in the Metazoa (13–15). Thus, the diversity of TKs, which consist of approximately 30 families [which correspond to subfamilies in some of our previous publications (16–19)] with different protein domain organizations, may reflect the complexity of the phosphotyrosine-based signaling system in metazoans.

TKs are members of the eukaryotic protein kinase (ePK) superfamily and are likely to have evolved from one of the ancestral ePK groups (20–22). TKs phosphorylate substrates only on tyrosine residues, whereas other ePKs phosphorylate mostly on serine and threonine residues and are thus classified as serine-threonine kinases (STKs) (21). Although some STKs can also phosphorylate tyrosine residues (23), the TK group is discriminated from other groups of ePKs by their overall sequence similarity and by the presence of a characteristic catalytic loop motif (24).

The repertoire of CTK and RTK families is highly conserved across all metazoans, including sponges (17, 18, 25). In accordance with their substantial involvement in intercellular communication and the control of cell proliferation and differentiation, TKs had been thought to be exclusive to metazoans (21); however, this notion was dispelled by the discovery of an elaborate collection of TKs in the unicellular and colonial choanoflagellates (7, 19, 24, 26–28), which are the closest relatives to the Metazoa. Moreover, previous reports had suggested the presence of TKs in some non-opisthokont lineages, including the amoebozoans Dictyostelium discoideum and Entamoeba histolytica, the green alga Chlamydomonas reinhardtii, and the oomycete Phytophthora infestans (Fig. 1) (29, 30), although they are far less numerous than those of metazoans and choanoflagellates. These findings prompted us to explore whether a TK-based signaling system is also found in the Filasterea, the sister group to the Metazoa and Choanoflagellata (MeCh) (Fig. 1) (4, 5). The Filasterea consist of only two known species: Capsaspora owczarzaki, a symbiotic amoeba that dwells in the snail Biomphalaria glabrata (31), and Ministeria vibrans, a free-living marine protist (4, 32). Although several putative TKs have been reported in expressed sequence tag (EST) data from M. vibrans (4), a comprehensive genomic analysis of filasterean TKs has not yet been conducted.

Fig. 1

Schematic phylogenetic tree of eukaryotes depicting the presence of group A and B TKs. A widely accepted consensus phylogeny (32, 55) based on phylogenomic studies is shown. The presence (circle) and absence (x) of the group A and B TKs are shown on the right. Representative genera used in this study are shown after the clade names. The holozoan TKs (group A) most likely originated from the group B TKs. The asterisk indicates that the genome sequencing of ichthyosporeans is ongoing (36).

Here, we analyzed the whole genome sequence of C. owczarzaki and performed targeted polymerase chain reaction (PCR)–based cloning of M. vibrans complementary DNAs (cDNAs) to provide a picture of the earlier stage of TK evolution focused on these unicellular relatives of metazoans. Through comprehensive phylogenetic approaches including two filastereans and other premetazoans, together with a detailed comparison of their protein domain organizations, we have elucidated in detail the early evolution of TK diversity. Our data show very different divergence patterns between CTKs and RTKs. The basic repertoire of CTKs had already been established before the separation of filastereans and the MeCh, whereas the RTKs show an extensive and independent diversification in each of the filasterean, choanoflagellate, and metazoan clades.


Diversity of TKs in the Filasterea

To better understand the early evolution of TKs, we conducted a genomic and PCR-based survey for TK-encoding genes in the Filasterea. We identified 103 putative TK-encoding genes in the whole genome sequence of C. owczarzaki (Fig. 2 and fig. S1). Of these, 92 are predicted to be RTKs, which contain an intracellular KD, a transmembrane (TM) segment, a signal peptide, and known protein domains and motifs in the extracellular region. The other 11 TKs lack a signal peptide and a TM and are thus classified as CTKs. We also isolated seven RTKs and eight CTKs by a PCR-based survey in M. vibrans (Fig. 2 and fig. S1).

Fig. 2

Classification of C. owczarzaki and M. vibrans TKs. One hundred three C. owczarzaki TKs and 15 M. vibrans TKs were divided into 27 and 7 distinct families, respectively, by domain architecture and KD phylogeny. A typical domain organization is schematically displayed for each family, with the number of the genes that belong to each family shown in parentheses. The domain organizations of all the proteins can be found in fig. S1. The Pfam or SMART domain names are shown on the bottom. The sizes of the illustrations are proportional to the actual lengths of amino acid sequences. Scale bar, 200 amino acid residues.

According to their protein domain organization and the phylogenetic relationship between KDs, we classified C. owczarzaki TKs into 27 families (8 CTK and 19 RTK) and M. vibrans TKs into 7 families (6 CTK and 1 RTK) (see Materials and Methods for the classification criteria). The CTK repertoire of M. vibrans is similar to that of C. owczarzaki, but the only RTK family identified in M. vibrans is unique and does not share its domain architecture with any of the known RTK families, including those of C. owczarzaki. RTKs of C. owczarzaki show an extensive divergence in their architectures, containing 19 families with distinct organizations of protein domains. Among them, the RTK11 family is the largest with 40 genes, which encode 4 to 46 leucine-rich repeats (LRRs) (fig. S1). Many of the extracellular domains found in these filasterean RTKs, such as the epidermal growth factor (EGF)–like and fibronectin type III (FN3) domains, generally interact with other proteins, including extracellular ligands or other receptors.

Phylogenetic position of TKs in the ePK superfamily

To elucidate the origin of TKs, we first conducted a preliminary phylogenetic analysis of the ePK superfamily, including TKs and STKs. We included 23 diverse STK families, with a particular focus on close relatives of TKs (21, 22), as well as a partial TK repertoire from filastereans, metazoans, and the choanoflagellate Monosiga brevicollis, in addition to the previously reported TKs from pre-opisthokonts (29, 30). We also included 5 putative KD sequences of TKs that were found in the genome sequence of the apusozoan Thecamonas trahens (33), which is the putative sister group to the opisthokonts (Fig. 1) (34), and another 17 found in the genome of the amoebozoan Acanthamoeba castellanii. We found no TK-encoding genes in any of the available fungal genomes, including those of the two early-branching species Allomyces macrogynus and Spizellomyces punctatus (35), which have been sequenced under the UNICORN project (36). This suggests that TKs originated before the divergence between metazoans and fungi and were secondarily lost from fungi (30), if the consensus phylogeny is correct (Fig. 1).

The Bayesian phylogenetic tree inferred from the alignment of KD sequences provided a nearly maximum statistical support (a Bayesian posterior probability of 0.97) to a monophyletic clustering of holozoan [metazoan, choanoflagellate, and filasterean (Fig. 1)] TKs (group A TKs) (Fig. 3 and fig. S2). We deduced a similar tree with the maximum likelihood (ML) method (fig. S3A). Removal of TK sequences from the data set did not alter the overall topology of the ePK tree (fig. S4). The group A TKs branch within another type of TK (group B TKs), which includes all of the putative pre-opisthokont TKs except for some amoebozoan TKs and shows a diversification independent of that of group A. The monophyletic clustering of group A and B TKs is supported with a maximum Bayesian posterior probability of 1.0.

Fig. 3

Phylogenetic positions of TKs in the ePK superfamily. A Bayesian tree was inferred from 173 amino acid sites of the KD alignment. Twenty-three diverse STK families were chosen (21, 56). MLK, mixed lineage kinase; GC-PK, guanylate cyclase–coupled protein kinase; CTR, constitutive triple response; EDR, enhanced disease resistance; DPYK, D. discoideum protein tyrosine kinase; ILK, integrin-linked kinase; CaMK, calcium/calmodulin-dependent protein kinase; MLCK, myosin light chain kinase; βARK, β-adrenergic receptor kinase; LRRK, leucine-rich repeat kinase; Plant RLK, plant receptor–like kinase; TESK, testis-specific protein kinase. The Bayesian posterior probabilities are shown at two key branches. Predicted or experimentally proven TKs are shown by red lines. The asterisk indicates that the TK motif is not clear but was experimentally shown to have only TK activity (29). Holozoan group A TKs are highlighted by red shading. Abbreviated species names are as follows: A, A. castellanii; C, C. reinhardtii; D, D. discoideum; E, E. histolytica; T, T. trahens; and P, P. infestans. Further details can be found in fig. S2.

To further corroborate this topology, we performed two additional tests. First, we challenged the position of group A by statistically evaluating the 144 alternative topologies that differ from each other in the position of group A; all topologies that placed group A outside group B were statistically rejected (fig. S5). Second, we statistically assessed the diversification pattern among group A TKs, group B TKs, and the STK group by bootstrapping: 2000 hypothetical ML trees generated by bootstrapping mostly support the relationship among these three groups suggested by the ML tree in Fig. 3 (fig. S6). It is thus likely that the canonical TKs of metazoans, choanoflagellates, and filastereans originated from pre-opisthokont TKs and independently diversified. None of the experimentally proven TKs from D. discoideum (29) and a few of the predicted TKs from A. castellanii branch within either group A or group B; instead, they are included in the ancestral STK class, which is reminiscent of multiple origins of TKs. However, ML trees optimized either under the constraint that all of the A. castellanii TKs are included within groups A and B, or under the constraint that all the amoebozoan TKs (that is, those from both A. castellanii and D. discoideum) are included within groups A and B, are not rejected by the one-tailed Kishino-Hasegawa test (fig. S3) (37). Therefore, we cannot confidently conclude whether TKs and their catalytic loop motifs evolved once or multiple times in eukaryote evolution.

Evolution of architectural diversity of the holozoan TKs

To further explore the evolutionary history of holozoan TKs, we performed a more focused ML phylogenetic analysis on the TKs from C. owczarzaki and M. vibrans as well as those from the choanoflagellate M. brevicollis and some representative metazoans, including Homo sapiens, Drosophila melanogaster, and the sponge Amphimedon queenslandica (Fig. 4 and fig. S7). TKs showing identical domain architecture and highly similar KD sequences with others in a single organism are considered to be recently duplicated paralogs, which were removed to simplify the analysis. Two large monophyletic clusters of C. owczarzaki and A. queenslandica TKs (clusters I and II, respectively) were also trimmed and analyzed separately (Fig. 5 and figs. S8 and S9). Caution should be exercised in using published genome annotations, which were usually automatically performed, because long protein sequences such as those of TKs are often mispredicted. Thus, we manually revised the published M. brevicollis and A. queenslandica TK-encoding gene predictions (24, 25) and optimized the predictions of their domain organizations. To minimize phylogenetic artifacts, we also inferred an ML tree that excluded sequences with biased amino acid composition and rapidly evolving sequences, using as an outgroup the group B TKs, which are the closest relatives of group A TKs (fig. S10); the tree topology was mostly in agreement with the original tree including the complete data set (Fig. 4), except for the position of A. queenslandica immunoglobulin (Ig)7–RTK, which is suggested to be a colon carcinoma kinase 4 (CCK4) family homolog (fig. S10). Consistent with the preliminary analysis (Fig. 3), the tree showed that all holozoan TKs (group A) form a clade distinct from that of the group B pre-opisthokont TKs (Fig. 4).

Fig. 4

Comprehensive phylogenetic tree of holozoan TKs. The ML tree was inferred from an alignment consisting of 190 holozoan TKs (group A), 12 pre-opisthokont TKs (chosen from the group B TKs in Fig. 3), and 7 ancestor STKs (chosen from Fig. 3 as an outgroup). All TKs found in the genomes except for highly similar paralogs were included. The genes belonging to clusters I (C. owczarzaki) and II (A. queenslandica) (red shading) are not fully included but were separately analyzed. The sequences of A. queenslandica, M. brevicollis, C. owczarzaki, and M. vibrans are in blue, green, red, and orange lines, respectively. Families are shaded in gray, and the names of 29 major metazoan families are indicated. Black square indicates a putative M. brevicollis ortholog of the Syk family, which is classified into an independent family as a result of a minor architectural difference (gray stripe). Jak family proteins share the same domain architecture with this protein. It can thus also be a Jak homolog whose KD had been replaced with that of a Syk-related family. Black circle indicates a possible A. queenslandica ortholog of the CCK4 family, as suggested by another ML analysis excluding biased or fast-evolving sequences (fig. S10). Asterisks indicate RTKs. The detailed tree with individual gene names, bootstrap values, and protein architectures are shown in fig. S7.

Fig. 5

Expansion of C. owczarzaki–specific TKs in cluster I. The tree was inferred by the ML method. Different C. owczarzaki TK families are indicated by distinct symbols and colors. A typical domain organization is shown for each family in the lower panel. The abbreviated names of the protein domains or motifs were taken from the SMART or Pfam databases, except for “Cys-rich,” which are not mapped to any of the known protein domains or motifs but just contain periodically arranged cysteines. The TKs with Cys-rich domains were classified by the number of cysteines (shown in the diagrams) usually seen in a repeating unit. C indicates CTKs. A cross indicates an RTK whose extracellular domain organization has been entirely altered by domain duplication, shuffling, and conversion that occurred relatively recently (more than 85% KD identity with its counterpart). Blue bars indicate the evolutionary points or branches where extracellular domain alterations are supposed to have occurred by parsimonious inference. Red and green arrowheads indicate CoPTK-207 and CoPTK-219, respectively, whose extracellular sequences are highly similar (see fig. S14). The detailed tree with the bootstrap supports and protein architectures is presented in fig. S8.

As shown by previous PCR-based studies of a freshwater sponge (17, 18), the A. queenslandica genome strengthened the notion that the basic repertoire of the CTK and RTK families found in most eumetazoan lineages (cnidarians and bilaterians) had already been established before the divergence of eumetazoans and sponges (25); 15 (16 if the CCK4 family is included) of the 29 major eumetazoan TK families were identified in the genome of A. queenslandica (Fig. 4). Phylogenetic analyses suggest gene losses in sponge rather than innovations in eumetazoans, even for eumetazoan families that sponge lacks (17, 18). Their protein domain organizations are also well conserved (figs. S7 and S11) (17, 25). Moreover, this anciently established repertoire has not suffered a marked change in metazoan evolution (17, 18). Some metazoans, however, exceptionally diversified their TK repertoire independently in each lineage (22, 25) by creating TKs with diverse architectures, for example, in A. queenslandica (Fig. 4, cluster II, and fig. S9).

The TK repertoire of filastereans shows a divergence pattern markedly different from that of sponges; most CTKs are clearly orthologous to 6 of the 10 major metazoan CTK families (Src, Tec, Csk, Abl, Fak, and Fes) on the basis of KD sequence similarity and overall domain architecture, whereas none of the RTKs can be confidently assigned to any metazoan family by the same criteria (Figs. 2 and 4 and fig. S11). Similarly, the PCR-based sampling of the TK repertoire of M. vibrans identified eight CTKs, of which seven are assigned to five metazoan families, and seven RTKs, all of which belong to a unique family (Figs. 2 and 4 and fig. S11). A similar situation was also seen in choanoflagellates, which have orthologs of at least six metazoan CTK families (Fig. 4 and fig. S11), and a large RTK expansion that appears independent of both filastereans and metazoans (19, 24).

Diversity and commonality of the holozoan TK repertoires

We summarized the numbers of unique and shared TKs among these five holozoan genomes, H. sapiens, D. melanogaster, A. queenslandica, M. brevicollis, and C. owczarzaki (Fig. 6 and fig. S11), highlighting the ancient establishment of CTK families and the rapid turnover of RTKs in the premetazoan stage. Of the 10 CTK families that are mostly in common among the three metazoans (fig. S11), 6 are also present in at least one filasterean. Another ML analysis focusing only on CTK families confirmed this finding (fig. S12). The basic Src homology 2 (SH2)–SH3–KD architecture of the Src-related families (Src, Tec, Csk, and Abl) was established before the divergence of filastereans and the MeCh and had already diversified into the four families. The presence of homologs of Fes and Fak in filastereans demonstrates that they were secondarily lost from M. brevicollis, although a putative Fes homolog is present in the choanoflagellate Codosiga gracilis (19). In addition, we identified a previously unreported Shark/HTK16 homolog in the M. brevicollis genome. This gene, together with the possible Syk ortholog or Jak homolog of this choanoflagellate (Fig. 4 and fig. S11), brings the number of metazoan CTK families with a premetazoan origin to eight. In addition to these common CTK families, each holozoan clade has also increased the repertoire independently by gene duplication and domain shuffling (fig. S11). In contrast, we did not observe a clear orthology between the RTKs of metazoans, choanoflagellates, and filastereans (Fig. 6), either by the phylogeny of KD sequences or by comparison of domain organization (Fig. 4 and figs. S7 and S11).

Fig. 6

Diversity and commonality of holozoan TK families. Numbers of TK families of five holozoans are summarized in Venn diagrams. The numbers were counted on the basis of the classification shown in fig. S11. Gene duplications that occurred within a family were not taken into account. An asterisk indicates that although the Syk family is shared only by H. sapiens and A. queenslandica, having the same domain architecture, a possible ortholog is present in M. brevicollis with a B41 domain added. This gene could also be a Jak homolog. See the legend to Fig. 4 for further details. The dagger indicates that ML analysis on the data set excluding biased or fast-evolving sequences (see fig. S10) suggests the A. queenslandica Ig7-RTK as a CCK4 family gene, although we do not include it here according to the ML analysis on the complete data set (Fig. 4 and fig. S7).

We then focused on the pattern of clade-specific RTK expansion within the Choanoflagellata. As described earlier, most metazoan-specific families seem to have been generated within early metazoans before the divergence between sponges and the rest of the metazoans. To determine whether a similar pattern was seen in choanoflagellates, we compared the complete TK repertoire of M. brevicollis (family Codonosigidae) with partial ones of Monosiga ovata (family Salpingoecidae) (38), C. gracilis (family Codonosigidae), and Stephanoeca diplocostata (family Acanthoecidae), which were obtained by PCR surveys (fig. S13) (19). Five choanoflagellate-specific CTK families and three RTK families are shared between M. brevicollis and other choanoflagellate species (figs. S11 and S13). In contrast, three CTK and seven RTK families found in M. ovata, C. gracilis, and S. diplocostata are not present in the whole genome of M. brevicollis (fig. S13). Thus, unlike in metazoans, the generation of novel TK families in choanoflagellates was not confined to an early stage but continued at least beyond the divergence of these three taxonomic families of the Choanoflagellata.

Extensive lineage-specific diversification of the C. owczarzaki RTKs

Of the 92 C. owczarzaki RTKs, 88 belong to a single cluster, cluster I (Figs. 4 and 5 and fig. S8), on the basis of KD sequence similarity. They display diverse domain organizations, which is indicative of frequent domain shuffling. Given that their common ancestor belonged to the RTK11 family (Fig. 5), which is most common in this cluster (Fig. 2), we parsimoniously estimated 22 extracellular architecture alterations in total (Fig. 5). Fourteen RTKs show particularly high KD sequence identities (more than 85%) to their close relatives that have distinct extracellular domain organizations, which suggest recent duplication and domain swapping (Fig. 5). It is thus likely that frequent gene duplication and domain shuffling that occurred in C. owczarzaki augmented the diversity of RTKs of this protist. The A. queenslandica RTKs in cluster II show a similar pattern of diversification (fig. S9), indicating that the rapid expansion of RTKs is not a phenomenon specific to unicellular protists but occurred also in the earliest branching metazoans with primitive multicellularity (39, 40). These mechanisms of quick expansion of RTKs are likely to have contributed to increasing the variety of protein tyrosine phosphatases (PTPs), which are antagonists of TK signaling (41). For example, the extracellular region of the C. owczarzaki RTK CoPTK-207 (Fig. 5) shows a high sequence identity not only to that of the other RTK CoPTK-219 (Fig. 5) but also to that of a PTP (CoPTP-14FN3) that has two PTP catalytic domains instead of a KD (fig. S14).
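The 85% KD identity criterion used above to flag recent duplication and domain swapping can be illustrated with a minimal sketch. The text does not state how identity was computed, so the definition used here (identical residues divided by alignment columns in which neither sequence has a gap) is an assumption, and the function names `kd_identity` and `recent_swap_candidate` are hypothetical.

```python
def kd_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned KD sequences.

    Assumption: columns where either sequence has a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def recent_swap_candidate(aligned_a, aligned_b, arch_a, arch_b, threshold=85.0):
    """Flag a pair whose KDs are more than `threshold` percent identical
    although their extracellular domain organizations differ."""
    return kd_identity(aligned_a, aligned_b) > threshold and arch_a != arch_b
```

For example, `kd_identity("ACDE-F", "ACDQGF")` compares five ungapped columns, four of which match.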


Our data show that holozoan TKs originated from ancestral pre-opisthokont TKs and diversified independently, and that fungi secondarily lost TKs. The two known filastereans, C. owczarzaki and M. vibrans, have TK repertoires that rival those of choanoflagellates and metazoans in size. This suggests that an elaborate TK repertoire emerged before the divergence of filastereans from choanoflagellates and metazoans. Moreover, our data show that the early evolution of RTKs is markedly different from that of CTKs. Of the 10 common metazoan CTK families, eight were already in place even before the divergence of the Filasterea from the MeCh. In contrast to the early maturation of CTKs, no orthology was found among RTKs in metazoans, choanoflagellates, and filastereans. This indicates that the last common ancestor of holozoans had (i) one or a few RTKs, from which each clade has expanded the repertoire by frequent gene duplication, and duplication and shuffling of the extracellular domains, or (ii) many RTKs, but extensive domain shuffling, gene loss, or KD conversion in each clade has obscured the orthology on the basis of the architectures of the extracellular regions and KD sequence homology. In all of these three lineages, RTKs substantially outnumber CTKs, and some families are highly expanded.

Within metazoans, we saw substantial RTK generation at the earliest stage of metazoan evolution before the split between sponges and eumetazoans but far fewer changes in domain architecture in more evolutionarily recent times. Although some metazoan lineages (for example, sponges) seem to have expanded their TK repertoire also recently (fig. S9), most recent large-scale changes are limited to simple duplications, including whole-genome duplications within vertebrates (12, 16, 42, 43). On the other hand, choanoflagellates seem to have continued the diversification of TKs even after the divergence of their three taxonomic families. However, more data are necessary to assess the continued TK diversification within choanoflagellates and filastereans.

We have no experimental evidence for the biological functions of filasterean TKs. Although a previous publication suggested that the phosphotyrosine-mediated signaling system of choanoflagellates might serve a function related to the cell cycle or cell survival (44), still nothing is known about the extracellular signals received by the RTKs. However, the clade-specific divergence pattern of RTKs is consistent with the hypothesis that they act to detect changes in the extracellular environment or to recognize and catch prey (44), because each organism has to be adapted to its own environment and nutrient conditions. This might explain why the extracellular regions of RTKs in filastereans and choanoflagellates, which are usually exposed to the environment, are highly divergent in each lineage. The metazoan RTK repertoire, however, seems to be largely stable after the initial expansion, with a unique set of metazoan RTKs retained after the emergence of multicellularity. The recruitment of this initial set for functions such as intercellular communication and control of cell adhesion, instead of for receiving environmental cues, might be a key to the evolution of metazoan multicellularity. The sponge-specific RTK repertoire, which diversified independently of the common RTK set of metazoans, may reflect the constant exposure of their tissues to the environment (45), in contrast to eumetazoans, in which the RTKs are generally exposed to internal fluids. The more stable evolution of CTKs would then reflect a relatively stable intracellular environment; CTKs generally act downstream of RTKs or other receptors to transmit extracellular signals (46).

We have shown the extensive diversification of holozoan TKs after the split from fungi, as well as the differences in the patterns of expansion between RTKs and CTKs before and after the evolution of multicellularity. We hypothesize that the generation of a new RTK repertoire and its use to encode cell-to-cell communication tools has been one of the key genetic changes at the transition from the unicellular to the multicellular system. The genome sequencing of the other filasterean M. vibrans, as well as the remaining holozoan taxon, the Ichthyosporea (Fig. 1), and other pre-opisthokonts (36), together with functional assays on TKs, will further clarify the evolutionary origin and functional transition of TK signaling.

Materials and Methods

Cultures of filastereans

Live cultures of C. owczarzaki and M. vibrans were purchased from the American Type Culture Collection (ATCC) and maintained at 23°C in ATCC medium 1034 and at 17°C in ATCC medium 1525.

Cloning of TK-encoding genes from C. owczarzaki and M. vibrans

M. vibrans cDNAs were obtained either by PCR with degenerate primers as described previously (17, 19) or by searching available ESTs (4). The 15 obtained cDNAs were extended to the 5′ and 3′ termini by rapid amplification of cDNA ends (RACE) (47). All of the putative TK-encoding sequences previously found in ESTs (4), except the clearly redundant ones, were included in this study. The uncertain exon-intron boundaries and 5′ and 3′ ends of five Capsaspora TKs (CoPTK-96, CoPTK-102, CoPTK-219, CoPTK-235, and CoPTK-238) were confirmed by PCR and RACE. Sequences were deposited in GenBank under accession numbers AB591048 to AB591054.

Data mining

The genome sequences and predicted protein collections of H. sapiens, D. melanogaster, M. brevicollis, and C. owczarzaki were obtained from,,, and, respectively. The predicted TK-encoding sequences of M. brevicollis and A. queenslandica were obtained from The genome sequence of A. queenslandica was provided by B. Degnan (University of Queensland). The predicted proteomes were searched for TKs by the BLAST and HMMER (48) programs. The highly characteristic motif in the catalytic loop [typically HRDLAARN/HRDLRAAN (24)] was also used to find TKs. Note that none of the published gene predictions automatically performed on whole genome sequences is perfect, even when the genomes are covered by a theoretically sufficient number of raw reads, and it remains important to analyze the DNA sequences and the raw reads directly even if official gene predictions are available. We therefore manually revised all the published predictions of TK-encoding sequences of A. queenslandica, M. brevicollis, and C. owczarzaki by searching the genomic DNA sequences with two ab initio gene prediction programs, GENSCAN (49) and Augustus (50). For the Augustus prediction, we optimized the model parameters specifically for each organism. Their genomic DNA assemblies were also revised and reconstructed from the raw data by the use of the wgs-assembler and phrap programs when necessary. We then checked all of the predictions manually by comparing with closely related sequences and by inspecting the independent predictions, including suboptimal ones, by GENSCAN and Augustus, paying special attention to obtaining complete sequences. Such reannotations were indispensable both to the classification of TKs based on their protein domain organization and to the reliable phylogeny by KD sequence alignment. The integrity of each protein domain and the presence of the signal peptide in the N terminus of an RTK are good indicators of a correct prediction.
Comparison of domain architectures of closely related proteins also improved the gene prediction and even the genome sequence assembly. The word “mod” or “new” is added to the name of an M. brevicollis gene if the previous prediction (24) is modified or if the gene is newly discovered in the genome. We completely renamed the A. queenslandica genes that were originally predicted in the whole genome sequence (25) because most of them have been radically modified in this study. Not all A. queenslandica and M. brevicollis TKs were included in the analyses; representative sequences were selected to cover the full diversity of TK families. We also searched the whole genome sequences of five pre-opisthokonts, E. histolytica, T. trahens, A. castellanii, C. reinhardtii, and P. infestans, which were retrieved from their respective online genome resources. Here, we completed, where possible, only the KD sequences, because the quality of the assemblies was not always sufficient for obtaining full-length sequences. We also went back to the raw data and reassembled them if necessary. The whole genome sequences of two basal fungal species, A. macrogynus and S. punctatus, were retrieved from an online resource. All of the sequences (re)annotated in this study are found in the Supplementary Materials.
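
The catalytic-loop motif search mentioned above can be illustrated with a short sketch. It assumes that protein sequences are already available as plain strings keyed by name; the function name, the toy sequences, and the restriction to the two exact motif variants quoted in the text are assumptions made for illustration, not the procedure used in this study (real catalytic loops vary, so a hit or a miss here is only a first screen).

```python
import re

# The two typical TK catalytic-loop variants quoted in the text:
# HRDLAARN and HRDLRAAN. Real sequences may deviate from both.
CATALYTIC_LOOP = re.compile(r"HRDL(?:AARN|RAAN)")


def find_catalytic_loops(proteins):
    """Return {name: [0-based start positions]} for sequences with a motif hit.

    `proteins` is a hypothetical mapping of sequence name -> amino acid string.
    """
    hits = {}
    for name, sequence in proteins.items():
        positions = [m.start() for m in CATALYTIC_LOOP.finditer(sequence.upper())]
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    # Toy sequences invented for this example only.
    toy_proteins = {
        "toy1": "MKTAHRDLAARNCLV",
        "toy2": "MKHRDLKPENILL",
        "toy3": "GGHRDLRAANVLV",
    }
    print(find_catalytic_loops(toy_proteins))
```

A motif screen of this kind only nominates candidates; in the workflow described above, candidates were additionally found by BLAST and HMMER searches and checked manually.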

Protein domain architecture

Protein domain organization was analyzed with HMMER software by searching the SMART and Pfam databases. We confirmed the presence of these predicted domains by manually inspecting the alignments produced by HMMER. Cysteine-rich sequences that were not mapped to any known domain were classified by the number of cysteines in a repeating unit, which was identified by aligning the repeats. The TM and signal peptides of RTKs were predicted by the TMHMM and SignalP programs, respectively (51).

Classification of TKs

We defined “cognate” TKs as those that had identical domain organization or differed by the loss of one or more copies of a single domain. Repeats of a domain were counted as one (for example, proteins with five and seven Ig-like repeats are cognate, as in the CCK4 family; Fig. 4 and fig. S7). Noncognate TKs include those that differ by more than one deletion or by one or more domain insertions. Insertions were distinguished from deletions by phylogenetic analysis of their timing. Domain insertions that occurred specifically in bilaterian lineages after the separation from nonbilaterians (or protists) were not taken into account, because such domains might have been defined only by alignments of bilaterian sequences, which are generally much better investigated than others. We classified TKs into families according to both their domain organization and, with some exceptions (discussed later), their phylogenetic relationship on the basis of KD sequence similarity. Cognate TKs showing a close phylogenetic relationship were classified into a single family. Those that show a cognate domain organization but do not occupy neighboring positions in the phylogenetic tree were classified into distinct families because they were considered to be products of evolutionary convergence. However, cognate TKs of filasterean, choanoflagellate, or sponge lineages that were not mapped to any common metazoan family (and are thus likely to have diversified specifically in each lineage) were classified into a single family even if they do not show the closest phylogenetic relationship to each other. As seen in Fig. 5 and figs. S8 and S9, frequent domain swapping and conversion have obscured the orthology of such TK genes during their diversification process. We therefore classified them only on the basis of their domain organizations to avoid overestimating the clade-specific expansion of the TK families of these lineages.
By the same reasoning, we also classified TKs from a single species having no known domain but KD (or KD, TM, and signal peptide in the case of RTKs) into a single family, which can be subdivided into multiple families when new protein domains or motifs are described in the future.
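
The domain-organization part of the “cognate” definition can be sketched in code. The sketch below assumes that each TK is represented as an N-to-C-terminal list of domain names; the function names and the example architectures are hypothetical. It collapses tandem repeats and then asks whether two architectures are identical or differ only by copies of a single domain type. It does not model the phylogenetic step that distinguishes an insertion from a deletion, so the comparison is treated as symmetric here.

```python
from itertools import combinations, groupby


def collapse_repeats(domains):
    """Count tandem repeats of a domain as one: [Ig, Ig, Ig, TM] -> [Ig, TM]."""
    return [name for name, _ in groupby(domains)]


def differs_by_single_domain(longer, shorter):
    """True if removing one or more copies of ONE domain type from `longer`
    (then collapsing repeats again) yields `shorter`."""
    for domain in set(longer):
        positions = [i for i, d in enumerate(longer) if d == domain]
        for k in range(1, len(positions) + 1):
            for dropped in combinations(positions, k):
                kept = [d for i, d in enumerate(longer) if i not in dropped]
                if collapse_repeats(kept) == shorter:
                    return True
    return False


def is_cognate(arch_a, arch_b):
    """Domain-organization test only; phylogenetic position is not considered."""
    a, b = collapse_repeats(arch_a), collapse_repeats(arch_b)
    if a == b:
        return True
    return differs_by_single_domain(a, b) or differs_by_single_domain(b, a)


if __name__ == "__main__":
    five_ig = ["Ig"] * 5 + ["TM", "KD"]
    seven_ig = ["Ig"] * 7 + ["TM", "KD"]
    print(is_cognate(five_ig, seven_ig))                    # repeats count as one
    print(is_cognate(["SH3", "SH2", "KD"], ["SH2", "KD"]))  # one domain type lost
    print(is_cognate(["SH3", "SH2", "KD"], ["KD"]))         # two domain types lost
```

Because family assignment in this study also depends on KD phylogeny and on lineage-specific exceptions, a check like this covers only the first of the criteria described above.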

Inference of phylogenetic trees

Phylogenetic trees were inferred from the comparison of KD sequences by the ML method under the WAG-Γ model with four rate categories, using RAxML v7.0.4 software (52). For large trees with more than 90 operational taxonomic units, we refined the initial near-ML topology found by the RAxML program with the same program in a genetic algorithm (GA)–based heuristic approach, similar to a method previously described (53). Our approach differs in that, after the normal GA-based optimization, we applied a final tuning of the discovered near-ML topology by the tree bisection and reconnection (TBR) and crossover (tree recombination) algorithms (53); we explored all possible topologies generated by TBR from the GA-optimized topology and all of the possible topologies generated by crossing over all topologies of the GA population in every possible combination, and confirmed that there was no further improvement. With this approach, we could effectively avoid local maxima of the likelihood for large trees (53). The calculation was performed with a computer cluster composed of six iMacs (Apple) with two cores, a MacPro (Apple) with eight cores, and two PCs (Dell) with four cores. The bootstrap replicates were also generated with RAxML. One C. owczarzaki TK (CoPTK-243) was not included because its KD is highly divergent. Phylogenetic inference of the holozoan TK position in the protein kinase superfamily was also performed by the Bayesian method with MrBayes v3.1.2 (54) with the same model. We ran the program for 10,000,000 generations (average SD of split frequencies < 0.013). The confidence of clustering is represented by Bayesian posterior probability. The alignments are shown in figs. S15 and S16.
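
For readers unfamiliar with bootstrap replicates, the underlying resampling step can be sketched as follows. This is a generic illustration of nonparametric bootstrapping of alignment columns, not the RAxML implementation; the function name and the toy alignment are assumptions, and the alignment is assumed to be a mapping of sequence name to equal-length aligned strings.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    The replicate has the same sequences and the same number of sites as the
    input; each site is a randomly chosen column of the original alignment.
    """
    n_sites = len(next(iter(alignment.values())))
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


if __name__ == "__main__":
    # Toy alignment invented for this example only.
    toy_alignment = {"tk1": "ACDEFG", "tk2": "ACDQFG", "stk1": "TCDEYG"}
    rng = random.Random(0)  # fixed seed so the replicate is reproducible
    for name, seq in bootstrap_alignment(toy_alignment, rng).items():
        print(name, seq)
```

Each replicate alignment would then be passed to a tree-inference program, and the support for a branch is the fraction of replicate trees that contain it.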

Statistical test on the group-level clustering

ML bootstrap values for the trees including TKs and STKs (fig. S3A) are relatively low, mostly as a result of the use of divergent sequences and the low number (173) of alignment sites. To test the robustness of higher-level topology (that is, the relationship between group A TKs, group B TKs, and members of the STK group), we developed a new method, which assesses the overall group-level diversification in each tree of bootstrap replicates. We first inferred 2000 bootstrap-replicated trees, each optimized by the ML method (one example is shown in fig. S6B). We then determined for each tree two putative boundary branches that most likely separate group A and B TK clusters from STKs and group A TK clusters from group B TKs, allowing some violations of included members. We tracked the tree branches by the minimum path starting from a group A TK-encoding gene (1 in fig. S6A) to an STK-encoding gene (84 in fig. S6A) and sought a branch (STK-TK boundary, blue bars in fig. S6, A and B) that maximized the number of group A and group B TKs below this boundary and the number of STKs on the other side of this boundary. We then sought another branch (group A–group B boundary, red bars in fig. S6, A and B) that maximized the number of group A TKs below this boundary and the number of group B TKs above this boundary but still below the STK-TK boundary. We defined the area below the group A–group B boundary as the red area, and that above the group A–group B boundary, but still below the STK-TK boundary, as the blue area. The numbers of the group A TKs (1 to 16), group B TKs (17 to 37), and TKs that belong to neither group A nor B (38 to 45) were counted in each of the red and blue areas and presented as histograms.
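
The boundary search described above can be sketched for a single tree. The sketch assumes an unrooted tree given as a list of edges, with each leaf labeled "A", "B", "STK", or "other"; all names, the tie-breaking rule (the first maximum along the path wins), and the toy tree are assumptions made for illustration and are not taken from the original implementation. “Below” a branch is interpreted as the side containing the starting group A leaf.

```python
from collections import Counter


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def path_between(adj, start, end):
    """Unique path between two nodes of a tree (iterative depth-first search)."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            return path
        for nxt in adj[node]:
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    raise ValueError("no path found; is the input a connected tree?")


def leaves_on_side(adj, root, blocked, groups):
    """Labeled leaves reachable from `root` without crossing into `blocked`."""
    seen, stack, found = {root, blocked}, [root], set()
    while stack:
        node = stack.pop()
        if node in groups:
            found.add(node)
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return found


def count_group_areas(edges, groups, start_leaf, stk_leaf):
    """Return (red_area_counts, blue_area_counts) for one tree."""
    adj = build_adjacency(edges)
    path = path_between(adj, start_leaf, stk_leaf)
    # Leaf set "below" each branch on the path, growing toward the STK leaf.
    below = [leaves_on_side(adj, u, v, groups) for u, v in zip(path, path[1:])]

    def count(leaves, wanted):
        return sum(groups[leaf] in wanted for leaf in leaves)

    total_stk = count(groups, {"STK"})
    # STK-TK boundary: most group A/B TKs below it, most STKs on the other side.
    stk_tk = max(range(len(below)), key=lambda i: count(below[i], {"A", "B"})
                 + total_stk - count(below[i], {"STK"}))
    # Group A-group B boundary: most A below it, most B between the boundaries.
    a_b = max(range(stk_tk + 1), key=lambda i: count(below[i], {"A"})
              + count(below[stk_tk] - below[i], {"B"}))
    red, blue = below[a_b], below[stk_tk] - below[a_b]
    return (Counter(groups[leaf] for leaf in red),
            Counter(groups[leaf] for leaf in blue))


if __name__ == "__main__":
    # Toy tree invented for this example only.
    toy_edges = [("a1", "n1"), ("a2", "n1"), ("n1", "n2"), ("b1", "n2"),
                 ("n2", "n3"), ("b2", "n3"), ("n3", "n4"), ("s1", "n4"),
                 ("s2", "n4")]
    toy_groups = {"a1": "A", "a2": "A", "b1": "B", "b2": "B",
                  "s1": "STK", "s2": "STK"}
    print(count_group_areas(toy_edges, toy_groups, "a1", "s2"))
```

Applying such a function to every bootstrap-replicated tree and tallying the returned counts would give the data for histograms of the kind described in the text.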

Supplementary Materials

Fig. S1. Protein domain organizations of all of the TKs of C. owczarzaki and M. vibrans.

Fig. S2. Phylogenetic position of TKs in the ePK superfamily.

Fig. S3. ML analysis of the phylogenetic positions of TKs and a test of the hypothesis of the multiple origins of TKs.

Fig. S4. Bayesian and ML analyses of ePKs without TKs.

Fig. S5. Statistical tests of the group A position.

Fig. S6. Statistical test of the group-level clustering of TKs.

Fig. S7. Phylogenetic tree and domain architectures of holozoan TKs.

Fig. S8. ML tree and protein domain organizations of cluster I TKs of C. owczarzaki.

Fig. S9. Diversification of A. queenslandica RTKs in cluster II.

Fig. S10. An ML tree excluding biased sequences and rapidly evolving sequences.

Fig. S11. TK families of six holozoans.

Fig. S12. Phylogenetic tree of CTKs.

Fig. S13. Protein domain organizations of TKs from three choanoflagellates and their comparison with the M. brevicollis TK repertoire.

Fig. S14. Comparison of three C. owczarzaki receptor proteins with very similar extracellular regions.

Fig. S15. Alignment of ePKs.

Fig. S16. Alignment of holozoan TKs.


Sequence data

References and Notes

Acknowledgments: The genome sequences of C. owczarzaki, A. macrogynus, S. punctatus, and T. trahens are being sequenced by the Broad Institute, Massachusetts Institute of Technology and Harvard University, under the auspices of the National Human Genome Research Institute. We thank B. Degnan (University of Queensland) for access to the A. queenslandica genome sequence database; K. Worley and her colleagues in the Human Genome Sequencing Center of the Baylor College of Medicine for allowing us to analyze the A. castellanii genome sequence; and F. Lang (University of Montreal) for access to the unpublished genome data of T. trahens, A. macrogynus, and S. punctatus. Funding: H.S. is supported by the Marie Curie Intra-European Fellowship within the 7th European Community Framework Programme. A.d.M. is supported by the grant Formación del Personal Investigador from Ministerio de Ciencia e Innovación awarded to I.R.-T. K.S.-T. is supported by starting grants from the University of Oslo. This study was financially supported by the European Research Council Starting Grant ERC-2007-StG-206883 (to I.R.-T.), grant BFU2008-02839/BMC from Ministerio de Ciencia e Innovación (to I.R.-T.), and NIH grant HG004164 (to G.M.). Author contributions: H.S. and A.d.M. performed the experiments; H.S., M.D., G.M., and A.d.M. performed computational work; H.S. and I.R.-T. designed the experiments; H.S., I.R.-T., and G.M. designed computational analyses; and H.S., I.R.-T., G.M., M.D., and K.S.-T. wrote the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: Sequences have been deposited in GenBank under accession numbers AB591048 to AB591054.