Research Resource | Structural Biology

Identifying three-dimensional structures of autophosphorylation complexes in crystals of protein kinases

Science Signaling  01 Dec 2015:
Vol. 8, Issue 405, pp. rs13
DOI: 10.1126/scisignal.aaa6711

Autophosphorylation sites revealed

Three-dimensional structural data from crystals of protein kinases have aided the development of drugs and provided insights into kinase regulation and substrate recognition. Many protein kinases trans-autophosphorylate; one kinase phosphorylates another molecule of the same kinase. Anticipating that published crystallographic data may include undescribed information, Xu et al. developed a bioinformatics method to analyze the crystals of kinases for the presence of complexes representing the conformation of kinases during autophosphorylation. The authors identified 15 autophosphorylation complexes in the Protein Data Bank, including five that had not been previously described. With this additional information, structural motifs involved in autophosphorylation become identifiable, which may aid in rational drug design and understanding disease-associated mutations.


Protein kinase autophosphorylation is a common regulatory mechanism in cell signaling pathways. Crystal structures of several homomeric protein kinase complexes have a serine, threonine, or tyrosine autophosphorylation site of one kinase monomer located in the active site of another monomer, a structural complex that we call an “autophosphorylation complex.” We developed and applied a structural bioinformatics method to identify all such autophosphorylation complexes in x-ray crystallographic structures in the Protein Data Bank (PDB). We identified 15 autophosphorylation complexes in the PDB, of which five complexes had not previously been described in the publications describing the crystal structures. These five complexes consist of tyrosine residues in the N-terminal juxtamembrane regions of colony-stimulating factor 1 receptor (CSF1R, Tyr561) and ephrin receptor A2 (EPHA2, Tyr594), tyrosine residues in the activation loops of the SRC kinase family member LCK (Tyr394) and insulin-like growth factor 1 receptor (IGF1R, Tyr1166), and a serine in a nuclear localization signal region of CDC-like kinase 2 (CLK2, Ser142). Mutations in the complex interface may alter autophosphorylation activity and contribute to disease; therefore, we mutated residues in the autophosphorylation complex interface of LCK and found that two mutations impaired autophosphorylation (T445V and N446A) and mutation of Pro447 to Ala, Gly, or Leu increased autophosphorylation. The identified autophosphorylation sites are conserved in many kinases, suggesting that, by homology, these complexes may provide insight into autophosphorylation complex interfaces of kinases that are relevant drug targets.


Protein kinases play important roles in many cellular signaling pathways, such as cell cycle regulation and apoptosis (1). Problems in kinase regulation can lead to diverse illnesses ranging from cancer (2) to obesity (3). Activity of most kinases is partially regulated by the phosphorylation status and position of the activation loop, which begins with the highly conserved DFG (Asp-Phe-Gly) motif and ends with a sequence usually similar to APE (Ala-Pro-Glu) (4). In many kinases, the nonphosphorylated activation loop occupies a position that interferes with substrate binding. When phosphorylated, usually by trans-autophosphorylation (meaning, by a second instance of the same kinase), the activation loop becomes repositioned, providing access to the active site for substrates and rearranging several residues required for catalysis (5). Many kinases contain additional sites outside the activation loop that are also trans-autophosphorylated (6).

Several kinase structures have been reported in which a serine, threonine, or tyrosine autophosphorylation site of one kinase monomer is present in the active site of another monomer of the same protein in the crystal (7–15). In these structures, the positions of the phosphorylation site and adjacent residues resemble those of substrates in structures of substrate peptides bound to kinases (16–18). Phosphorylation sites reported in autophosphorylation complexes in crystals include a tyrosine in the juxtamembrane region that is N-terminal to the kinase domain of the receptor tyrosine kinase c-KIT [Protein Data Bank (PDB): 1PKG (7)], a tyrosine in the “kinase insert region” of fibroblast growth factor receptor 1 (FGFR1) [PDB: 3GQI (8)] and of FGFR3 [PDB: 4K33 (14)], a tyrosine in the C-terminal tail of FGFR2 [PDB: 3CLY (9)] and of epidermal growth factor receptor (EGFR) [PDB: 4I21 (19)], and a tyrosine in the activation loop of insulin-like growth factor 1 receptor (IGF1R) [PDB: 3D94 (10)]. In all of these, the tyrosine side chain of the substrate kinase is hydrogen-bonded to the catalytic Asp side chain of the active site HRD (His-Arg-Asp) motif of the enzyme kinase [the site in FGFR1 (PDB: 3GQI) has been mutated to Phe but is correctly positioned if it were Tyr]. Furthermore, each of these residues is an experimentally verified autophosphorylation site in these kinases. For serine/threonine kinases, autophosphorylation complexes of the activation loop Thr residues of p21-activated kinase (PAK1) [PDB: 3Q4Z (11)] and interleukin-1 receptor–associated kinase 4 (IRAK4) [PDB: 4U97 and 4U9A (15)] have been described, as have autophosphorylation complexes of the C-terminal regulatory regions of human [PDB: 2WEL (12)] and Caenorhabditis elegans [PDB: 3KK8 and 3KK9 (13)] calcium/calmodulin-dependent kinase II (CaMKII).

Because of the importance of understanding kinase activation processes and kinase-substrate recognition, we sought to identify undetected autophosphorylation complexes in crystals of kinases in the PDB using a structural bioinformatics approach. Using the symmetry information for each crystal provided by the PDB, we constructed all distinct interfaces between monomers in 3525 kinase crystals in the PDB (as of 24 October 2015) and measured the distance between the Asp oxygen atoms of the HRD motif in one monomer and the hydroxyl groups on Ser, Thr, and Tyr of the other monomer, and vice versa.

This approach properly identified the 10 previously described autophosphorylation complexes listed above and identified five more that were not described as such in the relevant papers. The newly identified autophosphorylation complexes include (i) the activation loop Tyr of the human nonreceptor tyrosine kinase LCK [PDB: 2PL0 (20)], which is similar to the IGF1R structure (10); (ii) a second tyrosine autophosphorylation site (Tyr1166) in the activation loop of human IGF1R [PDB: 3LVP (21)]; (iii) a Tyr in the N-terminal juxtamembrane region of human colony-stimulating factor 1 receptor (CSF1R) [PDB: 3LCD (22)] that is homologous to the tyrosine observed in the c-KIT complex (7); (iv) a Tyr in the N-terminal juxtamembrane region of ephrin receptor A2 (EPHA2) [PDB: 4PDO (23)] that represents a phosphorylation site near but distinct from that in the c-KIT and CSF1R complexes; and (v) a Ser in the C-terminal tail of human CDC2/CDC28-like kinase 2 (CLK2) [PDB: 3NR9 (24)]. We also identified several additional structures of autophosphorylation complexes of PAK1 including PDB: 4O0R, 4O0T (25); 4P90 (26); 4ZY4, 4ZY5, 4ZY6 (27); and 4ZLO, 4ZJI, and 4ZJJ (28). Two of these structures, PDB: 4ZY4 and 4ZY5, contain an autophosphorylation complex with a fully ordered activation loop in the substrate kinase, whereas the others, including PDB: 3Q4Z (11), are missing the coordinates of 8 to 11 residues within the activation loop.

Comparison of all the newly identified sites to the structures of previously published peptide substrate–kinase complexes in the PDB indicated that these sites are consistent with catalysis of phosphorylation. However, one of the previously described autophosphorylation complexes, EGFR [PDB: 4I21 (19)], was not consistent with recently published structures of EGFR in complexes with peptide substrates.

Because of the central importance of activation loop autophosphorylation in the regulation of tyrosine kinases, we performed site-directed mutagenesis of the autophosphorylation complex interface of LCK that we identified in PDB: 2PL0, changing several residues in the G helix that contacts the activation loop of the opposing monomer, and analyzed the phosphorylation of these mutants when expressed in cells. A mutation at Pro447 in LCK has previously been identified as activating and is associated with T cell leukemia (29). Whereas some mutations in the interface region impaired autophosphorylation, mutation of Pro447 to Gly, Ala, or Leu maintained or increased autophosphorylation activity.

By comparing all autophosphorylation complexes with each other and with the structures of peptide-bound kinases, we discovered common structural elements in subsets of the complexes, providing insights regarding the substrate specificity of kinases in general and autophosphorylation sites in particular, and the importance of domain-domain interactions in the phosphorylation of some substrates in the context of full-sized proteins as distinct from peptide substrates. Furthermore, because several of the autophosphorylation sites are conserved in other kinases, the relevance of these complexes likely extends to other clinically relevant drug targets. We show that this is particularly true of the IGF1R, LCK, and PAK1 activation loop sites, which are found at homologous positions within the activation loop in the sequences of a large number of kinases.


Potential autophosphorylation complexes in crystals of protein kinases

As of 24 October 2015, the PDB contained 3525 crystal structures of proteins that contain a Ser or Thr kinase domain or a Tyr kinase domain as identified with the associated hidden Markov models (HMMs) provided by Pfam (30) (Pfam models “Pkinase” and “Pkinase_Tyr,” respectively). These PDB entries contain structural information on 237 human kinases, 25 mouse kinases, and 103 kinases from other species.

To identify possible autophosphorylation complexes in each crystal, we needed to build coordinates for a portion of the crystal large enough to contain at least one example of all distinct protein-protein interactions that make up the crystal, using methods described previously (31, 32). We started our analysis with the asymmetric unit (ASU), the smallest part of the crystal structure to which symmetry rules can be applied to produce coordinates for the entire crystal (33). We used information on the symmetry of each crystal provided by the PDB to build a unit cell of the crystal, a cuboid object (or parallelepiped) consisting of multiple copies of the ASU. Once we built a compact unit cell for a crystal, we built a 3 × 3 × 3 collection of unit cells of the crystal by copying each unit cell and translating it in the ±x, ±y, and ±z directions as needed. This was done to guarantee that we have all possible interactions of the original ASU with all other neighboring ASUs in the crystal.
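The construction of the 3 × 3 × 3 block can be illustrated with a minimal sketch. It assumes atoms are held as fractional coordinates and that each symmetry operator is given as a rotation matrix plus a translation vector; the function names and the example operators (those of space group P2₁) are illustrative choices, not the authors' implementation, and the sketch omits the step of making each unit cell compact.

```python
from itertools import product

def apply_operator(frac, rotation, translation):
    """Apply one symmetry operator to a fractional coordinate (x, y, z)."""
    return tuple(
        sum(rotation[i][j] * frac[j] for j in range(3)) + translation[i]
        for i in range(3)
    )

def build_3x3x3(asu_atoms, operators):
    """Copy the ASU with every operator, then translate by -1, 0, +1 cells.

    Returns a dict keyed by (operator index starting at 1, (dx, dy, dz)).
    """
    copies = {}
    for op_index, (rotation, translation) in enumerate(operators, start=1):
        in_cell = [apply_operator(a, rotation, translation) for a in asu_atoms]
        for shift in product((-1, 0, 1), repeat=3):
            copies[(op_index, shift)] = [
                tuple(c + s for c, s in zip(atom, shift)) for atom in in_cell
            ]
    return copies

# Illustrative operators for space group P2_1: (x, y, z) and (-x, y+1/2, -z).
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
TWOFOLD_SCREW = ((-1, 0, 0), (0, 1, 0), (0, 0, -1))
operators = [(IDENTITY, (0, 0, 0)), (TWOFOLD_SCREW, (0, 0.5, 0))]

copies = build_3x3x3([(0.1, 0.2, 0.3)], operators)
print(len(copies))  # 2 operators x 27 unit cells
```

Converting the fractional coordinates to Cartesian coordinates with the cell parameters would be needed before any distances are measured.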

Each protein in the assembly of 27 unit cells is given an identifier in a standard format, such as “A:2_555.” The letter A is the chain identifier of the original ASU, which might consist of chains A, B, and C, for instance, if there are three such chains in the ASU. The first number after the colon is the identifier of the symmetry operator (a rule to copy, rotate, and translate the coordinates of the ASU) used to orient the ASUs in the unit cell (“2” in this example). The number 555 indicates the unit cell at the center of our 3 × 3 × 3 box; neighboring unit cells of the center cell in the x, y, and z directions are designated by adding or subtracting 1 from these numbers. As another example, “B:3_655” indicates author chain B, symmetry operator 3 for the space group of the crystal form, and a neighboring unit cell one step in the positive x direction. After building a 3 × 3 × 3 collection of unit cells for each crystal, we determined the list of distinct protein-protein interfaces in the crystal for further analysis.
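The identifier format described above can be decoded mechanically. The following sketch assumes the three digits after the underscore each encode one axis offset relative to 5; the function name `parse_copy_id` is hypothetical.

```python
import re

# chain ID, colon, symmetry operator number, underscore, three cell digits
_ID_PATTERN = re.compile(r"^(?P<chain>[^:]+):(?P<op>\d+)_(?P<cell>\d{3})$")

def parse_copy_id(identifier):
    """Split an identifier such as 'B:3_655' into its three parts.

    Returns (chain, operator number, (dx, dy, dz)), where the offsets are
    unit-cell steps relative to the central cell 555.
    """
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a chain:operator_cell identifier: {identifier!r}")
    offsets = tuple(int(digit) - 5 for digit in match.group("cell"))
    return match.group("chain"), int(match.group("op")), offsets

print(parse_copy_id("A:2_555"))  # ('A', 2, (0, 0, 0))
print(parse_copy_id("B:3_655"))  # ('B', 3, (1, 0, 0))
```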

Formation of a hydrogen bond between the substrate hydroxyl group of the Ser, Thr, or Tyr side chain and the catalytic aspartic acid residue side chain of the active site HRD motif is a critical step in the mechanism of phosphorylation by protein kinases (34). Therefore, we calculated the distances between the hydroxyl oxygen atoms of Thr, Ser, or Tyr of each monomer and the Asp carboxylate oxygen atoms (Oδ1 and Oδ2) of the HRD motif of the other monomer in each of the distinct homodimers in each crystal. Because some kinase structures in the PDB contain phosphorylated amino acids and phosphomimetic mutations of phosphorylation sites to Asp or Glu, we also measured the distances between the appropriate oxygen atoms of these side chains and the active site Asp. Additionally, substrate residue side chains often form hydrogen bonds with the side chain of arginine or lysine residues located two or four amino acids after the HRD motif (HRD+2 and HRD+4) (35), and where this residue was present, we calculated potential hydrogen bond distances between hydrogen bond acceptor oxygen atoms in the substrate side chain and hydrogen bond donor nitrogens in the Arg or Lys side chain of HRD+2 or HRD+4 (usually only one of these is present in the sequence).
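A minimal sketch of the hydroxyl-to-carboxylate distance screen follows. It uses the standard PDB atom names for the side-chain hydroxyl oxygens (OG, OG1, OH); the input layout, the function name, and the 5.0 Å cutoff are assumptions made for illustration, because the text does not state the threshold that was applied.

```python
import math

# Standard PDB names of the side-chain hydroxyl oxygen for each residue type.
HYDROXYL_ATOMS = {"SER": "OG", "THR": "OG1", "TYR": "OH"}

def candidate_sites(substrate_atoms, asp_carboxylate_coords, cutoff=5.0):
    """List Ser/Thr/Tyr hydroxyls lying near the HRD Asp carboxylate.

    substrate_atoms: iterable of (resname, resnum, atom_name, (x, y, z)).
    asp_carboxylate_coords: coordinates of OD1 and OD2 of the HRD Asp.
    cutoff: hypothetical screening distance in angstroms.
    Returns (resname, resnum, shortest distance), nearest site first.
    """
    hits = []
    for resname, resnum, atom_name, xyz in substrate_atoms:
        if HYDROXYL_ATOMS.get(resname) != atom_name:
            continue
        shortest = min(math.dist(xyz, o) for o in asp_carboxylate_coords)
        if shortest <= cutoff:
            hits.append((resname, resnum, shortest))
    return sorted(hits, key=lambda hit: hit[2])

# Toy coordinates, not taken from any PDB entry.
substrate = [
    ("TYR", 394, "OH", (0.0, 0.0, 0.0)),
    ("SER", 10, "OG", (10.0, 0.0, 0.0)),
    ("TYR", 394, "CZ", (0.5, 0.0, 0.0)),
]
asp_oxygens = [(2.7, 0.0, 0.0), (4.0, 0.0, 0.0)]
hits = candidate_sites(substrate, asp_oxygens)
```

The same loop could be extended to phosphorylated or phosphomimetic residues by adding their oxygen atom names to the lookup table.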

With this procedure, we identified 15 potential unique autophosphorylation complexes in 26 PDB entries. These structures and the autophosphorylation sites contained in them are listed in Table 1, which provides details on each autophosphorylation complex including the chains involved and their symmetry relationship, the relevant substrate and enzyme residues, the substrate sequence about the phosphorylation site, the surface area of interaction of the two kinases, and the HRD-phosphosite hydrogen bond distance. Some crystals contain more than one unique instance of the autophosphorylation complex, and these are indicated in Table 1. Often, the complexes are very similar, but in some cases, they may have distinct structural features, such as the locations and extent of domain-domain interactions between the kinases. The mean and median hydrogen bond lengths (oxygen-to-oxygen atom distances) are 3.16 and 2.91 Å, respectively (picking only the shortest distance for each phosphosite, if there are multiple structures), which are typical of hydroxyl/carboxylate hydrogen bonds. The mean and median interface surface areas are 913 and 848 Å², respectively, indicating extensive interactions between the kinases beyond the residues adjacent to the phosphorylation site in the sequence. For each structure, we mapped the interaction areas separately to the surfaces of the enzyme and substrate kinases (fig. S1).

Table 1 New and previously documented autophosphorylation complexes in the PDB.

All kinases in the autophosphorylation complexes listed are human kinases except ceCaMKII from C. elegans. Residues in the sequence that are disordered in the structure are in lowercase. The phosphorylation site is the middle of the 13 residues of each sequence shown in red. Distance (Dist) is in Å. The HRD column provides the residue number of the catalytic site aspartic acid and the OH column indicates phosphorylated residue number (Y, T, or S). The enzyme and substrate columns provide the ASU chain ID given by the authors. When the enzyme or substrate kinase does not come from the original ASU but from a copy of the ASU, the symmetry operator and unit cell identifier are given (see the text for details). The ASA column is the solvent accessible surface area in square Å. The BA column indicates biological assemblies deposited in the PDB that contain the autophosphorylation dimer. These are defined either by the authors (“auth.”) or the program PISA. “—” indicates that no biological assemblies deposited in the PDB contain the autophosphorylation dimer. Juxtamem, juxtamembrane; act. loop, activation loop; kinase ins, kinase insert region; C-term tail, C-terminal tail; N-term tail, N-terminal tail.

Table 1 also lists the symmetry operators required to determine the position of the enzyme and substrate kinase in each autophosphorylation complex. We found that 26 PDB entries contain 33 autophosphorylation complexes; 18 of these 33 are between kinases in different copies of the ASU and 13 are between kinase monomers in different unit cells, demonstrating the importance of considering the full symmetry of each crystal in identifying potential autophosphorylation complexes.

We expect that, in autophosphorylation complexes, the enzyme kinase should be in a conformation consistent with known active kinase structures in the PDB. The substrate kinases may be in an active or inactive conformation, and the state observed in the crystal may be informative in determining under what conditions trans-autophosphorylation might take place. For these reasons, in Table 2, we list three features that distinguish active from inactive kinases for both enzymes and substrates listed in Table 1: (i) the position of the phenylalanine ring of the DFG motif; (ii) the presence or absence of a salt bridge between a Lys residue of the N-terminal β sheet and a Glu of the C helix; and (iii) the existence (or lack thereof) of van der Waals interactions of residue side chains of the “regulatory spine” (36), which comprises the His side chain of the HRD motif, the Phe of the DFG motif, and two additional hydrophobic residues in the N-terminal domain. The presence of an intact regulatory spine is associated with active enzyme structures (36). Most of the enzyme kinases in the autophosphorylation complexes are consistent with the active conformation (Table 2): the DFG motif is labeled “DFG-in,” meaning that the Phe ring is under the C helix and the activation loop is positioned such that a substrate may bind in the enzyme active site; the Nζ atom of the Lys side chain and the Oε1 or Oε2 atom of the Glu side chain are within hydrogen-bonding distance (<3.5 Å); and the four regulatory spine residues are in contact along the spine, meaning that the first residue contacts the second (within 4.5 Å), the second contacts the third, and the third contacts the fourth. Table 2 also lists the ligands bound to the enzyme and substrate kinases with the three-letter codes provided by the PDB and the distance between the HRD+2 or HRD+4 Arg or Lys side-chain hydrogen bond donors (if present) and the phosphorylation site hydroxyl atom. In complexes of kinases and peptide substrates, this hydrogen bond (distance <3.5 Å) is typically present (35). The residues within nine residues of the phosphorylation site of the substrate kinase that contact the enzyme kinase are highlighted in Table 3.
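Two of the activity criteria above are simple distance tests and can be sketched as follows. The 3.5 Å and 4.5 Å thresholds come from the text; treating two spine residues as "in contact" when any pair of their atoms is within the threshold is an assumption, and the function names are hypothetical.

```python
import math

def salt_bridge_present(lys_nz, glu_oe1, glu_oe2, cutoff=3.5):
    """True if Lys NZ is within hydrogen-bonding distance of either Glu oxygen."""
    return min(math.dist(lys_nz, glu_oe1), math.dist(lys_nz, glu_oe2)) < cutoff

def spine_intact(spine_residues, cutoff=4.5):
    """True if consecutive regulatory spine residues are all in contact.

    spine_residues: four lists of side-chain atom coordinates, ordered along
    the spine. Contact here means any atom pair within `cutoff` angstroms,
    which is an assumed definition.
    """
    def in_contact(first, second):
        return any(math.dist(p, q) <= cutoff for p in first for q in second)

    return all(
        in_contact(first, second)
        for first, second in zip(spine_residues, spine_residues[1:])
    )

# Toy coordinates: four residues spaced 4 angstroms apart along x.
spine = [[(0.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)], [(8.0, 0.0, 0.0)], [(12.0, 0.0, 0.0)]]
print(spine_intact(spine))  # True
print(salt_bridge_present((0.0, 0.0, 0.0), (2.9, 0.0, 0.0), (4.8, 0.0, 0.0)))  # True
```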

Table 2 Structural features of autophosphorylation complexes.

Features indicating active or inactive kinases are given for the enzyme and substrate kinases, including the position of the DFG-motif residues (“in” indicates active; “out” and “up” indicate two forms of inactive conformations, named for the position and orientation of the Phe residue of the DFG motif); the distance between the Lys and Glu atoms (“K/E”) typically involved in a salt bridge in active kinases (a distance <3.5 Å indicates a salt bridge), whether the regulatory spine is intact (“Y” indicates active). The PDB’s three-letter codes of ligands in the active site are given for the enzyme and substrate kinases. “—” indicates there is no ligand bound. The HRD+2 and HRD+4 columns indicate the distance between the phosphosite hydroxyl atom and the hydrogen bond donor site on Arg or Lys residues either two or four residues after the HRD motif. This interaction is commonly observed in peptide-kinase phosphorylation complexes and indicates that the complex may be a phosphorylation complex when this distance is <3.5 Å. The autophosphorylation site Tyr1166 of activation loop is disordered in the second dimer of PDB: 3LVP, and Tyr583 is mutated to phenylalanine in PDB: 3GQI; therefore, HRD+2 and HRD+4 sites are labeled “—”. Enz, enzyme kinase; Sub, substrate kinase.

Table 3 Specificity contacts of the substrate regions with the enzyme kinase.

Residues of the substrate kinase that are in contact with the enzyme kinase are highlighted in yellow. The autophosphorylation site is residue 0. The Phe and Glu residues at position 0 are mutants. X indicates pTyr residues. The residues in lowercase letters are disordered (that is, present in the chain sequence but not present in the coordinates). An asterisk in the residue number after the autophosphorylation site indicates that the C terminus of the protein sequence is present in the crystal.

For each of the kinase autophosphorylation structures listed in Tables 1 to 3, we provide coordinates for the complexes (data file S1) and describe multiple properties of the complexes. In particular, we examined several aspects of the structures within each group: (i) the activity state of the enzyme kinase, specifically whether the activation loop and C helix are in positions consistent with active kinases (36) (Table 2); (ii) the position of the substrate loop or the N- or C-terminal substrate segment in the substrate kinase compared to non-autophosphorylation structures of the same or closely related kinases; (iii) apparent relevance in some cases of specific domain-domain contacts between the enzyme and substrate kinases; (iv) indications of specific, important determinants of substrate specificity at sequence positions neighboring the phosphorylation position (Table 3) and comparison with peptide-kinase crystal structures of closely related kinases; and (v) conservation and annotations of the phosphorylation sites at homologous sites in other kinases in addition to the autophosphorylation complex listed in Table 1.

Autophosphorylation complexes of the juxtamembrane regions of tyrosine kinase receptors

In our search of interfaces between kinases in the PDB, we identified the structures of two previously undescribed autophosphorylation complexes similar to the previously described autophosphorylation complex of c-KIT [Fig. 1A; PDB: 1PKG (7)]: CSF1R [Fig. 1B; PDB: 3LCD (22)] and EPHA2 [Fig. 1C; PDB: 4PDO (23)]. Although not described by Mol et al. (7), we determined that the crystal of c-KIT (PDB: 1PKG) contains two distinct versions of the autophosphorylation complex of Tyr568 with different orientations of the substrate kinase domain (colored magenta and pink in Fig. 1A) relative to the enzyme kinase (colored blue and periwinkle). The ASU of this crystal contains two copies of c-KIT (chains A and B) that are in the active DFG-in conformation (4) (Table 2), both of which contain phosphate groups on Tyr568 and Tyr570, which are located in the juxtamembrane region of the kinase between the membrane-spanning helix (residues 525 to 545) and the kinase domain (residues 589 to 937). Phosphorylated Tyr568 (pTyr568) of chain B is located in the active site of chain A in the ASU, with atom Oη of Tyr568 within hydrogen-bonding distance of the side chain of Asp792 of the HRD motif (a hydrogen bond would be formed if the Asp was protonated). In chain B, pTyr568 forms a hydrogen bond with Arg796 at the HRD+4 position of the enzyme kinase (Table 2). The pTyr568 of chain A of the ASU is located in the active site of chain B of another copy of the ASU in an adjacent unit cell (chain B:2_655).

Fig. 1 Autophosphorylation complexes for N-terminal juxtamembrane segments of c-KIT, CSF1R, and EPHA2.

(A) Superposition of two distinct c-KIT autophosphorylation complexes identified in PDB: 1PKG. The first homodimer is composed of chain A (enzyme, in marine) and B (substrate, in magenta) from the ASU. The substrate Tyr (in this case, phosphorylated Tyr) is in magenta spheres and the Asp of the HRD motif in the enzyme kinase is in orange spheres. ADP (adenosine 5′-diphosphate) is shown in sticks. The second homodimer contains chain B:2_655 (enzyme, in periwinkle) and A (substrate, in pink) from neighboring unit cells. (B) Human CSF1R autophosphorylation complex (PDB: 3LCD) showing one ASU monomer and one neighbor. BDY is the bound ligand. (C) Human EPHA2 autophosphorylation complex (PDB: 4PDO), showing one ASU monomer and one neighbor. (D) Superposition of c-KIT monomers from available PDB entries showing differences in the position of the N-terminal juxtamembrane region in inactive structures and autophosphorylation structures. Structures are marine. The activation loops and juxtamembrane regions of the inactive structures and the autophosphorylation complex (PDB: 1PKG) are indicated in different colors as labeled. The phosphorylation sites (Tyr568 and Tyr570) are shown in sticks for all monomers in which they are present. (E) Superposition of CSF1R structures from the PDB. The autophosphorylation site, Tyr561, is indicated in sticks, and the juxtamembrane region in the autophosphorylation complex from PDB: 3LCD is green. The activation loop and juxtamembrane segment coloring are the same as in panel (D). (F) Superposition of EPHA2 structures from the PDB. Only in PDB: 4PDO is the juxtamembrane region ordered (green). The autophosphorylation site, Tyr594, is shown in sticks. The coloring is the same as in panel (D). (G) Similarity of backbone conformations of substrate peptides (from P−4 to P+3) in the EPHA2 autophosphorylation complex (PDB: 4PDO) and the EPHA3 peptide complexes (PDB: 3FXX and 3FY2). The substrate regions are in stick figures with each kinase indicated by a different color. The active site Asp residues are in orange ball-and-stick, and the substrate Tyr is also in protein-specific colored ball-and-stick. (H) Close-up of the active site and the substrate region of c-KIT (PDB: 1PKG), CSF1R (PDB: 3LCD), and EPHA2 (PDB: 4PDO). The EPHA2 substrate deviates from c-KIT and CSF1R at P+4; the CSF1R and c-KIT substrates deviate from each other at P+9. (I) Sequence alignment of the juxtamembrane region of CSF1R, c-KIT, EPHA2, and homologs that contain a Tyr residue (in red) at the same position as the ones present in the autophosphorylation complexes shown in panels (A), (B), and (C). The residue numbers of the red tyrosines that are homologous to the c-KIT, CSF1R, and EPHA2 structures are given after the sequence. Other tyrosine autophosphorylation sites annotated in UniProt for each kinase are in blue bold type. The conserved Asp and Pro residues (DP) are in green bold type. EPHAA_HUMAN (EPHA10) is a pseudokinase.

Although not mentioned in the report of the crystal structure of CSF1R [PDB: 3LCD (22)], we found that Tyr561 of the sole monomer (chain A) in the ASU is located in the active site of a neighboring ASU of the CSF1R crystal in the same unit cell (chain A:4_555) (Fig. 1B). Like the monomers in the c-KIT autophosphorylation complex (PDB: 1PKG), the kinase in CSF1R (PDB: 3LCD) is in an active DFG-in conformation (Table 2). The other structures of both c-KIT (PDB: 1T45, 1T46, 3G0E, 3G0F, 4HVS, and 4U0I) and CSF1R (PDB: 2I0V, 2I0Y, 2I1M, 2OGV, 3BEA, 3DPK, 3KRJ, 3KRL, 3LCO, 4HW7, 4R7H, and 4R7I) are all in inactive “DFG-out” conformations. Superposition of c-KIT monomers from available PDB entries illustrated that in inactive structures, the N-terminal juxtamembrane region blocks the substrate-binding site in an autoinhibitory manner, whereas in the autophosphorylation complexes, the N-terminal juxtamembrane region extends from the surface of the kinase and would be accessible for phosphorylation (Fig. 1D). Similarly, in the inactive structures of CSF1R, the N-terminal juxtamembrane region is not disordered and blocks the substrate-binding site by displacing the activation loop (Fig. 1E). Conversely, in the autophosphorylation complex, the N-terminal juxtamembrane region of CSF1R extends from the surface of the kinase, accessible for phosphorylation (Fig. 1E). The differences between the active and inactive structures for c-KIT and CSF1R are similar with the exception that the inactive structure of c-KIT (PDB: 3G0F) forms an intermolecular inhibitory complex and the other inactive structures form intramolecular inhibitory interactions.

Similar to c-KIT and CSF1R, we found that EPHA2 forms a potential autophosphorylation complex in a crystal structure of the kinase domain [PDB: 4PDO (23)], with a hydrogen bond between juxtamembrane region Tyr594 and Asp739 of the HRD motif (Fig. 1C). Both chains are in the active DFG-in conformation (Table 2), as are the other available structures of EPHA2 (PDB: 1MQB, 4P2K, and 4TRL) (Fig. 1F), none of which contains a visible Tyr594 in the electron density. In the EPHA2 (PDB: 4PDO) structure, the salt bridge atom-atom distance in the N-terminal domain is too long to form a hydrogen bond (Table 2), but the backbone positions of the two residues are in the same positions as in the other EPHA2 structures. A small change of side-chain conformation would bring them into hydrogen-bonding distance, consistent with an active kinase conformation. This EPHA2 crystal (PDB: 4PDO) contains two nearly identical autophosphorylation complexes [root mean square deviation (RMSD) of 0.03 Å calculated over the backbone atoms of 480 residues, 240 in each chain]. One complex contains two copies of chain A in different unit cells (asymmetric monomer chains A and A:2_764), and the other contains two copies of chain B in different unit cells (chains B and B:2_654) (Table 1). The biological assembly deposited in the PDB by the authors is the same as the ASU, which is composed of chains A and B, and does not represent a potential autophosphorylation complex. Although Wei et al. (23) noted that the juxtamembrane segments of each chain are ordered, point away from the kinase domain, and contact neighboring chains in the crystal, they did not mention the presence of the autophosphorylation site of these segments in the active site of the neighboring kinase protein.

We compared the autophosphorylation complex of EPHA2 that we identified in PDB: 4PDO with two crystal structures of EPHA3 bound to optimized substrate peptides [PDB: 3FXX and 3FY2 (18)]. Structure alignment of the peptides in EPHA3 (PDB: 3FXX and 3FY2) with the substrate kinase residues in the autophosphorylation complex of EPHA2 (PDB: 4PDO) revealed very similar backbone conformations of the substrate from P–4 to P+3 (Fig. 1G) [RMSD values of 0.48 and 0.98 Å over backbone atoms for EPHA3 (PDB: 3FXX and 3FY2) to EPHA2 (PDB: 4PDO), respectively]. Both the peptide substrates and the autophosphorylation substrate contain a type VIII β turn consisting of residues P–2, P–1, P0, and P+1 (where P0 is the phosphorylation site). A β turn is defined as four consecutive residues where the Cα atoms of the first and fourth residues are less than 7.0 Å apart. A type VIII turn occurs when residues 2 and 3 are in conformations in the α and β regions of the Ramachandran map, respectively (37). The same type VIII β turn occurs in the CSF1R structure (PDB: 3LCD, residues 559 to 562 of the substrate kinase).
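The two definitions above translate into a short check. The 7.0 Å Cα criterion is taken from the text; the numeric boundaries used to split the Ramachandran map into α and β regions are coarse, hypothetical values chosen only for illustration and are not the boundaries used in reference 37.

```python
import math

def is_beta_turn(ca_first, ca_fourth, cutoff=7.0):
    """True if the Cα atoms of residues 1 and 4 are less than `cutoff` Å apart."""
    return math.dist(ca_first, ca_fourth) < cutoff

def ramachandran_region(phi, psi):
    """Assign a coarse region from backbone dihedrals in degrees.

    The boundaries below are assumed for this sketch only.
    """
    if phi >= 0.0:
        return "other"
    if -120.0 <= psi < 50.0:
        return "alpha"
    return "beta"

def is_type_viii(phi2, psi2, phi3, psi3):
    """Type VIII: residue 2 in the α region and residue 3 in the β region."""
    return (
        ramachandran_region(phi2, psi2) == "alpha"
        and ramachandran_region(phi3, psi3) == "beta"
    )

# Idealized dihedrals: an α-like residue followed by a β-like residue.
print(is_type_viii(-60.0, -30.0, -120.0, 120.0))  # True
```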

Superimposition of the enzyme kinase monomers of c-KIT, CSF1R, and EPHA2 showed that the substrate sequence adjacent to the phosphorylation site assumed similar positions in all three autophosphorylation complexes (Fig. 1H). The substrate regions of CSF1R, c-KIT, and EPHA2 make direct contacts with the enzymatic kinase from positions P–2 to P+9 in c-KIT and CSF1R and positions P–4 to P+9 of EPHA2 (see Table 3 for direct contacts). In particular, the c-KIT and CSF1R substrate sequences and structures are very similar, and there are direct interactions of residues P+7, P+8, and P+9 (sequence Gln-Leu-Pro) of the substrate with the G helix of the enzyme in both structures (Fig. 1H and Table 3). In contrast, the kinase domains of the substrate kinases adopted different positions (relative to the enzyme kinase domains) in both complexes of c-KIT (Fig. 1A; PDB: 1PKG), CSF1R (Fig. 1B; PDB: 3LCD), and EPHA2 (Fig. 1C; PDB: 4PDO), suggesting that interactions between the kinase domain of the enzyme monomer and the kinase domain of the substrate monomer are unlikely to be critical for autophosphorylation of these juxtamembrane Tyr residues.

To determine whether the autophosphorylation sites identified in these structurally related complexes are conserved in other kinases, we used BLAST (Basic Local Alignment Search Tool) (38) to compare the sequences of c-KIT, CSF1R, and EPHA2 with the sequences of all other human kinases, which we collected from the UniProt Web site (39). With the program Clustal W (40), we aligned sequences with tyrosines in similar positions to those in the query sequences (Fig. 1I). The alignment showed that Tyr568 of c-KIT and Tyr561 of CSF1R are in homologous positions, whereas Tyr594 of EPHA2 aligns with Tyr570 of c-KIT, which is also an autophosphorylation site (7). Although the kinase domains of CSF1R and c-KIT are 59% identical, in the juxtamembrane region surrounding the autophosphorylated tyrosine residues, the proteins are 67% identical (18 of 27 identical residues), including many of the residues of the substrate region that contact the enzyme kinase (Table 3).
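As a worked illustration of the identity figures quoted above (18 of 27 identical residues, or 67%), the following sketch computes percent identity over two pre-aligned sequences. Skipping columns that contain a gap is our assumption; the text does not state how gaps were treated, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (matches, compared_columns, percent) for two aligned strings.
    Columns in which either sequence has a gap are skipped (ASSUMPTION)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return matches, len(pairs), 100.0 * matches / len(pairs)
```

For a 27-column ungapped region with 18 matching columns, the function returns 18, 27, and approximately 66.7, which rounds to the 67% reported for the juxtamembrane region.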

The autophosphorylation complexes illustrated in Fig. 1 provide information on 24 potential autophosphorylation sites in 20 different human kinases (Fig. 1I), many of which are annotated as autophosphorylation sites and known to be involved in the regulation of protein-protein interactions in cell signaling. In this aligned region, c-KIT, FMS-like tyrosine kinase 3 (FLT3), and platelet-derived growth factor receptors α and β (PDGFRA and PDGFRB) contain two conserved tyrosines that are two residues apart, and autophosphorylation of these sites regulates their interactions with various proteins (41–53). The first of these sites corresponds to c-KIT Tyr568 and CSF1R Tyr561, and the second is homologous to EPHA2 Tyr594. Inspection of the sequence alignment of the EPHA and EPHB family members showed that the Tyr594 site is conserved in human type A and type B ephrin receptors (A1 to A8 and B1, B2, B3, B4, and B6) with the exception of the pseudokinase ephrin receptor A10 (EPHAA_HUMAN in Fig. 1I). Autophosphorylation at Tyr594 and Tyr588 in EPHA2 is required for the interaction of EPHA2 with the SRC homology 2 (SH2) domains of the guanine-nucleotide exchange factors VAV2 and VAV3 and for transduction of extracellular signals in vascular endothelial cells during tumor angiogenesis (54). Mutation of Tyr600 in EPHB1 (homologous to Tyr594 in EPHA2) to Phe abrogates the interaction of the receptor with kinase c-SRC and the adaptor Shc (55). The kinases KSYK and ZAP70 are autophosphorylated at Tyr352 (56) and Tyr319 (57), respectively, positions that are homologous to the EPHA2 Tyr594 site (Fig. 1I) and that control their activity in cell signaling events.

The ephrin kinases have another autophosphorylation site [Tyr588 in EPHA2 (54)] six residues before Tyr594. These two sites share a common sequence motif {TVIP}XYXDP. The P+3 residue at both these sites is proline (Pro591 in EPHA2); therefore, the PDB: 4PDO structure of EPHA2 may serve as a model for the phosphorylation of similar nearby sites in the ephrin receptor kinases.

A comparison of the c-KIT, CSF1R, and EPHA2 autophosphorylation structures, as well as the EPHA3 substrate–peptide complex, indicates a particular element that defines the specificity of these kinases for their phosphorylation sites (Fig. 2). In all four structures, the residue at P+3 is a hydrophobic amino acid that is in van der Waals contact with several hydrophobic amino acids near the C-terminal end of the activation loop and at least one residue on the G helix of the enzyme kinase. In c-KIT, the P+3 residue is Ile571, making contact with Val833 of the activation loop, Ile841 immediately following the activation loop, and Tyr880 of the G helix (Fig. 2A). In CSF1R, Ile564 at P+3 contacts activation loop residues Val819 and Met822, Ile827 after the activation loop, and G-helix residue Tyr866 (Fig. 2B). In EPHA2, the P+3 residue is Pro597, which contacts Ile779 and Ile781 of the activation loop, Ile789 following the loop, and Met827 of the G helix (Fig. 2C). Similarly, in the structures of EPHA3 with optimized peptides (PDB: 3FY2 and 3FXX), the P+3 residue is an isoleucine making contact with Ile788 of the activation loop, Ile796 just after the activation loop, and Ile834 of the G helix (Fig. 2D).

Fig. 2 The P+3 substrate specificity contacts of c-KIT, CSF1R, EPHA2, and EPHA3.

(A) Hydrophobic contacts of the c-KIT (PDB: 1PKG) P+3 side chain (Ile571) and Val833 of the activation loop, Ile841 immediately following the activation loop, and Tyr880 of the G helix. (B) Hydrophobic contacts of the P+3 side chain (Ile564) of CSF1R (PDB: 3LCD) and Val819 and Met822 of the activation loop, Ile827 after the activation loop, and Tyr866 of the G helix. (C) Hydrophobic contacts of the EPHA2 (PDB: 4PDO) P+3 side chain (Pro597) and Ile779 and Ile781 of the activation loop, Ile789 following the loop, and Met827 of the G helix. (D) Hydrophobic contacts of the P+3 side chain (Ile9) of the peptide bound to EPHA3 (PDB: 3FY2) and Ile786 and Ile788 of the activation loop, Ile796 following the loop, and Ile834 of the G helix.

Autophosphorylation complexes of the activation loops of kinases

In our search of kinase interfaces in the PDB, we identified an autophosphorylation complex of LCK [PDB: 2PL0 (20)] in which Tyr394 of the activation loop of the sole monomer in the ASU (chain A) is positioned in the active site of chain A:2_655 and vice versa in a head-to-tail dimer (Fig. 3A). Jacobs et al. (20) mentioned this interaction but dismissed its biological relevance because of the bound inhibitors; they did not show the complex, and they deposited a monomer as the biological assembly in the PDB. The complex that we identified strongly resembles a previously described structure of an autophosphorylation complex of Tyr1165 in the activation loop of IGF1R [PDB: 3D94 (10)], which is also a symmetric head-to-tail structure with Tyr1165 of one monomer in the active site of another monomer in a neighboring ASU in the same unit cell (chain A:4_555) and vice versa (Fig. 3B). Wu et al. (10) verified the importance of the intramolecular hydrogen-bonding interactions within the activation loop of IGF1R in this structure by introducing the mutations E1162A, R1167, and R1167A in the activation loop, which all reduced autophosphorylation of Tyr1165.

Fig. 3 Autophosphorylation complexes for IGF1R and LCK.

(A) Autophosphorylation complex of LCK (PDB: 2PL0) composed of the ASU monomer and a neighboring ASU in the crystal. The coloring scheme is the same as in Fig. 1. The activation loop of one monomer lies in the active site of the other monomer and vice versa. STI denotes imatinib. (B) Autophosphorylation complex of IGF1R (PDB: 3D94) composed of one monomer from the ASU and one copy from a neighboring ASU. (C) Superposition of LCK monomers from the PDB (28 active, 4 inactive, and 1 autophosphorylation site). The active activation loops are magenta; the inactive activation loops are yellow. The activation loop of PDB: 2PL0 is green. (D) Superposition of IGF1R monomers from the PDB (3 active and 10 inactive). The activation loop of PDB: 3D94 is green. The activation loop of PDB: 3LVP is dark blue. (E) Model of an asymmetric dimer of LCK involved in autophosphorylation of Tyr394 of the activation loop. The model was created by superposition of an active monomer of LCK (PDB: 1QPC, chain A) (periwinkle with yellow Phe residue of DFG) onto one monomer of the symmetric PDB: 2PL0 autophosphorylation complex of LCK, followed by energy minimization with Gromacs. The Phe residue of DFG from PDB: 1QPC is yellow. The Asp of the active-site HRD motif in PDB: 2PL0 is shown as orange spheres, and the Asp from PDB: 1QPC is shown as blue spheres and is completely superposed with Asp in PDB: 2PL0. (F) Model of an asymmetric dimer of IGF1R involved in autophosphorylation of Tyr1165 of the activation loop. The model was created by superposition of an active monomer of IGF1R (PDB: 1K3A) (periwinkle) onto one monomer of the symmetric PDB: 3D94 autophosphorylation complex of IGF1R, followed by energy minimization with Gromacs. The coloring scheme is the same as in (E).

In both the LCK and IGF1R structures, the activation loops are in unusual, extended conformations that are not typical of other structures of LCK (Fig. 3C) or IGF1R (Fig. 3D). In the structure that we identified, the activation loop of LCK (PDB: 2PL0) is in a conformation that does not resemble inactive (PDB: 2OFV, 2OG8, 3BYS, and 3BYU) or active structures of LCK (PDB: 1QPC, 1QPD, 1QPE, 1QPJ, 2OF2, 2OF4, 2OFU, 2ZM1, 2ZM4, 2ZYB, 3AX1, 3AC1, 3AC2, 3AC3, 3AC4, 3AC5, 3AC6, 3AC8, 3ACJ, 3ACK, 3AD4, 3AD5, 3AD6, 3BYM, 3BYO, 3KMM, 3KXZ, and 3LCK). Instead, it is extended away from the body of the kinase domain (green in Fig. 3C). For IGF1R, inactive kinases (PDB: 1M7N, 1P40, 2OJ9, 3I81, 3LW0, 3NW5, 3NW6, 3NW7, 3QQU, and 4D2R) usually have the activation loop folded against the kinase with Tyr1165 intramolecularly hydrogen-bonded to the active site Asp1135 of the HRD motif (58). Active IGF1R kinase domains (PDB: 1K3A, 2ZM3, and 3F5P), like other active kinases, have the activation loop pointed toward the right when the kinase is oriented with the N-terminal domain above the C-terminal domain and viewed looking into the active site. The activation loop of IGF1R (PDB: 3D94) (green in Fig. 3D) points outward, away from both active and inactive kinase loops (magenta and yellow, respectively, in Fig. 3D).

The symmetric autophosphorylation complexes of LCK (PDB: 2PL0) and IGF1R (PDB: 3D94) do not contain active kinase monomers in conformations that we would expect of a potential autophosphorylation complex (Table 2). For example, they are not in the DFG-in conformation like the conformations observed for the c-KIT (PDB: 1PKG), CSF1R (PDB: 3LCD), and EPHA2 (PDB: 4PDO) complexes (Table 2). The regulatory spines are not intact because of the position of the phenylalanine side chain of the DFG motif. In the imatinib-bound LCK structure (PDB: 2PL0), the starting residues of the activation loop are in an inactive DFG-out conformation, as noted by Jacobs et al. (20), and the regulatory spine is therefore not intact (Table 2). In the IGF1R structure, the DFG motif is in what is sometimes called a “DFG-up” conformation (59), in which the phenylalanine ring points upward into the N-terminal domain instead of pointing inward and under the C helix as it does in active kinase structures. The Glu-to-Lys distance of the N-terminal domain is 9.25 Å in monomer A (enzyme and substrate kinase in different copies of the ASU), which is inconsistent with an active kinase.

We hypothesized that the autophosphorylation complexes for LCK and IGF1R likely consist of asymmetric dimers with an active enzyme kinase monomer and a substrate kinase in a conformation resembling one of the monomers in the symmetric structures in PDB: 2PL0 and PDB: 3D94. To investigate this, we built models of asymmetric complexes by superposing active monomers onto one of the proteins in the symmetric dimer complexes prior to energy minimization with the program Gromacs (60). We used the active monomer from PDB: 1QPC (61) to successfully model LCK (Fig. 3E) and from PDB: 1K3A (17) to successfully model IGF1R (Fig. 3F) without significant clashes and with very minimal structural changes during energy minimization (data file S2). Table 2 provides information on the structural features of these asymmetric complexes (data marked with asterisks).

Given the physiological importance of the autophosphorylation of the activation loop of tyrosine kinases, we performed mutagenesis experiments to test the biological relevance of the interface between the LCK monomers that were present in both the structure from PDB: 2PL0 (Fig. 3A) and the modeled structure (Fig. 3E). The primary contact in both the symmetric and asymmetric (modeled) autophosphorylation complexes is between the activation loop of the substrate kinase and the G helix of the enzyme kinase (Fig. 4A). Because mutations of the activation loop might affect autophosphorylation, we made the following point mutations on the G-helix (enzyme kinase) side of the interface: T445V, N446A, P447A, P447L, P447G, and Q451E.

Fig. 4 Autophosphorylation activity of mutants of LCK.

(A) Residues mutated in LCK based on the enzyme kinase in PDB: 2PL0. The targeted sites (Thr445, Asn446, Pro447, and Gln451) in the G helix and residues Glu448 and Asp364 in the enzyme kinase are shown in sticks (nitrogen atoms in blue, oxygen atoms in red, and carbon atoms in rainbow colors). The residues of the substrate activation loop are shown in magenta. (B) HEK293 cells were transfected with plasmids encoding Myc-tagged wild-type (WT) and mutant LCK proteins. Representative Western blots with antibodies recognizing Myc and phosphorylated LCK are shown. Equal volumes from each independent lysate were pooled for each protein and blotted with indicated antibodies to create the representative blots. (C) The phosphorylation activity of each mutant relative to WT (1.0, horizontal line in cyan) is plotted along with the 95% CI in these values. Four experiments were conducted for each mutant and WT. Details of the calculations are provided in table S1.

To assess LCK autophosphorylation activity, we expressed full-length LCK tagged with Myc in human embryonic kidney (HEK) 293 cells, which lack endogenous LCK (62), and monitored the production of the protein by Western blot with an antibody against the Myc epitope and the amount of phosphorylated LCK with an antibody recognizing LCK phosphorylated at Tyr394, which is in the activation loop (Fig. 4B). Because the amounts of the LCK proteins varied, we normalized the signal obtained for the phosphorylated LCK to the signal for the total protein (Myc antibody), setting the ratio of these signals in the samples from cells expressing wild-type LCK as 1 (see table S1 for complete data). To calculate the average ratio and confidence interval (CI) for each mutant relative to wild type, we transformed the ratios by taking the log, calculated the average log and its 95% CI, and transformed back with the antilog (63). With this approach, we calculated the relative autophosphorylation activity of the mutants to that of the wild-type LCK expressed in HEK293 cells (Fig. 4C).
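The log-transform procedure described above can be sketched as follows. This is a minimal illustration, not the exact calculation in table S1: we assume a two-sided Student t interval on the mean of the logs, use natural logarithms (the back-transformed result does not depend on the base), and hard-code the standard t critical values for small sample sizes. The function name is hypothetical.

```python
import math
import statistics

# Two-sided 95% Student t critical values, keyed by sample size n (df = n - 1).
T_CRIT_95 = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776, 6: 2.571}

def geometric_mean_ci(ratios):
    """Average mutant/WT ratios on the log scale and back-transform.
    Returns (mean, lower, upper) on the original ratio scale.
    ASSUMPTION: a t-based 95% CI on the mean of the log ratios."""
    n = len(ratios)
    if n not in T_CRIT_95:
        raise ValueError("sample size not covered by the lookup table")
    logs = [math.log(r) for r in ratios]
    mean_log = statistics.fmean(logs)
    sem = statistics.stdev(logs) / math.sqrt(n)   # standard error of the mean
    half_width = T_CRIT_95[n] * sem
    return (math.exp(mean_log),
            math.exp(mean_log - half_width),
            math.exp(mean_log + half_width))
```

With four replicate ratios per mutant, as in Fig. 4C, the critical value for n = 4 applies. Because the interval is symmetric on the log scale, it is asymmetric around the mean after back-transformation, and a mutant differs from wild type at this level when the interval excludes 1.0.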

Thr445 is a helix N-cap residue, forming a side chain to backbone hydrogen bond with Glu448 in the ASU monomer of LCK in PDB: 2PL0 (Fig. 4A). Threonine is a common residue in this position of an N-cap, and valine is the least favored amino acid at this position (64). A mutation to Val is likely to disturb the conformation of the helix N-cap and, therefore, the interactions of the G helix with the activation loop of the other monomer. Indeed, we found that the T445V mutant had substantially reduced autophosphorylation activity (Fig. 4, B and C). Asn446 of the G helix forms a hydrogen bond with the backbone oxygen of Thr395 of the activation loop (Fig. 4A). Consistent with the importance of this interaction in stabilizing the autophosphorylation complex, the N446A mutant exhibited a substantial reduction in autophosphorylation activity (Fig. 4, B and C).

The second residue of the G helix is Pro447. This residue is mutated from Pro to Leu in a T cell leukemia cell line, and the mutation is associated with constitutive activation of LCK (29). We individually mutated this residue to Ala, Leu, or Gly. The 95% CIs show that each of the three mutations resulted in an increase in autophosphorylation activity relative to wild type (Fig. 4C and table S1). This is possibly due to a loosening of the helix N-cap so that specific interactions were more easily formed. We also mutated Gln451 to Glu. In the wild-type structure, Gln451 forms a hydrogen bond with Glu398 of the activation loop of the other monomer; thus, we expected that replacing Gln451 with a negatively charged Glu would introduce charge-charge repulsion with Glu398, reduce the formation or stability of the complex, and thereby reduce autophosphorylation activity. However, this mutant had autophosphorylation activity that was similar to, although more variable than, that of wild-type LCK (Fig. 4C). We hypothesize that this autophosphorylation activity resulted from a conformational change in the side chain of the mutated residue enabling access to solvent. The data from the mutant LCK autophosphorylation analysis indicated that the interaction between Gln451 and Glu398 was not necessary for formation of the autophosphorylation complex, but that the interaction between Asn446 and Thr395 was important for formation of the autophosphorylation complex.

IGF1R is autophosphorylated at Tyr1161, Tyr1165, and Tyr1166 of the activation loop (17, 65). We identified a previously undescribed autophosphorylation complex of IGF1R [PDB: 3LVP (21)] in which Tyr1166 of the activation loop of chain B in the ASU is in the active site of chain C (Fig. 5A). This structure has four IGF1R monomers in the ASU. Monomers C and D have activation loop conformations similar to the conformations of the triphosphorylated activation loops in active IGF1R kinases [PDB: 1K3A (17), PDB: 2ZM3 (66), and PDB: 3F5P (67)] and have properties consistent with active structures (Table 2). Nemecek et al. (21) described the IGF1R structure in PDB: 3LVP as a bis-phosphorylated IGF1R in complex with a bis-azaindole compound. However, the coordinates for the phosphate groups are not in the PDB file, and the electron density map (68) does not indicate the presence of phosphate groups. The activation loop of monomer B (substrate) is in a conformation not observed in other structures of IGF1R (Fig. 3D). The loop conformation extends away from the kinase domain such that Tyr1166 points outward and sits in the active site of monomer C of the ASU in the manner of a kinase substrate. The activation loop of monomer A is disordered between residues 1161 and 1169, although it has a similar conformation to monomer B for the residues that have coordinates in the PDB file. A comparison of the substrate-enzyme complex B-C and a complex of chain A and a symmetry copy of chain D (chain D:4_555) showed that the two complexes are similar (RMSD of 0.68 Å calculated for 503 residues in the two chains) (Fig. 5B). Thus, this crystal exhibits two versions of the same autophosphorylation complex, although in one of them, the activation loop and the phosphorylation site of the substrate chain A are disordered.
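For reference, the RMSD used to compare the two complexes is the root of the mean squared distance between equivalent atoms. The sketch below assumes that the two coordinate sets are already superposed and listed in the same atom order; it does not perform the superposition itself, and it is not the program used to obtain the 0.68 Å value. The function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) tuples. ASSUMPTION: the sets are already optimally
    superposed and atom i in one list corresponds to atom i in the other."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

Applied to one atom per residue (for example, Cα) over the 503 residues of the two chains in each complex, this formula yields a single number in ångströms that summarizes how closely the B-C and A-D:4_555 dimers coincide.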

Fig. 5 Autophosphorylation dimers for IGF1R Tyr1166.

(A) In the ASU of IGF1R structure PDB: 3LVP, monomer chain C (marine) is the enzyme kinase and monomer chain B (magenta) is the substrate kinase. Tyr1165 of the substrate is shown in magenta sticks, and the autophosphorylation site, Tyr1166, is shown in spheres. (B) Superposition of the two dimers of IGF1R in PDB: 3LVP. ASU monomer chains D (periwinkle, enzyme kinase) and A (pink, substrate kinase) of PDB: 3LVP. The dimer exhibits the same interface as the C-B dimer, but the activation loop region around Tyr1166 in chain A is disordered and Tyr1166 is not present in the coordinates. (C) Substrate specificity of P−1 residues of three autophosphorylation complexes and three peptide complexes. The autophosphorylation complexes comprise two IGF1R structures (PDB: 3D94 and 3LVP) and one LCK structure (PDB: 2PL0). The peptide complexes consist of one IGF1R structure (PDB: 1K3A) and two INSR structures (PDB: 3BU5 and 1IR3). Substrates are indicated by different colors. The residues at P–1 are Tyr1165 of IGF1R (PDB: 3LVP), Asp1164 of IGF1R (PDB: 3D94), Glu393 of LCK (PDB: 2PL0), Glu7 of IGF1R-bound peptide (PDB: 1K3A), Asp8 of INSR-bound peptide (PDB: 3BU5), and Asp9 of INSR-bound peptide (PDB: 1IR3). P−1 residues have hydrogen bonds (dashes) with Lys1088 of IGF1R enzymes and Lys1122 of INSR enzymes. P−1 residue of LCK structure PDB: 2PL0 interacts with Arg366 (HRD+2). (D) Substrate specificity of P+3 residues. The P+3 residues of the peptide substrates are hydrophobic [Ile (PDB: 1K3A and 3BU5) and Met (PDB: 1IR3)]; the P+3 residues of the autophosphorylation complexes are hydrophilic or neutral [Arg (PDB: 2PL0), Gly (PDB: 3LVP), and Lys (PDB: 3D94)], and do not interact substantially with the enzyme kinase.

The relative orientation of the monomers in the autophosphorylation complex of Tyr1166 of IGF1R (PDB: 3LVP) differs substantially from that of the autophosphorylation complex of Tyr1165 IGF1R (PDB: 3D94). The Tyr1165 complex is a head-to-tail dimer with the substrate activation loop extended away from the kinase monomers (Fig. 3B). The Tyr1166 complex is an asymmetric head-to-head dimer with interactions between the two C-terminal domains of the kinases (Fig. 5A). The superposition of the IGF1R Tyr1165 and Tyr1166 and the LCK tyrosine autophosphorylation complexes, along with a peptide-IGF1R complex [PDB: 1K3A (17)] and two peptide-INSR complexes [PDB: 3BU5 (69) and 1IR3 (70) (INSR and IGF1R are 80% identical in the kinase domain)], reveals the similarity of the position of the Tyr phosphorylation site and its two neighboring residues on either side (Fig. 5C).

However, the positions of the residues farther away from the Tyr site in the autophosphorylation complexes differ substantially from those in the peptide-substrate structures. The peptide substrates C-terminal to the phosphosite (PDB: 1K3A, 3BU5, and 1IR3) form extended β sheets with the activation loop (green, cyan, and yellow β-sheet strands in Fig. 5C), whereas the substrates in the autophosphorylation complexes loop away from the enzyme kinase. This is likely due to differences in the P+3 amino acid type in the peptides versus the activation loops (Fig. 5D). We noted above that the juxtamembrane substrates and an EPHA3-optimized substrate contain a hydrophobic residue at position P+3 that interacts with a pocket consisting of three hydrophobic residues from the activation loop, a region just after the activation loop, and a residue in the G helix (Fig. 2). These same interactions occur for the IGF1R (PDB: 1K3A) and INSR peptide substrates (PDB: 3BU5 and 1IR3) but do not occur for the activation loops of LCK and IGF1R (Fig. 5D). For LCK (PDB: 2PL0), the P+3 residue is Arg397, which makes only intramolecular interactions with the substrate activation loop. For IGF1R-1165 (PDB: 3D94) and IGF1R-1166 (PDB: 3LVP), the P+3 residues are Lys1168 and Gly1169, respectively, neither of which makes direct interactions with the enzyme kinase (Fig. 5D).

The activation loop autophosphorylation complexes and the peptide substrates of tyrosine kinases have one feature in common: a polar side chain at P−1 that makes hydrogen-bonding interactions with positively charged residues of the enzyme kinase (Fig. 5C). In LCK (PDB: 2PL0), this is a salt bridge of Glu393 at P−1 with Arg366, which is located two residues after the HRD motif (HRD+2). The substrate Tyr residue also forms a hydrogen bond with Arg366. LCK shares an Arg at the HRD+2 position and a Glu at the P−1 position of the activation loop Tyr autophosphorylation site with other type A and type B SRC family kinases (FGR, FYN, HCK, LYN, SRC, YES, and BLK). In the two IGF1R autophosphorylation complexes (PDB: 3D94 and 3LVP) (Fig. 5C), the interaction of the P−1 residue (Asp1164 and Tyr1165, respectively) is with a lysine residue in the D helix (Lys1088). The peptide substrates exhibit a homologous interaction of P−1 with the same Lys in the D helix of IGF1R (Lys1088 in PDB: 1K3A) and a homologous Lys in the D helix of INSR (Lys1085 in PDB: 3BU5 and 1IR3) (Fig. 5C).

Most tyrosine kinases contain one or more tyrosine residues in the activation loop that are autophosphorylated during the course of kinase activation (71, 72). We produced an alignment of the activation loops of all 90 human tyrosine kinases beginning with the DFG motif and ending with the APE motif (fig. S2). The two similar phosphorylation structures, LCK in PDB: 2PL0 and the Tyr1165 autophosphorylation complex of IGF1R in PDB: 3D94, contain a substrate tyrosine at position 13 (counting the DFG motif as residues 1 to 3); the Tyr1166 in the IGF1R complex (PDB: 3LVP) is at position 14 of the activation loop. Sixty-five of the 90 human tyrosine kinases contain a tyrosine at sites +13, +14, or both positions (fig. S2).
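The position numbering used above (DFG as residues 1 to 3, substrate tyrosines at positions 13 and 14) can be checked with a short sketch. The two sequence fragments are typed in here for illustration and should be verified against UniProt before any real use; the function name is hypothetical, and the requirement that the loop begin with the literal string "DFG" is a simplifying assumption.

```python
def tyrosine_positions(activation_loop, positions=(13, 14)):
    """Report whether a Tyr occupies each 1-based position of an
    activation-loop sequence that starts at the DFG motif."""
    if not activation_loop.startswith("DFG"):
        raise ValueError("sequence must start at the DFG motif")
    return {pos: len(activation_loop) >= pos and activation_loop[pos - 1] == "Y"
            for pos in positions}

# Illustrative N-terminal fragments of the activation loops (ASSUMED; verify).
IGF1R_LOOP = "DFGMTRDIYETDYYR"   # position 13 = Tyr1165, position 14 = Tyr1166
LCK_LOOP = "DFGLARLIEDNEYTA"     # position 13 = Tyr394
```

Here `tyrosine_positions(IGF1R_LOOP)` reports a tyrosine at both positions, and `tyrosine_positions(LCK_LOOP)` reports one at position 13 only, consistent with the sites described for PDB: 3D94, 3LVP, and 2PL0.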

Of the 21 kinases with a tyrosine at +13 but not at +14, 17 are annotated in UniProt as autophosphorylated, two are phosphorylated by other kinases, and two are unannotated. Of the 16 kinases with a tyrosine at +14 but not at +13, 14 are annotated as autophosphorylated, one is unspecified, and one is unannotated (the pseudokinase EPHA10). Six kinases have a third tyrosine in this region at +15: SYK, TYK2, ZAP70, JAK1, JAK2, and JAK3. Of the 27 kinases with tyrosines at both positions 13 and 14, 17 have both tyrosines annotated as phosphorylation sites (all but one by autocatalysis), five have annotations of autocatalysis at one site but not the other, and only three are unannotated at both positions. The two IGF1R structures (Figs. 3B and 5A) and the LCK structure (Fig. 3A) provide hypothetical models for the autophosphorylation at sites +13 and +14 (relative to the DFG motif) in most human tyrosine kinases. It is possible that kinases with Tyr residues at positions 13 or 14, or both positions, may use either of the two orientations that we observed: head-to-tail for IGF1R (PDB: 3D94) and LCK (PDB: 2PL0) or head-to-head for IGF1R (PDB: 3LVP). It is not necessarily the case that position 13 is head-to-tail and position 14 is head-to-head for all Tyr kinases.

In our search of kinase interfaces, we identified a set of 10 crystal structures that contain autophosphorylation complexes of the activation loop Thr423 of PAK1 (Fig. 6 and Tables 1 and 2). For one of these, PAK1 (PDB: 3Q4Z), Wang et al. (11) described the autophosphorylation complex, a head-to-head asymmetric complex of PAK1 in the crystal ASU, whereas the others were not shown or discussed in the relevant publications (25–28). Wang et al. (11) made a number of mutations in both the enzyme kinase (PAK1-KDK299R) and a kinase-dead substrate kinase (PAK1-KDK299R,D389N), demonstrating asymmetric effects on autophosphorylation rates that were consistent with the structure of the interface in PAK1 (PDB: 3Q4Z). The newly identified structures provide additional detail and insights not available in PAK1 (PDB: 3Q4Z).

Fig. 6 Autophosphorylation complexes for the activation loops of serine/threonine kinases in the PDB.

(A) PAK1 asymmetric autophosphorylation dimers in crystal form 1. Crystal form 1 comprises PDB: 3Q4Z; 4O0R, 4O0T; 4P90; 4ZY4, 4ZY5, 4ZY6; and 4ZLO. The interacting residues from enzyme G helix (Pro469, Leu470, and Leu473; periwinkle) and substrate G helix (Leu473, Tyr474, Ala477; pink) are shown in stick representation. The substrate activation loops are rainbow colored from N to C terminus. N/D389, HRD motif Asp (mutated to Asn in some structures); T/E423, autophosphorylation site Thr (mutated to Glu in some structures). (B) PAK1 asymmetric autophosphorylation dimers in crystal form 2. Crystal form 2 comprises two distinct complexes each in PDB: 4ZJI and PDB: 4ZJJ. Contacts between enzyme G helix and substrate G helix are shown in periwinkle and pink stick representation, respectively. The substrate activation loops of the substrate kinases are rainbow-colored from N to C terminus. (C) Activation loops of the PAK1 enzyme kinases of crystal form 1 and 2 (CF1 and CF2). Activation loops are shown in cartoon (magenta for crystal form 1 and green for crystal form 2), and DFG conformation is indicated. Inhibitors bound to crystal form 2 are indicated as light stick representation beneath the C helix. (D) Activation loops of PAK1 substrate kinases. The DFG conformations are indicated for each crystal form structure. Activation loops of crystal form 2 are DFG-out; the activation loops of crystal form 1 substrates are DFG-in (PDB: 4O0R), DFG-out (PDB: 4P90 and 4ZLO), and DFG-up (PDB: 3Q4Z, 4O0T, 4ZY4, 4ZY5, and 4ZY6). Two of the newly identified crystal form 1 structures contain coordinates for the complete activation loop and are indicated (PDB: 4ZY4 and 4ZY5). (E) Substrate specificity of P−2 residues of PAK1 (PDB: 4ZY4) and peptide-bound PAK4 structures (PDB: 2Q0N, 4JDH, 4JDI, 4JDJ, and 4JDK). The P−2 residue in these structures is Arg and forms hydrogen bonds (dashes) to conserved Glu residues in PAK1 and PAK4. 
(F) The conformational change between the activation loops of PAK1 crystal form 1 enzymes and substrates occurs at positions 426 to 428 (seven to nine residues before the end of the activation loop). φ and ψ were measured in PyMOL. (G) IRAK4 autophosphorylation dimers (PDB: 4U97 and 4U9A) of activation loop Thr345. STU is the bound ligand. (H) The conformational change between the activation loops of IRAK4 enzymes and substrates occurs at positions 350 and 351 (eight to nine residues before the end of the activation loop). The activation loops are oriented similarly to (F).

These structures fall into two crystal forms, which we refer to as crystal form 1 and crystal form 2. Within each crystal form, the PAK1 monomers form lattices of protein-protein interactions that are similar. Crystal form 1 (Fig. 6A) comprises PDB: 3Q4Z (11); 4O0R, 4O0T (25); 4P90 (26); 4ZY4, 4ZY5, 4ZY6 (27); and 4ZLO (28). Crystal form 2 (Fig. 6B) comprises PDB: 4ZJI and 4ZJJ (28). In Fig. 6, A and B, the enzyme monomers (blue) are oriented identically. Comparison of the two images shows that the positions of the substrate kinase domains (magenta) in crystal form 2 are shifted downward relative to their positions in crystal form 1, whereas the substrate autophosphorylation site of the activation loop still reaches the active site of the enzyme kinase. The contacts between the G helices (residues 469 to 478) are different in the two crystal forms: in crystal form 1, the side chains of Pro469, Leu470, and Leu473 of the G helix of the enzyme kinase are in contact with the side chains of Leu473, Tyr474, and Ala477 of the substrate kinase; in crystal form 2, Tyr474, Ala477, and Thr478 of the enzyme kinase make contact with Pro469, Leu470, and Leu473 of the G helix of the substrate kinase.

For the two crystal forms, we examined the positions and conformations of the activation loops of the enzyme (Fig. 6C) and substrate monomers (Fig. 6D). In crystal form 1, the activation loops of the enzyme kinases are in active DFG-in conformations in all structures, and the entire loop is fully ordered except in PAK1 (PDB: 4P90) (Fig. 6C). However, in crystal form 2, inhibitors in the type II pocket between the C helix and the N-terminal domain β sheet push the DFG phenylalanine ring out of the type II pocket, resulting in the DFG-out conformation (Fig. 6C); the activation loops of these enzyme kinases are also all partially disordered. The activation loop of the substrate kinases is fully ordered in two of the newly recognized structures of crystal form 1 [PDB: 4ZY4 and 4ZY5 (27)], but not in the others, including the original structure of PAK1 (PDB: 3Q4Z) (Fig. 6D). The C-terminal regions of the activation loops of the crystal form 1 substrate kinases are all similar to one another, whereas the N-terminal residues 405 to 408 exhibit some differences. Five of the structures are in DFG-up inactive conformations (PDB: 3Q4Z, 4O0R, 4ZY4, 4ZY5, and 4ZY6), two are DFG-out (PDB: 4P90 and 4ZLO), and one is DFG-in (PDB: 4O0T). This may indicate that autophosphorylation of the activation loop of PAK1 can occur regardless of the conformational state of the beginning of the activation loop. The activation loops of the substrate kinases of crystal form 2 are all in DFG-out conformations and are partially disordered from residue ~411 to 415. They differ substantially in the C-terminal segment and beyond the end of the activation loop (position 434) all the way to position 443 (Fig. 6D).

To verify the substrate interactions within the PAK1 autophosphorylation complexes, we compared the substrate segments of representative structures from each of the crystal forms of PAK1 (PDB: 4ZY4 and 4ZJI) with PAK4 structures containing bound peptides [PDB: 2Q0N, 4JDH, 4JDI, 4JDJ, and 4JDK (73)] in which the phosphorylation site Ser or Thr forms a hydrogen bond with Asp440 of the HRD motif (Fig. 6E). After superposition of the enzyme kinases, the backbone positions of the substrates in the PAK1 autophosphorylation complexes and the PAK4-bound peptides overlap very well from positions P−3 to P+1 but deviate substantially before and after this region. The PAK1 and PAK4 structures share one feature of specificity: an interaction of Arg at P−2 with two negatively charged residues in the enzyme kinase, a feature common to many Ser/Thr kinases (74) (Fig. 6E).

We compared the enzyme and substrate conformations of crystal form 1 of PAK1 to determine where and how the substrate kinase’s activation loop differs from active structures (Fig. 6F). Working backward from the end of the activation loop, the PAK1 substrate activation loop deviates from the enzyme activation loop at Pro428 and Gly426, which are seven and nine residues from the end of the loop, respectively. The values of the backbone dihedrals ϕ and ψ of these residues are consistent with this: for instance, ψ of Pro428 in the enzyme and substrate kinases is −61° and +61°, respectively, in PAK1 (PDB: 4ZY4), and the ϕ of Gly426 is 129° and 77° in the enzyme and substrate kinases, respectively.
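The backbone dihedral comparison can be illustrated with a minimal sketch, assuming atomic coordinates are available as plain (x, y, z) tuples. The helper names (`dihedral`, `phi`, `psi`, `dihedral_shift`) are hypothetical and are not taken from the original analysis; the only values from the text are the reported angles used at the end.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - s0 * b1[0], b0[1] - s0 * b1[1], b0[2] - s0 * b1[2])
    w = (b2[0] - s2 * b1[0], b2[1] - s2 * b1[1], b2[2] - s2 * b1[2])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def phi(c_prev, n, ca, c):
    """phi of residue i: C(i-1), N(i), CA(i), C(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)

def dihedral_shift(angle_a, angle_b):
    """Smallest signed change from angle_a to angle_b, wrapped to (-180, 180]."""
    d = (angle_b - angle_a + 180.0) % 360.0 - 180.0
    return 180.0 if d == -180.0 else d

# Angles reported in the text for PAK1 (PDB: 4ZY4), enzyme -> substrate.
psi_pro428_shift = dihedral_shift(-61.0, 61.0)
phi_gly426_shift = dihedral_shift(129.0, 77.0)
```

Wrapping the difference matters because dihedral angles are periodic; a naive subtraction can overstate a change that crosses ±180°.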

For some purposes, such as molecular dynamics simulations, it is important to have complete coordinates without chain breaks in the backbone due to disorder. Most of the structures in crystal form 1 have such breaks within the N-terminal domain between or adjacent to β-sheet strands. Only the enzyme kinase of PAK1 (PDB: 4O0R, chain A) and the substrate kinases of PAK1 (PDB: 4O0T, chain B and 4ZY6, chain B) are complete in the N-terminal domain.

Given the inactive characteristics of the enzyme kinase structures of crystal form 2 (Table 2) and displacement of the N-terminal region of the enzyme kinase activation loop (Fig. 6C) and the C-terminal regions of the activation loops of both the enzyme and substrate kinases (Fig. 6C and 6D), it appears that the crystal form 1 structures are better models of the autophosphorylation complex, but we cannot rule out that both crystal forms may contain relevant information for the autophosphorylation of PAK1 Thr423.

Two human IRAK4 structures (PDB: 4U97 and 4U9A) (15) also have an asymmetric enzyme-substrate dimer in the ASU with the autophosphorylation of an activation loop threonine (Thr345) (Fig. 6G), as discussed by Ferrao et al. (15). The sequence alignment of IRAK family members showed that this threonine is conserved in all four IRAKs (fig. S3). The enzyme kinases are similar to an active structure of IRAK4 [PDB: 2NRY (75)], and the substrate kinases are in the DFG-up conformations (Table 2).

The autophosphorylation complexes of PAK1 in crystal form 1 (Fig. 6A) and crystal form 2 (Fig. 6B) can be compared with that of IRAK4 (PDB: 4U97 and 4U9A) (Fig. 6G) because the enzyme kinases are oriented similarly in all three figures. Ferrao et al. (15) noted the difference between PAK1 (PDB: 3Q4Z) of crystal form 1 and their structures of IRAK4 (PDB: 4U97 and 4U9A). We determined that the IRAK4 and PAK1 crystal form 2 structures are also different. The phosphorylation sites of the PAK1 and IRAK4 structures are 12 and 14 residues before the end of the activation loop, respectively, indicating that the sites are not completely homologous. For IRAK4, working backward from the end of the activation loop, the first large changes occur at Thr351 and Gly350; the ϕ dihedral of Thr351 changes from −121° to −64° from enzyme to substrate (PDB: 4U9A), and the ϕ dihedral of Gly350 changes from +155° to −56° from enzyme to substrate (Fig. 6H). This Gly-Thr motif (residues 350 and 351), located nine and eight residues before the end of the activation loop, respectively, is conserved in many kinases. The change in conformation of PAK1 occurs at a similar location (Fig. 6F).

As we noted above, the autophosphorylation sites across many tyrosine kinases are at consistent locations (positions 13 and 14) relative to the start of the activation loop at the DFG motif. We sought to determine if this was also the case for Ser/Thr kinases, which fall into several different families. A sequence alignment of the activation loop of Ser/Thr kinases (fig. S4) shows that 212 human kinases contain a Ser or Thr residue at a position aligned with the autophosphorylation site Thr423 of PAK1. This position is 12 residues from the end of the activation loop (counting the APE motif as residues 3, 2, and 1, respectively). Forty-three of the 212 human kinases with Ser or Thr at this position are annotated as having these sites as autophosphorylation sites in UniProt (fig. S4). Thus, regardless of insertions and deletions at positions before the phosphorylation site, the crystal form 1 (and possibly crystal form 2) structures of PAK1 may represent autophosphorylation complexes for many of the kinases listed in fig. S4.
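The counting convention (the final residue of the APE motif is position 1) can be checked with a short sketch. It assumes contiguous residue numbering with no insertion codes; the function name is hypothetical. The PAK1 loop end (position 434) is stated in the text, whereas the IRAK4 loop end is not stated and is inferred here from Thr345 lying 14 residues from the end.

```python
def position_from_loop_end(resnum, loop_end_resnum):
    """Position counted backward from the loop end, with the last residue as 1."""
    if resnum > loop_end_resnum:
        raise ValueError("residue lies beyond the end of the activation loop")
    return loop_end_resnum - resnum + 1

# PAK1: end of the activation loop at position 434 (from the text).
PAK1_LOOP_END = 434
pak1_positions = {
    name: position_from_loop_end(num, PAK1_LOOP_END)
    for name, num in {"Thr423": 423, "Gly426": 426, "Pro428": 428}.items()
}

# IRAK4: loop end inferred (assumption) from Thr345 being 14 residues from the end.
IRAK4_LOOP_END = 345 + 14 - 1
irak4_positions = {
    name: position_from_loop_end(num, IRAK4_LOOP_END)
    for name, num in {"Gly350": 350, "Thr351": 351}.items()
}
```

Under these assumptions the numbers in the text are mutually consistent: PAK1 Thr423, Gly426, and Pro428 fall at positions 12, 9, and 7, and the IRAK4 Gly-Thr motif falls at positions 9 and 8.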

Autophosphorylation complexes of FGFR kinase domains

Several receptor tyrosine kinases have a region called the kinase insert domain, which is a long loop segment inserted between the D and E helices of the C-terminal domain. Bae et al. [PDB: 3GQI (8)] and Huang et al. [PDB: 4K33 (14)] described autophosphorylation complexes of homologous sites in the kinase insert domains of FGFR1 and FGFR3, respectively. The asymmetric dimer structure of FGFR1 with mutations in two autophosphorylation sites (Y583F and Y585F) [PDB: 3GQI (8)] contains an autophosphorylation site mutated from Tyr583 to Phe583 of the kinase insert domain of one monomer that sits in the active site of a second monomer in a neighboring unit cell (A:4_456). Using site-directed mutagenesis, Bae et al. (76) demonstrated that mutations of domain-domain contacts between the monomers (PDB: 3GQI) reduced autophosphorylation of the kinase insert domain. Huang et al. (14) reported that the FGFR3 structure (PDB: 4K33) contains an autophosphorylation complex of Tyr577 of the kinase insert loop [a known autophosphorylation site (77)], consisting of two symmetry copies of chain A of the ASU in neighboring unit cells (chains A:3_645 as substrate and chain A:1_555 as enzyme kinase).

We compared the autophosphorylation complexes of FGFR1 and FGFR3 and observed a previously unrecognized similarity of these two structures, both around the substrate autophosphorylation site and in domain-domain interactions distal to the substrate-binding site (Fig. 7A). The RMSD between the complexes is 0.84 Å (calculated for the backbone atoms of 510 residues in each complex, 255 in each chain). Both complexes contain an adenosine 5′-triphosphate (ATP) analog, adenosine-5′-[β,γ-methylene]-triphosphate, and exhibit DFG-in conformations and intact salt bridges in both monomers (Table 2). In both structures, the long loop that connects the F and G helices in the C-terminal domain of the substrate kinase makes extensive contact with the loops of the N-terminal domain β sheet of the enzyme kinase. The sequences of the αF-αG loop are similar and make similar contacts in the two complexes (Fig. 7B). An alignment of sequences of the kinase insert domains shows that FGFR1-Tyr583, FGFR3-Tyr577, and FGFR2-Tyr586 (a known autophosphorylation site) are homologous to one another (fig. S5). FGFR4 does not contain a tyrosine at this position.
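For readers who want to reproduce a comparison of this kind, the RMSD itself is a simple quantity once two complexes have been superposed. The sketch below assumes the two coordinate lists are already optimally superposed and listed in the same atom order; it performs no fitting, and the function name and the toy coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists.

    Assumes the two sets are already superposed; no rotation or translation
    is applied here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates (hypothetical): every atom displaced by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(set_a, set_b)
```

In practice the superposition step would be done first with a structural biology toolkit, and the atom lists would contain the backbone atoms of the 255 matched residues from each chain.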

Fig. 7 Autophosphorylation complexes for FGFR1, FGFR3, and FGFR2.

(A) Superposition of autophosphorylation complex of FGFR1 (PDB: 3GQI) and FGFR3 (PDB: 4K33). The substrate residues, Y583F (mutant) of FGFR1 and Tyr577 of FGFR3, are positioned as a substrate in the enzyme active sites of monomers in neighboring unit cells of the crystals. (B) Specific contacts between substrate αF-αG loops and enzyme β1-β2 loops in the FGFR1 and FGFR3 autophosphorylation complexes. (C) Autophosphorylation complex of FGFR2 in PDB: 3CLY. The C-terminal segment of the substrate sits in the active site of the enzyme chain. The loop between αG and αH is colored in pink and makes contact with the loop (colored light blue) from the enzyme between the β1 and β2 strands of the N-terminal β sheet. ACP is the bound ligand. (D) Structural superposition of FGFR2 (PDB: 3CLY), FGFR1 (PDB: 3GQI), FGFR1 (PDB: 3GQL), MET (PDB: 1R0P), and RON (PDB: 3PLS). The conserved regions from the sequence alignment (fig. S6) are colored periwinkle.

Huang et al. (78) compared the substrate binding in FGFR3 (PDB: 4K33) with their earlier structure of an autophosphorylation complex of Tyr769 of the C-terminal tail of FGFR2 [PDB: 3CLY (9)] (Fig. 7C), a site that when phosphorylated is important for binding of phospholipase Cγ (PLCγ). All four FGFR protein sequences contain a tyrosine phosphorylation site at the same C-terminal position as Tyr769 of FGFR2 (fig. S6). Indeed, the autophosphorylation complex of the kinase insert of FGFR1 (PDB: 3GQI) also contains PLCγ bound to the C-terminal tail of FGFR1 with a phosphorylated Tyr766, which is homologous to Tyr769 of FGFR2. A superposition of FGFR1 and FGFR2 shows the different positions of the phosphorylation site in the C-terminal tail in the PLCγ-bound structure (PDB: 3GQI) and the autophosphorylation complex (PDB: 3CLY) (Fig. 7D). We also superposed the only other FGFR kinase domain structure that contains the C-terminal phosphorylation site in the coordinates, FGFR1 [PDB: 3GQL (8)], which contains the Tyr766 bound to the kinase insert of the same monomer (Fig. 7D). We also superposed structures of RON [PDB: 3PLS (79)] and MET [PDB: 1R0P (80)], which contain Tyr1353 and Tyr1349 residues, respectively, at homologous positions to Tyr766 of FGFR1. The superposition showed that in the structures in which the Tyr is not bound to another protein (either kinase or PLCγ), the C-terminal Tyr residues are in the same position up against the E helix. In the two bound structures (PDB: 3GQI and 3CLY), the Tyr residues occupy different positions from each other and from the unbound Tyr residues (Fig. 7D).

Autophosphorylation of long disordered tails away from the kinase domain

Five structures listed in Table 1 contain autophosphorylation complexes of N- and C-terminal tails such that the phosphosite is at least 20 amino acids away from the start or end of the folded kinase domain (that is, the first β strand of the N-terminal domain or the I helix, which is the last helix of the C-terminal domain). These structures have in common the folded domains of the enzyme and substrate kinases making little or no contact with each other; the substrates bind as if they were peptides. These structures are an autophosphorylation complex of Ser142 from an N-terminal region of CLK2 (24); three autophosphorylation complexes of the C-terminal tail of CaMKII [one from human CaMKII subunit δ (12) and two from C. elegans CaMKII (13)]; and an autophosphorylation complex of Tyr1016 in the C-terminal tail of EGFR (19). The CLK2 structure has not previously been discussed as an autophosphorylation complex (24), whereas the CaMKII structures (12, 13) and the EGFR (19) structure have been previously described as such.

CLK2 is a dual-specificity kinase that is autophosphorylated at Ser142 (in the human sequence; Ser141 in the mouse sequence). When unphosphorylated at this site, mouse CLK2 resides in nuclear speckles, whereas autophosphorylation at Ser141 results in localization within the nucleoplasm (81). This region of the protein is N-terminal to the beginning of the kinase domain, which begins at residue 163. We identified a structure of human CLK2 [PDB: 3NR9 (24)] that contains a symmetric intermolecular interaction of Ser142 of chain C of the central ASU with the active site of chain B of a neighboring ASU in the same unit cell of the crystal (chain B:5_555), and vice versa (Fig. 8A). There is a similar dimer between chain A of the ASU and a symmetry copy of chain A (chain A:4_645) (backbone RMSD of 1.0 Å calculated over 680 residues in each complex). The enzyme-substrate complex comprising chains B and C has a slightly longer distance (5.36 Å) between the hydrogen bond donor (Ser142) and the hydrogen bond acceptor (Asp290) than other autophosphorylation complexes (Table 1). However, the position of the main chain of Ser142 is consistent with a kinase substrate, as made evident by superposition of chain B:5_555 with the kinase domain of a structure of human dual-specificity tyrosine phosphorylation–regulated kinase 1A (DYRK1A) with a bound peptide from crumbs homolog 2 (CRB2) [PDB: 2WO6 (82)] (Fig. 8B). This structure is the closest peptide-kinase complex of known structure to CLK2 in the PDB (35% identity). A common element of substrate-enzyme interaction is the P−3 position, which is an Arg residue in both CLK2 (Arg139) and the DYRK1A-bound peptide, and this Arg interacts with homologous Glu residues [Glu294 and Glu373 in CLK2 (PDB: 3NR9); Glu291 and Glu353 in DYRK1A (PDB: 2WO6)]. This interaction has been noted for several Ser/Thr kinases (74).
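The donor-acceptor distance discussed for CLK2 can be illustrated with a minimal geometric check. This is a sketch under stated assumptions: the 3.5 Å cutoff is a commonly used generic hydrogen bond criterion chosen here for illustration, not a threshold taken from this study, and the function names and coordinates are hypothetical.

```python
import math

# Assumption: a generic donor-acceptor cutoff in Å, not a value from the paper.
HBOND_CUTOFF = 3.5

def min_donor_acceptor_distance(donor_oxygen, acceptor_oxygens):
    """Shortest distance from the substrate hydroxyl O to the Asp carboxylate O atoms."""
    return min(math.dist(donor_oxygen, acc) for acc in acceptor_oxygens)

def classify_contact(distance, cutoff=HBOND_CUTOFF):
    """Label a donor-acceptor distance as within or beyond the cutoff."""
    return "hydrogen bond" if distance <= cutoff else "no hydrogen bond"

# Hypothetical coordinates: Ser OG and the two Asp O-delta atoms.
ser_og = (0.0, 0.0, 0.0)
asp_od = [(0.0, 0.0, 2.8), (0.0, 3.0, 4.0)]
toy_distance = min_donor_acceptor_distance(ser_og, asp_od)

# The 5.36 Å CLK2 distance reported in the text exceeds this generic cutoff.
clk2_label = classify_contact(5.36)
```

A strict cutoff of this kind would not count the CLK2 chain B and C pair as hydrogen bonded, which is why the backbone superposition with a peptide-bound kinase is used as supporting evidence.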

Fig. 8 Autophosphorylation complexes for CLK2, CaMKII, and the C-terminal tail of EGFR.

(A) CLK2 (PDB: 3NR9) symmetric autophosphorylation of an N-terminal nuclear regulatory signal sequence (Ser142). NR9 is the bound ligand. (B) The P−3 Arg (ARG) residues of CLK2 (PDB: 3NR9) and DYRK1A (PDB: 2WO6) peptide structures form hydrogen bonds (dashes) with conserved Glu residues in the enzyme kinases. (C) Superposition of the CLK2 autophosphorylation complex (PDB: 3NR9) and CLK3 monomers (PDB: 2WU6, 2WU7, and 3RAW) shows that the homologous phosphorylation sites Ser142 in CLK2 (magenta spheres) and Ser283 in CLK3 (pink spheres) are 27 Å apart. (D) Three CaMKII autophosphorylation complexes (PDB: 3KK8, 3KK9, and 2WEL) are superposed by their enzyme kinase domains (marine blue). The substrate kinases are indicated in magenta (PDB: 3KK9), green (PDB: 3KK8), and yellow (PDB: 2WEL). Active site Asn134 and Asp136 are indicated by orange spheres; phosphosite pThr284 (TPO284) of PDB: 3KK8 (green spheres), 3KK9 (magenta spheres), and Thr287 of PDB: 2WEL (yellow spheres) are indicated. (E) Similar substrate specificity of P−3 and P+1 residues of the three CaMKII complexes and a structure of rabbit phosphorylase kinase with a bound peptide (PDB: 2PHK). The P−3 arginine residues interact with two Glu residues in the D helix and one Glu residue in the HRD+4 position. The P+1 residues interact with the C-terminal segment of the activation loop (from positions −8 to −4 from the end of the loop). (F) Superposition of human (PDB: 2VN9) and C. elegans (PDB: 2BDW) CaMKII monomers from nonautophosphorylation complexes and the substrate kinases of CaMKII (PDB: 2WEL, 3KK8, and 3KK9). The autophosphorylation sites of PDB: 3KK8 and PDB: 3KK9 are colored in red and shown in spheres. The autophosphorylation site of PDB: 2WEL is colored in blue and is separated from the substrate kinase domain by a disordered segment. The phosphorylation sites in the nonsubstrate kinases are shown in magenta spheres, buried between the I helix and the H helix. (G) Autophosphorylation complex of the C-terminal tail of EGFR (Tyr1016, magenta spheres) from the ASU of PDB: 4I21. MIG-6 is an inhibitory peptide. (H) Superposition of the enzyme kinases of EGFR-peptide complexes (PDB: 2GS6, 4R3P, 4R3R, 5CZH, and 5CZI) and the EGFR autophosphorylation complex (PDB: 4I21). The substrate in PDB: 2GS6 is covalently linked to an ATP analog. Tyrosine residues are shown in sticks in different colors for each complex. Tyr1016 of PDB: 4I21 is shifted one residue from the peptide substrates and is aligned with their P−1 residues. The distance between the HRD+4 residue (Arg841) and Tyr1016 of PDB: 4I21 is 6.01 Å. In the peptide complexes, the hydrogen bond length between the HRD+4 residue and the phosphorylation site Tyr is ~2.8 Å. Dashes indicate hydrogen bonds. (I) Superposition of EGFR monomers shows the intrachain contacts between Tyr1016 (magenta) and Glu736 (yellow) in several nonautophosphorylation structures. In the autophosphorylation complex (PDB: 4I21), Glu736 (green) is exposed and Tyr1016 (red) extends away from the substrate kinase domain by ~26 Å.

This phosphorylation site is conserved in the four human CLK kinases (fig. S7). Although the only structure of CLK2 available is PDB: 3NR9, structures of CLK3 include Ser283, which is homologous to Ser142 of CLK2. A superposition of these CLK3 structures [PDB: 2WU6, 2WU7 (83), and 3RAW (84)] with CLK2 (PDB: 3NR9) demonstrated that the phosphorylation site moves 27 Å from its position in the nonautophosphorylation complexes to its position in the potential autophosphorylation complex in CLK2 (PDB: 3NR9) (Fig. 8C). The N-terminal tails (residues 136 to 155) about the phosphorylation site (Ser142) are sandwiched between the enzyme and substrate monomers, which otherwise make essentially no contact with each other (Fig. 8A).

Three structures have been described that exhibit an autophosphorylation complex of the Thr residue of the regulatory region of CaMKII. PDB entries 3KK8 and 3KK9 (13) are structures of C. elegans CaMKII that contain the kinase domain and a portion of the R1 regulatory region, whereas human CaMKII subunit δ (PDB: 2WEL) contains the structure of a longer construct that includes the calmodulin-binding site of the regulatory region (12). The autophosphorylation site in the regulatory region is conserved within the four human CaMKII isoforms (fig. S8).

The substrate kinase domains are in three distinct positions relative to the enzyme kinase domains in these structures (Fig. 8D), indicating that interaction of the kinase domains, apart from the C-terminal tails, is not likely to be important for autophosphorylation. As with the other N- or C-terminal tails that are distant from the kinase domain in sequence (CLK2 described above and EGFR described below), the enzyme and substrate kinase domains make little contact in these structures, particularly in CaMKII PDB: 3KK9 and 2WEL.

Chao et al. compared their structure of C. elegans CaMKII autophosphorylation with a structure that they determined of CaMKII inhibitor 1 bound to C. elegans CaMKII [PDB: 3KL8 (13)]. This inhibitor binds like a substrate peptide but has an Arg residue (instead of Ser or Thr) at the position of the phosphorylation site. Instead, we compared the interaction of the residues adjacent to the autophosphorylation site of the substrate kinases with the enzyme kinases in these structures with a peptide bound to rabbit phosphorylase kinase [PDB: 2PHK (85)], which is the closest kinase-peptide-substrate complex in the PDB to both C. elegans and human CaMKII (44% sequence identity). A common substrate-enzyme interaction in the peptides and the CaMKII autophosphorylation complexes occurs at the P−3 positions, which consists of an Arg residue that interacts with one or both of two Glu residues in or near the D helix (Fig. 8E, colored as in Fig. 8D; peptide substrate in pink). Furthermore, hydrophobic residues at the P+1 position in the peptides and autophosphorylation substrates interact with hydrophobic residues near the end of the activation loop (four and eight residues from the end of the activation loop).

In structures of these enzymes that are not autophosphorylation complexes, the phosphorylation site of CaMKII interacts with the substrate kinase domain, rather than extending away from the kinase and toward an enzyme kinase within the crystal. Indeed, the phosphorylation sites in the nonautophosphorylation complexes of C. elegans CaMKII [PDB: 2BDW (86)] and human CaMKII subunit δ [PDB: 2VN9 (12)] are buried between the H and I helices (Fig. 8F).

EGFR catalyzes the trans-autophosphorylation of several tyrosine residues in a disordered region C-terminal to the kinase domain (residues 712 to 979), including Tyr998, Tyr1016, Tyr1092, Tyr1110, Tyr1172, and Tyr1197 (87–90). The structure of residues 694 to 1022 of EGFR with a segment of the inhibitory protein mitogen-inducible gene 6 (MIG-6) bound to the top surface of the N-terminal domain (PDB: 4I21) contains a potential autophosphorylation complex described as such by Gajiwala et al. (19). In this structure, the C-terminal tail of chain B in the ASU sits in the active site of chain A, such that Tyr1016 forms a hydrogen bond with Asp837 of the HRD motif in the active site (Fig. 8G). Additionally, the C-terminal tail of the enzyme kinase in the ASU (chain A) sits in the active site of chain B of a neighboring unit cell (chain B:1_655). These two complexes are similar (RMSD of 2.7 Å calculated for backbone atoms of both chains excluding the C-terminal tails), and only the one contained in the ASU is shown in Fig. 8G. This phosphorylated tyrosine site in EGFR (Tyr1016) is conserved in ErbB2 (Tyr1023) and ErbB4 (Tyr1022) but is not present in ErbB3 (fig. S9). These conserved Tyr sites in ErbB2 and ErbB4 may be autophosphorylation sites or may be phosphorylated by ErbB family members when heteromeric complexes are present (91).

Gajiwala et al. compared their structure with a structure of the kinase domain of EGFR with a peptide covalently bonded to ATP [PDB: 2GS6 (92)]. They observed that the Tyr substrate residue was shifted by one amino acid compared to the peptide conjugate’s Tyr residue, which was covalently attached to ATP through a nitrogen atom at the Oη hydroxyl position. They proposed a mechanism in which the hydrogen of the Tyr OH group is removed by the HRD Asp residue, which is followed by a shift of the peptide by one position to facilitate phosphorylation. However, this peptide conjugate is artificial, and we investigated whether noncovalent substrates bound to EGFR, now available in the PDB, adopted conformations that were more similar to those of the substrate in the autophosphorylation complexes found in PDB: 4I21.

Structures of the EGFR kinase domain bound to SHC1 peptide [PDB: 5CZI (93)], an optimized peptide [PDB: 5CZH (93)], or a MIG-6 peptide bound as a substrate [PDB: 4R3P and 4R3R (94)] are available in the PDB. A superposition of the enzyme kinases of the EGFR autophosphorylation complex (PDB: 4I21) and the kinase domains of these peptide-bound structures reveals similarities and differences (Fig. 8H). All of these structures contain the L858R mutation at the DFG+1 site of EGFR. The peptide-ATP conjugate conformation in PDB: 2GS6 is different from the noncovalently bound peptide substrates in EGFR (PDB: 5CZH, 5CZI, 4R3R, and 4R3P), which are similar to other tyrosine kinase structures that we described, for example, EPHA3 [PDB: 3FXX and 3FY2 (Fig. 1G)] and IGF1R [PDB: 1K3A (Fig. 5C)]. The backbone conformation in EGFR (PDB: 4I21) is similar to the noncovalently bound substrates except that it is shifted by one amino acid so that the Tyr (P0) residue overlaps with the P−1 position of the peptide substrates. The potential autophosphorylation complex, EGFR (PDB: 4I21), is missing the interaction of the substrate Tyr residue with the HRD+4 Arg, an interaction that is present in all of the peptide structures. Thus, it seems likely that the interaction of substrate with kinase in EGFR (PDB: 4I21) is an artifact of crystallization and not an authentic autophosphorylation complex.

Nevertheless, we examined the position of Tyr1016 in other structures of EGFR and observed that this residue is bound to the N-terminal domain with a hydrogen bond to the side chain of Glu736 (between the second and third β-sheet strands) in almost all structures in which it is present in the coordinates. This usually occurs intramolecularly but in several structures (PDB: 1M14, 1M17, 4G5J, 4I23, 4JQ7, 4JQ8, 4JR3, 4JRV, and 4LI5) appears as a contact between different monomers of EGFR within the crystal. For Tyr1016 to reach the position it occupies in PDB: 4I21, it must move a distance of about 25 Å compared with its position in the other structures (Fig. 8I).

Kinase homodimers with swapped activation loops

Several published structures of Ser/Thr kinases are described as potential autophosphorylation complexes in which a portion of the activation loops are symmetrically swapped between the two monomers (95–100). In all of these Ser/Thr kinase structures, about 10 residues of the C-terminal end of the activation loop of one monomer displace the same residues in the structure of another monomer, and vice versa (Fig. 9A). These structures are members of a cluster of similar interfaces in ProtCID (Protein Common Interface Database) (32) that contains 13 unique kinases, 17 distinct crystal forms, and 58 PDB entries (table S2).

Fig. 9 Activation loop–swapped dimers of serine/threonine kinases.

(A) Superposition of activation loop–swapped dimers from 17 distinct crystal forms and 58 PDB entries in ProtCID. The chains are colored in green and cyan. (B) Comparison of the substrate segment of PAK1 (PDB: 4ZY4) bound to the enzyme kinase with the substrate segment of an activation loop–swapped structure of AURKA (PDB: 4C3P). The substrate peptides are shown in ribbons and colored from blue to red from N to C terminus. The substrate of AURKA (PDB: 4C3P) is positioned in the opposite direction to the substrate of PAK1 (PDB: 4ZY4), and the potential autophosphorylation site of AURKA is 6 Å from the HRD motif Asp side chain. (C) Interchain hydrogen bond between Asp256 and Thr292 in AURKA structure PDB: 4C3P. (D) Intrachain hydrogen bond in other AURKA structures (PDB: 1OL5, 3E5A and 3HA6).

In none of these complexes, however, does the annotated autophosphorylation site 12 residues from the end of the activation loop (from the APE motif) make a hydrogen bond with the Asp Oδ atoms of the HRD motif, nor does it seem possible that they could do so. For example, Thr288 of the kinase Aurora A (AURKA) in PDB: 4C3P (99) is 7.7 Å from Asp256 of the HRD motif (Fig. 9B). In addition, if the −12 site were a substrate in these structures, the polypeptide chain would be running in the wrong direction across the surface of the potential enzyme kinase when compared to substrate peptides or the autophosphorylation structures described above. In these structures, the residues adjacent (±3) to the potential phosphorylation site run from right to left (N to C terminus) across the surface (as oriented in Fig. 9B), whereas substrates run from left to right, as represented by the PAK1 autophosphorylation-complex substrate (PDB: 4ZY4) (Fig. 9B). The PAK1 structure contains the hydrogen bond between the phosphosite residue (Thr423) and the HRD motif Asp (D389N in PDB: 4ZY4). Given the large number of such activation loop–swapped structures (table S2) and the absence of an autophosphorylation Ser or Thr hydrogen bond with the HRD-motif Asp side chain in all of them, it seems unlikely that these common dimer structures represent autophosphorylation complexes.
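The chain-direction argument can be expressed as a simple vector test. The sketch below is a rough illustration, assuming that the candidate segment and a reference substrate have been placed in a common frame by superposing their enzyme kinases, and that each is given as an ordered list of Cα coordinates from P−3 to P+3. The end-to-end vector is a coarse proxy for chain direction; the function names and coordinates are hypothetical and are not drawn from any PDB entry.

```python
import math

def end_to_end_direction(ca_coords):
    """Unit vector from the first to the last C-alpha of a segment (N to C)."""
    (x0, y0, z0), (x1, y1, z1) = ca_coords[0], ca_coords[-1]
    v = (x1 - x0, y1 - y0, z1 - z0)
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        raise ValueError("segment has zero end-to-end length")
    return tuple(c / length for c in v)

def runs_like_reference(segment, reference):
    """True if the segment runs N to C in broadly the same direction as the reference."""
    d_seg = end_to_end_direction(segment)
    d_ref = end_to_end_direction(reference)
    return sum(a * b for a, b in zip(d_seg, d_ref)) > 0.0

# Hypothetical C-alpha traces (P-3 to P+3), spaced 3.8 Å apart along x.
reference_substrate = [(i * 3.8, 0.0, 0.0) for i in range(7)]
# A segment occupying the same groove but running the opposite way.
swapped_loop_segment = list(reversed(reference_substrate))

same = runs_like_reference(reference_substrate, reference_substrate)
opposite = runs_like_reference(swapped_loop_segment, reference_substrate)
```

A negative dot product, as for the reversed segment here, corresponds to the situation described for the activation loop–swapped dimers, in which the segment crosses the enzyme surface in the opposite sense to a bound substrate.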

Nine of these entries have an interchain hydrogen bond between a threonine side chain at position −8 of the activation loop of one monomer (typically in a Gly-Thr sequence) and the aspartic acid side chain of the HRD motif of the other monomer, shown for human AURKA (PDB: 4C3P) (table S2 and Fig. 9C). This highly conserved Gly-Thr sequence is often referred to as the “GT motif” (101, 102). Structures with an intermolecular GT-motif hydrogen bond include human MKNK2 [PDB: 2AC3 and 2AC5 (95); PDB: 2HW7 (96)], human STK10 [PDB: 2J7T (97) and 4EQU (100)], human OXSR1 [PDB: 2VWI (98)], human AURKA [PDB: 4C3P (99)], and mouse Aurora A [PDB: 3DJ5 (103) and 3DJ6 (104)] and are represented by multiple AURKA structures (Fig. 9C). The GT-motif interaction with the aspartic acid side chain of the HRD motif is often intramolecular and present in many kinase structures (Fig. 9D). The intramolecular hydrogen bond occurs in active structures, including those with substrates bound to the active site, including the structures described above [for example, PAK1 (PDB: 4ZY4) and CLK2 (PDB: 3NR9)]. A few kinases are annotated as exhibiting phosphorylation at the −8 position (in the GT motif) (105–107), but none of them is present in the substrate-binding site of these activation loop–swapped structures in a manner consistent with autophosphorylation (Fig. 9A).


Here, we have used a structural bioinformatics approach to identify the structures of potential autophosphorylation complexes in crystals of protein kinases. Of the 15 unique autophosphorylation sites in such complexes, 10 had been described in the relevant papers as autophosphorylation structures, and five were either not detected by the authors or not described in the publications presenting the structures. The observed autophosphorylation complexes now comprise the N-terminal juxtamembrane tyrosines of c-KIT, CSF1R, and EPHA2; the N- or C-terminal tails of FGFR2, CLK2, EGFR, C. elegans CaMKII, and human CaMKII subunit δ; the activation loop residues of LCK, IGF1R (Tyr1165 and Tyr1166), PAK1, and IRAK4; and the kinase insert region tyrosines of FGFR1 and FGFR3. Given the rapidly increasing number of kinase structures released by the PDB (414 from 1 October 2014 to 30 September 2015), we expect, on average, the availability of two to three new autophosphorylation complexes per year.

By sequence alignment, we associated the 15 complexes with phosphorylation sites in 170 other kinases that are annotated as either phosphorylation or autophosphorylation sites. Many more potential sites in homologous positions in other kinases exist but are not currently annotated as phosphorylation sites. The autophosphorylation complexes described here can serve as hypothetical structures for these sites to be tested both computationally and experimentally to ascertain the physiological relevance of the identified structures.

A number of observations can be made when considering the complexes collectively. The position of the substrate phosphorylation site and residues immediately adjacent to it in the substrate kinase are generally consistent with the structures of peptide substrates bound to kinases. Comparison with the peptide-kinase complexes was highly informative. In several cases, we identified similar interactions of substrate side chains with specific residues within the enzyme kinase domain: (i) the P+3 hydrophobic residues of the c-KIT, CSF1R, and EPHA2 substrate kinases, which interact in a similar manner with the enzyme kinase as do similar residues in this position in peptide substrates bound to EPHA3, INSR, and IGF1R; (ii) polar P−1 residues in the activation loops of IGF1R (both Tyr1165 and Tyr1166) and LCK, which interact with enzyme kinase in a manner similar to the interactions of the residues in peptide substrates of IGF1R and INSR; (iii) P−2 Arg residues of PAK1 autophosphorylation complexes, which had similar interactions with the enzyme kinase as did P−2 Arg residues in PAK4-bound peptides; (iv) the P−3 Arg residue of CLK2, which interacts with the enzyme kinase in a manner similar to the P−3 Arg of a peptide bound to the closely related kinase DYRK1A; and (v) P−3 Arg residues of the CaMKII structures, which interact with the enzyme kinase in a manner similar to P−3 Arg residues in a peptide bound to the related kinase rabbit PHGK.

There are some distinct differences in the positions of substrate residues in the autophosphorylation complexes compared to those of peptides bound to closely related kinases. For example, the P+3 residues of the phosphorylation sites in the activation loops of LCK and IGF1R (Tyr1165 and Tyr1166) are polar and do not interact with the enzyme kinase, whereas similar kinases with bound peptides have hydrophobic residues at this position, and these hydrophobic residues fill a pocket formed by the G helix and the end of the activation loop of the enzyme kinase. Chen et al. compared their structure of an autophosphorylation complex of the C-terminal tail of FGFR2 [PDB: 3CLY (9)] with a structure of a peptide of the same site (residues 764 to 778, including the phosphosite Tyr769) bound to FGFR2 [PDB: 2PVF (108)]. The homodimer binds residues from P−6 to P+3 of the substrate kinase, whereas the peptide complex makes contacts only with residues P−2 to P+1. In the homodimer structure, the Glu residue in the P−2 position relative to Tyr769 makes specific hydrogen bonds to Arg573 and Ser702 of the enzyme kinase. In the peptide complex, this Glu in the P−2 position points away from the enzyme kinase. It is likely that the autophosphorylation complex of a protein substrate is a more accurate structural depiction than the peptide-substrate structure.

For all of the autophosphorylation complexes we identified, it was possible to compare the locations of the autophosphorylation sites within the substrate kinase structure with the positions of the same residues (or homologous residues in closely related kinases) in nonautophosphorylation structures. As might be expected, in the autophosphorylation complexes, the modified site has to bind to the peptide-binding groove of the enzyme kinase and, therefore, is typically extended away from the body of the substrate kinase in a way that it is not in the nonsubstrate structures of the same or related kinases. For the activation loop substrates, the autophosphorylation sites in nonsubstrate kinases typically form hydrogen bonds with other parts of the activation loop in both the active and inactive kinases. In the autophosphorylation structures, they extend away from the body of the kinase domain. For Ser/Thr kinases, the conformational change seems to initiate at a common position, the conserved GT motif located eight to nine residues from the end of the activation loop. The conservation of these residues may be due to the requirements of dynamic motion of the activation loop prior to autophosphorylation. For the N- and C-terminal autophosphorylation sites, the residues and their surrounding segments are bound closely to the kinase domain in the nonsubstrate structures but are extended outward in the autophosphorylation complexes.

An important issue in phosphorylation of proteins (as opposed to peptides) is whether domain-domain contacts distant from the substrate phosphosite are specific and whether they contribute to the affinity of the enzyme kinase for the substrate kinase. Information on whether a particular interface (or portion of an interface) is relevant for binding may come from similarity of the interfaces in the structures of homologous protein complexes (31, 32) and from direct experiments, including mutation of interface residues or cross-linking and mass spectrometry. Only two pairs of structures in Table 1 are from homologous autophosphorylation sites that share similar interfaces, which we identified in this work. These consist of the activation loop autophosphorylation complexes of IGF1R (PDB: 3D94) and LCK (PDB: 2PL0) and the kinase insert autophosphorylation complexes of FGFR1 (PDB: 3GQI) and FGFR3 (PDB: 4K33).

Mutation experiments support all four of these complexes. Wu et al. (10) mutated two residues involved in a salt bridge within the activation loop of IGF1R (PDB: 3D94) and found significant decreases in autophosphorylation. Here, we presented mutation data of the LCK (PDB: 2PL0) G helix of the enzyme kinase, which is in contact with the activation loop of the substrate kinase that is autophosphorylated at Tyr394. Two mutations resulted in significant loss of autophosphorylation activity as detected by an antibody recognizing the phosphorylated Tyr394 site, and mutations at Pro447 resulted in increases in autophosphorylation, consistent with the identification of the activating mutation P447L in two different cancer cell lines, HSB2 (29) and CTV1 (109).

Bae et al. (76) and Huang et al. (9) mutated the P−6 residues of the kinase insert phosphorylation sites in FGFR1 (PDB: 3GQI) and FGFR3 (PDB: 4K33), respectively. This residue is near the phosphorylated site in sequence but makes hydrogen bonds with an aspartic acid residue in the enzyme kinase in the loop between the third β-sheet strand of the N-terminal domain and the C helix. Both mutations in the similar FGFR1 and FGFR3 interfaces reduced autophosphorylation at the kinase insert site.

Of the other sites listed in Table 1, only three of them—PAK1 (PDB: 3Q4Z), IRAK4 (PDB: 4U97 and 4U9A), and FGFR2 (PDB: 3CLY)—have been tested experimentally. Wang et al. (11) showed that mutation D393A or E456I of PAK1 completely prevented Thr423 autophosphorylation in its activation loop (PDB: 3Q4Z). Charge reversal mutations R361E, E401K, and K440E of IRAK4 impaired dimerization (PDB: 4U97 and 4U9A), resulting in loss of autophosphorylation at Thr345 (15). Chen et al. (9) mutated residues in the large interface of the autophosphorylation of the FGFR2 C-terminal segment (PDB: 3CLY) and found that N730A on the substrate kinase side slowed down trans-phosphorylation. They also mutated several sites near in sequence to the phosphorylation site, but N730A was the only one distant in sequence from the phosphosite, Tyr769.

From comparison of structures of autophosphorylation complexes of homologous positions, we can also infer that kinase domain–kinase domain interactions are not important for the interaction of the enzyme and substrate kinases in some cases. The c-KIT crystal (PDB: 1PKG) contains two autophosphorylation complexes of Tyr568 that exhibit two very different orientations of the enzyme and substrate kinases with respect to one another. The newly identified complex of a homologous site in CSF1R (PDB: 3LCD) contains yet another orientation of the two kinase domains. We hypothesize that the domain-domain interaction may not be relevant for this phosphorylation event either.

Similarly, the kinase domains of the C. elegans CaMKII complexes are oriented differently from one another in the two structures (PDB: 3KK8 and 3KK9) and differently from the structure of the homologous complex of human CaMKII subunit δ (PDB: 2WEL). In two of these structures (PDB: 3KK9 and 2WEL), there is little interaction of the enzyme and substrate kinases away from the phosphorylation site. This is also true of the EGFR C-terminal tail complex (PDB: 4I21) and the CLK2 N-terminal tail complex (PDB: 3NR9), which indicates that domain-domain interactions are not likely to be important for these reactions.

From these observations, the general trend is that autophosphorylation of internal loops, whether the activation loop or the kinase insert loop, utilizes domain-domain interactions away from the phosphorylation site. Conversely, the autophosphorylation of N- and C-terminal residues generally does not require a specific interaction of the kinase domains. The only exception is potentially the C-terminal tail of FGFR2 (PDB: 3CLY), but in this case, the phosphorylation site is only two residues from the end of helix I of the substrate-kinase domain. The other N- or C-terminal sites in Table 1 (in c-KIT, CSF1R, EPHA2, CLK2, CaMKII, and EGFR) are 10 or more residues away from ordered secondary structural elements of the substrate kinase domain.

An advantage of having a larger number of autophosphorylation complexes to study is that, as a group, these 15 complexes provide new information on the determinants of the substrate specificity of kinases and particularly of tyrosine kinases. These 15 autophosphorylation complexes join the 20 heterodimeric structures of kinase substrate–enzyme interactions as identified in the ProtCID database (32). The peptide-kinase complexes consist of 13 Ser or Thr phosphorylation sites and 7 Tyr phosphorylation sites, and so the autophosphorylation complexes, which comprise 10 Tyr and 5 Ser or Thr sites, provide a substantial increase in the amount of structural information on tyrosine phosphorylation. Most previous analyses of kinase-substrate specificity have been performed on a small number (<5) of structures of kinase domains with peptide substrates (110–114).

Each of the structures that we described provides insight into the mechanisms of autophosphorylation because our criterion was the presence of a hydrogen bond between the side chain of the autophosphorylation site in the substrate kinase and the aspartic acid of the active site HRD motif of the enzyme kinase (or a direct contact of a mutated autophosphorylation site in a few cases). We also looked for other evidence favoring a phosphorylation reaction, such as an active conformation of the enzyme kinase and interaction of the phosphorylation site of the substrate kinase with the HRD+2 or HRD+4 Arg or Lys residue.

Other crystal structures have been used as evidence of potential autophosphorylation complexes without the presence of a direct interaction of the substrate side chain and the active site Asp. For example, PAK1 as represented by PDB: 1YHV lacks this direct interaction (115), and the relative orientations of the enzyme and substrate kinase domains are different from the structures that do contain the hydrogen bond between the phosphosite and the Asp side chain of the active site HRD motif (for example, PDB: 3Q4Z and 4ZY4). Several structures of Ser/Thr kinases with swapped activation loops have been described as autophosphorylation complexes (105–107), but our analysis indicated that none of them contains a direct interaction of the phosphosite side chain and the Asp in the active site HRD motif. In these structures, the conserved phosphosite at position −12 of the activation loop (12 residues from the end of the loop) is 10 Å or more from the Asp side chain, and the orientation of the main chain about the Ser or Thr site is opposite that of bona fide peptide and autophosphorylation substrates. Thus, we predict that these are not autophosphorylation complexes, and the biological relevance of these structures remains to be determined.

The structures described in this paper may provide information that leads to understanding the roles of certain driver mutations in the development of cancer. It is likely that some mutations affect the structure, dynamics, and affinity of these kinase autophosphorylation complexes. As an example, we found evidence that the activating mutation P447L of LCK in a T cell leukemia cell line is located in the interface of the autophosphorylation complex (PDB: 2PL0) and significantly increased the rate of autophosphorylation of LCK in vitro. Oncogenic c-KIT mutations result in ligand-independent kinase activation (116). Mutations (K550I, V559D, and V560G) in the c-KIT juxtamembrane region around the two main autophosphorylation sites (Tyr568 and Tyr570) are associated with human gastrointestinal stromal tumors and are present in HMC-1, a human mast cell leukemia cell line (117).

Finally, most kinase inhibitors target the ATP binding site, which is highly conserved, thus leading to difficulties in the development of inhibitors specific for individual kinases (118). A number of other locations in kinase domains have been targeted as binding sites for allosteric inhibitors (119). The structures here may be used to identify pockets that could be targeted by an inhibitor to prevent autophosphorylation of specific sites and the downstream effects of phosphorylation at that site.

We anticipate that as more kinase structures are deposited in the PDB, application of the bioinformatics approach described here should readily reveal the structures of additional autophosphorylation complexes, thereby providing further insights into the mechanisms of kinase regulation and kinase-substrate recognition and opportunities for understanding the role of cancer-driving mutations and the development of anticancer therapeutics.


Procedure to identify autophosphorylation complexes

PDB structures with kinase domains were collected from the PDBfam database (120), which is produced by searching the PDB with HMMs defined by Pfam (30). There are two Pfam families, Pkinase and Pkinase_Tyr, defined for protein kinases that we used to identify the structures of kinases in the PDB. This list was organized by UniProt (39) codes, and the atom coordinates in the PDB coordinate files were renumbered to reflect UniProt numbering according to the SIFTS (Structure Integration with Function, Taxonomy, and Sequence) database (121).

The PDB provides coordinates of the ASU of the crystal. The ASU is the smallest element of structure, which, when copied, rotated, and translated many times, can be used to build up a model of the crystal lattice consistent with the x-ray experimental data. The symmetry operators for building up a single unit cell of the crystal are provided by the PDB for each entry in the form of 3 × 3 rotation matrices and translation vectors. We built a single unit cell for each entry with the symmetry operators of the appropriate space group. The unit cell is a cuboid or parallelepiped object, which, when copied and translated in the x, y, and z directions, will produce a model of the full crystal lattice. From each unit cell, we produced a 3 × 3 × 3 ensemble of 27 unit cells, which is sufficient for generating all possible protein-protein interfaces that exist within the crystal. This was performed as described previously (32) to produce a set of files for each entry with the coordinates of a single homodimer of the kinase with a potential autophosphorylation interface.
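The lattice construction described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work (32). It assumes that atom positions and symmetry operators are expressed in fractional (unit-cell) coordinates, so that neighboring cells are reached by integer shifts; the function names, the two-operator P2₁ example, and the coordinates are all hypothetical.

```python
from itertools import product

def apply_symop(rot, trans, frac):
    """Apply one symmetry operator (3x3 rotation + translation) to a fractional coordinate."""
    return tuple(
        sum(rot[i][j] * frac[j] for j in range(3)) + trans[i]
        for i in range(3)
    )

def build_unit_cell(asu, symops):
    """Return one transformed copy of the ASU per symmetry operator."""
    return [[apply_symop(rot, trans, atom) for atom in asu] for rot, trans in symops]

def build_block(unit_cell_copies):
    """Shift every copy by -1, 0, or +1 cell along each axis: 3 x 3 x 3 = 27 cells."""
    block = []
    for shift in product((-1, 0, 1), repeat=3):
        for copy in unit_cell_copies:
            block.append([tuple(x + s for x, s in zip(atom, shift)) for atom in copy])
    return block

# Hypothetical example: space group P2_1 has two operators,
# (x, y, z) and (-x, y + 1/2, -z).
IDENTITY = ([[1, 0, 0], [0, 1, 0], [0, 0, 1]], (0.0, 0.0, 0.0))
SCREW_21 = ([[-1, 0, 0], [0, 1, 0], [0, 0, -1]], (0.0, 0.5, 0.0))

asu = [(0.10, 0.20, 0.30), (0.15, 0.25, 0.35)]  # made-up fractional coordinates
cell = build_unit_cell(asu, [IDENTITY, SCREW_21])
block = build_block(cell)
print(len(cell), len(block))  # prints: 2 54
```

Every pair of copies in such a block that lies within contact distance is a candidate interface; in practice, the coordinates would be converted back to Cartesian ångströms before distances are measured.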

The HRD motif in each kinase structure was identified by sequence alignment of each chain’s sequence to the sequence of AURKA with the program Position-Specific Iterated (PSI)–BLAST (115) and, for ambiguous cases, by structure alignment to the structure of AURKA in PDB: 3E5A (122). For the coordinates of each homodimer for each entry, we calculated the distances between all hydroxyl oxygens of one protein and the Oδ1 and Oδ2 atoms of the active site HRD Asp of the other protein and vice versa. This was performed with a custom-written Perl script. Because some potential autophosphorylation sites are mutated to Asp or Glu side chains or are already phosphorylated, we also calculated distances between the oxygen atoms of these side chains and the carboxylate atoms of the HRD Asp residue side chain.
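The distance screen can be sketched in a few lines. The original analysis used a custom Perl script that is not reproduced here; the Python version below is only an illustration. It assumes that atoms have already been parsed into dictionaries with hypothetical keys (`resname`, `name`, `resseq`, `xyz`), it uses standard PDB atom names for the Ser, Thr, and Tyr hydroxyl oxygens, and it takes as a default the 6 Å threshold that was applied to these distances. The coordinates are made up.

```python
import math

# Side-chain hydroxyl oxygens of Ser, Thr, and Tyr (standard PDB atom names).
HYDROXYL_ATOMS = {("SER", "OG"), ("THR", "OG1"), ("TYR", "OH")}

def hydroxyl_to_asp_hits(substrate_atoms, asp_oxygens, cutoff=6.0):
    """List hydroxyl oxygens closer than `cutoff` (angstroms) to either Asp O-delta atom.

    Returns (resname, resseq, distance) tuples sorted by increasing distance.
    """
    hits = []
    for atom in substrate_atoms:
        if (atom["resname"], atom["name"]) not in HYDROXYL_ATOMS:
            continue
        # Shortest distance to OD1 or OD2 of the HRD aspartate.
        d = min(math.dist(atom["xyz"], o) for o in asp_oxygens)
        if d < cutoff:
            hits.append((atom["resname"], atom["resseq"], round(d, 2)))
    return sorted(hits, key=lambda h: h[2])

# Made-up coordinates: OD1 and OD2 of the enzyme kinase HRD Asp.
asp_oxygens = [(0.0, 0.0, 0.0), (2.2, 0.0, 0.0)]
substrate = [
    {"resname": "TYR", "name": "OH", "resseq": 394, "xyz": (0.0, 2.7, 0.0)},
    {"resname": "SER", "name": "OG", "resseq": 377, "xyz": (12.0, 0.0, 0.0)},
    {"resname": "TYR", "name": "CZ", "resseq": 394, "xyz": (0.0, 3.5, 0.0)},
]
hits = hydroxyl_to_asp_hits(substrate, asp_oxygens)
print(hits)
```

Because either monomer of a homodimer may act as the enzyme, the function would be called twice per dimer with the roles of the two chains exchanged. Phosphorylated or Asp/Glu-substituted sites would need additional atom names in the lookup set.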

Those interfaces with OH to Oδ distances less than 6 Å were examined and compared with structures of known kinase substrates bound to kinase domains to verify that the interaction was consistent with the known structures of substrate-kinase complexes. The structures that we used for this purpose included PAK4 with a bound consensus peptide substrate [PDB: 2Q0N (33)], IGF1R with a peptide substrate from the IRS1 protein [PDB: 1K3A (17)], and CDK2 with a bound CDC6 substrate peptide [PDB: 1GY3 (123)]. The kinase acting as enzyme (the one containing the HRD motif of the interaction) was superposed on the kinase domains of the peptide substrate–kinase complexes. Those for which the autophosphorylation site and one residue on either side visually coincided with the same positions in the peptide substrate were considered potential autophosphorylation complexes and examined further. For each kinase with a potential autophosphorylation site, we checked the UniProt Web site for annotation and experimental data confirming that the site was experimentally validated as an autophosphorylation site.

Multiple structure alignments of the kinase domain structures for each UniProt entry were performed with the program Theseus (124). Structures were visualized with the program PyMOL (125). Surface areas of the complexes and of the individual monomers were calculated with the program NACCESS (126). The interface surface area was calculated as the sum of the surface areas of the two monomers minus the surface area of the complex, divided by 2.
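The interface-area arithmetic can be written out explicitly. The sketch below assumes that the buried area is reported as a positive quantity, that is, the summed monomer areas minus the area of the complex, halved. The function name and the input values are illustrative only and are not NACCESS output.

```python
def interface_area(sasa_monomer_a, sasa_monomer_b, sasa_complex):
    """Buried surface area per monomer, in the same units as the inputs (e.g., square angstroms)."""
    return (sasa_monomer_a + sasa_monomer_b - sasa_complex) / 2.0

# Made-up solvent-accessible surface areas for two monomers and their dimer.
area = interface_area(14000.0, 14200.0, 26600.0)
print(area)  # prints: 800.0
```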

Mutation experiments of LCK

Residues 1 to 509 of human LCK were cloned into the pCMV-Myc vector (Clontech). Primers were designed to generate the indicated mutants. We used a two-stage polymerase chain reaction (PCR) protocol for site-directed mutagenesis adapted from Stratagene (127). After Dpn I digest and heat inactivation, PCR products were transformed into DH5α cells. Purified plasmids from colonies were sequenced to confirm the mutations. Plasmids were transfected into HEK293 (American Type Culture Collection) with Lipofectamine 2000 according to the manufacturer’s protocol (Invitrogen). Twenty-four hours after transfection, cells were lysed with 50 mM tris (pH 7.4), 150 mM NaCl, 2 mM EDTA, 1% NP-40, 1% sodium deoxycholate, 0.1% SDS, protease inhibitor cocktail (Roche), and 1 mM Na3VO4 [adapted from Chiang and Sefton (128)]. Loading buffer was added to lysates, and proteins were separated by SDS–polyacrylamide gel electrophoresis. Total LCK and phosphorylated LCK bands were visualized by Western blot with antibodies recognizing Myc or Y505pLCK (Cell Signaling). Bands were quantitated with ImageJ. For each LCK protein, four independent transfections were performed to obtain an average and SD for the degree of autophosphorylation achieved by that protein.

To average ratios that may be less than or greater than 1.0, we took the average of their logs and then the antilog (for example, the mean of 2 and ½ is 1.0 using this method, but 1.25 using the arithmetic mean). For each transfection, we calculated the ratio of pLCK to Myc for each mutant and the wild type and calculated the ratio of ratios:

$$\frac{\mathrm{pLCK(Mut)}/\mathrm{Myc(Mut)}}{\mathrm{pLCK(WT)}/\mathrm{Myc(WT)}}$$

The average (geometric mean) and the 95% CI for this ratio is calculated as

$$\mu = \exp\left(\mu_{\log}\right), \qquad \mathrm{CI}_{\pm} = \exp\left(\mu_{\log} \pm \frac{1.96\,\sigma_{\log}}{\sqrt{N}}\right)$$

where $\mu_{\log}$ and $\sigma_{\log}$ are the mean and SD of $\log\left(\dfrac{\mathrm{pLCK(Mut)}_i/\mathrm{Myc(Mut)}_i}{\mathrm{pLCK(WT)}_i/\mathrm{Myc(WT)}_i}\right)$.
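The log-averaging procedure can be illustrated with a short Python sketch. Two points are assumptions rather than details stated in the text: the SD is taken as the sample SD (N − 1 denominator), and the half-width of the interval on the log scale is taken as 1.96 times the SD divided by the square root of N. Function names and input values are hypothetical.

```python
import math
import statistics

def ratio_of_ratios(plck_mut, myc_mut, plck_wt, myc_wt):
    """Phospho-to-total ratio of the mutant relative to the wild type for one transfection."""
    return (plck_mut / myc_mut) / (plck_wt / myc_wt)

def geometric_mean_ci(ratios, z=1.96):
    """Geometric mean and approximate 95% CI of a list of positive ratios.

    Assumes sample SD of the logs and a standard error of SD / sqrt(N).
    """
    logs = [math.log(r) for r in ratios]
    mu_log = statistics.mean(logs)
    sd_log = statistics.stdev(logs)
    half_width = z * sd_log / math.sqrt(len(logs))
    return (
        math.exp(mu_log),
        math.exp(mu_log - half_width),
        math.exp(mu_log + half_width),
    )

# The example from the text: the log-average of 2 and 1/2 is 1.0.
gm, ci_low, ci_high = geometric_mean_ci([2.0, 0.5])
print(round(gm, 6))  # prints: 1.0
```

With four independent transfections per protein, `ratios` would hold four values, one per transfection, each computed with `ratio_of_ratios` from the band intensities of that experiment.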

Sequence alignment

Homologous sites of the autophosphorylation sites observed in kinase crystals were identified by a PSI-BLAST (115) search of all human kinases downloaded from UniProt (39) with the relevant kinase domain as query. Kinases with Ser/Thr or Tyr aligned to the query (for Ser/Thr and Tyr kinases, respectively) were aligned with Clustal W (116). The structure alignment program FATCAT (Flexible structure AlignmenT by Chaining Aligned fragment pairs allowing Twists) (117) was used to resolve some ambiguous identifications of the DFG and APE motifs in some kinases.


Fig. S1. Surfaces of autophosphorylation complexes and the individual enzyme and substrate kinases.

Fig. S2. Sequence alignment of the activation loops of tyrosine kinases.

Fig. S3. Sequence alignment of the activation loops of kinases of the IRAK family.

Fig. S4. Sequence alignment of the activation loops of all human serine/threonine kinases with Ser or Thr at the −12 position of the activation loop.

Fig. S5. Sequence alignment of the kinase insert loops of FGFR kinases.

Fig. S6. Sequence alignment of C-terminal extensions of FGFR2 and related tyrosine receptor kinases.

Fig. S7. Sequence alignment of the N-terminal regions of CLK family members.

Fig. S8. Sequence alignment of the C-terminal tails of CaMKII family members.

Fig. S9. Sequence alignment of the C-terminal tails of EGFR, ErbB2, ErbB3, and ErbB4.

Table S1. LCK autophosphorylation experimental data.

Table S2. Fifty-eight domain-swapping PDB dimers in 17 distinct crystal forms from ProtCID.

Data file S1. Coordinates of kinase autophosphorylation complexes.

Data file S2. Models of asymmetric autophosphorylation complexes of IGF1R and LCK.


Acknowledgments: We thank J. Wu for useful discussions and V. Modi for preparing the models of the asymmetric LCK and IGF1R dimers. Funding: This work was funded by NIH grants R01 GM084453 (R.L.D.) and R01 GM083025 (J.R.P.). Author contributions: R.L.D. conceived the project. R.L.D. and J.R.P. supervised the project. Q.X. and R.L.D. wrote the paper. Q.X., R.L.D., E.D., E.J.J., and S.K. performed calculations and analysis. K.L.M. and L.F. performed experiments on LCK and analyzed the experimental data. Competing interests: R.L.D. is a member of the Scientific Advisory Board of Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). Data and materials availability: The coordinates of the structures described in this paper are available in the Supplementary Materials.