Research Article: Cancer

Truncation- and motif-based pan-cancer analysis reveals tumor-suppressing kinases


Science Signaling, 17 Apr 2018, Vol. 11, Issue 526, eaan6776
DOI: 10.1126/scisignal.aan6776

Sorting through the noise

Genomic sequencing has been a boon to understanding, diagnosing, and treating cancer and other diseases. However, it can be difficult to distinguish “driver” mutations from natural variants and silent mutations, particularly in samples as heterogeneous as tumors. Hudson et al. used a combination of bioinformatics, structural modeling, and biochemistry to identify loss-of-function driver mutations in kinases that had so far been lost in the noise of sequencing data in the TCGA and CCLE databases. The authors focused their analysis on sequences in and around the generally conserved catalytic domain. In this way, they defined a broadly tumor-suppressive kinome and revealed critical loss-of-function mutations in the kinase MAP2K7 in stomach cancers. Restoring a functional kinase in mutant gastric cancer cells reduced their growth in culture models, suggesting an avenue to explore further for clinical benefit.


A major challenge in cancer genomics is identifying “driver” mutations from the many neutral “passenger” mutations within a given tumor. To identify driver mutations that would otherwise be lost within mutational noise, we filtered genomic data by motifs that are critical for kinase activity. In the first step of our screen, we used data from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas to identify kinases with truncation mutations occurring within or before the kinase domain. The top 30 tumor-suppressing kinases were aligned, and hotspots for loss-of-function (LOF) mutations were identified on the basis of amino acid conservation and mutational frequency. The functional consequences of new LOF mutations were biochemically validated, and the top 15 hotspot LOF residues were used in a pan-cancer analysis to define the tumor-suppressing kinome. A ranked list revealed MAP2K7, an essential mediator of the c-Jun N-terminal kinase (JNK) pathway, as a candidate tumor suppressor in gastric cancer, despite its mutational frequency falling within the mutational noise for this cancer type. The majority of mutations in MAP2K7 abolished its catalytic activity, and reactivation of the JNK pathway in gastric cancer cells harboring LOF mutations in MAP2K7 or the downstream kinase JNK suppressed clonogenicity and growth in soft agar, demonstrating the functional relevance of inactivating the JNK pathway in gastric cancer. Together, our data highlight a broadly applicable strategy to identify functional cancer driver mutations and define the JNK pathway as tumor-suppressive in gastric cancer.


It was estimated that by the end of 2017, more than 1.6 million cancer samples would have been sequenced by next-generation sequencing (NGS) (1). The greatest challenge now lies in interpreting these data to dissect tumorigenic mechanisms and identify therapeutic targets. A major problem is that the data are often noisy, with many inconsequential “passenger” mutations obscuring the detection of driver mutations (2, 3). Now that most cancer subtypes have been characterized by large-scale sequencing studies, the common drivers have been identified (4). However, many samples in these studies have no identifiable common driver. This suggests that there is a multitude of lower-frequency drivers that we struggle to detect above the noise (5, 6). The best way to discover more cancer drivers is under debate (7, 8): should we continue sequencing more and more samples, or should we focus on functional studies? In silico methods are already widely used to attempt functional analysis of large genomic data sets (2, 9); however, these tools are limited and may miss functional driver mutations (10–13). Genomic analysis therefore needs to improve if we are to unlock the potential of these huge public data sets. By better linking existing knowledge of a protein’s function to its structural features, we can begin to functionally screen genomic data. Protein kinases are a well-characterized class of proteins with documented mechanisms linking structural motifs to protein function (14–16). This makes them ideal candidates for developing motif-driven bioinformatics screens.

We initially produced a list of candidate tumor-suppressing kinases using the frequency of truncating mutations that would abolish catalytic activity, drawn from The Cancer Genome Atlas (TCGA) and the Cancer Cell Line Encyclopedia (CCLE). Sequence alignment of this tumor-suppressing kinome enabled the identification of mutational hotspots in conserved regions. The top 12 mutational hotspots were all within motifs already known to be critical for kinase function, validating this approach to identifying hotspot loss-of-function (LOF) mutations. We also analyzed two novel hotspot residues biochemically and found that mutations at these residues likewise inactivate the kinases harboring them. We then developed a bioinformatics screen to identify mutations at the top 15 hotspot residues in 411 canonical kinases in the TCGA and CCLE data sets, and we ranked kinases by the frequency of these mutations. Alongside known tumor suppressors, such as serine/threonine kinase 11 (STK11; also known as liver kinase B1), we identified and validated a high incidence of mitogen-activated protein kinase kinase 7 (MAP2K7) LOF mutations in gastric cancer. We also highlight a tumor-suppressive role for MAP2K7 and the c-Jun N-terminal kinase (JNK) pathway in this cancer subtype. The role of the JNK pathway in cancer has been much debated, as the numerous contradictory publications in the literature indicate (17). The genetic makeup of the tumor and the tumor microenvironment dictate whether this pathway plays an oncogenic or tumor-suppressive role in a given cancer (18). Our study provides a framework to assess the role of kinases in various cancers, and our results highlight a tumor-suppressive role for the JNK pathway in gastric cancer.


Alignment of a predicted tumor-suppressing kinome reveals mutational hotspots in conserved regions

To define hotspot inactivating missense mutations, we first needed a list of potential tumor-suppressing kinases, identified by a frequent incidence of truncation mutations. We then used the top kinases to identify highly conserved residues that are frequently mutated and likely to abolish catalytic activity when mutated, consistent with the kinase acting as a tumor suppressor. This screen was performed by locating the highly conserved Ala-Pro-Glu (APE) motif, or its equivalent sequence, in 411 “catalytically active” human kinases that have the classical kinase domain motifs described by Manning et al. (15) (table S1). The APE motif is a critical component of the kinase domain that stabilizes the C-lobe and mediates substrate interactions. We used it as a conservative cutoff point for identifying truncating mutations that would abolish kinase activity (Fig. 1). There are additional critical kinase regions C-terminal to this cutoff, including the αF helix, which anchors both the catalytic and regulatory spines (C-spine and R-spine) of the kinase domain. We nevertheless chose the APE motif because it is highly conserved and can be easily aligned across the kinase family. When the cutoff was extended to include the entire protein, only 11 kinases in the resulting top 30 were unique to that list; therefore, the majority of the top kinases would still be identified as tumor-suppressing kinases (Fig. 2 and table S2). For each kinase, the frequency of truncating mutations in the TCGA and CCLE data sets occurring N-terminal to this cutoff was length-corrected (Fig. 1). This produced a list of the top 30 kinases by truncating mutation density (Fig. 2 and table S3). The list comprises known tumor suppressors, such as STK11 and MAP2K4 (19, 20), along with kinases that lack a previously published role in tumor suppression.
As a proof of concept, we examined one such truncating mutation, MAP2K4 E221*, found in the pancreatic cancer cell line CAPAN1. Restoring wild-type signaling in these cells significantly decreased anchorage-dependent and anchorage-independent colony-forming potential (fig. S1, A to C). We then aligned the kinase domain sequences of these 30 tumor-suppressing kinases to identify areas of high conservation. The TCGA and CCLE data sets were queried to capture all missense mutations occurring at each position of the aligned sequences, and a combined score based on conservation and mutational frequency was produced for each position (Fig. 2 and table S4). The top 12 residues identified by this combined score (with a conservation score above 20) were all located within motifs known to be critical for kinase activity (Table 1) (21, 22). Five uncharacterized residues were identified within the top 20 hotspot residues (Table 1, highlighted blue or gray). Two of these five residues, APE − 6 and His-Arg-Asp (HRD) − 6 (six residues N-terminal to the APE and HRD motifs, respectively), were tested further to determine their effect on kinase catalytic activity, as described below.

Fig. 1 Schematic illustrating the screen of 411 kinases to produce the list of hotspot inactivating residues.

A pan-cancer analysis across the TCGA and CCLE data sets was performed to identify truncating mutations occurring N-terminal to the APE motif. The truncation mutation frequency was length-corrected to account for interkinase differences in the position of the kinase domain within the overall protein. The kinase domains (GxGxxG to APE motif) of the top 30 kinases, ranked by length-corrected truncation mutation frequency, were sequence-aligned to identify residues conserved among these 30 kinases. This enabled the TCGA and CCLE data sets to be requeried for the mutational frequency at each residue. The conservation and mutational scores were combined to rank each residue of the kinase domain, generating a list of hotspot residues for kinase-inactivating mutations.

Fig. 2 Output of the truncation mutation screen with the top 30 kinases illustrating a region of kinase sequence analyzed for conservation and number of kinases harboring missense mutations at each position.

The top 30 kinases found from the truncation mutation screen, ranked by descending length-corrected truncation mutation score (Tr. score). A portion of the kinase sequence alignment (DFG to APE motif) is shown as an example to illustrate the tumor-suppressing hotspot mutation screen. The vertical gray bar highlights a break in the alignment shown, because this region had very poor sequence alignment. The number of kinases mutated at each position is graphed along the top of this alignment, whereas the conservation score for each residue is graphed along the bottom. The conservation score corresponds to the number of kinases with the most common amino acid at that position. Residues with substantial conservation (>20) are shaded blue. The full alignment, performed from GxGxxG to APE, is shown in table S4.

Table 1 Tumor-suppressing hotspot residues.

Tumor-suppressing hotspot residues, identified through our motif screen, ranked by their total score (the product of the conservation and mutational scores). Residues with conservation scores below 20 were excluded. Blue shading indicates validated new hotspot LOF residues, and gray shading indicates new residues that are predicted to be LOF.


Mutations of a hinge residue between the activation and P + 1 loops abolish catalytic function

The APE − 6 residue is a glycine that is highly conserved among the 411 kinases used in this study (81% conservation). Structural modeling shows that this residue lies at a hinge point between the activation loop and the P + 1 loop (Fig. 3A). There it could permit the typical fluctuations of the activation loop between its active and inactive conformations. Glycine, which occupies this position in most kinases, is small enough to allow this hinge region to flex, and mutations that limit this flexibility may impair catalytic activity. Alternatively, this glycine helps form the P + 1 recognition pocket. Mutations at this residue could therefore reshape the pocket and alter substrate recognition and binding, which could also lead to LOF. Molecular dynamics (MD) simulations of one such mutation at this conserved glycine in MAP2K4 (G265D) showed decreased movement of the activation loop compared with that of the wild-type kinase (Fig. 3B). Transient overexpression of the MAP2K4 mutants G265D and G265C, both seen in cancer samples, showed reduced phosphorylation of downstream JNK pathway components, equivalent to that of a kinase-dead construct (Fig. 3C). We stably reexpressed wild-type MAP2K4 in CAL51 cells harboring the G265D mutation (fig. S2D). This reduced colony-forming potential in both two- and three-dimensional assays (Fig. 3, D and E), whereas no significant change was observed in the parental cell line. These data indicate that G265D in MAP2K4 is a substantial LOF driver mutation in the CAL51 cell line. Corresponding glycine mutations found in other cancer samples, in mitogen-activated protein kinase kinase kinase 13 (MAP3K13; G315D) and protein kinase Cθ (PRKCQ; G541V), also reduced kinase activity (Fig. 3, F and G).

Fig. 3 Mutations identified in a hinge glycine at position APE − 6 are inactivating.

(A) Position of the conserved APE − 6 glycine at a hinge point of the activation loop (shown in the INSR kinase domain). Active kinase conformation is shown in light blue [Protein Data Bank (PDB) ID: 1IR3], and inactive kinase conformation is shown in dark blue (PDB ID: 1IRK). DFG and APE motifs are shown as sticks, and glycine (G) is shown as sticks and spheres. (B) MD simulations predict the amount of movement of the activation loop in MAP2K4 (PDB ID: 3ALN). Root mean square fluctuations (RMSFs) of each residue are shown graphically (top) and structurally (bottom), with the width and color of the ribbon indicating the level of movement. WT, wild type; EV, empty vector; KD, kinase deficient. (C) Representative Western blot assessing the functional effect of mutations in the conserved glycine of MAP2K4 overexpressed in human embryonic kidney (HEK) 293T cells for 48 hours. (D and E) Two-dimensional (2D) (D) and 3D Matrigel–embedded (E) colony formation assays in the CAL51 cell line harboring MAP2K4 G265D, with tetracycline-inducible expression of WT MAP2K4 (MKK4 inducible) or not (parental), with (+) or without (−) tetracycline (tet). Quantification is shown as bar charts; mean ± SEM from three independent experiments; ***P < 0.001 by two-tailed Student’s t test. (F and G) Western blot analysis to assess the functional effect of mutations in the conserved APE − 6 glycine within MAP3K13 (F) and PKCθ (G) by overexpression of Flag-tagged constructs in HEK293T cells. Phosphorylation of downstream targets was measured: JNK (F), and MARCKS together with PKC autophosphorylation (G).

Mutations at HRD − 6 abrogate kinase activity

Structural modeling of the HRD − 6 residue highlights its close proximity to the R-spine–anchoring residue within the αF helix (Fig. 4A). The R-spine forms as the kinase becomes active and is critical for catalytic activity. We ran an MD simulation of a mutation at the HRD − 6 position of death-associated protein kinase 3 (DAPK3; H131R). It showed increased movement around three of the four R-spine residues, with RS1 (residue 79) showing a large increase (Fig. 4A). This increased movement suggests destabilization of the R-spine, which would likely reduce catalytic activity. Cancer-associated mutations at this position in DAPK3 (H131R) and BRAF (H568D) caused a loss of kinase catalytic activity similar to that of kinase-dead mutants (Fig. 4, B and C). These results suggest that combining conservation at a given residue with mutational frequency is an effective way to identify low-frequency functional mutations across cancer genomic databases.

Fig. 4 Mutations identified in HRD − 6 are inactivating.

(A) The HRD − 6 residue (red sticks) lies in close proximity to the R-spine–anchoring residue within the αF helix (orange sticks; marked by “D” in the graph). R-spine shown as yellow spheres (structure used: PKA, PDB ID: 1ATP). RMSFs from MD simulations (bottom) highlight increased movement around R-spine residues RS1, RS2, and RS4 in DAPK3 H131R (PDB ID: 3BHY), which results in altered movement of the activation and P + 1 loops. (B) In vitro kinase assay assessing autophosphorylation of Flag-tagged DAPK3 H131R. WCL, whole cell lysate. (C) Overexpression of hemagglutinin (HA)–tagged BRAF constructs within HEK293T cells shows decreased kinase activity for the BRAF H568D mutation, as inferred from downstream phosphorylation of MEK (pMEK). IP, immunoprecipitation.

Pan-cancer analysis of mutations found in critical motifs highlights a prevalence of LOF mutations in MAP2K7 in gastric cancer

Having identified and validated novel hotspot residues for kinase inactivation, we used the top 15 mutational hotspots in a functional screen to identify novel tumor-suppressing kinases. These were 13 known critical residues and the 2 novel residues validated above. A pan-cancer analysis was performed by querying the TCGA and CCLE data sets for mutations at these 15 residues in 411 kinases. The kinases were then ranked by the number of mutations at these residues (Fig. 5A). BRAF was the top hit, with 39 mutations across both data sets, followed by STK11 (14 mutations), myosin IIIA (MYO3A; 13 mutations), ephrin type-B receptor 1 (EPHB1; 12 mutations), and MAP2K7 (11 mutations). EPHB1 and STK11, as well as other top hits such as checkpoint kinase 2 (CHEK2), have previously been shown to play tumor-suppressive roles in different cancer subtypes (20, 23–25). MAP2K7 was selected for further investigation because 6 of its 11 detected mutations occurred in a single cancer subtype, gastric adenocarcinoma (Fig. 5B). In addition, MAP2K7 mutations and deletions occur in 7% of cases in the TCGA gastric adenocarcinoma series, and 40% of the mutated cases have more than one mutation, suggesting that both alleles are affected. Four gastric cancer MAP2K7 mutations occurred in conserved motifs. Transient overexpression of these mutants showed that three of the four are LOF with regard to JNK pathway activation (Fig. 5C). Furthermore, we tested a gastric cancer cell line harboring an inactivating MAP2K7 mutation (IM95, D290fs; fig. S2A). Reconstitution of wild-type signaling in these cells (fig. S2C) significantly decreased anchorage-dependent and anchorage-independent colony-forming potential (Fig. 5, D and E).

Fig. 5 Critical codon screen identifies MAP2K7 as a tumor suppressor in gastric cancer.

(A) Table ranking kinases by the number of mutations observed within the top 15 critical codons (Table 1). (B) Schematic highlighting mutations identified within key motifs of MAP2K7. Mutations are indicated by yellow spheres; blue spheres indicate mutations found in gastric cancer, and half-blue spheres indicate positions where half of the observed mutations occurred in gastric cancer. (C) Western blot analysis of overexpressed Flag-tagged MAP2K7 mutant constructs in HEK293T cells to assess the functional effects of mutations found in gastric cancer. Downstream activation of phosphorylated JNK (pJNK) is assessed. (D and E) 2D (D) and 3D (E) colony formation assays using the IM95 gastric cancer cell line, which endogenously harbors LOF mutant MAP2K7, with tetracycline-inducible expression of WT MAP2K7 (MKK7 inducible) or not (parental), with (+) or without (−) tetracycline. Quantification is shown as bar charts; data are means ± SEM from three independent experiments; *P < 0.05, ***P < 0.001 by two-tailed Student’s t test.

When other genes in the JNK pathway are considered, about 22% of gastric cancers harbor alterations in MAP2K7, MAP2K4, mitogen-activated protein kinase 8 (JNK1), mitogen-activated protein kinase 9 (JNK2), mitogen-activated protein kinase 10 (JNK3), JUN, or activating transcription factor 2 (ATF2). These alterations show a high degree of mutual exclusivity, suggesting a significant role for JNK pathway LOF in gastric carcinogenesis (fig. S2D). Reexpression of wild-type JNK1 in a cell line harboring a JNK1 LOF mutation (G177*; fig. S2, B and C) also significantly decreased anchorage-dependent growth (fig. S1E). Together, these data indicate that loss of signaling through the JNK pathway is important for gastric cancer tumorigenesis. Our identification of MAP2K7 as an important tumor suppressor in gastric cancer represents a substantial advance over other mutational bioinformatics pipelines. For example, ranking genes by mutational density in stomach adenocarcinoma places MAP2K7 only 636th. More sophisticated bioinformatics methods, such as MutSigCV filtering, rank MAP2K7 as the 38th driver gene in gastric carcinomas (table S6). This moves MAP2K7 further up the candidate list and supports our proposal that it is a frequent driver gene. However, a gene at this position would be unlikely to be prioritized for investigation as a driver.


We present an approach to screen genomic data using functional knowledge of kinase domain structure and sequence. Our first step was to identify somatic point mutations that would abolish the catalytic activity of a kinase and, through a pan-cancer screen, to identify potential tumor-suppressing kinases enriched in LOF mutations. After this filtering step, we identified a number of kinases, such as MAP2K7, that we predict to harbor frequent LOF mutations. Our final step was to validate our approach by showing, biochemically and in functional assays, that MAP2K7 is a tumor-suppressing kinase in gastric cancer. To put the scale of this filtering in perspective, at the time of our screen there were 43,212 missense mutations in the 411 canonical kinases included. Our conserved motif screen covered 15 highly conserved and frequently mutated codons and identified 921 mutations in these 411 kinases, allowing us to pinpoint novel kinases enriched in LOF mutations. Our strategy has highlighted many additional kinases to be explored as tumor suppressors. It has also laid the groundwork for a broadly applicable approach that could be used to identify novel tumor suppressors in other enzyme families, such as ubiquitin ligases.

The initial truncation screen identified a list of candidate tumor-suppressing kinases. Two well-known tumor-suppressing kinases, STK11 and MAP2K4, appeared as the top two hits, which validated this approach (19, 20). To identify residues that are frequently mutated and whose mutation results in LOF, we aligned the sequences of the top 30 tumor-suppressing kinases and assigned each residue a mutational score and a conservation score. Residues were then ranked by their combined score to give a list of LOF hotspot residues. Thirteen of the top 15 residues identified in this way occurred in well-known motifs critical for kinase function, validating this approach as a method for identifying functionally important mutations. The two new residues (HRD − 6 and APE − 6) were biochemically validated as critical for kinase function, because mutations at these residues abolished kinase activity. This gave us 15 residues within the kinase domain that were either known to be critical to kinase function or shown to cause an LOF phenotype when mutated. We then used them in a screen to identify kinases harboring lower-frequency genetic drivers. Because the target area screened in each sample is small, a large data set was required; we therefore performed a pan-cancer analysis using the TCGA and CCLE data sets. This analysis identified all mutations occurring at these 15 residues in 411 kinases across all cancer subtypes. Because the critical codon screen was derived from the truncation screen data, some overlap was expected. However, the results of the two screens also differed markedly: the oncogene BRAF was the top kinase in the critical motif screen but ranked only 88th of the 411 kinases in the truncation screen.
Kinase-dead BRAF can paradoxically act as a scaffold to activate CRAF, resulting in activation of the mitogen-activated protein kinase kinase (MEK)/extracellular signal–regulated kinase pathway (23). Truncation mutations likely disrupt this oncogenic mechanism, which would explain why fewer truncation mutations are observed in the BRAF oncogene. With this in mind, comparing the results of our two screens may help identify other kinases whose kinase-dead mutations paradoxically activate signaling pathways by a similar mechanism, rather than causing LOF of the canonical signaling pathway. The discrepancies between the two tumor-suppressing lists may therefore shed light on new biology of inactivating mutations that activate a pathway.

Finally, the screens we developed here identified MAP2K7 as one of the top hits for mutations in codons critical for kinase function. Most of these mutations (6 of 11) were observed in gastric cancer samples, highlighting inactivating MAP2K7 mutations as gastric cancer drivers (26). Furthermore, when we interrogated genomic data from cBioPortal, other JNK pathway components, including JNK, JUN, and ATF2, were mutated or deleted in gastric cancer. The JNK pathway regulates key processes in cancer development, but its role in tumorigenesis remains controversial. Studies have shown that it can act as either an oncogenic or a tumor-suppressive pathway [reviewed in (27)]. This dual nature of JNK signaling has been shown to depend heavily on the cellular context and extracellular environment (18). Notably, our tumor-suppressive screens also identified LOF mutations in MAP2K4 and MAP3K13, which are upstream activators of the JNK pathway. Our experimental data highlight that inactivation of the JNK pathway is important for promoting tumorigenic phenotypes in gastric cancer. This offers insight into a molecular mechanism that could affect up to 22% of gastric cancer patients. Furthermore, our genetic data showing MAP2K7, MAP2K4, and MAP3K13 mutations in different cancer subtypes suggest that loss of JNK pathway signaling may promote tumorigenesis in a number of human tumor types.

Gastric cancers carry a high number of mutations per sample, although few specific driver mutations are known (26, 28). By focusing on mutations with a high probability of disrupting kinase function, our screen helps remove mutational noise caused by passenger mutations. Using the MAP2K7 finding as a prompt to query other known pathway members revealed a prominent role for JNK pathway inactivation in almost a quarter of all gastric cancers. It has been proposed that sequencing more cancer samples will eventually make lower-frequency drivers more apparent (4). This may be true. However, our analysis of the TCGA gastric cancer data set shows that the process is greatly facilitated when precise, functionally derived algorithms are integrated into the pipeline (26). In that data set, more than 400 cancers have been fully sequenced and over 500 genes have a mutational frequency above 5%.

In conclusion, we have developed and validated two approaches that filter freely available NGS data using functional knowledge to identify novel genetic drivers that would otherwise be lost within the mutational noise. Our first approach used truncating mutation frequency to identify tumor-suppressing kinases and, from these, identified novel residues critical for kinase function. Our second approach used these “critical codons” to identify kinases harboring high numbers of LOF mutations. This revealed more tumor-suppressing kinases, along with information on tumor type enrichment. Together, these screens provide novel tumor suppressors to investigate and highlight novel residues of critical importance for kinase activity. Applied here to the human kinome, this approach provides a wealth of information on inactivated kinases in cancer. It could be expanded to identify both gain-of-function and LOF mutations within other proteins containing conserved domains.


Truncation mutation screen

R scripts (in data file S1) were prepared to identify all kinase truncating mutations in the TCGA (data file S1, Script1) and CCLE (data file S1, Script2 and Script3) databases occurring before the end of the kinase domain. In brief, the APE motif was identified in the GenBank sequences of 411 catalytically active kinases with conventional VAIK, HRD, and DFG motifs, as identified by Manning et al. (15) (table S1). The position of the E of the APE motif was defined as the C-terminal limit of a functional kinase domain. Mutational data from TCGA and CCLE were cross-referenced with each kinase’s APE position to record all truncating mutations occurring N-terminal to this position. To account for interkinase differences, mutational frequencies were length-corrected using the length of each kinase up to the E of the APE motif, averaged between its shortest and longest transcripts. An alternative top 30 list was also constructed by relaxing the APE cutoff to less conservative limits, including the whole protein (table S2). The top 30 tumor-suppressing kinase list was constructed by ranking kinases by descending length-corrected score (Fig. 2, fig. S3, and table S3), and the kinase domain sequences of these 30 kinases were identified (data file S1). Where a kinase was annotated as having more than one kinase domain, we selected for analysis the domain sequence whose motif configuration most closely resembled a classical kinase domain.
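The length-corrected truncation score described above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the authors' R scripts in data file S1. The `Mutation` record, the set of variant classes treated as truncating, and the choice to report scores per 1,000 residues are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: which variant classes count as "truncating" (not specified in the text).
TRUNCATING = {"nonsense", "frameshift"}


@dataclass(frozen=True)
class Mutation:
    gene: str
    protein_pos: int     # residue number in the kinase protein
    variant_class: str   # e.g. "nonsense", "frameshift", "missense"


def truncation_scores(mutations, ape_e_pos, lengths_to_ape, per=1000):
    """Length-corrected truncation score for each kinase (illustrative sketch).

    ape_e_pos      -- {gene: residue number of the E of the APE motif (the cutoff)}
    lengths_to_ape -- {gene: (shortest, longest) transcript length to the APE E, in residues}
    per            -- scale factor; scores are truncations per `per` residues (assumed)
    """
    # Count truncating mutations located N-terminal to each kinase's APE cutoff.
    counts = Counter(
        m.gene
        for m in mutations
        if m.gene in ape_e_pos
        and m.variant_class in TRUNCATING
        and m.protein_pos < ape_e_pos[m.gene]
    )
    scores = {}
    for gene in ape_e_pos:
        shortest, longest = lengths_to_ape[gene]
        mean_len = (shortest + longest) / 2  # mean of shortest and longest transcripts
        scores[gene] = per * counts[gene] / mean_len
    return scores


def top_kinases(scores, n=30):
    """Rank kinases by descending length-corrected score and keep the top n."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy example with hypothetical kinases KIN_A and KIN_B.
muts = [
    Mutation("KIN_A", 120, "nonsense"),
    Mutation("KIN_A", 410, "frameshift"),  # C-terminal to the cutoff: ignored
    Mutation("KIN_B", 50, "frameshift"),
    Mutation("KIN_B", 60, "missense"),     # not truncating: ignored
]
scores = truncation_scores(
    muts,
    ape_e_pos={"KIN_A": 300, "KIN_B": 250},
    lengths_to_ape={"KIN_A": (280, 320), "KIN_B": (200, 300)},
)
ranking = top_kinases(scores)
```

Each kinase's count is divided by its own length up to the cutoff. As a result, a kinase is not ranked higher simply because a long N-terminal region gives more positions where a truncation can occur.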

Sequence alignment and residue conservation and mutation scoring

The kinase domain sequence from the first glycine of the GxGxxG loop to the APE motif was aligned across the top 30 kinases using the Strap Alignment Tool. The conservation score was defined as the number of kinases, out of the top 30, that harbored the most common amino acid at that position. If more than five kinases lacked a corresponding amino acid at a position (that is, this region was absent from those kinases), the conservation score was set to zero. All positions with a conservation score below 20 were removed from the final analysis, leaving only highly conserved positions. The aligned positions were cross-referenced with the mutational data from TCGA and CCLE to identify the mutations at each position. The mutational score was calculated as the number of kinases with a mutation at that position. Multiplying the mutational score by the conservation score produced a combined score used to rank each residue in the kinase domain sequence. The scores and alignment for all residues with at least one mutation are shown in table S4, and the frequency distribution of combined scores for residues with at least one mutation is shown in fig. S4.
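The conservation, mutational, and combined scores can likewise be sketched in Python. The sketch assumes the alignment, which the study produced with the Strap Alignment Tool, is already available as equal-length gapped strings with "-" marking gaps. It also assumes each kinase's missense mutations have been mapped to alignment columns. The helper `residue_to_column` and all parameter names are hypothetical.

```python
from collections import Counter


def residue_to_column(aligned_seq, domain_start):
    """Map protein residue numbers to alignment columns for one kinase.

    aligned_seq  -- the kinase's aligned domain sequence, '-' for gaps
    domain_start -- residue number of the first aligned residue (first G of GxGxxG)
    """
    mapping = {}
    residue = domain_start
    for col, aa in enumerate(aligned_seq):
        if aa != "-":
            mapping[residue] = col
            residue += 1
    return mapping


def score_columns(alignment, mutated_columns, min_conservation=20, max_gaps=5):
    """Rank alignment columns by conservation x mutation score (illustrative sketch).

    alignment       -- {kinase: aligned sequence}; all sequences have equal length
    mutated_columns -- {kinase: set of columns carrying >= 1 missense mutation}
    Returns (column, conservation, mutation, combined) tuples, best first.
    """
    seqs = list(alignment.values())
    n_cols = len(seqs[0])
    if any(len(s) != n_cols for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    rows = []
    for col in range(n_cols):
        residues = [s[col] for s in seqs]
        present = [aa for aa in residues if aa != "-"]
        # Positions missing from more than `max_gaps` kinases score 0 for conservation.
        if residues.count("-") > max_gaps or not present:
            conservation = 0
        else:
            # Number of kinases carrying the most common amino acid at this column.
            conservation = Counter(present).most_common(1)[0][1]
        # Number of kinases with at least one mutation at this column.
        mutation = sum(1 for k in alignment if col in mutated_columns.get(k, set()))
        if conservation >= min_conservation and mutation > 0:
            rows.append((col, conservation, mutation, conservation * mutation))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows


# Toy example: four hypothetical kinases, with thresholds scaled down to match.
alignment = {"K1": "GAPE", "K2": "GAPE", "K3": "GSPE", "K4": "G-PD"}
mutated = {"K1": {0, 3}, "K2": {0}, "K3": {1}, "K4": {3}}
hotspots = score_columns(alignment, mutated, min_conservation=3, max_gaps=1)
```

With the study's settings (30 kinases, a conservation cutoff of 20, and a five-kinase gap limit), the defaults of `score_columns` would apply unchanged. The toy call only scales the thresholds down to fit four sequences.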

Kinase motif–based screen

R scripts were written to identify mutations from the TCGA (data file S1, Script5) and CCLE (data file S1, Script6) databases occurring within the hotspot residues identified above (Script4). In brief, the top 15 hotspot residues (ranked by the combined score described above) were located within the GenBank sequence of each kinase. Mutational data from the TCGA studies, obtained through the CGDS-R package (table S5), and from CCLE were cross-referenced with each kinase to identify point mutations occurring within critical kinase motifs. A ranked list was constructed from the number of mutations observed per kinase (data file S1, Script7).
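A sketch of the motif-based screen follows, again in Python rather than the R used in the study. It assumes the hotspot residues have already been located in each kinase's sequence and are supplied as residue numbers (`hotspot_positions`, hypothetical). It also assumes mutations arrive as simple (gene, position, variant class, cancer type) tuples. Counting only missense changes is an assumption as well. Tallying cancer types alongside the totals exposes the kind of subtype enrichment that singled out MAP2K7 in gastric cancer.

```python
from collections import Counter, defaultdict


def hotspot_screen(mutations, hotspot_positions, classes=("missense",)):
    """Count point mutations at hotspot residues and rank kinases (illustrative sketch).

    mutations         -- iterable of (gene, protein_pos, variant_class, cancer_type)
    hotspot_positions -- {gene: set of residue numbers matching the hotspot residues}
    classes           -- variant classes counted as point mutations (assumed)
    Returns (ranking, by_cancer): ranking is [(gene, count)], most mutated first;
    by_cancer maps each gene to a Counter of cancer types.
    """
    per_gene = Counter()
    by_cancer = defaultdict(Counter)
    for gene, pos, variant_class, cancer in mutations:
        if variant_class in classes and pos in hotspot_positions.get(gene, set()):
            per_gene[gene] += 1
            by_cancer[gene][cancer] += 1
    # Sort by descending count; ties are broken alphabetically for stable output.
    ranking = sorted(per_gene.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranking, dict(by_cancer)


# Toy example with hypothetical kinases and residue numbers.
mutations = [
    ("KIN_X", 200, "missense", "stomach"),
    ("KIN_X", 200, "missense", "stomach"),
    ("KIN_X", 250, "missense", "lung"),
    ("KIN_X", 300, "nonsense", "stomach"),  # not a counted class: ignored
    ("KIN_Y", 600, "missense", "skin"),
    ("KIN_Y", 10, "missense", "skin"),      # not a hotspot residue: ignored
]
hotspot_pos = {"KIN_X": {200, 250}, "KIN_Y": {600}}
ranking, by_cancer = hotspot_screen(mutations, hotspot_pos)
```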

Structural modeling and MD simulations

Homology models of wild-type and mutant MAP2K4 were created with Modeller 9.16 using Protein Data Bank (PDB) ID 3ALN as the template. MD simulations were performed using GROMACS version 5.0 with the GROMOS96 53a6 force field parameter set. All titratable amino acids were assigned their canonical protonation states at physiological pH, short-range interactions were cut off at 1.4 nm, and long-range electrostatics were calculated using particle mesh Ewald summation (29). A dispersion correction was applied to the energy and pressure terms to account for truncation of van der Waals interactions, and periodic boundary conditions were applied in all directions. Protein constructs were placed in a cubic box of 100 mM NaCl in simple point charge water, with at least 1 nm between the protein construct and the box edge in all directions. Neutralizing counterions were added, and steepest descent energy minimization was performed, followed by a two-step NVT/NPT equilibration [NVT: constant number of particles (N), volume (V), and temperature (T); NPT: constant number of particles (N), pressure (P), and temperature (T)]. NVT equilibration was performed for 100 ps at constant volume, followed by 10 ns of NPT equilibration at constant pressure. Temperature was maintained at 37°C by coupling protein and nonprotein atoms to separate temperature coupling baths (30), and pressure was maintained at 1.0 bar (weak coupling). All position restraints were then removed, and simulations were performed for 400 ns using the Nose-Hoover thermostat (31) and the Parrinello-Rahman barostat (32). Root mean square fluctuation (RMSF) analysis calculated the standard deviation of the position of each α-carbon over the trajectory, after fitting to the starting structure as a reference frame. Root mean square deviation (RMSD) analysis compared the structure of specified groups of residues at each time point of the trajectory with the reference starting structure.
Images were created using PyMOL version
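In practice these quantities are computed with the GROMACS `gmx rms` and `gmx rmsf` tools. The Python sketch below only illustrates the two definitions on plain coordinate lists; it assumes every frame has already been least-squares fitted onto the reference structure (the superposition step is omitted) and that atoms appear in the same order in every frame.

```python
import math


def rmsd(frame, reference):
    """RMSD between one (pre-fitted) frame and the reference structure.

    frame, reference: lists of (x, y, z) coordinates for the same atoms.
    """
    n = len(reference)
    sq = sum((a - b) ** 2 for p, q in zip(frame, reference) for a, b in zip(p, q))
    return math.sqrt(sq / n)


def rmsf(trajectory):
    """Per-atom RMSF over a trajectory of pre-fitted frames.

    For each atom, the root mean square displacement from its time-averaged
    position, i.e. the standard deviation of its position over the trajectory.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    out = []
    for i in range(n_atoms):
        mean_pos = [sum(f[i][d] for f in trajectory) / n_frames for d in range(3)]
        msd = sum(
            sum((f[i][d] - mean_pos[d]) ** 2 for d in range(3)) for f in trajectory
        ) / n_frames
        out.append(math.sqrt(msd))
    return out
```

RMSD yields one value per time point for a chosen group of residues, whereas RMSF yields one value per α-carbon averaged over the whole trajectory.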

Plasmids and transfections

BRAF and MAP3K13 complementary DNAs (cDNAs) were prepared from RNA extracted from human embryonic kidney (HEK) 293T cells. PRKCQ was purchased in the pENTR vector (Ultimate Human ORF Library, Life Technologies), MAP2K4 and MAP2K7 in pCMV6-Entry vectors (OriGene), and MLK4 in the pReceiver-M12 FLAG vector (GeneCopoeia). Primers containing attB flanking sites were used to amplify the constructs, which were then inserted into the pDONR-221 vector using the BP clonase reaction. The ABL1 constructs were purchased in the pDONR-223 vector. The Gateway system was then used to clone the constructs into a pDEST-FLAG vector created by E. Trotter from the pReceiver-M12 plasmid (GeneCopoeia). The 3X-FLAG DAPK3 vector was provided by T. Haystead (Department of Pharmacology and Cancer Biology, Duke University Medical Center).

Mutants were created using the QuikChange II Site-Directed Mutagenesis Kit (Agilent Technologies) according to the manufacturer's protocol. Kinase-dead mutants were BRAF (K483M), DAPK3 (K42M), MAP2K4 (K131M), MAP2K7 (K148M), MAP3K13 (K195M), MLK4 (K151M), and PRKCQ (K409M). All sequences were confirmed by Sanger sequencing. HEK293T cells, or CAL51 cells for MAP2K4 transfections, were seeded into 12-well plates (standard transfections) or 6-well plates (immunoprecipitations) and transiently transfected the following day according to the manufacturer's protocol, using Attractene (QIAGEN) for HEK293T cells or Lipofectamine 2000 (Thermo Fisher Scientific) for CAL51 cells.

Protein lysate preparation and immunoblots

Cells were lysed on ice after 24 hours using Triton X-100 Cell Lysis Buffer (Cell Signaling Technology) supplemented with a protease inhibitor tablet (Roche). Lysates were either resolved by SDS–polyacrylamide gel electrophoresis (PAGE) followed by Western blotting or used in an in vitro kinase assay (described below). The following primary antibodies were used: FLAG M2 and α-tubulin (Sigma-Aldrich); MAP2K4, MAP2K7, pJNK (T183/Y185), pMARCKS (S152/S156), pPKC (S676), pThr, and p-cJun (S73) (Cell Signaling Technology). Horseradish peroxidase–conjugated anti-mouse or anti-rabbit secondary antibodies were used (Cell Signaling Technology). All Western blots are representative of at least three independent experiments.

In vitro kinase assays

Cell lysates from DAPK3 transfections were incubated with anti-FLAG M2 affinity gel (Sigma-Aldrich) for at least 2 hours. Beads were washed with lysis buffer and kinase buffer (Cell Signaling Technology), and a kinase assay was performed in the presence of 200 μM ATP (adenosine 5′-triphosphate) at 30°C for 30 min to assess autophosphorylation. After the addition of 4× reducing SDS sample buffer, proteins were resolved by SDS-PAGE and analyzed by Western blotting.

Generation of MAP2K4, MAP2K7, and JNK1 tetracycline-inducible cell lines

Parental CAL51 (MAP2K4), CAPAN1 (MAP2K4), IM95 (MAP2K7), and NUGC3 (JNK1) cells were used to generate lines with tetracycline-inducible expression of the corresponding wild-type kinase. Wild-type constructs cloned into the pLenti/TO/V5-DEST vector, together with pLenti3.3/TR (for tetracycline repressor expression), were transfected into 293FT cells using Lipofectamine 2000 to generate lentiviral stocks. Cells were transduced with the lentiviral stocks, and stable cell lines were generated by antibiotic selection [blasticidin (Invitrogen) and geneticin (Gibco)]. Tetracycline (Invitrogen) was used to induce expression of wild-type MAP2K4 (CAL51 and CAPAN1), wild-type MAP2K7 (IM95), or wild-type JNK1 (NUGC3).

Anchorage-dependent colony formation assay

Cells were seeded at about 100 cells per well in six-well plates. The following day, tetracycline was added, and cells were grown for 3 weeks with the medium changed every 2 to 3 days. The resulting colonies were fixed with ice-cold methanol and stained with 0.5% crystal violet (Sigma-Aldrich) solution made in 25% methanol. Wells were thoroughly washed and air-dried. For quantification, 2 ml of 10% acetic acid was added to each well and incubated for 20 min with shaking, and absorbance was read at 595 nm.
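The absorbance readings can be summarized as relative colony formation, that is, induced (+tetracycline) signal as a fraction of uninduced (−tetracycline) signal. The Python sketch below is an assumed normalization, not one stated in the methods: it subtracts an optional blank reading (for example, a well containing acetic acid only) and compares replicate means.

```python
from statistics import mean


def relative_colony_formation(induced, uninduced, blank=0.0):
    """Mean blank-corrected A595 of +tetracycline wells divided by the mean
    blank-corrected A595 of -tetracycline wells.

    A value below 1 indicates fewer or smaller colonies after induction.
    """
    plus = mean(a - blank for a in induced)
    minus = mean(a - blank for a in uninduced)
    return plus / minus
```

For instance, induced replicates of 0.25 and 0.75 against uninduced replicates of 1.0 and 1.0 (illustrative values) give a relative colony formation of 0.5.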

Anchorage-independent colony formation assay

Anchorage-independent colony formation assays were performed with the CAPAN1, NUGC3, and IM95 cell lines. Plates were first coated with a cell-free base layer of 0.6% soft agar with or without tetracycline. Cells were then seeded at about 10,000 cells per well in six-well plates in a top layer of 0.35% soft agar with or without tetracycline. Medium (with or without tetracycline) was added and changed every 2 to 3 days. After 3 weeks, colonies were stained with 0.05% crystal violet (Sigma-Aldrich) solution made in 25% methanol.

Three-dimensional Matrigel–embedded growth assay

Three-dimensional Matrigel–embedded growth assays were performed with the CAL51 cell line, because these cells did not grow successfully in the anchorage-independent growth assay. Six-well plates were precoated with a thin layer of Engelbreth-Holm-Swarm murine sarcoma (EHS) matrix (Corning) before cells were seeded at about 50,000 cells per well in EHS matrix. Medium (2 ml) with or without tetracycline was added. Medium was changed every 2 to 3 days, and after 10 days, cells were stained with 0.05% crystal violet (Sigma-Aldrich) solution made in 25% methanol.

Statistical analyses

All statistical significance values were calculated using a two-tailed Student’s t test.
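For reference, the two-tailed Student's t test (equal variances, pooled standard deviation) can be written out in pure Python. This is an illustrative sketch, not the authors' analysis code; in practice `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` is the Student form, returns the same statistic and two-sided P value. Here the P value is obtained by numerically integrating the t density with Simpson's rule.

```python
import math
from statistics import mean, variance


def students_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t|."""
    x = abs(t)
    if x == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return max(0.0, 1.0 - 2.0 * area)
```

For two groups of three replicates, `t, df = students_t(a, b)` followed by `two_tailed_p(t, df)` gives the P value on 4 degrees of freedom.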


Fig. S1. Truncation mutation screen identifies tumor-suppressing kinases.

Fig. S2. MAP2K7 pathway components are tumor-suppressing in gastric cancer.

Fig. S3. Frequency histogram of truncation mutation scores.

Fig. S4. Frequency histogram of combined scores for each codon in the kinase domain of the top 30 kinases.

Table S1. List of 411 “active” kinases screened in this study.

Table S2. Effect of moving truncation mutation cutoff on the top 30 kinases.

Table S3. Truncation mutation scores for all 411 kinases.

Table S4. Alignment of the top 30 tumor-suppressing kinases and mutational and conservation scores at each residue.

Table S5. TCGA studies from which mutational data were acquired.

Table S6. Mutational frequency of all genes in the TCGA gastric adenocarcinoma data set.

Data file S1. R Script files.


Acknowledgments: We thank A. Newton and T. Haystead for their donation of some of the DNA constructs used in this study. We thank A. Kane, Scientific Publications, Graphic and Media, Frederick National Laboratory for Cancer Research for assistance with figure preparation. Funding: This research was fully supported by Cancer Research UK and the National Cancer Institute. Author contributions: A.M.H., N.L.S., and J.B. designed the study; A.M.H., N.L.S., and A.J.F. wrote the computational script; N.L.S. performed the computational simulations; N.L.S., C.L., E.T., G.K., P.B.-K., and M.H. performed the experiments; A.M.H., N.L.S., E.T., C.W., S.F., C.J.M., and J.B. analyzed the data; and A.M.H., N.L.S., and J.B. wrote the manuscript. Competing interests: The authors declare that they have no competing financial interests. Data and materials availability: All DNA plasmids mentioned in this paper will be made available to the research community through either Addgene or direct requests to the laboratory. Scripts and data files used in the computational analysis can be accessed through the GitHub repository found at
