Research ResourceCancer

Integration of protein phosphorylation, acetylation, and methylation data sets to outline lung cancer signaling networks

See allHide authors and affiliations

Science Signaling  22 May 2018:
Vol. 11, Issue 531, eaaq1087
DOI: 10.1126/scisignal.aaq1087

Multiomics analysis of lung cancer

Various “omics” analyses of cells provide insight into cellular health and behavior. Phosphoproteomic analysis has informed much of our current knowledge of cell signaling networks and facilitated drug development. Less appreciated are the inputs of other posttranslational modifications (PTMs). Grimes et al. integrated genomic, proteomic, phosphoproteomic, acetylomic, and methylomic data from lung cancer cells versus those in normal lung tissue and explored various regulatory patterns. The authors found that many PTMs are exclusive, in that as phosphorylation increased, acetylation decreased; that the drug geldanamycin broadly alters the PTM landscape and, thus, has effects far beyond its target; and that RNA binding proteins appear to be critical effectors of many signaling paths. These networks may inform new drug development for lung cancer patients and exemplify how cell signaling is regulated by far more extensive PTM networks than was previously appreciated.


Protein posttranslational modifications (PTMs) have typically been studied independently, yet many proteins are modified by more than one PTM type, and cell signaling pathways somehow integrate this information. We coupled immunoprecipitation using PTM-specific antibodies with tandem mass tag (TMT) mass spectrometry to simultaneously examine phosphorylation, methylation, and acetylation in 45 lung cancer cell lines compared to normal lung tissue and to cell lines treated with anticancer drugs. This simultaneous, large-scale, integrative analysis of these PTMs using a cluster-filtered network (CFN) approach revealed that cell signaling pathways were outlined by clustering patterns in PTMs. We used the t-distributed stochastic neighbor embedding (t-SNE) method to identify PTM clusters and then integrated each with known protein-protein interactions (PPIs) to elucidate functional cell signaling pathways. The CFN identified known and previously unknown cell signaling pathways in lung cancer cells that were not present in normal lung epithelial tissue. In various proteins modified by more than one type of PTM, the incidence of those PTMs exhibited inverse relationships, suggesting that molecular exclusive “OR” gates determine a large number of signal transduction events. We also showed that the acetyltransferase EP300 appears to be a hub in the network of pathways involving different PTMs. In addition, the data shed light on the mechanism of action of geldanamycin, an HSP90 inhibitor. Together, the findings reveal that cell signaling pathways mediated by acetylation, methylation, and phosphorylation regulate the cytoskeleton, membrane traffic, and RNA binding protein–mediated control of gene expression.


Proteins covalently attach moieties such as phosphate, methyl, and acetyl groups to other proteins to regulate cell signaling events crucial for all cellular physiological functions, including differentiation, proliferation, cell movement, and cell death. In many human diseases, these cell signaling mechanisms go awry (1). Much progress has been made in identifying proteins that attach (writers), recognize (readers), and remove (erasers) different posttranslational modifications (PTMs) (2), and our understanding of these mechanisms has led to the development of targeted therapeutics, such as cancer treatments that target kinases (3), histone deacetylases (HDACs) (4), and bromodomains, which are the associated structures of readers of acetylation (5, 6).

The canonical view of PTMs is that protein acetylation and methylation are mainly involved in epigenetic chromatin modifications, whereas signal transduction pathways are mainly regulated via phosphorylation. This view dominated the field mainly because different modification types have been typically studied independently. However, there is now increasing appreciation that histones are not the only class of proteins affected by acetylation and methylation (4, 7). This means that drugs targeting enzymes responsible for regulating methylation and acetylation of histones may have a much broader effect. It is therefore important to understand in detail molecular signaling pathways that involve these PTMs. In addition, further understanding of cell signaling pathways will provide new opportunities for therapeutic intervention. For example, multiple PTMs are involved in signaling cascades that are linked to neurological disorders (8) and the innate immune response (9).

Here, we sought to understand molecular signaling pathways that are active in lung tumor–derived cell lines by simultaneously examining patterns of protein phosphorylation, methylation, and acetylation on a large scale. The data we collected provide a unique opportunity for an integrated study of cell signaling networks involving these three different PTMs. Our goal is to define the relationship between kinases, phosphatases, acetyltransferases, deacetylases, methyltransferases, and demethylases. Our approach was to integrate two different kinds of information: clustering based on statistical relationships among various types of PTMs and protein-protein interaction data from public databases (10).

The sequencing of the human genome provided a parts list of many of the molecular constituents that make up the human cell, but we still have a limited understanding about how these parts interact and are organized into functional complexes to regulate the cell signaling pathways that govern molecular biological processes. Protein-protein interaction (PPI) databases attempt to catalog how proteins interact with one another and how these interactions are organized into networks (1116). Networks are useful to model signal transduction pathways (1722), but these models are difficult to understand and validate because such models are complex, integrate information from a variety of sources, and commonly are not specific to a particular tissue or cell type in the proper context. They also suffer from biases, such as literature focus biases or experimental platform biases. Here, we hypothesized that empirical information about PTMs may be used to constrain the complexity of cell signaling models to better understand the system under study (10, 23). We reasoned that clustering of PTMs under different conditions reveals patterns specific to the system, in this case lung cancer cell lines. Clusters identified by statistical relationships that contain proteins known to interact likely represent functional cell signaling pathways. This data-driven modeling approach (24) is somewhat analogous to using gene coexpression data to identify cell signaling pathways in yeast (25). However, PTMs are much closer to cell signaling events than mRNA coexpression, which makes our approach more likely to identify cell signaling events upstream of gene expression changes (10).

We used immunoprecipitation with antibodies specific for tyrosine and serine/threonine phosphorylation, lysine acetylation, and lysine and arginine methylation, coupled with mass spectrometry. This technique was originally described to study only phosphorylated proteins (10, 2628). Mass spectrometry determines the identity of peptides with modified residues and the relative amounts of these peptides. This method has a low false-positive rate, but a high false-negative rate. This is mainly due to stochastic sampling of peptides for detection and subsequent fragmentation (2933). This limitation of mass spectrometry introduces two challenges: (i) The resulting data have a large number of missing values, and (ii) comparing experimental conditions is difficult when peptides are stochastically detected. Tandem mass tag (TMT) experiments surmount the latter issue by labeling samples with different isotopic mass tags and mixing them, so that when a peptide is selected by the mass spectrometer, different tags are resolved upon subsequent fragmentation (34). This allows direct comparison of peptide amounts in multiplexed samples and accurate calculation of treatment-to-control ratios within a TMT multiplex. When comparing two or more TMT runs, however, there is still a large fraction of missing values because of the stochastic nature of detection. Therefore, the TMT data are different from the gene expression data derived from microarray or RNA sequencing experiments, and thus require special considerations for data analysis (text S1). Building on previous work (10, 23), we evaluated different methods for handling these data, which can be grouped into three approaches: imputing missing values, pairwise-complete, and penalized matrix decomposition (PMD). Statistical relationships (Spearman correlation, Euclidean distance) were embedded into a reduced dimension representation using t-distributed stochastic neighbor embedding (t-SNE), which is effective in identifying clusters in a wide variety of nonlinear real-world and biological data (10, 23, 3538). PMD followed by an additional t-SNE step further resolved large clusters to produce a coclustered correlation network (CCCN) for strongly associated modifications. This clustering method was used to decipher cell signaling pathways by filtering PPI edges to retain only interactions between proteins whose modifications coclustered, creating a cluster-filtered network (CFN). The CFN and CCCN data structures were further evaluated in a number of ways and then used to outline cell signaling pathways, starting from the target proteins that are affected by drugs. The analysis highlights molecular pathways defined by highly correlated clusters of PTMs and revealed antagonistic relationships among PTMs that were negatively correlated. Signaling pathways that are regulated by multiple PTMs involve heat shock, RNA binding, cytoskeletal, and membrane trafficking proteins that are linked to canonical tyrosine kinase pathways.


Clustering PTMs

Our goal was to use large-scale mass spectrometry data to integrate cell signaling pathways that involve protein phosphorylation, methylation, and acetylation. We hypothesized that proteins that interact with one another and are posttranslationally modified in patterns identified by close statistical relationships likely represent functional cell signaling pathways. We therefore combined PTM clustering information with PPI databases to create the CFN. The CFN only retained PPIs whose modifications coclustered (Fig. 1A). Thus, the resulting network only retained interactions between proteins that are also supported by protein modification data (10).

Fig. 1 Data acquisition and analysis.

(A) A CFN was created by using a machine learning, pattern recognition algorithm (t-SNE) to identify which PTMs clustered together, filtering PPIs (red edges) to retain only those between proteins whose PTMs coclustered. (B) Outline of TMT mass spectrometry coupled with immunoprecipitation (IP) using modification-specific antibodies. Phospho-Ser/Thr (pS/T) peptide IP was accomplished in multiple steps with AGC/PSD-family kinase substrate, AKT (v-akt murine thymoma viral oncogene homolog 1) substrate, AMP kinase substrate, and ATM/ATR (ataxia telangiectasia mutated/ATM related) substrate antibodies (see Materials and Methods). Phosphopeptides were also further purified on a TiO2 column (81). m/z, mass/charge ratio; LCMS/MS, liquid chromatography–mass spectrometry. (C) Bird’s eye view of the CCCN derived from PTMs from 15 independent experiments, each with six multiplex samples, from comparison of 45 lung cancer cell lines [12 derived from from small cell lung cancer (SCLC) and 33 from non–small cell lung cancer (NSCLC)] to normal lung tissue, and selected cell lines treated with anticancer drugs. This disconnected network includes threshold-filtered Spearman correlations among t-SNE–clustered PTMs (yellow edges are positive correlations; blue are negative correlations). Also shown are negative correlations among different modification types within the same protein, which are useful for revealing antagonistic relationships among PTMs. Node size and color reflect total of all PTM ratios in the data set (fold change key). This network is available for download as data file S1 and on the NDEx repository ( This network combined with the CFN that contains filtered PPI edges may be explored at

PTM data were obtained using immunoprecipitation with antibodies that are nonspecific to individual proteins but specific to PTMs and combined with TMT mass spectrometry (Fig. 1B). Forty-five lung cancer cell lines were compared to normal lung tissue in nine multiplex runs. In addition, selected lung cancer cell lines were treated with the anticancer drugs crizotinib, gefitinib, gleevec, and geldanamycin in six multiplex runs. Because each experiment represents a different state of the lung cancer cell line signaling system, and because combining data from different cell lines improves the quality of clustering (10), we combined comparison of cell lines to normal lung tissue (8729 modifications) and drug-treated cell lines (9321 modifications) into one data matrix containing 13,798 unique modifications; 90 samples from 15 independent experiments (6 samples in each experiment). The combined data matrix contained 78% missing values because there are a large number of PTMs that were not detected across different multiplex runs. Missing values may represent either the absence of PTMs in samples or simply the lack of data, so we developed approaches to reduce their influence on data analysis (10, 23).

Because identification of groups of statistically related PTMs from sparse data sets is challenging, we evaluated different methods to derive clusters from the statistical relationships between the identified PTMs (described in detail in text S1). We found that the t-SNE method when used with dissimilarity representations was the most effective at resolving meaningful clusters based on the uniformity and density of each cluster, and the number of prior knowledge PPIs found among the proteins within the identified clusters (figs. S1 and S2) (10, 23). To use the PTM data to filter edges in PPI databases (Fig. 1A), we evaluated analytical methods that identify PTM clusters whose proteins tend to interact (figs. S3 and S4). We found that breaking up large clusters (>80 PTMs) by applying PMD, and another round of t-SNE, worked best for defining clusters that were most justifiable by several criteria (fig. S4).

Well-studied proteins have more interactions within PPI databases. Hence, filtering edges using PTM clustering enriched for interactions specific to the lung cancer cell line data and mitigated the bias in PPI databases toward well-studied proteins (figs. S5 and S6). There was a weak correlation between CFN “betweenness,” a measure that indicates how frequently proteins are found on shortest paths through the network, and the number of PTMs found on each protein (fig. S7). However, the most highly modified proteins were not the proteins that were common to many paths, that is, had high betweenness. The number of modifications did not always correlate with serving as a hub within cell signaling pathways, although some highly modified proteins may act as hubs. We defined the minimum Spearman correlation to be used to delineate edges that represent correlations between PTMs (Fig. 1A) by examining the effect on network size and density (figs. S8 and S9). Finally, using drug responses that affect known kinase pathways, we determined that binning large ratio values and using log2-transformed data for clustering produced the best results (fig. S10). This is likely because TMT ratio data contain extreme values, which may skew correlation analyses, and it can be argued that PTM ratios beyond 100-fold are not functionally different.

Having established an effective method to cluster PTM data derived from TMT mass spectrometry, and having then used these PTM clusters to filter PPI in a CFN that outlines cell signaling pathways in lung cancer cell lines (Fig. 1A), we then defined a threshold for Spearman correlations between PTMs to display edges in a CCCN (Fig. 1C). The CFN and CCCN networks are available for further exploration at

Pathways in CFNs based on drug-affected PTMs

We next used these networks to examine pathways from drug targets to proteins whose PTMs were affected by drug treatment, to outline cell signaling pathways active in lung cancer cell lines. We began with drugs that directly inhibit receptor tyrosine kinases (RTKs). We reasoned that RTK inhibitors will affect kinase pathways and therefore phosphorylation and other PTMs of substrates and downstream pathway effectors. We traced pathways by identifying shortest paths in the CFN from known RTK drug targets to other proteins whose PTMs changed at least twofold after drug treatment. Crizotinib, which inhibits the RTKs MET [also called c-MET or hepatocyte growth factor receptor (HGFR)] and ALK (anaplastic lymphoma kinase), inhibited the modification of a number of proteins that are involved in cytoskeletal rearrangements and adhesion (Fig. 2A; blue indicates a decrease in PTMs), consistent with the oncogenic role MET is playing in metastasis (39). Epidermal growth factor receptor (EGFR) phosphorylation was also decreased by crizotinib (Fig. 2A), and the EGFR inhibitor gefitinib decreased MET phosphorylation (Fig. 2B), indicating cross-talk between these RTKs or possibly off-target effects. The CFN connections downstream of MET, ALK, and EGFR linked to the tyrosine kinase LYN, the tyrosine phosphatase PTPN11 (also known as SHP2), and the adaptor/scaffolding proteins IRS1 (insulin receptor substrate 1) and CRK (v-crk avian sarcoma virus CT10 oncogene homolog) (40). These connections indicate known interactions supported by the statistical clustering of PTMs.

Fig. 2 Networks derived from composite shortest paths from drug targets to drug-affected proteins.

(A to C) Graphs showing the sum of shortest paths from each target to each protein whose PTMs were more than twofold affected by the MET and ALK inhibitor crizotinib (A) and the EGFR, ERBB2, and ERBB3 inhibitor gefitinib (Iressa) (B) in H3255 cells treated for 1 to 24 hours. A key defining node shape and border colors and edge colors is shown bottom right. Directed edges are shown with arrowheads; these indicate one protein acting on another protein (for example, kinases phosphorylating substrates). Undirected edges without arrowheads indicate various other types of interactions. Edge line thickness is proportional to the strength of interactions, as defined in PPI databases. Node size and color are proportional to the changes in PTMs for each protein in response to indicated drug treatments (see scale bar; yellow indicates positive change; blue, negative; green, no change). Many yellow nodes represent overall increases in acetylation in these graphs (C). Fold change for individual PTMs in response to indicated drugs is plotted using heat maps on a blue-yellow scale (see scale bar). Several proteins exhibited phosphorylation decreases (represented by blue on the heat map) and concomitant acetylation increases (represented by yellow) in response to gefitinib.

In addition to decreases in phosphorylation of downstream targets, we noted an increase in acetylation of many proteins (Fig. 2B, yellow nodes). In a group of proteins that were both phosphorylated and acetylated, phosphorylation decreased and acetylation increased with gefitinib treatment (Fig. 2C). The acetyltransferase EP300 itself showed increased acetylation with gefitinib treatment and was linked to HSP90 in the CFN (Fig. 2B). Also present was fatty acid synthase, whose activity and expression are controlled by EP300 (41). EP300 and HSP90 had many links in the CFN to proteins dually modified by both phosphorylation and acetylation (Fig. 2B). Thus, examination of several PTMs in the same experiment revealed an unexpected relationship between phosphorylation and acetylation. Filtering PPIs by PTM modification clustering highlighted links between cytoskeletal proteins, RNA binding proteins, and heat shock proteins in lung cancer cell lines.

HSP90 proteins, together with CDC37, act as chaperones for a large number of kinases, including RTKs, and are required for regulation of activity as well as their initial folding (42). The mechanisms of action of the HSP90 inhibitor geldanamycin are not well understood compared with the other anticancer drugs used in this study. Heat maps showing PTMs affected by more than twofold by geldanamycin revealed that, in addition to heat shock proteins (HSPs; Fig. 3A), many proteins involved in regulating endocytosis (Fig. 3B) and cytoskeletal dynamics (Fig. 3C) were affected by geldanamycin. This is consistent with reports that geldanamycin affects endocytic membrane trafficking and cytoskeletal dynamics (4345). Examination of the shortest paths between HSP90 proteins, the known targets of geldanamycin, and proteins whose PTMs were affected by geldanamycin included acetyltransferase EP300 interacting with HSP90AA1 (Fig. 3D). As mentioned above, proteins that have high betweenness are likely to be hubs in the network, that is, connect many cell signaling pathways (fig. S7). These included HSP90AA1 and several other proteins linked to it in the geldanamycin subnetwork (Fig. 3D), including the 14-3-3 proteins YWHAQ and YWHAZ, the RNA binding protein NPM1, and the heat shock proteins HSPA8, HSPA1B, and β-actin (ACTB). Geldanamycin treatment also affected phosphorylation of the RTKs ALK, AXL, EGFR, ERBB3, and IGF1R (insulin-like growth factor 1 receptor), of which phosphorylation was decreased for all but AXL and ERBB3 (Fig. 3D).

Fig. 3 Heat maps and network showing PTMs affected more than twofold by geldanamycin.

(A to C) Heat maps of PTMs of HSPs (A), proteins involved in endocytosis (B), and proteins involved with the cytoskeleton (C) in each of two lung cancer cell lines cultured with serum and either untreated or treated with geldanamycin (100 μM for 15 or 24 hours) relative to serum-starved, untreated cultures. Scale bar (B) indicates fold change on a blue-yellow scale. (D) Network graph plotted as in Fig. 2 showing the sum of shortest paths from HSP90AA1 and HSP90AB1 (center) to each protein whose PTMs were more than twofold affected by geldanamycin in H2228 or H3122 cells. Node size and color indicate the sum of PTM changes in response to geldanamycin in both cell types (refer to the key in Fig. 2).

These results suggest that geldanamycin may affect the balance between phosphorylation and acetylation by influencing EP300 through interaction with HSP90 proteins. This in turn affects endocytosis (as indicated by phosphorylation changes in DYNC1I2, EPS8, CAV1, and clathrin heavy chain CLTC), microtubules, actin, and other cytoskeletal elements (as indicated by several PTM changes in spectrin SPTAN1 and keratins KRT7 and KRT8) (Fig. 3). Protein methylation may also be involved. Changes in methylation were also observed in response to geldanamycin, and phosphorylation of the histone lysine methyltransferases WHSC1/NSD2 and NSD1 was reduced (Fig. 3). There were CFN links between EP300 and kinases, phosphatases, another acetyltransferase (CLOCK), methyltransferases (PRMT1 and ASH1L), and the actin-regulating proteins RAC1 and CDC42 (Fig. 4A).

Fig. 4 EP300 interactions with PTM-modifying enzymes.

(A) Shown are links in the CFN between EP300 and kinases, phosphatases, acetyl- and methyltransferases, and the actin polymerization–governing GTPase (guanosine triphosphatases) RAC1 and CDC42. In cases where there were more than one type of interaction between two proteins in the PPI databases (see the key in Fig. 2), to simplify graphs, white edges represent the composite of these interactions with the edge weight summed. (B) EP300 interactions with CLTC and enzymes that modify clathrin and govern EGFR endocytosis. The graph shows PTMs linked by black edges to the parent protein, and correlation edges (yellow, positive; blue, negative) between PTMs in the PTM CCCN, which depicts PTMs that cocluster with edges representing Spearman correlation greater than the threshold of |0.543|, as defined in fig. S9. Node size and color (refer to the key in Fig. 2) indicate ratios of PTM changes in lung cancer cell lines to those in normal lung tissue; see fig. S11 for responses to drugs. The combined CFN/CCCN graphs show the individual PTM response and the sum of all PTMs represented in the parent (gene) node.

Specific PTM relationships in the pathways that connect EGFR to EP300 can be visualized by showing PTMs and their correlation edges on the CFN graph (Fig. 4B). In the graphs in Fig. 4, node size and color indicate comparison of individual PTMs, or the sum of all PTMs for a protein, in cell lines compared to normal lung tissue. The same graphs indicating fold change in response to drug treatments are shown in fig. S11. Individual PTMs on EP300 changed in response to different conditions, and these changes occurred concomitantly with PTMs on other proteins that coclustered with EP300 PTMs (fig. S12). For example, EP300 phospho-S1897 was down-regulated in lung cancer cells compared to normal lung tissue along with two phosphorylation sites on BCAR3 (fig. S12A, bottom), whereas EP300 acetylation sites K1674 and K1760, along with RAC1 and CDC42 acetylation sites, were increased by gefitinib treatment (fig. S12C, top and middle left, respectively). These data indicate that EP300 likely plays a central role in cytoplasmic signaling in addition to its previously described role as a histone acetyltransferase. The inverse relationship between phosphorylation and acetylation on EP300 (fig. S12) and other proteins (Fig. 2) gives rise to the hypothesis that there is an antagonistic relationship between particular sites of acetylation and phosphorylation on cell signaling proteins. These sites may be indicators of mutually opposing mechanisms or pathways.

Evidence for antagonistic relationships among PTMs

The hypothesis that there is an antagonistic or dueling relationship between phosphorylation and acetylation sites leads to two predictions. First, we should expect to find negative correlations between a subset of phosphorylation and acetylation sites for proteins modified with both PTMs. The distribution of correlations between phosphorylation and acetylation sites on the same protein was significantly different than that of dual phosphorylation or acetylation (Fig. 5A). We found 269 such phosphorylation and acetylation modifications with Spearman correlation lower than −0.5 in 112 proteins in the lung cancer cell line PTM data (highlighted in blue in Fig. 5A). Both tyrosine phosphorylation and serine/threonine phosphorylation exhibited negative correlations with acetylation (fig. S13). Negative correlations can be visualized as blue edges on a combined CFN/CCCN (Fig. 5B), which revealed proteins that are involved with endocytosis and the actin cytoskeleton that are affected by geldanamycin. Cortactin (CTTN), for example, had a large number of phosphorylation sites negatively correlated with its acetylation sites that are affected in opposite ways by geldanamycin [Fig. 5, B (upper right) and C]. Clathrin is another example where phosphorylation and acetylation sites were negatively correlated, although not all sites were affected by geldanamycin, or detected in the drug treatment experiments. Cliques of highly correlated coclustered PTMs were also revealed in the CCCN (Fig. 5D, identified by yellow edges). For example, EGFR pTyr869 clustered with MET pTyr1003 in one clique, whereas different EGFR phosphorylation sites clustered with proteins involved with endocytosis (EPS8 and CLTC) in the other clique (Fig. 5D).

Fig. 5 Dually acetylated and phosphorylated proteins: Interactions with bromodomain proteins and kinases.

(A) Correlation between different PTM sites on the same protein: different phosphorylation sites (p p); phosphorylation and acetylation (p ac); and different acetylation sites (ac ac). Homo-PTM correlations (p p and ac ac) are compared to heterologous (p ac) PTMs (both P < 2.2 × 10−16, Welch two-sample t test). Individual PTM-PTM correlations are plotted in gray under boxplots (top). Correlation density between phosphorylation and acetylation sites on the same proteins is also shown (bottom). Similar plots in which tyrosine and serine/threonine phosphorylation are compared to each other and each to acetylation individually are shown in fig. S13. Negative correlations selected for display as edges are highlighted in blue. (B) Combined CFN and PTM CCCN for selected proteins modified by both phosphorylation and acetylation. In addition to correlation edges, negative Spearman correlations less than −0.5 are shown as blue edges (plotted as in Fig. 4B, except edges connecting proteins to their PTMs are light gray). (C and D) Selected regions of (B) expanded for clarity. (E) CFN interactions between tyrosine kinases, geldanamycin-affected dually acetylated and phosphorylated endocytic and cytoskeletal proteins, and bromodomain proteins (colored light red). Node size and color (refer to the key in Fig. 2) are in response to geldanamycin for (B) to (D). (F) Comparison of the number of interactions of bromodomain-containing proteins with dually modified proteins both phosphorylated and acetylated (p ac); those with negative correlations less than −0.5 (p ac neg); all acetylated proteins (all ac); all proteins not acetylated (all except ac); all proteins except those both phosphorylated and acetylated (all except p ac); and proteins only modified by phosphorylation (p only) or methylation (me only). CFN interactions between bromodomain-containing proteins and these groups of proteins were retrieved, and the number of edges was divided by the number of proteins in each group to obtain bromodomain interactions per gene.

The second prediction of the hypothesized dueling relationship between certain phosphorylation and acetylation sites is that there should be interactions between acetyl-binding proteins containing bromo (or BET) domains, which bind to acetylated lysine moieties, and kinase signaling pathways (5). Such interactions were observed in the CFN among dually phosphorylated and acetylated proteins, kinases, and bromodomain proteins (Fig. 5E; bromodomain proteins are colored light red). Bromodomain-containing proteins were more likely to interact with dually phosphorylated/acetylated proteins than acetylated proteins in general, or proteins modified by phosphorylation or methylation (Fig. 5F). That there were more CFN interactions between bromodomain-containing proteins and dually phosphorylated and acetylated proteins compared to all other groups of proteins (Fig. 5F), combined with the fact that gefitinib (Fig. 2, B and C) and geldanamycin (Fig. 3) caused decreases in phosphorylation and concomitant increases in acetylation for a large number of proteins, strongly supports the hypothesis that certain phosphorylation and acetylation sites play antagonistic or mutually opposing roles for proteins identified in this study.

The data suggest the hypothesis that the intracellular location of proteins may change when their phosphorylation and acetylation rates go in opposite directions. To test this hypothesis, we performed cell fractionation experiments that effectively separate organelles including endosomes and lysosomes, and membrane compartments distinguished by resistance to nonionic detergents (lipid rafts; Fig. 6A) (10, 46, 47). Geldanamycin treatment caused changes in the intracellular location of proteins with negative correlations between phosphorylation and acetylation, determined by mass spectrometry (Fig. 6, B and C) and Western blotting (Fig. 6, D and E, and fig. S14). Many dually modified proteins increased in detergent-resistant lipid raft fractions (raft2 to raft4), including HSPs and chaperones (HSP90AA1, HSPA4, HSPB1, and CCT2), 14-3-3 phosphoserine-binding proteins (YWHAZ, YWHAE, YWHAQ, YWHAH), cytoskeletal proteins (ACTB, CTTN, TUBB4A, and MTPN), membrane traffic-associated proteins (CLTC and NDRG1), and signaling adaptor proteins (GIPC1, LASP1, PEA15, CRK, CRKL, and GRB2) (Fig. 6 and fig. S14). CLTC also decreased in endosome fractions (org2) in response to geldanamycin, whereas other proteins increased, including mitogen-activated protein kinase 1 (MAPK1), MAPK12, and KRAS (Kirsten rat sarcoma viral oncogene homolog) (Fig. 6, C to E). Although there was a geldanamycin-induced increase in phosphorylated MAPK3/MAPK1 in raft fractions, phosphorylated CTTN decreased in the same fractions (p-ERK1/2 and p-CTTN; Fig. 6E and fig. S14). Concomitantly, several bromodomain-containing proteins (PBRM1, ATAD2, BAZ1A, BAZ1B, and SMARCA4) decreased in the same lipid raft fractions (raft2 to raft4; Fig. 6, B and C). These data are consistent with previous work showing that geldanamycin influences the intracellular localization of signaling proteins (4345) and support the hypothesis that the changes in modification state that we observed in response to geldanamycin (Fig. 3) have functional consequences for intracellular localization.

Fig. 6 Geldanamycin induces changes in lipid raft localization of dually phosphorylated/acetylated proteins.

(A) Outline of cell fractionation experiments that separate organelles including different populations of endosomes (org1 to org4, top) and detergent-sensitive and detergent-resistant membranes including lipid rafts (raft1 to raft4, bottom). Org1 to org3 contain lysosomes and endosomes with different sedimentation velocity, respectively; org4 contains soluble, cytoplasmic proteins; raft1 contains detergent-sensitive proteins; raft2 to raft4 are detergent-resistant fractions of decreasing equilibrium density (10, 46, 47). (B to E) H3122 and H3255 cells treated with geldanamycin (“geld.”) or not treated (“C.”) were fractionated, and total protein amounts in each fraction [four organellar fractions (“org.”) and four raft fractions (“raft.”)] were determined in three separate experiments by mass spectrometry (B and C, H3122 cells) and, for a select few, by Western blotting (D, H3255 cells; E, cell type indicated). Mass spectrometry data [expressed as heat maps in (B)] and Western blot data (D, and fig. S14) were used to calculate the amount of each protein in each cell fraction as a proportion of the total in the whole cell, which is the sum of protein amounts in all cell fractions. The heat maps in (C) and (E) show fold change abundance in treated cells relative to control cells for each fraction for mass spectrometry and Western blot data, respectively. One hundred sixty-eight of 218 dually phosphorylated and acetylated proteins with negative correlations between these PTMs were detected in this experiment. Many exhibited changes in the raft2 fraction (P = 0.05394, Welch two-sample t test). Forty-five of 108 proteins whose PTMs changed significantly in response to geldanamycin (Fig. 3) were detected by mass spectrometry in this experiment; collectively, their amounts increased in the raft2 fraction (P = 0.01457). Of the six bromodomain-containing proteins detected, most decreased in the raft2 fraction (P = 0.04832 for all the bromodomain-containing proteins). Mass spectrometry data expressed as the amount of proteins in cell fractions are in table S2.

A total of 295 proteins were identified that had negative correlations between different types of PTMs on the same protein (table S1). Examination of relationships between protein methylation and other PTMs revealed that the distribution of correlations between methylation and acetylation sites on the same protein was significantly different than correlations of the same PTMs (Fig. 7A). There were 50 proteins with 95 dual acetylation and methylation PTMs that had negative correlation less than −0.5 (highlighted in blue in Fig. 7A). These include RNA binding, cytoskeletal, heat shock proteins, transcription factors, and acetyltransferases, including EP300 and NCOR2 (table S1). Note that none of this group were histones, which were dually modified, but correlations were mostly positive or not significant. There were also groups of methylated and acetylated proteins whose PTMs clustered together and were all highly correlated with one another. An example clique of these highly correlated coclustered PTMs (Fig. 7B, identified by multiple yellow edges) revealed several RNA binding proteins and transcription factors, including the methyltransferase EZH1, acetyltransferase CREBP, deacetylase SIRT9 (which was phosphorylated), and proteins whose methylation and acetylation sites were inversely correlated. These data suggest that there are molecular interactions governed both synergistically and antagonistically by methylation and acetylation in proteins to regulate RNA processing and transcription.

Fig. 7 Clusters of proteins dually modified by different PTMs.

(A and C) Correlations between different PTM sites on the same protein plotted as in Fig. 5A. Homo-PTM correlations (me me; ac ac; p p) are significantly different from heterologous PTM correlations (me ac; p me; P < 2.2 × 10−16 in all cases). (B and D) Selected proteins modified by (B) acetylation and methylation, and (D) phosphorylation and methylation are shown with CFN links to PTM-modifying enzymes, as described in Fig. 4B. These figures highlight two cliques of PTM CCCN edges that coclustered and had a high Spearman correlation. Negative correlations between different PTMs on the same protein are indicated by blue edges. Node size and color (refer to the key in Fig. 2) reflect total of all PTM ratios in the data set.

Similar to other dual modifications on the same proteins, the distribution of PTM correlations among 96 dually phosphorylated/methylated proteins was distinct from homologous PTM correlations, with 211 modifications having negative correlation between phosphorylation and methylation less than −0.5 (Fig. 7C). Many of these coclustered PTMs were found on RNA binding and RNA-processing proteins. We also identified a clique of PTMs among phosphorylated and methylated RNA binding proteins that included links to methyltransferases and kinases (Fig. 7D). The core clique of proteins whose PTMs all clustered and correlated included proteins involved in RNA splicing and processing, similar to the clique of methylated and acetylated proteins above (Fig. 7B) (48). The extended group contained other proteins involved in RNA binding, cytoskeleton, chaperone activity, and transcription regulation. HNRNPA2B1 displayed multiple negative correlations between its tyrosine phosphorylation and arginine methylation sites. Note that the methyltransferase EZH1, the RNA-processing protein XRN2, and the RNA binding protein RBMX were linked to both clusters (Fig. 7).

Connections to aberrantly regulated genes in lung cancer

Last, to assess the functional significance of upstream signaling events’ influence on global mRNA gene expression by PTMs, we examined how cell signaling pathways inferred from the CFN analysis may regulate basal gene expression patterns in lung cancer cell lines. Among the most commonly differentially regulated genes from the gene expression data that we obtained from lung cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE) (49), SMARCA4 and NKX2-1 were also detected in the protein modification data (Fig. 8A) (50). SMARCA4 is the most down-regulated gene, and its inactivation is thought to promote NSCLC aggressiveness (51). NKX2-1 is one of the top 10 differentially regulated genes in this set of cell lines. NKX2-1 plays a key role in lung development; its loss causes a failure of lung cell differentiation and leads to malignant transformation in lung adenocarcinoma (5254). SMARCA4 and NKX2-1 exhibited CFN/CCCN interactions with the RNA binding proteins DDX5, DHX9, HNRNPAB, and lysine acetyltransferase NCOA2 (Fig. 8B). These links from DDX5 to NCOA2, NKX2-1, and SMARCA4 were preserved in a highly curated network focused on direct binding interactions (Fig. 8C). This is consistent with a mechanism to control NKX2-1 expression through long noncoding RNAs (54). The acetyltransferase EP300, the methyltransferase PRMT1, the kinases PKN2, CHEK2, and YES1, and the bromodomain-containing kinase BAZ1A also had cluster-filtered interactions with this group (Fig. 8B), indicating that the interplay of phosphorylation, acetylation, and methylation is likely to play a role in controlling expression and activity of SMARCA4 and NKX2-1.

Fig. 8 Pathways to SMARCA4 and NKX2-1.

(A) Gene expression data from 29 NSCLC cell lines from the CCLE (49) plotted on a blue-red scale (key) (50). NKX2-1 and SMARCA4 (highlighted) are among the most dysregulated genes in lung cancer. (B) CFN and CCCN path from EP300 to NKX2-1 and SMARCA4, where node size and color (refer to the key in Fig. 2) indicate total ratio data for all experiments where lung cancer cell lines were compared to normal lung tissue. (C) Shortest paths from EGFR and MET to transcription factors in a CFN derived from the PPI data set with highly curated molecular interactions with a focus on direct interactions for which there is strong evidence (20, 85). Note that DDX5 and NCOA2 bind to NKX2-1 and SMARCA4 as in (B). Node size is CFN betweenness, and node color is normalized betweenness defined as CFN betweenness divided by the betweenness from all PPI data sets before filtering.

In sum, we detected a large number of proteins that were modified by more than one type of PTM; in this study, we detected more than 700 proteins that were phosphorylated, acetylated, and methylated (Fig. 9A and fig. S15A). Correlation was poor (R2 = 0.03368) comparing relative abundance of proteins detected by their PTMs in our data to total protein abundance estimates for human proteins in paxDB (fig. S15B) (55). This suggests that the amount of detectable PTMs was governed largely by signaling events rather than by total protein amounts. Elucidation of clusters of positively correlated PTMs, when combined with PPI information, identified cell signaling pathways that are active in lung cancer cell lines. In addition, our observations suggest that proteins that function as signal integrating hubs may have one or more exclusive OR (XOR) switches governed by a subset of antagonistic relationships among phosphorylation, acetylation, and methylation sites (Fig. 9B). Because different PTMs govern interactions among proteins with PTM-specific binding domains (Fig. 9B), proteins modified by more than one PTM type may thus function as hubs for signal integration or pathway control switches. Identification of these cell signaling hubs, and the enzymes that modify them, provides insight into the complex signal transduction mechanisms that regulate lung cancer cell lines.

Fig. 9 Dual PTMs may function as dueling PTMs.

(A) Venn diagram showing the overlap among proteins modified by more than one type of PTM. A similar Venn diagram in which tyrosine and serine/threonine phosphorylation are separated is shown in fig. S15A. (B) Schematic of the findings. Proteins in molecular signaling pathways modified by more than one PTM have different sets of interacting proteins (5, 6). PTM-driven interactions occur through recognition of specifically modified amino acid residues by protein domains listed under the PTM type in the figure. Acetylated (Ac) proteins interact with bromodomain and BET (bromodomain and extraterminal domain) proteins (green); methylated proteins interact with proteins containing tudor domains if methylated on arginine (RMe); or chromo, PWWP (’Pro-Trp-Trp-Pro’), and MBT (malignant brain tumor) domains if methylated on lysine (KMe) (purple). Phosphorylated proteins interact with several protein families (red): tyrosine phosphorylated proteins (pY) interact with SH2 (Src homology domain 2) and PTB (phosphotyrosine binding) domains; serine/threonine phosphorylated proteins (pS/T) interact with a variety of proteins including 14-3-3 protein family members and proteins containing the domains WW (domain with 2 conserved Trp residues), FHA (forkhead-associated domain), WD40 (WD or β-transducin repeats), and LRR (leucine-rich repeats). Our data suggest that where inverse correlations exist between different PTMs, these may function as exclusive “OR” (XOR) switches to direct alternative cellular outcomes. This principle applies to phosphorylation versus acetylation; phosphorylation versus methylation; and methylation versus acetylation.


The data generated for this study presented a unique opportunity to simultaneously examine, on a large scale, proteins modified by tyrosine and serine/threonine phosphorylation, lysine acetylation, and lysine and arginine methylation (Fig. 1B). This allowed definition of relationships between kinases, phosphatases, and enzymes known to affect other PTMs, including acetyltransferases, deacetylases, methyltransferases, and demethylases. Our computational data analysis approach was to integrate two different kinds of information: clustering based on statistical relationships among various PTMs, and PPI data from public databases, to outline signal transduction pathways (Fig. 1A). We started with PPI networks retrieved from Pathway Commons, String, GeneMANIA (15, 16), BioPlex (12), and the kinase-substrate data from PhosphoSitePlus (13, 14), and then filtered interactions based on PTM clustering. Clustering methods using t-SNE were previously used for analyzing lung cancer phosphoproteomic data (23), and the CFN approach we utilized here was previously used on phosphoproteomic data from neuroblastoma cell lines (10). We further refined this approach with an additional step (PMD) to break up large clusters, as described in Materials and Methods and in the Supplementary Materials. We have shown that the CFN derived from these data effectively removes biases in PPI networks where well-studied proteins have more links (fig. S6) and that the number of PTMs does not overly bias node betweenness (fig. S7), so that the two kinds of information were well balanced for construction of a CFN.

The cell signaling pathways within the CFN were based on known PPIs and clustering of PTMs in lung cancer cell lines, so for interactions between any two proteins to be retained in the CFN, they must be included in the same PTM cluster. Starting with drug targets (the proteins that bind to and are directly affected by drugs), we examined links, shortest paths in the CFN, to proteins whose PTMs were changed by drug treatment (Figs. 2 and 3). This effectively outlined signaling pathways in lung cancer cell lines between drug targets and downstream proteins whose PTMs were affected by the drug.

The hypothesis that certain PTMs may have a mutually opposing, antagonistic relationship to one another was first suggested in our results by drug treatments that caused protein acetylation to increase and phosphorylation to concomitantly decrease (Figs. 2 and 3). The links between the acetyltransferase EP300 and heat shock proteins including HSP90s, suggested that the HSP90 inhibitor geldanamycin affects the balance between the phosphorylation and acetylation of endocytic and cytoskeletal proteins. This in turn perturbs endocytosis and cytoskeletal dynamics, which influences membrane trafficking and activity of selected RTKs and other proteins involved in signal transduction (Fig. 6). These results are consistent with previous work showing that geldanamycin affects membrane traffic and cytoskeletal dynamics to regulate activity and intracellular localization of many signaling proteins (4345). EP300 had many CFN links to enzymes that regulate other PTMs, including kinases, phosphatases, and methyltransferases (Fig. 4 and figs. S11 and S12). EP300 has been previously reported to function in cytosol to regulate autophagy (56, 57). Our data suggest that EP300 (and possibly other acetyltransferases) may play a role in regulating other membrane traffic and cytoskeletal mechanisms. Consistent with this notion, HDAC6 acetyltransferase regulates the acetylation status of microtubules and modulates EGFR trafficking (58). That geldanamycin affects the balance of different PTMs sheds new light on the elusive mechanisms of action of this class of drug, which is important to understand because HSP90 inhibitors are an emerging therapy for lung cancer (5961).

Further support for an antagonistic relationship between certain phosphorylation and acetylation sites was provided by examining negative correlations between PTMs within the same protein and visualizing them as edges (Fig. 5, A and B). In addition, cluster-filtered interactions existed between kinase signaling pathways and bromodomain-containing proteins that bind to acetylated lysine moieties (Fig. 5E). These interactions were most frequent among proteins with negative correlations between phosphorylation and acetylation sites on the same protein (Fig. 5F). Negative correlations within proteins were also identified for acetylation versus methylation sites (Fig. 7, A and B) and phosphorylation versus methylation sites (Fig. 7, C and D). These data elucidated a family of different PTMs that potentially function as multiple XOR switches in signaling pathways (Fig. 9B).

Our data are consistent with the notion that different PTMs on the same protein integrate multiple signals to generate combinatorial outputs (62). These interconnected, functionally associated mechanisms coevolved across eukaryotes (2, 63) and are associated with oncogenesis (40). Our study indicates that acetylation and methylation, together with phosphorylation, increase the interactome of hubs in the signal transduction network by conditionally altering the sets of interacting proteins with particular protein domains (Fig. 9) (64, 65). Proteins in the network may be classified as proteins that attach (writers), recognize (readers), and remove (erasers) PTMs (2). The readers of acetylation contain bromo and BET domains (5). Readers of methylation are proteins that have the tudor domain for arginine methylation, and chromo, MBD, and PWWP domains for lysine methylation (66). It is noteworthy that bromodomain-containing proteins include kinases (BAZ1A, TRIM33, BRD2, and BRD4), acetyltransferases (EP300 and CREBBP), and methyltransferases (ASH1L and KMT2A; Fig. 5E). This emphasizes the interconnected nature of mechanisms involving different PTMs.

Signaling pathways outlined using CFNs involve a large number of modified RNA binding, cytoskeletal, and membrane traffic proteins. The number of RNA binding and cytoskeletal proteins in these networks makes sense in light of the number of mechanisms involved with RNA transport, sequestration, and regulatory RNA molecules (6771). Cells evolved mechanisms to control protein expression by transcription, translation, and transport of RNA to different intracellular locations. Two RNA binding proteins were the most highly modified proteins in the data (AHNAK and SRRM2; fig. S7). Several such highly modified RNA binding proteins are dysregulated in cancer and move from cytoplasm to nucleus in response to phosphorylation (72). Our data suggest that methylation and acetylation also likely play a role in this process.

Signal transduction involves more than the three PTMs studied here, for example, generation of second messengers and covalent attachment of carbohydrates, lipids, or ubiquitin family proteins, which cause additional changes in activity, intracellular location, and binding partners (6264). Accurate models of signal transduction should attempt to integrate data about these processes for a complete picture. For example, MET ubiquitination and internalization are regulated by phosphorylation at Tyr1003, the c-CBL E3 ubiquitin ligase binding site (73). This PTM on MET clustered with phosphorylation of EGFR at Tyr869 and Tyr1197 but not at Tyr1045, EGFR’s c-CBL binding site (Fig. 5, C and D) (74).

It would also be useful to correlate gene expression patterns with PTMs. This remains challenging, however, because the correlation between protein and mRNA levels is poor, as shown for example during Xenopus embryogenesis, where large cells allow direct comparisons (75), although many differences may be resolved using a simple model for expression kinetics (76). PTMs in cells are even more highly variable and dynamic than protein levels. For example, very different phosphorylation patterns in the anaphase-promoting complex and mitotic spindle checkpoint proteins are observed in response to antimitotic drugs (77), and PTMs show much more marked changes than mRNA or protein during developmental progression (76). PTMs are more likely to be useful to inform us about cell signaling pathways than transcription because they are closer to signaling events, at least those that are upstream of transcription (25, 76).

Our study integrating three different PTMs contains a large amount of information that outlines signaling pathways when clustering is combined with known PPIs. These data elucidated clusters of highly correlated PTMs among groups of proteins [Figs. 5B (yellow CCCN edges) and 6, B and D]. This means that if one or a few of these PTMs are detected, then other members of the clique are likely to be found, which is useful for determining signatures of pathway activation or disease state. There are more synergistic relationships, as well as dueling or antagonistic relationships, among different PTMs in lung cancer signaling pathways than can be presented from our data; therefore, we have created a browser-accessible interface as a resource for further exploration by other investigators ( Networks are also available on the NDEx repository ( and in data file S1. The data will be useful for prediction of pathways of drug resistance (78), and side effects from drugs that target writers, readers, and erasers of acetylation and methylation (49).


Modification-specific antibody immunoprecipitation and mass spectrometry

PTMs of 45 lung cancer cell lines, 12 derived from SCLC and 33 from NSCLC, were compared to normal lung tissue (pooled from anonymous patients) using an established protocol (26, 27). In addition, several cell lines were treated with crizotinib, gefitinib, geldanamycin, or imatinib at 1 μM for 1 to 24 hours. In all, 15 six-plex TMT experiments were performed. Briefly, cells were washed and harvested in phosphate-buffered saline, and cell pellets were frozen in liquid nitrogen. Cells were lysed in a 10:1 (v/w) volume of lysis buffer [4% SDS, 100 mM NaCl, 20 mM Hepes (pH 8.5), 5 mM dithiothreitol, 2.5 mM sodium pyrophosphate, 1 mM β-glycerophosphate, 1 mM Na3VO4, and leupeptin (1 μg/ml)], and proteins were reduced at 60°C for 45 min. Proteins were then alkylated by the addition of 10 mM iodoacetamide (Sigma-Aldrich) for 15 min at room temperature in the dark, and methanol/chloroform was precipitated. Protein pellets were resuspended in urea lysis buffer [8 M urea, 20 mM Hepes (pH 8.0), 1 mM sodium orthovanadate, 2.5 mM sodium pyrophosphate, and 1 mM β-glycerolphosphate] and sonicated. Insoluble material was removed by centrifugation at 10,000g for 5 min, and the supernatant was diluted fourfold in 20 mM Hepes (pH 8.5) and 1 mM CaCl2 for Lys-C digestion overnight at 37°C, then diluted twofold followed by trypsin digestion for 4 to 6 hours at 37°C. Samples were then acidified to pH 2 to 3 with formic acid, and peptides were purified on a Waters Sep-Pak column and dried in a speed-vac. Peptides were purified on a Waters Sep-Pak column and quantified using a micro-BCA assay (Thermo). Mass tag (six-plex TMT reagents; Thermo) were cross-linked to peptides in 30% acetonitrile/200 mM Hepes (pH 8.5) for 1 hour at room temperature, and the reaction was stopped by the addition of 0.3% (v/v) hydroxyamine. Combined samples were then sequentially immunoprecipitated with cocktails of modification-specific antibodies from Cell Signaling Technology in the following order: anti-phosphotyrosine (P-Tyr-1000, #8954); anti-phosphoserine/threonine (AGC/PSD family Kinase Substrate Antibody; CST in-house antibody validated for peptide immunoprecipitation) (phospho-Akt substrate, #9614; Phospho-AMPK Substrate Motif, #5759; Phospho-ATM/ATR Substrate, #9607 and #6966); anti-acetyllysine [acetylated-Lysine, #9814 (79)]; anti-methyllysine [Mono-Methyl Lysine, #14679 (80)]; and anti-methylarginine [Mono-Methyl Arginine, #8015 (80)] (see Fig. 1B). After anti-phosphotyrosine and anti-phosphoserine/threonine immunoprecipitation, phosphopeptides were further purified on a TiO2 column [Thermo Fisher Scientific (81)]. Samples were then mixed in equimolar ratios, and the ratios were checked and samples run on an Orbitrap Q Exactive MS (Thermo Fisher). Identification of peptides and quantification of mass tags were obtained from the MS2 spectrum after fragmentation by tandem mass spectrometry analysis, as described (28, 80). Peptides with false discovery rate (FDR) <1% were selected for further analysis. Modification sites with site localization scores of less than A-score 13 were excluded from analysis (82).The data were filtered to include single modification sites present in three or more experiments for the final analysis. For display of results in heat maps, treatment/control ratios were calculated as fold change (treatment/control if treatment > control; −1/treatment/control if treatment < control).

Final clustering for CFNs

A detailed discussion of data analysis considerations and network development is provided in text S1. TMT ratio data were used to calculate pairwise-complete Euclidean distance and Spearman and hybrid Spearman-Euclidean dissimilarity (SED) (10, 23). Clusters were identified using t-SNE, as discussed below. Clusters containing more than 80 modifications were subjected to PMD with dimensionality reduced to the number of experiments that contained modifications in the cluster, and t-SNE was then performed on these PMD embeddings. Clusters were examined manually, and those that were sparsely populated with data (meaning that contained modifications mostly from one experiment) were discarded. An adjacency matrix was constructed by pairing coclustered modifications to each other. To construct CCCNs, the adjacency matrix was used to filter Spearman correlations at various values, as shown in fig. S9, excluding values that were not between coclustered modifications. Spearman correlation values were used to represent network edge weights in modification site CCCNs. This protein modification CCCN was used to construct a protein CCCN by merging all coclustered correlation edges into the gene names of modified proteins. The final gene CCCN represents the sum of modification correlations among all proteins (genes) that clustered together, whose modifications were detected in two or more experiments and whose Spearman correlation is greater than the absolute value of minimum densities shown in fig. S9B.

Pairwise-complete Euclidean distance and Spearman correlation

Pairwise-complete Euclidean distance and Spearman correlation were performed in R using previously described methods (10, 23). Embeddings were created from Euclidean distance; Spearman dissimilarity was defined as 1 − abs(cor); and hybrid SED was defined as the mean of normalized Euclidean and Spearman dissimilarity as 1 − cor.

Matrix decomposition

Matrix decomposition was performed using the methods described by Witten et al. (83) using the R package PMA (Penalized Multivariate Analysis). Parameter selection for PMD via cross-validation was first performed using the tuning function The function PMD, which applies an L1 penalty on the columns and rows, was used to obtain a matrix of 90 factors (equivalent in size to the original data) and 15 factors (the number of TMT runs). The original data were not centered for these calculations. Alternatively, the data were further standardized such that each column has zero mean and unit variance (normalized, centered, or nc). Groups with greater than three sites were evaluated.

t-SNE embeddings

t-SNE embeddings were created using Rtsne, the Barnes-Hut implementation of t-SNE. The pairwise-complete method produced clusters that were more readily distinguished (fig. S1), with other methods tending to crowd many sites into one or two very large clusters. This was mitigated in two ways, by reducing the perplexity in t-SNE and reducing vector length for inclusion of neighbors in clusters (the “too long” value) in the minimum spanning tree length to define clusters. These parameters were optimized as much as possible to avoid one large cluster and also many single-site clusters. The typical settings with Rtsne were dimensions = 3, perplexity = 15, and θ = 0.25, and the initial principal components analysis step turned off.

Random clusters

Random clusters were generated by sampling the total number of sites or genes from the data using a probability vector to obtain the average number of genes and sites similar to real clusters from all embeddings. The probability of random coclustering of different modification sites from the same protein was weighted to the number of modification sites per protein; that is, proteins with many modification sites in the data were more likely to cocluster in random clusters. The average %NA of random clusters was equal to that in the entire data set (fig. S2C).

Internal evaluations

Index is defined asIndex =[(1+realsamples)*(1+clearsites)/(1+percent NA)]/no.siteswhere intensity = total signal − (total signal × percent NA/100); clearsites = no: sites − sites culled by slope; realsamples = no: samples − (no: samples × percent single site samples × 100); and no. sites = number of modification sites in the cluster. This formula is modified from that previously defined (23): “single site samples” is the number of cases where a sample in the cluster contains only one site, and “single sample sites” represents the number of cases where a site in the cluster is represented in only one sample. The “culled by slope” function sorts sites and samples from largest to smallest within each cluster and measures the slope of the regression line for each site in all the samples. If the slope is negative, then the site follows the general pattern in the cluster. If the slope is positive, then the site is more highly expressed in different samples than the rest of the group and is culled.

External evaluation

The Pathway Commons data set was extracted from publicly available resources (20). Interactions were filtered to include only “in-complex-with,” “controls-phosphorylation-of,” “controls-state-change-of,” “controls-expression-of,” and “controls-transport-of.”The gene names of modified proteins were used to retrieve interactions among proteins within all clusters, and the number of edges was quantified.

PPI edges

PPI edges were retrieved from Pathway Commons, as described above, and from STRING, GeneMANIA (using the website or the Cytoscape plugin) (15, 16), BioPlex (12), and the kinase-substrate data from PhosphoSitePlus (13, 14). Text mining, colocalization, and coexpression edges were excluded to focus on interactions that are likely part of cell signaling pathways. We found it most useful to limit interactions to those defined by direct physical interactions from curated pathways. For example, genetic interactions may be direct or indirect, so they were excluded in some cases to focus on direct physical interactions (84). A CFN was constructed from a PPI data set with highly curated molecular interactions with a focus on direct interactions for which there is strong evidence (Fig. 8C) (20, 85).

Network analyses

For the CCCN, we used clustering to filter correlations between all sites. The resulting protein CCCN was used to create a CFN of PPI interactions that were filtered on the basis of clustering by excluding all interactions save those from proteins with coclustered modifications. To identify pathways that are likely to be active in lung cancer cell lines, shortest paths were determined using the R package igraph ( A composite of all the shortest paths was assembled for Figs. 2 (A and B) and 3D. Centrality was identified by node degree and betweenness and calculated using igraph functions. Networks were also graphed to show protein modifications with correlation edges from the CCCN. In the CCCN graphs, correlation edges depict PTMs that cocluster with edges representing Spearman correlations greater than the threshold of |0.543|, as defined in fig. S9. In addition, for Figs. 5B and 7 (B and D), negative correlations less than −0.5 are shown between different modifications within the same protein.

Website construction

To enable exploration of the networks in detail by other investigators, we developed a website for interactive visualization ( Our novel infrastructure allows a web client (browser) to interface to a server-based instance of Cytoscape. Networks and their underpinning data are stored on the server as R data files, and each web session corresponds to an instance of Cytoscape on the server. The web client presents a graph rendered by the Javascript library cytoscape.js. Actions on the site initiate queries to the Python server, which in turn uses the R-to-python interface (rpy2) to call the relevant R function. R communicates with the server-based instance of Cytoscape via the RCy3 library ( (86). The cyREST interface (87) returns updated graph information to the client, in the form of JSON data, by connecting Cytoscape to a Python Flask-WSGI server.

Cell fractionation

H3122 cells were treated with 1 nM geldanamycin for 15 hours or not treated, and then mechanically permeabilized using a Balch homogenizer (10, 46, 47). Permeabilized cells were centrifuged at 1000g for 10 min, and the organelles in the supernatant were applied to velocity sedimentation gradients, as described previously (10, 46). Four fractions were collected from organelle gradients, as shown in Fig. 9A. Lysosomes are found in fraction 1, and endosomes in fractions 2 and 3 (10, 46). The 1000g pellets (cell ghosts) were solubilized for 1.5 hours on ice in 1% Triton X-100 and subjected to floatation equilibrium gradients, as described, except that the 10,000g centrifugation step was omitted (10, 47). Four fractions from the floatation equilibrium gradients were collected: Fraction 1 is from the bottom of the gradient, fraction 2 contains the lipid rafts as previously characterized, and fractions 3 and 4 contain additional floating membranes at the top of the gradient, as shown in Fig. 9A. Samples were prepared for TMT mass spectrometry as described above, except without immunoprecipitation with modification-specific antibodies, so total protein amounts were measured; 10 mass tag labels were used (Thermo TMTplex); and samples were analyzed on an Orbitrap Fusion Lumos (Thermo Fisher Scientific). The iBAQ method was used to normalize signals, where a protein’s total intensity is divided by the number of tryptic peptides between 6 and 30 amino acids in length (88). Western blots with lysates from H3122 and H3255 cells were then performed to verify results from select proteins in duplicate experiments. Antibodies to β-actin (#4970), phospho-CTTN Tyr421 (#4569), CTTN (#3503), phospho-ERK1/2 Thr202/Tyr204 (#4370), CCT2 (#3561), CLTC (#4796), CRKL (#3182), LASP1 (#8636), and YWHAZ (#7413) were from Cell Signaling Technology; antibody to ERK1/2 (sc-93) was from Santa Cruz Biotechnology. All antibodies were incubated at 1:1000 (except phospho-ERK1/2, 1:2000) overnight at 4°C according to the manufacturers’ protocol. Duplicate blots were used to probe phospho- and non–phospho-CTTN and ERK1/2. Western blot chemiluminescent signals were obtained on a Fuji LAS-3000 and quantified using Image Gauge software. The amount of each protein in each fraction as a percent in the whole cell was then calculated by dividing Western blot or mass spectrometry signals by the total in all fractions (10, 47). Ratios were then calculated to compare geldanamycin-treated to control samples.


Text S1. Data analysis considerations and network development.

Fig. S1. Example t-SNE embeddings.

Fig. S2. Initial evaluation of clusters derived from t-SNE embeddings.

Fig. S3. Correlation between replicates.

Fig. S4. Number of edges per gene in clusters.

Fig. S5. Correlation between degree and betweenness.

Fig. S6. CFN betweenness is not biased by overrepresentation in PPI databases.

Fig. S7. The number of posttranslational modifications per protein weakly correlates with CFN betweenness.

Fig. S8. The effect of applying PTM correlation threshold on CFNs.

Fig. S9. Network density as a function of PTM correlation threshold defines a minimum for construction of a PTM CCCN.

Fig. S10. Relative network density of drug-affected proteins.

Fig. S11. EP300 interactions with PTM-modifying enzymes.

Fig. S12. Combined EP300 CCCN and CFN.

Fig. S13. Comparison of tyrosine phosphorylation, serine/threonine phosphorylation, and acetylation.

Fig. S14. Western blot data from cell fractionation experiments.

Fig. S15. Separation of tyrosine and serine/threonine phosphorylation and relative abundance compared to paxDB.

Table S1. Proteins with PTMs that have negative correlations.

Table S2. Amount of proteins in cell fractions.

Data file S1. Cytoscape file: Complete cocluster correlation network.

References (8991)


Acknowledgments: M.G. and M.C. dedicate this paper to the memory of E. Herbert. We thank S. Beausoleil for data wrangling and discussions, A. Guo for antibody information, N. Fernandez for the gene expression clustergrammer visualization, and J. Syrenne for comments on the manuscript. Experimental data were obtained at Cell Signaling Technology. Funding: This work was supported by the NIH grants BD2K LINCS Data Coordination and Integration Center (grant number U54HL127624) and Knowledge Management Center for the Illuminating the Druggable Genome project (grant number U24CA224260). Author contributions: M.G. performed the data analysis and cell fractionation experiments and wrote the manuscript; J.G., W.C., and T.W. constructed the website; B.H., T.L., L.F., and K.R. performed the experiments; K.R., T.L., and M.C. designed the experiments; B.Z., E.S., N.R.C., A.L., P.H., and A.M. contributed to data curation and analysis; M.C. funded the experiments; and A.M. edited the manuscript and managed the funding for data analysis. Competing interests: M.C. is the founder and CEO of Cell Signaling Technology and Bluefin Biomedicine. All the other authors declare that they have no competing interests. Data and materials availability: There are patents on the generation of the motif antibodies and PTM scan technology used in this study (“Production of Motif-Specific and Context-Independent Antibodies Using Peptide Libraries as Antigens,” U.S. Pat. No. 9085609 and US Pat. No. 9738711). Data are available at, and networks may be explored at and All other data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
View Abstract

Navigate This Article