Research Resource: Posttranslational Modification

Proteome-Wide Identification of SUMO2 Modification Sites


Science Signaling  29 Apr 2014:
Vol. 7, Issue 323, pp. rs2
DOI: 10.1126/scisignal.2005146


Posttranslational modification with small ubiquitin-like modifiers (SUMOs) alters the function of proteins involved in diverse cellular processes. SUMO-specific enzymes conjugate SUMOs to lysine residues in target proteins. Although proteomic studies have identified hundreds of sumoylated substrates, methods to identify the modified lysines on a proteomic scale are lacking. We developed a method that enabled proteome-wide identification of sumoylated lysines that involves the expression of polyhistidine (6His)–tagged SUMO2 with Thr90 mutated to Lys. Endoproteinase cleavage with Lys-C of 6His-SUMO2T90K–modified proteins from human cell lysates produced a diGly remnant on SUMO2T90K-conjugated lysines, enabling immunoprecipitation of SUMO2T90K–modified peptides and producing a unique mass-to-charge signature. Mass spectrometry analysis of SUMO-enriched peptides revealed more than 1000 sumoylated lysines in 539 proteins, including many functionally related proteins involved in cell cycle, transcription, and DNA repair. Not only can this strategy be used to study the dynamics of sumoylation and other potentially similar posttranslational modifications, but also, these data provide an unprecedented resource for future research on the role of sumoylation in cellular physiology and disease.


Posttranslational modification alters the activity, function, and fate of modified proteins. There are many types of posttranslational modifications ranging in size from small chemical groups, such as phosphate, to large protein molecules, like ubiquitin and ubiquitin-like proteins (Ubls). The diversity and number of posttranslational modifications increase the complexity of the proteome by several orders of magnitude. In addition to ubiquitin, the mammalian Ubl family includes at least 11 other proteins that conjugate to lysine residues and share a highly conserved structural fold in spite of low sequence conservation (1). Small ubiquitin-like modifiers (SUMOs) are essential for cell viability (2) and cellular responses to stress conditions, including heat shock (3, 4), proteasome inhibition (5, 6), and DNA damage (7, 8). Mammalian cells express three SUMO paralogs that are conjugated to target proteins. SUMO2 and SUMO3 differ by only three amino acids, and share 46 to 48% amino acid identity with SUMO1 (9). SUMO2 and SUMO3 form chains (10, 11) that can promote ubiquitin-mediated degradation of target proteins (12, 13).

SUMO maturation and conjugation is a multistep process. Translation of mRNAs encoding SUMO1, SUMO2, and SUMO3 produces inactive pro-proteins that are activated by SUMO-specific proteases, which remove the inhibitory C-terminal residues, exposing two glycines known as a diGly motif. SUMO conjugation involves three distinct enzymatic activities, known as E1, E2, and E3. The heterodimeric E1, composed of SUMO-activating enzymes 1 and 2 (SAE1 and SAE2), uses ATP (adenosine 5′-triphosphate) to form a thioester bond between the sulfhydryl group of the Cys in its active site and the carboxyl group of the C-terminal Gly of SUMO. The Cys in the active site of the E2 enzyme Ubc9 accepts SUMO from the E1 enzyme by transthiolation. SUMO E3 ligases catalyze the transfer of SUMO from Ubc9 to substrate proteins, forming an isopeptide bond between the ε-amino group of the Lys in the substrate protein and the C-terminal carboxyl of Gly in SUMO. SUMO E3 ligases fall into two classes: PIAS [protein inhibitor of activated STAT (signal transducer and activator of transcription)] proteins are similar to RING (really interesting new gene) domain–containing ubiquitin ligases, whereas other SUMO E3 ligases, such as RanBP2 (14), do not contain RING domains (15). The removal of SUMO from substrate proteins is catalyzed by SUMO-specific proteases, including six sentrin-specific proteases (SENPs), which cleave isopeptide bonds between the SUMO C-terminal carboxyl group and substrate proteins (16).

The identity of SUMO-modified proteins is typically determined by either mutational analyses or large-scale proteomic approaches. Both strategies rely on the enrichment of SUMO substrates from complex protein mixtures because the sumoylated form of a protein constitutes only a small proportion of the total protein abundance. Affinity purification can be used to enrich for SUMO substrates from cultured cell lysates (17) and knock-in mice (18) expressing tagged forms of SUMO. Combining stringent enrichment methods with mass spectrometry (MS)–based proteomics enables the identification of hundreds of substrates in a single experiment (4, 6, 19). SUMO-specific antibody-based substrate enrichment methods can be used to identify endogenous SUMO substrates by MS (20, 21). Moreover, SUMO interaction motifs (SIMs) have been exploited to enrich for endogenous SUMO substrates (22). Together, these methods have identified hundreds of putative SUMO substrates, but definitive evidence for sumoylation requires information about the specific lysine that is modified.

Until now, no MS-based technique has enabled researchers to identify site-specific sumoylation on a scale of hundreds of sites in a single experiment. The primary limitation is the inherent complexity of peptide mixtures derived from protein-level purifications of SUMO-modified proteins. Peptide-level enrichment strategies for other posttranslational modifications, including phosphorylation and acetylation, improve the frequency and quality of modified peptide identification (23, 24). Likewise, immunoaffinity purification of ubiquitylated peptides using antibodies specific to the diGly tryptic remnant on ubiquitin-conjugated lysines (25) enables the detection of more than 10,000 sites of ubiquitylation in a single study (26, 27).

The identification of sumoylated sites is limited by the fact that “bottom-up” MS-based proteomic studies typically use trypsin for protein digestion, which creates large branched peptides that are not amenable to standard database search algorithms. Branched side-chain remnants resulting from trypsin digestion of mammalian SUMO1, SUMO2, or SUMO3 comprise either 19 or 32 amino acids of the SUMO C terminus covalently bound to Lys on the substrate peptide. Whereas these remnants have a unique mass-to-charge signature, their fragmentation results in complex MS2 spectra that complicate the identification of the substrate sequence. Various approaches used to interpret MS2 spectra of branched sumoylated peptides (28–30) have limited utility in complex mixtures associated with proteomic analyses (31). As an alternative approach, trypsin recognition sites engineered into the C terminus of SUMO can produce shorter branched side chains more amenable to MS-based analyses (32, 33). For example, sumoylation by exogenously expressed Q87R or T90R mutated SUMO2 yields modified substrates that, when digested with trypsin, produce remnants of five or two amino acids, respectively, that can be detected by MS to identify SUMO conjugation sites (29).

To enable the large-scale site-specific analysis of sumoylation, we developed a new peptide-specific enrichment approach. We stably expressed polyhistidine (6His)–tagged SUMO2 with Thr90 mutated to Lys (SUMO2T90K) in human embryonic kidney (HEK) 293 cells. Cells were heat-shocked to enhance sumoylation, SUMO2T90K-conjugated proteins were purified under denaturing conditions using the His tag, and after cleavage with endoproteinase Lys-C, a diGly-Lys–specific antibody was used to enrich SUMO2T90K remnant–containing peptides. Analysis of the resultant peptide mixtures identified more than 1000 sites of modification by SUMO2, providing an unprecedented resource for future studies. In addition, although SUMO2 was used in this study to allow comparison with the existing data, the principles of this strategy could be applied to other SUMOs and Ubls in various cellular systems.


Functional analysis of SUMO2T90K

Digestion of ubiquitinated proteins with trypsin gives rise to a diGly remnant linked to the acceptor lysine, which can be used to enrich for modified peptides by affinity purification with a selective diGly-Lys–specific monoclonal antibody (anti-KεGG) (25). The Ubl family members ubiquitin, NEDD8, and ISG15 each contain an Arg preceding a diGly sequence at the C terminus (Fig. 1A). Thus, trypsin-mediated cleavage of proteins conjugated to these Ubls creates an identical branched side chain consisting of diGly conjugated to Lys. We mutated Thr90 to Lys in SUMO2 (SUMO2T90K), creating a Lys N terminal to the diGly motif at the C terminus (Fig. 1A), and generated HEK293 cells stably expressing 6His-SUMO2T90K. SUMO2T90K-expressing cells had a doubling time and morphology comparable to those of parental cells (fig. S1) and responded to heat shock by increasing the abundance of SUMO-modified proteins (Fig. 1B).
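The logic of the T90K design can be illustrated with a minimal in silico digestion sketch. The function name `branch_remnant` and the short C-terminal sequences below are illustrative assumptions (the tails are truncated and should be checked against a sequence database), and the protease model is deliberately simplified: it cuts after every listed residue and ignores exceptions such as cleavage suppression before proline.

```python
def branch_remnant(ubl_c_terminus: str, cleave_after: str) -> str:
    """Return the Ubl residues left on the substrate lysine after digestion.

    The protease is modelled as cutting C-terminal to every residue listed in
    `cleave_after`; the remnant is whatever follows the last such cut.
    If no cleavage residue is present, the whole input tail is returned.
    """
    last_cut = max(ubl_c_terminus.rfind(residue) for residue in cleave_after)
    return ubl_c_terminus[last_cut + 1:]


TRYPSIN = "KR"  # cuts after Lys or Arg (simplified)
LYS_C = "K"     # cuts after Lys only

# Assumed, truncated C-terminal tails of the mature proteins.
UBIQUITIN_TAIL = "LVLRLRGG"
SUMO2_WT_TAIL = "TIDVFQQQTGG"
SUMO2_T90K_TAIL = "TIDVFQQQKGG"

print(branch_remnant(UBIQUITIN_TAIL, TRYPSIN))  # diGly remnant
print(branch_remnant(SUMO2_WT_TAIL, LYS_C))     # no Lys: the whole tail remains
print(branch_remnant(SUMO2_T90K_TAIL, LYS_C))   # diGly remnant
```

In this simplified model, only the T90K tail yields the same two-residue remnant after Lys-C digestion that ubiquitin yields after trypsin digestion, which is what makes the peptides recognizable by a diGly-Lys–specific antibody.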

Fig. 1 Alignment of human Ubls and functional analysis of SUMO2T90K.

(A) Sequence alignment of the C terminus of human Ubl family proteins that terminate in a diGly motif. Sequence of the predicted peptide after Lys-C digestion is highlighted in bold. Lys and Arg are underlined. (B) Western blots for 6His (top) and SUMO2 (bottom) in lysates of parental HEK293 N3S cells or those stably expressing 6His-SUMO2T90K and either unstressed (37°C) or heat-stressed for 30 min at 43°C. *Nonspecific immunoreactivity. Data are representative of three independent experiments. (C and D) Images of Coomassie-stained protein gels of in vitro sumoylation (C) or desumoylation (D) reactions of recombinant wild-type or T90K mutant SUMO2 using SP100 as a substrate. Data are representative of three independent experiments.

To assess whether the SUMO2T90K mutation had any effect on SUMO2 conjugation or cleavage, we performed in vitro enzyme assays. We found that purified SUMO2 and SUMO2T90K showed little difference in the rate of polymerization and conjugation to the substrate proteins SP100 (Fig. 1C), IRF2, RanGAP1, or PML (fig. S2A). Likewise, SUMO2 and SUMO2T90K had similar effects on the rate of cleavage from these substrates by SENP1 (Fig. 1D and fig. S2B). Finally, SENP1 and SENP2 cleaved the inactive proforms of recombinant SUMO2 and SUMO2T90K at a similar rate (fig. S2C).

Proteome-wide identification of sumoylation sites in human cells

We established a workflow using 6His-SUMO2T90K–expressing cells that enabled us to select for SUMO-modified proteins and enrich for peptides containing diGly-conjugated lysines (Fig. 2A). Cells (~2 × 10^9) were heat-stressed at 43°C for 30 min to enhance sumoylation (3) and lysed under denaturing conditions. Proteins conjugated to 6His-SUMO2T90K were isolated by nickel affinity chromatography and subsequently digested with Lys-C, or Lys-C and Glu-C to truncate long substrate peptides. Peptides containing diGly-Lys were enriched with anti-KεGG and analyzed by MS using parameters optimized for the identification of low-abundance peptides. Using this strategy, we identified 2747 peptides, of which 1217 (44%) contained a diGly-Lys motif (table S2). These peptides represented 1002 unique sumoylated sites in 539 proteins (table S1).

Fig. 2 The identification of 1002 SUMO2T90K-modified sites in human cells.

(A) Depiction of the biochemical purification strategy. LC-MS/MS, liquid chromatography–tandem MS. (B) Graph showing the number of sumoylated (gray) or total (black) peptides after nickel affinity chromatography (Ni-NTA) and subsequent enrichment with diGly-Lys–specific antibody (KεGG). Samples were analyzed by MS set to analyze the 10 most abundant peptides with 60-ms fill time (T10 60 ms) or the single most abundant peptide with a 1-s fill time (T1 1s). The number of unique SUMO modification sites is shown in brackets. (C) Venn diagram showing comparison among sumoylated sites identified in this study, in an MS-based study using Lys-deficient 6His-SUMO2Q87R/T90R–expressing HeLa cells (29), and in all other studies using MS as annotated in the PhosphoSitePlus database (34). The total number of sumoylation sites is shown in brackets.

To estimate the reproducibility of the SUMO2T90K-based enrichment strategy, we repeated the experiment. We performed the SUMO2T90K protein and peptide enrichment workflow in duplicate on a lysate of HEK293 culture. In this analysis, we identified 468 sumoylated sites in common with the 1002 SUMO2T90K-modified sites found in the first experiment, of which 323 (69%) were common between replicate purifications (fig. S3), indicating a high degree of reproducibility. In addition, this analysis suggested that relatively small-scale experiments (~6 × 10^8 cells) are sufficient to identify large numbers of sumoylated peptides. Moreover, we determined that setting the mass spectrometer to fragment and scan the most abundant peptide from the initial full MS scan (“top 1”) with a maximum 1-s fill time yielded almost twice the number of successful identifications of sumoylated peptides as using “top 10” data-dependent acquisition with a 60-ms fill time (Fig. 2B). From these data, we determined that antibody-based enrichment of SUMO2T90K-modified peptides increased the percentage of identified sumoylated peptides from 0.05% to greater than 31% compared to nickel affinity purification alone (Fig. 2B).
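The summary figures quoted in this section follow from simple ratios of the reported counts. The short sketch below recomputes them; the helper name `percent` is an illustrative assumption, and the value 31% is used as a lower bound because the text reports "greater than 31%".

```python
def percent(part: float, whole: float) -> float:
    """Express `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Share of identified peptides that carried a diGly-Lys motif.
digly_share = percent(1217, 2747)

# Share of sites common to both replicate purifications.
replicate_overlap = percent(323, 468)

# Lower bound on the enrichment gained from the antibody step
# (0.05% sumoylated peptides before, more than 31% after).
fold_enrichment = 31.0 / 0.05

# Average number of identified sites per modified protein.
sites_per_protein = 1002 / 539

print(f"diGly-Lys peptides: {digly_share:.1f}%")
print(f"replicate overlap:  {replicate_overlap:.1f}%")
print(f"fold enrichment:    >{fold_enrichment:.0f}")
print(f"sites per protein:  {sites_per_protein:.2f}")
```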

We also compared our data to published MS-based analyses of posttranslational modifications. SUMO2T90K-modified sites overlapped with 55% of SUMO2-modified sites detected in the largest MS-based study of sumoylation (29) and 39% of sites identified in all MS-based studies of sumoylation (34) (Fig. 2C). Slightly less than half of SUMO2T90K-modified proteins overlapped with proteins identified in published studies involving protein-level SUMO substrate identification (fig. S4, A and B). However, the fact that these studies used different cell types, which likely contain different sets of sumoylated proteins, may, at least in part, explain the relatively low overlap. Comparison of functional annotations of sumoylated proteins discovered here and in other large-scale SUMO2 proteomic studies revealed similar enrichment of sumoylated proteins in cellular processes, including gene expression, RNA metabolism, DNA replication, repair and recombination, and cell proliferation, survival, and growth (fig. S4C). Only 146 SUMO2T90K-modified sites (14.6%) overlapped with known sites of acetylation (34) (table S1), and 378 SUMO2T90K-modified sites (37.8%) overlapped with known sites of ubiquitylation (34) (table S1). Whereas the accuracy of the estimated percentage of overlap among these studies could be compromised by the relative completeness of the data sets, these observations at least suggest that ubiquitination and, less commonly, acetylation can occur on the same residues as sumoylation.

Sequence context of SUMO2 modification sites

Initial findings of a consensus motif for SUMO modification [ψKxE or ψKxD, where ψ represents a hydrophobic residue, and x represents any amino acid (35)] focused experimental investigation toward these sites, which may explain the fact that most SUMO-modified sites discovered to date conform to this consensus. Small-scale proteomic studies confirmed this, but also identified a less common inverted consensus motif DxKψ or ExKψ (29). Our SUMO2T90K-modified peptide data set enabled us to investigate the sequences of sumoylation sites using robust statistical analyses. We used pLogo (36) to analyze the sequence of diGly-Lys–containing peptides. This analysis revealed a significant overrepresentation of Glu two amino acids C-terminal to the conjugated Lys (position +2), of hydrophobic residues Ile or Val at position −1, and, to a lesser degree, of the acidic residues Asp or Glu at position −2 (Fig. 3A). This result is consistent with a mixed population of sumoylated peptides containing forward and inverted sites. Unexpectedly, we also found that 90 sumoylated peptides had Glu or Asp at both positions +2 and −2 simultaneously (table S1), suggesting that some sites contain “forward and inverted”–type consensus motifs. We separately analyzed sites containing forward, inverted, or forward and inverted consensus motifs for additional conserved residues. As expected, most sequences with forward consensus motifs contained Glu at position +2 (90%) and a large hydrophobic residue at position −1, typically either Val (31.7%) or Ile (26.5%) (Fig. 3B), although Leu, Phe, and Pro were also overrepresented. For sequences with inverted consensus motifs, there was an approximately equal likelihood of Glu or Asp at position −2, but unexpectedly, no preference for large hydrophobic residues at position +1 (Fig. 3C). Instead, there was an overrepresentation of Val or Ile at position −1 and Glu at position +2, largely in peptides with the forward and inverted motif. 
Together, sites with forward or inverted consensus motifs represented 72.6% of all SUMO modification sites. The sequences of the remaining sumoylated sites did not show a strong consensus, although there was a small but significant overrepresentation of Gly at position +2 (Fig. 3D).
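The grouping of sites used in this analysis can be sketched as a small classifier. It follows the Fig. 3 grouping (an acidic residue at position +2 for forward, at position −2 for inverted), not the full ψKxE pattern; the function name, the labels, and the example windows are hypothetical and do not come from table S1. The final lines check that the reported counts are mutually consistent under the assumption that the 90 sites with both motifs are counted in both the 534-site and the 283-site subsets.

```python
ACIDIC = set("DE")


def classify_site(window: str) -> str:
    """Classify an odd-length sequence window centred on the modified Lys."""
    centre = len(window) // 2
    if window[centre] != "K":
        raise ValueError("window must be centred on a lysine")
    forward = window[centre + 2] in ACIDIC   # acidic residue at position +2
    inverted = window[centre - 2] in ACIDIC  # acidic residue at position -2
    if forward and inverted:
        return "forward and inverted"
    if forward:
        return "forward"
    if inverted:
        return "inverted"
    return "nonconsensus"


# Hypothetical 13-residue windows, for illustration only.
for window in ("AAAAAVKAEAAAA", "AAAAEAKVAAAAA", "AAAADVKAEAAAA", "AAAAAAKAGAAAA"):
    print(window, classify_site(window))

# Consistency check on the reported counts (inclusion-exclusion).
total, acidic_plus2, acidic_minus2, both = 1002, 534, 283, 90
forward_or_inverted = acidic_plus2 + acidic_minus2 - both
residual = total - forward_or_inverted
consensus_percent = 100.0 * forward_or_inverted / total
print(forward_or_inverted, residual, round(consensus_percent, 1))
```

Under that assumption the counts give 727 consensus sites and 275 residual sites, in line with the 72.6% and the 275 residual lysines reported.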

Fig. 3 Sequence analysis of SUMO2T90K-modified peptides.

(A to D) Sequence logo graphs of amino acid sequence conservation surrounding (A) 1002 SUMO2T90K-modified sites, (B) a subset of 534 sites with acidic residue (Glu or Asp) in position +2, (C) a subset of 283 sites with acidic residue (Glu or Asp) in position −2, and (D) 275 residual lysines. The fraction in the parentheses represents the number of sites with a full 13–amino acid sequence divided by the total number of sites that conformed to that motif. The y axis corresponds to the log-odds of the binomial probability (π). Threshold values of 3.68 (P < 0.05) are shown in red and marked with red horizontal lines. Note different scales of y axes.
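The legend does not say how the threshold of 3.68 is derived. One reading, offered here only as an assumption, is a Bonferroni correction of P < 0.05 over 20 amino acids at each of the 12 non-fixed positions of a 13-residue window, expressed as a log-odds value. The sketch below shows that this assumption reproduces the stated number; the function name and its parameters are illustrative.

```python
import math


def bonferroni_log_odds_threshold(alpha: float = 0.05,
                                  residues: int = 20,
                                  positions: int = 12) -> float:
    """Log10 odds corresponding to a Bonferroni-corrected significance level.

    Assumes one test per residue per non-fixed position (20 x 12 = 240 tests).
    """
    corrected_alpha = alpha / (residues * positions)
    return math.log10((1.0 - corrected_alpha) / corrected_alpha)


print(round(bonferroni_log_odds_threshold(), 2))
```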

Potential for multisite modification

Reduced electrophoretic mobility of sumoylated proteins has led researchers to predict that proteins contain multiple SUMO modification sites (22). We found that about one-third of SUMO2T90K-modified proteins had more than one sumoylation site (Fig. 4A and table S1). For example, PARP1 had 15 sites, hnRNPM had 14 sites, and hnRNPU, GTF2I, and Uba2 had 12 sites (Fig. 4B and table S1). Because Uba2 is a component of the heterodimeric E1 SUMO-activating enzyme (37, 38), this suggests that it may undergo autosumoylation. Sumoylation sites were clustered in some proteins (Fig. 4B), implying that these regions are more susceptible to modification. We were not able to determine whether multiple sumoylation sites were simultaneously modified for most proteins due to methodological limitations. However, because Lys-C is inhibited by posttranslational modification of Lys, we were able to identify a few peptides with simultaneous sumoylation of adjacent lysines (table S1 and SMfile1.PDF). SUMO1, SUMO2, and SUMO3 have multiple posttranslational modifications, including acetylation at their N terminus and sumoylation on internal lysines (table S1 and SMfile1.PDF). The major site involved in chain formation on SUMO2 and SUMO3 is Lys11 (10), which we frequently detected as a SUMO2T90K-modified site. We also detected doubly modified peptides of SUMO2 and SUMO3, including one modified at Lys11 and Lys7, and the other modified at Lys5 and Lys7 (table S1 and SMfile1.PDF), suggesting that multiple branching patterns may occur.

Fig. 4 Evidence for multisite modification by SUMO2T90K.

(A) Graph of the number of SUMO2T90K-modified sites per protein. (B) Graphical representation of clustering of sumoylation sites in highly modified proteins (nine or more sites per protein). The length of the bar corresponds to the number of amino acids in the protein, and the red triangles indicate the location of sumoylation sites. The numbers of sumoylation sites and amino acids in the protein are shown in parentheses.

Evidence for protein group modification

SUMO modification of diverse substrates is transiently increased in response to proteotoxic stress (3, 4, 6). Analysis of the functional annotations of 539 SUMO2T90K-modified proteins in our data set derived from heat-shocked cells revealed a statistical overrepresentation of proteins involved in highly interconnected complexes (Fig. 5) that regulate gene expression; DNA replication, recombination, and repair; and cell growth and proliferation (fig. S4C). Zinc finger transcription factors were frequently identified in our analysis with evidence for 85 SUMO2T90K-modified sites distributed among 52 different proteins (table S1). Several studies investigating individual zinc finger transcription factors identified sites of SUMO modification in these proteins by mutational analysis (39) and demonstrated that sumoylation is generally associated with transcriptional repression (40). Similarly, 78 sumoylation sites were present in 14 heterogeneous ribonucleoprotein particle proteins (hnRNPs) (Fig. 5), consistent with previous studies (41). Our data extended previous observations by demonstrating extensive SUMO modification of hnRNPs, including 14 sites on hnRNPM, 11 sites on hnRNPU, 9 sites on hnRNPA0, 9 sites on hnRNPUL1, and 8 sites on hnRNPC. In addition, as expected from a previous analysis of endogenous sumoylated proteins (22), we identified multiple SUMO2T90K-modified proteins involved in DNA replication, recombination, and repair (Fig. 5). Thus, these results suggest that groups of functionally related proteins are likely subjected to contemporaneous SUMO modification.

Fig. 5 Proteins with multiple SUMO2T90K-modified sites were present in functionally related protein interaction networks.

Merged display of four protein interaction networks of selected SUMO2T90K-modified proteins involved in RNA posttranscriptional modification; DNA replication, recombination, and repair; gene expression; cell cycle; and cellular development. The size of the nodes is proportional to the number of identified sumoylation sites. Network information, protein details, and modification site numbers can be found in table S3.

A database of virtual sites to search for modifications with wild-type SUMO

Branched peptides generated from tryptic digestion of proteins modified by wild-type SUMOs produce complex MS2 fragmentation spectra that are not recognized by standard annotation algorithms (11, 28). To enable automated searching for these peptides, we used our list of SUMO2T90K-modified sites to generate virtual branched peptide databases of artificial peptide sequences. In this database, the C-terminal tryptic peptide of either SUMO1 or SUMO2 was fused to the N terminus of the predicted tryptic peptide corresponding to observed SUMO2T90K-modified sites (Fig. 6A and SUMO1.FASTA and SUMO2.FASTA).
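The construction of a virtual branched peptide can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build SUMO1.FASTA and SUMO2.FASTA: the remnant sequences are the 19- and 32-residue C-terminal tryptic peptides as commonly reported and should be verified against a sequence database, the function names and the example protein are hypothetical, and the digestion model is simplified (it cuts after every Lys or Arg except the modified Lys and ignores the proline rule).

```python
# Assumed C-terminal tryptic peptides of mature wild-type SUMO1 and SUMO2.
SUMO_TRYPTIC_REMNANT = {
    "SUMO1": "ELGMEEEDVIEVYQEQTGG",
    "SUMO2": "FDGQPINETDTPAQLEMEDEDTIDVFQQQTGG",
}


def tryptic_peptide_around(protein: str, site: int) -> str:
    """Return the tryptic peptide spanning a modified Lys (1-based `site`).

    The modified Lys itself is treated as a missed cleavage, so the peptide
    runs from the previous Lys/Arg to the next Lys/Arg after the site.
    """
    index = site - 1
    if protein[index] != "K":
        raise ValueError("site does not point at a lysine")
    start = index
    while start > 0 and protein[start - 1] not in "KR":
        start -= 1
    end = index + 1
    while end < len(protein):
        residue = protein[end]
        end += 1
        if residue in "KR":
            break
    return protein[start:end]


def virtual_branched_peptide(protein: str, site: int, paralog: str) -> str:
    """Fuse the SUMO remnant to the N terminus of the substrate peptide."""
    return SUMO_TRYPTIC_REMNANT[paralog] + tryptic_peptide_around(protein, site)


# Hypothetical substrate sequence with a modified Lys at position 7.
example_protein = "MARTLVKAEDSPRGG"
entry = virtual_branched_peptide(example_protein, 7, "SUMO2")
print(">example_K7_SUMO2")
print(entry)
```

Writing each fused sequence as its own FASTA entry lets a standard search engine treat the branched peptide as a linear one, at the cost of only finding sites that are already in the template list.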

Fig. 6 Using a virtual branched peptide database to identify wild-type sumoylation sites.

(A) Bioinformatic approach for the analysis of MS2 spectra of tryptic branched peptides derived from proteins modified by wild-type SUMO1 or SUMO2. The list of SUMO2T90K-modified sites was used as a template to create a database of virtual branched peptides (11). Each virtual peptide consists of a C-terminal tryptic fragment of SUMO1 or SUMO2 joined to the N terminus of the tryptic peptide encompassing the sumoylated Lys. This database was used to annotate spectra in MS-based proteomics data. (B) Annotated MS2 spectra of a peptide of c-Myb modified at Lys648 by SUMO2T90K (top) or by endogenous SUMO (bottom). (C) Annotated MS2 spectra of a peptide of ubiquitin modified at Lys11 by SUMO2T90K (top) or by TAP-SUMO2 (bottom).

Many SUMO2T90K-modified peptides were also phosphorylated (table S1). Therefore, we used the virtual branched peptide databases to search raw data files containing high-resolution spectra from an analysis of phosphopeptide-enriched samples (42). We identified a branched peptide using the virtual branched SUMO2 database that contained the long tryptic remnant of wild-type SUMO2 conjugated to Lys648 of the transcription factor c-Myb (Fig. 6B). In the SUMO2T90K data, we detected only the Ser653 phosphorylated form of the peptide (table S1), consistent with the fact that the sequence surrounding Lys648 on c-Myb corresponds to a phosphorylation-dependent sumoylation motif (ψKxExxSP). This observation is consistent with independent studies that show sumoylation of Lys648 (43, 44) and phosphorylation of Ser653 (45–47).

To identify additional branched peptides modified by wild-type SUMO2, we used MS to reanalyze peptides from a proteomic study using exogenously expressed SUMO2 fused to a tandem affinity purification (TAP) tag (6) and searched the resultant spectra using the virtual branched SUMO2 peptide database. We identified 14 additional sumoylated peptides (SMfile2.PDF), including one containing sumoylation of Lys11 of ubiquitin (Fig. 6C). These data validated the SUMO2T90K results and demonstrated that using virtual branched peptide databases based on the SUMO2T90K data for reanalysis of existing MS2 spectra may be useful for identifying sites sumoylated by both endogenous and overexpressed wild-type SUMO2.


Methods to enrich for posttranslationally modified peptides are instrumental in identifying modified sites on a proteome-wide scale by MS. Without enrichment, the relatively low abundance of modified peptides causes most of them to go undetected in complex mixtures. Previous studies have used strategies that employ overexpression of SUMO with mutations in the C terminus that shorten the side-chain remnants on substrate proteins after trypsin digestion (29, 32, 48). These strategies use affinity chromatography to enrich for proteins modified by tagged SUMO but lack efficient peptide-based enrichment steps. To enrich for sumoylated peptides, we created a stable cell line expressing 6His-SUMO2T90K and performed metal affinity purification of SUMO2T90K-modified proteins and, after cleavage with Lys-C, antibody-based enrichment of sumoylated peptides with anti-KεGG. Analysis of diGly-Lys–containing peptides by MS with instrument settings optimized for the detection of low-abundance peptides identified 1002 sumoylation sites among 539 proteins. Peptide level enrichment increased the frequency of identification of sumoylated peptides by more than 600-fold.

Previous studies suggest that sumoylation primarily occurs on proteins containing the forward consensus motifs ψKxE and ψKxD or the inverted consensus motifs ExKψ and DxKψ (29). However, our study revealed that, for sites with the forward consensus motif, more than 90% contained Glu at the +2 position, suggesting that ψKxE may be a better substrate for sumoylation than ψKxD. Glu and Asp were equally likely at the −2 position in the inverted consensus motifs containing peptides, and a hydrophobic residue at +1 was not overrepresented. Together, the forward and inverted consensus motifs accounted for more than 70% of SUMO2T90K-modified peptides. The sequences of the remaining SUMO2T90K-modified peptides showed no consensus except for the modest overrepresentation of Gly at position +2. Because Gly does not have a side chain, this may suggest that steric hindrance inhibits sumoylation at nonconsensus motifs.

Previously, we used MS to identify sumoylation sites on proteins that were sumoylated in vitro using wild-type recombinant SUMO2 and digested with trypsin (49). However, this approach is not conducive to the analysis of complex protein mixtures, such as those derived from purification of cell lysates. Here, we used the list of SUMO2T90K-modified peptides to create a virtual branched peptide database to search for branched peptides resulting from the trypsin-mediated cleavage of sumoylated proteins. This approach enabled the annotation of previously unassigned MS2 spectra corresponding to proteins sumoylated with endogenous or overexpressed tagged wild-type SUMOs. Thus, this method could be used to search for sumoylation sites in existing proteomic data sets designed to address various biological questions.

Consistent with previous studies showing that groups of functionally related proteins are coordinately modified by SUMO (4, 6, 22, 50), we found evidence for group modification. Specifically, we found 85 sites distributed among 52 different zinc finger proteins, 78 sites on only 14 hnRNPs, and an extensive network of proteins involved in DNA replication, recombination, and repair and in the cell cycle that bore many sites of SUMO modification. This supports the idea of protein group modification by SUMO, in which the limited number of SUMO E3 ligases and proteases appears to direct SUMO to large protein complexes, which may be stabilized by multiple SUMO-SIM interactions (50).

Most proteomic studies to date have focused on SUMO2. However, the method we described here is broadly applicable to other SUMO paralogs and other Ubls. Thus, this method may be useful in addressing key questions about the biology and biochemistry of Ubls. For example, which sumoylation sites are shared among SUMO paralogs and which are unique, is there site specificity in the sumoylation response to stresses, and what are the sites of modification of other Ubls?


Generation of HEK293 N3S cells stably expressing 6His-SUMO2T90K

pEFIRESpuro-6His-SUMO2 was created by cloning a polymerase chain reaction–generated 6His-SUMO2 fusion into the Nhe I and Not I sites of the plasmid vector pEFIRES-P-eYFP-C1 (51), replacing the coding sequence of eYFP (enhanced yellow fluorescent protein). The SUMO2T90K mutation was introduced into pEFIRESpuro-6His-SUMO2 by site-directed mutagenesis. The SUMO2 coding regions of all plasmids were fully sequenced. HEK293 N3S cells (Sigma-Aldrich, 92052131) grown in suspension culture were transfected using Lipofectamine 2000 (Life Technologies) with pEFIRESpuro-6His-SUMO2T90K and selected with puromycin at 2 μg/ml. Thereafter, stable cell populations were maintained in growth medium containing puromycin (1 μg/ml).

Cell culture and protein extraction

HEK293 N3S cells stably expressing 6His-SUMO2T90K were cultured in Dulbecco’s modified Eagle’s medium (DMEM) (Gibco), supplemented with 10% dialyzed fetal calf serum (FCS), puromycin (1 μg/ml), and penicillin and streptomycin (100 U/ml). Cells were grown on five 175-cm^2 dishes to about 90% confluency before their transfer into Eagle’s minimum essential medium (spinner modification; Sigma-Aldrich) supplemented with 10% FCS, puromycin (1 μg/ml), 2 mM L-glutamine, and penicillin and streptomycin (100 U/ml). Three liters of cell culture (~2 × 10^9 cells) was stimulated by heat shock at 43°C, harvested by centrifugation, washed twice with cold 1× DPBS (Dulbecco’s phosphate-buffered saline; Gibco), and lysed in cell lysis buffer [6 M guanidinium-HCl, 100 mM sodium phosphate buffer (pH 8.0), 10 mM tris-HCl (pH 8.0), 20 mM imidazole, 5 mM β-mercaptoethanol] (6 ml of lysis buffer per 1 g of cell pellet). DNA was disrupted by short pulses of sonication, insoluble particles were removed with 0.2-μm sterile filters (Sartorius), and protein concentration was determined by BCA (bicinchoninic acid) assay (Pierce). In the experiment designed to evaluate reproducibility, duplicate 750-ml cultures of HEK293 6His-SUMO2T90K corresponding to 6 × 10^8 cells were processed in parallel.

Nickel affinity chromatography

Nickel affinity purification of 6His-SUMO2T90K conjugates was performed with Ni2+-NTA agarose beads (Qiagen) according to a published protocol (52) with minor changes. Ni2+-NTA agarose beads (250 μl) were added to the cell lysate (286 mg of total protein for the primary experiment or 48 mg of protein for the experiment designed to evaluate reproducibility) and mixed at 4°C for 16 hours. Beads were washed with cell lysis buffer, wash buffer pH 8.0 [8 M urea, 100 mM sodium phosphate buffer (pH 8.0), 10 mM tris-HCl (pH 8.0), 10 mM imidazole, 5 mM β-mercaptoethanol], wash buffer pH 6.3 [8 M urea, 100 mM sodium phosphate buffer (pH 6.3), 10 mM tris-HCl (pH 8.0), 10 mM imidazole, 5 mM β-mercaptoethanol], and again with wash buffer pH 8.0. Proteins were eluted in three sequential steps with 1.5 column volumes of elution buffer [8 M urea, 100 mM sodium phosphate buffer (pH 8.0), 10 mM tris-HCl (pH 8.0), 200 mM imidazole, 5 mM β-mercaptoethanol].

Filter-aided sample preparation and protein digestion

Digestion of 6His-SUMO2T90K proteins was performed on 30-kD cutoff filter units (Sartorius) according to a published protocol (53) with minor changes. Samples were concentrated on filter units, washed twice with UA buffer [200 μl of 8 M urea, 100 mM tris-HCl (pH 7.5)], and treated with 50 mM chloroacetamide in UA buffer for 20 min in the dark. Samples were then washed twice with UA buffer, three times with 200 μl of immunoprecipitation (IP) buffer [50 mM Mops-NaOH (pH 7.2), 10 mM Na2HPO4, 50 mM NaCl], and digested for 16 hours with Lys-C (Wako) in 50 μl of IP buffer at 37°C (enzyme-to-protein ratio, 1:50). Samples were collected and the filters were washed with 50 μl of IP buffer to increase the yield of Lys-C peptides. Peptides retained on the filter units were subsequently digested with endoproteinase Glu-C for 16 hours in 50 μl of IP buffer at 20°C (enzyme-to-protein ratio, 1:100), and after collection of peptides, the filter units were again washed with additional 50 μl of IP buffer. To analyze nickel affinity purifications directly, a small fraction of the nickel column elution (~2 μg) was diluted 10-fold in 8 M urea, 100 mM tris-HCl (pH 7.5), treated with 50 mM chloroacetamide for 1.5 hours at 20°C, then diluted four times with 50 mM ammonium bicarbonate, and mixed with Lys-C (enzyme-to-protein ratio, 1:50), followed by digestion at 20°C for 16 hours.

Immunopurification of diGly-Lys–containing peptides

Anti-KεGG conjugated to protein A beads (20 μl of beads; PTMScan, Cell Signaling Technology) was washed three times with 1 ml of 50 mM sodium borate (pH 9.0), resuspended in 1 ml of fresh 20 mM dimethyl pimelimidate (DMP), and incubated for 30 min at room temperature while rotating. Beads were washed twice with 1 ml of 200 mM ethanolamine (pH 8.0) for 2 hours at 4°C, three times with 1.5 ml of cold IP buffer, and stored in IP buffer at 4°C. Enrichment of diGly-Lys–containing peptides with anti-KεGG was performed according to a published protocol (54) with minor changes. Anti-KεGG (19 μg) cross-linked to protein A beads (3 μl) was added to peptide mixtures (300 μg for the primary experiment or 360 μg for the experiment designed to evaluate reproducibility) in IP buffer and incubated overnight at 4°C while rotating. Beads were washed twice with 500 μl of cold 1× DPBS, and peptides were eluted twice with 50 μl of 0.1% trifluoroacetic acid.

MS analysis and data processing

Before MS analysis, all peptide samples were desalted on self-made reverse-phase C18 (Empore) Stop and Go Extraction Tips (55) and analyzed by liquid chromatography–tandem MS on a Q Exactive mass spectrometer (Thermo Scientific) coupled to an EASY-nLC 1000 Liquid Chromatography system (Thermo Scientific) via an EASY-Spray ion source (Thermo Scientific). Purified peptides were loaded onto a 75-μm × 500-mm EASY-Spray column (Thermo Scientific) at a maximum pressure of 800 bars, and various gradient lengths from 90 to 150 min were used with a linear gradient of 5 to 22% of solvent B (100% acetonitrile, 0.1% formic acid) in solvent A (0.1% formic acid), followed by a ramp to 40% of solvent B. Flow rate was set to 250 nl/min, and eluting peptides were injected online into the mass spectrometer. Various Q Exactive settings were tested depending on the complexity of the samples; however, optimal data acquisition for low-complexity diGly-Lys–containing peptides was achieved with the following parameters: Precursor ion full-scan spectra [mass/charge ratio (m/z), 300 to 1600] were acquired with a resolution of 70,000 at m/z 400 (target value of 1,000,000 ions, maximum injection time of 20 ms). Up to one data-dependent MS2 spectrum was acquired with a resolution of 35,000 at m/z 400 (target value of 500,000 ions, maximum injection time of 1000 ms). Ions with unassigned charge state, and singly or highly (>8) charged ions were rejected. Intensity threshold was set to 2.1 × 10⁴ U. Peptide match was set to preferred, and dynamic exclusion option was enabled (exclusion duration, 40 s).

Data analysis and manual validation of the results

Raw MS data files were processed using MaxQuant software (version (56, 57) and searched against UniProtKB human proteome (canonical and isoform sequences; downloaded in April 2013). Enzyme specificity was set to cleave peptide bonds C-terminally to Lys residues for samples treated with Lys-C only, or C-terminally to Glu, Asp, and Lys residues for samples digested with Lys-C and Glu-C. A maximum number of three or five missed cleavages were allowed for samples cleaved with Lys-C, or Lys-C and Glu-C, respectively. Carbamidomethylation of Cys was set as a fixed modification and oxidation of Met, acetylation of protein N termini, phosphorylation of Ser, Thr, and Tyr, and diGly adduction to Lys (except in peptide C terminus) were set as variable modifications. A minimum peptide length was set to seven amino acids, and a maximum peptide mass was 10,000 daltons. A false discovery rate (FDR) of 1% was set as a threshold at both protein and peptide level, and a mass deviation of 6 parts per million (ppm) was set for main search and 0.5 dalton for MS2 peaks.
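The enzyme specificity and missed-cleavage settings above can be illustrated with a minimal in silico digestion sketch. This is not the MaxQuant implementation; the function names `digest` and `filter_by_length` and the example sequences are hypothetical, and the 10,000-dalton mass limit is not modeled.

```python
# Illustrative sketch only; names and test sequences are assumptions.
LYS_C = {"K"}                  # cleave C-terminally to Lys
LYS_C_GLU_C = {"K", "E", "D"}  # cleave C-terminally to Lys, Glu, and Asp


def digest(sequence, cleave_after, max_missed):
    """Return peptides produced by cleaving C-terminally to the given
    residues, allowing up to max_missed missed cleavages."""
    # Cleavage points sit after each matching residue (the last residue
    # of the sequence is already a terminus, so it is skipped).
    sites = [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in cleave_after]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining k + 1 consecutive fragments corresponds to k missed cleavages.
        last = min(i + 2 + max_missed, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


def filter_by_length(peptides, min_len=7):
    """Keep peptides meeting the minimum length (seven residues in the text)."""
    return [p for p in peptides if len(p) >= min_len]
```

With the settings described in the text, a Lys-C sample would correspond to `digest(seq, LYS_C, 3)` and a Lys-C plus Glu-C sample to `digest(seq, LYS_C_GLU_C, 5)`.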

MS2 spectra (filtered at 1% FDR) of sumoylated peptides were manually validated and annotated using MaxQuant viewer expert system (58) on the basis of the following criteria: (i) good coverage of y- and b-ion series, (ii) extensive identification rate of intensive fragment ion peaks, (iii) mass error less than 2 ppm after mass recalibration or 4 ppm in case of unsuccessful recalibration, and (iv) preferential fragmentation N-terminally to proline or C-terminally to Glu and Asp residues. All peptides identified from the reverse decoy database were removed, and the probability of site localization was checked to be greater than 75%. Existence of a diagnostic peak corresponding to a fragment ion of diGly residue (GG+, m/z +115.0505 Th) or presence of an ion series with a neutral loss corresponding to single Gly (m/z 57.0215 Th) or an identification of various modified peptides corresponding to the same sumoylation site contributed to the confidence in assigned sumoylation sites. Finally, all accepted modified peptides corresponding to more than one protein were merged into one site identifier indicated in table S1. To investigate experimental reproducibility, a SUMO site was considered “identified” if it matched a peptide from the first list of 1002 modified peptides.
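Criterion (iii) and the site localization cutoff can be expressed as a short check. The sketch below is an assumption about how such a filter might be written, not the procedure used in MaxQuant viewer; the function names and example m/z values are hypothetical.

```python
# Illustrative sketch only; function names and inputs are assumptions.
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_mass_criterion(observed_mz, theoretical_mz, recalibrated=True):
    """Less than 2 ppm after successful recalibration, else less than 4 ppm."""
    limit = 2.0 if recalibrated else 4.0
    return abs(ppm_error(observed_mz, theoretical_mz)) < limit


def passes_localization(probability, threshold=0.75):
    """Site localization probability must be greater than 75%."""
    return probability > threshold
```

For example, a precursor observed at m/z 1000.003 against a theoretical value of 1000.000 deviates by about 3 ppm, so it would be rejected after successful recalibration but accepted under the 4-ppm limit.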

In vitro SUMO conjugation, deconjugation, and processing assays

Recombinant substrate proteins used in these assays have been described previously (59). All reactions were buffered in 50 mM tris-HCl (pH 7.5). SUMO2 proform processing assays contained 150 mM NaCl, 0.5 mM TCEP, 600 μM SUMO, and 100 nM SENP1 or 200 nM SENP2 recombinant catalytic domains (59), and reactions were incubated at 20°C for 0, 5, 10, 20, 30, 60, 90, 120, 180, 240, or 960 min. Conjugation assays contained 5 mM dithiothreitol, 5 mM MgCl2, 2 mM ATP, 110 nM SAE1 and SAE2, between 0.5 and 2 μM Ubc9, ~10 μM substrate protein, and a range of SUMO2 concentrations (0, 40, 80, or 200 μM) and were incubated at 37°C for 4 hours. Deconjugation assays were prepared as above using 200 μM SUMO followed by addition of SENP1 to 10 nM, and reactions were monitored at 0, 0.5, 1, 2.5, 5, and 10 min at 20°C.

Immunoblot analysis

HEK293 N3S cells with or without stable expression of 6His-SUMO2T90K were grown on 75-cm² flasks in DMEM supplemented with 10% FCS, puromycin (1 μg/ml) (where required), and penicillin and streptomycin (100 U/ml). Where indicated, heat shock treatment at 43°C for 30 min was applied. Cells were washed twice with 1× DPBS and lysed in 4% SDS, 100 mM tris-HCl (pH 7.5). Immunoblots were prepared using mouse antibody specific to 6His (Clontech no. 631212) or rabbit antibody specific to SUMO2 (Zymed, 91-5100).

TAP-SUMO2 sample preparation and MS analysis

TAP-SUMO2 protein samples remaining from a previous study (6) were fractionated by protein electrophoresis, the gel was cut into nine slices, and tryptic peptides were extracted, as described previously (60). Peptide mixtures were analyzed by MS as described with the following alterations: High-performance liquid chromatography fractionation occurred over a linear 115-min fractionation containing a gradient from 15 to 32% acetonitrile, with a loop count of 3, an m/z precursor scan range of 800 to 1800, +3 to +7 inclusion charge range, 5 × 10⁵ AGC (automatic gain control) target or 500-ms maximum injection time, and 35,000 at 400 m/z resolution for MS2 scans.

Virtual branched peptide database

Virtual branched peptide databases were created as a modification of the procedure described by Matic et al. (11). A Python script was created to extract protein sequence information for each SUMO-modified protein to determine the tryptic peptide encompassing each SUMO target lysine. Cleavages at KP (lysine-proline) and RP (arginine-proline) were ignored, and protein N-terminal methionines were omitted. Each tryptic peptide was appended at the N terminus with either the tryptic fragment of SUMO1 C terminus (IADNHTPKELGMEEEDVIEVYQEQTGG) or that of SUMO2 C terminus (FRFDGQPINETDTPAQLEMEDEDTIDVFQQQTGG), which include a single missed cleavage in this region of SUMO. Raw data files were searched against the databases (SUMO1.FASTA and SUMO2.FASTA) using MaxQuant version (56, 57). Maximum peptide size was set to 10,000 daltons. A whole human proteome database was used for first search, and the branched peptide database(s) was used for main search. Missed cleavages were set to at least 3, and no FDR filtering was used at the protein or peptide level. All spectra were computationally (58) and manually validated.
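The construction of a virtual branched peptide entry can be sketched as follows. The original script is not reproduced here; this is a minimal reconstruction under stated assumptions: the function names and FASTA header format are hypothetical, positions are 0-based, and the SUMO-modified lysine itself is treated as uncleavable so that the peptide extends to the next tryptic site.

```python
# Illustrative reconstruction; not the script used in the study.
SUMO1_TAIL = "IADNHTPKELGMEEEDVIEVYQEQTGG"
SUMO2_TAIL = "FRFDGQPINETDTPAQLEMEDEDTIDVFQQQTGG"


def is_cleavage_site(seq, i):
    """Trypsin cleaves after K or R at index i unless the next residue is P."""
    followed_by_pro = i + 1 < len(seq) and seq[i + 1] == "P"
    return seq[i] in "KR" and not followed_by_pro


def tryptic_peptide_around(protein, lys_index):
    """Return the tryptic peptide encompassing the target lysine.
    Assumption: the modified lysine is not itself cleaved."""
    if protein[lys_index] != "K":
        raise ValueError("target position is not a lysine")
    # Walk left until the residue before the peptide is a cleavage site.
    start = lys_index
    while start > 0 and not is_cleavage_site(protein, start - 1):
        start -= 1
    # Walk right, past the target lysine, to the next cleavage site.
    end = lys_index + 1
    while end < len(protein) and not is_cleavage_site(protein, end):
        end += 1
    end = min(end + 1, len(protein))  # include the cleaved K/R residue
    # Omit the protein N-terminal methionine.
    if start == 0 and protein[0] == "M":
        start = 1
    return protein[start:end]


def branched_entry(name, protein, lys_index, tail):
    """Build one FASTA record: SUMO C-terminal fragment + target peptide."""
    peptide = tryptic_peptide_around(protein, lys_index)
    return f">{name}_K{lys_index + 1}\n{tail}{peptide}"
```

Writing one such record per target lysine with `SUMO1_TAIL` or `SUMO2_TAIL` would yield files analogous to SUMO1.FASTA and SUMO2.FASTA.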

Bioinformatics analysis

Sequence analysis was performed with pLogo (36). Because not all identified peptides could be assigned to a single protein, multiple 13–amino acid sequences were used for input in these cases. N- or C-terminal sequences that did not cover the 13 residue window were omitted from the output. Residues were scaled relative to their Bonferroni-corrected statistical significance using human proteome as a background data set (645,531 lysines). Protein functional annotation and network analyses were created using Ingenuity Pathway Analysis (Ingenuity Systems, Qiagen).
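The 13-residue input windows can be illustrated with a short sketch, assuming six flanking residues on each side of the modified lysine. The function name and example sequences are hypothetical, and incomplete windows are dropped up front here for simplicity, whereas in the analysis above they were omitted from the pLogo output.

```python
# Illustrative sketch only; the helper name and inputs are assumptions.
def lysine_window(protein, lys_index, flank=6):
    """Return the (2 * flank + 1)-residue window centered on the lysine at
    0-based lys_index, or None if the window runs past either terminus."""
    if protein[lys_index] != "K":
        raise ValueError("target position is not a lysine")
    start = lys_index - flank
    end = lys_index + flank + 1
    if start < 0 or end > len(protein):
        return None  # N- or C-terminal site that cannot fill the window
    return protein[start:end]
```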


Fig. S1. Comparison of parental and 6His-SUMO2T90K–expressing HEK293 cells.

Fig. S2. Comparison of SUMO2 and SUMO2T90K in conjugation and deconjugation reactions with various substrates.

Fig. S3. Overlap of SUMO2T90K sites in replicate experiments.

Fig. S4. Overlap of sumoylated proteins with published protein level proteomic studies.

Table S1. Sumoylation sites identified in SUMO2T90K-expressing cells.

Table S2. diGly-modified peptides identified in SUMO2T90K-expressing cells.

Table S3. Functional annotation and network analysis of SUMO2T90K-expressing cells.

SMfile1.PDF. MS2 spectra of multiply sumoylated peptides identified from SUMO2T90K-expressing cells.

SMfile2.PDF. MS2 spectra of branched sumoylated peptides identified from SUMO2-expressing cells.

SUMO1.FASTA. SUMO1 virtual branched peptide database.

SUMO2.FASTA. SUMO2 virtual branched peptide database.


Acknowledgments: We thank A. K. Garg (University of Dundee) for help with branched peptide database creation. Funding: T.T. is funded through the EU Seventh Framework Programme (FP7A-PEOPLE-2011-ITN). I.M. was supported by a Sir Henry Wellcome Fellowship (Wellcome Trust 088957/Z/09/Z). E.G.J. and M.H.T. are funded through a Cancer Research UK programme grant (C434/A13067). R.T.H. holds a Wellcome Trust Senior Investigator Award (098391/Z/12/Z). Author contributions: T.T. optimized sample processing and conducted sample preparation, MS analysis, data analysis, and bioinformatic analysis. R.T.H. and I.M. conceived the enrichment and sample processing strategy. I.M. developed workflow for the MS analysis and branched peptide analysis and edited the manuscript. E.G.J. purified recombinant proteins and conducted in vitro assays. A.F.M.I. created the stable cell line. M.H.T. consulted at multiple stages, conducted Ingenuity Pathway Analysis, constructed branched peptide databases, and prepared TAP-SUMO2 peptide samples. R.T.H., T.T., and M.H.T. cowrote the manuscript. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All MS raw files will be publicly available and are currently accessible at Individually annotated spectra for each of the SUMO-modified peptides are available upon request.