Research Resource | Cancer

Breast tumors educate the proteome of stromal tissue in an individualized but coordinated manner

Science Signaling  08 Aug 2017:
Vol. 10, Issue 491, eaam8065
DOI: 10.1126/scisignal.aam8065

Profiling the tumor stroma proteome

Communication between a tumor and cells in the surrounding stroma contributes to tumor growth, progression, and drug resistance. Thus, targeting this communication, in the primary tumor and especially in metastatic niches, may be an effective way to treat cancer. Wang et al. grew patient breast tumors subcutaneously in mice and obtained species-distinguished proteomic profiles of the tumors (human) and tumor-associated stroma (mouse). The authors found that all breast tumors consistently altered clustered subsets of the stromal proteome, particularly proteins involved in immune signaling, but that these varied in a subtype- and stage-specific manner. These findings may have future implications for treatment stratification and provide a platform from which to understand this experimental model and tumor-stroma interactions on a large-scale protein level.


Cancer forms specialized microenvironmental niches that promote local invasion and colonization. Engrafted patient-derived xenografts (PDXs) locally invade and colonize naïve stroma in mice while enabling unambiguous molecular discrimination of human proteins in the tumor from mouse proteins in the microenvironment. To characterize how patient breast tumors form a niche and educate naïve stroma, subcutaneous breast cancer PDXs were globally profiled by species-specific quantitative proteomics. Regulation of PDX stromal proteins by breast tumors was extensive, with 35% of the stromal proteome altered by tumors consistently across different animals and passages. Differentially regulated proteins in the stroma clustered into six signatures, which included both known and previously unappreciated contributors to tumor invasion and colonization. Stromal proteomes were coordinately regulated; however, the sets of proteins altered by each tumor were highly distinct. Integrated analysis of tumor and stromal proteins, a comparison made possible in these xenograft models, indicated that the known hallmarks of cancer contribute pleiotropically to establishing and maintaining the microenvironmental niche of the tumor. Education of the stroma by the tumor is therefore an intrinsic property of breast tumors that is highly individualized, yet proceeds by consistent, nonrandom, and defined tumor-promoting molecular alterations.


Tumors typically encounter a naïve microenvironment twice: during local invasion of the primary tumor and upon metastatic dissemination and acclimatization to foreign tissues. Pernicious tumors are adept at remodeling the microenvironment to coevolve tumor-promoting niches. Understanding how primary tumors acquire the ability to direct modification of normal tissue remains a rate-limiting step in our ability to constrain tumor growth and metastasis.

In breast cancer, the stroma plays an important role in progression (1) and resistance to therapy (2, 3). Genomic analysis shows close clonal relationships between primary tumors and their metastases, and no metastasis-specific mutations have been identified to date (4, 5). Together, this has led to the prevailing notion that a key mechanism underlying tumor growth and metastasis is the interaction of a tumor with the neighboring stroma to educate its microenvironment (6–8) and that this occurs via tumor-intrinsic changes in the transcriptome, epigenome, or proteome. However, the lack of experimental models with sufficient biological complexity to both model the multicellular tumor microenvironment and allow unambiguous delineation of the tumor from the naïve stroma has limited molecular characterization of stromal education by tumors. Therefore, it is unclear what component of a tumor’s education of the stroma is cell-autonomous and tumor-intrinsic versus an adaptation to the host’s cues or resources in the naïve or tumor-associated microenvironment. Proteomic investigation of the tumor-stroma interface is therefore a much-needed step to clarify the regulatory programs that enable development of the cellularly heterogeneous tumor microenvironmental niche.

Patient-derived xenograft (PDX) models provide the most reproducible experimental approximation of primary human tumors for cancer research (9). Like primary tumors, PDXs are complex tissues composed of multiple distinct cell types that heterotypically interact. PDXs model the architecture of the original tumor and prove to be valuable as preclinical models for drug treatment studies (10, 11). Tumors evolve through a multistep progression in which they acquire a succession of metastatic and survival capabilities. Serially passaging PDXs serves as a basic model of both local invasion and metastasis in which cancer cells disseminate from a supportive microenvironment and need to form a new niche in a new environment to survive (12, 13). PDXs are rather stable genetically and cytogenetically during serial passage (14) as well as in proteomic and phosphoproteomic analyses (15).

Breast cancer stroma are extensively studied at the mRNA level, and analysis of large primary tumor data sets finds that stromal gene expression signatures can predict clinical outcome in both breast (16) and colorectal (17) cancer. Proteome-level characterization of the stroma is limited to targeted analysis with specific antibodies used for immunohistochemistry or proteomic characterization of isolated stromal subfractions such as the matrisome (18) or decellularized extracellular matrix (ECM) (19). Several mRNA studies focus on the stroma in PDX models and confirm stable gene expression patterns (13, 20). However, PDXs present a unique opportunity to differentiate proteins in patient-derived tumors from the naïve, murine-derived microenvironment by using species-specific differences in amino acid sequence, thus eliminating the need for physical or biochemical separation. Although large-scale unbiased proteomic characterization of stromal remodeling has not been performed, it holds tremendous potential to clarify the tumor-stroma interaction because many important factors in the microenvironment are regulated at the protein level, including enzymatic remodeling and export of regulatory proteins into the ECM.

The persistence, prevalence, heterogeneity, and coordination of stromal education by individual patient tumors remain largely unexamined, perhaps because analysis of a single patient tumor sample, without the availability of biological replicates, cannot distinguish whether molecular changes in the microenvironment are due to normal variation, sample preparation variability, or tumor-specific regulation. We sought to investigate the process by which breast tumors establish a tumor-promoting niche using cross-species proteomic profiling of breast cancer PDXs. As a starting point to investigate tumor education of their microenvironments, we used subcutaneously implanted breast cancer PDXs, a common model of tumor-host interactions. Our goal was to determine whether there is heterogeneity in patient breast tumors’ stromal education and to delineate which components of the stromal proteome are susceptible to regulation by breast tumors. We found that regulation of PDX stromal proteins by tumors was persistent, individualized, and biologically organized.


Species-specific proteomic characterization of breast cancer PDXs reveals widespread, tumor-specific stromal regulation

The seven subcutaneous breast cancer PDXs that were proteomically profiled (workflow in fig. S1) were previously described (15) and represent basal (n = 3), HER2-enriched (HER2E; n = 1), luminal B (n = 2), and claudin-low (n = 1) breast cancer subtypes. No PDX was analyzed before three passages in mice (table S1) to allow full exchange of the tumor’s stroma with mouse elements (21). Three biological replicates of each patient tumor representing multiple PDX passages were analyzed. All PDXs were passaged in female mice except for six of the basal tumor samples, which are estrogen receptor–negative and were passaged in male mice (table S1). All breast tumors used to generate PDXs in this study metastasized, and three of the PDXs (WHIM2, WHIM12, and WHIM20) often formed metastases in mice (table S1); therefore, the PDX models investigated have metastatic capacity. The engrafted PDX tumors were carefully dissected away from normal mouse tissues once they reached 1500 mm³, and proteomic analysis was performed to identify species-unique proteins, genes, and peptides from each species (Fig. 1A). Only species-unique peptides were used to accurately quantify and distinguish the tumor proteome (with human-unique peptide sequences) from the murine-derived proteins of the naïve tumor microenvironment (fig. S1). Protein products of 2038 human and 415 mouse genes were detected with gene-specific peptides across all samples after filtering to remove abundant murine plasma proteins and erythrocyte markers (fig. S2; see also Materials and Methods).
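
The species- and gene-unique filtering described above can be illustrated with a minimal sketch. It assumes that each proteome has already been digested in silico into per-gene peptide sets; the function name, the data layout, and the toy sequences are hypothetical and are not taken from the study's pipeline.

```python
from collections import defaultdict


def species_and_gene_unique(human, mouse):
    """Return {peptide: (species, gene)} for peptides with a single owner.

    human, mouse: dict mapping gene symbol -> set of peptide sequences
    (hypothetical in-silico digests; illustration only).
    """
    owners = defaultdict(set)  # peptide -> {(species, gene), ...}
    for species, digest in (("human", human), ("mouse", mouse)):
        for gene, peptides in digest.items():
            for peptide in peptides:
                owners[peptide].add((species, gene))

    # Keep a peptide only if exactly one (species, gene) pair produces it,
    # which makes it both species-unique and gene-unique.
    return {
        peptide: next(iter(hits))
        for peptide, hits in owners.items()
        if len(hits) == 1
    }


# Toy example: "AAA" is shared across species, "BBB" across two human genes.
human_digest = {"ACTB": {"AAA", "BBB"}, "ACTG1": {"BBB", "CCC"}}
mouse_digest = {"Actb": {"AAA", "DDD"}}
unique = species_and_gene_unique(human_digest, mouse_digest)
```

In this toy case only "CCC" (human ACTG1) and "DDD" (mouse Actb) survive; peptides shared between species or between genes of one species are discarded before quantification.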

Fig. 1 Cross-species proteomic profiling finds that the proteomes of breast cancer PDXs share common protein abundance changes in both the tumor and microenvironment across biological replicates.

(A) Number of genes, proteins, and peptides identified from species- and gene-unique peptide sequences. (B) Box plot presentation of protein abundance correlation for all seven PDXs (n = 3) within biological replicates of the same PDX (intra-PDX) compared to other PDXs (inter-PDX) for the tumor (human) and naïve microenvironment (mouse). The P value of the correlation differences between intra- and inter-PDX mouse proteins was 2.7 × 10⁻⁷. (C) Clustered heatmaps of differentially regulated human and mouse proteins. Each column represents a biological replicate of each PDX. Male mice used for six of the basal subtype PDX samples are denoted (hash). Values are median normalized by column and log₂-transformed. (D) PCA of differentially regulated human and mouse proteins, with each point representing one of three biological replicates for each PDX.

Relative quantification was performed with TMT10 isobaric tagging [fig. S6 (see also Materials and Methods) and tables S2 and S3]. Relative protein abundance in process replicates was highly correlated with an average R² of 0.75 (fig. S3A), indicating high technical reproducibility in the proteomic workflow. A high correlation of human protein abundance was also observed in biological, serially passaged replicates of the same PDX (fig. S3), suggesting that the proteomes of the engrafted tumors remained stable. Relative changes in the mouse proteome were also highly correlated within biological replicates of individual PDX tumor models (fig. S3A). Median correlation of species-unique proteins and peptides within biological replicates of the same PDX (intra-PDX) was more than 10-fold higher than across PDX lines (inter-PDX). Median intra-PDX R² values were 0.53 and 0.10 for human and mouse proteins, respectively (fig. S3B). These data suggested that tumors are intrinsically capable of differentially educating stromal proteomes, which we further investigated here.
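
The intra- versus inter-PDX comparison can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: R² is taken here to be the squared Pearson correlation between two samples' abundance profiles, and the function names and input layout are hypothetical.

```python
from itertools import combinations
from statistics import mean, median


def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length abundance profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def intra_inter_r2(profiles, pdx_of):
    """Median R2 for sample pairs from the same PDX and from different PDXs.

    profiles: sample name -> list of abundances (same protein order).
    pdx_of:   sample name -> PDX line label.
    """
    intra, inter = [], []
    for a, b in combinations(sorted(profiles), 2):
        r2 = pearson_r2(profiles[a], profiles[b])
        # Pairs from the same PDX line are intra-PDX; all others inter-PDX.
        (intra if pdx_of[a] == pdx_of[b] else inter).append(r2)
    return median(intra), median(inter)
```

With three replicates for each of seven PDXs, this yields 21 intra-PDX pairs and 189 inter-PDX pairs per species.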

Individual breast cancer PDXs induce unique and biologically coordinated stromal protein signatures

One-way analysis of variance (ANOVA) was performed to identify specific proteins in the tumor and the PDX stroma that were differentially regulated across the subcutaneous PDX models. Eighty-two percent and 35% of human and mouse proteins representing the tumor and stroma, respectively, exhibited significant changes in protein abundance that were consistent across biological replicates of a PDX. Median intra-PDX R² values were 0.63 and 0.44 for human and mouse proteins after ANOVA filtering, respectively (Fig. 1B). Biological replicates of each PDX coclustered based on both human and mouse protein abundance (Fig. 1C). Hierarchical clustering segregated mouse stromal proteins into six clusters (Fig. 1C). All differentially abundant stromal proteins in each of the six clusters are listed in table S4. Despite the small sample size, all PDXs clustered adjacently by subtype on the basis of human tumor protein abundance, whereas nearly all PDXs clustered by subtype on the basis of mouse stromal protein abundance (Fig. 1C). Principal component analysis (PCA) was performed to orthogonally assess whether patterns in the proteomes, especially those of the stroma, were consistent within biological replicates of individual PDX tumors. Again, the biological replicates of each PDX were closely grouped on the basis of both human tumor– and mouse stroma–unique protein abundance (Fig. 1D). PCA of ANOVA-filtered proteins showed no clustering by sex of the host mouse for greater than 80% of the data’s cumulative variance, suggesting that the use of male mice for basal tumors was not a source of significant variation in the data (fig. S4). Together, these data demonstrated that patient breast tumors consistently induce unique stromal proteomic changes when placed in a naïve microenvironment. In addition, these results revealed that the magnitude of tumor-specific stromal education is substantial, encompassing over one-third of the detected mouse proteins. These data also established that tumor education of the stroma is prevalent among breast tumors, given that it occurred in all seven breast tumors analyzed. Therefore, stromal regulation during colonization of a naïve microenvironment is predominantly an intrinsic property of a patient tumor that persists across biological replicates and passages, with regulatory components that are stably ingrained.
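
For readers less familiar with the test, the per-protein one-way ANOVA can be sketched as below. This is a generic textbook F statistic, not the study's implementation; the function name and the grouping of replicate values by PDX are assumptions made for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for one protein.

    groups: list of lists, one inner list of replicate abundances per PDX.
    """
    k = len(groups)                      # number of PDX lines
    n = sum(len(g) for g in groups)      # total number of samples
    grand = mean([v for g in groups for v in g])

    # Variation of PDX means around the grand mean (between groups).
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of replicates around their own PDX mean (within groups).
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For seven PDXs with three replicates each, the statistic has (6, 14) degrees of freedom. Converting it to a P value requires the F distribution, which is not in the Python standard library and is omitted here.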

To examine whether the differentially regulated PDX stromal proteins share a common biological role, unbiased annotation by the Molecular Signatures Database (MSigDB) (22) was performed using the Hallmark, KEGG (Kyoto Encyclopedia of Genes and Genomes), and Reactome gene sets. Each of the six stromal clusters had significant enrichment of specific and unique molecular signatures (Fig. 2A). Specific components of the ECM, including laminins, collagens, and elastic fibers, as well as actin-cytoskeleton binding proteins, were each enriched in single clusters, including clusters I, II, and III (Fig. 2A). Cluster II associated with cancer-associated fibroblast (CAF) activity because it contains the CAF-secreted factors POSTN, FBN1, COL1A1, COL12A1, and BGN (23). Cluster IV corresponded to members of the complement system, which are known to facilitate tumor survival (24, 25). Clusters V and VI were annotated to multiple molecular signatures with high statistical significance but lacked a clear biological organizing principle. This is a limitation of even large databases of curated gene sets, as no analysis can fully and precisely encompass all biological processes. The annotation of each cluster, especially clusters V and VI, was therefore refined using literature analysis (Fig. 2B). Cluster V contained numerous markers of myeloid-derived suppressor cells (MDSCs) (7, 26), potent immunosuppressors that can be recruited to the invasive edge of tumors by tumor-derived chemoattractants (27) to promote tumor survival (7). Cluster V also contained three glycolytic enzymes (PKM, GPI, and LDHA) as well as transketolase (TKT) in the pentose phosphate shunt, consistent with tumor regulation of extracellular metabolites (28–31). Cluster VI contained several canonical markers that are signatures of M2-type tumor-associated macrophages (TAMs) (32–34), in addition to endoplasmic reticulum (ER) processing and stress as suggested by the MSigDB analysis. M2-type TAMs are also recruited to the microenvironment and secrete proinflammatory factors that support tumor survival (27, 35).

Fig. 2 Individual breast PDXs have unique microenvironmental protein signatures.

(A) Unbiased MSigDB annotation of proteins (by gene symbol) found within each stromal cluster. (B) Bar plots of relative protein abundance for specific molecular signatures including the ECM, complement system (Complement sys.), ER biology, and TAMs. Each PDX has a unique signature consisting of multiple proteins within each category. Error bars are the SD of biological replicates (n = 3). (C) Differential patterns of mouse protein signatures were observed between individual tumors. Gene sets annotated in (A) were used to compare with the baseline using a Student’s t test for each combination of gene set and tumor. The intensity of color key is proportional to the magnitude of −log₁₀ adjusted P value (Benjamini-Hochberg method). Color choices are assigned according to the directionality of deviation from baseline protein abundance (red, up; blue, down). Three biological replicates were analyzed for each PDX.
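
The Benjamini-Hochberg adjustment named in (C) can be sketched in a few lines. This is the standard step-up procedure written for illustration; the function name is hypothetical, and the input is assumed to be the list of raw P values from the per-gene-set, per-tumor t tests.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, scaling each by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The heatmap intensity in (C) would then correspond to the negative base-10 logarithm of each adjusted value, signed by the direction of change.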

To assess whether stromal regulation by individual PDXs occurs by the co-regulation of multiple proteins with shared biological roles, we quantified all stromal proteins within each molecular signature across biological replicates for each PDX. The stromal protein changes within each signature were highly coordinated (Fig. 2B). Tumor-specific education of stromal proteomes was also unique and individualized, as demonstrated by highly significant differences in each protein signature across PDX tumors (Fig. 2C). All seven PDXs coordinately regulated at least one stromal cluster with statistical significance. Together, breast PDXs individually educate their microenvironment with highly coordinated stromal proteomic changes that encompass well-defined protumorigenic signatures.

PDX stromal protein signatures are common in patient breast tumors and separate patients by subtype and stage

Despite the burgeoning position of PDXs as the leading experimental model to approximate the biology and genetics of human tumors (9), the use of PDXs to investigate the biology of tumor niche formation by unambiguous delineation of tumor-derived human proteins from stromal-derived murine proteins remains undeveloped. Limited adoption is partly because PDXs lack the original stromal complement of the primary tumor and because the NSG mice into which PDXs are engrafted do not produce cells of the lymphoid lineage. Notably, PDX models do contain murine myeloid-derived cells that play important roles in tumor growth, chemotaxis, and invasion in the microenvironment (27).

We therefore sought to validate that the stromal protein signatures seen in the subcutaneous PDX models were consistent with regulation of the microenvironmental niche by primary tumors. The breast TCGA (The Cancer Genome Atlas) transcriptomic data set (36) and the corresponding CPTAC (Clinical Proteomic Tumor Analysis Consortium) proteomic data sets (37) were examined to determine whether primary patient tumors similarly co-regulate proteins within the stromal clusters at the mRNA and protein levels. Because the TCGA data cannot be used to differentiate tumor from stroma, we focused on clusters IV to VI that represent stromal proteins that are not often found in the tumor itself and represent the stromal microenvironment. Only three human homologs of the 24 mouse proteins in clusters IV to VI were consistently detected in all seven PDXs, and only six human homologs were detected in any PDX, confirming that these proteins are primarily stromal-derived. Because the complement system is almost entirely regulated at the protein level (38), only protein results were shown for cluster IV. We found highly correlated gene products within stromal clusters IV to VI at both the mRNA and protein levels for the 1095 and 77 publicly available TCGA breast cancer patient samples, respectively (Fig. 3A). Monte Carlo simulation was used to determine the statistical significance of co-regulation within each stromal cluster. All stromal clusters were significantly correlated in patient tumors at both the mRNA and protein levels (Fig. 3B). Examples of individual primary breast tumors co-regulating mRNA or protein from multiple genes within stromal clusters IV to VI, similar to the PDX stromal protein clusters in Fig. 2B, are also shown (fig. S5).

Fig. 3 Stromal proteomic signatures separate patient breast tumors by stage and subtype.

(A) Spearman’s correlation coefficients for both the mRNA (n = 1095, TCGA) and protein (n = 77, CPTAC) data sets (by gene symbol) of stromal clusters IV to VI. Because the complement system is regulated at the protein level (38), only protein results are shown for cluster IV. (B) P values for the correlation matrices determined using Monte Carlo simulation by sampling 10,000 randomized sets of equal size to the test clusters and ranking the sums of the Spearman’s correlation coefficients. (C and D) Box plot of CPTAC proteomics data (37) for proteins in clusters IV and VI by intrinsic subtype (C) and tumor stage (D) (n = 77). Error bars are SEM. P value is determined by Student’s t test.
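
The Monte Carlo procedure outlined in (B) can be sketched as follows. This is an illustration, not the study's code: the one-sided comparison, the add-one correction in the P value, and all function names are assumptions, and every gene is assumed to have non-constant values across patients.

```python
import random
from itertools import combinations
from math import sqrt


def ranks(values):
    """Average ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def correlation_sum(data, genes):
    """Sum of pairwise Spearman coefficients within a gene set."""
    return sum(spearman(data[a], data[b]) for a, b in combinations(genes, 2))


def monte_carlo_p(data, cluster, n_draws=10_000, seed=0):
    """Empirical P value for co-regulation of `cluster`.

    data: gene symbol -> list of values across patients.
    Random gene sets of the same size are drawn from all genes in `data`.
    """
    rng = random.Random(seed)
    observed = correlation_sum(data, cluster)
    all_genes = sorted(data)
    hits = sum(
        correlation_sum(data, rng.sample(all_genes, len(cluster))) >= observed
        for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)
```

Under this scheme, with 10,000 draws the smallest attainable P value is about 1 × 10⁻⁴.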

The biological pathway annotations assigned to the stromal protein signatures are well known to play a role in breast cancer tumorigenicity and colonization. For example, a stromal-based serpin-enriched cluster correlates with better prognosis (3). In addition, TAM markers, as determined through mRNA analysis of the stroma from 53 primary tumors, associate with poor outcome (16). We therefore assessed whether stromal protein clusters IV to VI correlated with three clinically relevant parameters in the TCGA set: overall survival, intrinsic subtype, and tumor stage. There was no association between stromal clusters and patient survival. Because stromal proteomic signatures separated all but one PDX by subtype in the seven PDXs analyzed (Fig. 1C), we next determined whether stromal clusters IV to VI stratified patient tumors by subtype in the much larger TCGA proteomic data set. Strikingly, all but 1 of the 24 genes in the three clusters had significantly increased protein abundance in basal versus nonbasal TCGA tumors (Fig. 3C). This result was consistent with the hypothesis that basal, predominantly triple-negative, tumors are more immunogenic than other subtypes and recruit myeloid-derived cells (39, 40).

All members of stromal clusters V and VI, but not cluster IV, had greater protein quantities in stage I tumors (Fig. 3D). Stromal proteomic clusters can therefore differentiate patient tumor populations, perhaps because stage I tumors, like PDXs, are in the initial stages of setting up the microenvironmental niche. The TCGA proteomic data contain only a single stage IV metastatic tumor, which was excluded from the analysis.

Together, the stromal proteomic regulation by patient tumors observed in PDX models was consistent with stromal changes induced by primary tumors and was associated with clinically relevant parameters including intrinsic subtype and tumor stage. No correlation by stage or subtype was seen at the mRNA level, suggesting that stage and subtype regulation of the stromal clusters occur at the protein level.

Integrating tumor and stromal proteome analysis reveals that cancer hallmarks are highly correlated with stromal regulation

It has been hypothesized that oncogenic signaling observed in cancer cells is selected for pleiotropic protumorigenic roles that include regulation of the microenvironment and not simply promotion of proliferation and transformation (41). These data were consistent with this hypothesis because tumor-intrinsic factors are responsible for widespread, individualized changes in the stromal proteomes (Figs. 1 and 2). This hypothesis also suggests that the tumor proteome encodes the educational programs relevant to stromal regulation, which agrees with studies indicating that epigenetic factors are primarily responsible for colonization (4, 5). We therefore performed an integrated analysis of the tumor and stromal proteomes to discover novel candidates in the tumor responsible for stromal education in subcutaneous PDX models.

To assess the relationship between the tumor and stromal proteomes, the protein abundance of each mouse gene in the six stromal clusters was correlated to protein abundance of each human gene across all 21 samples in the data set. To broadly determine the biological roles of the proteins in the tumor that were associated with each stromal protein signature, human proteins were ranked by their maximum absolute correlation to each stromal cluster and analyzed by Gene Set Enrichment Analysis (GSEA), using only the small, highly curated MSigDB “Hallmarks” gene set (42). The MSigDB Hallmarks gene set contains 50 refined sets that broadly describe many important biological processes with minimal redundancy (42). Twenty-three of the 50 hallmarks were enriched in at least one stromal cluster using a stringent cutoff of <10% false discovery rate (FDR) (Fig. 4A). Nearly all of the gene sets identified in the MSigDB hallmarks are implicated in Hanahan and Weinberg’s hallmarks of cancer (41) (Fig. 4A), consistent with the hypothesis that the hallmarks of cancer are pleiotropically responsible for educating the microenvironment in addition to tumor-intrinsic biology. The MSigDB hallmarks gene sets found in the tumor were highly associated with specific stromal clusters associated with microenvironmental regulation including epithelial-to-mesenchymal transition, apical junctions, hypoxia signaling, as well as interferon-α (IFN-α) and IFN-γ responses.
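
The ranking step that precedes the enrichment analysis can be sketched as below. This is a simplified illustration: Spearman correlation is assumed (as in Table 1), ties are ignored for brevity, and the function names and input layout are hypothetical.

```python
from statistics import mean


def rank(values):
    """Simple 1-based ranks; assumes no tied values for brevity."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den


def rank_tumor_proteins(human, stromal_cluster):
    """Order human proteins by maximum |rho| to any protein in one cluster.

    human, stromal_cluster: gene symbol -> abundances across the same samples.
    Returns (genes sorted from most to least correlated, score per gene).
    """
    scores = {
        gene: max(abs(spearman(values, m)) for m in stromal_cluster.values())
        for gene, values in human.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores
```

Taking the absolute value means that strongly anticorrelated tumor proteins rank as highly as strongly correlated ones, so the ordered list carries no information about direction.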

Fig. 4 Tumor proteins that are highly correlated with stromal protein clusters.

(A) Unbiased MSigDB annotation of proteins found within each stromal (mouse) cluster and assignment to Hanahan and Weinberg’s Hallmarks of cancer (n = 21 PDX tumors) (41). FDR was calculated using default parameters in GSEA. (B) Correlation of individual tumor (human) proteins to stromal (mouse) protein clusters (n = 21 PDX tumors). Numbers of significant (Benjamini-Hochberg adjusted P < 0.05) positive (red) and negative (blue) correlations between human proteins and mouse protein clusters are indicated on the x axis. Human proteins are displayed by the genome coordinates of their genes based on University of California, Santa Cruz hg19 annotation.

Local invasion and metastatic colonization are complex, multistep processes, and the catalog of proteins involved remains incomplete. One unanswered question is what fraction of a tumor’s proteome is dedicated to regulatory cross-talk with the microenvironment. To provide one estimate using integrated PDX tumor-stroma analysis, we determined how many stromal proteins were significantly correlated with each tumor protein. These results (Fig. 4B) demonstrated that a large portion of the tumor proteome was co-regulated with the stroma. Nineteen proteins in the tumor were significantly correlated with at least 40% of proteins in one or more of the six stromal clusters, and 44 were significantly correlated with at least 30% of stromal proteins in a single cluster. Among the proteins in the tumors that were most correlated, both positively and negatively, with the stromal proteome (Table 1) are many known to regulate the tumor-stroma interface. PEBP1, also known as RKIP, a metastasis suppressor that decreases infiltration of tumors by myeloid cells (43), was significantly negatively correlated with seven proteins in stromal cluster V, which contains MDSC markers. Nardilysin 1 (NRD1), a metalloproteinase that can remodel the ECM, was significantly negatively correlated with nine proteins in the ECM-related cluster II. Notably, the list of tumor proteins highly correlated with stromal protein clusters included many with unknown roles in the tumor microenvironment. The full list of tumor-stroma correlation values (data file S1) serves as a set of candidates responsible for stromal remodeling.

Table 1 Tumor proteins most correlated with stromal proteomic signatures.

Human proteins (by gene symbol) most correlated to specific stromal protein signatures. The median Spearman’s correlation coefficients (ρ) are highlighted as red (positive) or blue (negative) in breast PDX tumors (n = 21).

Tumors reprogram their microenvironment to form a tumor-associated niche (6), but our current conceptualization of tumor-stroma remains static and superficial, with limited understanding of the dynamic, progressive process of stromal education by tumors. Many of the key processes involved in remodeling of a naïve microenvironment are depicted using cell lines and xenografts, but it is unclear which of these are relevant to, and used by, patient tumors. In addition, molecular knowledge of microenvironmental signaling interactions remains far from complete. The phenotypic plasticity of cancer cells coupled with the potential selection of myriad possible cellular subclones of tumors intimates that development of the microenvironmental niche may be highly mutable, dynamic, and context-dependent. We used species-specific proteomic analysis of serially passaged subcutaneous breast cancer PDXs to address three major questions. First, can unbiased cross-species proteomic profiling provide sufficient depth and repeatability to improve our conceptualization of the tumor-stroma interface in breast cancer? Second, how do patient breast tumors educate a naïve microenvironment during colonization and what is the molecular persistence and heterogeneity of this process across tumors? Third, do subcutaneous PDXs provide molecular insights into stromal education that are consistent with what occurs in primary patient breast tumors?

Holistically, we found that tumor-specific education of the stroma was highly prevalent, individualized, and molecularly coordinated in breast cancer. The reproducibility of a tumor’s education of stromal proteins is consistent with the notion that patient breast tumors are locked into a predetermined set of instructions due to metabolic, energetic, or other deficiencies that are rigid and specific (31). This is likely selected for because of the numerous obstacles that normal cells have to overcome to become tumorigenic. Alternatively, tumors harbor cancer stem cells (44), which may be enriched in PDXs during passage and lead to the limited stromal proteomic heterogeneity within a single tumor line observed across multiple passages for each PDX. The lack of observed proteomic heterogeneity within a single PDX tumor line suggests that renewed emphasis on drugging the stroma as an antitumor strategy may be highly effective.

These results were enabled by the high repeatability, multiplexing, and depth of coverage of the cross-species quantitative proteomics method used in this study, and serve as an initial characterization of proteomic stromal education by patient tumors. A previously published study used species-specific peptide sequences in an effort to identify proteins exported to the ECM (45); our study here was a large-scale analysis of the tumor and microenvironment in PDXs using species-specific proteomics and yielded a 10-fold increase in proteomic coverage of the microenvironment over the previous report (45), including detection and quantitation of 4784 human and 1721 mouse genes at the proteome level with species- and gene-unique peptides. Fifty-one percent and 52% of peptides and genes, respectively, were filtered out due to the lack of species specificity, and an additional 3% of protein identifications were filtered out due to the lack of gene specificity. Our study focused on proteomic variability of PDXs across biological replicates and passages, as occurs during routine PDX propagation, which was consistent with a median R² value of 0.47. This robust cross-species proteomic platform has the potential to characterize perturbations in PDXs and the tumor-stroma interface because of genomic heterogeneity across subtypes and in response to drug treatment to uncover drug resistance mechanisms. PDXs complement syngeneic mouse models because PDXs are derived from patient tumors and also enable unambiguous molecular characterization of the tumor versus stroma. Future studies using PDXs of other cancer origins are needed to determine whether heterogeneity of the proteomes of tumor microenvironments is specific to breast cancer.

Delineating the molecular cross-talk between tumor and stromal cells in the tumor microenvironment remains an important challenge in an effort to understand tumor biology (46–48), especially because of its correlation with anticancer response (49–53) and metastasis (6, 54). Our study characterizes proteins in the stroma that are susceptible to differential induction by the tumor (table S4) and identifies tumor-specific proteins that are highly correlated with stromal protein signatures (data file S1). This is an important step toward the identification of microenvironmental targets that inhibit tumor growth or even return tumor cells to normal, which has been described as the next frontier in anticancer agents (55). These results were consistent with the prevailing view that tumors colonize their microenvironment through regulation of the ECM and recruitment of cells and factors, given that we observed changes in the ECM (clusters II and III) and regulation of the complement system (cluster IV), MDSCs (cluster V), and macrophages (cluster VI). Many of the differentially quantified stromal proteins identified correlate with breast cancer prognosis and treatment. For example, nine stromal genes with differential protein quantities, including S100A9 and LCP1, have mRNA quantities in the stroma that correspond to patient outcome (16). Considering the high co-regulation of stromal protein signatures in both PDX and patient tumors in the TCGA, these observations demonstrated that PDX models largely recapitulate how primary tumors interact with the microenvironment and are good models to study the tumor-stroma interface despite their lack of a lymphoid system, which also limits the utility of PDX models to study immunotherapy and other T cell–driven biology. Notably, this regulation occurs in PDXs grown subcutaneously, in a nonorthotopic location.
Because many of the identified biological pathways are druggable, these results suggest that proteomics can guide therapeutic studies targeting the stroma. Furthermore, proteomic characterization of PDX models may provide insight into how seemingly dormant micrometastases are prompted to grow by tumor removal, an important consideration in treatment decisions.

Tumors have previously been shown to harness the stroma for their benefit, but the extent of stromal education by patient tumors at the level of the proteome has not been reported. Our analysis suggested that stromal education is far more pervasive, coordinated, and individualized at the protein level than previously anticipated. Despite each tumor educating a unique repertoire of stromal proteins, the shared regulation of stromal protein signatures suggests that they play a critical role in tumorigenesis. The high consistency and molecular rigidity of stromal education by patient tumors suggest that elucidating how individual tumors alter their stromal environment will yield significant insights into this important and complex aspect of tumor biology.



Materials were purchased from Sigma-Aldrich except for dithiothreitol, trifluoroacetic acid (TFA), and NaCl (Thermo Fisher Scientific) and tris and glycine (Bio-Rad). Liquid chromatography–mass spectrometry (LC-MS) solvents were purchased premixed from Honeywell. High-pH reversed-phase (RP) fractionation columns and TMT10 reagents were from Thermo Fisher Scientific.

Generation and preparation of xenograft tumor samples for proteomic analysis

The source and generation of all PDXs in this study were previously reported (15). All human tissues for these experiments were processed in compliance with National Institutes of Health regulations and institutional guidelines approved by the Institutional Review Board at Washington University. All animal procedures were reviewed and approved by the Institutional Animal Care and Use Committee at Washington University in St. Louis, MO. PDX tumors from established basal (WHIM2, WHIM4, and WHIM14), luminal (WHIM16 and WHIM20), claudin-low (WHIM12), and HER2E (WHIM11) breast cancer subtypes were raised subcutaneously in 8-week-old NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ mice (Jackson Labs), as previously described (15, 56). Tumors from each animal were harvested by surgical excision at ~1.5 cm³, rapidly divided into four pieces, and snap-frozen by immersion in a liquid nitrogen bath immediately after excision.

Tissue lysate preparation

Samples were cryopulverized into powder using a Covaris CP02 CryoPrep system and solubilized in lysis buffer with a Covaris S220X sonicator (peak incident power: 100 W, 500 cycles per burst, 10% duty factor, 4°C, 4 min) as in (56). The lysis buffer [50 mM Hepes (pH 7.5)] contained the following: 150 mM NaCl, 0.5% Triton X-100, 1 mM EDTA, 1 mM EGTA, 10 mM NaF, 2.5 mM NaVO4, 1× Protease Inhibitor Cocktail (Roche), and phosphatase inhibitor cocktails 2 and 3 (Sigma). Lysates were centrifuged at 22,000g for 10 min to pellet any debris and then filtered through a 0.45-μm filter (Millipore Ultrafree-MC HV) to further remove insoluble protein. Protein concentrations were determined (Advanced Protein Assay, Cytoskeleton) to prepare aliquots with concentrations of ~5 mg/ml, storage at −80°C, and subsequent digestion.

Sample digestion and TMT labeling

The PDX lysates (0.8 to 2 mg) were thawed on ice, and the protein was precipitated using 2-D Clean-Up kit (GE Healthcare) and resolubilized in 8 M urea with 100 mM triethylammonium bicarbonate (TEAB) (pH 8.5). The samples were reduced with 5 mM tris(2-carboxyethyl)phosphine for 30 min at 25°C, alkylated with 40 mM iodoacetamide for 30 min at 25°C in the dark, and quenched with 20 mM dithiothreitol for 15 min at 25°C. Samples were diluted to 2 M urea with 100 mM TEAB before addition of 5 μg of trypsin (Fluka, analytical grade). Samples were incubated at 37°C for 16 hours. Another 5 μg of trypsin was added and incubated at 37°C for 4 hours. Samples were filtered through a 30-kDa filter (Millipore Ultracel YM-30) and labeled with TMT10 reagents per the manufacturer’s instructions. Samples were desalted with C4 and graphitized carbon tips (Glygen), loaded using 1% acetonitrile (ACN) and 1% formic acid (FA), eluted with 60% ACN and 1% FA, and combined before high-pH RP fractionation.

High-pH RP fractionation

TMT10-labeled peptides were separated by high-pH RP spin columns (Thermo Fisher Scientific, catalog no. 84868). After conditioning columns twice with 300 μl of ACN and three times with 300 μl of water, samples were loaded into the column. Columns were centrifuged at 3000g for 1 min, followed by a wash with 300 μl of 0.1% triethylamine. Three-hundred microliters of each elution buffer (5%, 7.5%, 10%, 12.5%, 15%, 17.5%, 20%, and 50% ACN in 0.1% TFA) was added in sequence to the column followed by centrifugation at 3000g for 1 min to collect each eluted fraction. Elutions from 5% and 7.5% ACN were pooled following elution. Samples were evaporated to dryness and resuspended in 0.1% FA for LC-MS analysis.

Pooling TMT-labeled samples

An initial set of six PDX tumors, one biological replicate each, was pooled together to make pool A (fig. S6A). The sample set was first used to evaluate assay reproducibility (fig. S6B; TMT10 plex #1; shown in fig. S3) as well as determine relative protein abundance in these PDXs. A set of 15 additional PDXs was pooled into pool B (fig. S6A) that was used only to determine relative protein abundance in each PDX in TMT10 plex #2 and plex #3 (fig. S6B). Pool A was included as a TMT10 channel in all analyses to stitch all three TMT10 plexes together. The Pearson’s correlation coefficients of all protein abundance ratios between pool A and B, including both human and mouse proteins, were very high, averaging 0.994 across analyses (fig. S7A). In addition, the raw intensity of individual spectral matches in each pool was very linear (average slope = 0.981; fig. S7B) and similar (average linear regression r2 = 0.963; fig. S7B). We also analyzed one sample (WHIM12.2; fig. S6B) in both TMT10 plex #2 and #3 to validate that all relative protein abundance across the two TMT10 plexes with pool B (fig. S6B) resulted in similar protein profiles. The TMT10-labeling schematic for all samples analyzed is shown in table S2.

LC-MS data acquisition of TMT10-labeled samples for quantitative protein profiling on Thermo Elite LC-MS

TMT10 plex #1 was analyzed on a Thermo Elite LC-MS. LC was performed using two 75 μm × 15 cm ChromXP C18 columns in tandem with a 4-hour LC gradient from 5 to 30% ACN over 180 min and 30 to 45% over 25 min. MS analysis was performed on an Orbitrap Elite with a TOP15 method. MS1 scan range was 380 to 16,000 mass/charge ratio (m/z) at a resolution of 120,000 for 10 ms. MS2 spectra were generated by high-energy collisional dissociation (HCD) fragmentation and acquired with 30,000 resolution and scans starting at 110 m/z.

LC-MS data acquisition of TMT10-labeled samples for quantitative protein profiling on Thermo Q Exactive LC-MS

For each high-pH RP fraction, 2 μl of sample was loaded onto a 75 μm inner diameter × 25 cm Acclaim PepMap 100 RP column (Thermo Fisher Scientific). Peptide separations were started with 95% mobile phase A (0.1% FA) for 5 min and increased to 25% mobile phase B (100% ACN, 0.1% FA) for more than 95 min, followed by a 10-min gradient to 40% B, a 5-min gradient to 90% B, and wash at 90% B for 5 min, with a flow rate of 300 nl/min. Full-scan mass spectra were acquired by the Orbitrap mass analyzer in the m/z of 375 to 1400 with a mass resolving power of 70,000. Fifteen data-dependent HCDs were performed with a mass resolving power set to 35,000, an m/z range starting from 100 up to a maximum determined by the instrument, an isolation width of 0.7 m/z, and normalized collision energy setting of 32. The maximum injection time was 50 ms for parent ion analysis and 105 ms for product ion analysis. Target ions already selected for MS/MS were dynamically excluded for 30 s. An automatic gain control target value of 3 × 10^6 ions was used for full MS scans and 1 × 10^5 ions for MS/MS scans. Peptide ions with charge states of one, or greater than six, were excluded from MS/MS interrogation.

Peptide identification

Protein and peptide identification as well as relative quantitation were performed with Proteome Discoverer software (version; Thermo Fisher Scientific) using Mascot (v. 2.4.1) as the search engine. MS/MS spectra were searched against a concatenated National Center for Biotechnology Information Reference Sequence (RefSeq version July 2013) database of human (36,380 entries) and mouse (24,821 entries), and cRAP (57) (version 1.0, 1 January 2012, 116 entries). The search parameters included digestion with trypsin/P (two or four missed trypsin cleavages with Elite versus QE data, respectively), static modification of cysteine by carbamidomethylation, TMT10 labeling of lysine, and peptide N termini, and variable modifications included methionine sulfoxide, deamidation of asparagine, deamidation of glutamine, acetylation of protein N terminus, TMT10-plex derivatization of peptide N termini, and S-carbamoylmethylcysteine cyclization of N-terminal cysteine. Mass tolerances were 10 ppm (parts per million) mass accuracy on precursors and 0.02 Da on fragment ions. Peptide FDR was calculated by target-decoy searching against a reversed data set, and peptides were filtered at 1% FDR. A peak integration tolerance of 20 ppm was used for extracting TMT10 reporter ion intensities.
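The target-decoy filtering step described above can be illustrated with a minimal sketch. It assumes each peptide spectrum match is a hypothetical `(score, is_decoy)` pair and estimates FDR at a score cutoff as the number of decoy hits divided by the number of target hits. The calculation performed inside Proteome Discoverer and Mascot may differ in detail.

```python
def filter_at_fdr(psms, max_fdr=0.01):
    """Return the target PSMs accepted at the requested FDR.

    psms: list of (score, is_decoy) tuples; a higher score is a better match.
    FDR at a cutoff is estimated as (#decoys) / (#targets) above that cutoff.
    The tuple layout is an illustrative assumption, not the software's format.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cutoff at which the estimated FDR is still acceptable.
        if targets and decoys / targets <= max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p[1]]
```

The loop keeps the deepest cutoff that still satisfies the threshold, so a single early decoy does not end the search.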

Quantitation, data processing, and normalization

The resulting peptide sequences were mapped onto both human and mouse genes using the PGx software package (version 1.0) (58) to identify species- and gene-unique peptides (fig. S2). When a peptide sequence was observed in both human and mouse sequences, it was defined as “species-shared.” Otherwise, peptide sequences were defined as “species-unique.” The species-specific peptides were further characterized as gene-unique for their respective species. When a peptide was matched to only one gene that was represented by only one database sequence entry or multiple database sequence entries, the peptide was defined as “gene-unique.” Otherwise, peptides were considered as “gene-shared.” To quantify proteins at the gene level, only species- and gene-unique peptides were used.
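The classification rules above can be sketched as follows. This is not the PGx implementation. The lookup tables `human_index` and `mouse_index` are hypothetical, each mapping a peptide sequence to the set of gene symbols whose database entries contain it. Several entries for the same gene collapse to one symbol, so such a peptide still counts as gene-unique.

```python
def classify_peptide(peptide, human_index, mouse_index):
    """Classify a peptide by species and gene uniqueness.

    human_index / mouse_index: dict mapping peptide sequence -> set of gene
    symbols (hypothetical structures used only for illustration).
    Returns (species_label, gene_label); gene_label is None when the peptide
    is species-shared or matches nothing.
    """
    human_genes = human_index.get(peptide, set())
    mouse_genes = mouse_index.get(peptide, set())
    if human_genes and mouse_genes:
        return ("species-shared", None)
    genes = human_genes or mouse_genes
    if not genes:
        return ("unmatched", None)
    species = "human" if human_genes else "mouse"
    # One gene symbol means gene-unique, however many entries carry it.
    gene_label = "gene-unique" if len(genes) == 1 else "gene-shared"
    return (species, gene_label)
```

Only peptides returned as `("human", "gene-unique")` or `("mouse", "gene-unique")` would be carried forward to gene-level quantification.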

Relative quantification of protein abundance was performed using the reporter ion signals from the TMT10 multiplex experiments (fig. S2). All peptide spectrum matches (PSMs) with an FDR of ≤1% were “rolled up” to the gene level by summing the peak heights for each gene- and species-specific PSM, and quantified proteins were reported by their respective gene symbols. Genes were excluded from further analysis if their summed peak heights from the reporter ions were zero or had missing values in any TMT channel. For each gene, summed peak heights were then normalized to the internal reference pool included in each TMT10 plex. Mouse genes were further filtered by removing plasma (59) and abundant erythrocyte proteins identified by proteomics (60) before downstream analysis. Relative intensities were log2-transformed, subtracted with log2 scale median, and then divided by log2 scale SD sample-wise.
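The roll-up and normalization steps above can be sketched as follows, under stated assumptions: each PSM is a hypothetical `(gene, intensities)` pair with one reporter-ion peak height per TMT channel, a missing value is encoded as 0.0, and the sample SD is used for scaling. The text does not specify these details.

```python
import math
import statistics


def roll_up(psms, ref):
    """Sum reporter-ion peak heights per gene, then divide by the reference pool.

    psms: list of (gene, [peak height per TMT channel]) for species- and
    gene-unique PSMs; ref is the index of the reference-pool channel.
    Genes whose summed height is zero in any channel are dropped
    (missing values are assumed to be encoded as 0.0).
    """
    sums = {}
    for gene, intensities in psms:
        acc = sums.setdefault(gene, [0.0] * len(intensities))
        for i, value in enumerate(intensities):
            acc[i] += value
    return {gene: [v / acc[ref] for v in acc]
            for gene, acc in sums.items() if all(v > 0 for v in acc)}


def standardize_sample(ratios, channel):
    """log2-transform one channel, subtract its median, divide by its SD."""
    logs = {gene: math.log2(r[channel]) for gene, r in ratios.items()}
    values = list(logs.values())
    med = statistics.median(values)
    sd = statistics.stdev(values)  # sample SD; an assumption
    return {gene: (x - med) / sd for gene, x in logs.items()}
```

Calling `standardize_sample` once per channel gives the sample-wise centering and scaling described in the text.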

Statistical analysis

The coefficients of determination (R2) were represented in box plots (Fig. 1B). In each box plot, the lower and higher whiskers represent the first and third quartiles, respectively, the horizontal line indicates the position of median, and dots outside of whiskers are outliers. GSEA (version 2.2.1) analysis was performed using default parameters, and the software package generated statistical analysis. Hierarchical clustering (heatmap.2 in R) was performed using Spearman’s correlation coefficients and Ward’s minimum variance method. Color gradient of heatmap was set as red being the highest and blue being the lowest protein abundance. The human and mouse data were statistically analyzed using one-way ANOVA by considering PDX samples as multiple groups, respectively. Differences were regarded as significant if the adjusted P value (Benjamini-Hochberg method) was less than 0.05. For PCA, the differentially regulated human and mouse proteins were used to calculate principal components, and the data points projected onto the first three principal components were visualized in a three-dimensional space.
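The Benjamini-Hochberg adjustment applied to the ANOVA P values can be written in a few lines. The sketch below covers only the multiple-testing correction and takes the raw P values as given. The function name is illustrative.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A protein would then be called differentially regulated when its adjusted value is below 0.05, matching the threshold stated above.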

To explore whether the use of male and female mice in the basal subtype PDXs affects our species-specific proteomics results, we performed PCA across all 21 tumors using the proteins with significantly altered abundance that were identified by ANOVA. Principal components with up to 80% variance were kept for visualization (fig. S4).

To identify interaction patterns between tumor (human) and stromal (mouse) clusters, correlation analysis was performed on the basis of significantly differentially quantified human and mouse proteins (Fig. 4). Specifically, mouse and human proteins that were significant by ANOVA were used to calculate Spearman’s correlation coefficients between species, and P values from correlation tests were further adjusted by the Benjamini-Hochberg method. For each human protein, the correlation with the maximum absolute value in a stromal cluster was used to represent the tumor and stromal interaction. The output data set served as an input for GSEA (Fig. 4A). The numbers of significantly correlated human-mouse protein pairs were also recorded for each human protein, split by mouse cluster, with the threshold for the adjusted P value set to 0.05 (Fig. 4B). Blue and red lines indicate the numbers of negative and positive significant correlations, respectively.
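The per-cluster summary described above, which takes the Spearman coefficient of largest magnitude between one human protein and the mouse proteins of a stromal cluster, can be sketched in plain Python. The function names and the list-based abundance profiles are illustrative assumptions. Profiles are assumed to be non-constant, because a constant profile would give a zero denominator.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def strongest_correlation(human_profile, cluster_profiles):
    """Signed Spearman coefficient with the largest absolute value."""
    return max((spearman(human_profile, m) for m in cluster_profiles), key=abs)
```

Keeping the sign of the selected coefficient preserves the distinction between positive and negative tumor-stroma associations.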

Data from breast cancer TCGA and Clinical Proteomic Tumor Analysis Consortium (CPTAC) (Fig. 3) were used to determine the extent to which the clusters in the mouse stroma are regulated in human patients. RNA-sequencing (RNA-seq) data from 1095 primary breast tumors from the TCGA (36) (TCGA RNA-seq V2 pipeline) were analyzed for coordination at the transcript level. Global iTRAQ proteomic data from 105 patients in the CPTAC analysis of the TCGA (37) were analyzed for coordinate protein level regulation. For the gene sets found by hierarchical clustering of the mouse stromal data (Fig. 2B), Spearman’s correlation coefficients were used to determine the strength of coordination in each data set. P values (Fig. 3B) for the correlation matrices were determined using Monte Carlo simulation by sampling 10,000 randomized sets of equal size to the test clusters and ranking the sums of the Spearman’s correlation coefficients. For Fig. 3 (C and D), CPTAC proteomics data were parsed by TCGA-assigned PAM50 subtypes and pathologic stage, respectively. Statistical significance was assessed using Student’s t tests. Data visualization was implemented using R (version 3.1.2) unless indicated otherwise.
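The Monte Carlo test for cluster coordination can be sketched as follows. It assumes a hypothetical precomputed lookup `corr` that maps each unordered gene pair to its Spearman coefficient. It also adds one to the numerator and denominator so that the empirical P value is never exactly zero, which is a common convention and not a detail stated in the text.

```python
import random


def monte_carlo_p(corr, cluster, n_iter=10000, seed=0):
    """Empirical P value for the summed pairwise correlation of a gene set.

    corr: dict mapping frozenset({gene_a, gene_b}) -> Spearman coefficient
          for every gene pair (hypothetical precomputed structure).
    cluster: list of genes in the test cluster.
    """
    genes = sorted({g for pair in corr for g in pair})

    def score(gene_set):
        # Sum of coefficients over all unordered pairs in the set.
        return sum(corr[frozenset((a, b))]
                   for i, a in enumerate(gene_set)
                   for b in gene_set[i + 1:])

    observed = score(cluster)
    rng = random.Random(seed)
    # Count random equal-size sets that are at least as coordinated.
    hits = sum(score(rng.sample(genes, len(cluster))) >= observed
               for _ in range(n_iter))
    return (hits + 1) / (n_iter + 1)
```

With 10,000 iterations, the smallest P value this estimator can report is about 1 × 10^-4.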


Fig. S1. Framework to study the tumor-intrinsic biology of breast cancer PDXs.

Fig. S2. Data processing workflow of finding species- and gene-unique PSMs.

Fig. S3. Correlation (R2) plot of protein abundance between all biological and process replicates in the data set pre-ANOVA filtering.

Fig. S4. PCA of ANOVA-filtered proteins labeled by mouse gender.

Fig. S5. mRNA and protein abundance of the stromal proteomic signatures in individual TCGA tumors.

Fig. S6. TMT10 pooling and data acquisition.

Fig. S7. Comparison of TMT10 global reference pools A and B.

Table S1. Metastatic information and passage number of PDX in each biological replicate.

Table S2. TMT10 labeling schematic for all samples analyzed.

Table S3. PDX protein abundance.

Table S4. List of all proteins significantly correlated with each stromal cluster.

Data file S1. List of all proteins and correlation values (r) in each stromal cluster.


Acknowledgments: We thank A. Davis, J. Malone, and J. Rumsey for support with data acquisition and implementing data analysis across multiple software platforms. Funding: We thank the Alvin J. Siteman Cancer Center at Washington University School of Medicine and Barnes-Jewish Hospital in St. Louis, MO, for the use of the Proteomics Shared Resource, which provided support with sample preparation. The Siteman Cancer Center is supported in part by National Cancer Institute Cancer Center Support grant P30 CA91842 (R.R.T.). We also acknowledge funding from U24 CA210972 (D.F. and L.D.), Leidos Biomedical Research Inc. contract 13XS068 (D.F.), U24 CA160035 (R.R.T. and M.J.E.), and T32GM007067-41 T32 training grant (A.D.M.). Author contributions: X.W., P.E.-G., R.V., S.R.D., R.B., M.J.E., J.C.R., R.R.T., D.F., and J.M.H. designed research. P.E.-G., Q.Z., and S.L. performed experiments. X.W., R.V., R.B., S.L., J.C.R., and D.F. contributed new reagents or analytic tools. X.W., A.D.M., Q.Z., R.V., K.-l.H., L.D., R.R.T., D.F., and J.M.H. analyzed the data. X.W., S.R.D., B.A.V.T., J.S., R.R.T., D.F., and J.M.H. synthesized results and wrote the paper. Competing interests: The authors declare that they have no competing interests. Data and materials availability: All raw LC-MS data and peak lists are deposited and available from MassIVE, data set MSV000080670, and ProteomeXchange, data set PXD006162.
