The Structure and Function of Proline Recognition Domains


Science's STKE  22 Apr 2003:
Vol. 2003, Issue 179, pp. re8
DOI: 10.1126/stke.2003.179.re8


One particularly abundant group of modular recognition domains consists of those that bind proline-rich motifs. Such modules, including the SH3, WW, and EVH1 domains, play a critical role in the assembly and regulation of many intracellular signaling complexes. These domains use strikingly similar molecular mechanisms of proline recognition. We discuss some of the potential biological advantages conferred by proline recognition, which may explain its widespread use in signaling.


Domains that bind proline-rich motifs are critical to the assembly of many intracellular signaling complexes and pathways. The importance of proline-rich motifs in biology is highlighted by the finding that proline-rich regions (1) are the most common sequence motif in the Drosophila genome and the second most common in the Caenorhabditis elegans genome (2). The number of defined protein domains that recognize proline-rich motifs has expanded considerably in recent years to include such common domains as Src homology 3 (SH3), WW (named for a conserved Trp-Trp motif), and Enabled/VASP homology 1 (EVH1, also known as WASP homology 1 or WH1) domains, as well as other proline-binding domains. The number of such domains in an organism roughly corresponds to its perceived complexity (Table 1).

Proline recognition domains are usually found in the context of larger multidomain signaling proteins. Their binding events often direct the assembly and targeting of protein complexes involved in cell growth (3-5), cytoskeletal rearrangements (6, 7), transcription (8), postsynaptic signaling (9, 10), and other key cellular processes (11). In addition, these interactions can play a regulatory role, often through autoinhibitory interactions that are alleviated by competing binding events (12).

Several recent reviews discuss individual proline recognition domains (9, 13, 14). This review aims to compare the biological role and the molecular mechanisms of these domains and to address the implications of having multiple domains with similar ligand specificities within a single cell.

Properties of Proline and Polyproline Sequences

Repetitive proline-rich sequences are found in many proteins (15) and in many cases are thought to function as docking sites for signaling modules (16). Why might proline be singled out for recognition by so many key protein-protein interaction modules? Several features of proline distinguish it from the other 19 naturally occurring amino acids (Fig. 1A): the unusual shape of its pyrrolidine ring, the conformational constraints on its dihedral angles imposed by this cyclic side chain, its resulting secondary structural preferences, its substituted amide nitrogen, and the relative stability of the cis isomer in a peptide bond. Each recognition domain exploits some combination of these distinctive features of proline in order to achieve specific binding to proline-rich regions.

Fig. 1.

Properties of proline and polyproline sequences. (A) Chemical structure of proline contrasted with that of other natural amino acids. Proline possesses a five-membered ring fused onto the nitrogen, making it a secondary amine, whereas other amino acids have side chains that only branch off of the α carbon, leaving a primary amine. (B) Schematic and structural representation of a PPII helix. The helix has twofold pseudosymmetry: A rotation of 180° about a vertical axis leaves the proline rings and the carbonyl oxygens at approximately the same position. The Protein Data Bank (PDB) accession code for the poly-(L)-proline structure shown is 1CF0. (C) A view down the axis of the PPII helix highlighting the position of the carbons in the xP dipeptide. In the "x" position that requires C-substitution (blue), the primary recognition element is the β carbon, whereas in the "P" position that requires N-substitution (red), the primary recognition element is the δ carbon that is unique to proline.

One feature of proline-rich motifs that is frequently used in binding to signaling domains is their propensity to form a polyproline type II (PPII) helix. The PPII helix is an extended left-handed helical structure with three residues per turn and an overall shape resembling a triangular prism (Fig. 1B) (15, 17). A combination of steric and hydrogen-bonding properties of proline-rich motifs is thought to contribute to their preference for this unusual secondary structure (15, 17). Two features of the PPII helix make it a useful recognition motif. First, in this structure both the side chains and the backbone carbonyls point out from the helical axis into solution at regular intervals (Fig. 1B). The lack of intramolecular hydrogen bonds in the PPII structure, due largely to the absence of a backbone hydrogen-bond donor on proline, leaves these carbonyls free to participate in intermolecular hydrogen bonds. Thus, both side chains and carbonyls can easily be "read" by interacting proteins (18). Second, because the backbone conformation in a PPII helix is already restricted, the entropic cost of binding is reduced (16, 19). Nearly all of the domains described here bind their ligands in a PPII conformation. Many of the interactions with the PPII helical ligand involve aromatic residues. The planar structure of aromatic side chains appears to be highly complementary to the ridges and grooves presented on the PPII helix surface.
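The triangular-prism picture can be made concrete with a little arithmetic. The following Python sketch is an idealized illustration, not a structural model: it assumes a perfectly regular helix with exactly three residues per turn (as stated above), and it uses an approximate textbook rise of 3.1 Å per residue, a value that does not come from this review. The function names are hypothetical. Under these assumptions, residues three positions apart fall on the same edge of the prism, one turn higher.

```python
RESIDUES_PER_TURN = 3      # from the text: three residues per turn
RISE_PER_RESIDUE_A = 3.1   # assumed approximate value in angstroms, not from this review


def ppii_position(index):
    """Return (edge, rotation_deg, height_A) for a 0-based residue index.

    edge is 0, 1, or 2 and labels one edge of the idealized triangular prism.
    The negative rotation step is the sign convention chosen here for a
    left-handed twist.
    """
    edge = index % RESIDUES_PER_TURN
    rotation_deg = (-360 / RESIDUES_PER_TURN * index) % 360
    height_a = RISE_PER_RESIDUE_A * index
    return edge, rotation_deg, height_a


def same_edge(i, j):
    """True if residues i and j lie on the same edge of the idealized prism."""
    return ppii_position(i)[0] == ppii_position(j)[0]


# Residues 0 and 3 share an edge; residues 0 and 1 do not.
for i in range(6):
    print(i, ppii_position(i))
```

In this idealized geometry, two residues separated by two intervening positions point in the same direction, which is one way to rationalize why spaced proline positions can be read by a single, relatively flat binding surface.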

One interesting structural feature of the PPII helix is that it has twofold rotational pseudosymmetry: Side chains and backbone carbonyls are displayed with similar spacing in either of the two N- to C-terminal orientations (Fig. 1B). This feature may explain why many proline-binding domains are observed to bind ligands in two possible orientations, a property unique among characterized peptide recognition modules. In principle, this orientational flexibility could play an important role in domain function. For example, one could imagine a complex in which binding in one orientation could be activating, whereas binding in the opposite orientation could be inhibitory. However, such an orientational switching role has not been demonstrated.

Another unique property of proline is that it is the only naturally occurring N-substituted amino acid. Proteins that recognize the δ carbon on the substituted amide nitrogen (Fig. 1A) within the context of the otherwise standard peptide backbone can select precisely for proline at a given position without making extended contacts with the rest of the side chain (Fig. 1C). Thus, sequence-specific recognition can be achieved without requiring a particularly high-affinity interaction. Interactions that are specific and low-affinity can be quite useful in intracellular signaling environments where rapidly reversible interactions may be required. Among proline-binding domains, this phenomenon has been best characterized for SH3 domains, in which required prolines can be replaced without a significant loss in binding affinity by a number of nonnatural N-substituted amino acids that do not resemble proline (20).

Proline also stands out from other natural amino acids in its ability to exist stably as a cis isomer about the peptide bond. In an unfolded chain, proline residues adopt the cis conformation with a probability of ~20% as compared to negligible amounts for the other amino acids (15). Moreover, the kinetic barrier for cis-trans isomerization is higher for proline than for the other amino acids and is even the rate-limiting step in the folding of certain proteins (21). In principle, recognition of cis proline moieties could be a useful way of achieving regulation, potentially even with some degree of kinetic control. However, none of the major proline recognition modules discussed here are known to exploit recognition of cis isomers. Still, the intriguing possibility remains that cis-trans isomerization could provide a mechanism to modulate such recognition events.

Thus, many chemical properties of proline distinguish it from the other 19 naturally occurring amino acids, and proline recognition domains exploit several of these properties. If a recognition event involves a property of proline that is sufficiently distinct among the natural set of 20 amino acids, the interaction does not have to be of particularly high affinity to be selective. The benefits of weak, but specific, interactions in intracellular signaling pathways may help explain the abundance of proline-based recognition motifs.

SH3 Domains

The first characterized and best understood example of the proline recognition modules is the SH3 domain (14). SH3 domains comprise about 60 residues and typically play an assembly or regulatory function. An assembly role is exemplified by the adaptor protein Grb2, which is involved in the p21 Ras-dependent growth factor signaling pathway (Fig. 2A) (4). Grb2 has a single Src homology 2 (SH2) domain, which recognizes phosphotyrosine motifs, flanked by two SH3 domains. Upon growth factor stimulation, receptor tyrosine kinase activation results in autophosphorylation and phosphorylation of other membrane-associated proteins. These phosphorylation events create docking sites for the Grb2 SH2 domain, thereby resulting in membrane recruitment of Grb2. The Grb2 SH3 domains bind to proline-rich motifs in the protein SOS, a guanine nucleotide exchange factor for Ras, ultimately recruiting SOS to the membrane. Because Ras is farnesylated and membrane-localized, this colocalization with SOS promotes guanosine triphosphate (GTP) loading of Ras. The resultant stimulation of Ras activates a mitogen-activated protein kinase (MAPK) cascade, leading to cell growth and differentiation (3, 5). Similar recruitment roles are played by SH3 domain-containing proteins in various other biological processes, including endocytosis (11) and cytoskeletal dynamics (22).

Fig. 2.

Functional roles of SH3 domains. (A) Assembly role of SH3 domains. Growth factor stimulation leads to the activation of receptor tyrosine kinases and to the phosphorylation of the receptor tail, of related adaptor proteins (not shown), or of both. The resultant phosphotyrosines form docking sites for the adaptor protein Grb2 (through its SH2 domain). The Grb2 SH3 domains bind proline-rich motifs in SOS, the guanine nucleotide exchange factor for Ras, recruiting SOS to the membrane and colocalizing it with Ras. The resultant stimulation of Ras activates a MAPK cascade, leading to cell growth and differentiation. (B) Regulatory role of SH3 domains. Intramolecular interactions of the SH2 and SH3 domains of Src kinases hold their kinase domains in an inactive conformation. These autoinhibitory interactions can be disrupted by external SH2 and SH3 ligands, yielding spatial and temporal control of kinase activation.

SH3 domains also play regulatory roles. An excellent example of this is the Src family of tyrosine kinases (Fig. 2B) (12, 23). Src kinases contain an SH2 and an SH3 domain in addition to the kinase domain. Under basal conditions, the SH2 and SH3 domains participate in intramolecular interactions that hold the kinase domain in an inactive conformation. Binding to external SH2 and SH3 ligands can disrupt these autoinhibitory interactions, thereby yielding activation. An important feature of such a regulatory role is that targeting by the SH2 and SH3 domains is directly coupled to activation of the kinase, yielding precise spatial and temporal control. SH3 domains appear to play a similar autoinhibitory role in several other systems, including the neutrophil NADPH oxidase (24-26). This tightly regulated enzyme produces the antimicrobial reactive oxygen species only upon proper stimulation. Activation involves the assembly and membrane localization of the SH3-containing proteins p40phox, p47phox, and p67phox.

Such regulatory mechanisms reveal how SH3 domains, which were initially viewed as static assembly elements, can function as dynamic switches by alternating binding partners (intra- versus intermolecular). SH3 interactions tend to be fairly weak, with typical dissociation constants (Kd's) in the μM range (14). Such weak affinities may be essential for this kind of reversible switching mechanism.
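A short calculation may help show why micromolar affinities suit reversible switching. The Python sketch below assumes the simplest possible model, 1:1 equilibrium binding with free ligand in excess, in which the fraction of domain occupied is [L] / ([L] + Kd). The function name and the concentrations are hypothetical and chosen only for illustration; they are not measurements from this review.

```python
def fraction_bound(ligand_conc_uM, kd_uM):
    """Equilibrium occupancy of a domain for simple 1:1 binding.

    Assumes free ligand is in excess over the domain, so that
    occupancy = [L] / ([L] + Kd). Both arguments are in micromolar.
    """
    if ligand_conc_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_uM / (ligand_conc_uM + kd_uM)


# Illustrative values only: compare a tight (nM) and a weak (uM) interaction
# at low and high effective ligand concentration.
for kd in (0.001, 10.0):
    for conc in (1.0, 100.0):
        occupancy = fraction_bound(conc, kd)
        print(f"Kd = {kd:g} uM, [L] = {conc:g} uM -> occupancy {occupancy:.3f}")
```

Under this model, a domain with a Kd in the μM range is occupied or vacated as the effective ligand concentration changes over a physiologically plausible range, whereas a nanomolar interaction would stay nearly saturated and be difficult for a competing partner to displace.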

Much effort has been dedicated to understanding the ligand preferences of SH3 domains (27-30). In vitro peptide selection studies revealed that the majority of SH3 domains require the conserved consensus motif PxxP for recognition. In individual SH3 domains, however, this core PxxP motif is flanked by different specificity elements. For example, a large group of SH3 domains recognize the PxxP core flanked by the basic residues R or K. However, early studies were confounded by the observation that two classes of such ligand motifs emerged: K/RxxPxxP and PxxPxK/R (where K or R are required flanking residues and x is any amino acid). This confusion was clarified by structural studies that revealed that SH3 domains could use a single recognition surface to bind ligands in two possible N- to C-terminal orientations (31-34). Each of these two recognition motifs corresponds to the sequence preferences for a distinct orientation of binding. Efforts are under way to use the extensive peptide library data to generate algorithms to predict SH3 recognition (30, 35, 36).
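The two consensus motifs can be written as simple sequence patterns. The Python sketch below is a naive illustration of consensus matching, assuming one-letter amino acid codes; it is not one of the prediction algorithms cited above (30, 35, 36), and a match does not imply binding. The function name and the test peptides are hypothetical.

```python
import re

# Consensus motifs quoted in the text; "." stands for x (any residue).
# Lookahead groups are used so that overlapping matches are all reported.
SH3_MOTIFS = {
    "K/RxxPxxP": re.compile(r"(?=([KR]..P..P))"),
    "PxxPxK/R": re.compile(r"(?=(P..P.[KR]))"),
}


def find_sh3_motifs(sequence):
    """Return (motif, 1-based start, matched segment) tuples sorted by start."""
    seq = sequence.upper()
    hits = []
    for name, pattern in SH3_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up peptides, used only to exercise the two patterns.
print(find_sh3_motifs("AARALPPLPAA"))
print(find_sh3_motifs("GGPPLPPRGG"))
```

Because each motif corresponds to one binding orientation, reporting which pattern matched also suggests the orientation in which a candidate ligand would be expected to dock.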

Structures of SH3 domains both alone and in complex with ligand reveal their mechanism of recognition (Fig. 3). The SH3 fold consists of two antiparallel β sheets at right angles to one another. Within this fold are two variable loops, referred to as the RT and the n-Src loops (37, 38). When bound, the proline-rich peptide ligand adopts a PPII helix conformation (33, 34, 39). Recognition of this structure is achieved by insertion of the ridges of the PPII helix into a complementary pair of grooves on the SH3 surface. These surface grooves are defined by a series of nearly parallel, well-conserved aromatic residues. In addition, hydrogen-bonding donors are well positioned to recognize ligand backbone carbonyl moieties.

Fig. 3.

Structure and binding mechanism of SH3 domains. The structure of the Sem5 SH3 domain in complex with a proline-rich ligand is shown. A cartoon of the proline-binding surface of these domains docked with a ligand, showing the general mechanism of recognition, is shown below. The core recognition surface has two xP binding grooves formed by aromatic amino acids, shown in yellow, and the adjacent, less conserved specificity pockets are designated in green. The Protein Data Bank (PDB) accession code for this structure is 1SEM.

Each groove actually recognizes a pair of residues of the sequence xP (where x is a variable, usually hydrophobic, amino acid). This mode of recognition explains the requirement for prolines. Because the xP dipeptide unit has the unique backbone substitution pattern of a C-substituted residue followed by an N-substituted residue, it forms a relatively continuous ridge that can pack efficiently into the aromatic grooves on the SH3 surface (Fig. 1C). Because this mechanism relies only on the N-substitution of proline and not the entire proline ring, it allows recognition to be highly selective without being of high affinity. Moreover, it has been shown that nonnatural N-substituted groups can be used to make synthetic SH3 inhibitors (20). This mode of recognition also explains why SH3 domains can bind ligands in two possible orientations: A PPII ligand has twofold rotational pseudosymmetry, with respect to both the steric properties of the xP unit and the presentation of hydrogen-bonding groups (the backbone carbonyls) that are used in recognition (Fig. 1B).

Adjacent to the core recognition surface of SH3 domains are the more variable RT and n-Src loops (Fig. 3). In many cases, residues in these loops are observed to make numerous unique interactions with key residues in the ligand that flank the PxxP core. Thus, in general, these loops can be considered to form a flanking specificity pocket. The specificity provided by these pockets has been explored through both phage display techniques and combinatorial synthetic strategies (28, 40). These studies show that there is sufficient variability in these pockets to allow for some differential binding among SH3 family members.

Despite having distinct specificity pockets, many SH3 domains appear to have highly overlapping recognition profiles. For example, a large majority of SH3 domains recognize R/KxxPxxP or PxxPxR/K motifs (29, 41). Thus, an unanswered question is how specificity within SH3 domain-mediated interaction networks is achieved, especially in cells and organisms with many SH3 domains. One solution, used by a handful of SH3 domains, is the evolution of a noncanonical recognition mechanism. Several SH3 domains recognize non-PxxP motifs. This is the case for the SH3 domains of Eps8, which recognizes PxxDY (42); Gads, which recognizes RxxK (43); and Fus1, which recognizes Arg-Ser-rich sequences (41). In most of these cases, it is unclear whether this recognition is mediated by the equivalent surface used by canonical SH3 domains to recognize PxxP ligands. Another class of unusual SH3 domains is found in membrane-associated guanylate kinases (MAGUKs). MAGUK SH3 domains do not appear to bind PxxP motifs, but instead can associate with an adjacent guanylate kinase domain in an intra- or intermolecular fashion (44). This interaction may play a role in the assembly of signaling complexes at cell-cell junctions.

Several other mechanisms may contribute to enhancing specificity in SH3 domain-mediated interactions. There may be tertiary structure elements involved in recognition, as is the case for the recognition of the human immunodeficiency virus (HIV) protein Nef by the SH3 domains of Src family kinases Hck, Fyn, and Lyn (23, 45, 46). Nef presents a canonical PPII core in the context of a folded structure. Thus, there are additional interactions between other parts of Nef with unique elements in the RT loops of these SH3 domains.

Specificity and affinity enhancements may also come from combinatorial recognition by multiple recognition domains working in concert. There are many examples of proteins containing multiple SH3 domains, such as the yeast proteins Bem1 and Sla1 (41) or the above examples of Grb2 (47) and p47phox. Moreover, SH3 domains could function together with other modules such as SH2, PDZ (named after PSD-95, Dlg, and ZO-1), or EVH1 domains that are often found in the same proteins or complexes.

Additionally, some SH3 domains participate in multiple interactions (48). For example, the SH3 domain from the yeast protein Pex13 has two binding surfaces: a canonical surface that binds a PxxP ligand from Pex14 and a second surface that binds a nonproline motif in Pex5 (49). This set of distinct interactions achieved by the Pex13 SH3 domain is thought to reinforce the assembly of the specific trimeric complex. Several other SH3 domains also appear to have binding surfaces distinct from their proline-binding interface (50).

WW Domains

WW domains mediate protein-protein interactions in diverse processes (13). For example, the WW domains of the ubiquitin ligase Nedd4 bind to Na+-channel subunits, thereby targeting ubiquitin-mediated down-regulation of channel activity (51). A mutation in the recognition motif on the Na+-channel subunit, as occurs in the human disease Liddle's syndrome, increases the number of Na+ channels in the membrane, leading to increased blood pressure. WW domains are found in several ubiquitin ligases that bind to other targets (8). In addition, pre-mRNA splicing involves an interaction between the WW domains in the splicing factor PRP40 and a proline-rich region in the branchpoint-binding protein BBP. Another example of a biologically important role of WW domains is the organization of the dystrophin-syntrophin-β-dystroglycan complex (52, 53).

WW domains can be divided into several classes based on recognition motifs (54). All recognize proline-containing motifs that are distinct from, though overlapping with, SH3 domains. For example, the WW domains from the Yes-associated protein YAP65 and dystrophin prefer the motif Pro-Pro-X-Tyr (PPxY) (52, 55); the FBP11 and FE65 WW domains prefer Pro-Pro-Leu-Pro (PPLP) (56); and the FBP21, FBP30, and Npw38 WW domains prefer Pro-Arg (P-R) repeats (57, 58). Phosphorylation can play an important negative or positive regulatory role in WW domain recognition. For example, the WW domains of the mitotic peptidyl prolyl isomerase (PPIase) Pin1 and the ubiquitin ligase Nedd4 bind specifically to phospho-Ser/Thr-Pro motifs but not to their unphosphorylated counterparts. In contrast, interactions with PPxY motifs can be abolished by tyrosine phosphorylation (59-61).
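The dependence of WW recognition on phosphorylation state can be illustrated with a small pattern matcher. The Python sketch below rests on several assumptions that are not from the source: sequences use one-letter codes in which uppercase means an unmodified residue and lowercase s, t, or y means a phosphorylated Ser, Thr, or Tyr; a "P-R repeat" is taken to mean at least two consecutive PR units; and the function and dictionary names are hypothetical. A match indicates only that a sequence fits a consensus, not that it binds.

```python
import re

# Assumed convention: uppercase = unmodified, lowercase s/t/y = phosphorylated.
WW_MOTIFS = {
    "PPxY": re.compile(r"PP.Y"),             # requires an unphosphorylated Tyr
    "PPLP": re.compile(r"PPLP"),
    "P-R repeat": re.compile(r"(?:PR){2,}"),  # two-unit minimum is an assumption
    "pSer/pThr-Pro": re.compile(r"[st]P"),   # requires a phosphorylated Ser/Thr
}


def find_ww_motifs(sequence):
    """Return (motif class, 1-based start, matched segment) tuples sorted by start."""
    hits = []
    for name, pattern in WW_MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up peptides: the same sequence with and without a phosphate.
print(find_ww_motifs("AAPPAYAA"))  # PPxY present
print(find_ww_motifs("AAPPAyAA"))  # Tyr phosphorylated: no PPxY match
print(find_ww_motifs("AAsPAA"))    # phospho-Ser-Pro present
print(find_ww_motifs("AASPAA"))    # unphosphorylated Ser-Pro: no match
```

Encoding the modification state in the sequence keeps the positive and negative regulatory cases symmetric: phosphorylation creates a match for the phospho-Ser/Thr-Pro class and removes one for the PPxY class.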

The structures of WW domain-ligand complexes reveal a striking mechanistic similarity to those of SH3s and other proline recognition domains (Fig. 4) (62). Containing 35 to 45 residues, WW domains are highly compact binding domains, comprising an antiparallel three-stranded fold (55). Like SH3 domains, their binding surfaces are composed of a series of nearly parallel aromatic residues. Correspondingly, their ligands adopt PPII helices that position the proline side chains against the ridges and grooves on the domain-binding surface (52, 60). The aromatic groove in the WW domain also recognizes an xP pair in the ligand core. A consequence of this common mode of proline recognition is that WW domains, like SH3 domains, can recognize their ligands in two opposite orientations. WW domains differ from SH3 domains in that they typically have only one xP binding groove as compared to two adjacent xP binding grooves found in SH3 domains. Thus, a shorter proline-rich core is required for WW domain recognition.

Fig. 4.

Structure and binding mechanism of WW domains. The structure of the dystrophin WW domain in complex with a proline-rich ligand is shown. A cartoon of the proline-binding surface of these domains docked with a ligand, showing the general mechanism of recognition, is shown below. The core recognition surface has one xP binding groove formed by aromatic amino acids (yellow) and adjacent, less conserved specificity pockets (green). The PDB accession code for this structure is 1EG4.

How then, outside the requirement for the xP core, do WW domains achieve specific recognition of their ligands? Like SH3 domains, WW domains use variable loops and neighboring domains to enhance specificity. The WW domain fold has two variable loops that are adjacent to the aromatic xP-binding groove. These loops are observed to participate in interactions with key specificity elements, including the required phospho-Ser residues within the proline-rich motif bound by Pin1 or the nonphosphorylated Tyr residue within the PPxY motif bound by the dystrophin WW domain. This mechanism of specificity is conceptually similar to that used by the n-Src and RT loops of SH3 domains.

Multiple cooperative interactions with neighboring domains can also contribute to specificity in WW domain-mediated recognition. The interaction of dystroglycan with dystrophin requires both the WW domain and an adjacent helical EF hand-like domain (EF domains are calcium-binding domains). The two domains form a contiguous recognition surface where approximately half of the dystroglycan peptide ligand contacts only the EF domain. The structure of Pin1 in complex with a phosphopeptide also shows significant contacts between the ligand and the adjacent PPIase domain.

EVH1 Domains

A third class of polyproline-binding domains, EVH1 domains, also typically plays a recruitment or targeting role, often in events involving actin cytoskeleton dynamics (6, 7) and postsynaptic signaling (9). Examples of proteins with EVH1 domains that contribute to targeting of actin polymerization include the Wiskott-Aldrich syndrome protein (WASP) family (63) and the Ena/VASP family (64). Molecular targets of these domains include the cellular focal adhesion proteins vinculin and zyxin (7), as well as the ActA protein from Listeria monocytogenes (65). Recruitment of Ena/VASP proteins to ActA is thought to promote the motility of this intracellular pathogen. By hijacking the cellular actin polymerization machinery, Listeria can move from cell to cell through adjacent plasma membranes to evade immune detection (66, 67). The Homer (also known as Vesl) family exemplifies EVH1 domain-containing proteins that are thought to direct targets to synaptic signaling complexes (68). The best characterized targets for this family are the cytosolic portion of group I metabotropic glutamate receptors (mGluRs), inositol 1,4,5-trisphosphate receptors (IP3Rs), ryanodine receptors (RyRs), and the Shank family of postsynaptic density proteins (10, 68, 69).

Individual EVH1 domains have distinct ligand recognition profiles (9). Most domains recognize relatively short peptides comprising a proline-rich core and a few flanking residues with Kd's in the low μM range. One group of EVH1 domains, including the Ena/VASP family of proteins, recognizes the consensus sequence FPxϕP (ϕ is a hydrophobic residue), where x and ϕ are often prolines (70). Another group, including Homer, binds the distinct consensus PPxxF (10). More recently, a distinct class of EVH1 domains, including those from the WASP family member N-WASP and the nucleoporin RanBP1, has emerged that recognizes extended ligand sequences of up to 25 amino acids (71-73).

Structures of EVH1 domains from several proteins, some in complex with their proline-rich ligands, reveal that EVH1 ligands also bind in a PPII helical conformation and dock against an aromatic-rich binding surface on the domain (Fig. 5). However, the mechanism by which EVH1 domains recognize the PPII helix is different from that used by SH3 and WW domains. The binding surface of the EVH1 domain is concave, allowing complementary packing of the apex of the PPII helix. This interface contrasts with the relatively flat recognition surface of SH3 and WW domains that allows complementary packing of the base of the PPII helix. Consequently, whereas SH3 and WW ligand recognition depends largely on side-chain contacts with the backbone amide substitution that is unique to proline, EVH1 ligand recognition seems to depend more on the conformation of the PPII helix. This difference in recognition mechanisms may explain differences in the precise ligand structure required for binding. Although a chemical dissection approach demonstrated that the "required" prolines in an SH3 or WW ligand could effectively be replaced by nonnatural N-substituted amino acids, such as sarcosine (20), this type of approach failed to reveal a similar trend for EVH1 domains (74). This observation is consistent with EVH1 domains recognizing a feature of the proline-rich core other than the repetitive presentation of a substituted amide nitrogen in an xP dipeptide motif.

Fig. 5.

Structure and binding mechanism of EVH1 domains. A representative structure of the Mena EVH1 domain in complex with a peptide ligand is shown. Below is a schematic of the recognition mechanism showing the apex of the PPII helix fitting into an aromatic-rich wedge at the binding surface. Although a conserved set of aromatic residues (yellow) also contacts the PPII ligand, the manner in which the PPII helix docks against the domain surface differs from that observed in most other proline-binding domains discussed here. The PDB accession code for this structure is 1EVH.

EVH1 complex structures also demonstrate a critical role in recognition for motifs outside of the proline-rich core. The Phe residue that flanks the central prolines in EVH1 ligands docks into a hydrophobic pocket adjacent to the PPII recognition surface of the domain. Also, charged residues near the aromatic binding surface of some EVH1 domains interact with oppositely charged residues that flank the core proline residues in the ligand (75, 76). An additional role for flanking sequences in recognition is exemplified by the N-WASP EVH1 domain. Although ligands for this domain contain a short (~10-residue) proline-rich sequence, this motif alone is insufficient for detectable binding. This domain instead requires an extended ligand of about 25 amino acids. Structural and biochemical analysis revealed that the ligand wraps around the EVH1 domain, making energetically important contacts with extensive surfaces beyond the PPII binding interface (72). A similar extended interaction is seen between the EVH1 domain of the nucleoporin RanBP1 and its ligand, Ran (73).

EVH1 domains can recognize their ligands in either of two N- to C-terminal orientations (72). This flexibility is likely a consequence of the fact that, like SH3 and WW domains, EVH1 domains recognize their ligands in the pseudosymmetric PPII conformation (9, 72).

Other Proline-Rich Binding Domains

A number of other proline recognition domains deserve mention, though they may be less abundant within sequenced genomes and are less well characterized than those mentioned above. Among this group are Gly-Tyr-Phe (GYF) domains, profilin, and the ubiquitin E2 variant (UEV) domain from Tsg101. Despite their divergent sequences, these domains seem to share some molecular strategies for proline recognition.

The GYF domain was first identified in CD2BP2, a binding protein for the human transmembrane T cell adhesion molecule CD2. This interaction enhances interleukin-2 production upon T cell activation. These domains are named after a highly conserved Gly-Tyr-Phe motif. The domain in CD2BP2 recognizes tandem PPPPGHR sequences separated by seven amino acids in CD2 (77). Deletion of either PPPPGHR repeat in CD2 abolishes binding to the CD2BP2 GYF domain in a yeast two-hybrid assay (77), although a single repeat was sufficient for weakened but detectable binding in vitro (78). This behavior is consistent with two CD2BP2 molecules, each using its own GYF domain to bind the tandem recognition motif in a cooperative manner. Structural studies are also consistent with a 1:1 binding stoichiometry between the GYF domain and the PPPPGHR motif.

A short peptide ligand binds the CD2BP2 GYF domain in a PPII helical conformation (Fig. 6) (78). The base of the helix docks against a relatively flat surface, as is observed in SH3 and WW domains. An xP dipeptide motif on one turn of the helix inserts into a hydrophobic, largely aromatic pocket on this surface. In addition, negatively charged residues near the binding pocket interact with Arg residues in the peptide that are required for efficient binding, and these electrostatic interactions outside of the PPII peptide core are implicated in determining the specificity of this GYF domain (77, 78). There is also evidence for steric, nonelectrostatic specificity determinants in the loops around the proline-binding pocket in the CD2BP2 GYF domain. Whereas the aromatic residues involved in core PPII helix docking are conserved across all known GYF domains, the residues interacting with flanking residues differ among these domains, separating GYF domains into a number of apparently distinct subclasses (78). This variability supports the notion that in GYF domains, as in other proline recognition motifs, the residues flanking the core prolines contribute to specificity among members of a given domain family.

Fig. 6.

Structure and binding mechanism of a GYF domain. The structure of the CD2BP2 GYF domain in complex with a proline-rich ligand is shown. A cartoon of the proline-binding surface of these domains docked with a ligand is shown below. The core recognition surface has one xP binding groove formed by aromatic amino acids (yellow) and adjacent, less conserved specificity pockets (green). The PDB accession code for this structure is 1L2Z.

Profilin modulates actin dynamics by binding actin monomers and restricting their addition to one end of an actin filament, thus contributing to the polarization of actin polymerization (79, 80). In addition, profilin binds proline-rich sequences on targeting proteins such as formins and focal adhesion proteins like VASP, and these interactions localize profilin to sites of extensive actin filament assembly (7, 81). A crystal structure of profilin complexed with a polyproline decamer reveals extensive contacts between the ligand, which adopts a PPII helix conformation, and five highly conserved, solvent-exposed aromatic residues on profilin (82). Mutagenesis studies validate the importance of these residues in proline binding (83). Thus, profilin further exemplifies the intimate involvement of aromatic residues in PPII helix recognition.

UEV domains are found in the human protein Tsg101, which plays an important role in vesicular sorting and is coopted for the budding of HIV and Ebola virus. Virus budding involves the interaction between a 150-amino acid UEV domain in Tsg101 and a PTAP peptide motif in various viral structural proteins (with a Kd in the low μM range) (84, 85). Several putative cellular targets of Tsg101 also contain related proline-rich motifs. These targets include hepatocyte growth factor-regulated tyrosine kinase substrate (Hrs), which is also involved in Tsg101-dependent protein sorting, and plasma membrane proteins such as connexins 43 and 45, which are substrates of this sorting pathway (86). Tsg101 itself contains a PTAP sequence that binds its own UEV domain in vitro (84), suggesting that the UEV domain could play an autoregulatory role.

A structure of the UEV domain in complex with a PTAP-containing peptide reveals that the peptide adopts a PPII helical conformation over part of its length. The second xP dipeptide in this ligand is bound in a deep groove formed by the side chains of two key tyrosines from Tsg101 whose aromatic rings are positioned in a very similar manner to those in the proline-binding pockets of SH3 and WW domains (84).


The domains discussed here recognize proline-containing motifs by focusing on unique chemical properties of proline and proline-rich sequences. These recognition mechanisms take advantage of the fact that proline is chemically distinct from the other 19 natural amino acids. Thus, these domains are similar to other recognition domains used in signaling, which often focus on a highly distinct recognition anchor such as phosphoamino acids, as exemplified by SH2 and phosphotyrosine-binding (PTB) domains (87), or carboxy termini, as exemplified by PDZ domains (88). Such features may simply stand out within the chemical milieu of the cell.

An advantage of focusing on such distinct chemical features is that the resulting interactions can be discriminatory without resorting to extremely high affinities. The domains discussed here all tend to have Kd values ranging from high nM to low μM. Signaling pathways are often dynamic; they must be activated and inactivated quickly, and their interactions often involve domains switching between multiple interaction partners. Thus, these interactions cannot be so tight as to inhibit the dynamic nature of cellular processes.
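The link between moderate affinity and dynamic behavior can be made concrete with a short calculation. The Python sketch below assumes simple 1:1 binding with free ligand approximately equal to total ligand, and it assumes an association rate constant of 10^6 M^-1 s^-1. That rate constant is an illustrative value chosen here, not a number reported in this review, and the function names are hypothetical.

```python
import math

def fraction_bound(ligand_conc_molar, kd_molar):
    """Fraction of domain occupied for 1:1 binding (free ligand ~ total ligand)."""
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)

def complex_half_life(kd_molar, k_on=1e6):
    """Half-life of the complex in seconds.

    k_on (M^-1 s^-1) is an assumed, illustrative association rate constant;
    k_off follows from Kd = k_off / k_on.
    """
    k_off = kd_molar * k_on
    return math.log(2) / k_off

# Compare a very tight interaction (1 nM) with the high nM to low uM range.
for kd in (1e-9, 1e-7, 1e-6, 1e-5):
    print(f"Kd = {kd:.0e} M: bound at 1 uM ligand = {fraction_bound(1e-6, kd):.2f}, "
          f"half-life = {complex_half_life(kd):.3g} s")
```

Under these assumptions, a complex with a Kd of 1 μM dissociates in under a second, whereas a complex with a Kd of 1 nM would persist for several hundred seconds, which would slow any exchange of binding partners.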

Why are there so many proline recognition domains? This abundance may be a simple result of the proliferation of a successful solution to the problem of protein recognition. Having more domain types presumably allows the evolution of more complex signaling networks. Further, having a suite of domains that recognize similar or overlapping motifs may provide additional modes of interaction regulation (89). If domains from distinct family members recognize a single motif, the competition between these alternative partners could, in principle, act as a regulatory switch. Relatively little is known about the functional intersection between different domain families in vivo. However, in one case, T cell activation appears to promote this type of domain interaction swap: a receptor proline-rich motif that initially interacts with a GYF domain interacts with an SH3 domain after stimulation (78).
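The idea that competition between domains could act as a switch can be illustrated with a minimal equilibrium model. The Python sketch below assumes that two domains, A and B, bind a single motif in a mutually exclusive 1:1 manner and that both domains are in excess over the motif. All concentrations and Kd values are hypothetical, and the model does not represent the actual mechanism of the GYF to SH3 swap, which this review does not specify.

```python
def occupancy(conc_a, kd_a, conc_b, kd_b):
    """Fractions of a single motif bound by domain A, bound by domain B, or free.

    Assumes mutually exclusive 1:1 binding with both domains in excess,
    so free domain concentration is approximated by total concentration.
    """
    a = conc_a / kd_a
    b = conc_b / kd_b
    total = 1.0 + a + b
    return a / total, b / total, 1.0 / total

# Hypothetical resting state: A binds tightly, B binds weakly.
before = occupancy(conc_a=1e-6, kd_a=1e-6, conc_b=1e-6, kd_b=1e-5)

# Hypothetical stimulated state: local concentration of B rises 100-fold.
after = occupancy(conc_a=1e-6, kd_a=1e-6, conc_b=1e-4, kd_b=1e-5)

print("before (A, B, free):", [round(x, 3) for x in before])
print("after  (A, B, free):", [round(x, 3) for x in after])
```

In this toy model, domain B goes from a minor partner to the dominant one without any change in the intrinsic affinities. A change in local concentration alone is enough to swap partners when both affinities are moderate.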

The number of proline-binding domains, however, exacerbates the problem of selectivity: How are incorrect interactions avoided? Most domains discussed here have multiple mechanisms for recognizing ligands with higher specificity (Fig. 7). Almost all have specificity pockets flanking the surfaces used to recognize a proline-rich core. A few have multiple binding sites on a single domain, which may facilitate more specific, cooperative assembly. In some cases, it is clear that multiple domains work together to achieve specific recognition. The molecular mechanisms by which multiple domains cooperate to achieve biologically specific functions remain one of the major questions concerning these and other recognition modules.

Fig. 7.

Potential mechanisms for enhancing the specificity of proline-binding domains. One means of increasing specificity in proline-mediated interactions is by extending the interaction surface with the peptide to include residues beyond the proline-rich core. Another mechanism is to include a nearby sequence on the ligand that interacts with another binding module in the same complex as the proline recognition module. A third mechanism adds a separate recognition surface onto the proline recognition domain that recognizes a distinct peptide.

Table 1.

Abundance of proline recognition domains. The number of proteins with proline recognition domains in some commonly studied eukaryotic organisms, as found in the Pfam homology database (90), is shown. The numbers listed in the table are meant only to reflect the relative abundance in each proteome; different numbers are obtained from other domain identification databases. SH3-like domains are found in some prokaryotes (91). They are not included in the table because they lack certain key conserved residues, and the structure and function of these domains are unknown.

