ST NetWatch: Protein Databases

Androgen Receptor Gene Mutations Database
The Androgen Receptor Gene Mutations Database is part of the Nuclear Receptor Resource (NRR) Project, a group of associated databases on the thyroid and steroid receptor superfamilies. The androgen receptor (AR), which is encoded on the X chromosome, is activated by testosterone and dihydrotestosterone, and regulates both primary and secondary male sexual characteristics. Mutations that affect the AR are commonly found in metastatic prostate cancers, thus rendering them insensitive to treatment with AR antagonists. Congenital mutations that affect the AR are associated with androgen insensitivity syndrome (AIS), a condition that affects the sexual development of XY individuals and varies greatly in its phenotypic presentation. This database includes a catalog of mutations associated with prostate cancer or AIS and a list of proteins with which the AR interacts or is predicted to interact. References documenting mutations and protein-protein interactions are provided. The site also includes graphical maps that illustrate the positions of residues affected by mutations as well as the positions of residues important for specific protein-protein interactions.
BioGRID
The Biological General Repository for Interaction Datasets (BioGRID) is a curated collection of protein-protein and genetic interactions drawn from published high-throughput and more focused studies. The interactions include those documented in many species, such as common model organisms and humans. Entering a protein or gene name into the search form returns a list of binding partners for the protein of interest and a list of genetic interactions for the gene of interest. The method by which each interaction was identified and references for each reported interaction are indicated. The “Publication Summary” for each reference notes the interactions that were reported in that study. The contents of the database can be downloaded, and plugins that facilitate the import of BioGRID data into Cytoscape are available. A help wiki provides support for searching, defines terms used in the database, and explains the curation procedures and how to contribute to BioGRID.
Biological Macromolecule Crystallization Database
The Biological Macromolecule Crystallization Database (BMCD) contains molecule, crystal, and crystallographic data for published structures of proteins and protein complexes, nucleic acids, viruses, and other macromolecules. The information in this database is targeted to structural biologists seeking information about crystallization conditions and technical information about published crystals. BMCD entries do not include the structures or links to them but do include the PubMed ID and full citation for each crystal structure’s publication. In addition to the simple text search, the advanced search feature enables users to search the database by resolution, year of publication, or crystallization parameters like concentration, pH, and temperature. The BMCD is maintained by the National Institute of Standards and Technology.
Biological Magnetic Resonance Data Bank (BMRB)
The Biological Magnetic Resonance Data Bank (BMRB) is a repository for nuclear magnetic resonance (NMR) data for peptides, proteins, and nucleic acids. The contents of the database may be searched by the identifier, sequence, or name of a macromolecule of interest or by experimental methods, NMR parameters, or kinetic or thermodynamic data. Information about experimental methods, instrumentation, and data validation is available for each macromolecule, as well as a Java-based data viewer for visualization of the NMR spectra and quantitative data. BMRB also includes a separate database of NMR data on metabolites, which can be queried by the name of the metabolite of interest or its molecular formula, mass, structure, or characteristic NMR peaks. The macromolecule and metabolomics databases must be searched separately. BMRB provides links to tools for validating NMR data and structure files and for converting data between different formats. The site includes details on how to deposit data into BMRB and pages with links to tools for NMR spectroscopists and programmers.
Cytokines & Cells Online Pathfinder Encyclopaedia
The Cytokines & Cells Online Pathfinder Encyclopaedia (COPE) is part of a site designed to help users "Cope with Cytokines". Horst Ibelgaufts' site provides basic information on cytokines and their nomenclature through an alphabetized index. The information is an electronic, revised, and updated version of the "Dictionary of Cytokines", published in 1995 by VCH Publishers Inc., which is now out of print. Other resources under the "Browse contents, new entries, subdictionaries" link include "miniCOPE dictionaries" on apoptosis, chemokines, hematology, metalloproteinases, and virulence factors. There is also a list of cell lines (over 200 of them!) used in cytokine research.
Database of Interacting Proteins (DIP)
The Database of Interacting Proteins (DIP) is an interactive database of published protein-protein interaction data that allows you to search for protein partners. Sequences can be tested for reported interacting proteins, or searches can be performed through a text interface using keywords or citation information. A new cross-referencing feature links to ProLinks, a database with more than 10 million linkages in over 80 genomes. The site is free for non-profit, academic use.
Domain Club Browser
The Domain Club Browser complements a Research Article published in the 24 November 2009 issue of Science Signaling in which Jin et al. describe a proteome-wide clustering method to identify eukaryotic protein domain combinations that correlate with evolutionary change, focusing on 623 protein domains from the Simple Modular Architecture Research Tool (SMART) database. The authors organized proteins from seven eukaryotic species (yeast, slime mold, nematode, fruit fly, zebrafish, chicken, and human) into 1245 "domain clubs" that share similar domain composition but do not necessarily share similar domain organization. To identify domain clubs to which a specific protein belongs, users may choose the human gene symbol from the drop-down menu on the search page (only data for human proteins is available at this time). The result is a PDF file that includes a schematic diagram of the protein indicating which domains it contains and, for each domain, a domain club profile, which includes information on how many domain clubs contain that domain across the seven species analyzed and how many members within each domain club contain that domain. Alternatively, users may explore the data from the domains side by selecting a protein domain from the drop-down menu to retrieve a PDF file that contains a summary of the information for that domain in the Domain Club data set. Searching the collection by domain name returns a report that includes a summary of the domain clubs that contain that domain across the seven surveyed species plus graphical schematic depictions of all the human proteins that contain that domain, grouped into domain clubs, which is useful for comparing domain architecture between individual club members. A sample domain profile on the search page is labeled as a key to help users interpret the data included in the PDF reports.
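The distinction between domain composition and domain organization can be illustrated with a minimal sketch. It is not the clustering method of Jin et al., which groups proteins with similar (not necessarily identical) composition; it only shows how proteins with the same set of domains fall together regardless of domain order. The protein names and domain assignments are hypothetical placeholders.

```python
from collections import defaultdict


def group_by_domain_composition(proteins):
    """Group proteins that contain the same set of domains.

    Order and copy number of domains are ignored, so two proteins with
    different domain organization can still land in the same group.
    This exact-match grouping is a simplification for illustration only.
    """
    groups = defaultdict(list)
    for name, domains in proteins.items():
        groups[frozenset(domains)].append(name)
    return dict(groups)


# Hypothetical proteins, listed N-terminus to C-terminus.
proteins = {
    "protA": ["SH3", "SH2", "TyrKc"],
    "protB": ["SH2", "SH3", "TyrKc"],  # same composition, different order
    "protC": ["SH2", "SH3"],           # different composition
}

groups = group_by_domain_composition(proteins)
for composition, members in groups.items():
    print(sorted(composition), members)
```

Here protA and protB share a group even though their domains appear in a different order, which is the sense in which club members need not share domain organization.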
EF-Hand Calcium-Binding Proteins Data Library
The EF-Hand Calcium-Binding Proteins Data Library has sections on general, sequence, and structural information, and analytic tools that allow you to find homologs for EF-hand proteins or calculate the per-residue solvent-accessible surface area for several EF-hand proteins. This site does not appear to be actively maintained. However, there is still valuable information for the calcium-binding protein aficionado.
GPCRDB: Information System for G Protein-Coupled Receptors
Sequences, alignments, models, mutants, phylogenetic trees, ligand-binding data, and more...a labor of love to delight the true fan of G protein-coupled receptors. The discussion forum provided by Scientist Solutions offers practical advice for wet lab research on these proteins. Viewing the models requires RasMol, but the snake-like diagrams are readily viewable and available for downloading as scalable vector graphic (SVG) or GIF files. This site was updated in 2005.
Guide to Pharmacology
The Guide to Pharmacology is a collaborative effort of the International Union of Basic and Clinical Pharmacology Database (IUPHAR-DB) and the British Pharmacological Society’s Guide to Receptors and Channels (GRAC). These two databases include structural, functional, and expression information on G protein-coupled receptors (GPCRs), nuclear hormone receptors, and voltage- and ligand-gated ion channels. The information in each database is curated by experts and includes information on these receptors, their ligands, and the drugs (both approved and experimental) that act on them. With the integrated search tools, users can search the contents of both databases by receptor or ligand name or identifier, or by keywords, authors, or PubMed IDs of key references. The information is targeted to researchers and students of pharmacology but is appropriate for basic and clinical scientists of all disciplines. The site includes documentation and a tutorial that describe the information available and how to access it.
Guide to Receptors and Channels (GRAC)
The Guide to Receptors and Channels (GRAC) is a searchable online version of the British Pharmacological Society’s “Guide to Receptors and Channels,” published by the British Journal of Pharmacology. Users can download a complete PDF of the guide or browse and search the contents online. The site includes information on G protein–coupled receptors, nuclear hormone receptors, voltage-gated ion channels, ligand-gated ion channels, and other ion channels. Related receptors and channels are grouped together into subfamilies for which there is a general overview, suggested reading, and references. The subfamily pages also include links to detailed information about the human, rat, and mouse genes that encode members of each family. For some receptors, users can also follow links to the International Union of Basic and Clinical Pharmacology (IUPHAR) Database to access structural, physiological, pharmacological, and clinical information. Both GRAC and the IUPHAR Database can be searched through the online Guide to Pharmacology.
Human Protein Reference Database (HPRD)
Under the guidance of Dr. Akhilesh Pandey of Johns Hopkins University and the Institute of Bioinformatics, a group of biologists, bioinformaticists and bioengineers created this site based exclusively on manually curated information. The site offers illustrations and detailed information about proteins, including molecular functions and protein-protein interactions. The query interface offers many different ways to locate a protein in the database, such as by molecular class, protein domain, structural motif, chromosomal location, or tissue expression. Users can also suggest a new protein or become a molecule authority. The site includes a tool called PhosphoMotif Finder, which identifies phosphorylation motifs in submitted sequences by matching them to motifs found in the published literature. Academic use of the site is free.
Kinase.com
Kinase.com is devoted to the "genomics, evolution and function of protein kinases." It is maintained by Gerard Manning's lab at the Salk Institute, the same group that generated the Human Kinome Poster. The site includes KinBase, a multi-species kinase database. KinBase searches can be restricted by name, component protein domain, family-subfamily classification, or species. Pages for individual kinases include chromosome map position, if known, and links to information about that protein at iHOP, PhosphoSite, and GeneCards, as well as sequence data at Entrez Gene. Kinase.com includes links to ongoing kinome projects in various species and provides a free phylogenetic tree navigation tool called HyperTree. The homepage and "What's New" section have summaries of recent relevant research as well as news about the site and updates to KinBase or kinome projects.
Kinomer
Kinomer is a multispecies library of protein kinases classified into groups based on sequence and functional similarities. Kinases are divided into eight conventional and four atypical protein kinase groups; descriptions of each group are available on the site. The library includes kinases from various organisms: 16 fungi, 7 mammals, 6 plants, 4 fish, 4 insects, 3 protozoa, chicken, sea squirt, and more. Users can search the database to identify specific types of kinases in a single species or in multiple species; one can generate, for example, a list of the tyrosine kinases from chicken, the casein kinases from all 16 species of fungi, or every atypical protein kinase in the mouse. Jalview, a Java sequence alignment editor applet, allows users to sort, perform alignments, determine consensus sequences, and generate molecular phylogenies with all or a subset of the search results. Users may also upload or paste a protein sequence into a search window to classify a kinase into one of the 12 main groups and compare it to related kinases. This resource was developed by Geoff Barton’s research group at the University of Dundee, and the library is freely available for download.
Ligand-Gated Ion Channel Database
Created by Nicolas Le Novère and Jean-Pierre Changeux, the Ligand-Gated Ion Channel Database (LGICdb) contains nucleic acid and protein sequences of subunits of three classes of ligand-gated ion channels: the Cys-Loop Superfamily, the ATP-Gated Channels, and the Glutamate-Activated Cationic Channels. Some phylogenetic studies of the superfamilies are provided. The nomenclature is unique, but there is a key to the naming convention. Data can be accessed by searching keywords or sequences or by browsing within each channel superfamily. Custom multiple sequence alignments are very easy to generate, although alignments of complete genes take a bit of time. LGICdb is a project of the Computational Neurobiology group at EMBL-EBI.
LOCATE: Subcellular Localization Database
LOCATE is a database with information about the subcellular localization and membrane topology of proteins from the mouse RIKEN FANTOM3 protein sequences. The information can be accessed through various browsing options and by searching. For example, by simply clicking on a subcellular compartment in the visual representation of a cell, one is provided with a list of proteins that can be further sorted. The database may also be searched, and the output is available in multiple formats, including machine-readable options. The high-throughput computational pipeline MemO was used to predict membrane organization. The subcellular locations of the proteins were determined by a high-throughput, immunofluorescence-based assay and by manual review of peer-reviewed publications. This database is maintained by the Institute of Molecular Bioscience, The University of Queensland and the ARC Centre in Bioinformatics.
MINT: The Molecular INTeraction Database
The MINT database contains curated information about experimentally verified protein-protein interactions, and you can search the database to find your favorite protein's binding partners. Interaction information obtained from the literature and from high-throughput screens includes binary interactions only; information on higher-order complexes is not available. The search engine works best if you use an accession number or specific gene name rather than a family name. You can view your favorite protein's binding partners as a list, link to the literature that illustrates the interaction, and even download binding partners' sequence files easily.
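As a minimal sketch of what a binary-interaction data set supports, the following builds a partner lookup from a list of protein pairs. The identifiers P1 to P4 are hypothetical placeholders, not MINT records, and the input format is an assumption made for illustration rather than MINT's actual download format.

```python
from collections import defaultdict


def partners_from_binary(interactions):
    """Build a symmetric partner lookup from binary (pairwise) interactions."""
    partners = defaultdict(set)
    for a, b in interactions:
        partners[a].add(b)
        partners[b].add(a)
    return partners


# Hypothetical binary interactions (placeholder identifiers).
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P4", "P1")]

lookup = partners_from_binary(pairs)
print(sorted(lookup["P1"]))  # binding partners of P1 as a list
```

Although P1, P2, and P3 all interact pairwise in this example, binary data alone cannot show whether they assemble into a single three-member complex, which is why information on higher-order complexes requires a different kind of record.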
Modeller
Modeller is a tool for comparative modeling of three-dimensional protein structures. To generate a model, the user submits an alignment of the sequences to be modeled and a structure on which to model the sequences. Users at academic and nonprofit institutions may download the Modeller software at no charge after completing a license agreement. A list of frequently asked questions (FAQ), a user-editable wiki, tutorials, and a user manual are available in the Discussion Forum and Documentation sections of the site. Modeller was created and is maintained by Andrej Sali’s lab at the University of California, San Francisco. Other protein structure modeling tools from the Sali lab include ModBase, a database of theoretically calculated comparative protein structure models; ModWeb, a Web-based tool for protein structure modeling; ModLoop, a Web-based tool for modeling loops in protein structures; and ModEval, a Web-based tool for evaluating protein structure models generated with the Modeller software.
Molecular Class-Specific Information System (MCSIS)
This collection of information systems has grown from the original GPCRDB (G protein-coupled receptor database) to include databases for five different types of proteins: the GPCRDB, the NucleaRDB (nuclear receptors), the PrionDB (prion proteins), the KChannelIDB (potassium channels), and the GPCRIPDB (GPCR interacting proteins). Groups in Europe and the US collaborate to create, maintain, and curate these databases and information systems.
Nuclear Receptor Resource (NRR)
The Nuclear Receptor Resource (NRR) is a collection of information on nuclear receptors (NRs), including their structure, function, and targets. It includes general information on the nuclear hormone receptor superfamily, such as nomenclature and tissue-specific expression patterns of genes that encode NRs. A collection of pages provides information on specific NRs, such as the estrogen receptor alpha, the androgen receptor, the glucocorticoid receptor, and the vitamin D receptor. A graphics library includes diagrams illustrating NR structure, ligands, transcriptional activity, and specific NR pathways that are appropriate for learning and teaching.
NucleaRDB: Information System for Nuclear Receptors
NucleaRDB contains sequences, multiple sequence alignments, phylogenetic trees, and other information about nuclear receptors.
NURSA: Nuclear Receptor Signaling Atlas
An information portal for members of the nuclear receptor research community, NURSA seeks to develop an understanding of the structure and function of all nuclear receptors. The site serves as a comprehensive database of all relevant findings in the field, with features that include a detailed, user-friendly animated tutorial, an e-journal, a searchable molecular database with PubMed links, and a library of annotations and resource links. There’s even a calendar of upcoming meetings. Coming soon are personal laboratory pages for researchers, an interactive discussion forum, and a jobs database.
Orientations of Proteins in Membranes database
This database, maintained by Andrei Lomize, Mikhail Lomize and Irina Pogozheva of the University of Michigan, provides representations of proteins with respect to the lipid bilayer. A computational method is applied to optimize the representation of proteins with known structures relative to the lipid bilayer. The initial data include predominantly integral membrane proteins and some peripheral membrane proteins or peptides that interact with the membrane. The structures are based on data from the Protein Data Bank (PDB). The site can be searched or browsed, and the files are available for downloading. Details about the computational analysis are also provided. Each structure is available as an image oriented with respect to the membrane. One of the neatest features is that each structure has a Chime, Jmol, or Webmol version that makes the orientation from the intracellular and extracellular sides, as well as packing through the membrane, readily visible.
Peptide Atlas
The Peptide Atlas is a repository for annotated protein sequences obtained from tandem mass spectrometry experiments. The database is maintained by the Institute for Systems Biology Proteome Center and accepts submissions from both published and unpublished studies. It includes organism- and sample-specific collections of peptides, such as those from human urine and mouse plasma, that are assembled by combining multiple data sets. The Peptide Atlas is the repository for data from the Human Plasma Proteome Project (HPPP), an international consortium of members of the Human Proteome Organisation (HUPO). Data sets may be downloaded in their entirety or searched online by protein name or peptide sequence. The record for each protein includes the position and sequences of peptides in that protein that were identified in the various data sets and information about the samples in which the protein was identified. Each peptide record includes information about the physical properties of the peptide, spectra from the individual experiments in which the peptide was identified, and genome mapping data. The cell type, experimental conditions, and type of instrumentation used to analyze each sample are provided, along with links to publications describing the data sets. Instructions for contributing to the database are provided on the site.
Pfam
Pfam is a database of protein domain families that can be used to identify domains in a protein of interest and to learn more about the structure, function, and evolution of those domains. The protein domain families in the database are divided into two categories, Pfam-A and Pfam-B, which differ in the quality of their sequence alignments. Pfam-A families are small groups built from manually curated short alignments, which are expanded to full alignments by an automated process. Pfam-B families are generated by an entirely automated process and represent a larger set of alignments. Together, these two datasets confer both reliability and comprehensiveness to the database. The database may be searched by query sequence, accession number, PDB identifier, family name, or keyword. Each family’s entry includes extensive information about the sequence, structural and functional characteristics, and variations among family members. Information about the domain organization of family members, the distribution of family members across different species, and documented protein-protein interactions is also available. Phylogenetic trees illustrate the relatedness of different family members, and interactive JMOL images allow the user to examine structures of the proteins from any angle.
PHOSIDA Posttranslational Modification Database
The Phosphorylation Site Database PHOSIDA was established as a collection of data on phosphorylation sites, but now also includes information on acetylation and glycosylation sites identified by mass spectrometry analysis in Matthias Mann's laboratory at the Max Planck Institute for Biochemistry. The database includes protein phosphorylation sites from mouse, human, fly, nematode, yeast, and several species of bacteria, plus human protein acetylation and mouse protein N-glycosylation data. Identify modified sites in a protein of interest by searching each species-specific dataset by gene identifier or by protein name or sequence. The record for each protein includes information about its component domains, modification sites identified by Mann's group, references supporting the modifications, and links to information about modification sites in other databases. For each modification site, there is information about predicted secondary structure, evolutionary conservation, dynamics of modification, and the biological context in which the modification was detected. The phosphorylation site predictor identifies likely positions of phosphothreonine and phosphoserine residues and motifs targeted by specific kinases within a protein of interest. The "Background" section provides information on the methods used to predict phosphorylation sites, links to the published mass spectrometry analyses, instructions for downloading the datasets, and details about the sequence, structural, and evolutionary analyses of the phosphoproteins in the database.
Phospho.ELM
Phospho.ELM is a curated database of both experimentally verified and manually annotated serine, threonine, and tyrosine phosphorylation sites in eukaryotic proteins. Search the database by protein or gene name or by Uniprot or Ensembl identifier to access information about phosphorylation sites in proteins of interest. Users may also select a kinase or a phosphopeptide binding domain from a drop-down menu to view proteins phosphorylated by a specific kinase, such as MAPK1, or that contain phosphorylation sites recognized by a specific binding motif, such as the Src SH2 domain. Users may submit a sequence query through the Phospho BLAST Search tool to identify portions of a particular protein that match phosphopeptides contained in the database. Each phosphoprotein's record includes its phosphorylation sites and their sequence contexts, links to PubMed references supporting the existence of each phosphorylation event, and the identity of the kinase(s) responsible for each modification. For most phosphoproteins, there are also links to interaction networks in NetworKIN, STRING, or PHOSIDA. Records also include information about each phosphoprotein's subcellular localization and the tissue(s) in which it is found, links to relevant BioCarta pathways, and information about binding partners in the MINT database. Users can request access to a tab-delimited file containing all of these phosphorylation sites for academic or other noncommercial purposes.
PhosphoPep
PhosphoPep is a database of protein phosphorylation sites identified by mass spectrometry (MS) analysis; it includes data from human, fruit fly, nematode, and yeast. Each of the four organism-specific libraries may be searched for phosphopeptides that are present in proteins of interest or that match user-defined spectral data. Lists of proteins and pathways represented by phosphorylated peptides in each dataset are available for browsing. For each protein that contains one or more phosphorylated peptides, the sequence and position of each phosphorylation site within the protein, as well as the original MS data for each phosphopeptide, are provided. PhosphoPep enables users to access KEGG pathways, view STRING protein-protein interaction networks, identify predicted motifs with Scansite, or create a Cytoscape network for each phosphoprotein. The libraries of MS data are available for download, and the Usage Guide includes information about terms used in the database, including those used to describe the quality of the spectral data, and a downloadable tutorial.
Phosphorylation Site Database
The Phosphorylation Site Database provides information on prokaryotic proteins that undergo serine, threonine, or tyrosine phosphorylation. Information in the database may be searched by the name or accession number of the protein or gene of interest, the phosphorylation site sequence, the phosphorylated amino acid residue, or a literature citation. All information in the database comes from the primary scientific literature, and the database was constructed and is maintained by Peter J. Kennelly, Susannah Wurgler-Murphy, and Douglas M. King at Virginia Polytechnic Institute and State University.
PhosphoSitePlus
Posttranslational modifications are critical elements of cellular signaling. PhosphoSite, a database of protein phosphorylation sites, has been updated to include posttranslational modifications other than phosphorylation. The expanded database, PhosphoSitePlus (PSP) includes information on acetylation, methylation, sumoylation, neddylation, ubiquitination, and GlcNAcylation in addition to phosphorylation. Users can search the database by protein name or ID, domain name, sequence, disease, cell line, or tissue type to see the modifications made to proteins of interest or to identify known targets of modifying enzymes, such as a specific kinase or ubiquitin ligase. Entries for individual proteins include general information on function and subcellular localization, plus links to additional information in other databases. A linear map of each protein shows the positions of modification sites; clicking on a particular modification site opens a summary page with information about that particular modification site, links to other databases, references supporting the modification, information about the biological context of the modification, and relevant cell lines and diseases. The database is manually curated by scientists at Cell Signaling Technologies (CST), and the pages include links to CST products related to the modified proteins. The “Using Phosphosite” section gives an overview of the database and how to search its contents, and the “About Phosphosite” section provides basic information about posttranslational modifications and the rationale behind the design and scope of the database.
PPAR Resource Page
The PPAR Resource Page is part of the Nuclear Receptor Resource (NRR) project, a group of associated databases on the thyroid and steroid receptor superfamilies. Peroxisome proliferator-activated receptors (PPARs) are nuclear hormone receptors that heterodimerize with the retinoid X receptor (RXR) to regulate transcription in response to binding to specific fatty acids or their derivatives. The site includes graphical overviews of PPAR pathways and links to interactive pathways for several PPAR subtypes. The site also includes a link to PPRESearch, a tool for identifying PPAR response elements (PPREs) in submitted promoter sequences.
Prosite
Prosite, a searchable database of protein domains and functional sites, is part of the ExPASy group of proteomics databases and tools. Users may search by domain name to access functional, structural, and taxonomic information, as well as consensus sequences and alignments of individual sequences that represent a domain, such as SH2 or histidine kinase domains. Prosite will scan user-supplied protein sequence data to identify domains and motifs and determine to what family the protein belongs. Users may enter their protein sequences through the main page for a quick scan or through the advanced tool ScanProsite for more control over search parameters. Search results are returned as a list of conserved domains found in the input protein sequence and include both general information about each domain and information about specific proteins in which the molecular function or structure of a particular domain has been characterized. Prosite also provides information about the characteristics shared by proteins that contain a particular domain, so users can make predictions based on domain content and structure. Users can also access alignments that may contain hundreds of individual sequences to see the degree of sequence variability in homologous domains. The MyDomains Image Creator can be used to make customized domain cartoons for one's favorite real or hypothetical protein. The PRATT tool allows users to identify conserved domain patterns from groups of proteins that do not align by sequence.
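Scanning a sequence for a motif can be sketched by translating a Prosite-style pattern into a regular expression. The sketch below handles only the basic pattern elements (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` or `>` at the ends as terminus anchors); it is an illustration, not a substitute for ScanProsite, and the function name and test sequence are hypothetical. The example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic Prosite-style pattern into a Python regex string.

    Simplified sketch: anchors inside brackets (e.g. [G>]) and profile-based
    entries are not supported.
    """
    pattern = pattern.rstrip(".")
    anchored_start = pattern.startswith("<")
    anchored_end = pattern.endswith(">")
    pattern = pattern.strip("<>")

    parts = []
    for element in pattern.split("-"):
        # Split each element into its residue part and optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)

    regex = "".join(parts)
    if anchored_start:
        regex = "^" + regex
    if anchored_end:
        regex = regex + "$"
    return regex


n_glyc = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "AANGSAA"  # hypothetical test sequence
hits = [m.start() for m in re.finditer(n_glyc, sequence)]
print(n_glyc, hits)
```

Because `re.finditer` reports only non-overlapping matches, a fuller scanner would wrap the expression in a lookahead to catch overlapping sites.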
Protein Information Resource (PIR)
The Protein Information Resource (PIR) is produced by collaborating groups at the University of Delaware and Georgetown University Medical Center and is part of the UniProt consortium. The purpose of PIR is to assist researchers in integrating proteomic and genomic information and to promote standardization of protein annotation. Users may access several databases and tools from the site, including PRO, the Protein Ontology tool, which represents classes of proteins that are evolutionarily related to one another, that interact with one another to form a complex, or that represent different isoforms or modified forms of proteins produced from the same coding sequence. iProClass integrates protein sequence, expression, family, function, and structure information by cross-referencing information from many databases, including UniProt, Pfam, GO, PDB, and OMIM. A tutorial shows users how to conduct individual protein or batch searches with iProClass and how to use and interpret the search results. PIR SuperFamily (PIRSF) is a system for classifying proteins based on their evolutionary relationships and can be used to identify phylogenetic relationships, functional convergence and divergence, or relationships between proteins that share structural similarities or have similar domain architecture. The iProLINK (integrated protein literature, information, and knowledge) tool helps users mine the published literature for proteomic information.
Protein Kinase Resource
The Protein Kinase Resource contains a collection of tools and data for aficionados of these central enzymes of signal transduction. An active and helpful "Protein Kinase Discussion Group" offers access to the know-how of other kinase researchers.
Protein Tyrosine Phosphatases
Cold Spring Harbor Laboratory (CSHL) and Novo Nordisk collaborated to produce this collection of sequence, structural, and functional information on protein tyrosine phosphatases (PTPs). There is some general introductory material about this superfamily of protein phosphatases and its subdivision into more specific classes on the home page, and additional pages contain specialized information such as phylogenetic trees, structures, and genetic maps. The “Protein Sequence Analysis” section includes a database of PTP transcript sequences and alignments, plus phylogenetic and functional classifications of PTPs across species. The “3D Structural Analysis” section provides literature references and Protein Data Bank (PDB) IDs for accessing PTP crystal structures, plus information on conserved motifs and active sites. The “Bioinformatics” section includes a BLAST tool and a Tree-building tool for performing alignments and phylogenetic analyses of user-defined PTP sequences. Annotation and mapping data for human PTPs are presented in the “Human Genome Analysis” section, and links to additional resources, such as databases and tools for doing bioinformatics, are collected under the “Links” heading. Most of the data may be downloaded in PDF or Excel format. The Protein Tyrosine Phosphatase site is a useful resource for researchers who are annotating or analyzing PTPs and will also be interesting to anyone who just wants to learn more about PTPs.
Proteome Commons
Proteome Commons was an online proteomics resource created and maintained by the laboratory of Philip Andrews at the University of Michigan. As of spring 2013, this project and the Tranche database of proteomic data ceased to be available.
Psychoactive Drug Screening Program (PDSP)
The NIMH Psychoactive Drug Screening Program (PDSP) database is a searchable, interactive database of Ki values (affinity constants) for a large number of G protein-coupled receptors (GPCRs), transporters, and ion channels. The database is updated daily and has a feature for user-supplied data. If you are looking for the proteins that may bind your test ligand and the affinities with which they bind, or for a ligand to use to identify a receptor, this database is very useful. Access to the database is free, and the site is NIH-sponsored and noncommercial.
PTMScout is a Web-based tool for evaluating and viewing data from mass spectrometry experiments and other analyses of posttranslational modifications, including lysine acetylation and serine, threonine and tyrosine phosphorylation. Information for each data set includes a summary of the experiment, the results, and a link to the publication describing the data set. Users may view details of the data supporting each reported modification, compare the results of two or more experiments to identify areas of overlap or uniqueness, or search the database to view reported posttranslational modifications for a protein of interest. Site documentation includes instructions for uploading data sets for analysis by PTMScout. Those who wish to use the software for analyzing nonpublic data sets can contact the administrator to download the software for use on their own computers. PTMScout is a noncurated, open-access database.
The Reactome database, which provides a curated resource of core pathways and reactions in human biology, is being developed through a collaboration among Cold Spring Harbor Laboratory, The European Bioinformatics Institute, and The Gene Ontology Consortium. Written by researchers and cross-referenced with PubMed, GO, and the sequence databases at NCBI, Ensembl, and UniProt, Reactome contains information on topics ranging from apoptosis to xenobiotic metabolism, including sections on cell cycle checkpoints, insulin receptor-mediated signaling, and the Notch signaling pathway.
ROSPath, a database being developed by researchers at the Center for Cell Signaling Research at Ewha Womans University, is intended to provide organized information pertaining to signaling mediated through reactive oxygen species.
Scansite allows you to submit a protein sequence to search for binding motifs and phosphorylation sites. It is very easy to use and the output is clearly presented. Michael B. Yaffe and Lewis Cantley are the Project Directors.
The SCOP+ASTRAL Web site integrates information from the Structural Classification of Proteins (SCOP) database and ASTRAL Compendium. The SCOP collection categorizes structures from the Protein Data Bank (PDB) into groupings based on shared structural characteristics. The ASTRAL database was partially derived from SCOP and includes tools for analyzing protein sequence and structure. This merged resource gives a broad overview of protein folds and the structural and evolutionary relationships between proteins for which structures have been solved. The collection is hierarchically organized into structural class, fold type, superfamily, family, species, and, finally, individual structural domains. Users may search the database with keywords, gene names, species, or identifiers or browse the collection hierarchically. This collection was built through a combination of manual curation and automated classification of protein structures and includes sequences, structure images, and links to PDB entries for each domain. Users may download the SCOP+ASTRAL relational database.
SMART (Simple Modular Architecture Research Tool)
Simple Modular Architecture Research Tool (SMART) is a database of protein domains and protein domain architecture. Release 6 (2008) includes information on 784 domains, with each domain's page including a description of the domain and its defining characteristics, consensus sequence, phylogenetic distribution, function, 3-D structure, and links to KEGG pathways in which the domain is used. Domain annotations may be searched by entering a keyword or domain name. Information on the domains contained in a protein of interest may be accessed by entering the UniProt ID, accession number, or sequence of the protein into the Sequence Analysis search box. Each protein record includes a schematic representation of the protein's domain architecture with links to information on the individual domains, a list of proteins with domain composition or architecture similar to the protein of interest, information on pathways in which the protein of interest participates, and a link to the protein's STRING interaction network. The Architecture Analysis search identifies proteins that contain a user-defined combination of domains and may be limited by Gene Ontology (GO) terms and taxonomic distribution. The database may be searched in "genomic" mode, in which only proteomes of organisms whose genomes have been completely sequenced (630 species in release 6) are included in searches, or in "normal" mode, which includes all proteins in the UniProt and Ensembl databases but may return duplicates. A Glossary defines terms used in the search interface and in returned results. Users can submit batch searches, and free licenses are available to academic researchers to download the data for local searches.
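For readers who download domain annotations for local use, the idea behind an architecture search (find every protein that contains a chosen combination of domains) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the function name `proteins_with_domains`, the protein identifiers, and the in-memory dictionary are hypothetical and do not reflect SMART's own data format or search code.

```python
from typing import Dict, Iterable, List

def proteins_with_domains(architectures: Dict[str, List[str]],
                          required: Iterable[str]) -> List[str]:
    """Return IDs of proteins whose domain list contains every required domain."""
    needed = set(required)
    return sorted(
        protein_id
        for protein_id, domains in architectures.items()
        if needed <= set(domains)          # subset test: all required domains present
    )

if __name__ == "__main__":
    # Hypothetical annotations: protein ID -> domains in N- to C-terminal order.
    example = {
        "protA": ["SH3", "SH2", "TyrKc"],
        "protB": ["SH2", "SH3"],
        "protC": ["TyrKc"],
    }
    print(proteins_with_domains(example, ["SH2", "TyrKc"]))
```

The subset test ignores domain order and copy number. A stricter search that respects the order of domains along the protein would compare the lists directly instead of converting them to sets.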
Stephen White Laboratory at UC Irvine
The White laboratory, which conducts research concerning the folding and stability of membrane proteins, maintains various membrane and protein biophysics resources. The Membrane Protein Resources include a list of membrane proteins with known 3D structures, a searchable membrane protein topology database, and the Membrane Protein Explorer, a tool for examining the topology of membrane proteins. With beautiful images of protein structure and links to the Protein Data Bank, PubMed, and numerous other useful sites, this site provides an invaluable resource for anyone interested in problems related to the structure of membrane proteins.
Structure-Function Linkage Database (SFLD)
Need help determining an enzyme's function? Try the Structure-Function Linkage Database (SFLD), which contains sequence, structural, and functional information on well-characterized enzymes organized hierarchically by shared biochemical functions. The sequence of interest can be used in a BLAST search or analyzed by Hidden Markov Models to identify sequence and structural features that reveal enzymatic function based on similarity with enzymes in the database. Comparison of a user's sequence with those in the database can be used to identify amino acid residues required to perform specific biochemical reactions or develop strategies for engineering enzymes with new functions. The contents of the database can be browsed hierarchically or by biochemical reaction, and the collection is searchable by enzyme name, reaction, PDB identifier, or enzyme functional domain identifier. Structures of enzymes can be viewed through the molecular graphics program UCSF Chimera. Video tutorials offer an overview of the contents and usage of SFLD and a lesson on using sequence similarity networks from SFLD in Cytoscape. SFLD was developed by Patricia Babbitt's laboratory and the Resource for Biocomputing, Visualization, and Informatics (RBVI), which is funded by the National Institutes of Health (NIH) and National Institute of General Medical Sciences (NIGMS).
The Binding Database
The Binding Database (Binding DB) is a database of measured binding affinities between proteins and their ligands, with emphasis on interactions between potential drug targets and the small molecules that bind to them. Users can search the database by the name of the target protein or chemical compound of interest, the type of assay used to measure binding affinity, or author or title information from articles reporting binding affinities. Search results include links to structural and sequence information about the protein, structural and chemical information about the ligands, articles reporting binding affinities, and details about the method used to determine each reported binding affinity. Video tutorials show users how to obtain the binding data from a specific published article or on a particular protein or compound of interest. A user guide is included in the "Info" section of the site, along with a glossary of terms and search templates that allow users to quickly retrieve the information they seek from Binding DB. Users may register to create a free MyBDB account through which they can maintain a list of protein targets of interest for quick access to binding affinities relevant to their work. The site also includes information on how to contribute data to Binding DB and a list of conferences of interest to those working in medicinal chemistry or on drug discovery. Binding DB was created and is maintained at the Skaggs School of Pharmacy at the University of California, San Diego.
The CAspase Substrate dataBAse Homepage (CASBAH)
Caspases are a family of proteases crucial to cell death and inflammation. The CASBAH is a database of mammalian proteins reported to act as substrates for caspases. Users can enter the name of a protein of interest into the search box or leave the box blank to return a list of all the caspase substrates in the database. As of March 2010, there were over 700 identified caspase substrates in the database. The entry for each substrate includes links to that protein’s UniProt record, proteolytic cleavage sites, and biological consequence of cleavage, all linked to literature references supporting the information. The CASBAH was created and is maintained by Seamus Martin’s laboratory at Trinity College, Dublin.
The Human Protein Atlas
The Human Protein Atlas is a database of human protein expression data from the Swedish Human Proteome Resource (HPR), a project that aims to produce monoclonal antibodies specific to every human protein and then use these antibodies to generate comprehensive expression data. The atlas includes histological data for about 5,000 proteins from 48 tissues, 20 cancer cell types, and 47 human cell lines. The database may be searched by protein name, or users can browse functional categories such as “kinases” or “G protein-coupled receptors.” Search results are returned as entries in a table, each of which includes links to external databases that contain information about the protein, the ID number of the antibody that recognizes the protein, and information about additional expression data that validate or contradict the histological data generated with the antibody. By navigating through the results, users can access the histological expression data obtained with each antibody in normal and cancer tissues and in cultured cells. Results from immunofluorescence, protein array, or Western blot assays used to verify the histological data can be accessed through color-coded links that indicate whether they support (green) or contradict (red) the histological data. Users can also obtain detailed information for each protein, including ontology, cytological location, protein topology, transcript information, and links to Entrez Gene, Ensembl, and UniProt records. Information about how to purchase the antibodies is also available.
The IUPHAR Database
The International Union of Basic and Clinical Pharmacology (IUPHAR) maintains a searchable database (IUPHAR-DB) of G protein-coupled receptors (GPCRs), nuclear hormone receptors, and voltage- and ligand-gated ion channels. For each type of receptor, tables summarize the information available in the database, including the ligands that activate each protein and the human, rat, and mouse names for each protein. Clicking on the official IUPHAR receptor name opens a page that contains detailed information on the structure, function, tissue distribution, and ligands of individual receptors. The IUPHAR-DB also includes information on ligands, including peptides and synthetic and natural compounds. Information in the database is reviewed by the IUPHAR Committee on Receptor Nomenclature and Drug Classification (NC-IUPHAR).
Viral Post-Translational Modification Database (virPTM)
Viruses use phosphorylation systems of host cells to enhance their replication and to interfere with host immune responses, so knowing the sites that are phosphorylated in viral proteins is important for understanding virus-host interactions. The Viral Post-Translational Modification Database (virPTM) is a collection of demonstrated and predicted phosphorylation sites in proteins from viruses that infect humans. Schwartz and Church compiled virPTM from published data on the phosphorylation of proteins from more than 50 viruses that infect humans and developed an algorithm to predict other possible phosphorylation sites in viral proteins that remain to be experimentally tested.