Protocol

Quantitative Information Management for the Biochemical Computation of Cellular Networks


Science's STKE  31 Aug 2004:
Vol. 2004, Issue 248, pp. pl11
DOI: 10.1126/stke.2482004pl11

Abstract

Understanding complex protein networks within cells requires the ability to develop quantitative models and to numerically compute the properties and behavior of the networks. To carry out such computational analysis, it is necessary to use modeling tools and information management systems (IMSs) where the quantitative data, linked to their biological context, can be stored, curated, and reliably retrieved. We have focused on the biochemical computation of cellular interactions and developed an IMS that stores both quantitative information on the cellular components and their interactions, and the basic reactions governing those interactions. This information can be used to construct pathways and eventually large-scale networks. This system, SigPath, is available on the Internet (http://www.sigpath.org). Key features of the approach include (i) the use of background information (for example, names of molecules, aliases, and accession codes) to ease data submission and link this quantitative database with other qualitative databases, (ii) a strategy to allow refinement of information over time by multiple users, (iii) the development of a data representation that stores both qualitative and quantitative information, and (iv) features to assist contributors and users in assembling custom quantitative models from the information stored in the IMS. Currently, models assembled in SigPath can be automatically exported to several computing environments, such as Kinetikit/Genesis, Virtual Cell, Jarnac/JDesigner, and JSim. We anticipate that, when appropriately populated, such a system will be useful for large-scale quantitative studies of cell-signaling networks and other cellular networks. SigPath is distributed under the GNU General Public License.

Introduction

The sequencing of the genomes of many organisms, including that of humans, has enabled initial estimates of the number of genes in each genome and the proteins they encode. These "parts lists" are gathered as large data sets stored in traditional bioinformatics databases such as SwissProt, GenBank, and Ensembl (1). Such databases are an essential tool for the study of biological systems. However, to understand the functional capabilities that arise from interactions among the parts, or components, in these parts lists, other information is required. This information includes the concentrations of the components in the cell and the mechanism and kinetics of their interactions. A wealth of such data has been collected over the past 30 years in the vast literature of regulatory biochemistry and cell physiology. These functional and mechanistic data need to be linked to their genomic context and stored in an appropriate format that will allow their use in the development of quantitative biochemical models to analyze integrated cellular processes. The data that currently exist are largely incomplete, and additional large-scale data sets are required to obtain a detailed biochemical description of the components that interact in the complete system. We have developed a quantitative information management system (IMS) that links the quantitative data to the genomic information about the cellular components and that allows for the storage of the elementary reactions between the components. The creation of such an IMS should facilitate estimation of the currently available data and specify what new data need to be gathered. The merging of this information can then be exploited to build large-scale quantitative models. We have used signaling networks as an example of complex cellular processes that are amenable to quantitative analysis. 
Signaling networks are formed by interconnected signaling pathways, and these networks regulate multiple cellular functions in a coordinated manner (2). Signaling networks result from various types of interactions, including protein-protein interactions and those between small molecules and proteins and other cellular components, and many enzymatic reactions (2, 3).

In developing the SigPath system, we have focused on two major questions:

(i) How can components and reactions be associated with their kinetic properties to streamline the construction of quantitative models?

(ii) How can we design a Web-based system that allows a whole community of researchers to assemble and refine the information needed to construct quantitative models?

To address these issues, the IMS stores quantitative details of the signaling components and allows for the assembly of elementary reactions which, when coupled, can describe signaling pathways and subsequently large-scale networks. Such a system bridges bioinformatics databases that focus on qualitative data of individual biological components and modeling environments that focus on mathematical analysis of integrated dynamic systems.

Figure 1 presents a comparison of SigPath with genomic and protein-protein interaction databases and describes the types of information supported by SigPath. In genomic or protein databases, each gene or protein is represented as a separate entry. Chemoinformatics databases describe small molecules, such as adenosine triphosphate (ATP), in a similar fashion. Typically, many of the fields in these entries are textual annotations, and only the links from open reading frame to transcript to protein are encoded explicitly. To enable more structured queries, it is useful to explicitly define the information content for each component using an ontology. An ontology describes a set of terms or concepts and the structured relationships among them. For example, EcoCyc, a resource developed by Karp et al. (4), is an initial approach to the management of information about biochemical networks. Bacterial metabolic and gene regulatory networks are stored through the EcoCyc ontology. The EcoCyc ontology describes the components and the reactions that form the various metabolic pathways. The Pathway Tools that support the EcoCyc database allow read access to the data through a Web browser and offer a non-Web user interface to add to the data (5). SigPath also relies on an ontology to store the various relationships among biological entities. The SigPath ontology is based on the EcoCyc ontology (4). These ontologies share the concepts of chemical and enzymatic reactions. The SigPath ontology extends a subset of EcoCyc with features to store quantitative information (for example, concentration measurements, kinetic rates, and enzymatic parameters) needed to produce quantitative models. These extensions are further documented in the developer section of the project Web pages (http://icb.med.cornell.edu/crt/SigPath/docdev.xml).

Fig. 1.

Comparison of the SigPath IMS with genomic and protein-protein interaction databases. Each frame is divided into two parts. The upper part illustrates protein-protein interaction networks and signaling pathways and networks as they are stored in these databases. The lower part highlights the main types of biological entities found in these databases. Genes are shown as squares, transcripts as circles, and proteins as triangles. Other shapes represent chemical species. Genomic databases link genes to their transcripts and transcripts to the encoded protein. Protein databases also provide these links by linking to genome databases. Protein-protein interaction databases describe noncovalent complexes of proteins or protein subunits. SigPath stores reactions among protein and chemical species, enzymatic reactions, and quantitative models (see text for a complete description of the elements of information in a quantitative model).

The key elements of the design of the SigPath system (Table 1) include:

Fig. 2.

The components of SigPath and a biochemical computing environment. Each shape represents a part of the software, and arrows represent the flow of data and information within SigPath, within a modeling environment, or at the interface of the two systems.

Fig. 3.

MAPK pathway model used for validation. (Top) Conceptual model of the MAPK cascade. (Bottom) List of reactions included in a deterministic model of the cascade. The list shows the set of reactions submitted and saved in SigPath.

Fig. 4.

Comparison of manual simulation results with those obtained by automatic export of models from SigPath. (Upper left) The Gs pathway diagram (binding reactions only). (Upper right) Model of coupled enzymatic reactions from Raf to MAPK. SigPath produces such diagrams automatically; however, Greek letters are currently not supported, and the letters a, b, and g are used for α, β, and γ. The geometric shapes represent molecules involved in the model; colors indicate the type of molecule (red, complex of molecules; pink, proteins; blue, small molecules). Lines represent reactions and arrows indicate their direction, from substrate(s) to product(s). (Lower panels) Changes in the concentrations of three molecules in the models based on simulations (Gs_a_GTP: complex of Gαs subunit and GTP; Gs_A_GDP, complex of Gαs subunit with GDP; Iso_B2AR_Gs_abg_GDP: complex of isoproterenol with the β2-adrenergic receptor and Gαs subunit bound to GDP). Simulation results are labeled according to the method by which they were generated. Manual/KKit: The model was assembled manually in Genesis/Kinetikit. SigPath/KKit: Interactions were submitted into SigPath individually and the model was assembled and exported to Genesis/Kinetikit for simulation; SigPath/JSim: Interactions were submitted into SigPath individually and the model was assembled and exported to JSim for simulation.

Table 1.

Main components of the SigPath IMS.

(i) The SigPath ontology, described above, which provides a structured representation of the information, both qualitative (interactions) and quantitative (kinetics and concentrations);

(ii) A Web-based user interface to support remote, Web-based browsing, searching, submitting, editing, and reviewing the types of information managed by the system;

(iii) A mechanism to link information about interactions to information about the components described in bioinformatics and chemoinformatics databases (called the background information); and

(iv) User interfaces to assemble models and tools to export models directly in the formats of various biochemical computing environments.

These elements allow us to address the following questions:

(i) How can protein, genomic, and small-molecule databases be efficiently connected to resources for quantitative modeling and simulation of biochemical pathways?

(ii) How can a cell-signaling information system be created that will support submission of quantitative information and links to protein and small-molecule resources, yet that will not require the contributors to submit accession codes for each molecule referenced?

(iii) How can the need for contributors to submit information that already exists in other databases be avoided by facilitating automated import of existing information?

Connecting biological databases to modeling tools is essential for two main reasons. First, biological databases, such as protein or genome databases, contain current knowledge of the components of the cell. Second, biochemical modeling approaches enable the study of integrated systems composed of the very components described by biological databases. If the results of the genomic effort are to be useful in understanding cell components as an integrated system, efforts must intensify to develop gateways between these two types of resources. To achieve this bridging, we have imported the large current body of qualitative knowledge from external sources (such as SwissProt and TrEMBL for proteins and NCIOpen for small molecules) as background information. SwissKnife (6) is used to parse SwissProt and TrEMBL files. The background information for a protein imported in this way includes the name, a description, aliases, the source organism(s), and accession codes from the databases linked to by SwissProt and NCIOpen. This background information is used throughout SigPath to perform queries on selected specific molecules. Because background information is automatically imported into the IMS, the user rarely needs to enter this information explicitly when submitting interactions among components. Users can add interactions with or without quantitative information. Storing qualitative information alone can be useful because, in the appropriate information management environment, it can be converted to quantitative information. In SigPath, this is done by adding kinetic information to interactions previously entered qualitatively. Such iterative refinement of information can develop a database of "parts specifications" rather than simply a parts listing (where parts specifications describe the biochemical interactions of the components in the cell with other components, in a qualitative but structured way).
These parts specifications should be useful in the development of large-scale quantitative models of cellular networks. Because the construction of quantitative models of signaling pathways and networks is the goal, the quantitative data in SigPath and the automatic connection to different computational environments permit the development of detailed simulations and predictive models of signaling networks (Fig. 2). As illustrated in the following examples, SigPath can serve not only to store and manage quantitative information on components and their relationships, but also to act as a bridge between the current databases and the large-scale functional networks that will have to be analyzed quantitatively in the future.
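The role of background information in queries can be illustrated with a minimal Python sketch. It assumes a record holding the fields listed above for an imported protein (name, description, aliases, source organisms, and accession codes); the class name, the function name, and the sample records are hypothetical and do not reflect the actual SigPath implementation.

```python
from dataclasses import dataclass, field


@dataclass
class BackgroundEntry:
    """One imported record, holding the fields the text lists for a protein."""
    name: str
    description: str
    aliases: list[str] = field(default_factory=list)
    organisms: list[str] = field(default_factory=list)
    accession_codes: dict[str, str] = field(default_factory=dict)  # database -> code


def search(entries, text, organism=None):
    """Return entries whose name or aliases contain `text` (case-insensitive)."""
    needle = text.lower()
    hits = []
    for entry in entries:
        labels = [entry.name, *entry.aliases]
        if not any(needle in label.lower() for label in labels):
            continue
        if organism is not None and organism not in entry.organisms:
            continue
        hits.append(entry)
    return hits


# Made-up records; the accession codes are placeholders, not real identifiers.
background = [
    BackgroundEntry("Calmodulin", "calcium-binding protein", ["CaM"], ["Human"],
                    {"SwissProt": "PLACEHOLDER_1"}),
    BackgroundEntry("Calmodulin", "calcium-binding protein", ["CaM"], ["Mouse"],
                    {"SwissProt": "PLACEHOLDER_2"}),
]

for hit in search(background, "calmodulin", organism="Human"):
    print(hit.name, hit.accession_codes)
```

Because the accession codes travel with the record, a contributor who picks a search hit never has to type a cross-reference by hand.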

Biochemical Computation Using SigPath

To validate SigPath as an effective IMS and bridge to computational environments, we submitted qualitative and quantitative information about two pathways to SigPath: (i) the Gs pathway, based on signaling through the heterotrimeric guanine nucleotide-binding protein stimulatory α subunit (Gαs), and (ii) the mitogen-activated protein kinase (MAPK) cascade. The Gs pathway contains only binding interactions, whereas the MAPK cascade contains mostly enzymatic reactions. Therefore, together these two pathways test both types of quantitative information that SigPath supports in assembling models. These two models were then exported for simulation into two computational environments: Kinetikit/Genesis and JSim. This validation can therefore be considered as a proof of concept for SigPath both as a tool for managing information and as a bridge between the bioinformatics and computational systems.

The biochemical model for the pathway from guanosine triphosphate (GTP)-bound Ras to MAPK 1 and 2 was created by entering reactions into SigPath (Fig. 3). The Gs pathway was simulated on the basis of activation of the β-adrenergic receptor. Both the Gs and the MAPK pathways were validated by comparing simulations using Kinetikit/Genesis or JSim on manually entered data or data exported from SigPath (Fig. 4). In all cases, the results of the simulations were identical in both environments (Fig. 4, lower panels). The results confirm that storing information in SigPath and exporting quantitative models to Kinetikit and JSim produces simulations that are consistent with models set up manually in these simulators. We have also reproduced these results with Virtual Cell and Jarnac/JDesigner, which use the SBML (level 1) input format (data not shown).

These two simulations, which represent generalized pathways, may be considered canonical. Such canonical simulations could serve as a useful starting point for the development of cell-type-specific models. For instance, if different cell types contain various splice forms of Gαs in various ratios, then SigPath can be used to specify each of these isoforms as a distinct SigPath chemical. (The SigPath chemical, in this case, will represent a protein.) Quantitative information can then be attached to each isoform to develop a cell-type-specific model. Thus, the linking of the names of the various chemical entities to the existing qualitative databases provides a powerful approach for the systematic representation of known variations in quantitative values, and links the differences to distinct molecular species.

Summary

SigPath is a prototype for an IMS that intimately connects qualitative information stored in biological databases with the quantitative data required for biochemical modeling approaches. This system is being developed under the GNU General Public License to encourage the participation of the research community in extending the system (for instance, to support types of interactions other than those supported by the current system, or to support new modeling tools or exchange formats).

Equipment

Internet Access

Note: No special equipment is needed to access the SigPath IMS through the Internet. All the software that is needed to query SigPath and to view interactions and pathways is installed on the server side.

Macromedia Flash Player

Note: To view the animated tutorials here at STKE or at the SigPath site, this free browser plug-in is required. If your browser does not have a Flash Player plug-in, you can download it from the Macromedia Web site (http://www.macromedia.com/downloads/).

Instructions

Obtaining a Username and Password

Guest users can browse all the information in SigPath. However, to contribute data to SigPath, users must log in for access to restricted parts of SigPath. To get an account, follow the steps below. Registration is free.

1. Go to http://icb.med.cornell.edu/services/sp-prod/sigpath/ and click on the top left link, "Login."

2. Click on the "Register here" link.

3. Enter all the fields required (indicated by an asterisk) and click the "Submit User Data" button.

Note: Required fields include your name, e-mail address, and preferred username and password.

4. On the affiliation page, enter all the fields required (indicated by an asterisk) and click the "Submit Affiliation" button.

5. From the SigPath main menu page (http://icb.med.cornell.edu/services/sp-prod/sigpath/), log in by clicking on the "Login" link in the top left and entering your username and password.

Submitting Data to SigPath

Information can be submitted to the SigPath database through a Web browser. SigPath accepts both qualitative and quantitative types of information. As discussed above, SigPath relies on background information to facilitate both types of submission to the IMS. SigPath supports the following submission approaches: (i) BioWizards, which are online forms for the entry of binding, phosphorylation, and dephosphorylation data; (ii) XML submissions, which are suitable for importing many reactions simultaneously into SigPath; and (iii) paper-based submissions, which are forms that can be completed manually and then e-mailed or faxed to SigPath. These three complementary approaches are intended to support data submitters with different backgrounds. Each type of submission tool offered by SigPath facilitates the linking of molecules to existing biological databases, often in a manner transparent to the contributor. For instance, the BioWizard relies on background information obtained from SwissProt and never prompts the contributor to enter cross-references to other databases. Instead, the contributor searches the background information to locate proteins that already have cross-references defined. This approach saves the contributor the tedious task of defining names or accession codes for each protein in a reaction, and avoids errors and omissions.

Contributing data with BioWizards

BioWizards provide a user interface for the submission of binding, phosphorylation, and dephosphorylation reactions. The initial implementation focused on these three types of biochemical interactions, but the approach can be extended to most other types of interactions. Each BioWizard supports the submission of one type of information. For instance, when starting the submission of a binding interaction, the user is prompted for the identity of the molecules that bind to form a complex. When these molecules have been identified, the BioWizard transparently creates an explicit instance of the complex. BioWizards have two main roles: guides and translators. As a guide, a BioWizard is designed for contributors with limited data management experience, and it interacts with the contributors using biological language designed to guide them through the different steps of a submission. As translators, BioWizards convert information expressed in the language of the biologist into a form compatible with the SigPath ontology. For example, in their role as translators, BioWizards are responsible for creating instances of modified proteins, used in the SigPath ontology to represent phosphorylated forms of proteins. Two animated tutorials, called "viewlets," guide the contributor through the process of submitting examples of phosphorylation and binding reactions to SigPath. These are available from the "Tutorials" link (http://stke.sciencemag.org/cgi/content/full/sigtrans;2004/248/pl11/DC1) or the SigPath Web site (binding viewlet: http://chagall.med.cornell.edu/sp-cd/viewlets/wizbinding.viewlet/wizbinding_viewlet_swf.html).

An example of how to submit a phosphorylation reaction is described below. The process is similar for entering the other types of reactions.

1. Go to the SigPath main menu at http://icb.med.cornell.edu/services/sp-prod/sigpath/ and log in.

2. Choose "Submit Data via a Biowizard" from the User Tasks.

3. Select the type of interaction, in this example "Phosphorylation," from the drop-down menu. Click the "Submit" button.

Note: The ATP and ADP molecules that are required for the phosphorylation reaction are automatically entered into the table of reaction components.

4. Select an enzyme—for example, the human GTP-binding protein REM—by typing "rem" into the box, choosing "Proteins" from the drop-down menu, and choosing "Human" as the source organism. Press "Search."

5. Choose the correct protein by selecting the radio button associated with it and click on "Choose as Reaction Enzyme."

Note: The protein is added to the right side of the reaction as the kinase.

6. When prompted by the BioWizard, search for the substrate. For example, type "calmodulin" into the text entry box, choose "Proteins" and "Human" as above, and click "Search."

7. Select the radio button beside the human calmodulin protein and click "Set as Reaction Substrate."

Note: The calmodulin protein is added to the left side of the reaction, and the phosphorylated protein is explicitly added to the right side of the reaction as the product.

8. Enter any known quantitative parameters of the reaction and any scientific articles that support this reaction, and click the "Submit Reaction Parameters" button.

Note: This information is not required, but at least one citation should be entered to ensure that the data set continues to be valuable to other users.

9. Review the confirmation page with a summary of the information and click "Save Reaction" to save the information into the SigPath database.

Note: If the information is not correct, the user should press the back button to return to an earlier step, or quit and restart the BioWizard by clicking on the SigPath logo to return to the main menu.

10. Review the summary page of the reaction just entered and note the SigPath identification number (spid) of the reaction so that it can be easily identified by searching at a later time.
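The translation step that a phosphorylation BioWizard performs in the procedure above can be sketched in Python. This is an illustration under stated assumptions, not the SigPath code: the class names, the `phosphorylation` function, and the "phospho-" naming of the product are all hypothetical. What it takes from the text is that ATP and ADP are filled in automatically and that the phosphorylated protein is created as an explicit, separate entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chemical:
    name: str
    kind: str  # "protein", "small molecule", or "modified protein"


@dataclass
class EnzymaticReaction:
    enzyme: Chemical
    substrates: list
    products: list


ATP = Chemical("ATP", "small molecule")
ADP = Chemical("ADP", "small molecule")


def phosphorylation(kinase, substrate):
    """Translate 'kinase phosphorylates substrate' into explicit reaction parts."""
    # The phosphorylated form becomes its own entity, as described for modified
    # proteins. The "phospho-" prefix is an assumption of this sketch.
    product = Chemical(f"phospho-{substrate.name}", "modified protein")
    return EnzymaticReaction(
        enzyme=kinase,
        substrates=[substrate, ATP],  # ATP is added without asking the user
        products=[product, ADP],      # and so is ADP
    )


rxn = phosphorylation(Chemical("REM", "protein"), Chemical("calmodulin", "protein"))
print([c.name for c in rxn.substrates], "->", [c.name for c in rxn.products])
```

The contributor supplies only two biological facts (the enzyme and the substrate); everything else in the stored reaction is derived.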

Contributing data in XML

XML submissions are useful for curators and for contributors who want to submit many reactions simultaneously. Submissions must be formatted according to the SigPath schema (http://icb.med.cornell.edu/crt/SigPath/features/xmldetails.xml), which is compliant with the W3C XML Schema [see (http://www.w3.org/XML/Schema)]. The SigPath XML Schema explicitly defines the pieces of information about a reaction that must be present in a submission for it to be considered valid for entry into SigPath. All entities in SigPath can be imported from and exported to XML. Entities defined in SigPath that have no equivalent in other biological databases are attributed a unique spid. XML submissions can be prepared with text editors or specialized XML editors. We have implemented data validation and context-sensitive highlighting of errors to help contributors diagnose errors in XML formatting or data structure at submission time. A viewlet detailing an XML submission can be found at the "Tutorials" link (http://stke.sciencemag.org/cgi/content/full/sigtrans;2004/248/pl11/DC1) or the SigPath Web site (http://chagall.med.cornell.edu/sp-cd/viewlets/xml.viewlet/xml_viewlet_swf.html).

1. Format the information according to the XML information exchange schema (http://icb.med.cornell.edu/crt/SigPath/features/xmldetails.xml).

2. Go to the SigPath main menu at http://icb.med.cornell.edu/services/sp-prod/sigpath/ and log in.

3. Choose "Submit Data via XML Upload" from the User Tasks.

4. Browse for the XML file to upload and click "Submit."

5. Resolve any errors in the XML file, which are highlighted in yellow along with a description of the error.

Note: These errors must be resolved before continuing. To correct errors, edit the file that was originally uploaded and continue from step 2. If there are no errors in the XML file, a page showing a summary of the information will be displayed.

6. To import the information into SigPath, click on the "Confirm Data Submission" button.
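Before uploading, a contributor may find it convenient to check locally that a file is at least well-formed XML. The Python sketch below does only that, using the standard library; it does not validate against the SigPath schema, and the element names in the sample documents are hypothetical placeholders rather than names taken from that schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, (line, column, message))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return False, (line, column, str(err))
    return True, None


# Hypothetical documents: these element names are NOT taken from the SigPath schema.
good = "<submission><reaction><name>example</name></reaction></submission>"
bad = "<submission><reaction><name>example</reaction></submission>"

print(check_well_formed(good))
print(check_well_formed(bad))
```

A file that passes this check can still be rejected at upload, because schema validity (required elements, allowed values) is checked only by SigPath itself.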

Contributing data in the form of paper-based submissions

Information can be contributed to SigPath using forms that can be filled in by the contributor and then sent to SigPath.

1. Go to http://icb.med.cornell.edu/crt/SigPath/submit.xml

2. Choose the appropriate form for the type of reaction being submitted: reactions that include only qualitative information, or reactions that include quantitative information.

3. Fill in the data.

4. Send the completed form by e-mail to icb@med.cornell.edu or by fax to (212) 327-7344.

Assembling Models in SigPath

SigPath supports the creation of single-compartment models. Such models assume that the modeled system can be represented as one well-stirred compartment where molecules can react. Future versions of SigPath that will allow multicompartment models are being developed. The model assembly process assumes that the reactions needed to construct a model have already been submitted to SigPath and walks the user through the steps needed to select reactions and initial concentrations. A demonstration in the form of a viewlet is available from the Tutorials link (http://stke.sciencemag.org/cgi/content/full/sigtrans;2004/248/pl11/DC1) or from the SigPath Web site (http://chagall.med.cornell.edu/sp-cd/viewlets/modelcre.viewlet/modelcre_viewlet_swf.html). Once assembled, the model is represented in SigPath both as a list of reactions and as a diagram in multiple formats. SigPath also creates an XML version of the model.

1. Go to the SigPath main menu at http://icb.med.cornell.edu/services/sp-prod/sigpath/ and log in.

2. Choose "Assemble a Model" from User Tasks.

3. Select the reactions to include in the model. For example, choose "Enzymatic Reactions," enter "MAPK" into the text box, choose "Proteins," and then click "Search."

4. From the resulting list of reactions, select the one you want.

Note: Some reactions cannot be selected. The reactions that cannot be selected do not contain enough quantitative information to be included in a quantitative model. For instance, the user can select only basic reactions if they include the forward rate (Kf) and backward rate (Kb) of the reaction. Similar constraints apply to enzymatic reactions (consult the information that displays on the Web page when completing step 4 to see what parameters are required to enable the selection of an enzymatic reaction).

5. If necessary, edit the reaction to include necessary quantitative data by clicking the "View" link and then choosing the "Edit" link. Follow the BioWizard to enter the reaction parameters and save the reaction.

6. Once a reaction has been selected, click the "Add Reaction to Model" button.

7. Select the initial concentrations of the components of the reaction from the drop-down menu and click on the "Submit Concentrations" button.

Note: If "Model Computed" is selected, the model will set the initial concentration to zero and derive the transient concentration for the component from the reactions in the model. If none of the initial concentrations in the drop-down menu are suitable, click on the "Add New Concentration" link. If the concentration of a component should remain constant, mark it as "buffered" by checking the check box.

8. Select additional reactions by repeating steps 3 through 7.

9. Once all the reactions for the model have been selected, click the "Next" button.

10. Enter a name and description for the model and the target units, and click the "Submit" button.

11. Check that the summary of the model is correct and click "Save Model" to save this model in the SigPath database.

Note: If the information is not correct, click the back button to return to the appropriate BioWizard step, or quit and restart the BioWizard by clicking on the SigPath logo to return to the main menu.

12. Go to the "View All Models" link on the main page (http://icb.med.cornell.edu/services/sp-prod/sigpath/mainMenu.action) to view the model.
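The constraints applied during model assembly can be summarized in a short Python sketch. It assumes a simplified data model in which a basic reaction is selectable only when both Kf and Kb are known, a "Model Computed" component starts at zero, and a buffered component is flagged as constant. All class and field names are hypothetical, and the numbers are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BasicReaction:
    name: str
    kf: Optional[float] = None  # forward rate (Kf)
    kb: Optional[float] = None  # backward rate (Kb)

    def selectable(self):
        """A basic reaction can join a model only if both rates are known."""
        return self.kf is not None and self.kb is not None


@dataclass
class InitialConcentration:
    value: float = 0.0      # the default stands in for "Model Computed"
    buffered: bool = False  # True: the concentration is held constant


@dataclass
class Model:
    name: str
    target_unit: str
    reactions: list = field(default_factory=list)
    concentrations: dict = field(default_factory=dict)

    def add_reaction(self, reaction, concentrations):
        """Add a reaction together with initial concentrations of its components."""
        if not reaction.selectable():
            raise ValueError(f"{reaction.name}: missing Kf or Kb")
        self.reactions.append(reaction)
        self.concentrations.update(concentrations)


# Made-up reaction and values, for illustration only.
model = Model("demo", target_unit="uM")
model.add_reaction(
    BasicReaction("A + B <-> AB", kf=1.0, kb=0.1),
    {
        "A": InitialConcentration(1.0),
        "B": InitialConcentration(2.0, buffered=True),
        "AB": InitialConcentration(),  # model computed
    },
)
print(len(model.reactions), sorted(model.concentrations))
```

Rejecting an incomplete reaction at `add_reaction` mirrors the Web interface, where such reactions simply cannot be selected until their parameters have been entered.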

Exporting Models in SigPath

When a model has been assembled and saved in SigPath, it is available for all users to export in a format compatible with several modeling environments. Currently, SigPath supports the JSim (7), Kinetikit/Genesis v10.0 (8), and SBML formats (Level 1 v1 and Level 2 v1). Exporting a model consists of translating the model, expressed in the SigPath ontology, to the required input format for a given modeling environment. For JSim, exporting consists of generating ordinary differential equations that describe the rate of change of the concentration of each molecule with respect to time. The Kinetikit/Genesis export describes how molecules are exchanged between pools of molecules during reactions. Although the formats of the input to JSim and Kinetikit/Genesis are completely different, this is automatically handled by SigPath and users do not need to perform any translations or conversions. Users select a target environment and press "Export," and SigPath creates the appropriate version of the model.
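To make the idea of generating differential equations from a reaction list concrete, the following Python sketch builds the right-hand side of the equations for reversible mass-action reactions and integrates them with a fixed-step forward Euler scheme. It is not the JSim export itself and does not produce JSim syntax; the tuple layout, the function names, and the rate values are assumptions chosen for illustration, and buffered components are not handled.

```python
def mass_action_rhs(reactions, conc):
    """Return d[X]/dt for every species under mass-action kinetics.

    Each reaction is (reactants, products, kf, kb), with species given by name.
    """
    ddt = {name: 0.0 for name in conc}
    for reactants, products, kf, kb in reactions:
        forward = kf
        for name in reactants:
            forward *= conc[name]
        backward = kb
        for name in products:
            backward *= conc[name]
        net = forward - backward
        for name in reactants:
            ddt[name] -= net
        for name in products:
            ddt[name] += net
    return ddt


def euler(reactions, conc, dt, steps):
    """Fixed-step forward Euler integration; crude, but dependency-free."""
    conc = dict(conc)
    for _ in range(steps):
        ddt = mass_action_rhs(reactions, conc)
        for name in conc:
            conc[name] += dt * ddt[name]
    return conc


# Made-up binding reaction A + B <-> AB with illustrative rates.
reactions = [(("A", "B"), ("AB",), 1.0, 0.1)]
start = {"A": 1.0, "B": 1.0, "AB": 0.0}
print(mass_action_rhs(reactions, start))
end = euler(reactions, start, dt=0.001, steps=20000)
print(end)
```

A dedicated simulator would use an adaptive solver instead of forward Euler; the point here is only that the equations follow mechanically from the stored reactions and rates.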

Basic reactions included in the model are assumed to follow the law of mass action. Values of concentrations and rates are converted to a common unit so that every concentration in the model is expressed in a single "target unit." The target unit is used when exporting models to a modeling tool. For instance, when the "target unit" is set to microMolar (μM), all concentrations in the model will be expressed in μM, and the plots generated in a modeling tool will be shown in this unit (this is useful when a modeling tool does not support multiple concentration units). Enzymatic reactions are converted to two basic reactions assuming a Michaelis-Menten mechanism with an irreversible product formation step. The two reactions that result from the conversion are then treated as described for basic reactions. Viewlets describing model export in different formats are available from the Tutorials link (http://stke.sciencemag.org/cgi/content/full/sigtrans;2004/248/pl11/DC1) or from the SigPath Web site (http://icb.med.cornell.edu/crt/SigPath/features/ExportModel.xml).
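The two conversions described in this paragraph can be sketched in Python. The unit table and function names are hypothetical. For the enzymatic step, the sketch uses the standard relation Km = (kb + kcat) / kf; because Km and kcat leave one rate undetermined, it fixes kb as a multiple of kcat. That ratio (4.0 by default here) is an assumption of this sketch, and the text does not state how SigPath chooses it.

```python
# Conversion factors to micromolar; the unit labels are choices made for this sketch.
TO_MICROMOLAR = {"M": 1e6, "mM": 1e3, "uM": 1.0, "nM": 1e-3}


def to_target(value, unit, target="uM"):
    """Convert a concentration between the units listed in TO_MICROMOLAR."""
    return value * TO_MICROMOLAR[unit] / TO_MICROMOLAR[target]


def split_enzymatic(km, kcat, kb_over_kcat=4.0):
    """Split E + S -> E + P into two mass-action steps.

    Step 1: E + S <-> ES   with rates kf, kb
    Step 2: ES -> E + P    with rate kcat, irreversible (backward rate 0)

    Km and kcat leave one rate undetermined, so kb is fixed here as a
    multiple of kcat. That ratio is an assumption of this sketch.
    """
    kb = kb_over_kcat * kcat
    kf = (kb + kcat) / km  # rearranged from Km = (kb + kcat) / kf
    return {"binding": (kf, kb), "catalysis": (kcat, 0.0)}


# Made-up parameters: a Km of 0.05 mM expressed in the target unit (uM).
km_um = to_target(0.05, "mM")
steps = split_enzymatic(km=km_um, kcat=2.0)
print(km_um, steps)
```

Converting Km to the target unit before splitting keeps the derived second-order rate kf in units consistent with every concentration in the exported model.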

1. Go to the SigPath main menu at http://icb.med.cornell.edu/services/sp-prod/sigpath/ and log in.

2. If necessary, create a model as described in the "Assembling Models in SigPath" section.

3. Choose "View All Models" from User Tasks.

4. Select the model you want to export. Click on the "Export to ..." button for the desired modeling environment.

5. Save the model file to your computer.

Related Techniques

The properties and capabilities offered by the SigPath system are best illustrated by comparison to other currently implemented bioinformatics tools.

Consensual Knowledge Bases vs. Information Management Systems

Consensual knowledge bases (CKBs) (9) offer a consensus of a subset of biological knowledge at a given time. An important step in constructing a CKB is thus to resolve the conflicts that may exist in the information that the CKB stores. How conflicts are resolved determines the quality of the consensus data stored in a given CKB. However, functional data are often context-dependent, and quantitative experimental data that appear to conflict may be reconciled once the context is appropriately specified. Conflicting information can play an important role in the scientific discovery process by adequately describing complex systems, especially in a rapidly developing field, such as the study of signaling networks.

SigPath is not a CKB and thus allows the submission of potentially conflicting information. Examples would include (i) the same reaction with different forward and reverse rates, (ii) different measured concentrations for the same molecule in the same tissue, and (iii) storage of the same overall process through different elementary steps. At this time, SigPath does not offer automated mechanisms to support the resolution of such conflicts, although the comments field can be used to textually specify context. To keep the information meaningful and most useful for all users, regardless of potential conflicts, each element of information should be supported by (and linked to) at least one primary publication. This representation of ambiguities should be very useful in developing functional analyses of complex systems. Discrepancies in results are sometimes obtained because of differences in the context of the measurements. It is not currently possible to define these contexts in detailed quantitative terms. For instance, in the sample Gs pathway model, the rates of Gαs activation under different cellular conditions could be different because of varying concentrations of the long and short isoforms, which have different rates of GDP release (10). Such variations in measured rates of activation could be subsequently clarified when the concentrations of the two Gαs isoforms are measured under various physiological conditions. Hence, the recording of the potentially conflicting data may promote new experiments to better define these contextual differences. When fully developed, SigPath could function as a stimulus for the quantitative experiments that better define the various cellular contexts.
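A data representation that tolerates conflicting entries can be sketched as follows. This is a hypothetical Python illustration, not the SigPath schema: the class names, fields, and placeholder values are assumptions made here. It stores several rate measurements for the same reaction side by side, requires a supporting publication for each, and uses a free-text comment to describe context.

```python
from dataclasses import dataclass, field


@dataclass
class RateMeasurement:
    kf: float          # forward rate
    kr: float          # reverse rate
    citation: str      # identifier of the supporting primary publication
    comment: str = ""  # free-text description of the experimental context


@dataclass
class ReactionRecord:
    reactants: tuple
    products: tuple
    measurements: list = field(default_factory=list)

    def add(self, measurement):
        """Store a measurement; every entry must cite a publication."""
        if not measurement.citation:
            raise ValueError("each measurement needs a primary publication")
        self.measurements.append(measurement)

    def has_conflict(self):
        """True if the stored measurements disagree on the rates."""
        return len({(m.kf, m.kr) for m in self.measurements}) > 1


# Placeholder values and citations, for illustration only.
record = ReactionRecord(("A", "B"), ("AB",))
record.add(RateMeasurement(1.0, 0.1, "placeholder-ref-1", "condition X"))
record.add(RateMeasurement(2.5, 0.1, "placeholder-ref-2", "condition Y"))
print(record.has_conflict())
```

Nothing is resolved or discarded in this sketch: both measurements remain available, and it is left to the person assembling a model to choose the entry whose context matches their question.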

Other Approaches to the Management of Quantitative Information: SBML and CellML

SBML (http://www.sbw-sbml.org) and CellML (http://www.cellml.org) are being developed as XML-based formats to facilitate the exchange of models across modeling tools [see also (11) about SBML]. SBML (Level 1) is often viewed as a more concise format than CellML and addresses a real need of the modeling community for a lingua franca in which to exchange quantitative models. Hence, an increasing number of tools offer support for SBML. SigPath can export models in SBML Level 1 and Level 2 and is thus compatible with modeling environments that can import these formats. Information in SBML files cannot be imported into SigPath at this time because models available in SBML format usually do not link molecules to their genomic context. Both SBML and CellML can also be used to construct repositories of models, and thus can support approaches to manage quantitative information. However, because these approaches are based on a specific file format, the management of information (for example, querying for individual interactions, reusing interactions from one model to assemble another model, merging models, and reviewing data) is left to tools (or users) that directly manipulate the content of the files. SigPath is not constrained in this way, and the information stored in SigPath can be searched and reused, as well as converted into multiple formats.
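To make concrete what "directly manipulating the content of the files" involves, the following Python sketch lists the interactions in a small SBML Level 2 document using only the standard library XML parser. The embedded document is a hand-written fragment for illustration, not actual SigPath output, and it has not been validated against the SBML schema; the function name `list_interactions` and the species and reaction identifiers are assumptions.

```python
import xml.etree.ElementTree as ET

# XML namespace used by SBML Level 2 Version 1 documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level2"}

# Hand-written illustrative fragment (hypothetical model, not SigPath output).
EXAMPLE_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="example">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="L" compartment="cell" initialConcentration="1.0"/>
      <species id="R" compartment="cell" initialConcentration="0.5"/>
      <species id="LR" compartment="cell" initialConcentration="0.0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="bind" reversible="true">
        <listOfReactants>
          <speciesReference species="L"/>
          <speciesReference species="R"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="LR"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def list_interactions(sbml_text):
    """Return (reaction id, reactant ids, product ids) for each reaction."""
    root = ET.fromstring(sbml_text.strip())
    found = []
    for rxn in root.findall("sbml:model/sbml:listOfReactions/sbml:reaction", SBML_NS):
        reactants = [ref.get("species") for ref in
                     rxn.findall("sbml:listOfReactants/sbml:speciesReference", SBML_NS)]
        products = [ref.get("species") for ref in
                    rxn.findall("sbml:listOfProducts/sbml:speciesReference", SBML_NS)]
        found.append((rxn.get("id"), reactants, products))
    return found


print(list_interactions(EXAMPLE_SBML))
```

With a file-based repository, a query such as "which models contain this interaction" requires running code like this over every file, whereas a database-backed system can answer it with a single search.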

Comparison of SigPath to Quantitative Databases

The Database of Quantitative Cellular Signaling [DOQCS (http://doqcs.ncbs.res.in/)] (12) provides libraries of quantitative models available for download in several formats (Matlab, Kinetikit). This resource provides quantitative data and pathway diagrams, and has often been used as a reference for the initial submission of quantitative data to SigPath. In contrast to SigPath, however, DOQCS models do not link molecules to their genomic context and lack data management features (for instance, users cannot submit new data directly into DOQCS). The BRENDA database (http://www.brenda.uni-koeln.de/) offers kinetic data about enzymatic reactions (13), but does not provide mechanisms to produce models from the data. ProcessDB (14) and Monod (15) are new tools under development that share several of the goals of the SigPath project.

Systems for Qualitative Information

The SigPath ontology is based on the ontology of EcoCyc, a tool that allows the management of metabolic pathways (16). Of the important features shared by SigPath and EcoCyc, the most important is that both systems can compute with structured graphs of molecules and interactions, to support, for example, visualization and structured queries (5, 17). In contrast to EcoCyc, however, SigPath supports each step of the process required to manage quantitative information and produce quantitative models. SigPath allows management of the information through the Web, whereas EcoCyc requires a local installation. The STKE Connections Map (http://stke.sciencemag.org/cm/) is another example of a signaling pathway database that allows the management of information but does not support quantitative information. A number of other pathway databases are also available: a metabolic pathway database, KEGG (18); and the signaling databases CSNDB (19) and TRANSPATH (20). These systems do not offer information management capabilities and do not support quantitative information or the creation of quantitative models.

Several databases have been developed to store data from high-throughput protein-protein interaction experiments (yeast two-hybrid and mass-spectrometry, among others). Major examples include the Biomolecular Interaction Network Database (BIND) (21), the Database of Interacting Proteins (DIP) (22), and the yeast database YPD (23). Some of these, such as BIND, make it possible for users to submit data directly to the database. Protein-protein interaction databases offer limited support beyond the storage of binding interactions. Enzymatic reactions and second-order spontaneous reactions, for instance, cannot be stored explicitly. Because the information is qualitative, such databases are not useful for quantitative modeling.

Notes and Remarks

In this Protocol, we have primarily described the Web interface to SigPath. However, SigPath may also be installed locally so that data can be kept private until publication. Because SigPath was implemented in Java from the outset, the system is portable across platforms (from laptops to servers). Detailed instructions on how to install and deploy SigPath are available (http://www.sigpath.org, "download" menu item). We recommend that readers interested in setting up a local installation of SigPath contact us for assistance. The source code of the latest stable release of SigPath is available at the same URL. SigPath has been tested successfully with versions 1.3.1 and 1.4 of the Java Virtual Machine. Details about the implementation and software are provided below.

The Web front-end of SigPath successfully deploys in Servlet 2.3+ compliant application servers. Development and deployment were done in Tomcat. Version 5.0.18 of Tomcat has been tested successfully. SigPath was built with the Struts framework (http://jakarta.apache.org/struts/) and with Java Server Pages for the presentation layer (http://java.sun.com/products/jsp/).

Persistent storage is implemented with the Java Data Object (JDO 1.1) API (24). Because SigPath is JDO compliant, it can be deployed with various database backends. We use FastObjects 9 for development and production use, but have also tested deployment on relational databases with the Solarmetric Kodo implementation (http://www.solarmetric.com).

XML import and export are done with the Castor XML data-binding framework (http://castor.exolab.org/). The XML schema used for generating the Castor XML marshalers and unmarshalers is the SigPath data exchange schema (http://icb.med.cornell.edu/crt/SigPath/features/xmldetails.xml).

Graphical representations of models and pathways are built with the TomSawyer Layout Toolkit or with the YFiles API. These libraries are optional and should be configured before compilation to produce diagrams of models.

The source code of the SigPath system is released under the GNU General Public License, and can be downloaded from http://icb.med.cornell.edu/crt/SigPath/download.xml.

Viewlets, which are the animated tutorials, were created using software available from Qarbon.

References

  1. 1.
  2. 2.
  3. 3.
  4. 4.
  5. 5.
  6. 6.
  7. 7.
  8. 8.
  9. 9.
  10. 10.
  11. 11.
  12. 12.
  13. 13.
  14. 14.
  15. 15.
  16. 16.
  17. 17.
  18. 18.
  19. 19.
  20. 20.
  21. 21.
  22. 22.
  23. 23.
  24. 24.
  25. 25.
  26. 26.