Molecular Interactions

MIF 1.0.0 Specification

HUPO Proteomics Standards Initiative Protein Interaction Specification Documentation

Proteomics Standards Initiative

 

Molecular Interaction XML Format Documentation

 

Version 1.0

Introduction

The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics to facilitate data comparison, exchange and verification.

The Proteomics Standards Initiative was founded at the HUPO meeting in Washington, April 28-29, 2002 (see Science 296, 827). As a first step, the PSI is developing standards for two key areas of proteomics: mass spectrometry and protein-protein interaction data.

This document describes the molecular interaction data exchange format. PSI is following a leveled approach to building this specification. Level 1 will describe protein interactions at a basic level that covers a large amount of currently available data. Subsequent levels will add capability to represent new molecular interaction information that the community wishes to exchange.

The scope of PSI MI is currently limited to protein-protein interactions. Other molecules, such as small molecules, DNA and RNA, may be taken into account in the future.

PSI MI was designed by a group of people including representatives from database providers and users in both academia and industry. PSI MI is supported by the DIP, MINT, IntAct, BIND and HPRD databases.

Purpose of the PSI MI XML format

The PSI MI format is a data exchange format for protein-protein interactions. It is not a proposed database structure. Intended usages are described by the use cases documentation. These use case descriptions also provide hints for future tools to be developed.

Purpose of this document

The purpose of this document is to describe the general structure of the PSI MI XML specification in a more user-friendly manner than the specification does itself. For the detailed and most up-to-date description please see the auto-generated documentation. This documentation also provides additional information, e.g. sample data files and use case descriptions.

Structure of a PSI MI record

The root element of a PSI MI XML file is the entrySet. An entrySet contains one or more entries. Each entry is a self-contained unit. This makes it easy to concatenate the contents of multiple files into a single file: all their entries are simply added to one entrySet.
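
As an illustration of why self-contained entries make merging simple, here is a minimal Python sketch. It is not one of the PSI MI tools. The input strings are toy documents: they omit the XML namespace and almost all of the content a real PSI MI file carries, and the function name merge_entry_sets is hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy inputs (assumption): real PSI MI files declare an XML namespace and
# carry far more detail; both are left out to keep the sketch short.
FILE_A = "<entrySet><entry><source release='1'/></entry></entrySet>"
FILE_B = (
    "<entrySet>"
    "<entry><source release='7'/></entry>"
    "<entry><source release='8'/></entry>"
    "</entrySet>"
)


def merge_entry_sets(documents):
    """Return one entrySet element holding every entry of the given documents."""
    merged = ET.Element("entrySet")
    for text in documents:
        root = ET.fromstring(text)
        if root.tag != "entrySet":
            raise ValueError("expected an entrySet root, found " + root.tag)
        # Entries are self-contained, so they can be moved over unchanged.
        merged.extend(root.findall("entry"))
    return merged


merged = merge_entry_sets([FILE_A, FILE_B])
print(len(merged.findall("entry")))
```

Because no entry refers to anything outside itself, the merge needs no renumbering or reference fixing.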


Figure 1: The entry top level element

Each entry describes one or more protein interactions. The PSI MI format can be used in two forms, a compact and an expanded form. In the compact form, all interactors (proteins), experiments, and availability statements are described once in the respective list elements, and then only referred to by references from the individual interactions in the interactionList. The compact form allows a dense, non-repetitive representation of the data, in particular for large data sets.
In the expanded form, all proteins, experiments, and availability statements are described directly in the interaction element. As a result, each interaction is a self-contained element providing all necessary information. The expanded form results in larger files, but is more suitable for conversion to displayed data, e.g. HTML pages. The PSI MI consortium provides tools to convert the compact into the expanded form and back.
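
The relationship between the two forms can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the consortium's conversion tool: the element names interactor, participant and interactorRef, and the id/ref attributes, are simplified stand-ins for the names in the schema, only interactors are handled (experiments and availability statements would be treated the same way), and namespaces are ignored.

```python
import copy
import xml.etree.ElementTree as ET

# Toy compact-form entry (assumption: simplified element and attribute names).
COMPACT = """
<entry>
  <interactorList>
    <interactor id="P1"><names><shortLabel>protein_one</shortLabel></names></interactor>
    <interactor id="P2"><names><shortLabel>protein_two</shortLabel></names></interactor>
  </interactorList>
  <interactionList>
    <interaction>
      <participantList>
        <participant><interactorRef ref="P1"/></participant>
        <participant><interactorRef ref="P2"/></participant>
      </participantList>
    </interaction>
  </interactionList>
</entry>
"""


def expand(entry):
    """Replace every interactorRef by a copy of the interactor it points to."""
    by_id = {i.get("id"): i for i in entry.findall("./interactorList/interactor")}
    for participant in list(entry.iter("participant")):
        for index, child in enumerate(list(participant)):
            if child.tag == "interactorRef":
                full = copy.deepcopy(by_id[child.get("ref")])
                participant.remove(child)
                participant.insert(index, full)
    # The shared list is no longer needed once every reference is resolved.
    interactor_list = entry.find("interactorList")
    if interactor_list is not None:
        entry.remove(interactor_list)
    return entry


expanded = expand(ET.fromstring(COMPACT))
print(ET.tostring(expanded, encoding="unicode"))
```

Going back from the expanded to the compact form is the reverse operation: identical interactor descriptions are collected once in a list and replaced by references.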

In the next sections, the top level elements shown in Figure 1 and their function will be described.

The source element describes the source of the entry, usually the organisation which provides it. It also contains a release (number) and a releaseDate.

The availabilityList provides statements on the availability of the data, usually copyright statements. In the current version, the availability statements are free text. The PSI MI format might later be extended to provide predefined availability statements.

The experimentList contains experimentDescriptions. Each experimentDescription describes one set of experimental parameters, usually associated with a single publication. In large-scale experiments, normally only one parameter is varied across a series of experiments, usually the bait. The PSI MI format describes the constant parameters, e.g. experimental techniques, in an experimentDescription, while the variable parameters, e.g. the bait, are described in the interaction element.

The interactorList describes a set of interactors participating in an interaction. In the current version of the PSI MI standard, interactors are proteins. It is planned to extend this to other types, for example small molecules, in future versions. The interactor element describes the "normal" form of a protein, consisting of the "administrative" data like name and cross references, and organism and amino acid sequence. Attributes which are relevant for a specific interaction, in particular sequence features, are described in the participant element within an interaction.


Figure 2: Interaction element

The interactionList contains one or more interaction elements. Each interaction contains a description of the data availability (copyright), and a description of the experimental conditions under which it has been determined. Both of these can either be integrated into the interaction element (expanded form) or refer to the respective lists in the entry (compact form) as described above.

Figure 3: Participant element

Each interaction contains two or more participants, the molecules participating in the interaction. Each participant element contains a description of the molecule in its "normal" form, either by reference to an element of the interactorList, or directly in an interactor element.
Additional elements of the participant element describe the specific form of the molecule in which it participated in the interaction. The featureList describes sequence features of the protein, for example binding domains relevant for the interaction. The role describes the particular role of the protein in the experiment, usually whether the protein was a bait or prey.

The attributeLists are placeholders for semi-structured additional data the data provider might want to transmit. They contain simple tag-value pairs.
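
A reader of such tag-value pairs could look like the following Python sketch. The layout assumed here, an attribute element whose name attribute is the tag and whose text is the value, should be checked against the schema documentation; the sample names and values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed layout: <attribute name="tag">value</attribute>; sample data is invented.
SNIPPET = """
<attributeList>
  <attribute name="comment">example free-text note</attribute>
  <attribute name="comment">a second note with the same tag</attribute>
  <attribute name="curator">example curator</attribute>
</attributeList>
"""


def read_attributes(attribute_list):
    """Return the tag-value pairs in document order, keeping repeated tags."""
    pairs = []
    for attribute in attribute_list.findall("attribute"):
        pairs.append((attribute.get("name"), (attribute.text or "").strip()))
    return pairs


attributes = read_attributes(ET.fromstring(SNIPPET))
for name, value in attributes:
    print(name, "=", value)
```

A list of pairs is used instead of a dictionary so that a tag occurring more than once is not silently overwritten.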

Detailed Documentation

See http://psidev.sourceforge.net/mi/xml/doc/MIF.html

Use of external controlled vocabularies

Where possible, external controlled vocabularies are referenced from PSI MI. External controlled vocabularies are used in two forms:

  • Open controlled vocabularies: We think that no existing controlled vocabulary provides all necessary terms for the given attribute in the PSI MI format. In this case, it is up to the data provider to choose a controlled vocabulary, or to provide a free text string if no appropriate controlled vocabulary exists.
  • Closed controlled vocabularies: We think that there is a controlled vocabulary which appropriately covers all necessary terms for the given attribute. In this case, only terms from the defined vocabulary should be used.

The following closed controlled vocabularies are referenced by PSI MI:

  • interaction type
  • sequence feature type
  • feature detection
  • participant detection
  • interaction detection

These CVs are grouped together in one pair of *.dag (hierarchy) and *.def (definitions) files in Gene Ontology flat file format (all files; GO format: psi-mi.dag, psi-mi.def; HTML version of the GO format).
The correctness of references to external controlled vocabularies is currently not enforced by the PSI MI schema. It is the responsibility of the data provider to ensure that only existing terms at an up-to-date data source are referenced. The World Wide Web Consortium (W3C) has recently issued a new recommendation for referencing between XML documents. Once this recommendation is implemented by standard software, we will include it in the PSI MI schema.
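
Since the schema does not check these references, a data provider may want to run a check of their own before releasing a file. The Python sketch below shows one way to do this under stated assumptions: the term identifiers (EX:...) and the database name example-cv are placeholders rather than real PSI MI terms, the xref/primaryRef layout is simplified, and in practice the set of known terms would be read from an up-to-date copy of the vocabulary files.

```python
import xml.etree.ElementTree as ET

# Placeholder vocabulary (assumption): real term ids come from the CV files.
KNOWN_TERMS = {"EX:0001", "EX:0002"}

# Toy record with one valid and one unknown reference.
RECORD = """
<interaction>
  <interactionType>
    <xref><primaryRef db="example-cv" id="EX:0001"/></xref>
  </interactionType>
  <participantDetection>
    <xref><primaryRef db="example-cv" id="EX:9999"/></xref>
  </participantDetection>
</interaction>
"""


def unknown_cv_references(element, known_terms, db):
    """List the ids that point into the given vocabulary but are not defined in it."""
    missing = []
    for ref in element.iter("primaryRef"):
        if ref.get("db") == db and ref.get("id") not in known_terms:
            missing.append(ref.get("id"))
    return missing


unknown = unknown_cv_references(ET.fromstring(RECORD), KNOWN_TERMS, "example-cv")
print(unknown)
```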

List of planned features

Because we are following a leveled approach, we are interested in knowing what the community wishes to be included in the next level.

The following items have been tagged for inclusion in the next level:

  • Intramolecular interactions
  • Inclusion of other molecule types, e.g. DNA, RNA, small molecules

The latest list of features to discuss/include in the future can be found here:
http://sourceforge.net/tracker/?atid=511101&group_id=65472&func=browse

How to comment

If you would like to comment on this document or the PSI MI XML specification, please send an e-mail to:
psidev-mi-dev@lists.sourceforge.net

Available data

Tools

PSI MI XML format is supported by a growing list of tools. Currently available are:

Data submission

The following databases currently accept submissions of PSI MI formatted interaction data:

Further information and relevant links

Databases involved:

Companies involved:

Related Efforts:


 


HUPO PSI-PAR: standard format for protein affinity reagents

List of Contents

  1. News
  2. Introduction to the PSI-PAR format
  3. Purpose of this web page
  4. MIAPAR (Minimum Information About a Protein Affinity Reagent)
  5. XML schema
  6. Controlled Vocabularies
  7. User manual
  8. Example data in PSI-PAR
  9. Software and tools
  10. Meetings
  11. Release schedule
  12. List of planned features
  13. Links
  14. Contact

 

News

  • The PSI-PAR report is available in PubMed

Report: A community standard format for the representation of protein affinity reagents.
Gloriam DE, Orchard S, Bertinetti D, Bjorling E, Bongcam-Rudloff E, Bourbeillon J, Bradbury AR, de Daruvar A, Dubel S, Frank R, Gibson TJ, Haslam N, Herberg FW, Hiltke T, Hoheisel JD, Kerrien S, Koegl M, Konthur Z, Korn B, Landegren U, van der Maarel S, Montecchi-Palazzi L, Palcy S, Rodriguez H, Schweinsberg S, Sievert V, Stoevesandt O, Taussig MJ, Uhlen M, Wingren C, Gold L, Woollard P, Sherman DJ, Hermjakob H.
Mol Cell Proteomics. 2009 Aug 14.
  • The PSI-PAR format has been approved by PSI after a document review

  • The manuscript describing MIAPAR was published by Nature Methods in 2010

 

Introduction to the PSI-PAR format

The work on PSI-PAR was initiated as part of the ProteomeBinders project and carried out by EMBL-EBI and the PSI-MI work group. The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics to facilitate data comparison, exchange and verification. For detailed information on all PSI activities, please see PSI Home Page.

The PSI-PAR format is a standardized means of representing protein affinity reagent data and is designed to facilitate the exchange of information between different databases and/or LIMS systems. PSI-PAR is not a proposed database structure. The PSI-PAR format consists of the PSI-MI XML2.5 schema (originally designed for molecular interactions) and the PSI-PAR controlled vocabulary. In addition, PSI-PAR documentation and examples are available on this web page. The scope of PSI-PAR is PAR and target protein production and characterization.

 

Purpose of this web page

  • Help database and LIMS system administrators get started with PSI-PAR for import and/or export of their data. To this end we recommend reading the manuscript, example files, user manual, schema documentation and presentations from the PSI-PAR training day.
  • Aid experimentalists by providing guidelines in the form of MIAPAR and information about the benefit of a community standard (PSI-PAR).

 

Minimum Information about a Protein Affinity Reagent (MIAPAR)

The Minimum Information about a Protein Affinity Reagent (MIAPAR) is available for public comment. MIAPAR is intended to be used as a guideline for experimentalists who wish to unambiguously describe protein affinity reagents and their protein targets. Please send any feedback to orchard@ebi.ac.uk.

 

XML schema

The XML schema is used to standardize the structure of the data representation. The PSI-PAR format utilizes the PSI-MI XML2.5 schema with no changes to the structure as this is the only way to ensure compatibility with the existing software and tools. The use of the PSI-MI XML2.5 schema, which was developed for molecular interaction data, is motivated by the fact that the binding of PARs to protein targets is a type of molecular interaction.

However, PAR data introduces information that the PSI-MI XML2.5 schema has not previously captured, such as the production of PARs and target proteins and the annotation of experimental materials and experimental control reagents. To better describe how the PSI-MI XML2.5 schema is used for the representation of PAR data, we have modified the schema element descriptions ("annotations") and made this new schema documentation available for download. It is also recommended to read the user manual and examples.

 

Controlled Vocabularies

To standardize the semantics of data representation, i.e. to ensure common terminology, controlled vocabularies (CVs) are used to populate the elements of the schema. Each CV outlines a list of terms with a standardized name, a definition and one or more aliases. The PSI-PAR CV contains the majority of the terms from the PSI-MI CV and, in addition, approximately 200 new terms. The PSI-PAR CV can be browsed on the Ontology Lookup Service (OLS) or downloaded in OBO format. Apart from the PSI-PAR CV, the PSI-MI XML2.5 schema utilizes a number of external CVs/ontologies that can also be found on the OLS, including the Gene Ontology, NCBI taxonomy ontology, BioSapiens Annotations and Unit Ontology. A table showing which CV to use for which schema element is available here.

 

User manual

The user manual does not require previous knowledge and systematically describes the use of the PSI-MI XML2.5 schema elements for the representation of PAR data. It is intended to aid database and LIMS system administrators that wish to use the PSI-PAR format to export and/or import protein affinity reagent data.

 

Example data represented in PSI-PAR

This section provides examples of data from published articles that have been captured in the PSI-PAR format. The examples (below) are available as both PSI-MI XML2.5 files and HTML files. Overviews of the examples are also supplied in the manuscript and in this spreadsheet.

  • Characterization of monoclonal antibodies to human group B rotavirus and their use in an antigen detection enzyme-linked immunosorbent assay.
  • Burns JW, Welch SK, Nakata S, Estes MK
  • J Clin Microbiol 1989, 27(2):245-250.
  • Download:    XML     HTML
  • A proteomics-based approach for monoclonal antibody characterization.
  • Weiler T, Sauder P, Cheng K, Ens W, Standing K, Wilkins JA
  • Anal Biochem 2003, 321(2):217-225
  • Download:    XML     HTML
  • A designed ankyrin repeat protein evolved to picomolar affinity to Her2.
  • Zahnd C, Wyler E, Schwenk JM, Steiner D, Lawrence MC, McKern NM, Pecorari F, Ward CW, Joos TO, Pluckthun A
  • J Mol Biol 2007, 369(4):1015-1028
  • Download:    XML     HTML

 

Software and tools

A number of tools have been developed for the PSI-MI XML2.5 schema and below is a summary of the relevant ones to PSI-PAR. A complete description can be found on the PSI-MI web site.

  • Validating tools have been developed that can interface with ontologies/CVs and perform semantic validation on PSI-MI XML2.5 files. Validation can be carried out by (1) a Java API that enables the embedding of the validator into any third party application, (2) a command line interface and (3) a web application that allows the uploading of a PSI-MI data file and reporting of both syntactic and semantic discrepancies.
  • The Ontology Lookup Service (OLS) is an ontology viewer with browsing and search functionality. It comprises the PSI-PAR CV and a number of additional CVs that are used in conjunction with the PSI-MI XML2.5 schema such as the Gene Ontology, BioSapiens Annotations and Unit Ontology.
  • A Java XML parser has been developed that allows for import and export of PSI-MI XML2.5 files to and from databases. It comprises a Java library and may also be used to develop any type of software reading and/or writing PSI-MI XML2.5 data.
  • XML stylesheets (XSLTs) are available that can convert PSI-MI XML2.5 data files to HTML, thus providing a user-friendly, human-readable representation.
  • Finally, the IntAct molecular interaction database is a complete, open source database implementation that provides reading, writing, and interactive editing of data in the PSI-MI XML2.5 schema.

 

Meetings

 

Release schedule

The PSI-PAR format underwent document review by PSI. This process resulted in the first stable version of PSI-PAR, published in 2009 (PMID:19674966). Publication of MIAPAR is planned for 2009/10. The next version, 3.0, of the PSI-MI XML format is planned to be released between 2010 and 2012 and will be backwards compatible with version 2.5. For further information, see the release schedule for the PSI-MI XML schema.

 

List of planned features

Because we are following a levelled approach, we are interested in knowing what the community wishes to be included in the next level. New suggestions can be added on the PSI-MI tracker, where existing suggestions can also be viewed.

 

Links to further information

 

Contact

PSI-PAR master; email

 


The Minimum Information About a Bioactive Entity (MIABE)

Introduction to MIABE

The work on MIABE was initiated as part of the EMBL-EBI Industry Programme and carried out by EMBL-EBI and the PSI-MI work group. The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics to facilitate data comparison, exchange and verification. For detailed information on all PSI activities, please see PSI Home Page. Drug-target interactions may be regarded as a form of molecular interactions and as such, the Molecular Interaction workgroup has become involved in this project.

The PSI-MI format is a standardized means of representing molecular interaction data and is designed to facilitate the exchange of information between different databases and/or LIMS systems. PSI-MI is not a proposed database structure. The PSI-MI format consists of the PSI-MI XML2.5 schema and the accompanying controlled vocabulary. The controlled vocabulary has been updated with many terms appropriate for the description of drugs, drug targets and the interactions they make.

The Minimum Information about a Bioactive Entity is available for public comment. MIABE is intended to be used as a guideline which should be consulted prior to the publication of data describing small molecules and their interactions with one or more target molecules.

Why do we need publication standards?

  • Database representation of pharma/bioceutical data is increasingly seen as essential for target validation/new lead identification
  • Data needs to be accurately and fully reported to enable curation/text-mining
  • Any report SHOULD allow repeat and/or reanalysis of published data – but frequently essential information is missing
  • Researchers perceived the need to define the Minimum information which should be included in a paper/deposition describing the properties or action of a bio(in)active molecule

Scope of MIABE

  • Entity types – small molecules, therapeutic proteins, peptides and antibodies, carbohydrates...
  • Drugs, herbicides, pesticides, nutraceuticals...
    • Pre-clinical (or equivalent) data only – clinical data well regulated, documentation defined (though little published), would require separate document written by specialists
    • This is NOT a SOP/prescriptive list/attempt to dictate experimental procedure – it IS a simple checklist of information to include in one or more final publications

The MIABE Document 

The MIABE paper was published in August 2011 by Nature Reviews Drug Discovery

MIABE parent doc is attached here


Validator Tutorial: Download Validator's Tutorial Source Code

 


Previous: Wiring It Together - Bringing All Components Together



Here are a few things you can download to get you started with the Validator:

  • The latest Validator framework can be downloaded from here.

  • The Simple Proteomics Experiment (SPE) sample project can be downloaded from here
    This archive contains 2 projects: the SPE data model and the SPE simple validator.

 


Validator Tutorial: Wiring It Together - Bringing All Components Together

 

Previous: Building Your Own Rules

Next: Download Validator's tutorial source code


 

Now that you have created your CV mapping rules and/or your own object rules, the next logical step is to create your own validator.

Here is a graphical representation of the process of building a validator given the separate components:

 

As the above representation shows, to build your own validator you have to bring together the configuration files that define the ontologies, the CV mapping rules, and the object rules (for which you also have to provide your own rule implementations). Once you have brought all of this together inside a project, you can create your own validator as follows:

 

 

In this code example, one can see that two methods have been written:

  • The constructor of the SPE Validator that essentially passes the 3 configuration files to the generic validator,
  • The validate method, which takes an Experiment and runs the CV mapping validation as well as the object rule validation. Any message generated in this process is stored in a collection and returned to the calling process.
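
The pattern behind the validate method (run each kind of check, collect every message, return the collection) can be illustrated independently of the Validator framework. The following Python sketch is not the tutorial's Java code and does not use the framework's API; every name in it is hypothetical, and the two checks are drastically simplified stand-ins for CV mapping and object rule validation. The SPE term identifiers are the ones that appear in the sample output further down.

```python
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    level: str


def check_cv_mapping(experiment, allowed_type_ids):
    """Stand-in for CV mapping validation: molecule types must be allowed terms."""
    messages = []
    for molecule in experiment["molecules"]:
        if molecule["type_id"] not in allowed_type_ids:
            text = "molecule type %s is not an allowed term" % molecule["type_id"]
            messages.append(Message(text, "ERROR"))
    return messages


def check_object_rules(experiment):
    """Stand-in for an object rule: an experiment should have a name."""
    if not experiment.get("name"):
        text = "Experiment id:%s doesn't have a name." % experiment["id"]
        return [Message(text, "WARN")]
    return []


def validate(experiment, allowed_type_ids):
    """Run both kinds of checks and return every message they produce."""
    messages = []
    messages.extend(check_cv_mapping(experiment, allowed_type_ids))
    messages.extend(check_object_rules(experiment))
    return messages


experiment = {"id": 3, "name": "", "molecules": [{"type_id": "SPE:0328"}]}
messages = validate(experiment, {"SPE:0326", "SPE:0318"})
for message in messages:
    print(message.level, message.text)
```

Returning the messages instead of printing them inside the checks leaves it to the caller to decide how results are displayed.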

Now that we have put everything together, it's time to run our validator on some data and display the result of the validation. The aim of this tutorial is not to teach user interface design or how to write user interfaces in Java, so we will settle for a simple command-line program that prints the result of our validation.

 

 

Here is what our little program outputs:

Validation run collected 3 message(s): 

ValidatorMessage{message='The result found at: /molecules/modifications/@id for which the values are ''BLA:0000X'' didn't match any of the 1 specified CV term:
- MOD:01157 (protein modification categorized by amino acid modified) or any of its children. The term can be repeated. The matching value has to be the identifier of the term, not its name.', level=WARN, context=Context(/molecules/modifications/@id ), rule=}

ValidatorMessage{message='The result found at: /molecules/type/@id for which the values are ''SPE:0328'' didn't match any of the 2 specified CV terms:
- The sole term SPE:0326 (protein) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.
- SPE:0318 (nucleic acid) or any of its children. A single instance of this term can be specified. The matching value has to be the identifier of the term, not its name.', level=ERROR, context=Context(/molecules/type/@id ), rule=}

ValidatorMessage{message='Experiment id:3 doesn't have a name.', level=WARN, context=null, rule=null}

 

 

 



 
