proBed Specification 1.0.0

proBed is one of the data standards developed by members of the Proteomics Informatics working group of the PSI.

For general information on the activities and organization of this working group, see HERE.

The original BED format (Browser Extensible Data, https://genome.ucsc.edu/FAQ/FAQformat.html#format1), developed by the UCSC (University of California, Santa Cruz) team, describes genome coordinate data, one feature per line, for display as annotation tracks. In BED, data lines are tab-separated plain text with up to 12 fields (columns). Only the first three fields are required; the other nine are optional.

The proBed format builds upon this original structure by extending the 12 original BED fields to include a further 13 fields to describe information primarily on peptide-spectrum matches (PSMs). The format can also accommodate peptides (as groups of PSMs).
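As an illustration of the column layout described above, the following Python sketch splits one tab-separated data line and checks its column count: 3 to 12 columns for plain BED, or 25 (the 12 BED columns plus the 13 proBed columns) for proBed. It assumes that a proBed line always carries all 25 columns, uses the UCSC names for the 12 BED columns, and leaves the 13 proBed columns unnamed; the function name `parse_bed_like_line` is hypothetical. The exact proBed column definitions are in the specification document.

```python
# Standard UCSC BED column names; proBed keeps these 12 and appends 13 more.
BED_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
]
PROBED_EXTRA_COLUMNS = 13  # proBed-specific columns after the 12 BED columns


def parse_bed_like_line(line: str) -> dict:
    """Split one tab-separated data line and check its column count.

    Accepts 3 to 12 columns (plain BED) or exactly 25 columns
    (12 BED + 13 proBed). Only the coordinates are converted to int.
    """
    fields = line.rstrip("\n").split("\t")
    n = len(fields)
    is_bed = 3 <= n <= len(BED_COLUMNS)
    is_probed = n == len(BED_COLUMNS) + PROBED_EXTRA_COLUMNS
    if not (is_bed or is_probed):
        raise ValueError(f"unexpected column count: {n}")
    record = dict(zip(BED_COLUMNS, fields))
    # BED coordinates are zero-based and half-open: [chromStart, chromEnd)
    record["chromStart"] = int(record["chromStart"])
    record["chromEnd"] = int(record["chromEnd"])
    # proBed-specific columns are kept as raw, unnamed strings
    record["proBedExtra"] = fields[len(BED_COLUMNS):]
    return record


rec = parse_bed_like_line("chr1\t1000\t1030\tPSM_1\n")
print(rec["chromEnd"] - rec["chromStart"])  # prints 30 (span length in bases)
```

Because BED intervals are half-open, the span length is simply `chromEnd - chromStart`, with no off-by-one correction.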

A manuscript describing the proBed format (together with the proBAM format) has been published in Genome Biology.


  1. proBed 1.0.0 (Final Version): Specification document and example files
  2. proBed Tools and Implementations

proBed 1.0.0 (Final Version): Specification document and example files

The proBed file format is designed for storing and analyzing peptide spectrum matches (PSMs) within the context of the genome.

Direct links:

proBed Tools and Implementations

proBed example viewed in the Ensembl Genome Browser



mzTab Specification 1.0.0

mzTab is one of the standards developed by members of the Proteomics Informatics working group of the PSI.

For general information on the activities and organization of this working group, see HERE.


  1. mzTab 1.0.0 (Final Version): Specification documents
  2. mzTab Tools and Implementations

mzTab 1.0.0 (Final Version): Specification documents

Originally submitted to the PSI document process in May 2012. Final version 1.0.0 accepted in June 2014.

More documentation is available in the mzTab GitHub repository at https://github.com/HUPO-PSI/mzTab

Direct links to deliverables:

mzTab Tools and Implementations

  • jmzTab: A Java API to read, write and merge mzTab files
  • LipidDataAnalyzer: Tool to quantify lipids from LC-MS data
  • OpenMS: Open-source C++ software library for LC/MS data management and analyses
  • MSnbase: Bioconductor package for basic plotting, data manipulation and processing of MS-based proteomics data
  • PRIDE Converter 2: A redesign of the PRIDE Converter tool, for performing data submissions to the PRIDE database
  • MaxQuant: A quantitative proteomics software package designed for analyzing large mass-spectrometric data sets
  • PIA: A toolbox for MS-based protein inference and identification analysis, PMID:25938255




mzQuantML Specification

Formal version 1.0 release (Specification 1.0.1)

Direct Links to current documents:

Find example instance documents HERE.



The mzQuantML standard format is intended to store the systematic description of workflows quantifying molecules (principally peptides and proteins) by mass spectrometry. A large number of different software packages are available that produce output in a variety of different formats. It is intended that mzQuantML will provide a common format for the export of quantification results from any software package. The format was originally developed under the name AnalysisXML as a format for several types of computational analyses performed over mass spectra in the proteomics context. It was decided to split development into two formats: mzIdentML for peptide and protein identification, and mzQuantML (described here), covering quantitative proteomic data derived from MS.

The development of mzQuantML is driven by some general principles, specific use cases and the goal of supporting specific techniques, as listed below. These were discussed and agreed at the development meeting in Tübingen in July 2011.


General principles, the format SHOULD support: 

  • Journal requirements for the reporting of quantitative proteomic data from mass spectrometry.
  • Reporting according to MIAPE-MSI (and the emerging MIAPE-Quant document).
  • Submission of quantitative data to public databases.
  • Data exchange between software tools, where data are defined as values about features (defined here as regions on MS1 mass spectra that report on a single peptide or small molecule), feature matches across different spectra or within spectra, peptides, proteins and protein groups.
  • Import of data into statistical processing tools.
  • The ability to reprocess or recreate the analysis workflow using the same parameters, assuming no manual steps have taken place.


Use cases, the format SHOULD capture:

  • Final abundance values (relative or absolute) for peptides, proteins and protein groups where protein inference cannot be performed in an unambiguous manner.
  • Quantification values about peptide/protein modifications, such as post-translational modifications.
  • Abundance values at the level of a single run (called an assay in this context) and of logical groupings of runs (called study variables in this context), for example groupings for which the user wishes to report relative values.
  • The evidence trail for how final abundance values were calculated, such as the features used for quantifying peptides and proteins.
  • Relationships between features either on different regions of the same spectrum or on different spectra that report on the same peptide or small molecule. These are particularly required for relative quantification approaches.
  • Details about pre-fractionation sufficient to describe the combination of multiple input data files (e.g. raw files) into a single assay where this has been performed.
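The distinction between assays and study variables can be sketched in a few lines of Python. The data, names and the use of the arithmetic mean as the aggregation rule are all hypothetical; the mean is just one possible rule and is not taken from the mzQuantML specification.

```python
from statistics import mean

# Hypothetical assay-level abundances for one protein (one assay = one run)
assay_abundance = {"assay_1": 120.0, "assay_2": 80.0, "assay_3": 40.0, "assay_4": 60.0}

# Hypothetical study variables: logical groupings of assays (e.g. conditions)
study_variables = {"control": ["assay_1", "assay_2"], "treated": ["assay_3", "assay_4"]}


def study_variable_abundance(assays: dict, groups: dict) -> dict:
    """Aggregate assay values per study variable (the mean is an assumed rule)."""
    return {sv: mean(assays[a] for a in members) for sv, members in groups.items()}


sv_values = study_variable_abundance(assay_abundance, study_variables)
ratio = sv_values["treated"] / sv_values["control"]  # relative value between groups
print(sv_values, ratio)
```

Keeping the assay-level values alongside the study-variable values is what makes the evidence trail reproducible: a reader can recompute the ratio from the runs it was derived from.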


mzQuantML 1.0.1

Version 1.0.1 builds on version 1.0.0 and extends support to the SRM/MRM technique. The specification document for version 1.0.1 is available HERE.

mzQuantML 1.0.0

More documentation is available in the mzQuantML Google code project at http://code.google.com/p/mzquantml/.

The format supports the following specific techniques used in proteomics (as shown in example files):

  • MS1 label-free intensity
  • MS1 label-based, e.g. SILAC and metabolic labelling such as 15N
  • MS2 tag-based, e.g. iTRAQ / TMT
  • MS2 spectral counting
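For MS2 spectral counting, a protein's abundance in an assay is approximated by the number of spectra matched to it. A minimal Python sketch, assuming PSMs are already available as hypothetical (spectrum id, protein accession) pairs; a spectrum matched to several proteins simply counts once for each of them, which sidesteps the question of how shared peptides should be apportioned.

```python
from collections import Counter

# Hypothetical PSMs per assay as (spectrum id, protein accession) pairs
psms = {
    "assay_1": [("s1", "P1"), ("s2", "P1"), ("s3", "P2"), ("s2", "P1")],  # s2 listed twice
    "assay_2": [("s1", "P1"), ("s2", "P2"), ("s3", "P2"), ("s4", "P2")],
}


def spectral_counts(psms_by_assay: dict) -> dict:
    """Count distinct spectra matched to each protein, per assay."""
    counts = {}
    for assay, matches in psms_by_assay.items():
        # A set removes duplicate (spectrum, protein) pairs so each spectrum counts once
        counts[assay] = Counter(protein for _, protein in set(matches))
    return counts


for assay, c in spectral_counts(psms).items():
    print(assay, dict(sorted(c.items())))
```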

We expect that the format MAY also be able to cover the following techniques adequately, although these have not been tested in great detail at this stage, and we encourage further input from users of these techniques: 

  • Quantification by selected reaction monitoring (SRM)
  • Absolute quantification based on averaging the intensities of features e.g. Waters Hi3 technique
  • Small molecule quantification (in metabolomics)
  • MS2 intensity-based approaches
  • MS2 label-based approaches

Change log

The standard was submitted to the PSI document process in August 2011. The specifications were subsequently updated through versions 1-rc2 and 1-rc3, with the release of version 1.0.0 in February 2013.

Major changes between versions (also see versioned schema documents on Google Code):

  • rc1 to rc2: Introduction of mapping rules/semantic rules for different techniques.
  • rc2 to rc3: Minor updates in response to reviewer comments from journal review and fixes for cardinalities/internal references etc.
  • rc3 release to version 1.0.0 release: no changes except update to version number.

The overall resource change log can also be consulted here: https://code.google.com/p/mzquantml/source/list




mzIdentML Specification

mzIdentML is one of the standards developed by the Proteomics Informatics working group of the PSI.

For general information on the activities and organization of this working group, see HERE.


  1. mzIdentML 1.2.0 (current release)
  2. mzIdentML 1.1.1
  3. mzIdentML 1.1.0: XML Schema, Documentation and Ontology
  4. mzIdentML Tools and Implementations
  5. mzIdentML 1.0.0 (Previous Version): Schema, documentation and ontology


mzIdentML 1.2.0 (Released March 2017 - current version of the standard)

Between 2013 and 2017, PSI-PI updated mzIdentML from version 1.1 to 1.2. The main update is an improved representation of protein grouping relationships, through the use of mandatory CV terms. Minor updates have also been proposed for capturing pre-fractionation of samples, de novo sequencing and the use of multiple search engines. Specifications have also been added for supporting proteogenomics and cross-linking MS.
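To illustrate what protein grouping relationships capture, here is a generic, parsimony-style Python sketch: proteins connected through shared peptides form one group, and members whose peptide evidence is a strict subset of another member's are not chosen as representatives. This is an assumed, simplified rule set for illustration only; it is not the mzIdentML 1.2 grouping specification and it does not emit the mandatory CV terms that 1.2 requires. All names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical protein -> identified peptide sequences
protein_peptides = {
    "P1": {"PEPTIDEA", "PEPTIDEB"},
    "P2": {"PEPTIDEA", "PEPTIDEB"},  # same evidence as P1
    "P3": {"PEPTIDEA"},              # evidence is a strict subset of P1's
    "P4": {"PEPTIDEC"},              # unrelated protein
}


def group_proteins(prot_peps: dict) -> list:
    """Group proteins connected by shared peptides and pick representatives.

    A representative is any member whose peptide set is not a strict
    subset of another member's set.
    """
    pep_to_prots = defaultdict(set)
    for prot, peps in prot_peps.items():
        for pep in peps:
            pep_to_prots[pep].add(prot)

    seen, groups = set(), []
    for start in sorted(prot_peps):
        if start in seen:
            continue
        # Depth-first walk over proteins linked through shared peptides
        component, stack = set(), [start]
        while stack:
            prot = stack.pop()
            if prot in component:
                continue
            component.add(prot)
            for pep in prot_peps[prot]:
                stack.extend(pep_to_prots[pep] - component)
        seen |= component
        reps = [p for p in component
                if not any(prot_peps[p] < prot_peps[q] for q in component)]
        groups.append({"members": sorted(component), "representatives": sorted(reps)})
    return groups


for g in group_proteins(protein_peptides):
    print(g)
```

P1 and P2 are indistinguishable from the peptide evidence, so both remain representatives; P3 stays in the group but is subsumed.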


mzIdentML 1.1.1: XML Schema, Documentation

Released in July 2015, as a minor update to version 1.1.0. This update should be viewed as a "bugfix" update only.
The only change is to ensure that mass deltas encoded in the format are consistently encoded as doubles and not as floats. As of March 2017, both mzIdentML 1.1.1 and 1.2 (see above) will be generally supported for some years, although we strongly encourage new implementers to work with mzIdentML 1.2.

This has resulted in a change to the schema (XSD) and the specification document only. All other resources are unchanged from version 1.1.0.
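Why the float/double distinction matters can be shown with a short Python round trip through 32-bit and 64-bit IEEE 754 encodings. The value used, 79.966331 Da (the monoisotopic mass delta of phosphorylation), is only an illustrative example. A 32-bit float keeps roughly seven significant decimal digits, so the stored value drifts slightly, while a 64-bit double reproduces the original value.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Pack a value with a struct format ('<f' = float32, '<d' = float64) and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


mass_delta = 79.966331  # phosphorylation monoisotopic mass delta in Da (illustrative)
as_float = round_trip(mass_delta, "<f")
as_double = round_trip(mass_delta, "<d")
print(f"float32 error: {abs(as_float - mass_delta):.2e} Da")
print(f"float64 error: {abs(as_double - mass_delta):.2e} Da")
```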


mzIdentML 1.1.0: XML Schema, Documentation and Ontology

Released in August 2011.

More documentation is available in the HUPO-PSI GitHub page at https://github.com/HUPO-PSI/mzIdentML.

Direct Links to deliverables:

  • Example Instance Documents:
    • Mascot MS/MS example - a simple example of four MS/MS spectra searched against a protein database.
    • Mascot Nucleic Acid example - an example of a search against an EST database.
    • Mascot Top Down example - a single MS/MS spectrum from a protein.
    • MPC Use case - peptides from different search engines are used to assemble proteins with a third-party algorithm;
      false discovery estimation using a decoy database.
    • OMSSA - example MS/MS search results including decoy matches.
    • PMF Example - example Peptide Mass Fingerprint search.
    • Sequest - a simple example derived from a .out file.
    • X! Tandem - example MS/MS search results including decoy matches.



mzIdentML Tools and Implementations

The current status of tools that write and import mzIdentML is listed on this page.



mzIdentML 1.0.0 (Previous Version): Schema, documentation and ontology

This was the first version of the mzIdentML format, released in August 2009. mzIdentML 1.0.0 is NOW DEPRECATED - users should use mzIdentML 1.1.x or 1.2.

mzIdentML was developed as an extension to the Functional Genomics Experiment (FuGE) object model. However, in a change agreed at the PSI Spring Meeting, 2008, the XML schema was developed directly rather than performing the design in UML and converting to XML. A cut-down version of the FuGE xsd has been developed to facilitate this. As a consequence, the UML class diagram in subversion is now out of date.



mzML 1.1.0 Specification


From 2005-2008 there existed two separate XML formats for encoding raw spectrometer output: mzData developed by the PSI and mzXML developed at the Seattle Proteome Center at the Institute for Systems Biology (ISB). It was recognized that the existence of two separate formats for essentially the same thing generated confusion and required extra programming effort. Therefore the PSI, with full participation by ISB, developed a new format called mzML by taking the best aspects of each of the precursor formats to form a single one. It is intended to replace the previous two formats, which are now deprecated, although still sometimes used by older software.

On 2008-06-01, mzML 1.0.0 was released. In early 2009, several implementation efforts identified a few minor shortcomings in mzML 1.0.0. Since no vendors had yet released software supporting mzML 1.0, the working group decided to release an update in June 2009. It is expected that all software will support mzML 1.1 as the long-term-stable format instead of 1.0. Below is the available documentation for mzML 1.1.0 and related information. Please send feedback to psidev-ms-dev@lists.sourceforge.net.


mzML 1.1.0 was released on 2009-06-01 and has been stable ever since. There were initial plans to publish a new 1.2 release in 2017 to support ion mobility mass spectrometry (IM-MS). However, as of 2017-11-03, it appears that support for IM-MS can be achieved without a schema change, with just some additional terms. Please contact psidev-ms-dev@lists.sourceforge.net for more information.

mzML Release Schedule

(updated 2017-11-03)

  • 2008-06-01 mzML 1.0.0 released
  • 2009-06-01 mzML 1.1.0 released
  • 2010-06-01 mzML index wrapper schema updated to 1.1.1
  • 2017-11 Minor updates to CV still occur, but no new schema changes are planned at this time


mzML 1.1.0 Finished Specification

(updated 2017-11-03)

The information and documents in this subsection are related to mzML 1.1.0, revised after going through the PSI document process on May 19, 2009. Everyone is encouraged to implement mzML 1.1.0. It is hoped that mzML 1.1.0 will remain stable for a long time.

NOTE: On 2010-06-01, the mzML index schema was updated from 1.1.0 to 1.1.1. There was no functional change, but rather the addition of an enumeration constraint to an attribute to prevent creative, unintended values. This could cause some files that previously validated to no longer validate. However, any such files should never have successfully validated in the first place.
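A minimal Python sketch of the kind of check such an enumeration enables. It assumes, without confirming it against the schema here, that the constrained attribute is the `name` attribute of `index` elements in indexedmzML and that its allowed values are "spectrum" and "chromatogram". The helper name and the trimmed fragment are hypothetical, and this is not a replacement for validating against mzML1.1.1_idx.xsd.

```python
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"
# Assumed enumeration for index/@name in the 1.1.1 index schema
ALLOWED_INDEX_NAMES = {"spectrum", "chromatogram"}


def invalid_index_names(xml_text: str) -> list:
    """Return index/@name values that fall outside the assumed enumeration."""
    root = ET.fromstring(xml_text)
    names = [idx.get("name") for idx in root.iter(f"{{{MZML_NS}}}index")]
    return [name for name in names if name not in ALLOWED_INDEX_NAMES]


# Hypothetical, heavily trimmed indexedmzML fragment
doc = (
    '<indexedmzML xmlns="http://psi.hupo.org/ms/mzml">'
    '<indexList count="2">'
    '<index name="spectrum"/><index name="spectra"/>'
    "</indexList></indexedmzML>"
)
print(invalid_index_names(doc))  # prints ['spectra']
```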

XML schema definition files:

- mzML1.1.0.xsd (main schema)

- mzML1.1.1_idx.xsd (separate and optional index)

- Latest mapping file, which defines where certain controlled vocabulary terms may be used in a document.

- Latest version of the controlled vocabulary (CV) in OBO 1.2 format.  (OBO-Edit)

Documentation files:

- Full Specification Document: mzML1.1.0_specificationDocument.doc

- HTML schema documentation for mzML 1.1.0

- HTML schema documentation for mzML 1.1.0 index wrapper schema

Validation of mzML files

Although at one time there were on-line mzML validators, these have fallen into disrepair and are no longer functional.

You can download and run a local validator.

- The OpenMS validator can be installed locally by downloading and installing OpenMS.

- The Java-based validator can be downloaded from GitHub

Sample instance documents for all relevant formats:

All documents are meant to contain equivalent information in the various formats.

- tiny1.mzML1.1.0.mzML
- tiny1.mzData1.05.xml

- tiny1.mzXML2.0.mzXML
- tiny1.mzXML3.0.mzXML

Sample files generated by the ProteoWizard:

- small.RAW (a small Thermo RAW file with LTQ-FT data)

- small.pwiz.1.1.mzML (converted from small.RAW by msconvert)

- small_miape.pwiz.1.1.mzML (converted by msconvert, with example MIAPE fields added programmatically)

- small_zlib.pwiz.1.1.mzML (converted by msconvert, with zlib compression and 32-bit precision)
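The zlib-compressed, 32-bit sample above reflects how mzML stores peak arrays: numeric values are packed as little-endian IEEE 754 floats, optionally zlib-compressed, and then base64-encoded into the binary element of a binaryDataArray. In mzML, precision and compression are declared by CV terms on each array (for example "32-bit float" and "zlib compression"); this Python sketch takes them as plain parameters instead of reading them from the XML, and the function names are hypothetical.

```python
import base64
import struct
import zlib


def encode_binary_array(values, precision_bits=32, zlib_compressed=True) -> str:
    """Pack floats little-endian, optionally zlib-compress, then base64-encode."""
    code = {32: "f", 64: "d"}[precision_bits]
    raw = struct.pack(f"<{len(values)}{code}", *values)
    if zlib_compressed:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_binary_array(text: str, precision_bits=32, zlib_compressed=True) -> list:
    """Reverse of encode_binary_array: base64-decode, inflate, unpack little-endian floats."""
    raw = base64.b64decode(text)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    code = {32: "f", 64: "d"}[precision_bits]
    count = len(raw) // struct.calcsize("<" + code)
    return list(struct.unpack(f"<{count}{code}", raw))


mz = [100.0, 200.5, 300.25]  # hypothetical m/z values, exactly representable in float32
encoded = encode_binary_array(mz)
print(decode_binary_array(encoded))  # prints [100.0, 200.5, 300.25]
```

The order of operations matters: compression is applied before base64 encoding, so decoding must undo base64 first and inflate second.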

Other sample files:

- PDA example file (created by Steffen Neumann)

- Sample files generated by the Proteios Software Environment

Other relevant websites:

- HUPO-PSI GitHub mzML

- General PSI guidelines for creating controlled vocabularies

Current and future support for mzML:
(updated 2013-02-19)

Software | Organization | Contact | Support comments
ProteoWizard | USC | Parag Mallick | Full mzML support today
TPP | ISB | Eric Deutsch | Full mzML support today (including embedded X!Tandem)
Insilicos Viewer | Insilicos | Erik Nilsson | Full mzML support today
X!Tandem | GPM | Ron Beavis | Full mzML support today
Myrimatch | Vanderbilt | Matt Chambers | Full mzML support today
InSilicoSpectro | SIB | Alex Masselot | Full mzML support today
Proteios SE | Univ Lund | Fredrik Levander | Full mzML support today
NCBI C++ toolkit | NCBI | Douglas Slotta | Available in next release
OpenMS/TOPP | Univ Tübingen | Marc Sturm | Full mzML support today
Phenyx | GeneBio | Pierre-Alain Binz | Full mzML support today
Mascot | Matrix Science | David Creasy | Full mzML support today
Mascot Distiller | Matrix Science | David Creasy | Full mzML support today
jmzML | Ghent/ EMBL-EBI | Lennart Martens | Full mzML support today
Conversion tool in Proteomics Toolbox | Thermo Scientific | Jim Shofstahl | Beta testing
ReAdW (.RAW converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
mzWiff (.wiff converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
MassWolf (.raw/ converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
Trapper (Agilent data converter) | ISB | Eric Deutsch | Replaced by ProteoWizard msconvert
mzML_Exporter | ABI | Sean Seymour | Beta testing
PEAKS | Bioinformatics Solutions Inc | Kevin Zhang | Beta testing
PRIDE database | EMBL-EBI | Juan A. Vizcaino | Ongoing
PRIDE Inspector | EMBL-EBI | Juan A. Vizcaino | Full mzML support today
MIAPE MS Extractor | ProteoRed | Salvador Martinez-Bartolome | Full mzML support today
mzR | Bioconductor | Bernd Fischer, Steffen Neumann, Laurent Gatto | Full mzML support today
pymzML | Univ Münster | Christian Fufezan | Full mzML support today
Crux | University of Washington | W. Noble | Full mzML support



Released mzML 1.0.0 Specification

(updated 2009-02-10)

The information and documents below relate to mzML 1.0.0, which is now obsolete. Do not use it.

XML schema definition files (.xsd):

- mzML1.0.0.xsd (main schema)

- mzML1.0.0_idx.xsd (separate and optional index)

Documentation files:

- Full Specification Document: mzML1.0.0_specificationDocument.doc

- HTML schema documentation for mzML 1.0.0

- HTML schema documentation for mzML 1.0.0 index wrapper schema

- ASMS June 2008 Poster (3MB PDF)


PSI-MI XML Specification


Proteomics Standards Initiative

Molecular Interaction XML Format Documentation

Version 2.5

Released 2005, Last maintenance update to version 2.5.4


Version 3.0

Available for use now, estimated formal release - Spring 2016


Table of Contents

  1. Introduction
  2. Purpose of the PSI-MI XML format
  3. Purpose of this document
  4. Directory structure
  5. Release schedule
  6. Changes from PSI-MI 1.0 to 2.5
  7. Maintenance releases
  8. Detailed Documentation
  9. Use of external controlled vocabularies
  10. List of planned features
  11. How to comment
  12. Available data
  13. Tools
  14. Data submission
  15. Further information and relevant links


The Proteomics Standards Initiative (PSI) aims to define community standards for data representation in proteomics to facilitate data comparison, exchange and verification. For detailed information on all PSI activities, please see PSI Home Page.

The PSI-MI interchange format and accompanying controlled vocabularies were originally designed by a consortium of molecular interaction data providers from both academia and industry, including BIND, DIP, IntAct, MINT, MIPS, GlaxoSmithKline, CellZome, Hybrigenics, Universities of Bielefeld, Bordeaux, Cambridge, and others. It is maintained and kept fit for purpose by the Molecular Interaction workgroup of the HUPO PSI. Please contact us on psi-mi@ebi.ac.uk if you wish to become involved or have any questions.


Purpose of the PSI MI XML format

The PSI MI format is a data exchange format for molecular interactions. It is not a proposed database structure.

Purpose of this document

The purpose of this document is to describe the general structure of the PSI MI XML specification in a more user-friendly manner than the specification does itself. For the detailed and most up-to-date description of PSI-MI XML2.5 please see the molecular interaction data exchange format, level 2.5. For the publication describing the development and use of this format, please look here. For documentation of the previous level 1.0 please see Version 1.0 Documentation. For level 3.0 please go to Version 3.0 documentation. This documentation will also provide additional information, e.g. sample data.
The XML schema is located at https://rawgit.com/MICommunity/psidev/master/psi/mi/rel25/doc/MIF254.html

Directory structure

This document is in the root directory of the PSI-MI XML2.5 release. Subdirectories are:

  • doc/: Auto-generated documentation of the PSI-MI XML schema
  • src/: Source code for schema and related software
  • data/: Controlled vocabularies
  • tools/: Data management tools

Release schedule

  • Level 3.0 will be published in 2016.
  • Level 2.5 was released 5 December 2005. It is the format most commonly supported by PSI-compliant databases and tools. It will continue to be supported by the MI workgroup after the publication of PSI-MI XML3.0.
  • Level 1.0 support was discontinued in 2007.

Changes from PSI-MI XML1.0 to 2.5

Changes in the PSI-MI XML format and controlled vocabularies from version 1.0 to 2.5 are documented in this page.


PSI-MI XML2.5 Maintenance releases

  • 2.5.4:
    Minor change to header
  • 2.5.3:
    Minor updates as a result of the PSI spring meeting in San Francisco, April 2006:
    • updated entrySet@minorVersion to 3
    • bioSourceType@taxId now mandatory. This was inadvertently made non-mandatory with 2.5.2.
    • featureType@id now mandatory. This was inadvertently made non-mandatory with 2.5.2.
    • Optional attribute parameter/uncertainty added.
    • Added participant/parameterList and participant/attributeList to allow more complex modelling of participants.
    • Deleted XML constraints on entry level. They were not working due to syntax errors, and few XML validators can check them. This validation level will be performed by the PSI XML validator in the future.
  • 2.5.2:
    There were some inconsistencies in the naming of complex types in 2.5.1. These have been fixed. This has no impact on the XML data files; the only impact for users is that working with code generators becomes easier. Concrete changes:
    • complex type interactorType to interactorElementType
    • complex type interactionType to interactionElementType
    • complex type featureType to featureElementType
    • featureType@id has been moved into the complex type featureElementType and typed as xs:int
    • bioSourceType@ncbiTaxId has been moved into the complex type bioSourceType and typed as xs:int
    • updated entrySet@minorVersion to 2
  • 2.5.1:
    At the PSI meeting in Geneva, September 2005, it was discussed that participant/experimentalFormList/experimentalForm should have the possibility to assign a position, e.g. to describe an N-terminal protein modification in an experiment. It was decided to implement this using the existing featureType. The controlled vocabulary had been updated accordingly, but the change in the XML schema had not been implemented. This required the maintenance release 2.5.1, with the following changes:

Detailed Documentation

See https://github.com/MICommunity/psimi/blob/wiki/PsimiXMLSpecifications.md

Use of external controlled vocabularies

Where possible, external controlled vocabularies are referenced from PSI MI. External controlled vocabularies are used in two forms:

  • Open controlled vocabularies: We think that no existing controlled vocabulary provides all necessary terms for the given attribute in the PSI MI format. In this case, it is up to the data provider to choose a controlled vocabulary, or to provide a free text string if no appropriate controlled vocabulary exists.
  • Closed controlled vocabularies: We think that there is a controlled vocabulary which appropriately covers all necessary terms for the given attribute. In this case, only terms from the defined vocabulary should be used.
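The two rules can be expressed as a small Python check. The `CvReference` type, the `is_acceptable` function and the term identifiers are hypothetical placeholders; in practice the set of allowed terms for a closed vocabulary would come from the vocabulary file itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CvReference:
    """A value supplied for a CV-backed attribute (hypothetical shape)."""
    term_id: Optional[str]  # a CV term identifier, or None when free text is used
    text: str = ""


def is_acceptable(ref: CvReference, closed_terms: Optional[set] = None) -> bool:
    """Apply the open/closed controlled vocabulary rule.

    closed_terms is the set of allowed term ids for a closed vocabulary,
    or None when the attribute uses an open vocabulary.
    """
    if closed_terms is not None:
        # Closed CV: only terms from the defined vocabulary are allowed
        return ref.term_id in closed_terms
    # Open CV: any vocabulary term, or free text if no vocabulary fits
    return ref.term_id is not None or bool(ref.text.strip())


allowed = {"X:0001", "X:0002"}  # hypothetical closed vocabulary
print(is_acceptable(CvReference("X:0001"), allowed))          # True
print(is_acceptable(CvReference(None, "my label"), allowed))  # False
print(is_acceptable(CvReference(None, "my label")))           # True
```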


The closed controlled vocabularies referenced by PSI MI are listed below. All vocabularies are contained in a single file in OBO flat file format: psi-mi25.obo. They can be browsed at the EBI OLS (Ontology Lookup Service). The correctness of references to external controlled vocabularies is currently not enforced by the PSI MI schema. It is the responsibility of the data provider to ensure that only existing terms at an up-to-date data source are referenced.
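A minimal Python sketch for reading term identifiers and names from an OBO flat file such as psi-mi25.obo. It handles only the `id` and `name` tags of `[Term]` stanzas and ignores everything else (definitions, relationships, `[Typedef]` stanzas); the function name and the sample text are illustrative assumptions, and a full OBO parser would be needed for real use.

```python
def parse_obo_terms(text: str) -> dict:
    """Map term id -> {"id": ..., "name": ...} for every [Term] stanza."""
    terms = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, _, value = line.partition(":")  # split on the first colon only
        value = value.strip()
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
    return terms


sample = """format-version: 1.2

[Term]
id: MI:0001
name: interaction detection method

[Typedef]
id: part_of
name: part of
"""
print(parse_obo_terms(sample)["MI:0001"]["name"])  # prints interaction detection method
```

Splitting on the first colon only is what keeps identifiers such as `MI:0001` intact.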

PSI MI XML schema elements and OBO major terms

Term name (with the PSI MI XML level 2.5 data element, where given):

  • participant identification method
  • interaction detection method
  • interaction type
  • biological role (example: enzyme)
  • experimental preparation
  • experimental role (example: bait)
  • feature detection method
  • feature type
  • feature range status: 'featureType/featureRangeList/featureRange/baseLocationType/startStatus/' and '../endStatus/'
  • interactor type
  • database citation
  • alias type
  • attribute name

Obsolete terms

The OBO format has a special class “obsolete”, to which all obsolete PSI MI terms are assigned.


Mapping from OBO to MIF25 format

We recommend the following mapping from the file psi-mi25.obo to PSI MI 2.5 XML files:


OBO format element

PSI MI 2.5 XML file element











Because we are following a leveled approach, we are interested in knowing what the community wishes to be included in the next level. If you have a use case not covered by the current schema or controlled vocabulary terms, please contact us at psi-mi@ebi.ac.uk.


Data submission

We strongly support and encourage data deposition supporting published journal articles in public databases of the IMEx consortium. Please see the IMEx deposition page for contact details and deposition options.


Sandra Orchard, orchard@ebi.ac.uk, 11-DEC-2015


TraML 1.0.0 Specification

The HUPO PSI Mass Spectrometry Standards Working Group (MSS WG) has developed a specification for a standardized format for the exchange and transmission of transition lists for selected reaction monitoring (SRM) experiments. This specification has now completed rigorous review with the PSI document process and is complete. Please email the list psidev-ms-dev@lists.sourceforge.net with your questions, comments, and suggestions.
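At its core, an SRM transition list pairs a precursor m/z with one or more product (fragment) m/z values to be monitored. The Python sketch below is a generic, in-memory model for illustration only; it is not the TraML schema, whose element names and structure are defined in TraML1.0.0.xsd, and the peptides and m/z values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One SRM transition: a precursor m/z paired with a product (fragment) m/z."""
    peptide: str
    precursor_mz: float
    product_mz: float


def products_by_precursor(transitions) -> dict:
    """Collect the product m/z values monitored for each (peptide, precursor) pair."""
    groups = defaultdict(list)
    for t in transitions:
        groups[(t.peptide, t.precursor_mz)].append(t.product_mz)
    return {key: sorted(mzs) for key, mzs in groups.items()}


# Hypothetical transition list
transitions = [
    Transition("PEPTIDEK", 500.25, 713.38),
    Transition("PEPTIDEK", 500.25, 600.30),
    Transition("SAMPLER", 422.70, 587.33),
]
for key, products in products_by_precursor(transitions).items():
    print(key, products)
```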

TraML Development Timeline

(updated 2013-04-22)

  • 2010-03-31 TraML 0.9.4 draft posted
  • 2010-07-01 TraML 0.9.4 submitted as 1.0.0RC1 to PSI document process
  • 2011-08-23 TraML 0.9.5 draft posted
  • 2011-12-12 TraML 1.0.0 released


New TraML 1.0.0 Finished Specification

(updated 2013-04-22)

The information and documents in this subsection are related to TraML 1.0.0, now complete in its development cycle. There are currently no open issues for a follow-on version. Everyone is encouraged to examine and implement the format as widely as possible.

XML schema definition files (.xsd):

- TraML1.0.0.xsd (main schema)


Documentation files:

- Full Specification Document: TraML1.0.0.0_specificationDocument.pdf

- HTML schema documentation for TraML1.0.0

- Please cite this journal article when referencing TraML: Deutsch et al. 2011, MCP, 10, R111.015040


Controlled Vocabulary and Mapping Files:

- Latest semantic validator mapping file, which defines where certain controlled vocabulary terms may be used in a document.

- Latest development version of the CV in OBO 1.2 format

    You can explore the CV at the NCBO BioPortal, or at the EBI OLS (Ontology Lookup Service), or with the Java desktop application OBO-Edit.


Sample instance documents for all relevant formats:

- ToyExample1.TraML

- Yeast_ATAQS_gen.TraML (6 MB)

- Yeast_InclusionList.TraML

- xcmsIncludeTest.TraML

- TSQ_1832_jTraML_converted.TraML


Other resources:

- On-line TraML converter using Compomics jTraML (current at version 1.0.0)

- On-line TraML validator at OpenMS (current at version 1.0.0)




