Mass Spectrometry

Mass Spectrometry Standards Working Group Charter

Submitted: 2016-06-10

Template Rev2016b

1. Administrative Section

 Status (New/Update): Update

 Group Name:

A group name should be reasonably descriptive or identifiable.  Additionally, the group must define an acronym (maximum of 8 printable ASCII characters) to reference the group in the PSI directories, mailing lists, and general documents.  The name and acronym must not conflict with any other PSI name and acronym.

HUPO PSI Mass Spectrometry Standards Working Group (PSI-MS WG)

Chair (with affiliation and current email address):

Eric Deutsch, Institute for Systems Biology

Co-Chairs (1 or 2) (with affiliation and current email address):

Pierre-Alain Binz, CHUV


<position is currently unfilled>


Other officers (optional) (with affiliation and current email address):

Editor(s): <position is currently unfilled>

Minimal Reporting Requirements Coordinator(s): Pierre-Alain Binz, CHUV


Ontology Coordinator(s): Gerhard Mayer, Ruhr-University Bochum

Web site Maintainer(s): <position is currently unfilled> 

Mailing list:

2. Description and objectives

Focus and Purpose

The PSI-MS working group is composed of academic, government, and industry researchers, software developers, journal representatives, and instrument manufacturers. The main goal of the PSI-MS working group is to define community data formats and associated controlled vocabulary terms, facilitating data exchange and archiving of mass spectrometry raw data and other data needed as input to informatics analysis.

Current projects of the PSI-MS working group are:

  • Ongoing maintenance and enhancement of the mzML format

  • Completion of the PEFF format

  • Definition of a common set of spectral library/archive metadata


Goal 1: Complete an implementation of the blocked zip algorithm for mzML. The blocked zip algorithm, borrowed from the genomics field, enables random access into a zipped file. This would allow mzML files to remain zipped while still allowing easy sequential and random access, along with standard unzipping for complete backwards compatibility.
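The idea behind Goal 1 can be illustrated with a minimal sketch. It is not the working group's implementation: it assumes the data is split into fixed-size blocks, each compressed as an independent gzip member, and it keeps a separate in-memory index of block offsets (the names compress_blocked, read_range and BLOCK_SIZE are hypothetical). The genomics variant is understood to record block sizes inside the gzip headers instead, which this sketch does not attempt.

```python
import gzip

BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (illustrative choice)


def compress_blocked(data, block_size=BLOCK_SIZE):
    """Compress data as concatenated gzip members.

    Returns (blob, index), where index[i] is the
    (compressed_offset, compressed_length) of block i inside blob.
    """
    parts = []
    index = []
    offset = 0
    for start in range(0, len(data), block_size):
        member = gzip.compress(data[start:start + block_size])
        index.append((offset, len(member)))
        parts.append(member)
        offset += len(member)
    return b"".join(parts), index


def read_range(blob, index, start, length, block_size=BLOCK_SIZE):
    """Return data[start:start + length], decompressing only the
    blocks that overlap the requested range."""
    if length <= 0:
        return b""
    first = start // block_size
    last = min((start + length - 1) // block_size, len(index) - 1)
    chunks = []
    for i in range(first, last + 1):
        off, size = index[i]
        chunks.append(gzip.decompress(blob[off:off + size]))
    joined = b"".join(chunks)
    skip = start - first * block_size
    return joined[skip:skip + length]
```

Because the output is a sequence of ordinary gzip members, a standard gzip reader can still decompress the whole file sequentially, which is the backwards-compatibility property the goal describes.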

Goal 2: Complete the PEFF (PSI Extended FASTA Format) and submit it to the document process. This format is an enhancement of the FASTA format, with the important extension of allowing reliable parsing of the description entries. PEFF files are mostly backwards compatible with FASTA; the exceptions are FASTA parsers that do not skip lines beginning with # and parsers that do not handle very long description lines gracefully, both of which may have difficulties.
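As a rough illustration of the compatibility point in Goal 2, the sketch below shows a FASTA-style reader that tolerates the two things named there: lines beginning with # and description lines of arbitrary length. It does not interpret any PEFF-specific content; the function name read_fasta_like and the sample text are hypothetical and are not taken from the PEFF specification.

```python
import io


def read_fasta_like(handle):
    """Yield (description, sequence) pairs from a FASTA-style text stream.

    Lines starting with '#' and blank lines are skipped; description
    lines are kept whole, however long they are.
    """
    description = None
    seq_parts = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # header/comment line that a plain FASTA parser may choke on
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(seq_parts)
            description = line[1:]
            seq_parts = []
        elif description is not None:
            seq_parts.append(line.strip())
    if description is not None:
        yield description, "".join(seq_parts)


# Hypothetical sample: one '#' line and one very long description line.
SAMPLE = (
    "# hypothetical file-level header line\n"
    ">entry1 some description\n"
    "MKV\n"
    "LLA\n"
    ">entry2 " + "x" * 5000 + "\n"
    "PEPTIDE\n"
)

records = list(read_fasta_like(io.StringIO(SAMPLE)))
```

A parser written this way reads both plain FASTA and files carrying extra # lines, at the cost of ignoring whatever information those lines hold.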

Goal 3: Define a set of common metadata and controlled vocabulary for spectral libraries and archives. Although there is not currently enough momentum in defining a new PSI spectral library format, there is enough enthusiasm for creating a set of common terms for encoding spectral library and spectrum metadata, such that it could be used uniformly along with any of the existing spectral library or spectral archive formats.




proBed Specification 1.0.0

proBed is one of the data standards developed by members of the Proteomics Informatics working group of the PSI.

For general information on the activities and organization of this working group, see HERE.

The original BED format (Browser Extensible Data), developed by the UCSC (University of California, Santa Cruz) team, is used to describe genome coordinate data, one feature per line, for use on annotation tracks. In BED, data lines are defined as tab-separated plain text with up to 12 fields (columns). Of those, only the first three fields are required, and the other 9 are optional.

The proBed format builds upon this original structure by extending the 12 original BED fields to include a further 13 fields to describe information primarily on peptide-spectrum matches (PSMs). The format can also accommodate peptides (as groups of PSMs).
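The column arithmetic above (12 BED fields plus 13 additional fields, 25 in total) can be sketched as a small line checker. This is only an illustration: it assumes that a proBed data line carries all 25 tab-separated columns, uses the UCSC names chrom, chromStart and chromEnd for the three required BED fields, and leaves the remaining columns uninterpreted. The function name parse_bed_line is hypothetical.

```python
BED_FIELDS = 12            # fields defined by the original BED format
PROBED_EXTRA_FIELDS = 13   # additional fields described for proBed
PROBED_FIELDS = BED_FIELDS + PROBED_EXTRA_FIELDS  # 25 columns in total


def parse_bed_line(line, expected_fields=None):
    """Split one tab-separated data line and convert the three required
    BED fields; all other columns are returned as raw strings.

    If expected_fields is given, the line must have exactly that many columns.
    """
    fields = line.rstrip("\r\n").split("\t")
    if expected_fields is not None and len(fields) != expected_fields:
        raise ValueError(
            "expected %d fields, found %d" % (expected_fields, len(fields))
        )
    if len(fields) < 3:
        raise ValueError("a BED line needs at least 3 fields")
    return {
        "chrom": fields[0],
        "chromStart": int(fields[1]),
        "chromEnd": int(fields[2]),
        "rest": fields[3:],
    }


# A minimal 3-field BED line and a placeholder 25-column line (values are made up).
bed_record = parse_bed_line("chr1\t100\t200")
probed_line = "\t".join(["chr1", "100", "200"] + ["."] * (PROBED_FIELDS - 3))
probed_record = parse_bed_line(probed_line, expected_fields=PROBED_FIELDS)
```

Passing expected_fields=PROBED_FIELDS rejects a plain 12-column BED line, which is one simple way a reader could tell the two layouts apart.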

A manuscript describing this proBed format (together with the proBAM format) is available at Genome Biology.


  1. proBed 1.0.0 (Final Version): Specification document and example files
  2. proBed Tools and Implementations

proBed 1.0.0 (Final Version): Specification document and example files

The proBed file format is designed for storing and analyzing peptide spectrum matches (PSMs) within the context of the genome.

Direct links:

proBed Tools and Implementations

proBed example viewed in the Ensembl Genome Browser



mzTab Specification 1.0.0

mzTab is one of the standards developed by members of the Proteomics Informatics working group of the PSI.

For general information on the activities and organization of this working group, see HERE.


  1. mzTab 1.0.0 (Final Version): Specification documents
  2. mzTab Tools and Implementations

mzTab 1.0.0 (Final Version): Specification documents

Originally submitted to the PSI document process in May 2012. Final version 1.0.0 accepted in June 2014.

More documentation is available in the mzTab Google code project at

Direct links to deliverables:

mzTab Tools and Implementations

  • jmzTab: A Java API to read, write and merge mzTab files (link)
  • LipidDataAnalyzer: Tool to quantify lipids from LC-MS data (link)
  • OpenMS: Open-source software C++ library for LC/MS data management and analyses (link)
  • MSnBase: Bioconductor package. Basic plotting, data manipulation and processing of MS-based Proteomics data (link)
  • PRIDE Converter 2: A redesign of the PRIDE Converter tool, for performing data submissions to the PRIDE database (link)
  • MaxQuant: A quantitative proteomics software package designed for analyzing large mass-spectrometric data sets (link)
  • PIA: A toolbox for MS based protein inference and identification analysis (link),  PMID:25938255


mzIdentML Development Timeline

1. Spring 2006, Meeting in San Francisco, USA – start of a UML model for AnalysisXML (universal standard for all types of proteome informatics)

Orchard et al. Proteomics 2006, 6, 4439–4443:

“PSI-Proteomics Informatics (PSI-PI) working group now has responsibility for the production of the mass spectrometry informatics standards, such as analysisXML, which will cover, among other things, protein identification reporting. The remit of the groups is to produce a UML data model with an XML implementation, example instance documentation, a validation tool, and an accompanying ontology. The use cases were reviewed and expanded upon and the existing version analysisXML reviewed in the light of these use cases. Migration to a UML model should be achieved in time for ASMS in order to generate an XML schema for public viewing.”

2. Fall 2006, Meeting in Washington DC, USA – continued work on the AnalysisXML schema

Orchard et al. Proteomics 2007, 7, 337–339:

“Work also continued on AnalysisXML, with the revision of a file containing a list of information available in output files from a majority of currently available search engines. A number of common elements have been mapped to the current model and have been associated to appropriate CV terms.”

3. Spring 2007, Meeting in Lyon, France – continued work on the AnalysisXML schema

Orchard et al.  Proteomics 2007, 7, 3436–3440:

“A draft analysisXML schema and example instance documents were produced at the PSI Autumn 2005 workshop in Washington [5]. In the last few months, feedback has been received from all the major search engine vendors on the parameter spreadsheet and a draft CV prepared as an .OBO file. The aims of the meeting were to further develop the schema, review the instance document and improve the general documentation. By the end of the workshop, the schema had been tested against all of the MIAPE-MSI requirements, with the exception of the requirements for quantification for which a structure has been discussed. SILAC and iTRAQ features have been added as a feature set and these sets can be combined to give a ratio. Instance documents were reviewed and modified with new use cases such as top-down, mixed MS and MS/MS, de Novo sequencing and error tolerant tag searches discussed. Protein inference analysis has been more clearly split from peptide identification. Finally, a decision was made to put the terms required by two or more search engines directly into the schema as attributes/elements rather than described in a CV.”

4. Spring 2008, Meeting in Toledo, Spain – agreed to switch to direct XSD development to speed completion

Orchard et al.  Proteomics 2008, 8, 4168–4172:

“The development of analysisXML has proven far from straightforward, partly because the scope of the project has changed often in a fast moving field. The main aim of this meeting was to readdress the goals of the project and produce a timeline for completing the first release. Fundamental questions such as whether it is practical to try to write a schema that can cover all scenarios, including quantitation support, in the first implementation were considered.

analysisXML has been developed as an extension to FuGE by creating a schema from UML. It was agreed to continue by developing the XML schema directly and extending a cut-down version of the FuGE xsd. Rather than use the FuGE format for the controlled vocabulary, it was agreed to use the same format as for mzML version 1.0. It was also agreed that quantitation will not be addressed until version 2.0. However, a scheme was developed that will ensure that version 1.0 documents will be backwards compatible with the 2.0 schema. Development of quantitation support will be carried out in parallel to the version 1.0 release.”


5. December 2008, submission of AnalysisXML to the PSI document process


6. Spring 2009, Meeting in Turku, Finland  – AnalysisXML split into identification format (mzIdentML) and quantitation (mzQuantML), minor changes to the schema

Orchard et al. Proteomics 2009, 9, 4429–4432:

“The scope of the current format is limited to protein identification and the format previously known as AnalysisXML has been renamed mzIdentML to reflect this. The resources now include semantic validation tools, specification document, tables of conformance to both the MIAPE and MCP guidelines and 12 example instance documents. A manuscript has been prepared and the format was submitted to the PSI document review process in December 2008. The feedback from this process has resulted in a minor set of changes to the schema, documentation and examples.”


“It is now planned to develop a separate schema, mzQuantML, with a structure broadly similar to mzIdentML to add the ability to handle quantitation data.”


7. August 2009 – completion of the PSI document process and formal release of version 1 of mzIdentML


8. 2010-2011 – Some minor issues were identified with verbosity in the files and with redundant information being captured, along with a few minor bugs. A decision was taken to fix all bugs in one go and release a new 1.1 version


9. August 2011 - Version 1.1 released from the PSI document process, and considered to be the stable development release of the format.




