Use Cases for mzIdentML
- It should be possible to create a tool that loads an mzIdentML document and enables users to examine results from MS, MS-MS, MSn or tag searches. (For MSn searches, the assumption is that matches will be of a similar format to those from MS-MS searches, and there will be no attempt to model combining, say, MS4 matches with the corresponding MS3 and MS-MS results.) There should be sufficient information for the tool to generate output reports that conform to the requirements set by journals for publication and to the relevant MIAPE guidelines. For example:
· For a PMF search, it should be possible to display the spectrum and show the matches of the peaks to the relevant peptides, but only if the spectrum is available.
· For an MS-MS search, it should be possible to locate which spectrum matched to which peptide in the original file.
· For a tag search, there should be sufficient information to validate that a result is correct.
- There should be sufficient information stored in the instance document to enable a user to run the same search on the same or another search engine. This means that all search parameters should be described in sufficient detail and that sufficient information is available to determine which database (if any) the data were searched against. The peak list data (if any) do not need to be included in the instance document, but do need to be suitably referenced.
- A PMF search and an MS-MS search of the same sample can be saved in the same instance document as long as the result is one combined protein list.
- It should be possible to save the results of searching a decoy database in the same instance document as the results from the forward database. It should then be possible to write a viewer application that enables a user to investigate the effect of changing, for example, a threshold value on the false discovery rate. This would only be possible if all results (rather than just top matches) from the search are saved in the mzIdentML document and if the results from the decoy search are also saved. It would only be possible to do this at the peptide level for an MS-MS search, because changing thresholds would normally have some effect on the protein grouping algorithm.
- It should be possible to save manual or automated annotation of proteins/peptides in an instance document. A third-party tool could be used to save annotations and validations of identified proteins/peptides to an existing instance document.
- It should be possible to save the results from a search of a metabolically labeled sample. For example, with a 14N/15N experiment, two separate sets of amino acid masses are used, and it must be possible to tell which masses were used for each peptide result.
- For a search of multiple peak lists, it should be possible to identify the spectrum that obtained a match to a particular peptide or protein reported by the search engine. For example, in an LC-MS-MS run, it should be possible to refer back to the spectrum in the peak list file that was searched and from there, if the information is available, to determine the retention time of the spectrum. For an mzML file, the unique 'id' of the spectrum should be available. For other peak list formats, some other unique identifier should be stored where possible. There is no requirement to store other redundant information in the mzIdentML file that will be available in the peak list data.
- It should be possible to search an mzIdentML file to retrieve all molecules that have a specified modification.
- It should be possible to store the results of a search of spectra against other spectra, i.e. a spectral library search.
- It should be possible to store the results of a top-down search, i.e. analysis of complete proteins.
- It should be possible to store fragmentation data so that, for example, viewers can display which ions in the input data match predicted fragment ion masses.
- There should be support for storing the results of searches of peptides against nucleic acid databases, including the information about which translation frame the matches were found in.
- It should be possible to combine the results from multiple search engines into one mzIdentML document. For example, the peptide identification results from two different search engines could be combined using a third tool to give one set of protein results.
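The decoy database use case above implies that a viewer can recompute the false discovery rate as the user moves a score threshold. The following is a minimal sketch of that calculation, not part of the specification. It assumes that every match (not only top hits) has been read from the document into a hypothetical `Match` record, that higher scores are better, and that the FDR is estimated as decoy matches divided by forward matches above the threshold, which is only one of several estimators in common use.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical in-memory record for one peptide-level match."""
    score: float      # assumption: higher score means a better match
    is_decoy: bool    # True if the match came from the decoy database


def fdr_at_threshold(matches, threshold):
    """Estimate FDR as decoys / forward matches among matches scoring >= threshold."""
    forward = sum(1 for m in matches if m.score >= threshold and not m.is_decoy)
    decoys = sum(1 for m in matches if m.score >= threshold and m.is_decoy)
    if forward == 0:
        # No forward matches pass the threshold; report 0.0 rather than divide by zero.
        return 0.0
    return decoys / forward


# Illustrative data only; these scores are invented for the example.
matches = [
    Match(95.0, False), Match(88.0, False), Match(72.0, False),
    Match(60.0, True), Match(55.0, False), Match(41.0, True),
    Match(38.0, False), Match(30.0, True),
]

for threshold in (80.0, 50.0, 20.0):
    print(threshold, fdr_at_threshold(matches, threshold))
```

Lowering the threshold admits more decoy matches, so the estimate rises from 0.0 to 0.25 to 0.6 on this invented data. As the use case notes, such a recalculation is meaningful at the peptide level only, since protein grouping would have to be rerun.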
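The use case of retrieving all molecules that carry a specified modification can be illustrated with a short sketch using Python's standard `xml.etree.ElementTree` module. The element names, attribute names and namespace in the sample fragment are assumptions modeled on the layout of the later mzIdentML 1.1 schema and may not match the schema version in use; the helper `peptides_with_modification` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace and element names follow the mzIdentML 1.1 layout.
NS = {"mz": "http://psidev.info/psi/pi/mzIdentML/1.1"}

# Simplified, invented fragment used only for illustration.
SAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <Peptide id="pep1">
      <PeptideSequence>MKTAYIAK</PeptideSequence>
      <Modification location="1" monoisotopicMassDelta="15.994915">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:35" name="Oxidation"/>
      </Modification>
    </Peptide>
    <Peptide id="pep2">
      <PeptideSequence>LSDEK</PeptideSequence>
    </Peptide>
    <Peptide id="pep3">
      <PeptideSequence>ACCDMK</PeptideSequence>
      <Modification location="3" monoisotopicMassDelta="57.021464">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:4" name="Carbamidomethyl"/>
      </Modification>
      <Modification location="5" monoisotopicMassDelta="15.994915">
        <cvParam cvRef="UNIMOD" accession="UNIMOD:35" name="Oxidation"/>
      </Modification>
    </Peptide>
  </SequenceCollection>
</MzIdentML>"""


def peptides_with_modification(root, accession):
    """Return (peptide id, sequence, location) for each modification with the given CV accession."""
    hits = []
    for peptide in root.findall(".//mz:Peptide", NS):
        sequence = peptide.findtext("mz:PeptideSequence", namespaces=NS)
        for mod in peptide.findall("mz:Modification", NS):
            params = mod.findall("mz:cvParam", NS)
            if any(p.get("accession") == accession for p in params):
                hits.append((peptide.get("id"), sequence, int(mod.get("location"))))
    return hits


root = ET.fromstring(SAMPLE)
for hit in peptides_with_modification(root, "UNIMOD:35"):
    print(hit)
```

Matching on the controlled vocabulary accession rather than on the display name avoids depending on how a particular search engine spells the modification.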
There will be limited support for the following use cases:
- De novo. De novo peptide sequencing results will be supported to the extent that it will be possible to enumerate and record all possible matches found by a de novo technique. However, we anticipate that this will produce extremely large files. In version 2, solutions will be investigated for defining a standard way of reporting ambiguous combinations of residues.
The following use cases will not be supported in version 1 of mzIdentML:
- It should be possible to store relative and absolute quantitation information at the peptide and protein level using all the popular techniques [Deferred to version 2].
- Support for LC-MS biomarker discovery.
- Support for complex workflows in which multiple data processing algorithms are chained together; i.e. only “final” results are represented in mzIdentML v1, with no intermediate results.