mzSpecLib

mzSpecLib is a formal standard and file format in development at HUPO-PSI to store and distribute spectral libraries. The main target audience for this format is the developers of spectral library search tools and resources.

Over the past years, several file formats have been created to store and disseminate spectral libraries, including MSP, X!Hunter binary MGF, BiblioSpec SQLite and SSL/MS2, SpectraST SPLIB/SPTXT, MassBank formats, and Spectronaut CSV. Each spectral library provider uses one of these formats: for example, PeptideAtlas uses SPLIB, PRIDE and NIST use MSP, and GPMDB uses X!Hunter binary MGF. Some spectral library search engines support multiple formats and others do not, which makes it difficult to share libraries and to compare spectral library search tools. The proteomics community has a long-standing effort to standardize raw mass spectrometric data and the results of data analysis, primarily identification, but spectral libraries straddle the boundary between the two and cannot be adequately served by either effort.

As there remains much fluidity and disagreement about what information should go into a spectral library, the format must be flexible enough to fit all the potential use cases of spectral libraries, yet retain sufficient structure to be a practically useful standard. The specification therefore focuses on the data model and on the mechanism for providing controlled-vocabulary-based metadata, while leaving the serialization mechanism open to several interchangeable possibilities. Further detailed information, including any updates to this document, implementations, and examples, is available at github.com/HUPO-PSI/mzSpecLib.
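
To make the separation between data model and serialization concrete, the following is a minimal Python sketch, not the mzSpecLib specification itself. It assumes a hypothetical `LibrarySpectrum` holding a list of controlled-vocabulary attributes (accession, name, value) plus peaks, and writes the same object in two interchangeable ways. The class names, the `EX:0000001` accession, and both output layouts are placeholders invented for illustration; the actual terms and serializations are defined in the specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Tuple


@dataclass
class Attribute:
    """One metadata item identified by a controlled-vocabulary term."""
    accession: str  # placeholder accession, NOT a real CV term
    name: str
    value: str


@dataclass
class LibrarySpectrum:
    """Hypothetical in-memory data model for one library entry."""
    key: int
    attributes: List[Attribute] = field(default_factory=list)
    peaks: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def to_text(spectrum: LibrarySpectrum) -> str:
    """Illustrative line-oriented serialization of the data model."""
    lines = [f"<Spectrum={spectrum.key}>"]
    lines += [f"{a.accession}|{a.name}={a.value}" for a in spectrum.attributes]
    lines += [f"{mz}\t{intensity}" for mz, intensity in spectrum.peaks]
    return "\n".join(lines)


def to_json(spectrum: LibrarySpectrum) -> str:
    """Illustrative JSON serialization of the same data model."""
    return json.dumps(asdict(spectrum), sort_keys=True)


if __name__ == "__main__":
    example = LibrarySpectrum(
        key=1,
        attributes=[Attribute("EX:0000001", "example spectrum name", "demo")],
        peaks=[(100.5, 10.0)],
    )
    print(to_text(example))
    print(to_json(example))
```

Because both writers read from the same model, a tool that understands the model can in principle move between serializations without losing metadata, which is the property the paragraph above describes.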

Community input and contributions are always welcome, either as GitHub issues or as comments and suggestions in the relevant Google documents: