proBed Specification 1.0.0

proBed is one of the data standards developed by members of the Proteomics Informatics working group of the PSI.

For general information about the activities and organization of this working group, see HERE.

The original BED format (Browser Extensible Data, https://genome.ucsc.edu/FAQ/FAQformat.html#format1), developed by the UCSC (University of California, Santa Cruz) team, is used to describe genome coordinate data, one feature per line, for display on annotation tracks. In BED, data lines are tab-separated plain text with up to 12 fields (columns). Only the first three fields are required; the other 9 are optional.
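As a rough illustration of the line structure described above, the following Python sketch splits one tab-separated BED data line into its three required fields and any optional ones. The function name parse_bed_line and the sample coordinates are made up for this example; the field names chrom, chromStart and chromEnd follow the UCSC BED description. It is a minimal sketch, not a full BED parser (for instance, it does not handle track or browser header lines).

```python
def parse_bed_line(line):
    """Split one tab-separated BED data line into required and optional fields."""
    fields = line.rstrip("\n").split("\t")
    # BED allows between 3 (required only) and 12 (all optional present) fields.
    if not 3 <= len(fields) <= 12:
        raise ValueError(f"expected 3 to 12 fields, got {len(fields)}")
    return {
        "chrom": fields[0],            # chromosome or scaffold name
        "chromStart": int(fields[1]),  # start coordinate
        "chromEnd": int(fields[2]),    # end coordinate
        "optional": fields[3:],        # up to 9 optional fields, left as strings
    }


# Hypothetical data line containing only the three required fields.
record = parse_bed_line("chr1\t1000\t5000\n")
print(record["chrom"], record["chromStart"], record["chromEnd"])
```

Lines with fewer than three or more than 12 fields are rejected here; a real reader might choose to skip or log such lines instead.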

The proBed format builds upon this original structure by extending the 12 original BED fields to include a further 13 fields to describe information primarily on peptide-spectrum matches (PSMs). The format can also accommodate peptides (as groups of PSMs).
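The 12 + 13 column layout can be sketched in Python as follows. This example assumes that every proBed data line carries all 25 tab-separated columns; the function name split_probed_line and the placeholder column values are hypothetical, and the 13 additional fields are deliberately left unnamed here because their definitions belong to the specification document.

```python
BED_FIELD_COUNT = 12           # original BED columns
PROBED_EXTRA_FIELD_COUNT = 13  # additional proBed columns


def split_probed_line(line):
    """Return (bed_fields, probed_fields) for one tab-separated data line."""
    fields = line.rstrip("\n").split("\t")
    expected = BED_FIELD_COUNT + PROBED_EXTRA_FIELD_COUNT
    # Assumption for this sketch: all 25 columns are present on every line.
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    return fields[:BED_FIELD_COUNT], fields[BED_FIELD_COUNT:]


# Placeholder line with 25 dummy column values (not real proBed data).
dummy_line = "\t".join(f"col{i}" for i in range(1, 26))
bed_part, probed_part = split_probed_line(dummy_line)
print(len(bed_part), len(probed_part))
```

Keeping the two groups separate makes it easy to hand the first 12 columns to existing BED tooling while processing the PSM-related columns independently.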

A manuscript describing this proBed format (together with the proBAM format) is available at Genome Biology.

Contents

  1. proBed 1.0.0 (Final Version): Specification document and example files
  2. proBed Tools and Implementations

proBed 1.0.0 (Final Version): Specification document and example files

The proBed file format is designed for storing and analyzing peptide-spectrum matches (PSMs) within the context of the genome.

Direct links:


proBed Tools and Implementations

  • ms-data-core-api: A Java API to write and merge proBed files.
  • PGConverter: A Java command line tool to convert and validate mzIdentML, mzTab, PRIDE XML and proBed files. Please see the README file for usage details.
  • bedToBigBed: Compiled Linux ELF executable of the UCSC utility tool to convert from BED to proBed.
  • Ensembl genome browser: Can visualize proBed data; an example screenshot is shown below.

proBed example viewed in the Ensembl Genome Browser