PEFF - PSI Extended FASTA Format

PEFF (PSI Extended FASTA Format) is a proposed unified format for protein and nucleotide sequence databases, intended for use by sequence search engines and associated tools (spectral library search tools, sequence alignment software, data repositories, etc.). The format enables consistent extraction, display and processing of information such as the sequence database entry identifier, description and taxonomy across software platforms. It also allows the representation of structural annotations such as post-translational modifications, mutations and other processing events. A PEFF file is a plain text file that extends the formalism of individual sequence entries in the FASTA format and adds a header of metadata describing the database(s) from which the sequences were obtained (e.g., name, version). Sequence database providers are encouraged to generate this format as part of their release policy, or to provide appropriate converters that can be incorporated into processing tools.
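
To make the layout described above concrete, here is a minimal Python sketch of a reader. It assumes that metadata header lines start with `#`, that each entry starts with a FASTA-style `>` line, and that annotations on that line are written as `\Key=Value` pairs (the `\VariantSimple` key is mentioned on this page; the other keys, the `ex:` prefix and the sample values are illustrative only). The function names `parse_defline` and `read_peff` are hypothetical, and the specification documents remain the authority on the real syntax.

```python
import re

# Assumption: annotation keys look like "\Key=" on the '>' line.
KEY_VALUE = re.compile(r"\\(\w+)=")

# Illustrative sample only; keys and values are not taken from a real database.
SAMPLE = r"""# DbName=ExampleDB
# DbVersion=2017-10
>ex:P00001 \PName=Example protein \NcbiTaxId=9606 \VariantSimple=(3|A)
MKTAYIAK
QRQISFVK
"""

def parse_defline(line):
    """Split one '>' line into an identifier and a list of (key, value) pairs."""
    if not line.startswith(">"):
        raise ValueError("not a sequence description line")
    # re.split with a capturing group yields [identifier, key1, value1, key2, value2, ...]
    parts = KEY_VALUE.split(line[1:].strip())
    identifier = parts[0].strip()
    pairs = [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]
    return identifier, pairs

def read_peff(text):
    """Return (header_lines, entries) for PEFF-like text under the assumptions above."""
    header, entries = [], []
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith("#"):
            # Metadata header line: keep its content without the leading '#'.
            header.append(line.lstrip("#").strip())
        elif line.startswith(">"):
            identifier, pairs = parse_defline(line)
            entries.append({"id": identifier, "pairs": pairs, "sequence": ""})
        elif entries:
            # Sequence lines are concatenated onto the most recent entry.
            entries[-1]["sequence"] += line.strip()
    return header, entries
```

Calling `read_peff(SAMPLE)` returns the two header lines and a single entry. The pairs are kept as a list rather than a dictionary so that repeated keys and their order are preserved.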

Status
(updated 2017-10-09)

The specification is nearly complete and almost ready to enter the PSI Document Process for formal review. See below for the list of remaining TO DO items.

Available Materials

- Main GitHub page with most relevant materials: https://github.com/HUPO-PSI/PEFF

- Current and earlier specification documents

- Online PEFF Validator - Upload a prospective PEFF file and see validation status

Current Implementations

- neXtProt exports PEFF

- Comet reads PEFF as an input database format

- Compomics has a PEFF viewer

- ProteinPilot reads PEFF as an input database format


TO DO Items:

 - Discuss what is to be done about proteoform support. Dare we try to add a more compact form?

 - Transfer the CV to PSI-MS

 - Review and agree on the specification

 - Update the Perl library and the validator

 - Review and edit the manuscript

 - Get a final updated export from neXtProt

 - Get Harald to update and export the Java library and viewer

 - Submit the specification to the document process

 - After it clears the Steering Group phase, submit the manuscript

 - Promote additional implementations

 - Implement a PEFF programmatic editor

 - Update the current validator with additional checks, additional output, and new code

 - Assess what happens with duplicate keys (e.g. two \VariantSimple in the same record): should this be a validation error?
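
For the duplicate-key item, a check along the following lines could help when assessing existing files. It is only a sketch: it assumes keys appear as `\Key=` on the `>` line, the function name `duplicate_keys` is hypothetical, and it reports duplicates without deciding whether they should count as errors, since that question is still open.

```python
import re
from collections import Counter

# Assumption: annotation keys look like "\Key=" on the '>' line.
KEY = re.compile(r"\\(\w+)=")

def duplicate_keys(defline):
    """Return the sorted list of keys that occur more than once in one '>' line."""
    counts = Counter(KEY.findall(defline))
    return sorted(key for key, n in counts.items() if n > 1)
```

For example, `duplicate_keys(r">ex:P00001 \VariantSimple=(3|A) \PName=Example \VariantSimple=(7|C)")` returns `["VariantSimple"]`; the identifier and values in that line are illustrative.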

 

