PEFF - PSI Extended FASTA Format




The PSI Extended FASTA Format (PEFF) is a proposed unified format for protein and nucleotide sequence databases, intended for use by sequence search engines and associated tools (spectral library search tools, sequence alignment software, data repositories, etc.). The format enables consistent extraction, display, and processing of information such as the sequence database entry identifier, description, and taxonomy across software platforms. It also allows the representation of structural annotations such as post-translational modifications, mutations, and other processing events. A PEFF file is a plain text file that extends the formalism of individual FASTA sequence entries and adds a metadata header describing the database(s) from which the sequences were obtained (e.g., name and version). Sequence database providers are encouraged to generate PEFF files as part of their release policy or to provide converters that can be incorporated into processing tools.
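To illustrate how a consumer might read the key/value annotations on an entry's description line, here is a minimal Python sketch. It assumes (this page does not state it) that the line starts with `>`, followed by a `db:accession` identifier and then space-separated `\Key=Value` pairs in the style of the `\VariantSimple` key mentioned in the to-do list. The example entry, its key names, and the function name `parse_peff_description` are hypothetical. Because values such as protein names can contain spaces, the sketch ends a value only where the next `\Key=` token begins, rather than at every space.

```python
import re

# Hypothetical example entry. The "db:accession" identifier style and key names
# are assumptions modeled on PEFF-style entries, not taken from this page.
EXAMPLE = r">nxp:NX_P04637-1 \DbUniqueId=NX_P04637-1 \PName=Cellular tumor antigen p53 \NcbiTaxId=9606"

# A key starts with a backslash and ends at "="; its value runs until the next
# whitespace-preceded "\Key=" token or the end of the line, so values may contain spaces.
KEY_VALUE = re.compile(r"\\(\w+)=(.*?)(?=\s+\\\w+=|$)")


def parse_peff_description(line: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a PEFF-style description line into its identifier and (key, value) pairs.

    Pairs are returned in file order and duplicates are kept, so the caller
    decides how repeated keys should be treated.
    """
    if not line.startswith(">"):
        raise ValueError("description lines are assumed to start with '>'")
    # The identifier is assumed to be everything up to the first space.
    identifier, _, rest = line[1:].partition(" ")
    pairs = [(m.group(1), m.group(2).strip()) for m in KEY_VALUE.finditer(rest)]
    return identifier, pairs
```

For the example line, the sketch returns the identifier `nxp:NX_P04637-1` and the pairs `("DbUniqueId", "NX_P04637-1")`, `("PName", "Cellular tumor antigen p53")`, and `("NcbiTaxId", "9606")`. Keeping the pairs as an ordered list, rather than a dict, leaves the duplicate-key decision to a later step.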

(updated 2019-01-10)

The specification has been reviewed by the steering group. Comments from the steering group and minor issues identified at the Heidelberg workshop are currently being addressed. The next draft of the PEFF specification will re-enter the PSI Document Process for formal review shortly.

Available Materials

- Main GitHub page with most relevant materials
- Current and earlier specification documents
- Online PEFF Validator: upload a candidate PEFF file and check its validation status
- Downloadable Perl PEFF Validator: validate PEFF files locally with this Perl library


Current Implementations

PEFF Producers
Info Date | Product | Detail Link | Comment
2019-01-10 | neXtProt | Download | Exports all curated PTMs and nsSNPs to PEFF, compliant with PEFF1.0_DRAFT28
2019-01-11 | UniProt | Proteins API variation services | Exports nsSNPs for requested UniProtKB entries
2019-03-25 | Pyteomics 4.0 | peff class | ?
2019-01-10 | Proteomics::PEFF | Example, Tutorial | Converts FASTA files into PEFF or modifies existing PEFF files, compliant with PEFF1.0_DRAFT31
2019-01-10 | Proteoformer | | Determines RIBO-seq-derived proteoforms and can write its output in PEFF, compliant with PEFF1.0_DRAFT28
2018-12-12 | PatternLab | | Unknown. Yasset?
2018-12-12 | canprovar | | Unknown. Yasset?
2018-12-12 | Mascot | | Unknown. Yasset?


PEFF Consumers
Info Date | Product | Detail Link | Comment
2019-01-10 | Comet | PEFF Parameters | Searches an MS run against a PEFF file as the reference database; can search for known nsSNPs and PTMs; compliant with PEFF1.0_DRAFT28
2019-03-25 | Pyteomics 4.0 | peff class | ?
2019-01-10 | ProteinPilot | | Rumored to support an early draft of PEFF
2019-01-10 | Online Validator | Upload and Validate | Accepts a PEFF upload and validates that it is compliant with PEFF1.0_DRAFT31
2019-01-10 | Proteomics::PEFF | Example, Tutorial | Validates, reads, and writes PEFF, compliant with PEFF1.0_DRAFT31
2019-01-10 | phpMs | | Supports the use and viewing of PEFF files, compliant with PEFF1.0_DRAFT??
2019-01-10 | ProteoMapper | On-line version | Supports searching for variation in PEFF files given a list of input peptides, compliant with PEFF1.0_DRAFT31
2019-01-10 | TPP | | Full support for viewing a PEFF reference proteome is planned for the second quarter of 2019.

In the above tables, the Info Date column gives the date on which the information in that row was judged to be up to date (or actively updated). The Product column gives the name of the resource supporting PEFF, with a hyperlink to it. The Detail Link column provides one or more hyperlinks to details of PEFF support in the product, where available.


TO DO Items:

 - Resubmit the specification to the document process

 - Review, edit, and submit the manuscript

 - Promote additional implementations

 - Assess what does or should happen with duplicate keys (e.g., two \VariantSimple keys in the same record): is that a validation error, or should the values simply be concatenated?
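The open duplicate-key question can be made concrete with a small Python sketch of the two candidate policies. It assumes an entry's annotations have already been split into ordered (key, value) pairs, and that "concatenate" means joining the raw value strings in file order with no separator. The function `merge_keys`, the `DuplicateKeyError` class, and the example `\VariantSimple` values are hypothetical and not part of the specification.

```python
class DuplicateKeyError(ValueError):
    """Raised when a key repeats within one entry under the 'error' policy."""


def merge_keys(pairs: list[tuple[str, str]], policy: str = "error") -> dict[str, str]:
    """Collapse the ordered (key, value) pairs of one entry into a dict.

    policy="error":       a repeated key is treated as a validation error.
    policy="concatenate": values of a repeated key are joined in file order
                          (an assumed interpretation of "just concatenate").
    """
    if policy not in ("error", "concatenate"):
        raise ValueError(f"unknown policy: {policy!r}")
    merged: dict[str, str] = {}
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
        elif policy == "error":
            raise DuplicateKeyError(f"duplicate key \\{key}")
        else:
            # Plain string concatenation with no separator (assumption).
            merged[key] += value
    return merged
```

Under the concatenate policy, two hypothetical `\VariantSimple` values `(12|C)` and `(45|R)` become `(12|C)(45|R)`; under the error policy, the same pairs raise `DuplicateKeyError`. Which behavior is correct is exactly what the to-do item leaves open.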