PEFF - PSI Extended FASTA Format

The PSI Extended FASTA Format (PEFF) is a proposed unified format for protein and nucleotide sequence databases, intended for use by sequence search engines and associated tools (spectral library search tools, sequence alignment software, data repositories, etc.). It enables consistent extraction, display, and processing of information such as the sequence database entry identifier, description, and taxonomy across software platforms. It also allows the representation of structural annotations such as post-translational modifications, mutations, and other processing events. A PEFF file is a plain text file that extends the individual sequence entries of the FASTA format and adds a header of metadata describing the database(s) from which the sequences were obtained (e.g., name and version). Sequence database providers are encouraged to generate this format as part of their release policy, or to provide converters that can be incorporated into processing tools.
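To make the layout described above concrete, here is a minimal Python sketch of reading such a file. It is an illustration under stated assumptions, not the reference parser. It assumes that file header lines start with `#` and carry `Key=Value` pairs, and that each sequence description line starts with `>`, followed by an identifier and a series of `\Key=Value` annotations. The sample database name, identifier, and values are hypothetical; consult the specification documents for the authoritative grammar.

```python
import re

# Hypothetical sample file; names and values are illustrative only.
SAMPLE = """\
# PEFF 1.0
# DbName=ExampleDb
# DbVersion=2018-01
# //
>ex:P00001 \\PName=Example protein \\Length=10
MKTAYIAKQR
"""

# Assumption: an annotation is a backslash, a key, '=', and a value that
# runs up to the next backslash. Values containing a backslash are not
# handled by this sketch.
KEY_VALUE = re.compile(r"\\(\w+)=([^\\]*)")


def parse_peff(text):
    """Return (header, entries) for a PEFF-like text.

    header  -- dict of Key=Value pairs found on '#' lines
    entries -- list of dicts with 'id', 'annotations', and 'sequence'
    """
    header = {}
    entries = []
    current = None
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: keep only the ones that look like Key=Value.
            body = line[1:].strip()
            if "=" in body:
                key, value = body.split("=", 1)
                header[key.strip()] = value.strip()
            continue
        if line.startswith(">"):
            # Description line: identifier first, then \Key=Value pairs.
            identifier, _, rest = line[1:].partition(" ")
            pairs = [(k, v.strip()) for k, v in KEY_VALUE.findall(rest)]
            current = {"id": identifier, "annotations": pairs, "sequence": ""}
            entries.append(current)
        elif current is not None:
            # Sequence line: append to the entry being read.
            current["sequence"] += line
    return header, entries


header, entries = parse_peff(SAMPLE)
```

Annotations are kept as an ordered list of pairs rather than a dictionary so that a repeated key is not silently overwritten.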

(updated 2018-12-11)

The specification has been reviewed by the steering group. Comments from the steering group and minor issues identified at the Heidelberg workshop are currently being addressed. The next draft of the PEFF specification will re-enter the PSI Document Process for formal review shortly.

Available Materials

- Main GitHub page with most relevant materials:

- Current and earlier specification documents

- Online PEFF Validator - Upload a prospective PEFF file and see validation status

- Downloadable Perl PEFF Validator - Validate PEFF files locally with this Perl library


Current Implementations

- neXtProt is exporting PEFF

- EBI's Proteins REST API /variation endpoint can output PEFF format

- Proteoformer determines RIBO-seq derived proteoforms and can write its output in PEFF

- Comet can read PEFF as input database format

- ProteinPilot can read PEFF as input database format

- Proteomics::PEFF Perl library. Follow the Windows tutorial or the Linux tutorial

- Compomics has a PEFF viewer


TO DO Items:

 - Resubmit the specification to the document process

 - Review, edit, and submit the manuscript

 - Promote additional implementations

 - Decide how duplicate keys should be handled (e.g., two \VariantSimple keys in the same record): should this be a validation error, or should the values simply be concatenated?
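
The two options in the last item can be compared with a short Python sketch. It does not settle the question and is not taken from any existing validator. It assumes that annotations on a description line have the form `\Key=Value`; the function names and the sample line, including its `\VariantSimple` values, are hypothetical.

```python
import re
from collections import Counter

# Assumption: an annotation is a backslash, a key, '=', and a value that
# runs up to the next backslash.
KEY_VALUE = re.compile(r"\\(\w+)=([^\\]*)")


def duplicate_keys(description_line):
    """Option 1 (validation): list the keys that occur more than once."""
    counts = Counter(key for key, _ in KEY_VALUE.findall(description_line))
    return [key for key, n in counts.items() if n > 1]


def merge_values(description_line):
    """Option 2 (concatenation): collect every value under its key, in order."""
    merged = {}
    for key, value in KEY_VALUE.findall(description_line):
        merged.setdefault(key, []).append(value.strip())
    return merged


# Hypothetical description line with a repeated key.
line = r">ex:P00001 \PName=Example \VariantSimple=(3|A) \VariantSimple=(7|K)"

repeated = duplicate_keys(line)   # a validator could report these as errors
combined = merge_values(line)     # a lenient reader could use these instead
```

A validator following the first option would reject the record whenever `repeated` is non-empty, while a reader following the second would proceed with the lists in `combined`.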