PEFF - PSI Extended Fasta Format

The PSI Extended FASTA Format (PEFF) is a unified format for protein and nucleotide sequence databases, intended for use by sequence search engines and other associated tools (spectral library search tools, sequence alignment software, data repositories, etc.). It enables consistent extraction, display, and processing of information such as the sequence database entry identifier, description, and taxonomy across software platforms. It also allows the representation of structural annotations such as post-translational modifications, mutations, and other processing events. A PEFF file is a plain text file that extends the formalism of the individual sequence entries of the FASTA format and adds a metadata header describing the database(s) from which the sequences were obtained (e.g., name, version). Sequence database providers are encouraged to generate this format as part of their release policy, or to provide converters that can be incorporated into processing tools.
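To illustrate how an extended entry carries its annotations, the following Python sketch splits a PEFF-style description line into its database prefix, accession, and `\Key=Value` pairs. The example line, the `xx` prefix, and the `parse_peff_description` helper are hypothetical; the `>prefix:accession \Key=Value` layout and the `(position|newResidue)` form of `\VariantSimple` reflect our reading of PEFF 1.0 and should be checked against the specification. Values are kept as raw strings, and repeated keys are collected in a list rather than silently overwritten.

```python
import re

# Matches a "\Key=" token; the value runs until the next such token.
KEY_RE = re.compile(r"\\(\w+)=")


def parse_peff_description(line):
    """Split a PEFF-style description line into (prefix, accession, fields).

    fields maps each key to a list of raw value strings, so duplicate keys
    are preserved instead of overwritten. Assumes ">prefix:accession" comes
    first, separated from the keys by a space.
    """
    if not line.startswith(">"):
        raise ValueError("description lines must start with '>'")
    head, _, rest = line[1:].partition(" ")
    prefix, _, accession = head.partition(":")
    matches = list(KEY_RE.finditer(rest))
    fields = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(rest)
        fields.setdefault(m.group(1), []).append(rest[m.end():end].strip())
    return prefix, accession, fields


# Hypothetical entry header, for illustration only.
example = r">xx:EXAMPLE1 \PName=Example protein \VariantSimple=(2|T)(10|D)"
prefix, accession, fields = parse_peff_description(example)
print(prefix, accession, fields)
```

Keeping values as unparsed strings leaves key-specific parsing (such as splitting `\VariantSimple` into its parenthesized groups) to later, per-key code.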

Status
(updated 2019-06-19)

The specification has completed its journey through the document process and has been ratified and released.

Available Materials

- Main GitHub page with most relevant materials: https://github.com/HUPO-PSI/PEFF

- Current and earlier specification documents

- Online PEFF Validator - Upload a prospective PEFF file and see validation status

- Downloadable Perl PEFF Validator - Validate PEFF files locally with this Perl library

- bioRxiv preprint of the journal article providing an overview of PEFF (not a substitute for the full specification)

- Reformatted preprint of the journal article providing a general overview of PEFF

Current Implementations

PEFF Producers
| Info Date | Product | Detail Link | Comment |
|---|---|---|---|
| 2019-01-10 | neXtProt | Download | Exports all curated PTMs and nsSNPs into PEFF compliant with PEFF1.0_DRAFT28 |
| 2019-01-11 | UniProt | Proteins API variation services | Exports nsSNPs for requested UniProtKB entries |
| 2019-03-25 | Pyteomics 4.0 | peff class | |
| 2019-01-10 | Proteomics::PEFF | Example, Tutorial | Converts FASTA file into PEFF, or alters existing PEFF files, compliant with PEFF1.0_DRAFT31 |
| 2019-01-10 | Proteoformer | | Determines RIBO-seq derived proteoforms and can write its output in PEFF, compliant with PEFF1.0_DRAFT28 |
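Several producers above, such as Proteomics::PEFF, convert plain FASTA into PEFF. A minimal Python sketch of that kind of conversion follows; it is not how any listed tool works. The `fasta_to_peff` helper is hypothetical, and the header keys (`DbName`, `Prefix`, `NumberOfEntries`, `SequenceType`), the `# //` terminator, and the `\PName` and `\Length` entry keys are our assumptions about PEFF 1.0 that must be checked against the specification before use. For simplicity it writes each sequence on a single line rather than wrapping it.

```python
def fasta_to_peff(fasta_text, db_name, prefix):
    """Convert FASTA text into a minimal PEFF-style document (assumed layout)."""
    records = []  # (accession, description, sequence)
    accession, description, chunks = None, "", []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if accession is not None:
                records.append((accession, description, "".join(chunks)))
            # First token becomes the accession; the rest is kept as a name.
            accession, _, description = line[1:].partition(" ")
            chunks = []
        else:
            chunks.append(line)
    if accession is not None:
        records.append((accession, description, "".join(chunks)))

    # Assumed header: version line, one block of "# Key=Value" lines, then "# //".
    out = [
        "# PEFF 1.0",
        f"# DbName={db_name}",
        f"# Prefix={prefix}",
        f"# NumberOfEntries={len(records)}",
        "# SequenceType=AA",
        "# //",
    ]
    for accession, description, sequence in records:
        header = f">{prefix}:{accession}"
        if description:
            header += f" \\PName={description.strip()}"
        header += f" \\Length={len(sequence)}"
        out.extend([header, sequence])
    return "\n".join(out) + "\n"


print(fasta_to_peff(">P1 First protein\nMKT\nLLV\n>P2\nGGA\n", "Demo", "demo"))
```

Mapping the whole FASTA description to a single name key is the crudest possible choice; a real converter would parse provider-specific description formats into the appropriate PEFF keys.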

PEFF Consumers
| Info Date | Product | Detail Link | Comment |
|---|---|---|---|
| 2019-01-10 | Comet | PEFF Parameters | Searches an MS run using a PEFF file as a reference, can search for known nsSNPs and PTMs, compliant with PEFF1.0_DRAFT28 |
| 2019-06-19 | Protein Prospector | PEFF usage poster | Searching PEFF databases supported from v5.18.0 (9/2016). PEFF1.0_DRAFT28 supported from v5.24.0 (6/2019). Variable modifications can be restricted to sites specified by \ModRes* |
| 2019-03-25 | Pyteomics 4.0 | peff class | |
| 2019-01-10 | ProteinPilot | | Searching PEFF databases supported in ProteinPilot V5.0 (released 2014) onward |
| 2019-01-10 | Online Validator | Upload and Validate | Accepts PEFF upload and validates that PEFF is compliant with PEFF1.0_DRAFT31 |
| 2019-01-10 | Proteomics::PEFF | Example, Tutorial | Validates, reads, and writes PEFF, compliant with PEFF1.0_DRAFT31 |
| 2019-01-10 | phpMs | | Supports the use and viewing of PEFF files, compliant with PEFF1.0_DRAFT.. |
| 2019-01-10 | ProteoMapper | On-line version | Supports searching variation in PEFF files given a list of input peptides, compliant with PEFF1.0_DRAFT31 |
| 2019-01-10 | TPP | | Full support for viewing of PEFF reference proteome is planned for second quarter 2019. |

In the above tables, the Info Date column gives the date on which the information in that row was last judged to be up to date (or was actively updated). The Product column names the resource supporting PEFF and links to it. The Detail Link column provides one or more links to details of PEFF support in the product, where available.
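Consumers such as Comet and ProteoMapper make use of the known variants recorded in PEFF files. One simple way to use a `\VariantSimple` value, sketched below in Python, is to generate one variant sequence per annotated substitution. The `expand_simple_variants` helper is hypothetical, positions are assumed to be 1-based, and an optional third `|tag` field is tolerated but ignored; real search engines may instead handle variants during digestion and scoring without materializing whole sequences.

```python
import re

# One "(position|newResidue)" group, optionally followed by "|tag".
VARIANT_RE = re.compile(r"\((\d+)\|([A-Z*])(?:\|[^)]*)?\)")


def expand_simple_variants(sequence, variant_simple):
    """Return one sequence per substitution, each applied independently.

    Positions are treated as 1-based (an assumption to check against the spec).
    """
    variants = []
    for m in VARIANT_RE.finditer(variant_simple):
        pos, new_residue = int(m.group(1)), m.group(2)
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside a sequence of length {len(sequence)}")
        variants.append(sequence[:pos - 1] + new_residue + sequence[pos:])
    return variants


print(expand_simple_variants("MKTLLV", "(2|R)(6|A|tag)"))
```

Applying each substitution on its own avoids a combinatorial blow-up; combining variants would require an explicit policy on which ones can co-occur.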

TO DO Items:

 - Resubmit the specification to the document process

 - Review, edit, and submit the manuscript

 - Promote additional implementations

 - Assess what does or should happen with duplicate keys (e.g., two \VariantSimple keys in the same entry): is that a validation error, or should the values be concatenated?
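The duplicate-key question above can be made concrete with a small Python sketch that folds one entry's `(key, value)` pairs under either policy. `DuplicatePolicy` and `merge_keys` are hypothetical names, not part of the specification or any listed tool. Concatenation only yields a meaningful value for keys whose values are sequences of parenthesized groups, such as `\VariantSimple`; for single-valued keys such as `\PName`, appending would produce a meaningless string.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    ERROR = "error"              # treat a repeated key as a validation error
    CONCATENATE = "concatenate"  # append the new value to the existing one


def merge_keys(pairs, policy):
    """Fold (key, value) pairs from a single entry into a dict under `policy`."""
    merged = {}
    for key, value in pairs:
        if key not in merged:
            merged[key] = value
        elif policy is DuplicatePolicy.ERROR:
            raise ValueError(f"duplicate key \\{key} in one entry")
        else:
            # e.g. "(2|T)" + "(10|D)" -> "(2|T)(10|D)"
            merged[key] += value
    return merged


pairs = [("VariantSimple", "(2|T)"), ("VariantSimple", "(10|D)")]
print(merge_keys(pairs, DuplicatePolicy.CONCATENATE))
```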