OpenMS
PercolatorInfile Class Reference

Class for storing Percolator tab-delimited input files.

#include <OpenMS/FORMAT/PercolatorInfile.h>

Static Public Member Functions

static void store (const std::string &pin_file, const PeptideIdentificationList &peptide_ids, const StringList &feature_set, const std::string &enz, int min_charge, int max_charge)
 
static PeptideIdentificationList load (const std::string &pin_file, bool higher_score_better, const std::string &score_name, const StringList &extra_scores, StringList &filenames, std::string decoy_prefix="", double threshold=0.01, bool SageAnnotation=false)
 Loads peptide identifications from a Percolator input file.
 
static std::string getScanIdentifier (const PeptideIdentification &pid, size_t index)
 
static StringList getStandardFeatureSet (int min_charge, int max_charge)
 Returns the standard Percolator feature columns every .pin file should declare.
 
static std::set< std::pair< size_t, size_t > > stampPinFeaturesOnHits (PeptideIdentificationList &peptide_ids, const std::string &enz, int min_charge, int max_charge)
 Compute and stamp PIN-equivalent meta values on every PeptideHit.
 

Static Protected Member Functions

static TextFile preparePin_ (const PeptideIdentificationList &peptide_ids, const StringList &feature_set, const std::string &enz, int min_charge, int max_charge)
 
static bool isEnz_ (const char &n, const char &c, const std::string &enz)
 
static Size countEnzymatic_ (const std::string &peptide, const std::string &enz)
 

Detailed Description

Class for storing Percolator tab-delimited input files.

Member Function Documentation

◆ countEnzymatic_()

static Size countEnzymatic_ ( const std::string &  peptide,
const std::string &  enz 
)
static protected

◆ getScanIdentifier()

static std::string getScanIdentifier ( const PeptideIdentification &  pid,
size_t  index 
)
static

◆ getStandardFeatureSet()

static StringList getStandardFeatureSet ( int  min_charge,
int  max_charge 
)
static

Returns the standard Percolator feature columns every .pin file should declare.

The list contains the three mandatory header columns (SpecId, Label, ScanNr) followed by the standard per-PSM features that preparePin_ computes and sets on every hit: ExpMass, CalcMass, mass, peplen, charge{min..max}, enzN, enzC, enzInt, dm, absdm. Callers should append their search-engine-specific extra_features (and finally Peptide, Proteins) to this list before calling store. This is the single source of truth used by PercolatorAdapter and any other tool that emits .pin for external percolator consumption.
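The column order described above can be sketched with standard-library types only. This is an illustration, not the OpenMS implementation: `ColumnList`, `sketchStandardFeatureSet`, and `sketchFullFeatureSet` are hypothetical names, and the `charge<N>` column naming is an assumption based on the `charge{min..max}` notation used here.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical stand-in for OpenMS::StringList, used only for illustration.
using ColumnList = std::vector<std::string>;

// Builds the standard column order as described in the text:
// header columns, per-PSM features, one charge column per charge state,
// then the enzymatic and mass-deviation features.
ColumnList sketchStandardFeatureSet(int min_charge, int max_charge)
{
  assert(min_charge <= max_charge); // assumed precondition
  ColumnList cols = {"SpecId", "Label", "ScanNr", "ExpMass", "CalcMass", "mass", "peplen"};
  for (int z = min_charge; z <= max_charge; ++z)
  {
    cols.push_back("charge" + std::to_string(z)); // assumed naming: charge2, charge3, ...
  }
  for (const char* name : {"enzN", "enzC", "enzInt", "dm", "absdm"})
  {
    cols.push_back(name);
  }
  return cols;
}

// Appends search-engine-specific features, then the trailing Peptide and
// Proteins columns, mirroring what a caller is asked to do before store().
ColumnList sketchFullFeatureSet(int min_charge, int max_charge, const ColumnList& extra_features)
{
  ColumnList cols = sketchStandardFeatureSet(min_charge, max_charge);
  cols.insert(cols.end(), extra_features.begin(), extra_features.end());
  cols.push_back("Peptide");
  cols.push_back("Proteins");
  return cols;
}
```

With the real API, the equivalent steps would presumably be a call to getStandardFeatureSet followed by appending the extra feature names and the two trailing columns to the returned StringList.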

◆ isEnz_()

static bool isEnz_ ( const char &  n,
const char &  c,
const std::string &  enz 
)
static protected

◆ load()

static PeptideIdentificationList load ( const std::string &  pin_file,
bool  higher_score_better,
const std::string &  score_name,
const StringList &  extra_scores,
StringList &  filenames,
std::string  decoy_prefix = "",
double  threshold = 0.01,
bool  SageAnnotation = false 
)
static

Loads peptide identifications from a Percolator input file.

This function reads a Percolator input file (pin_file) and returns the peptide identifications it contains as a PeptideIdentificationList. It extracts relevant information such as peptide sequences, scores, charges, annotations, and protein accessions, applies the specified threshold, and handles decoy annotation as needed. Note: If a filename column is encountered, filenames is filled in order of appearance and each PeptideIdentification is annotated with the id_merge_index meta value to link it to its filename (similar to a merged idXML file).

Parameters
[in]  pin_file  The path to the Percolator input file with a .pin extension.
[in]  higher_score_better  A boolean flag indicating whether higher scores are considered better (true) or lower scores are better (false).
[in]  score_name  The name of the primary score to be used for ranking peptide hits.
[in]  extra_scores  A list of additional score names that should be extracted and stored in each PeptideHit.
[out]  filenames  Will be populated with the unique raw file names extracted from the input data.
[in]  decoy_prefix  The prefix used to identify decoy protein accessions. Proteins with accessions starting with this prefix are marked as decoys. If the prefix is empty, the pin file is assumed to already contain the correctly annotated decoy status.
[in]  threshold  A double value representing the threshold for the spectrum_q value. Only spectra with spectrum_q below this threshold are processed. Implemented to allow prefiltering of Sage results.
[in]  SageAnnotation  A boolean value indicating whether the pin file was produced by Sage.
Returns
A PeptideIdentificationList containing the peptide identifications.
Exceptions
`Exception::ParseError`  if any line in the input file does not have the expected number of columns.
TODO: implement something similar to PepXMLFile().setPreferredFixedModifications(getModifications_(fixed_modifications_names));
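Two behaviors described for load can be sketched independently of OpenMS: filling filenames in order of first appearance (the returned position plays the role of the id_merge_index meta value) and marking decoys by accession prefix. The helper names below are hypothetical and the code is a minimal illustration under those assumptions, not the library's parsing logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the position of `name` in `filenames`, appending it on first
// appearance so that the list keeps the order in which files were seen.
std::size_t sketchMergeIndex(std::vector<std::string>& filenames, const std::string& name)
{
  auto it = std::find(filenames.begin(), filenames.end(), name);
  if (it != filenames.end())
  {
    return static_cast<std::size_t>(it - filenames.begin());
  }
  filenames.push_back(name);
  return filenames.size() - 1;
}

// True if `accession` starts with a non-empty `decoy_prefix`.
// An empty prefix yields false here, meaning "no prefix-based decision":
// in that case the decoy status is taken from the file itself.
bool sketchIsDecoyByPrefix(const std::string& accession, const std::string& decoy_prefix)
{
  return !decoy_prefix.empty() &&
         accession.compare(0, decoy_prefix.size(), decoy_prefix) == 0;
}
```

A linear search is used for brevity; for files with many distinct raw file names, a map from name to index would avoid the repeated scans.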

◆ preparePin_()

static TextFile preparePin_ ( const PeptideIdentificationList &  peptide_ids,
const StringList &  feature_set,
const std::string &  enz,
int  min_charge,
int  max_charge 
)
static protected

◆ stampPinFeaturesOnHits()

static std::set< std::pair< size_t, size_t > > stampPinFeaturesOnHits ( PeptideIdentificationList &  peptide_ids,
const std::string &  enz,
int  min_charge,
int  max_charge 
)
static

Compute and stamp PIN-equivalent meta values on every PeptideHit.

Runs the same per-hit computation that preparePin_ applies when writing a .pin file — but mutates the PeptideIdentifications in place instead of writing to a text file. After this call, each kept hit carries the full set of PIN meta values: SpecId, ScanNr, Label, CalcMass, ExpMass, deltamass, retentiontime, mass, score, peplen, charge1..chargeN, enzN, enzC, enzInt, dm, absdm, Peptide, Proteins.

Useful for in-process Percolator training (see OpenMS::Percolator): callers can then train on the exact same feature vectors the subprocess path would have seen via the .pin round-trip.

Hits with empty PeptideEvidences or UNKNOWN target/decoy status are left untouched; their (pid_index, hit_index) pairs are returned so callers know to skip them.

Parameters
peptide_ids  Mutated in place; each kept hit gets new meta values.
enz  Enzyme name (same values accepted as for store).
min_charge  Lower bound for the charge{N} one-hot features.
max_charge  Upper bound for the charge{N} one-hot features.
Returns
Indices of skipped hits as (pid_index, hit_index) pairs.
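A caller might consume the returned set of (pid_index, hit_index) pairs roughly as follows. The sketch replaces the OpenMS containers with a plain vector of per-identification hit counts, and `SkippedHits` and `sketchCountKeptHits` are hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Same shape as the documented return type.
using SkippedHits = std::set<std::pair<std::size_t, std::size_t>>;

// Counts the hits that were not reported as skipped.
// hits_per_id[i] is the number of hits in identification i.
std::size_t sketchCountKeptHits(const std::vector<std::size_t>& hits_per_id,
                                const SkippedHits& skipped)
{
  std::size_t kept = 0;
  for (std::size_t pid_index = 0; pid_index < hits_per_id.size(); ++pid_index)
  {
    for (std::size_t hit_index = 0; hit_index < hits_per_id[pid_index]; ++hit_index)
    {
      if (skipped.count(std::make_pair(pid_index, hit_index)) != 0)
      {
        continue; // untouched hit: it carries no PIN meta values, so skip it
      }
      ++kept; // a real caller would read the stamped meta values here
    }
  }
  return kept;
}
```

The same membership test would apply when iterating over the real identifications: any hit whose index pair is in the set should be excluded from training.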

◆ store()

static void store ( const std::string &  pin_file,
const PeptideIdentificationList &  peptide_ids,
const StringList &  feature_set,
const std::string &  enz,
int  min_charge,
int  max_charge 
)
static