OpenMS
PercolatorTypes.h File Reference


Classes

struct  RescoreInput
 Input to domain-agnostic Percolator::rescore.
 
struct  RescoreOutput
 Output from Percolator::rescore. Aligned 1:1 with RescoreInput::features.
 
struct  PercolatorModel
 Trained Percolator model: averaged SVM weights in raw feature space.
 

Namespaces

namespace  OpenMS
 Main OpenMS namespace.
 

Class Documentation

◆ OpenMS::RescoreInput

struct OpenMS::RescoreInput

Input to domain-agnostic Percolator::rescore.

Row ordering is preserved in the output. Each row corresponds to one data point; its semantics (PSM, transition, peak group, etc.) are determined by the caller.

Class Members
vector< double > calc_masses Optional per-row PIN-compatible field (calcMass; default 0.0). See scan_numbers.
vector< int > cv_group_keys

Per-row integer key used to group rows into the same cross-validation fold. Rows sharing a key will never be split across folds. Leave empty to use row index (each row in its own group). Supply this when rows have natural duplication (e.g., multiple PSMs from one spectrum, multiple transitions from one precursor).

vector< double > exp_masses Optional per-row PIN-compatible field (expMass; default 0.0). See scan_numbers.
StringList feature_names Names aligned 1:1 with feature columns; used for logging only.
vector< vector< double > > features

[n_rows][n_features] scalar features per row. Rows must all have the same length.

vector< bool > is_decoy Target (false) or decoy (true) label per row.
vector< int > scan_numbers Optional per-row PIN-compatible fields.

When supplied, these override the synthetic defaults that the wrapper uses internally (scan = row index, specFileNr = 0, expMass = calcMass = 0.0). Populate them when the in-process output must be bitwise-identical to running the external percolator binary on a .pin file derived from the same PSMs. Percolator's internal PSM sort order (OrderScanHash, which hashes specFileNr and scan) determines the cross-validation fold assignment, so scan_numbers and spec_file_numbers must match the values that PercolatorInfile::store would emit.

Percolator::fillPINCompatibleFields() is a helper that derives all four vectors (scan_numbers, spec_file_numbers, exp_masses, calc_masses) from a vector of PeptideIdentifications, using the same conventions as PercolatorInfile::store.

Each vector is either empty (in which case the default is used) or contains exactly n_rows entries, one per feature row.

vector< int > spec_file_numbers Optional per-row PIN-compatible field (specFileNr; default 0). See scan_numbers.
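The size rules and defaults documented for RescoreInput can be summarized as a validation sketch. RescoreInputSketch, validate, checkOptional, groupKey, scanNumber, and specFileNumber are hypothetical names standing in for the real OpenMS types; whether Percolator::rescore performs exactly these checks, and whether the row-index default is 0-based, are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the documented RescoreInput members
// (feature_names omitted); this is NOT the OpenMS header.
struct RescoreInputSketch {
  std::vector<std::vector<double>> features;  // [n_rows][n_features]
  std::vector<bool> is_decoy;                 // one target/decoy label per row
  std::vector<int> cv_group_keys;             // empty => row index
  std::vector<int> scan_numbers;              // empty => row index
  std::vector<int> spec_file_numbers;         // empty => 0
  std::vector<double> exp_masses;             // empty => 0.0
  std::vector<double> calc_masses;            // empty => 0.0
};

// An optional per-row vector must be empty or hold exactly n_rows entries.
template <typename T>
void checkOptional(const std::vector<T>& v, std::size_t n_rows, const std::string& name) {
  if (!v.empty() && v.size() != n_rows) {
    throw std::invalid_argument(name + " must be empty or have n_rows entries");
  }
}

void validate(const RescoreInputSketch& in) {
  const std::size_t n_rows = in.features.size();
  for (const auto& row : in.features) {
    // All rows must have the same length (n_features).
    if (row.size() != in.features.front().size()) {
      throw std::invalid_argument("all feature rows must have the same length");
    }
  }
  if (in.is_decoy.size() != n_rows) {
    throw std::invalid_argument("is_decoy must have n_rows entries");
  }
  checkOptional(in.cv_group_keys, n_rows, "cv_group_keys");
  checkOptional(in.scan_numbers, n_rows, "scan_numbers");
  checkOptional(in.spec_file_numbers, n_rows, "spec_file_numbers");
  checkOptional(in.exp_masses, n_rows, "exp_masses");
  checkOptional(in.calc_masses, n_rows, "calc_masses");
}

// Resolve the documented defaults for row i (assumes validate() passed;
// the row index is assumed to be 0-based).
int groupKey(const RescoreInputSketch& in, std::size_t i) {
  assert(i < in.features.size());
  return in.cv_group_keys.empty() ? static_cast<int>(i) : in.cv_group_keys[i];
}

int scanNumber(const RescoreInputSketch& in, std::size_t i) {
  assert(i < in.features.size());
  return in.scan_numbers.empty() ? static_cast<int>(i) : in.scan_numbers[i];
}

int specFileNumber(const RescoreInputSketch& in, std::size_t i) {
  assert(i < in.features.size());
  return in.spec_file_numbers.empty() ? 0 : in.spec_file_numbers[i];
}
```

With cv_group_keys left empty, every row falls into its own group; giving two rows from the same spectrum the same key (for example {7, 7, 8}) keeps them in one cross-validation fold.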

◆ OpenMS::RescoreOutput

struct OpenMS::RescoreOutput

Output from Percolator::rescore. Aligned 1:1 with RescoreInput::features.

Class Members
vector< double > peps Posterior error probability (PEP) per row.
vector< double > q_values q-value per row.
vector< double > scores SVM discriminant score per row.
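Because every output vector is aligned 1:1 with the input rows, the is_decoy labels from RescoreInput can be paired directly with the output. A minimal sketch, assuming a hypothetical RescoreOutputSketch stand-in and a caller-chosen q-value cutoff, that keeps the target rows at or below that cutoff:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the documented RescoreOutput fields (not the OpenMS header).
struct RescoreOutputSketch {
  std::vector<double> scores;    // SVM discriminant score per row
  std::vector<double> q_values;  // q-value per row
  std::vector<double> peps;      // posterior error probability per row
};

// Returns the indices of target rows whose q-value is at or below max_q.
// is_decoy is the vector passed in RescoreInput, aligned row-for-row with the output.
std::vector<std::size_t> acceptedTargets(const RescoreOutputSketch& out,
                                         const std::vector<bool>& is_decoy,
                                         double max_q) {
  assert(out.q_values.size() == is_decoy.size());
  std::vector<std::size_t> accepted;
  for (std::size_t i = 0; i < out.q_values.size(); ++i) {
    if (!is_decoy[i] && out.q_values[i] <= max_q) {
      accepted.push_back(i);
    }
  }
  return accepted;
}
```

The returned indices refer to rows of the original RescoreInput::features, so the caller can map them back to PSMs, transitions, or whatever a row represents.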

◆ OpenMS::PercolatorModel

struct OpenMS::PercolatorModel

Trained Percolator model: averaged SVM weights in raw feature space.

Produced by Percolator::train and consumed by Percolator::score. The weights are un-normalized: they are intended to multiply raw input features directly. The normalization transform learned by the SVM has already been folded into the weights and bias by Normalizer::unnormalizeweight(). Callers must therefore not normalize features before score(); doing so would apply the transform twice.

The raw SVM dot product for a row with feature vector f is:

    raw = sum_j(f[j] * weights[j]) + weights[n_features]   // bias is stored last

Percolator::score() applies a further FDR-based rescaling on top of this raw value to produce the final SVM discriminant reported in RescoreOutput::scores; see Percolator::score for the exact formula.
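For illustration, the raw dot product can be written as a small free function. This is a sketch, not the OpenMS implementation: rawScore is a hypothetical name, and it computes only the raw value, not the FDR-based rescaling that Percolator::score() applies afterwards. Features are passed un-normalized, since the normalization is already folded into the weights and bias.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Raw SVM dot product: sum_j f[j] * weights[j] + weights[n_features].
// f holds raw (un-normalized) features; weights has n_features + 1 entries, bias last.
double rawScore(const std::vector<double>& f, const std::vector<double>& weights) {
  assert(weights.size() == f.size() + 1);
  double raw = weights.back();  // bias, stored as the last entry
  for (std::size_t j = 0; j < f.size(); ++j) {
    raw += f[j] * weights[j];
  }
  return raw;
}
```

For f = {2.0, 4.0} and weights = {0.5, -0.25, 1.0}, the result is 2.0 * 0.5 + 4.0 * -0.25 + 1.0 = 1.0.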

Class Members
StringList feature_names

Feature column names. Must be non-empty and must match RescoreInput::feature_names positionally at score time. Any string value is permitted: the bias is stored in the header by saveModel(), so feature names carry no reserved meaning.

int format_version = 1 Integer schema version for the on-disk format.
string normalizer_type

The normalizer used during training: "stdv", "uni", or "none". Informational only: all three produce raw-space weights that score() can apply directly, because the normalization transform is already folded into the weights and bias. Recorded so that reproducibility tooling can identify the learner configuration that produced the model.

int seed = 0 Random seed used during training. Informational.
vector< double > weights Size = n_features + 1. The last entry is the bias.
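The constraints on feature_names and weights suggest a compatibility check before calling score(). A minimal sketch: PercolatorModelSketch and isCompatible are hypothetical names, not OpenMS API, and the real score() may detect and report mismatches differently.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the documented PercolatorModel members.
struct PercolatorModelSketch {
  std::vector<std::string> feature_names;  // must be non-empty
  std::vector<double> weights;             // n_features + 1 entries, bias last
  std::string normalizer_type;             // "stdv", "uni" or "none"; informational
  int seed = 0;                            // informational
  int format_version = 1;                  // on-disk schema version
};

// True if the model can be applied to input columns named input_names:
// names must be non-empty and match positionally, and weights must hold
// one entry per feature plus the trailing bias.
bool isCompatible(const PercolatorModelSketch& model,
                  const std::vector<std::string>& input_names) {
  if (model.feature_names.empty()) return false;
  if (model.weights.size() != model.feature_names.size() + 1) return false;
  // std::vector::operator== compares sizes and then elements in order.
  return model.feature_names == input_names;
}
```

Because the comparison is positional, the same names in a different column order are rejected, which is what the positional-match requirement on feature_names implies.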