OpenMS
PercolatorAdapter

PercolatorAdapter prepares the input for Percolator, calls it, and integrates its output back into the identification data. Percolator (http://percolator.ms/) is a tool that applies semi-supervised learning to peptide identification from shotgun proteomics datasets.

Experimental classes:
This tool is a work in progress; usage and input requirements might change.
Potential predecessor tools → PercolatorAdapter → potential successor tools
PSMFeatureExtractor → PercolatorAdapter → IDFilter

Percolator is search-engine sensitive, i.e. its input features vary depending on the search engine, so the features must be prepared beforehand (e.g. with PSMFeatureExtractor). If you do not want to use the search-engine-specific features, set the generic_feature_set flag. The generic feature set incorporates the score attribute of each PSM, so make sure the score you want is set as the main score with IDScoreSwitcher. Be aware that you may well see a loss in performance compared to the search-engine-specific features.

You can also perform protein inference with Percolator by activating the protein_level_fdrs parameter. In that case you also need to set the enzyme parameter. We only read the q-value for protein groups, since Percolator has a more elaborate FDR estimation. For proteins, we add the q-value as the main score and the PEP as a meta value. For PSMs, you can choose the main score (score_type). Peptide-level FDRs cannot be parsed and used yet.
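The paragraph above can be sketched as a small helper that assembles a PercolatorAdapter call. This is a minimal sketch, not an official wrapper: the helper name and the file names (features.idXML, percolated.idXML, database.fasta) and the decoy pattern DECOY_ are hypothetical, and only options listed on this page are used.

```python
def build_percolator_adapter_cmd(in_files, out_file, percolator_exe="percolator",
                                 enzyme="trypsin", fasta=None, decoy_pattern=None,
                                 generic_feature_set=False):
    """Assemble an argument list for PercolatorAdapter (hypothetical helper)."""
    cmd = ["PercolatorAdapter", "-in", *in_files, "-out", out_file,
           "-percolator_executable", percolator_exe, "-enzyme", enzyme]
    if generic_feature_set:
        # Use only generic features instead of search-engine-specific ones.
        cmd.append("-generic_feature_set")
    if fasta is not None:
        # Protein inference: -fasta and -decoy_pattern are only valid
        # together with -protein_level_fdrs.
        cmd += ["-protein_level_fdrs", "-fasta", fasta]
        if decoy_pattern is not None:
            cmd += ["-decoy_pattern", decoy_pattern]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_percolator_adapter_cmd(["features.idXML"], "percolated.idXML",
                                   fasta="database.fasta", decoy_pattern="DECOY_")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) on a machine where PercolatorAdapter and Percolator are installed; the sketch itself only builds and prints the command.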

Multithreading: The threads parameter is passed on to Percolator. Note: For backwards-compatibility reasons, at least 3 threads are used by default (Percolator's default), even if the number of threads is set to e.g. 1. You can still force the use of fewer than 3 threads by setting the force flag.
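The thread rule described above can be written down as a tiny function. This is a sketch of the documented behaviour only, not the adapter's actual code; the function and constant names are made up for illustration.

```python
# Minimum thread count stated in the note above (Percolator's default).
PERCOLATOR_MIN_THREADS = 3


def effective_threads(requested, force=False):
    """Thread count Percolator would end up with, per the note above (illustrative)."""
    if requested < 1:
        raise ValueError("threads must be at least 1")
    if force:
        # With the force flag, the requested value is used as given.
        return requested
    # Without force, anything below 3 is raised to 3.
    return max(requested, PERCOLATOR_MIN_THREADS)


print(effective_threads(1), effective_threads(1, force=True), effective_threads(8))
```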

The command line parameters of this tool are:

PercolatorAdapter -- Facilitate input to Percolator and reintegrate.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_PercolatorAdapter.html
Version: 3.4.0-pre-nightly-2025-01-20 Jan 21 2025, 02:20:01, Revision: 91e1ce6
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  PercolatorAdapter <options>

Options (mandatory options marked with '*'):
  -in <files>                           Input file(s) (valid formats: 'mzid', 'idXML')
  -in_decoy <files>                     Input decoy file(s) in case of separate searches (valid formats: 'mzid', 'idXML')
  -in_osw <file>                        Input file in OSW format (valid formats: 'OSW')
  -out <file>*                          Output file (valid formats: 'idXML', 'mzid', 'osw')
  -out_type <type>                      Output file type -- default: determined from file extension or content. (valid: 'mzid', 'idXML', 'osw')
  -enzyme <enzyme>                      Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin,trypsinp (default: 'trypsin') (valid: 'no_enzyme', 'elastase', 'pepsin', 'proteinasek', 'thermolysin', 'chymotrypsin', 'lys-n', 'lys-c', 'arg-c', 'asp-n', 'glu-c', 'trypsin', 'trypsinp')
  -percolator_executable <executable>*  The Percolator executable. Provide a full or relative path, or make sure it can be found in your PATH environment.
  -peptide_level_fdrs                   Calculate peptide-level FDRs instead of PSM-level FDRs.
  -protein_level_fdrs                   Use the picked protein-level FDR to infer protein probabilities. Use the -fasta option and -decoy_pattern to set the Fasta file and decoy pattern.
  -osw_level <osw_level>                OSW: the data level selected for scoring. (default: 'ms2') (valid: 'ms1', 'ms2', 'transition')
  -score_type <type>                    Type of the peptide main score (default: 'q-value') (valid: 'q-value', 'pep', 'svm')
                                        
Common TOPP options:
  -ini <file>                           Use the given TOPP INI file
  -threads <n>                          Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>                     Writes the default configuration file
  --help                                Shows options
  --helphelp                            Shows all options (including advanced)
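
Several of the options above accept only a fixed set of values. A minimal sketch of checking such values before launching the tool is shown below; the valid values are copied from the help text above, while the dictionary and function names are hypothetical and not part of OpenMS.

```python
# Valid values as listed in the help output above.
VALID_CHOICES = {
    "out_type": {"mzid", "idXML", "osw"},
    "enzyme": {"no_enzyme", "elastase", "pepsin", "proteinasek", "thermolysin",
               "chymotrypsin", "lys-n", "lys-c", "arg-c", "asp-n", "glu-c",
               "trypsin", "trypsinp"},
    "osw_level": {"ms1", "ms2", "transition"},
    "score_type": {"q-value", "pep", "svm"},
}


def check_choices(options):
    """Return one message per option whose value is not in its valid set."""
    errors = []
    for name, value in options.items():
        allowed = VALID_CHOICES.get(name)
        if allowed is not None and value not in allowed:
            errors.append(f"-{name}: '{value}' is not one of {sorted(allowed)}")
    return errors


print(check_choices({"enzyme": "trypsin", "score_type": "pep"}))
print(check_choices({"enzyme": "Trypsin"}))
```

The comparison is case-sensitive on purpose in this sketch, so 'Trypsin' is reported; whether the tool itself is equally strict is not stated on this page.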

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+ PercolatorAdapter: Facilitate input to Percolator and reintegrate.
  version (default: '3.4.0-pre-nightly-2025-01-20')  Version of the tool that generated this parameters file.
  ++ 1: Instance '1' section for 'PercolatorAdapter'
    in (default: '[]')  Input file(s) (input file; valid formats: '*.mzid', '*.idXML')
    in_decoy (default: '[]')  Input decoy file(s) in case of separate searches (input file; valid formats: '*.mzid', '*.idXML')
    in_osw  Input file in OSW format (input file; valid formats: '*.OSW')
    out  Output file (output file; valid formats: '*.idXML', '*.mzid', '*.osw')
    out_pin  Write pin file (e.g., for debugging) (output file; valid formats: '*.tsv')
    out_pout_target  Write pout file (e.g., for debugging) (output file; valid formats: '*.tab')
    out_pout_decoy  Write pout file (e.g., for debugging) (output file; valid formats: '*.tab')
    out_pout_target_proteins  Write pout file (e.g., for debugging) (output file; valid formats: '*.tab')
    out_pout_decoy_proteins  Write pout file (e.g., for debugging) (output file; valid formats: '*.tab')
    out_type  Output file type -- default: determined from file extension or content. (valid: 'mzid', 'idXML', 'osw')
    enzyme (default: 'trypsin')  Type of enzyme: no_enzyme,elastase,pepsin,proteinasek,thermolysin,chymotrypsin,lys-n,lys-c,arg-c,asp-n,glu-c,trypsin,trypsinp (valid: 'no_enzyme', 'elastase', 'pepsin', 'proteinasek', 'thermolysin', 'chymotrypsin', 'lys-n', 'lys-c', 'arg-c', 'asp-n', 'glu-c', 'trypsin', 'trypsinp')
    percolator_executable (default: 'percolator')  The Percolator executable. Provide a full or relative path, or make sure it can be found in your PATH environment. (input file; is_executable)
    peptide_level_fdrs (default: 'false')  Calculate peptide-level FDRs instead of PSM-level FDRs. (valid: 'true', 'false')
    protein_level_fdrs (default: 'false')  Use the picked protein-level FDR to infer protein probabilities. Use the -fasta option and -decoy_pattern to set the Fasta file and decoy pattern. (valid: 'true', 'false')
    osw_level (default: 'ms2')  OSW: the data level selected for scoring. (valid: 'ms1', 'ms2', 'transition')
    score_type (default: 'q-value')  Type of the peptide main score (valid: 'q-value', 'pep', 'svm')
    generic_feature_set (default: 'false')  Use only generic (i.e. not search engine specific) features. Generating search engine specific features for common search engines by PSMFeatureExtractor will typically boost the identification rate significantly. (valid: 'true', 'false')
    subset_max_train (default: '0')  Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal.
    cpos (default: '0.0')  Cpos, penalty for mistakes made on positive examples. Set by cross validation if not specified.
    cneg (default: '0.0')  Cneg, penalty for mistakes made on negative examples. Set by cross validation if not specified.
    testFDR (default: '0.01')  False discovery rate threshold for evaluating best cross validation result and the reported end result.
    trainFDR (default: '0.01')  False discovery rate threshold to define positive examples in training. Set to testFDR if 0.
    maxiter (default: '10')  Maximal number of iterations
    nested_xval_bins (default: '1')  Number of nested cross-validation bins in the 3 splits.
    quick_validation (default: 'false')  Quicker execution by reduced internal cross-validation. (valid: 'true', 'false')
    weights  Output final weights to the given file (output file; valid formats: '*.tsv')
    init_weights  Read initial weights to the given file (input file; valid formats: '*.tsv')
    static (default: 'false')  Use static model (requires init-weights parameter to be set) (valid: 'true', 'false')
    default_direction  The most informative feature given as the feature name, can be negated to indicate that a lower value is better.
    verbose (default: '2')  Set verbosity of output: 0=no processing info, 5=all.
    unitnorm (default: 'false')  Use unit normalization [0-1] instead of standard deviation normalization (valid: 'true', 'false')
    test_each_iteration (default: 'false')  Measure performance on test set each iteration (valid: 'true', 'false')
    override (default: 'false')  Override error check and do not fall back on default score vector in case of suspect score vector (valid: 'true', 'false')
    seed (default: '1')  Setting seed of the random number generator.
    doc (default: '0')  Include description of correct features
    klammer (default: 'false')  Retention time features calculated as in Klammer et al. Only available if -doc is set (valid: 'true', 'false')
    fasta  Provide the fasta file as the argument to this flag, which will be used for protein grouping based on an in-silico digest (only valid if option -protein_level_fdrs is active). (input file; valid formats: '*.FASTA')
    decoy_pattern (default: 'random')  Define the text pattern to identify the decoy proteins and/or PSMs, set this up if the label that identifies the decoys in the database is not the default (Only valid if option -protein_level_fdrs is active).
    post_processing_tdc (default: 'false')  Use target-decoy competition to assign q-values and PEPs. (valid: 'true', 'false')
    train_best_positive (default: 'false')  Enforce that, for each spectrum, at most one PSM is included in the positive set during each training iteration. If the user only provides one PSM per spectrum, this filter will have no effect. (valid: 'true', 'false')
    ipf_max_peakgroup_pep (default: '0.7')  OSW/IPF: Assess transitions only for candidate peak groups until maximum posterior error probability.
    ipf_max_transition_isotope_overlap (default: '0.5')  OSW/IPF: Maximum isotope overlap to consider transitions in IPF.
    ipf_min_transition_sn (default: '0.0')  OSW/IPF: Minimum log signal-to-noise level to consider transitions in IPF. Set -1 to disable this filter.
    log  Name of log file (created only when specified)
    debug (default: '0')  Sets the debug level
    threads (default: '1')  Sets the number of threads allowed to be used by the TOPP tool
    no_progress (default: 'false')  Disables progress logging to command line (valid: 'true', 'false')
    force (default: 'false')  Overrides tool-specific checks (valid: 'true', 'false')
    test (default: 'false')  Enables the test mode (needed for internal use only) (valid: 'true', 'false')
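
Some of the parameter descriptions above state dependencies between parameters (static needs init_weights, fasta is only valid with protein_level_fdrs, klammer is only available if doc is set, trainFDR falls back to testFDR when 0). The sketch below encodes just those stated rules for a plain dictionary of settings. It is illustrative only: the function names are hypothetical, and treating a non-zero doc value as "set" is an assumption, not something this page specifies.

```python
def check_dependencies(params):
    """Return messages for violated dependencies stated in the parameter table."""
    problems = []
    if params.get("static") and not params.get("init_weights"):
        problems.append("static requires init_weights to be set")
    if params.get("fasta") and not params.get("protein_level_fdrs"):
        problems.append("fasta is only valid if protein_level_fdrs is active")
    # Assumption: 'doc' counts as set when it is non-zero.
    if params.get("klammer") and not params.get("doc", 0):
        problems.append("klammer is only available if doc is set")
    return problems


def resolve_train_fdr(params):
    """trainFDR is 'set to testFDR if 0'; defaults are taken from the table."""
    train_fdr = params.get("trainFDR", 0.01)
    test_fdr = params.get("testFDR", 0.01)
    return test_fdr if train_fdr == 0 else train_fdr


print(check_dependencies({"static": True, "fasta": "database.fasta"}))
print(resolve_train_fdr({"trainFDR": 0, "testFDR": 0.05}))
```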

Percolator is written by Lukas Käll (http://per-colator.com/). Copyright Lukas Käll (lukas.kall@scilifelab.se)