OpenMS
Tool to estimate the probability that a peptide hit is incorrectly assigned.
potential predecessor tools | → IDPosteriorErrorProbability → | potential successor tools |
---|---|---|
MascotAdapter (or other ID engines) | | ConsensusID |
By default an estimation is performed using the (inverse) Gumbel distribution for incorrectly assigned sequences and a Gaussian distribution for correctly assigned sequences. The probabilities are calculated by using Bayes' law, similar to PeptideProphet. Alternatively, a second Gaussian distribution can be used for incorrectly assigned sequences. At the moment, IDPosteriorErrorProbability is able to handle X! Tandem, Mascot, MyriMatch and OMSSA scores.
No target/decoy information needs to be provided, since the model fits are done on the mixed distribution.
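The Bayes' law step can be illustrated with a minimal Python sketch. It is not the OpenMS implementation: it assumes the two component distributions have already been fitted, uses the standard Gumbel density for the incorrect component, and expects scores transformed so that higher means better. All parameter values and function names are hypothetical.

```python
import math


def gumbel_pdf(x, location, scale):
    """Standard Gumbel density (assumed model for incorrect assignments)."""
    z = math.exp(-(x - location) / scale)
    return z * math.exp(-z) / scale


def gaussian_pdf(x, mean, sd):
    """Gaussian density (assumed model for correct assignments)."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def posterior_error_probability(score, incorrect, correct, prior_incorrect):
    """Bayes' law on a two-component mixture.

    incorrect: (location, scale) of the Gumbel component
    correct:   (mean, sd) of the Gaussian component
    prior_incorrect: mixture weight of the incorrect component
    """
    weighted_incorrect = prior_incorrect * gumbel_pdf(score, *incorrect)
    weighted_correct = (1.0 - prior_incorrect) * gaussian_pdf(score, *correct)
    total = weighted_incorrect + weighted_correct
    if total == 0.0:
        raise ValueError("score lies outside the numerical range of both components")
    return weighted_incorrect / total


# Hypothetical fitted parameters, for illustration only.
INCORRECT = (1.0, 0.5)
CORRECT = (4.0, 1.0)
PRIOR_INCORRECT = 0.7

if __name__ == "__main__":
    for score in (1.0, 2.5, 4.0, 6.0):
        pep = posterior_error_probability(score, INCORRECT, CORRECT, PRIOR_INCORRECT)
        # 1 - pep corresponds to what the '-prob_correct' flag reports.
        print(f"score={score:.1f}  PEP={pep:.4f}  P(correct)={1.0 - pep:.4f}")
```

With these hypothetical parameters, low scores receive an error probability close to 1 and high scores one close to 0. No target/decoy labels enter the calculation, only the mixture components.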
To validate the computed probabilities, an optional plot output can be generated. There are two parameters for the plot. The scores are plotted in the form of bins. Each bin represents a set of scores in a range of '(highest_score - smallest_score) / number_of_bins' (if all scores have positive values). The midpoint of the bin is the mean of the scores it represents. The parameter 'out_plot' should be used to give the plot a unique name. Two files are created: one with the binned scores and one with all steps of the estimation. If the parameter 'top_hits_only' is set, only the top hits of each peptide identification are used for the estimation process. Additionally, if 'top_hits_only' is set, target/decoy information is available and a FalseDiscoveryRate run was performed previously, an additional plot will be generated with target and decoy bins ('out_plot' must not be empty). A peptide hit is assumed to be a target if its q-value is smaller than 'fdr_for_targets_smaller'. The plots are saved as a Gnuplot file. An attempt is made to call Gnuplot, which will create a PDF file containing all steps of the estimation. If this fails, the user has to run Gnuplot manually, or adjust the PATH environment such that Gnuplot can be found and retry.
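The binning rule described above can be sketched in Python as follows. This only illustrates the stated bin width and midpoint convention; the function name `bin_scores` is hypothetical, and the details (for example, placing the highest score in the last bin) are assumptions rather than the tool's actual code.

```python
def bin_scores(scores, number_of_bins):
    """Group scores into equal-width bins.

    Returns a list of (midpoint, count) pairs for the non-empty bins,
    where the midpoint is the mean of the scores that fell into the bin.
    """
    if not scores:
        raise ValueError("no scores given")
    if number_of_bins < 1:
        raise ValueError("number_of_bins must be at least 1")

    smallest, highest = min(scores), max(scores)
    width = (highest - smallest) / number_of_bins

    buckets = [[] for _ in range(number_of_bins)]
    for score in scores:
        if width == 0:
            index = 0  # all scores identical: a single bin holds everything
        else:
            # Assumption: the highest score is clamped into the last bin.
            index = min(int((score - smallest) / width), number_of_bins - 1)
        buckets[index].append(score)

    return [(sum(b) / len(b), len(b)) for b in buckets if b]


if __name__ == "__main__":
    for midpoint, count in bin_scores([0.0, 1.0, 2.0, 3.0, 10.0], 2):
        print(midpoint, count)
```

For the five example scores and two bins, the bin width is 5, so the first four scores share one bin with midpoint 1.5 and the score 10.0 forms the second bin.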
The command line parameters of this tool are:
    IDPosteriorErrorProbability -- Estimates probabilities for incorrectly assigned peptide sequences and a set of search engine scores using a mixture model.
    Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_IDPosteriorErrorProbability.html
    Version: 3.3.0-pre-nightly-2024-11-20 Nov 21 2024, 02:34:56, Revision: decb5c8
    To cite OpenMS:
     + Pfeuffer, J., Bielow, C., Wein, S. et al.. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

    Usage:
      IDPosteriorErrorProbability <options>

    This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option

    Options (mandatory options marked with '*'):
      -in <file>*        Input file (valid formats: 'idXML')
      -out <file>*       Output file (valid formats: 'idXML')
      -out_plot <file>   Txt file (if gnuplot is available, a corresponding PDF will be created as well.) (valid formats: 'txt')
      -split_charge      The search engine scores are split by charge if this flag is set. Thus, for each charge state a new model will be computed.
      -top_hits_only     If set only the top hits of every PeptideIdentification will be used
      -ignore_bad_data   If set errors will be written but ignored. Useful for pipelines with many datasets where only a few are bad, but the pipeline should run through.
      -prob_correct      If set scores will be calculated as '1 - ErrorProbabilities' and can be interpreted as probabilities for correct identifications.

    Common TOPP options:
      -ini <file>        Use the given TOPP INI file
      -threads <n>       Sets the number of threads allowed to be used by the TOPP tool (default: '1')
      -write_ini <file>  Writes the default configuration file
      --help             Shows options
      --helphelp         Shows all options (including advanced)

    The following configuration subsections are valid:
     - fit_algorithm   Algorithm parameter subsection

    You can write an example INI file using the '-write_ini' option.
    Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
    For more information, please consult the online documentation for this tool:
      - http://www.openms.de/doxygen/nightly/html/TOPP_IDPosteriorErrorProbability.html
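As an illustration of how the options listed above combine, the following Python sketch assembles an argument list for the tool. The helper name `build_command` and the file names are hypothetical. Only flags that appear in the help text above are used.

```python
def build_command(in_file, out_file, out_plot=None, split_charge=False,
                  top_hits_only=False, ignore_bad_data=False, prob_correct=False):
    """Assemble an IDPosteriorErrorProbability command line as a list of arguments."""
    command = ["IDPosteriorErrorProbability", "-in", in_file, "-out", out_file]
    if out_plot:
        command += ["-out_plot", out_plot]  # optional txt file with the plot data
    # Boolean flags take no value; they are appended only when requested.
    flags = [
        ("-split_charge", split_charge),
        ("-top_hits_only", top_hits_only),
        ("-ignore_bad_data", ignore_bad_data),
        ("-prob_correct", prob_correct),
    ]
    command += [name for name, enabled in flags if enabled]
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_command("search_results.idXML", "pep.idXML",
                                 out_plot="pep_plot.txt", top_hits_only=True)))
```

The resulting list can be passed to `subprocess.run(command, check=True)`, assuming the OpenMS TOPP tools are installed and on the PATH.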
INI file documentation of this tool:
For the parameters of the algorithm section see the algorithms documentation:
fit_algorithm