OpenMS
Computes a consensus from results of multiple peptide identification engines.
potential predecessor tools | → ConsensusID → | potential successor tools |
---|---|---|
IDPosteriorErrorProbability | | PeptideIndexer |
IDFilter | | |
IDMapper | | |
Reference:
Nahnsen et al.: Probabilistic consensus scoring improves tandem mass spectrometry peptide identification (J. Proteome Res., 2011, PMID: 21644507).
Algorithms:
ConsensusID offers several algorithms that can aggregate results from multiple peptide identification engines ("search engines") into consensus identifications - typically one per MS2 spectrum. This works especially well for search engines that provide more than one peptide hit per spectrum, i.e. that report not just the best hit, but also a list of runner-up candidates with corresponding scores.
The available algorithms are (see also OpenMS::ConsensusIDAlgorithm and its subclasses):
PEPMatrix:
Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities. This algorithm uses a substitution matrix to score the similarity of sequences not listed by all search engines. It requires PEPs as the scores for all peptide hits.
PEPIons:
Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ("shared peak count"). This algorithm, too, requires PEPs as scores.
best:
For each peptide ID, this uses the best score of any search engine as the consensus score. All peptide IDs must have the same score type.
worst:
For each peptide ID, this uses the worst score of any search engine as the consensus score. All peptide IDs must have the same score type.
average:
For each peptide ID, this uses the average score of all search engines as the consensus score. Again, all peptide IDs must have the same score type.
ranks:
Calculates a consensus score based on the ranks of peptide IDs in the results of different search engines. The final score is in the range (0, 1], with 1 being the best score. The input peptide IDs do not need to have the same score type.

PEPs for search results can be calculated using the IDPosteriorErrorProbability tool, which supports a variety of search engines.
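The three simplest algorithms (best, worst, average) can be illustrated with a short Python sketch. This is not the OpenMS implementation: the function name consensus_scores, the dictionary-based input and the higher_is_better switch are assumptions made for illustration, and the average is taken only over the runs that actually reported the peptide, which is also an assumption.

```python
from statistics import mean
from typing import Dict, Sequence


def consensus_scores(runs: Sequence[Dict[str, float]],
                     method: str = "best",
                     higher_is_better: bool = False) -> Dict[str, float]:
    """Combine per-run scores for one spectrum into one score per peptide.

    `runs` holds one {peptide sequence: score} mapping per search run
    (hypothetical input layout). All scores must share one score type.
    """
    # Collect every score reported for each peptide across the runs.
    collected: Dict[str, list] = {}
    for run in runs:
        for peptide, score in run.items():
            collected.setdefault(peptide, []).append(score)

    # "Best" depends on score orientation: PEPs are lower-is-better.
    pick_best = max if higher_is_better else min
    pick_worst = min if higher_is_better else max
    combine = {"best": pick_best, "worst": pick_worst, "average": mean}[method]

    return {peptide: combine(scores) for peptide, scores in collected.items()}


# Two search runs reporting PEPs (lower is better) for the same spectrum.
example_runs = [
    {"PEPTIDEK": 0.125, "ELVISK": 0.5},
    {"PEPTIDEK": 0.375},
]
for name in ("best", "worst", "average"):
    print(name, consensus_scores(example_runs, name))
```

The score orientation is passed in explicitly because "best" only has a meaning once it is known whether higher or lower scores are better; for PEPs, lower is better.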
PEPMatrix algorithm: The similarity scoring method used there can only take unmodified peptide sequences into account, so PTMs are ignored during that step. However, the PTMs are not removed from the peptides, and there will be separate results for differently-modified peptides.

File types:
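Different input file types are supported (idXML, featureXML, consensusXML). For idXML input, peptide identifications are grouped by spectrum based on retention time and precursor m/z (see parameters rt_delta and mz_delta). One consensus identification will be generated for each group. With the per_spectrum flag you can also input multiple idXML files; a consensus is then made per combination of originating mzML file and spectrum_ref.

Filtering: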
Generally, search results can be filtered according to various criteria using IDFilter before (or after) applying this tool. ConsensusID itself offers only a limited number of filtering options that are especially useful in its context (see the filter parameter section):
considered_hits:
Limits the number of alternative peptide hits considered per spectrum/feature for each identification run. This helps to reduce runtime, especially for the PEPMatrix and PEPIons algorithms, which involve costly "all vs. all" comparisons of peptide hits.
min_support:
This allows filtering of peptide hits based on agreement between search engines. Every peptide sequence in the analysis has been identified by at least one search run. This parameter defines which fraction (between 0 and 1) of the remaining search runs must "support" a peptide identification that should be kept. The meaning of "support" differs slightly between algorithms: For best, worst, average and ranks, each search run supports peptides that it has also identified among its top considered_hits candidates. So min_support simply gives the fraction of additional search engines that must have identified a peptide. (For example, if there are three search runs, and only peptides identified by at least two of them should be kept, set min_support to 0.5.) For the similarity-based algorithms PEPMatrix and PEPIons, the "support" for a peptide is the average similarity of the most-similar peptide from each (other) search run. (In the context of the JPR publication, this is the average of the similarity scores used in the consensus score calculation for a peptide.)
count_empty:
Typically not all search engines will provide results for all searched MS2 spectra. This parameter determines whether search runs that provided no results should be counted in the "support" calculation; by default, they are ignored.

The command line parameters of this tool are:
ConsensusID -- Computes a consensus of peptide identifications of several identification engines.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_ConsensusID.html
Version: 3.4.0-pre-nightly-2024-11-30 Nov 30 2024, 02:33:34, Revision: 7ff3f2e
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al.. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  ConsensusID <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option

Options (mandatory options marked with '*'):
  -in <file(s)>*  Input file (valid formats: 'idXML', 'featureXML', 'consensusXML')
  -out <file>*  Output file (valid formats: 'idXML', 'featureXML', 'consensusXML')
  -rt_delta <value>  [idXML input only] Maximum allowed retention time deviation between identifications belonging to the same spectrum. (default: '0.1') (min: '0.0')
  -mz_delta <value>  [idXML input only] Maximum allowed precursor m/z deviation between identifications belonging to the same spectrum. (default: '0.1') (min: '0.0')
  -per_spectrum  (only idXML) if set, mapping will be done based on exact matching of originating mzml file and spectrum_ref

Options for filtering peptide hits:
  -filter:considered_hits <number>  The number of top hits in each ID run that are considered for consensus scoring ('0' for all hits). (default: '0') (min: '0')
  -filter:min_support <value>  For each peptide hit from an ID run, the fraction of other ID runs that must support that hit (otherwise it is removed). (default: '0.0') (min: '0.0' max: '1.0')
  -filter:count_empty  Count empty ID runs (i.e. those containing no peptide hit for the current spectrum) when calculating 'min_support'?
  -filter:keep_old_scores  If set, keeps the original scores as user params

  -algorithm <choice>  Algorithm used for consensus scoring.
    * PEPMatrix: Scoring based on posterior error probabilities (PEPs) and peptide sequence similarities (scored by a substitution matrix). Requires PEPs as scores.
    * PEPIons: Scoring based on posterior error probabilities (PEPs) and fragment ion similarities ('shared peak count'). Requires PEPs as scores.
    * best: For each peptide ID, use the best score of any search engine as the consensus score. Requires the same score type in all ID runs.
    ... types.
    (default: 'PEPMatrix') (valid: 'PEPMatrix', 'PEPIons', 'best', 'worst', 'average', 'ranks')

Common TOPP options:
  -ini <file>  Use the given TOPP INI file
  -threads <n>  Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>  Writes the default configuration file
  --help  Shows options
  --helphelp  Shows all options (including advanced)

The following configuration subsections are valid:
 - PEPIons  PEPIons algorithm parameters
 - PEPMatrix  PEPMatrix algorithm parameters

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
 - http://www.openms.de/doxygen/nightly/html/TOPP_ConsensusID.html
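The interplay of the filter parameters considered_hits, min_support and count_empty for the identity-based algorithms (best, worst, average, ranks) can be sketched in Python as follows. This is an illustrative reading of the description under "Filtering", not the OpenMS code: the function names, the list-based input, the "greater or equal" comparison and the guard against a zero denominator are all assumptions.

```python
from typing import Dict, List, Sequence, Set


def support_fractions(runs: Sequence[Sequence[str]],
                      considered_hits: int = 0,
                      count_empty: bool = False) -> Dict[str, float]:
    """For each peptide, return the fraction of *other* runs that also list it.

    `runs` holds one ranked list of peptide sequences per search run for a
    single spectrum (hypothetical input layout).
    """
    # Keep only the top `considered_hits` candidates per run (0 means all).
    top_hits: List[Set[str]] = [
        set(run[:considered_hits] if considered_hits > 0 else run)
        for run in runs
    ]
    # Runs without any hit are ignored unless count_empty is set.
    counted = len(top_hits) if count_empty else sum(1 for h in top_hits if h)
    # Assumption: avoid division by zero when only one run is counted.
    n_other = max(counted - 1, 1)

    support: Dict[str, float] = {}
    for peptide in set().union(*top_hits):
        n_runs_with_peptide = sum(1 for hits in top_hits if peptide in hits)
        # The run that reported the peptide does not count as its own support.
        support[peptide] = (n_runs_with_peptide - 1) / n_other
    return support


def filter_by_support(runs: Sequence[Sequence[str]],
                      min_support: float,
                      **kwargs) -> List[str]:
    """Return the peptides whose support reaches `min_support`, sorted."""
    fractions = support_fractions(runs, **kwargs)
    return sorted(p for p, s in fractions.items() if s >= min_support)


# Three search runs; only PEPTIDEK is reported by two of them.
three_runs = [
    ["PEPTIDEK", "ELVISK"],
    ["PEPTIDEK", "LIVESK"],
    ["SAMPLER"],
]
print(filter_by_support(three_runs, min_support=0.5))
```

With three runs, a peptide found by two of them is supported by one of the two other runs, which gives the value 0.5 used in the example in the text. If the third run had returned no hits, the same peptide would reach a support of 1.0 by default and 0.5 with count_empty enabled.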
INI file documentation of this tool: