Tool to estimate the false discovery rate on peptide and protein level
This TOPP tool calculates the false discovery rate (FDR) for the results of target-decoy searches. The FDR can be calculated for proteins and/or for peptides (more precisely, peptide spectrum matches, PSMs).
The false discovery rate is defined as the number of false discoveries (decoy hits) divided by the number of false and correct discoveries (both target and decoy hits) with a score better than a given threshold.
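The definition above can be illustrated with a minimal Python sketch. It is not the OpenMS implementation: the function name, the list-based inputs, and the choice to count hits whose score is at least as good as the threshold are assumptions made for illustration only.

```python
def fdr_at_threshold(scores, is_decoy, threshold, higher_is_better=True):
    """Fraction of decoy hits among all hits passing the score threshold.

    scores   -- one score per hit (hypothetical input layout)
    is_decoy -- one bool per hit, True for decoy hits
    """
    if higher_is_better:
        passing = [d for s, d in zip(scores, is_decoy) if s >= threshold]
    else:
        passing = [d for s, d in zip(scores, is_decoy) if s <= threshold]
    if not passing:
        # No hit passes the threshold; report 0.0 rather than dividing by zero.
        return 0.0
    # sum() over booleans counts the decoy hits.
    return sum(passing) / len(passing)


scores = [10.0, 9.0, 8.0, 7.0, 6.0]
is_decoy = [False, False, True, False, True]
# Four hits score 7.0 or better, one of them is a decoy.
print(fdr_at_threshold(scores, is_decoy, threshold=7.0))  # 0.25
```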
PeptideIndexer must be applied to the search results (idXML file) to index the data and to annotate peptide and protein hits with their target/decoy status.
Note:
- When no decoy hits were found, you will get a warning like this:
  "FalseDiscoveryRate: #decoy sequences is zero! Setting all target sequences to q-value/FDR 0!"
  This should be a serious concern, since it indicates a possible problem with the target/decoy annotation step (PeptideIndexer), e.g. due to a misconfigured database.
- By default, FalseDiscoveryRate only annotates peptides and proteins with their FDR; no hits are filtered. Setting FDR:PSM or FDR:protein to a maximum q-value (e.g., 0.05 corresponds to an FDR of 5%) filters on the PSM and protein level, respectively. Alternatively, FDR filtering can be performed in the IDFilter tool by setting score:pep and score:prot to the maximum q-value. After filtering, associations are automatically updated and unreferenced proteins/peptides are removed according to the advanced cleanup parameters.
- Currently, mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
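As a hedged illustration of the filtering options mentioned in the note, the following Python sketch assembles a FalseDiscoveryRate command line. The helper name and the file names are hypothetical; only the -in, -out, -FDR:PSM and -FDR:protein options and their [0.0, 1.0] range are taken from the parameter listing of this tool. The sketch only builds the argument list and does not run the tool.

```python
def build_fdr_command(in_path, out_path, psm_fdr=1.0, protein_fdr=1.0):
    """Return an argument list for FalseDiscoveryRate (hypothetical helper).

    A value of 1.0 leaves the respective filter disabled, mirroring the
    documented defaults.
    """
    for name, value in (("FDR:PSM", psm_fdr), ("FDR:protein", protein_fdr)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0.0, 1.0], got {value}")
    return [
        "FalseDiscoveryRate",
        "-in", in_path,
        "-out", out_path,
        "-FDR:PSM", str(psm_fdr),
        "-FDR:protein", str(protein_fdr),
    ]


# Hypothetical file names; the input is assumed to be annotated by PeptideIndexer.
cmd = build_fdr_command("indexed.idXML", "fdr.idXML", psm_fdr=0.05)
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a system where the OpenMS tools are on the PATH.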
The command line parameters of this tool are:
FalseDiscoveryRate -- Estimates the false discovery rate on peptide and protein level using decoy searches.
Full documentation: http://www.openms.de/doxygen/release/3.2.0/html/TOPP_FalseDiscoveryRate.html
Version: 3.2.0 Nov 18 2024, 16:14:00, Revision: 03223c3
To cite OpenMS:
+ Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.
Usage:
FalseDiscoveryRate <options>
This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option
Options (mandatory options marked with '*'):
-in <file>* Identifications from searching a target-decoy database. (valid formats: 'idXML')
-out <file>* Identifications with annotated FDR (valid formats: 'idXML')
-PSM <FDR level> Perform FDR calculation on PSM level (default: 'true') (valid: 'true', 'false')
-peptide <FDR level> Perform FDR calculation on peptide level and annotates it as meta value (Note: if set, also calculates FDR/q-value on PSM level.) (default: 'false') (valid: 'true', 'false')
-protein <FDR level> Perform FDR calculation on protein level (default: 'true') (valid: 'true', 'false')
FDR control:
-FDR:PSM <fraction> Filter PSMs based on q-value (e.g., 0.05 = 5% FDR, disabled for 1) (default: '1.0') (min: '0.0' max: '1.0')
-FDR:protein <fraction> Filter proteins based on q-value (e.g., 0.05 = 5% FDR, disabled for 1) (default: '1.0') (min: '0.0' max: '1.0')
Common TOPP options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default: '1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)
The following configuration subsections are valid:
- algorithm Parameter section for the FDR calculation algorithm
You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
- http://www.openms.de/doxygen/release/3.2.0/html/TOPP_FalseDiscoveryRate.html
INI file documentation of this tool:
+ FalseDiscoveryRate: Estimates the false discovery rate on peptide and protein level using decoy searches.
- version: Version of the tool that generated this parameters file. (default: '3.2.0')
++ 1: Instance '1' section for 'FalseDiscoveryRate'
- in: Identifications from searching a target-decoy database. (input file) (valid formats: '*.idXML')
- out: Identifications with annotated FDR (output file) (valid formats: '*.idXML')
- PSM: Perform FDR calculation on PSM level (default: 'true') (valid: 'true', 'false')
- peptide: Perform FDR calculation on peptide level and annotates it as meta value (Note: if set, also calculates FDR/q-value on PSM level.) (default: 'false') (valid: 'true', 'false')
- protein: Perform FDR calculation on protein level (default: 'true') (valid: 'true', 'false')
- log: Name of log file (created only when specified)
- debug: Sets the debug level (default: '0')
- threads: Sets the number of threads allowed to be used by the TOPP tool (default: '1')
- no_progress: Disables progress logging to command line (default: 'false') (valid: 'true', 'false')
- force: Overrides tool-specific checks (default: 'false') (valid: 'true', 'false')
- test: Enables the test mode (needed for internal use only) (default: 'false') (valid: 'true', 'false')
+++ FDR: FDR control
- PSM: Filter PSMs based on q-value (e.g., 0.05 = 5% FDR, disabled for 1) (default: '1.0') (min: '0.0' max: '1.0')
- protein: Filter proteins based on q-value (e.g., 0.05 = 5% FDR, disabled for 1) (default: '1.0') (min: '0.0' max: '1.0')
++++ cleanup: Cleanup references after FDR control
- remove_proteins_without_psms: Remove proteins without PSMs (due to being decoy or below PSM FDR threshold). (default: 'true') (valid: 'true', 'false')
- remove_psms_without_proteins: Remove PSMs without proteins (due to being decoy or below protein FDR threshold). (default: 'true') (valid: 'true', 'false')
- remove_spectra_without_psms: Remove spectra without PSMs (due to being decoy or below protein FDR threshold). Caution: if remove_psms_without_proteins is false, protein level filtering does not propagate. (default: 'true') (valid: 'true', 'false')
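The interplay of the cleanup parameters can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary-based PSM records, the accession set, and the function name are hypothetical and do not reflect the idXML data model or the OpenMS code. It shows why protein level filtering does not propagate to PSMs when remove_psms_without_proteins is false.

```python
def cleanup(psms, kept_proteins,
            remove_psms_without_proteins=True,
            remove_proteins_without_psms=True):
    """Update PSM/protein references after filtering (illustrative only).

    psms          -- list of dicts with 'sequence' and 'accessions' keys
    kept_proteins -- set of protein accessions that survived filtering
    """
    kept_psms = []
    for psm in psms:
        # Keep only references to proteins that still exist.
        refs = [acc for acc in psm["accessions"] if acc in kept_proteins]
        if refs or not remove_psms_without_proteins:
            kept_psms.append({**psm, "accessions": refs})
    if remove_proteins_without_psms:
        referenced = {acc for psm in kept_psms for acc in psm["accessions"]}
        kept_proteins = kept_proteins & referenced
    return kept_psms, kept_proteins


psms = [
    {"sequence": "PEPTIDEK", "accessions": ["P1"]},
    {"sequence": "ELVISK", "accessions": ["P2"]},  # P2 was filtered out
]
filtered_psms, filtered_proteins = cleanup(psms, {"P1", "P3"})
print([p["sequence"] for p in filtered_psms], sorted(filtered_proteins))
```

With remove_psms_without_proteins set to False in this sketch, the second PSM would be retained with an empty reference list, so removing its protein has no effect on the PSM list.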
+++ algorithm: Parameter section for the FDR calculation algorithm
- no_qvalues: If 'true' strict FDRs will be calculated instead of q-values (the default) (default: 'false') (valid: 'true', 'false')
- use_all_hits: If 'true' not only the first hit, but all are used (peptides only) (default: 'false') (valid: 'true', 'false')
- split_charge_variants: If 'true' charge variants are treated separately (for peptides of combined target/decoy searches only). (default: 'false') (valid: 'true', 'false')
- treat_runs_separately: If 'true' different search runs are treated separately (for peptides of combined target/decoy searches only). (default: 'false') (valid: 'true', 'false')
- add_decoy_peptides: If 'true' decoy peptides will be written to output file, too. The q-value is set to the closest target score. (default: 'false') (valid: 'true', 'false')
- add_decoy_proteins: If 'true' decoy proteins will be written to output file, too. The q-value is set to the closest target score. (default: 'false') (valid: 'true', 'false')
- conservative: If 'true' (D+1)/T instead of (D+1)/(T+D) is used as a formula. (default: 'true') (valid: 'true', 'false')
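To make the difference between strict FDRs and q-values (no_qvalues) and the two formulas selected by conservative more concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the OpenMS algorithm: it assumes higher scores are better, ignores tied scores, caps values at 1.0, and takes a q-value to be the smallest FDR found at a hit's own rank or any lower rank. T and D are the numbers of target and decoy hits counted down to the current rank.

```python
def fdr_and_qvalues(scores, is_decoy, conservative=True):
    """Return (fdrs, qvalues), one value per hit, in input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # conservative: (D+1)/T, otherwise (D+1)/(T+D)
        denom = targets if conservative else targets + decoys
        fdrs[i] = min(1.0, (decoys + 1) / denom) if denom > 0 else 1.0
    # q-value: running minimum of the FDR, walking up from the worst hit.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for i in reversed(order):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return fdrs, qvalues


scores = [10.0, 9.0, 8.0, 7.0, 6.0]
is_decoy = [False, False, True, False, True]
fdrs, qvalues = fdr_and_qvalues(scores, is_decoy, conservative=True)
print(fdrs)
print(qvalues)
```

Unlike the strict FDRs, the q-values in this sketch never decrease from a better-scoring hit to a worse-scoring one, which is what makes them usable as a filtering threshold.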