OpenMS
DatabaseSuitability

Calculates the suitability of a database which was used for a peptide identification search. Also reports the quality of LC-MS spectra.

The metric this tool uses to determine the suitability of a database is based on a de novo model. It is therefore crucial that your workflow is set up correctly. Above you can see an example.
Most importantly, the peptide identification search needs to be run against a combination of the database in question and a de novo "database".
To generate the de novo "database":

Re-ranking examines every case in which a peptide hit found only in the de novo "database" scores above a peptide hit found in the actual database. In each such case the cross-correlation scores of the two peptide hits are compared. If they are similar enough, the database hit is re-ranked to be on top of the de novo hit. You can control how many of the cases with similar scores are re-ranked with the reranking_cutoff_percentile parameter.
For this to work, PeptideIndexer must have been run beforehand. It is equally crucial that no FDR filtering has been performed: this tool does the filtering itself and will crash if a q-value is found. You can still set the FDR that you want to apply with the FDR parameter.
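The re-ranking rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the OpenMS implementation: the function names are hypothetical, the percentile is taken with a simple index into the sorted decoy score differences, and any score normalisation the tool may apply is left out.

```python
def percentile_cutoff(decoy_diffs, percentile):
    """Return the decoy score difference found at the given percentile (0.0 to 1.0).

    decoy_diffs: score differences between the first two decoy hits of each PSM.
    Assumption: a plain index into the sorted list stands in for the
    percentile calculation used by the real tool.
    """
    if not decoy_diffs:
        raise ValueError("need at least one decoy score difference")
    if not 0.0 <= percentile <= 1.0:
        raise ValueError("percentile must lie in [0.0, 1.0]")
    ordered = sorted(decoy_diffs)
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def should_rerank(novo_xcorr, db_xcorr, cutoff):
    """True if the de novo hit outscores the database hit by no more than the cutoff."""
    diff = novo_xcorr - db_xcorr
    # A negative difference means the database hit is already on top.
    return 0.0 <= diff <= cutoff
```

A lower percentile yields a smaller cutoff, so fewer database hits qualify for re-ranking, which matches the behaviour described for reranking_cutoff_percentile.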

Note
Comet is the recommended search engine for the identification search because its cross-correlation score is the recommended score for re-ranking.
If you use another search engine, re-ranking is turned off automatically. You can still enforce re-ranking with the 'force' flag.
In that case the tool uses the default score of your search engine, which can result in undefined behaviour. Be warned.

The results are written directly to the console. You can also provide an optional tsv output file to which the most important results are exported.

This tool uses the metrics and algorithms first presented in:
Assessing protein sequence database suitability using de novo sequencing. Molecular & Cellular Proteomics. January 1, 2020; 19, 1: 198-208. doi:10.1074/mcp.TIR119.001752.
Richard S. Johnson, Brian C. Searle, Brook L. Nunn, Jason M. Gilmore, Molly Phillips, Chris T. Amemiya, Michelle Heck, Michael J. MacCoss.
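
As a rough illustration of the idea behind the metric, the sketch below assumes that suitability is the fraction of top-scoring peptide hits that come from the database rather than from the de novo "database". That formula is an assumption made here for illustration and is not stated on this page; the function name is hypothetical, and the corrected suitability computed by the tool is not covered.

```python
def suitability(db_hits, novo_hits):
    """Fraction of top hits explained by the database (assumed definition).

    db_hits:   number of spectra whose top hit is a database peptide
    novo_hits: number of spectra whose top hit is found only in the
               de novo "database"
    """
    total = db_hits + novo_hits
    if total == 0:
        raise ValueError("no hits to compute a suitability from")
    return db_hits / total
```

Under this assumption, a value close to 1 means that few spectra are explained better by de novo sequences than by the database.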

The command line parameters of this tool are:

DatabaseSuitability -- Computes a suitability score for a database which was used for a peptide
identification search. Also reports the quality of LC-MS spectra.
Full documentation: http://www.openms.de/doxygen/release/3.0.0/html/TOPP_DatabaseSuitability.html
Version: 3.0.0 Jul 14 2023, 11:57:33, Revision: be787e9
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for 
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.
To cite DatabaseSuitability:
 + Richard S. Johnson, Brian C. Searle, Brook L. Nunn, Jason M. Gilmore, Molly Phillips, Chris T. Amemiya, 
   Michelle Heck, Michael J. MacCoss. Assessing protein sequence database suitability using de novo
   sequencing. Molecular & Cellular Proteomics. January 1, 2020; 19, 1: 198-208. doi:10.1074/mcp.TIR119.001752.

Usage:
  DatabaseSuitability <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option

Options (mandatory options marked with '*'):
  -in_id <file>*          Input idXML file from a peptide identification search with a combined database. 
                          PeptideIndexer is needed, FDR is forbidden. (valid formats: 'idXML')
  -in_spec <file>*        Input MzML file used for the peptide identification (valid formats: 'mzML')
  -in_novo <file>*        Input idXML file containing de novo peptides (unfiltered) (valid formats: 'idXML')
  -database <file>*       Input FASTA file of the database in question (valid formats: 'FASTA')
  -novo_database <file>*  Input deNovo sequences derived from MzML given in 'in_spec' concatenated to one 
                          FASTA entry (valid formats: 'FASTA')
  -out <file>             Optional tsv output containing database suitability information as well as
                          spectral quality. (valid formats: 'tsv')
                          
Common TOPP options:
  -ini <file>             Use the given TOPP INI file
  -threads <n>            Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>       Writes the default configuration file
  --help                  Shows options
  --helphelp              Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Parameter section for the suitability calculation algorithm

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - http://www.openms.de/doxygen/release/3.0.0/html/TOPP_DatabaseSuitability.html
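
The options listed above can be assembled into an invocation from a script. The Python sketch below only builds the argument list and does not run the tool; the helper name and all file names are hypothetical placeholders, and only flags that appear in the help text are used.

```python
import shlex


def build_command(in_id, in_spec, in_novo, database, novo_database,
                  out=None, threads=1):
    """Build the DatabaseSuitability argument list from the documented options."""
    cmd = [
        "DatabaseSuitability",
        "-in_id", in_id,                  # idXML from the combined search (no FDR applied)
        "-in_spec", in_spec,              # mzML used for the identification
        "-in_novo", in_novo,              # unfiltered de novo peptides (idXML)
        "-database", database,            # FASTA of the database in question
        "-novo_database", novo_database,  # de novo sequences as one FASTA entry
    ]
    if out is not None:
        cmd += ["-out", out]              # optional tsv report
    if threads != 1:
        cmd += ["-threads", str(threads)]
    return cmd


# Hypothetical file names, for illustration only.
example = build_command("search.idXML", "run.mzML", "novo.idXML",
                        "proteins.fasta", "novo.fasta", out="suitability.tsv")
print(shlex.join(example))
```

Where the tool is installed, the returned list could be handed to `subprocess.run(example, check=True)`.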

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+DatabaseSuitability: Computes a suitability score for a database which was used for a peptide identification search. Also reports the quality of LC-MS spectra.
  version (default: 3.0.0): Version of the tool that generated this parameters file.
  ++1: Instance '1' section for 'DatabaseSuitability'
    in_id (input file; *.idXML): Input idXML file from a peptide identification search with a combined database. PeptideIndexer is needed, FDR is forbidden.
    in_spec (input file; *.mzML): Input MzML file used for the peptide identification
    in_novo (input file; *.idXML): Input idXML file containing de novo peptides (unfiltered)
    database (input file; *.FASTA): Input FASTA file of the database in question
    novo_database (input file; *.FASTA): Input deNovo sequences derived from MzML given in 'in_spec' concatenated to one FASTA entry
    out (output file; *.tsv): Optional tsv output containing database suitability information as well as spectral quality.
    novo_threshold (default: 60.0; range: 0.0:∞): Minimum score a de novo sequence has to have to be defined as 'correct'. The default of 60 is proven to be a good estimate for sequences generated by Novor.
    log: Name of log file (created only when specified)
    debug (default: 0): Sets the debug level
    threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool
    no_progress (default: false; valid: true, false): Disables progress logging to command line
    force (default: false; valid: true, false): Overrides tool-specific checks
    test (default: false; valid: true, false): Enables the test mode (needed for internal use only)
    +++algorithm: Parameter section for the suitability calculation algorithm
      no_rerank (default: false; valid: true, false): Use this flag if you want to disable re-ranking. Cases where a de novo peptide scores just higher than the database peptide are then overlooked and counted as a de novo hit. This might underestimate the database quality.
      reranking_cutoff_percentile (default: 0.01; range: 0.0:1.0): Swap a top-scoring deNovo hit with a lower scoring DB hit if their xcorr score difference is in the given percentile of all score differences between the first two decoy hits of a PSM. The lower the value, the lower the decoy cut-off will be. Therefore it will be harder for a lower scoring DB hit to be re-ranked to the top.
      FDR (default: 0.01; range: 0.0:1.0): Filter peptide hits based on this q-value. (e.g., 0.05 = 5 % FDR)
      number_of_subsampled_runs (default: 1; range: 0:∞): Controls how many runs should be done for calculating corrected suitability. (0: number of runs will be estimated automatically) ATTENTION: For each run a separate ID search is performed. This can result in some serious run time.
      keep_search_files (default: false; valid: true, false): Set this flag if you wish to keep the files used by and produced by the internal ID search.
      disable_correction (default: false; valid: true, false): Set this flag to disable the calculation of the corrected suitability.
      force (default: false; valid: true, false): Set this flag to enforce re-ranking when no cross correlation score is present. For re-ranking the default score found at each peptide hit is used. Use with care!
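
The numeric algorithm parameters above come with documented ranges, which a wrapper script can check before calling the tool. The Python sketch below is an illustration only: the helper and the `RANGES` table are hypothetical, and it assumes the usual TOPP convention of addressing subsection parameters on the command line as `-algorithm:<name> <value>`.

```python
# Ranges copied from the parameter documentation above.
RANGES = {
    "reranking_cutoff_percentile": (0.0, 1.0),
    "FDR": (0.0, 1.0),
    "number_of_subsampled_runs": (0, float("inf")),
}


def algorithm_args(**params):
    """Turn numeric algorithm parameters into command-line arguments.

    Assumption: subsection parameters are passed as '-algorithm:<name> <value>'.
    """
    args = []
    for name, value in params.items():
        if name not in RANGES:
            raise KeyError(f"unknown or non-numeric algorithm parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}:{high}")
        args += [f"-algorithm:{name}", str(value)]
    return args
```

For example, `algorithm_args(FDR=0.05)` yields two arguments that could be appended to a DatabaseSuitability call to request a 5 % FDR.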