OpenMS  2.7.0
DBSuitability Class Reference

This class holds the functionality of calculating the database suitability.

#include <OpenMS/QC/DBSuitability.h>


Classes

struct  SuitabilityData
 Struct to store results.
 

Public Member Functions

 DBSuitability ()
 
 ~DBSuitability ()=default
 Destructor.
 
void compute (std::vector< PeptideIdentification > pep_ids)
 Computes suitability of a database used to search an mzML.
 
const std::vector< SuitabilityData > & getResults () const
 Returns results calculated by this metric.
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.
 
virtual ~DefaultParamHandler ()
 Destructor.
 
virtual DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.
 
void setParameters (const Param &param)
 Sets the parameters.
 
const Param & getParameters () const
 Non-mutable access to the parameters.
 
const Param & getDefaults () const
 Non-mutable access to the default parameters.
 
const String & getName () const
 Non-mutable access to the name.
 
void setName (const String &name)
 Mutable access to the name.
 
const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Private Member Functions

double getDecoyDiff_ (const PeptideIdentification &pep_id)
 Calculates the xcorr difference between the top two hits marked as decoy.
 
double getDecoyCutOff_ (const std::vector< PeptideIdentification > &pep_ids, double reranking_cutoff_percentile)
 Calculates an xcorr cut-off based on decoy hits.
 
bool isNovoHit_ (const PeptideHit &hit)
 Tests if a PeptideHit is considered a deNovo hit.
 
bool passesFDR_ (const PeptideHit &hit, double FDR)
 Tests if a PeptideHit has a lower q-value than the given FDR threshold, i.e. passes FDR.
 

Private Attributes

std::vector< SuitabilityData > results_
 Result vector.
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &prefix="")
 Writes all parameters to meta values.
 
- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.
 
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!
 
std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!
 
String error_name_
 Name that is displayed in error messages during the parameter checking.
 
bool check_defaults_
 If this member is set to false, no parameter checking is done.
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 

Detailed Description

This class holds the functionality of calculating the database suitability.

To calculate the suitability of a database for identification search of a specific mzML, it is vital to perform a combined deNovo+database identification search. That is, the database must be extended with an additional entry derived from the concatenated deNovo sequences of that mzML. Currently only Comet search is supported.

This class will calculate q-values by itself and will throw an error if any q-value calculation was done beforehand.

The algorithm parameters can be set using setParameters().

The compute function can be called multiple times. The result of each call is appended to an internal vector, so old results are not overwritten by a new call. This vector can be retrieved using getResults().

This class serves as the library representation of DatabaseSuitability.


Class Documentation

◆ OpenMS::DBSuitability::SuitabilityData

struct OpenMS::DBSuitability::SuitabilityData

struct to store results

Class Members
double cut_off the cut-off that was used to determine when a score difference was "small enough"; it is normalized by mw
Size num_interest number of times a deNovo hit scored on top of a database hit
Size num_re_ranked number of times a deNovo hit scored on top of a database hit, but the score difference was small enough that it was still counted as a database hit
Size num_top_db number of times the top hit is considered to be a database hit
Size num_top_novo number of times the top hit is considered to be a deNovo hit
double suitability the suitability of the database used for identification search, calculated as #db_hits / (#db_hits + #deNovo_hits); it ranges from 0 (the database was not suited at all) to 1 (the perfect database was used)

Preliminary tests have shown that databases of the right organism or closely related organisms score around 0.9 to 0.95, organisms from the same class can still score around 0.8, organisms from the same phylum score around 0.5 to 0.6, and beyond that the suitability quickly falls to 0.15 or even 0.05. Note that these tests were only performed for one mzML and your results might differ.
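
The ratio described for the suitability member can be written out as a minimal sketch. The free function suitabilityRatio is hypothetical and not part of OpenMS; requiring at least one counted hit is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an OpenMS function): fraction of counted top hits
// that are database hits, i.e. #db_hits / (#db_hits + #deNovo_hits).
double suitabilityRatio(std::size_t num_top_db, std::size_t num_top_novo)
{
  const std::size_t total = num_top_db + num_top_novo;
  // Assumption: the ratio is only defined if at least one hit was counted.
  assert(total > 0);
  return static_cast<double>(num_top_db) / static_cast<double>(total);
}
```

With 9 database top hits and 1 deNovo top hit, this yields 0.9, which falls into the range reported above for a well-matching database.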

Constructor & Destructor Documentation

◆ DBSuitability()

DBSuitability ( )

Constructor.

Settings are initialized with their default values: no_rerank = false, reranking_cutoff_percentile = 1, FDR = 0.01

◆ ~DBSuitability()

~DBSuitability ( )
default

Destructor.

Member Function Documentation

◆ compute()

void compute ( std::vector< PeptideIdentification > pep_ids )

Computes suitability of a database used to search an mzML.

Counts top deNovo and top database hits. The ratio of database hits to all hits yields the suitability. To re-rank cases where a deNovo peptide scores just higher than the database peptide, a decoy cut-off is calculated. This functionality can be turned off. Turning it off results in an underestimated suitability, but it can solve problems such as different search engines or too few decoy hits.

Parameters can be set using the functionality of DefaultParamHandler. Parameters are:
no_rerank - re-ranking can be turned off with this
reranking_cutoff_percentile - percentile that determines which cut-off will be returned
FDR - q-value that should be filtered for

Preliminary tests have shown that database suitability is rather stable across common FDR thresholds from 0 - 5 %.

Since q-values need to be calculated the identifications are taken by copy.

Result is appended to the result member. This allows for multiple usage.

Parameters
pep_ids  vector containing pepIDs with target/decoy annotation coming from a deNovo+database identification search (currently only Comet is supported) without FDR; the vector is modified internally and is thus copied
Exceptions
MissingInformation  if no target/decoy annotation is found
MissingInformation  if no xcorr is found
Precondition  if a q-value is found in the input
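
The counting and re-ranking described for compute() can be illustrated with a simplified, self-contained sketch. It is not the OpenMS implementation: the types TopHitSummary and Counts and the function countTopHits are hypothetical, FDR filtering is left out, and treating a gap equal to the cut-off as "small enough" is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-spectrum summary; not an OpenMS type.
struct TopHitSummary
{
  bool top_is_novo;       // the best-scoring hit is a deNovo hit
  bool has_db_hit_below;  // a database hit exists below it (only used if top_is_novo)
  double score_gap;       // deNovo score minus best database score
};

// Mirrors the counters documented in SuitabilityData.
struct Counts
{
  std::size_t num_top_db = 0;
  std::size_t num_top_novo = 0;
  std::size_t num_interest = 0;
  std::size_t num_re_ranked = 0;
};

Counts countTopHits(const std::vector<TopHitSummary>& spectra, double cut_off, bool no_rerank)
{
  assert(cut_off >= 0.0);
  Counts c;
  for (const TopHitSummary& s : spectra)
  {
    if (!s.top_is_novo) { ++c.num_top_db; continue; }
    if (!s.has_db_hit_below) { ++c.num_top_novo; continue; }
    ++c.num_interest;  // a deNovo hit scored on top of a database hit
    // Assumption: a gap equal to the cut-off still counts as "small enough".
    if (!no_rerank && s.score_gap <= cut_off)
    {
      ++c.num_top_db;  // re-ranked: counted as a database hit after all
      ++c.num_re_ranked;
    }
    else
    {
      ++c.num_top_novo;
    }
  }
  return c;
}
```

Calling the sketch with no_rerank set to true leaves num_re_ranked at zero and can only lower num_top_db, which matches the underestimation noted above.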

◆ getDecoyCutOff_()

double getDecoyCutOff_ ( const std::vector< PeptideIdentification > &  pep_ids,
double  reranking_cutoff_percentile 
)
private

Calculates an xcorr cut-off based on decoy hits.

Decoy differences of all N pepIDs are calculated. The ((1 - reranking_cutoff_percentile) * N)-th highest difference is returned. It is assumed that this difference accounts for 'reranking_cutoff_percentile' of the re-ranking cases.

Parameters
pep_ids  vector containing the pepIDs
reranking_cutoff_percentile  percentile that determines which cut-off will be returned
Returns
xcorr cut-off
Exceptions
IllegalArgument  if reranking_cutoff_percentile isn't in range [0,1]
IllegalArgument  if reranking_cutoff_percentile is too low for a decoy cut-off to be calculated
MissingInformation  if no more than 20 % of the peptide IDs have two decoys in their top ten peptide hits
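
One way to picture the percentile selection is the following sketch. The function percentileCutOff is hypothetical, and the rounding and clamping of the index are assumptions of this sketch; the exceptions listed above are replaced by plain assertions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenMS function): picks the decoy difference
// at the requested percentile of the ascending-sorted differences.
double percentileCutOff(std::vector<double> decoy_diffs, double percentile)
{
  assert(!decoy_diffs.empty());
  assert(percentile >= 0.0 && percentile <= 1.0);
  std::sort(decoy_diffs.begin(), decoy_diffs.end());
  // Assumption: round to the nearest position and clamp to the last element,
  // so that a percentile of 1 selects the largest difference.
  std::size_t index = static_cast<std::size_t>(
      std::llround(percentile * static_cast<double>(decoy_diffs.size())));
  index = std::min(index, decoy_diffs.size() - 1);
  return decoy_diffs[index];
}
```

For four differences and a percentile of 0.5, the sketch returns the second highest value, in line with the (1 - 0.5) * 4 = 2 rule stated above.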

◆ getDecoyDiff_()

double getDecoyDiff_ ( const PeptideIdentification & pep_id )
private

Calculates the xcorr difference between the top two hits marked as decoy.

Only searches the top ten hits for two decoys. If there aren't two decoys, DBL_MAX is returned.

Parameters
pep_id  pepID from which the decoy difference will be calculated
Returns
xcorr difference
Exceptions
MissingInformation  if no target/decoy annotation is found
MissingInformation  if no xcorr is found
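
The top-ten decoy search can be sketched with plain types as follows. The struct Hit and the function decoyDiff are hypothetical stand-ins rather than OpenMS code, hits are assumed to be ordered best-first, and any normalization of the difference is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a peptide hit; not the OpenMS PeptideHit class.
struct Hit
{
  double xcorr;
  bool is_decoy;
};

// Returns the xcorr gap between the two best decoys among the first ten hits,
// or DBL_MAX if fewer than two decoys are found there.
double decoyDiff(const std::vector<Hit>& hits)
{
  // Assumption: hits are ordered best-first (descending xcorr).
  assert(std::is_sorted(hits.begin(), hits.end(),
                        [](const Hit& a, const Hit& b) { return a.xcorr > b.xcorr; }));

  const std::size_t limit = std::min<std::size_t>(hits.size(), 10);
  bool have_first = false;
  double first_xcorr = 0.0;
  for (std::size_t i = 0; i < limit; ++i)
  {
    if (!hits[i].is_decoy) continue;
    if (!have_first)
    {
      first_xcorr = hits[i].xcorr;  // best decoy
      have_first = true;
    }
    else
    {
      return first_xcorr - hits[i].xcorr;  // gap to the second-best decoy
    }
  }
  return DBL_MAX;
}
```

Returning DBL_MAX as a sentinel lets a caller skip spectra without two decoys by comparing against that value.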

◆ getResults()

const std::vector<SuitabilityData>& getResults ( ) const

Returns results calculated by this metric.

The returned vector contains one SuitabilityData object for each time compute was called. Each of these objects contains the suitability information that was extracted from the identifications used for the corresponding call of compute.

Returns
SuitabilityData objects in a vector

◆ isNovoHit_()

bool isNovoHit_ ( const PeptideHit & hit )
private

Tests if a PeptideHit is considered a deNovo hit.

To test this the function looks into the protein accessions. If only the deNovo protein is found, 'true' is returned. If at least one database protein is found, 'false' is returned.

Parameters
hit  PepHit in question
Returns
true/false
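
The accession test can be illustrated with a small sketch over plain strings. The function isNovoHit and the marker value used in the usage note are hypothetical; how the deNovo protein is actually recognized is not stated here, so a substring match on a caller-supplied marker is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenMS function): a hit counts as deNovo only
// if none of its protein accessions refers to a database protein.
bool isNovoHit(const std::vector<std::string>& accessions, const std::string& novo_marker)
{
  assert(!novo_marker.empty());
  for (const std::string& accession : accessions)
  {
    // Assumption: the deNovo entry is recognized by a marker substring.
    if (accession.find(novo_marker) == std::string::npos)
    {
      return false;  // at least one database protein was found
    }
  }
  // Note: in this sketch an empty accession list also yields true.
  return true;
}
```

For example, with the made-up marker "DENOVO_ENTRY", a hit that maps to both "DENOVO_ENTRY" and a database accession is treated as a database hit.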

◆ passesFDR_()

bool passesFDR_ ( const PeptideHit & hit,
double  FDR 
)
private

Tests if a PeptideHit has a lower q-value than the given FDR threshold, i.e. passes FDR.

The q-value is looked up both at score level and at meta-value level.

Parameters
hit  PepHit in question
FDR  FDR threshold to check against
Returns
true/false

Member Data Documentation

◆ results_

std::vector<SuitabilityData> results_
private

result vector