OpenMS
2.7.0
This class holds the functionality of calculating the database suitability. More...
#include <OpenMS/QC/DBSuitability.h>
Classes | |
struct | SuitabilityData |
struct to store results More... | |
Public Member Functions | |
DBSuitability () | |
~DBSuitability ()=default | |
Destructor. More... | |
void | compute (std::vector< PeptideIdentification > pep_ids) |
Computes suitability of a database used to search a mzML. More... | |
const std::vector< SuitabilityData > & | getResults () const |
Returns results calculated by this metric. More... | |
Public Member Functions inherited from DefaultParamHandler | |
DefaultParamHandler (const String &name) | |
Constructor with name that is displayed in error messages. More... | |
DefaultParamHandler (const DefaultParamHandler &rhs) | |
Copy constructor. More... | |
virtual | ~DefaultParamHandler () |
Destructor. More... | |
virtual DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
Assignment operator. More... | |
virtual bool | operator== (const DefaultParamHandler &rhs) const |
Equality operator. More... | |
void | setParameters (const Param &param) |
Sets the parameters. More... | |
const Param & | getParameters () const |
Non-mutable access to the parameters. More... | |
const Param & | getDefaults () const |
Non-mutable access to the default parameters. More... | |
const String & | getName () const |
Non-mutable access to the name. More... | |
void | setName (const String &name) |
Mutable access to the name. More... | |
const std::vector< String > & | getSubsections () const |
Non-mutable access to the registered subsections. More... | |
Private Member Functions | |
double | getDecoyDiff_ (const PeptideIdentification &pep_id) |
Calculates the xcorr difference between the top two hits marked as decoy. More... | |
double | getDecoyCutOff_ (const std::vector< PeptideIdentification > &pep_ids, double reranking_cutoff_percentile) |
Calculates an xcorr cut-off based on decoy hits. More... | |
bool | isNovoHit_ (const PeptideHit &hit) |
Tests if a PeptideHit is considered a deNovo hit. More... | |
bool | passesFDR_ (const PeptideHit &hit, double FDR) |
Tests if a PeptideHit has a lower q-value than the given FDR threshold, i.e. passes FDR. More... | |
Private Attributes | |
std::vector< SuitabilityData > | results_ |
result vector More... | |
Additional Inherited Members | |
Static Public Member Functions inherited from DefaultParamHandler | |
static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &prefix="") |
Writes all parameters to meta values. More... | |
Protected Member Functions inherited from DefaultParamHandler | |
virtual void | updateMembers_ () |
This method is used to update extra member variables at the end of the setParameters() method. More... | |
void | defaultsToParam_ () |
Updates the parameters after the defaults have been set in the constructor. More... | |
Protected Attributes inherited from DefaultParamHandler | |
Param | param_ |
Container for current parameters. More... | |
Param | defaults_ |
Container for default parameters. This member should be filled in the constructor of derived classes! More... | |
std::vector< String > | subsections_ |
Container for registered subsections. This member should be filled in the constructor of derived classes! More... | |
String | error_name_ |
Name that is displayed in error messages during the parameter checking. More... | |
bool | check_defaults_ |
If this member is set to false, no parameter checking is done. More... | |
bool | warn_empty_defaults_ |
If this member is set to false, no warning is emitted when defaults are empty. More... | |
This class holds the functionality of calculating the database suitability.
To calculate the suitability of a database for the identification search of a specific mzML, it is vital to perform a combined deNovo+database identification search. This means that the database must be extended with an additional entry derived from the concatenated deNovo sequences of that mzML. Currently, only the Comet search engine is supported.
This class will calculate q-values by itself and will throw an error if any q-value calculation was done beforehand.
The algorithm parameters can be set using setParameters(), which is inherited from DefaultParamHandler.
The compute function can be called multiple times. The result of each call is stored internally in a vector, so old results are not overwritten by a new call. This vector can be retrieved using getResults().
This class serves as the library representation of DatabaseSuitability.
struct OpenMS::DBSuitability::SuitabilityData |
struct to store results
Class Members
Type | Member | Description |
---|---|---|
double | cut_off | the cut-off that was used to determine when a score difference was "small enough"; this value is normalized by mw |
Size | num_interest | number of times a deNovo hit scored on top of a database hit |
Size | num_re_ranked | number of times a deNovo hit scored on top of a database hit, but their score difference was small enough that it was still counted as a database hit |
Size | num_top_db | number of times the top hit is considered to be a database hit |
Size | num_top_novo | number of times the top hit is considered to be a deNovo hit |
double | suitability | the suitability of the database used for the identification search, calculated as #db_hits / (#db_hits + #deNovo_hits). It ranges from 0 (the database was not suited at all) to 1 (the perfect database was used). Preliminary tests have shown that databases of the right organism or closely related organisms score around 0.9 to 0.95, organisms from the same class can still score around 0.8, organisms from the same phylum score around 0.5 to 0.6, and beyond that the suitability quickly falls to 0.15 or even 0.05. Note that these tests were performed for only one mzML, so your results might differ. |
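The ratio described for the suitability member can be illustrated with a small standalone sketch. It does not use OpenMS: `Counts` and `suitability()` are hypothetical names introduced here, and returning 0 when nothing was counted is an assumption of the sketch, not documented behaviour.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two counters of SuitabilityData that enter the formula.
struct Counts
{
  std::size_t num_top_db = 0;   // top hits counted as database hits
  std::size_t num_top_novo = 0; // top hits counted as deNovo hits
};

// suitability = #db_hits / (#db_hits + #deNovo_hits), a value in [0, 1].
// Assumption of this sketch: 0 is returned when no hit was counted at all.
double suitability(const Counts& c)
{
  const std::size_t total = c.num_top_db + c.num_top_novo;
  if (total == 0)
  {
    return 0.0;
  }
  assert(c.num_top_db <= total);
  return static_cast<double>(c.num_top_db) / static_cast<double>(total);
}
```

With three database hits and one deNovo hit, the sketch yields 3 / (3 + 1) = 0.75.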
DBSuitability | ( | ) |
Constructor. Settings are initialized with their default values: no_rerank = false, reranking_cutoff_percentile = 1, FDR = 0.01.
~DBSuitability () | default |
Destructor.
void compute | ( | std::vector< PeptideIdentification > | pep_ids | ) |
Computes suitability of a database used to search a mzML.
Counts top deNovo and top database hits. The ratio of database hits to all hits yields the suitability. A decoy cut-off is calculated to re-rank cases where a deNovo peptide scores just higher than the database peptide. This functionality can be turned off, which results in an underestimated suitability but can solve problems such as different search engines or too few decoy hits.
Parameters can be set using the functionality of DefaultParamHandler. Parameters are:
no_rerank | turns re-ranking off |
reranking_cutoff_percentile | percentile that determines which cut-off will be returned |
FDR | q-value threshold that the hits are filtered for |
Preliminary tests have shown that database suitability is rather stable across common FDR thresholds from 0 to 5 %.
Since q-values need to be calculated the identifications are taken by copy.
Result is appended to the result member. This allows for multiple usage.
pep_ids | vector containing pepIDs with target/decoy annotation coming from a deNovo+database identification search without FDR (currently only Comet is supported); the vector is modified internally and is therefore copied |
MissingInformation | if no target/decoy annotation is found |
MissingInformation | if no xcorr is found |
Precondition | if a q-value is found in the input |
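The counting and re-ranking described above can be sketched as follows. This is a simplified standalone illustration, not the OpenMS implementation: `TopHits`, `Tally` and `countTopHits()` are hypothetical, FDR filtering is assumed to have happened already, and both the inclusive comparison against the cut-off and the counting of num_interest when re-ranking is disabled are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified view of one spectrum after FDR filtering.
struct TopHits
{
  bool top_is_novo;      // the top-scoring hit maps only to the deNovo entry
  bool has_db_runner_up; // a database hit exists below the deNovo top hit
  double score_diff;     // xcorr(top deNovo hit) - xcorr(best database hit)
};

// Hypothetical counters mirroring the members of SuitabilityData.
struct Tally
{
  std::size_t num_top_db = 0;
  std::size_t num_top_novo = 0;
  std::size_t num_interest = 0;
  std::size_t num_re_ranked = 0;
};

Tally countTopHits(const std::vector<TopHits>& spectra, double cut_off, bool no_rerank)
{
  assert(cut_off >= 0.0);
  Tally t;
  for (const TopHits& s : spectra)
  {
    if (!s.top_is_novo) // plain database top hit
    {
      ++t.num_top_db;
      continue;
    }
    if (!s.has_db_runner_up) // nothing to re-rank against
    {
      ++t.num_top_novo;
      continue;
    }
    ++t.num_interest; // a deNovo hit scored on top of a database hit
    if (!no_rerank && s.score_diff <= cut_off)
    {
      // difference is "small enough": still counted as a database hit
      ++t.num_top_db;
      ++t.num_re_ranked;
    }
    else
    {
      ++t.num_top_novo;
    }
  }
  return t;
}
```

Disabling re-ranking in this sketch moves every re-ranked case back to the deNovo count, which is why the suitability is then underestimated.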
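double getDecoyCutOff_ (const std::vector< PeptideIdentification > &pep_ids, double reranking_cutoff_percentile) | private |
Calculates an xcorr cut-off based on decoy hits.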
The decoy differences of all N pepIDs are calculated and ranked from highest to lowest. The difference at rank (1 - reranking_cutoff_percentile) * N is returned. It is assumed that this difference accounts for 'reranking_cutoff_percentile' of the re-ranking cases.
pep_ids | vector containing the pepIDs |
reranking_cutoff_percentile | percentile that determines which cut-off will be returned |
IllegalArgument | if reranking_cutoff_percentile isn't in range [0,1] |
IllegalArgument | if reranking_cutoff_percentile is too low for a decoy cut-off to be calculated |
MissingInformation | if no more than 20 % of the peptide IDs have two decoys in their top ten peptide hits |
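The percentile selection and its error conditions can be sketched without OpenMS as shown here. `decoyCutOff()` is a hypothetical helper; the standard exceptions stand in for IllegalArgument and MissingInformation, and taking N as the number of usable differences with the rank rounded down are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// decoy_diffs holds one entry per peptide ID; DBL_MAX marks "fewer than two decoys found".
double decoyCutOff(const std::vector<double>& decoy_diffs, double percentile)
{
  if (percentile < 0.0 || percentile > 1.0)
  {
    throw std::invalid_argument("percentile must be in [0,1]");
  }

  // keep only the peptide IDs for which a decoy difference exists
  std::vector<double> diffs;
  for (double d : decoy_diffs)
  {
    if (d < DBL_MAX) diffs.push_back(d);
  }

  // no more than 20 % usable peptide IDs: too little information
  if (decoy_diffs.empty() ||
      static_cast<double>(diffs.size()) / static_cast<double>(decoy_diffs.size()) <= 0.2)
  {
    throw std::runtime_error("too few peptide IDs with two decoy hits");
  }
  assert(!diffs.empty());

  // zero-based rank (1 - percentile) * N among the differences, highest first
  const std::size_t n = diffs.size();
  const std::size_t rank =
    static_cast<std::size_t>(std::floor((1.0 - percentile) * static_cast<double>(n)));
  if (rank >= n)
  {
    throw std::invalid_argument("percentile too low for a decoy cut-off");
  }

  std::sort(diffs.begin(), diffs.end(), std::greater<double>());
  return diffs[rank];
}
```

With a percentile of 1 the largest decoy difference is returned, so every observed decoy difference lies at or below the cut-off.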
double getDecoyDiff_ (const PeptideIdentification &pep_id) | private |
Calculates the xcorr difference between the top two hits marked as decoy.
Only searches the top ten hits for two decoys. If there aren't two decoys, DBL_MAX is returned.
pep_id | pepID from where the decoy difference will be calculated |
MissingInformation | if no target/decoy annotation is found |
MissingInformation | if no xcorr is found |
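A minimal standalone sketch of the top-ten decoy search might look like this. `ScoredHit` and `decoyDiff()` are hypothetical; the hits are assumed to be sorted by descending xcorr, and any normalization of the difference is left out.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a peptide hit: its xcorr and its target/decoy annotation.
struct ScoredHit
{
  double xcorr;
  bool is_decoy;
};

// Assumes hits are sorted by descending xcorr. Only the top ten hits are inspected.
double decoyDiff(const std::vector<ScoredHit>& hits)
{
  const std::size_t limit = hits.size() < 10 ? hits.size() : 10;
  bool have_first = false;
  double first = 0.0;
  for (std::size_t i = 0; i < limit; ++i)
  {
    if (!hits[i].is_decoy) continue;
    if (!have_first)
    {
      first = hits[i].xcorr; // best-scoring decoy
      have_first = true;
    }
    else
    {
      assert(first >= hits[i].xcorr); // relies on the sorting assumption
      return first - hits[i].xcorr;   // difference to the second-best decoy
    }
  }
  return DBL_MAX; // fewer than two decoys among the top ten hits
}
```

Returning DBL_MAX as a sentinel lets the caller drop such peptide IDs with a single comparison.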
const std::vector<SuitabilityData>& getResults | ( | ) | const |
Returns results calculated by this metric.
The returned vector contains one SuitabilityData object for each time compute was called. Each of these objects contains the suitability information that was extracted from the identifications used for the corresponding call of compute.
bool isNovoHit_ (const PeptideHit &hit) | private |
Tests if a PeptideHit is considered a deNovo hit.
To test this, the function inspects the protein accessions of the hit. If only the deNovo protein is found, 'true' is returned. If at least one database protein is found, 'false' is returned.
hit | PepHit in question |
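The accession check can be sketched in a few lines of standalone C++. `isNovoHit()` and the marker string `DENOVO_ENTRY` are hypothetical placeholders for however the deNovo entry is named in the searched database.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical marker identifying the concatenated deNovo entry in the database.
const std::string kNovoMarker = "DENOVO_ENTRY";

// Returns false as soon as one accession belongs to a regular database protein.
// Side effect of this sketch: an empty accession list is treated as a deNovo hit.
bool isNovoHit(const std::vector<std::string>& accessions)
{
  for (const std::string& acc : accessions)
  {
    if (acc.find(kNovoMarker) == std::string::npos)
    {
      return false; // at least one database protein found
    }
  }
  return true; // only the deNovo entry was found
}
```

A peptide that maps to both the deNovo entry and a database protein therefore counts as a database hit.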
bool passesFDR_ (const PeptideHit &hit, double FDR) | private |
Tests if a PeptideHit has a lower q-value than the given FDR threshold, i.e. passes FDR.
Q-value is searched at score and at meta-value level.
hit | PepHit in question |
FDR | FDR threshold to check against |
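The two-level q-value lookup can be illustrated with a standalone sketch. `QHit`, `passesFDR()` and the key `"q-value"` are hypothetical; the strict comparison follows the wording "lower q-value", and throwing when no q-value exists is an assumption.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a peptide hit with a main score and named meta values.
struct QHit
{
  std::string score_type;             // name of the main score, e.g. "q-value" or "xcorr"
  double score;                       // value of the main score
  std::map<std::string, double> meta; // additional values stored with the hit
};

bool passesFDR(const QHit& hit, double fdr)
{
  assert(fdr >= 0.0 && fdr <= 1.0);
  // 1) score level: the main score itself is the q-value
  if (hit.score_type == "q-value")
  {
    return hit.score < fdr;
  }
  // 2) meta-value level: the q-value was stored next to another main score
  const auto it = hit.meta.find("q-value");
  if (it == hit.meta.end())
  {
    throw std::runtime_error("no q-value found for this hit");
  }
  return it->second < fdr;
}
```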
std::vector< SuitabilityData > results_ | private |
result vector