OpenMS  2.4.0
FalseDiscoveryRate Class Reference

Calculates an FDR from identifications.

#include <OpenMS/ANALYSIS/ID/FalseDiscoveryRate.h>

Inheritance diagram for FalseDiscoveryRate:
DefaultParamHandler

Classes

struct  FalseFunctor
 
struct  GetLabelFunctor
 
struct  TrueFunctor
 

Public Member Functions

 FalseDiscoveryRate ()
 Default constructor.

void apply (std::vector< PeptideIdentification > &fwd_ids, std::vector< PeptideIdentification > &rev_ids) const
 Calculates the FDR of two runs, a forward run and a decoy run on peptide level.

void apply (std::vector< PeptideIdentification > &id) const
 Calculates the FDR of one run from a concatenated sequence db search.

void apply (std::vector< ProteinIdentification > &fwd_ids, std::vector< ProteinIdentification > &rev_ids) const
 Calculates the FDR of two runs, a forward run and decoy run on protein level.

void apply (std::vector< ProteinIdentification > &ids) const
 Calculate the FDR of one run from a concatenated sequence db search.

void applyEstimated (std::vector< ProteinIdentification > &ids) const
 Calculate the FDR based on PEPs or PPs (if present) and modifies the IDs in place.

double applyEvaluateProteinIDs (const std::vector< ProteinIdentification > &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2)
 Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

double applyEvaluateProteinIDs (const ProteinIdentification &ids, double pepCutoff=1.0, UInt fpCutoff=50, double diffWeight=0.2)

void applyBasic (std::vector< PeptideIdentification > &ids)
 Simpler reimplementation of the apply function above.
 
void applyBasic (ProteinIdentification &id, bool groups_too=true)
 
double rocN (const std::vector< PeptideIdentification > &ids, Size fp_cutoff) const
 
Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.

 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.

virtual ~DefaultParamHandler ()
 Destructor.

virtual DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.

virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.

void setParameters (const Param &param)
 Sets the parameters.

const Param & getParameters () const
 Non-mutable access to the parameters.

const Param & getDefaults () const
 Non-mutable access to the default parameters.

const String & getName () const
 Non-mutable access to the name.

void setName (const String &name)
 Mutable access to the name.

const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Private Member Functions

 FalseDiscoveryRate (const FalseDiscoveryRate &)
 Not implemented.

FalseDiscoveryRate & operator= (const FalseDiscoveryRate &)
 Not implemented.
 
void getScores_ (std::vector< std::pair< double, bool >> &scores_labels, const ProteinIdentification &id) const
 
void getScores_ (std::vector< std::pair< double, bool >> &scores_labels, const std::vector< ProteinIdentification::ProteinGroup > &grps, const std::unordered_set< std::string > &decoy_accs) const
 
void getScores_ (std::vector< std::pair< double, bool >> &scores_labels, const std::vector< PeptideIdentification > &ids, bool all_hits, int charge, String identifier) const
 
void getScores_ (std::vector< std::pair< double, bool >> &scores_labels, const std::vector< PeptideIdentification > &targets, const std::vector< PeptideIdentification > &decoys, bool all_hits, int charge, const String &identifier) const
 
void setScores_ (const std::map< double, double > &scores_to_FDR, std::vector< PeptideIdentification > &id, const std::string &score_type, bool higher_better) const
 
template<typename IDType >
void setScores_ (const std::map< double, double > &scores_to_FDR, IDType &id, const std::string &score_type, bool higher_better) const
 
void setScores_ (const std::map< double, double > &scores_to_FDR, std::vector< ProteinIdentification::ProteinGroup > &grps, const std::string &score_type, bool higher_better) const
 
template<typename IDType >
void checkTDAnnotation_ (const IDType &id) const
 
template<typename HitType >
std::pair< double, bool > getScoreLabel_ (const HitType &hit, std::function< bool(const HitType &)> fun) const
 
void calculateFDRs_ (Map< double, double > &score_to_fdr, std::vector< double > &target_scores, std::vector< double > &decoy_scores, bool q_value, bool higher_score_better) const
 calculates the FDR given two vectors of scores and fills the score_to_fdr map for lookup
 
void calculateEstimatedQVal_ (std::map< double, double > &scores_to_FDR, std::vector< std::pair< double, bool >> &scores_labels, bool higher_score_better) const
 
void calculateFDRBasic_ (std::map< double, double > &scores_to_FDR, std::vector< std::pair< double, bool >> &scores_labels, bool qvalue, bool higher_score_better)
 calculates the FDR with a basic and faster algorithm
 
double diffEstimatedEmpirical_ (const std::vector< std::pair< double, bool >> &scores_labels, double pepCutoff=1.0)
 calculates the area of the difference between estimated and empirical FDR on the fly. Does not store results.
 
double rocN_ (std::vector< std::pair< double, bool >> const &scores_labels, Size fpCutoff=50) const
 
double trapezoidal_area_xEqy (double exp1, double exp2, double act1, double act2) const
 
double trapezoidal_area (double x1, double x2, double y1, double y2) const
 calculates the trapezoidal area for a trapezoid with a flat horizontal base e.g. for an AUC
 

Additional Inherited Members

Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.

void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.

Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.

Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!

String error_name_
 Name that is displayed in error messages during the parameter checking.

bool check_defaults_
 If this member is set to false, no parameter checking is done.

bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when the defaults are empty.
 

Detailed Description

Calculates an FDR from identifications.

Either two separate runs (forward and decoy database identification) or one run containing both (with hits marked as target or decoy) can be used to annotate each peptide hit with an FDR.

q-values can also be reported instead of p-values. q-values are essentially adjusted p-values; they also range from 0 to 1, and lower values are preferable. In a list of hits ordered by q-value, a hit with a q-value of x means that about x*100 percent of all hits with a q-value <= x are expected to be false positives.
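
As a rough illustration of how q-values relate to FDRs, the following self-contained sketch converts a list of FDRs, ordered from the best-scoring hit to the worst, into q-values by taking the smallest FDR at the hit's own rank or at any more permissive rank. The function name fdrToQValues is hypothetical and not part of OpenMS; this is the usual textbook construction and not necessarily what the class does internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenMS): converts FDRs, listed from the
// best-scoring hit to the worst, into q-values. The q-value of a hit is the
// smallest FDR found at its own rank or at any later (more permissive) rank.
std::vector<double> fdrToQValues(const std::vector<double>& fdrs)
{
  for (double f : fdrs) { assert(f >= 0.0); }  // FDRs cannot be negative

  std::vector<double> q(fdrs);
  // Walk from the worst hit back to the best, carrying the running minimum.
  for (std::size_t i = q.size(); i > 1; --i)
  {
    q[i - 2] = std::min(q[i - 2], q[i - 1]);
  }
  return q;
}
```

The result is non-decreasing from best to worst hit, which is what allows a single q-value cutoff to define an accepted set of hits.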

Todo:
implement combined searches properly (Andreas)
Improvement:
implement charge state separated fdr/q-values (Andreas)
Parameters of this class are:

Name | Type | Default | Restrictions | Description
no_qvalues | string | false | true, false | If 'true' strict FDRs will be calculated instead of q-values (the default)
use_all_hits | string | false | true, false | If 'true' not only the first hit, but all are used (peptides only)
split_charge_variants | string | false | true, false | If 'true' charge variants are treated separately (for peptides of combined target/decoy searches only).
treat_runs_separately | string | false | true, false | If 'true' different search runs are treated separately (for peptides of combined target/decoy searches only).
add_decoy_peptides | string | false | true, false | If 'true' decoy peptides will be written to output file, too. The q-value is set to the closest target score.
add_decoy_proteins | string | false | true, false | If 'true' decoy proteins will be written to output file, too. The q-value is set to the closest target score.


Constructor & Destructor Documentation

◆ FalseDiscoveryRate() [1/2]

Default constructor.

◆ FalseDiscoveryRate() [2/2]

FalseDiscoveryRate ( const FalseDiscoveryRate & )
private

Not implemented.

Member Function Documentation

◆ apply() [1/4]

void apply ( std::vector< PeptideIdentification > &  fwd_ids,
std::vector< PeptideIdentification > &  rev_ids 
) const

Calculates the FDR of two runs, a forward run and a decoy run on peptide level.

Parameters
fwd_ids  forward peptide identifications
rev_ids  reverse peptide identifications

Referenced by RNPxlSearch::main_().

◆ apply() [2/4]

void apply ( std::vector< PeptideIdentification > &  id) const

Calculates the FDR of one run from a concatenated sequence db search.

Parameters
id  peptide identifications, containing target and decoy hits

◆ apply() [3/4]

void apply ( std::vector< ProteinIdentification > &  fwd_ids,
std::vector< ProteinIdentification > &  rev_ids 
) const

Calculates the FDR of two runs, a forward run and decoy run on protein level.

Parameters
fwd_ids  forward protein identifications
rev_ids  reverse protein identifications

◆ apply() [4/4]

void apply ( std::vector< ProteinIdentification > &  ids) const

Calculate the FDR of one run from a concatenated sequence db search.

Parameters
ids  protein identifications, containing target and decoy hits

◆ applyBasic() [1/2]

void applyBasic ( std::vector< PeptideIdentification > &  ids)

Simpler reimplementation of the apply function above.

Referenced by TOPPBayesianProteinInference::main_(), and UTILProteomicsLFQ::main_().

◆ applyBasic() [2/2]

void applyBasic ( ProteinIdentification &  id,
bool  groups_too = true 
)

◆ applyEstimated()

void applyEstimated ( std::vector< ProteinIdentification > &  ids) const

Calculate the FDR based on PEPs or PPs (if present) and modifies the IDs in place.

Parameters
ids  protein identifications containing PEP scores (not necessarily annotated with target/decoy)

◆ applyEvaluateProteinIDs() [1/2]

double applyEvaluateProteinIDs ( const std::vector< ProteinIdentification > &  ids,
double  pepCutoff = 1.0,
UInt  fpCutoff = 50,
double  diffWeight = 0.2 
)

Calculate a linear combination of the area of the difference in estimated vs. empirical (TD) FDR and the ROC-N value (AUC up to first N false positives).

Parameters
ids  protein identifications containing PEP scores, annotated with target/decoy. If a vector is given, only the first element is evaluated.
pepCutoff  PEP up to which the differences between the two FDRs are calculated
fpCutoff  number of false positives up to which the target-decoy AUC is evaluated
diffWeight  weight given to the difference. The ROC-N value gets 1 - this weight.
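
To make the weighting concrete, here is a minimal sketch of a weighted sum that gives the difference term the weight diffWeight and the ROC-N value the weight 1 - diffWeight. The name combineScoresSketch and the parameter diff_term are hypothetical; how the difference area is turned into the term that enters the sum is not stated here, so this only illustrates the weights and is not the OpenMS implementation.

```cpp
#include <cassert>

// Hypothetical sketch (not part of OpenMS): weighted sum of a term derived
// from the estimated-vs-empirical FDR difference and the ROC-N value.
// The difference term gets diff_weight, the ROC-N value gets 1 - diff_weight.
double combineScoresSketch(double diff_term, double roc_n, double diff_weight = 0.2)
{
  assert(diff_weight >= 0.0 && diff_weight <= 1.0);  // weights must sum to 1
  return diff_weight * diff_term + (1.0 - diff_weight) * roc_n;
}
```

With the default weight of 0.2, the ROC-N value contributes four times as much to the result as the difference term.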

◆ applyEvaluateProteinIDs() [2/2]

double applyEvaluateProteinIDs ( const ProteinIdentification &  ids,
double  pepCutoff = 1.0,
UInt  fpCutoff = 50,
double  diffWeight = 0.2 
)

◆ calculateEstimatedQVal_()

void calculateEstimatedQVal_ ( std::map< double, double > &  scores_to_FDR,
std::vector< std::pair< double, bool >> &  scores_labels,
bool  higher_score_better 
) const
private

calculates an estimated FDR (based on P(E)Ps) given a vector of (score, label) pairs and fills the scores_to_FDR map for lookup
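
One common way to estimate an FDR from posterior error probabilities, shown here as a self-contained sketch, is to take the mean PEP of all hits accepted so far. The function name estimatedFDRFromPEPs is hypothetical, and the sketch assumes plain PEPs as input; it is not taken from the OpenMS sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the OpenMS implementation): the estimated FDR among
// the k most confident hits is taken to be the mean of their PEPs.
// Returns one value per hit, ordered by ascending PEP.
std::vector<double> estimatedFDRFromPEPs(std::vector<double> peps)
{
  for (double p : peps) { assert(p >= 0.0 && p <= 1.0); }  // PEPs are probabilities
  std::sort(peps.begin(), peps.end());  // lowest PEP (most confident hit) first

  std::vector<double> est_fdr;
  est_fdr.reserve(peps.size());
  double pep_sum = 0.0;
  for (std::size_t k = 0; k < peps.size(); ++k)
  {
    pep_sum += peps[k];
    est_fdr.push_back(pep_sum / static_cast<double>(k + 1));
  }
  return est_fdr;
}
```

Because the PEPs are sorted in ascending order, the running mean never decreases, so the values can be used directly as q-value-like cutoffs.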

◆ calculateFDRBasic_()

void calculateFDRBasic_ ( std::map< double, double > &  scores_to_FDR,
std::vector< std::pair< double, bool >> &  scores_labels,
bool  qvalue,
bool  higher_score_better 
)
private

calculates the FDR with a basic and faster algorithm
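
A basic target-decoy FDR calculation can be sketched as follows: sort the hits from best to worst score and, at each score threshold, divide the number of decoys seen so far by the number of targets seen so far. The function name basicTargetDecoyFDR, the convention that true marks a target, and the cap at 1.0 are assumptions made for this illustration; the sketch is not the OpenMS implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch (not the OpenMS implementation): maps each distinct score
// to (#decoys so far) / (#targets so far), walking hits from best to worst.
// Assumption: the bool in each pair is true for a target, false for a decoy.
std::map<double, double> basicTargetDecoyFDR(std::vector<std::pair<double, bool>> scores_labels,
                                             bool higher_score_better)
{
  // Sort so that the best score comes first.
  std::sort(scores_labels.begin(), scores_labels.end(),
            [higher_score_better](const std::pair<double, bool>& a, const std::pair<double, bool>& b)
            { return higher_score_better ? a.first > b.first : a.first < b.first; });

  std::map<double, double> scores_to_fdr;
  std::size_t targets = 0, decoys = 0;
  for (std::size_t i = 0; i < scores_labels.size(); ++i)
  {
    if (scores_labels[i].second) { ++targets; } else { ++decoys; }
    // Record the FDR only once all hits sharing this score have been counted.
    const bool last_of_score = (i + 1 == scores_labels.size()) ||
                               (scores_labels[i + 1].first != scores_labels[i].first);
    if (last_of_score)
    {
      const double fdr = (targets > 0) ? static_cast<double>(decoys) / static_cast<double>(targets) : 1.0;
      scores_to_fdr[scores_labels[i].first] = std::min(1.0, fdr);  // cap at 1 (assumption)
    }
  }
  assert(targets + decoys == scores_labels.size());  // every hit counted exactly once
  return scores_to_fdr;
}
```

The values produced this way are raw FDRs and need not be monotonic in the score; a q-value step would take the running minimum from the worst hit upwards.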

◆ calculateFDRs_()

void calculateFDRs_ ( Map< double, double > &  score_to_fdr,
std::vector< double > &  target_scores,
std::vector< double > &  decoy_scores,
bool  q_value,
bool  higher_score_better 
) const
private

calculates the FDR given two vectors of scores and fills the score_to_fdr map for lookup

◆ checkTDAnnotation_()

void checkTDAnnotation_ ( const IDType &  id) const
inline private

◆ diffEstimatedEmpirical_()

double diffEstimatedEmpirical_ ( const std::vector< std::pair< double, bool >> &  scores_labels,
double  pepCutoff = 1.0 
)
private

calculates the area of the difference between estimated and empirical FDR on the fly. Does not store results.

◆ getScoreLabel_()

std::pair<double,bool> getScoreLabel_ ( const HitType &  hit,
std::function< bool(const HitType &)>  fun 
) const
inline private

◆ getScores_() [1/4]

void getScores_ ( std::vector< std::pair< double, bool >> &  scores_labels,
const ProteinIdentification &  id 
) const
private

◆ getScores_() [2/4]

void getScores_ ( std::vector< std::pair< double, bool >> &  scores_labels,
const std::vector< ProteinIdentification::ProteinGroup > &  grps,
const std::unordered_set< std::string > &  decoy_accs 
) const
private

◆ getScores_() [3/4]

void getScores_ ( std::vector< std::pair< double, bool >> &  scores_labels,
const std::vector< PeptideIdentification > &  ids,
bool  all_hits,
int  charge,
String  identifier 
) const
private

◆ getScores_() [4/4]

void getScores_ ( std::vector< std::pair< double, bool >> &  scores_labels,
const std::vector< PeptideIdentification > &  targets,
const std::vector< PeptideIdentification > &  decoys,
bool  all_hits,
int  charge,
const String &  identifier 
) const
private

◆ operator=()

FalseDiscoveryRate& operator= ( const FalseDiscoveryRate & )
private

Not implemented.

◆ rocN()

double rocN ( const std::vector< PeptideIdentification > &  ids,
Size  fp_cutoff 
) const

calculates the AUC up to the first fp_cutoff false-positive peptide IDs (currently all runs are taken together). If fp_cutoff = 0, the full AUC is calculated.
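
The idea of an AUC restricted to the first N false positives can be sketched with one common ROC-N definition: sum, over the first N decoys, the number of targets ranked above each decoy, then divide by N times the total number of targets. The function name rocNSketch, the label convention (true = target) and the normalization are assumptions for this illustration and may differ from what rocN() computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not the OpenMS implementation) of a ROC-N value.
// `labels` lists hits from best to worst score; true = target (true positive),
// false = decoy (false positive). fp_cutoff == 0 means "use all decoys".
// ROC_N = (sum of t_i for i = 1..N) / (N * T), where t_i is the number of
// targets ranked above the i-th decoy and T is the total number of targets.
double rocNSketch(const std::vector<bool>& labels, std::size_t fp_cutoff)
{
  std::size_t total_targets = 0, total_decoys = 0;
  for (bool is_target : labels)
  {
    if (is_target) { ++total_targets; } else { ++total_decoys; }
  }
  assert(total_targets > 0 && total_decoys > 0);  // both classes are required

  // Assumption: a cutoff of 0 or beyond the available decoys means "all decoys".
  if (fp_cutoff == 0 || fp_cutoff > total_decoys) { fp_cutoff = total_decoys; }

  std::size_t tp = 0, fp = 0, area = 0;
  for (bool is_target : labels)
  {
    if (is_target)
    {
      ++tp;
    }
    else
    {
      area += tp;  // targets ranked above this decoy
      if (++fp == fp_cutoff) { break; }
    }
  }
  return static_cast<double>(area) /
         (static_cast<double>(fp_cutoff) * static_cast<double>(total_targets));
}
```

A value of 1.0 means that every target is ranked above the first N decoys.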

◆ rocN_()

double rocN_ ( std::vector< std::pair< double, bool >> const &  scores_labels,
Size  fpCutoff = 50 
) const
private

calculates the AUC of the empirical FDR up to the first fpCutoff false positives on the fly. Does not store results. Use e.g. fpCutoff = scores_labels.size() for the complete AUC.

◆ setScores_() [1/3]

void setScores_ ( const std::map< double, double > &  scores_to_FDR,
std::vector< PeptideIdentification > &  id,
const std::string &  score_type,
bool  higher_better 
) const
private

◆ setScores_() [2/3]

void setScores_ ( const std::map< double, double > &  scores_to_FDR,
IDType &  id,
const std::string &  score_type,
bool  higher_better 
) const
inline private

◆ setScores_() [3/3]

void setScores_ ( const std::map< double, double > &  scores_to_FDR,
std::vector< ProteinIdentification::ProteinGroup > &  grps,
const std::string &  score_type,
bool  higher_better 
) const
private

◆ trapezoidal_area()

double trapezoidal_area ( double  x1,
double  x2,
double  y1,
double  y2 
) const
private

calculates the trapezoidal area for a trapezoid with a flat horizontal base e.g. for an AUC
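
For reference, the area of a trapezoid with a flat horizontal base follows from the standard formula, width times the mean of the two heights. The sketch below uses the same argument order as the signature above (x1, x2, y1, y2); the name trapezoidalAreaSketch is hypothetical and the body is the textbook formula, not code copied from OpenMS.

```cpp
#include <cassert>

// Hypothetical sketch: area between the straight line from (x1, y1) to (x2, y2)
// and the x-axis, i.e. a trapezoid with a flat horizontal base.
double trapezoidalAreaSketch(double x1, double x2, double y1, double y2)
{
  assert(x2 >= x1);  // points are expected in ascending x order
  return (x2 - x1) * (y1 + y2) / 2.0;
}
```

Summing this over consecutive points of a curve gives the trapezoidal-rule approximation of its AUC.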

◆ trapezoidal_area_xEqy()

double trapezoidal_area_xEqy ( double  exp1,
double  exp2,
double  act1,
double  act2 
) const
private

calculates the error area around the x = y line between two consecutive values of expected and actual, i.e. it assumes exp2 > exp1
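
One way to compute such an error area is sketched below: treat (exp1, act1) and (exp2, act2) as endpoints of a straight segment and integrate its absolute distance from the diagonal, splitting the segment where it crosses the diagonal. The name areaAroundDiagonalSketch is hypothetical, and the handling of the crossing case is an assumption; the OpenMS function may treat it differently.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch (not the OpenMS implementation): area between the segment
// from (exp1, act1) to (exp2, act2) and the diagonal where actual == expected.
double areaAroundDiagonalSketch(double exp1, double exp2, double act1, double act2)
{
  assert(exp2 >= exp1);  // consecutive expected values in ascending order
  const double width = exp2 - exp1;
  const double d1 = std::fabs(act1 - exp1);  // vertical distance at the left end
  const double d2 = std::fabs(act2 - exp2);  // vertical distance at the right end
  const bool crosses = (act1 - exp1) * (act2 - exp2) < 0.0;
  if (!crosses)
  {
    // Segment stays on one side of the diagonal: a plain trapezoid.
    return width * (d1 + d2) / 2.0;
  }
  // Segment crosses the diagonal: two triangles that meet at the crossing point.
  return width * (d1 * d1 + d2 * d2) / (2.0 * (d1 + d2));
}
```

In the crossing case both distances are strictly positive, so the division is always well defined.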