OpenMS
BasicProteinInferenceAlgorithm Class Reference

Algorithm class that implements simple protein inference by aggregation of peptide scores.

#include <OpenMS/ANALYSIS/ID/BasicProteinInferenceAlgorithm.h>

Public Types

enum class  AggregationMethod { PROD , SUM , BEST }
 The aggregation method.
 
typedef std::unordered_map< std::string, std::map< Int, PeptideHit * > > SequenceToChargeToPSM
 
- Public Types inherited from ProgressLogger
enum  LogType { CMD , GUI , NONE }
 Possible log types.
 

Public Member Functions

 BasicProteinInferenceAlgorithm ()
 Default constructor.
 
void run (std::vector< PeptideIdentification > &pep_ids, std::vector< ProteinIdentification > &prot_ids) const
 
void run (std::vector< PeptideIdentification > &pep_ids, ProteinIdentification &prot_id) const
 
void run (ConsensusMap &cmap, ProteinIdentification &prot_id, bool include_unassigned) const
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.
 
virtual ~DefaultParamHandler ()
 Destructor.
 
DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.
 
void setParameters (const Param &param)
 Sets the parameters.
 
const Param & getParameters () const
 Non-mutable access to the parameters.
 
const Param & getDefaults () const
 Non-mutable access to the default parameters.
 
const String & getName () const
 Non-mutable access to the name.
 
void setName (const String &name)
 Mutable access to the name.
 
const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 
- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor.
 
virtual ~ProgressLogger ()
 Destructor.
 
 ProgressLogger (const ProgressLogger &other)
 Copy constructor.
 
ProgressLogger & operator= (const ProgressLogger &other)
 Assignment Operator.
 
void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE!
 
LogType getLogType () const
 Returns the type of progress log being used.
 
void setLogger (ProgressLoggerImpl *logger)
 Sets the logger to be used for progress logging.
 
void startProgress (SignedSize begin, SignedSize end, const String &label) const
 Initializes the progress display.
 
void setProgress (SignedSize value) const
 Sets the current progress.
 
void endProgress (UInt64 bytes_processed=0) const
 
void nextProgress () const
 Increments the progress by 1 (according to range begin-end).
 

Private Types

typedef double(* fptr) (double, double)
 Function pointer type of the lambda functions used to aggregate scores (as returned by aggFunFromEnum_()).
 

Private Member Functions

void processRun_ (std::unordered_map< std::string, std::pair< ProteinHit *, Size >> &acc_to_protein_hitP_and_count, SequenceToChargeToPSM &best_pep, ProteinIdentification &prot_run, std::vector< PeptideIdentification > &pep_ids) const
 Performs simple aggregation-based inference on one protein run.
 
void aggregatePeptideScores_ (SequenceToChargeToPSM &best_pep, std::vector< PeptideIdentification > &pep_ids, const String &overall_score_type, bool higher_better, const std::string &run_id) const
 Fills and updates the map of best peptide scores best_pep (by sequence or modified sequence, depending on algorithm settings).
 
void updateProteinScores_ (std::unordered_map< std::string, std::pair< ProteinHit *, Size >> &acc_to_protein_hitP_and_count, const SequenceToChargeToPSM &best_pep, bool pep_scores, bool higher_better) const
 Aggregates and updates protein scores based on aggregation settings and aggregated peptide level results in prefilled best_pep.
 
AggregationMethod aggFromString_ (const std::string &method_string) const
 Gets the AggregationMethod enum from a method_string.
 
void checkCompat_ (const String &score_type, const AggregationMethod &aggregation_method) const
 
void checkCompat_ (const IDScoreSwitcherAlgorithm::ScoreType &score_type, const AggregationMethod &aggregation_method) const
 
double getInitScoreForAggMethod_ (const AggregationMethod &aggregation_method, bool higher_better) const
 Gets the initial score value based on the chosen aggregation_method; higher_better is needed for the "best" score.
 
fptr aggFunFromEnum_ (const BasicProteinInferenceAlgorithm::AggregationMethod &agg_method, bool higher_better) const
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values.
 
- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.
 
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!
 
std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!
 
String error_name_
 Name that is displayed in error messages during the parameter checking.
 
bool check_defaults_
 If this member is set to false, no checking of parameters is done.
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 
- Protected Attributes inherited from ProgressLogger
LogType type_
 
time_t last_invoke_
 
ProgressLoggerImpl * current_logger_
 
- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_
 

Detailed Description

Algorithm class that implements simple protein inference by aggregation of peptide scores. It has multiple parameter options, such as the aggregation method, when to distinguish peptidoforms, and whether to use shared peptides ("use_shared_peptides"). First, the best PSM per spectrum is used; then only the best PSM per peptidoform is aggregated. Peptidoforms can optionally be distinguished via the treat_X_separate parameters:

  • Modifications (modified sequence string)
  • Charge states

The algorithm assumes posteriors or posterior error probabilities and converts to posteriors initially. Possible aggregation methods that can be set via the parameter "aggregation_method" are:

  • "maximum" (default)
  • "sum"
  • "product" (ignoring zeroes)

Annotation of the number of peptides used for aggregation can be disabled (see parameters). Supports multiple runs but goes through them one by one, iterating over the full PeptideIdentification vector.
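
The three aggregation methods can be illustrated on plain numbers. The following is a minimal, self-contained sketch and not the OpenMS implementation: the function name aggregatePosteriors and the local enum are hypothetical stand-ins, the scores are assumed to be already converted to posterior probabilities (higher is better), and the score returned for an empty input is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Hypothetical helper: combines the posteriors of the best PSM per peptidoform
// of one protein into a single protein score.
double aggregatePosteriors(const std::vector<double>& posteriors, AggregationMethod method)
{
  if (posteriors.empty()) return 0.0; // assumption: no evidence yields score 0
  switch (method)
  {
    case AggregationMethod::SUM:
    {
      double sum = 0.0;
      for (double p : posteriors) sum += p;
      return sum;
    }
    case AggregationMethod::PROD:
    {
      double prod = 1.0;
      for (double p : posteriors)
      {
        if (p != 0.0) prod *= p; // "product" ignores zeroes
      }
      return prod;
    }
    case AggregationMethod::BEST:
    default:
      // posteriors: higher is better, so "best" is the maximum
      return *std::max_element(posteriors.begin(), posteriors.end());
  }
}
```

For the posteriors 0.5, 0.0 and 0.25, the sketch yields 0.5 for "maximum", 0.75 for "sum" and 0.125 for "product", since the zero is skipped in the product.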

Member Typedef Documentation

◆ fptr

typedef double(* fptr) (double, double)
private

Function pointer type of the lambda functions used to aggregate scores (as returned by aggFunFromEnum_()).

◆ SequenceToChargeToPSM

typedef std::unordered_map<std::string, std::map<Int, PeptideHit*> > SequenceToChargeToPSM

Maps a peptide sequence (modified or unmodified, depending on the settings) to a map from charge (0 when charge is not considered) to a pointer to the best PSM (PeptideHit).

Member Enumeration Documentation

◆ AggregationMethod

enum AggregationMethod
strong

The aggregation method.

Enumerator
PROD    aggregate by product (ignore zeroes)
SUM     aggregate by summing
BEST    aggregate by maximum/minimum

Constructor & Destructor Documentation

◆ BasicProteinInferenceAlgorithm()

Default constructor.

Member Function Documentation

◆ aggFromString_()

AggregationMethod aggFromString_ ( const std::string &  method_string) const
private

Gets the AggregationMethod enum from a method_string.
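
A string-to-enum lookup of this kind could look like the sketch below. The accepted strings are the documented values of the "aggregation_method" parameter. The free function aggFromString is a hypothetical stand-in, and throwing std::invalid_argument for unknown names is an assumption; the actual error handling in OpenMS may differ.

```cpp
#include <stdexcept>
#include <string>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Hypothetical stand-in for the private member aggFromString_.
AggregationMethod aggFromString(const std::string& method_string)
{
  if (method_string == "product") return AggregationMethod::PROD;
  if (method_string == "sum") return AggregationMethod::SUM;
  if (method_string == "maximum") return AggregationMethod::BEST;
  // Assumption: unknown names are rejected with an exception.
  throw std::invalid_argument("unknown aggregation method: " + method_string);
}
```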

◆ aggFunFromEnum_()

fptr aggFunFromEnum_ ( const BasicProteinInferenceAlgorithm::AggregationMethod &  agg_method,
bool  higher_better 
) const
private
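
Because fptr is a plain function pointer, captureless lambdas can be returned directly. The sketch below shows one way such a dispatch might be written. It is not taken from the OpenMS sources: the free function aggFunFromEnum and the argument names old_score and new_score are hypothetical.

```cpp
#include <algorithm>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Same shape as the private typedef fptr.
typedef double (*fptr)(double, double);

// Hypothetical stand-in for the private member aggFunFromEnum_.
fptr aggFunFromEnum(AggregationMethod agg_method, bool higher_better)
{
  switch (agg_method)
  {
    case AggregationMethod::PROD:
      // product that skips zero scores
      return [](double old_score, double new_score) {
        return new_score == 0.0 ? old_score : old_score * new_score;
      };
    case AggregationMethod::SUM:
      return [](double old_score, double new_score) { return old_score + new_score; };
    case AggregationMethod::BEST:
    default:
      // "best" is the maximum or the minimum, depending on the score direction
      if (higher_better)
      {
        return [](double old_score, double new_score) { return std::max(old_score, new_score); };
      }
      return [](double old_score, double new_score) { return std::min(old_score, new_score); };
  }
}
```

A caller would fetch the function once and then fold all peptide scores of a protein into a running value, for example `score = agg(score, peptide_score);`.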

◆ aggregatePeptideScores_()

void aggregatePeptideScores_ ( SequenceToChargeToPSM &  best_pep,
std::vector< PeptideIdentification > &  pep_ids,
const String &  overall_score_type,
bool  higher_better,
const std::string &  run_id 
) const
private

Fills and updates the map of best peptide scores best_pep (by sequence or modified sequence, depending on algorithm settings).

Parameters
best_pep              (mod.) sequence to charge to pointer of best PSM (PeptideHit*)
pep_ids               the spectra with PSMs
overall_score_type    the pre-determined type name, used to raise an error if mixed types occur
higher_better         whether higher is better for this score type
run_id                only process peptides associated with this run_id (e.g. proteinID run getIdentifier())
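
The nested map can be pictured with the following simplified sketch. It does not use OpenMS types: Psm is a hypothetical stand-in for PeptideHit, updateBestPsms is a hypothetical function, and the flag separate_charges stands in for the corresponding treat_X_separate parameter. Whether the sequence key is the modified or the unmodified string is left to the caller here.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for PeptideHit (hypothetical, not the OpenMS class).
struct Psm
{
  std::string sequence; // modified or unmodified sequence string
  int charge;
  double score;
};

// Same nesting as SequenceToChargeToPSM, but pointing to the stand-in type.
typedef std::unordered_map<std::string, std::map<int, Psm*>> SequenceToChargeToPsm;

// Keeps a pointer to the best PSM per (sequence, charge) key.
// If separate_charges is false, all charge states share the key 0.
void updateBestPsms(SequenceToChargeToPsm& best_pep, std::vector<Psm>& psms,
                    bool higher_better, bool separate_charges)
{
  for (Psm& psm : psms)
  {
    const int charge_key = separate_charges ? psm.charge : 0;
    Psm*& best = best_pep[psm.sequence][charge_key]; // nullptr when newly inserted
    const bool improves = (best == nullptr) ||
      (higher_better ? psm.score > best->score : psm.score < best->score);
    if (improves) best = &psm;
  }
}
```

Storing pointers avoids copying hits, but the map is only valid as long as the underlying PSM vector is not resized or reordered.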

◆ checkCompat_() [1/2]

void checkCompat_ ( const IDScoreSwitcherAlgorithm::ScoreType &  score_type,
const AggregationMethod &  aggregation_method 
) const
private

Checks if a score_type is compatible with the chosen aggregation_method, i.e. only probabilities can be used for multiplication.

◆ checkCompat_() [2/2]

void checkCompat_ ( const String &  score_type,
const AggregationMethod &  aggregation_method 
) const
private

Checks if a score name (score_type) is compatible with the chosen aggregation_method, i.e. only probabilities can be used for multiplication.

◆ getInitScoreForAggMethod_()

double getInitScoreForAggMethod_ ( const AggregationMethod &  aggregation_method,
bool  higher_better 
) const
private

Gets the initial score value based on the chosen aggregation_method; higher_better is needed for the "best" score.
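
One natural choice of start values is the neutral element of each aggregation, as sketched below. These concrete values are an assumption made for illustration and are not confirmed by this page; the free function getInitScoreForAggMethod is a hypothetical stand-in for the private member.

```cpp
#include <limits>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Hypothetical stand-in for getInitScoreForAggMethod_ (assumed start values).
double getInitScoreForAggMethod(AggregationMethod aggregation_method, bool higher_better)
{
  switch (aggregation_method)
  {
    case AggregationMethod::PROD: return 1.0; // neutral element of multiplication
    case AggregationMethod::SUM: return 0.0;  // neutral element of addition
    case AggregationMethod::BEST:
    default:
      // any finite score replaces this start value in the "better" direction
      return higher_better ? -std::numeric_limits<double>::infinity()
                           : std::numeric_limits<double>::infinity();
  }
}
```

This makes visible why higher_better only matters for "best": sum and product have a direction-independent neutral element, while the start value for maximum or minimum depends on which direction counts as better.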

◆ processRun_()

void processRun_ ( std::unordered_map< std::string, std::pair< ProteinHit *, Size >> &  acc_to_protein_hitP_and_count,
SequenceToChargeToPSM &  best_pep,
ProteinIdentification &  prot_run,
std::vector< PeptideIdentification > &  pep_ids 
) const
private

Performs simple aggregation-based inference on one protein run.

Parameters
acc_to_protein_hitP_and_count    Maps accessions to a pair of ProteinHit pointer and number of peptidoforms encountered
best_pep                         Maps (un)modified peptide sequence to a map from charge (0 when unconsidered) to the best PeptideHit pointer
prot_run                         The current run to process
pep_ids                          Peptides for the current run to process

◆ run() [1/3]

void run ( ConsensusMap &  cmap,
ProteinIdentification &  prot_id,
bool  include_unassigned 
) const

Performs the actual inference based on the best PSM per peptide in cmap for proteins from prot_id. Ideally, prot_id is the union of proteins in all runs of cmap. Sorts and filters PSMs in the peptide identifications of cmap. Annotates results in prot_id. Associations (via getIdentifier) of peptides to protein runs are ignored, and all peptide identifications are used.

Todo:
allow checking matching IDs

◆ run() [2/3]

void run ( std::vector< PeptideIdentification > &  pep_ids,
ProteinIdentification &  prot_id 
) const

Performs the actual inference based on the best PSM per peptide in pep_ids per run in prot_id. Sorts and filters PSMs in pep_ids. Annotates results in prot_id. Associations (via getIdentifier) of peptides to protein runs need to be correct.

◆ run() [3/3]

void run ( std::vector< PeptideIdentification > &  pep_ids,
std::vector< ProteinIdentification > &  prot_ids 
) const

Performs the actual inference based on the best PSM per peptide in pep_ids per run in prot_ids. Sorts and filters PSMs in pep_ids. Annotates results in prot_ids. Associations (via getIdentifier) of peptides to protein runs need to be correct.

◆ updateProteinScores_()

void updateProteinScores_ ( std::unordered_map< std::string, std::pair< ProteinHit *, Size >> &  acc_to_protein_hitP_and_count,
const SequenceToChargeToPSM &  best_pep,
bool  pep_scores,
bool  higher_better 
) const
private

Aggregates and updates protein scores based on aggregation settings and aggregated peptide level results in prefilled best_pep.

Parameters
acc_to_protein_hitP_and_count    the results to fill
best_pep                         best PSM per peptide to read the score from
pep_scores                       whether the score is a posterior error probability; if so, it is auto-converted to a posterior probability
higher_better                    whether higher is better for the score. Assumes the score is unconverted.
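
The conversion and the per-protein bookkeeping can be sketched as follows. This is a simplified illustration under stated assumptions, not the OpenMS code: BestPsm, ProteinScores and updateProteinScores are hypothetical names, the map stores the protein score directly instead of a ProteinHit pointer, only "maximum" aggregation is shown, and a shared peptide simply counts for every accession it lists.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in for a best PSM and its protein evidence (hypothetical).
struct BestPsm
{
  double score;
  std::vector<std::string> accessions;
};

// accession -> (protein score, number of peptidoforms); a simplified analogue
// of acc_to_protein_hitP_and_count that stores the score directly.
typedef std::unordered_map<std::string, std::pair<double, std::size_t>> ProteinScores;

void updateProteinScores(ProteinScores& proteins, const std::vector<BestPsm>& best_psms, bool pep_scores)
{
  for (const BestPsm& psm : best_psms)
  {
    // a posterior error probability (PEP) is converted to a posterior probability
    const double posterior = pep_scores ? 1.0 - psm.score : psm.score;
    for (const std::string& acc : psm.accessions)
    {
      std::pair<double, std::size_t>& entry = proteins[acc]; // starts at (0.0, 0)
      entry.first = std::max(entry.first, posterior);         // "maximum" aggregation
      ++entry.second;                                         // count the peptidoform
    }
  }
}
```

After the conversion, higher is always better, which is why the unconverted direction has to be known beforehand.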