OpenMS
BasicProteinInferenceAlgorithm Class Reference

Simple protein inference by aggregation of per-peptide PSM scores.

#include <OpenMS/ANALYSIS/ID/BasicProteinInferenceAlgorithm.h>


Public Types

enum class  AggregationMethod { PROD , SUM , BEST }
 The aggregation method.
 
typedef std::unordered_map< std::string, std::map< Int, PeptideHit * > > SequenceToChargeToPSM
 
- Public Types inherited from ProgressLogger
enum  LogType { CMD , GUI , NONE }
 Possible log types.
 

Public Member Functions

 BasicProteinInferenceAlgorithm ()
 Default constructor.
 
void run (PeptideIdentificationList &pep_ids, std::vector< ProteinIdentification > &prot_ids) const
 Run inference per protein-ID run, iterating each prot_ids entry separately.
 
void run (PeptideIdentificationList &pep_ids, ProteinIdentification &prot_id) const
 Run inference for a single protein-ID run.
 
void run (ConsensusMap &cmap, ProteinIdentification &prot_id, bool include_unassigned) const
 Run inference over a ConsensusMap, treating every peptide identification it carries as evidence for the proteins in prot_id.
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const std::string &name)
 Constructor with name that is displayed in error messages.
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.
 
virtual ~DefaultParamHandler ()
 Destructor.
 
DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.
 
void setParameters (const Param &param)
 Sets the parameters.
 
const Param & getParameters () const
 Non-mutable access to the parameters.
 
const Param & getDefaults () const
 Non-mutable access to the default parameters.
 
const std::string & getName () const
 Non-mutable access to the name.
 
void setName (const std::string &name)
 Mutable access to the name.
 
const std::vector< std::string > & getSubsections () const
 Non-mutable access to the registered subsections.
 
- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor.
 
virtual ~ProgressLogger ()
 Destructor.
 
 ProgressLogger (const ProgressLogger &other)
 Copy constructor.
 
ProgressLogger & operator= (const ProgressLogger &other)
 Assignment Operator.
 
void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE!
 
LogType getLogType () const
 Returns the type of progress log being used.
 
void setLogger (ProgressLoggerImpl *logger)
 Sets the logger to be used for progress logging.
 
void startProgress (SignedSize begin, SignedSize end, const std::string &label) const
 Initializes the progress display.
 
void setProgress (SignedSize value) const
 Sets the current progress.
 
void endProgress (UInt64 bytes_processed=0) const
 
void nextProgress () const
 Increments the progress by 1 (according to the range begin-end).
 

Private Types

typedef double(* fptr) (double, double)
 Function-pointer type for a two-argument score accumulator.
 

Private Member Functions

void processRun_ (std::unordered_map< std::string, std::pair< ProteinHit *, Size > > &acc_to_protein_hitP_and_count, SequenceToChargeToPSM &best_pep, ProteinIdentification &prot_run, PeptideIdentificationList &pep_ids) const
 Performs simple aggregation-based inference on one protein run.
 
void aggregatePeptideScores_ (SequenceToChargeToPSM &best_pep, PeptideIdentificationList &pep_ids, const std::string &overall_score_type, bool higher_better, const std::string &run_id) const
 Fills and updates the map of best peptide scores best_pep (keyed by sequence or modified sequence, depending on algorithm settings).
 
void updateProteinScores_ (std::unordered_map< std::string, std::pair< ProteinHit *, Size > > &acc_to_protein_hitP_and_count, const SequenceToChargeToPSM &best_pep, bool pep_scores, bool higher_better) const
 Aggregates and updates protein scores based on the aggregation settings and the peptide-level results in the prefilled best_pep.
 
AggregationMethod aggFromString_ (const std::string &method_string) const
 Map a score_aggregation_method parameter string to the AggregationMethod enum.
 
void checkCompat_ (const std::string &score_type, const AggregationMethod &aggregation_method) const
 Reject score-type / aggregation-method combinations that don't make statistical sense.
 
void checkCompat_ (const IDScoreSwitcherAlgorithm::ScoreType &score_type, const AggregationMethod &aggregation_method) const
 Same as the string overload, but takes a typed IDScoreSwitcherAlgorithm::ScoreType so the check can be done after the score-switcher has classified the score.
 
double getInitScoreForAggMethod_ (const AggregationMethod &aggregation_method, bool higher_better) const
 Return the identity-element initial score for the chosen aggregation method.
 
fptr aggFunFromEnum_ (const BasicProteinInferenceAlgorithm::AggregationMethod &agg_method, bool higher_better) const
 Pick the two-argument accumulator function matching the chosen aggregation method.
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const std::string &key_prefix="")
 Writes all parameters to meta values.
 
- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.
 
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!
 
std::vector< std::string > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!
 
std::string error_name_
 Name that is displayed in error messages during the parameter checking.
 
bool check_defaults_
 If this member is set to false, no checking of parameters is done.
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 
- Protected Attributes inherited from ProgressLogger
LogType type_
 
time_t last_invoke_
 
ProgressLoggerImpl * current_logger_
 
- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_
 

Detailed Description

Simple protein inference by aggregation of per-peptide PSM scores.

First takes the best PSM per spectrum, then keeps the best PSM per peptidoform (where "peptidoform" is widened or narrowed by the treat_charge_variants_separately and treat_modification_variants_separately parameters), and finally aggregates the peptide-level scores onto the proteins using one of the methods exposed via the score_aggregation_method parameter.
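The "best PSM per peptidoform" step can be pictured with a small standalone sketch. It does not use OpenMS types: Psm, BestPsmMap and bestPerPeptidoform are hypothetical names introduced here, and the sequence string is assumed to already be the modified or unmodified form chosen by treat_modification_variants_separately. Collapsing all charges onto key 0 follows the best_pep parameter description of processRun_.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a PSM; not the OpenMS PeptideHit class.
struct Psm
{
  std::string sequence;  // (modified) peptide sequence used as the map key
  int charge;
  double score;
};

// Same shape as SequenceToChargeToPSM, but pointing at the stand-in type.
using BestPsmMap = std::unordered_map<std::string, std::map<int, const Psm*>>;

// Keep the best-scoring PSM per sequence and charge. When charge variants are
// not treated separately, all charges collapse onto key 0.
// The returned pointers refer into psms, which must outlive the map.
BestPsmMap bestPerPeptidoform(const std::vector<Psm>& psms,
                              bool charge_separately,
                              bool higher_better)
{
  BestPsmMap best;
  for (const Psm& p : psms)
  {
    const int key = charge_separately ? p.charge : 0;
    const Psm*& slot = best[p.sequence][key];  // nullptr on first visit
    const bool improves = slot == nullptr ||
      (higher_better ? p.score > slot->score : p.score < slot->score);
    if (improves) slot = &p;
  }
  return best;
}
```

Storing pointers rather than copies mirrors the PeptideHit* values of SequenceToChargeToPSM; the actual implementation may differ in how ties and missing charges are handled.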

Configurable behaviour is exposed through DefaultParamHandler; see the defaults installed by the constructor for the full list of supported keys, in particular:

  • "score_aggregation_method": one of "best", "product", "sum", "maximum"; maps onto AggregationMethod via aggFromString_. The strings "best" and "maximum" both produce the BEST mode.
  • "treat_charge_variants_separately": distinguish charge variants of the same modified sequence as distinct peptidoforms (default "true").
  • "treat_modification_variants_separately": distinguish modified vs. unmodified variants of the same backbone sequence (default "true").
  • "use_shared_peptides": if "true", shared peptides count as evidence for every protein they map to (default "true").
  • "skip_count_annotation": if set, the per-peptide count annotation on the protein hits is suppressed (default "false").
  • "annotate_indistinguishable_groups": compute and annotate indistinguishable protein groups (default "true").
  • "greedy_group_resolution": resolve shared peptides to a single best protein (razor-peptide style; default "false").
  • "min_peptides_per_protein": minimum peptide count required for a protein to be reported (default 1).
  • "score_type": explicit PSM score type to use; an empty value falls back to the main score.

The algorithm assumes posteriors or posterior error probabilities; PEPs are converted to posteriors as part of scoring. Multiple runs are supported, each processed independently.

Member Typedef Documentation

◆ fptr

typedef double(* fptr) (double, double)
private

Function-pointer type for a two-argument score accumulator.

◆ SequenceToChargeToPSM

typedef std::unordered_map<std::string, std::map<Int, PeptideHit*> > SequenceToChargeToPSM

Member Enumeration Documentation

◆ AggregationMethod

enum class AggregationMethod
strong

The aggregation method.

Enumerator
PROD 

aggregate by product (ignore zeroes)

SUM 

aggregate by summing

BEST 

aggregate by maximum/minimum

Constructor & Destructor Documentation

◆ BasicProteinInferenceAlgorithm()

Default constructor.

Member Function Documentation

◆ aggFromString_()

AggregationMethod aggFromString_ ( const std::string &  method_string) const
private

Map a score_aggregation_method parameter string to the AggregationMethod enum.

Recognised values: "product", "sum", "best", "maximum" ("best" and "maximum" both produce AggregationMethod::BEST).

Parameters
[in]  method_string  Parameter string.
Returns
Matching AggregationMethod.
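
As an illustration of the mapping described above, a standalone sketch might look as follows. The free function aggFromString is a hypothetical stand-in for the private member, and throwing std::invalid_argument on an unrecognised string is an assumption of this example; the page does not state how the real method reacts.

```cpp
#include <stdexcept>
#include <string>

enum class AggregationMethod { PROD, SUM, BEST };

// Sketch of the string-to-enum mapping. "best" and "maximum" are synonyms.
// The exception type for unknown strings is an assumption of this example.
AggregationMethod aggFromString(const std::string& method_string)
{
  if (method_string == "product") return AggregationMethod::PROD;
  if (method_string == "sum") return AggregationMethod::SUM;
  if (method_string == "best" || method_string == "maximum")
  {
    return AggregationMethod::BEST;
  }
  throw std::invalid_argument("unknown aggregation method: " + method_string);
}
```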

◆ aggFunFromEnum_()

fptr aggFunFromEnum_ ( const BasicProteinInferenceAlgorithm::AggregationMethod &  agg_method,
bool  higher_better 
) const
private

Pick the two-argument accumulator function matching the chosen aggregation method.

Parameters
[in]  agg_method  Aggregation mode chosen via "score_aggregation_method".
[in]  higher_better  Whether higher score values are better (used only for BEST, to pick max vs min).
Returns
Function pointer with signature double(double,double).
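
One way such a selection could be written is sketched below, using captureless lambdas that convert to the fptr type. The free function aggFunFromEnum is a hypothetical stand-in for the private member. Treating "ignore zeroes" for PROD as "leave the accumulator unchanged when the incoming score is 0" is an assumption based on the enumerator description, not on the implementation.

```cpp
#include <algorithm>

enum class AggregationMethod { PROD, SUM, BEST };

// Two-argument accumulator: (running value, next score) -> new running value.
using fptr = double (*)(double, double);

// Sketch of picking an accumulator for an aggregation method.
fptr aggFunFromEnum(AggregationMethod agg_method, bool higher_better)
{
  switch (agg_method)
  {
    case AggregationMethod::PROD:
      // Assumed reading of "ignore zeroes": a zero score leaves the product as is.
      return [](double acc, double x) { return x == 0.0 ? acc : acc * x; };
    case AggregationMethod::SUM:
      return [](double acc, double x) { return acc + x; };
    case AggregationMethod::BEST:
      if (higher_better)
      {
        return [](double acc, double x) { return std::max(acc, x); };
      }
      return [](double acc, double x) { return std::min(acc, x); };
  }
  return nullptr;  // not reached for valid enum values
}
```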

◆ aggregatePeptideScores_()

void aggregatePeptideScores_ ( SequenceToChargeToPSM &  best_pep,
PeptideIdentificationList &  pep_ids,
const std::string &  overall_score_type,
bool  higher_better,
const std::string &  run_id 
) const
private

Fills and updates the map of best peptide scores best_pep (keyed by sequence or modified sequence, depending on algorithm settings).

Parameters
[in,out]  best_pep  (mod.) sequence to charge to pointer of best PSM (PeptideHit*)
[in,out]  pep_ids  the spectra with PSMs
[in]  overall_score_type  the pre-determined type name to raise an error if mixed types occur
[in]  higher_better  if for this score type higher is better
[in]  run_id  only process peptides associated with this run_id (e.g. proteinID run getIdentifier())

◆ checkCompat_() [1/2]

void checkCompat_ ( const IDScoreSwitcherAlgorithm::ScoreType &  score_type,
const AggregationMethod &  aggregation_method 
) const
private

Same as the string overload, but takes a typed IDScoreSwitcherAlgorithm::ScoreType so the check can be done after the score-switcher has classified the score.

Parameters
[in]  score_type  Typed score classification.
[in]  aggregation_method  Aggregation mode chosen via "score_aggregation_method".

◆ checkCompat_() [2/2]

void checkCompat_ ( const std::string &  score_type,
const AggregationMethod &  aggregation_method 
) const
private

Reject score-type / aggregation-method combinations that don't make statistical sense.

Multiplication (AggregationMethod::PROD) is only meaningful for probability-typed scores; other combinations either throw or log a warning. Uses the score-type name.

Parameters
[in]  score_type  Name of the PSM score type.
[in]  aggregation_method  Aggregation mode chosen via "score_aggregation_method".

◆ getInitScoreForAggMethod_()

double getInitScoreForAggMethod_ ( const AggregationMethod &  aggregation_method,
bool  higher_better 
) const
private

Return the identity-element initial score for the chosen aggregation method.

For example, 0 for SUM, 1 for PROD, and the worst-possible score (depending on higher_better) for BEST.

Parameters
[in]  aggregation_method  Aggregation mode chosen via "score_aggregation_method".
[in]  higher_better  Whether higher score values are better (used only for BEST).
Returns
Initial accumulator value to start the aggregation from.
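
The identity-element idea can be sketched in a few lines. The function name initScore is hypothetical, and using plus or minus infinity as the "worst-possible score" for BEST is an assumption of this example; the real method may use a different sentinel.

```cpp
#include <limits>

enum class AggregationMethod { PROD, SUM, BEST };

// Sketch: starting value that leaves the first aggregated score unchanged.
double initScore(AggregationMethod aggregation_method, bool higher_better)
{
  switch (aggregation_method)
  {
    case AggregationMethod::PROD:
      return 1.0;  // 1 * x == x
    case AggregationMethod::SUM:
      return 0.0;  // 0 + x == x
    case AggregationMethod::BEST:
      // Assumed sentinel: any real score beats the initial value.
      return higher_better ? -std::numeric_limits<double>::infinity()
                           : std::numeric_limits<double>::infinity();
  }
  return 0.0;  // not reached for valid enum values
}
```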

◆ processRun_()

void processRun_ ( std::unordered_map< std::string, std::pair< ProteinHit *, Size > > &  acc_to_protein_hitP_and_count,
SequenceToChargeToPSM &  best_pep,
ProteinIdentification &  prot_run,
PeptideIdentificationList &  pep_ids 
) const
private

Performs simple aggregation-based inference on one protein run.

Parameters
[in,out]  acc_to_protein_hitP_and_count  Maps accessions to a pair of a ProteinHit pointer and the number of peptidoforms encountered
[in,out]  best_pep  Maps (un)modified peptide sequence to a map from charge (0 when unconsidered) to the best PeptideHit pointer
[in,out]  prot_run  The current run to process
[in,out]  pep_ids  Peptides for the current run to process

◆ run() [1/3]

void run ( ConsensusMap &  cmap,
ProteinIdentification &  prot_id,
bool  include_unassigned 
) const

Run inference over a ConsensusMap, treating every peptide identification it carries as evidence for the proteins in prot_id.

Differs from the per-run overloads by ignoring the getIdentifier association between peptides and protein runs: every peptide id in cmap (and optionally in the unassigned list, see include_unassigned) is used. prot_id is expected to be the union of the proteins of all runs in cmap.

Parameters
[in,out]  cmap  Consensus map providing the peptide identifications; PSMs may be sorted/filtered in place.
[in,out]  prot_id  Protein-identification run to annotate with aggregated scores.
[in]  include_unassigned  If true, also include ConsensusMap::getUnassignedPeptideIdentifications.
Exceptions
Exception::InvalidParameter  If PSMs of a peptide carry different score types.
Todo:
JuliaP Allow checking that peptide / protein IDs reference the same run identifier.

◆ run() [2/3]

void run ( PeptideIdentificationList &  pep_ids,
ProteinIdentification &  prot_id 
) const

Run inference for a single protein-ID run.

Convenience overload of the multi-run version: only peptides whose getIdentifier matches prot_id.getIdentifier() are processed; the others are ignored.

Parameters
[in,out]  pep_ids  Peptide identifications for this run; sorted/filtered in place.
[in,out]  prot_id  Protein-identification run to annotate with aggregated scores.
Exceptions
Exception::InvalidParameter  If PSMs of a peptide carry different score types.

◆ run() [3/3]

void run ( PeptideIdentificationList &  pep_ids,
std::vector< ProteinIdentification > &  prot_ids 
) const

Run inference per protein-ID run, iterating each prot_ids entry separately.

For every entry in prot_ids, only peptides whose getIdentifier matches that run's getIdentifier are processed (other peptides are ignored for that run). pep_ids is sorted and filtered to best-PSM-per-peptidoform; prot_ids is annotated with the aggregated scores and (unless "skip_count_annotation" is set) with per-protein peptide counts.

Parameters
[in,out]  pep_ids  Peptide identifications across all runs; sorted/filtered in place.
[in,out]  prot_ids  One protein-identification run per entry; scores and per-peptide counts annotated in place.
Exceptions
Exception::InvalidParameter  If PSMs of a peptide carry different score types (mixed score types are not supported).
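
The per-run matching by identifier can be illustrated without OpenMS types. In this sketch PepId and peptidesForRun are hypothetical names; the run_identifier field plays the role of the value returned by getIdentifier on a peptide identification.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide identification; not an OpenMS class.
struct PepId
{
  std::string run_identifier;  // links the peptide ID to one protein-ID run
  std::string sequence;
};

// Collect only the peptide IDs that belong to the given run; peptide IDs of
// other runs are skipped, as described for the multi-run overload.
std::vector<const PepId*> peptidesForRun(const std::vector<PepId>& pep_ids,
                                         const std::string& run_id)
{
  std::vector<const PepId*> selected;
  for (const PepId& id : pep_ids)
  {
    if (id.run_identifier == run_id) selected.push_back(&id);
  }
  return selected;
}
```

Calling this once per protein-ID run gives each run its own, independent set of peptide evidence.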

◆ updateProteinScores_()

void updateProteinScores_ ( std::unordered_map< std::string, std::pair< ProteinHit *, Size > > &  acc_to_protein_hitP_and_count,
const SequenceToChargeToPSM &  best_pep,
bool  pep_scores,
bool  higher_better 
) const
private

Aggregates and updates protein scores based on the aggregation settings and the peptide-level results in the prefilled best_pep.

Parameters
[in,out]  acc_to_protein_hitP_and_count  The results to fill.
[in]  best_pep  Best PSM per peptide, from which the score is read.
[in]  pep_scores  Whether the score is a posterior error probability; if so, it is automatically converted to a posterior probability.
[in]  higher_better  Whether higher is better for the score. The score is assumed to be unconverted.
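
A rough standalone sketch of this aggregation step follows. PeptideEvidence, ProteinScores and aggregate are hypothetical names; the conversion of a PEP to a posterior as 1 - PEP is the standard definition, and letting every mapped accession receive the score corresponds to the "use_shared_peptides" = "true" behaviour. The initial value and accumulator are passed in by the caller.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical peptide-level record: best score plus the accessions it maps to.
struct PeptideEvidence
{
  double score;
  std::vector<std::string> accessions;
};

// accession -> (aggregated score, number of contributing peptides)
using ProteinScores =
  std::unordered_map<std::string, std::pair<double, std::size_t>>;

// Fold peptide scores onto proteins with a caller-supplied accumulator.
ProteinScores aggregate(const std::vector<PeptideEvidence>& peptides,
                        bool scores_are_pep,
                        double init,
                        double (*accumulate)(double, double))
{
  ProteinScores result;
  for (const PeptideEvidence& pep : peptides)
  {
    // A posterior error probability becomes a posterior probability.
    const double s = scores_are_pep ? 1.0 - pep.score : pep.score;
    for (const std::string& acc : pep.accessions)
    {
      // emplace only inserts (init, 0) the first time an accession is seen.
      auto it = result.emplace(acc, std::make_pair(init, std::size_t{0})).first;
      it->second.first = accumulate(it->second.first, s);
      ++it->second.second;
    }
  }
  return result;
}
```

With product aggregation and an initial value of 1, two peptides with PEPs 0.1 and 0.2 that both map to the same protein give that protein a score of 0.9 * 0.8 = 0.72 and a count of 2.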