OpenMS
Simple protein inference by aggregation of per-peptide PSM scores.
#include <OpenMS/ANALYSIS/ID/BasicProteinInferenceAlgorithm.h>
Public Types | |
| enum class | AggregationMethod { PROD , SUM , BEST } |
| The aggregation method. | |
| typedef std::unordered_map< std::string, std::map< Int, PeptideHit * > > | SequenceToChargeToPSM |
Public Types inherited from ProgressLogger | |
| enum | LogType { CMD , GUI , NONE } |
| Possible log types. | |
Public Member Functions | |
| BasicProteinInferenceAlgorithm () | |
| Default constructor. | |
| void | run (PeptideIdentificationList &pep_ids, std::vector< ProteinIdentification > &prot_ids) const |
| Run inference per protein-ID run, iterating each prot_ids entry separately. | |
| void | run (PeptideIdentificationList &pep_ids, ProteinIdentification &prot_id) const |
| Run inference for a single protein-ID run. | |
| void | run (ConsensusMap &cmap, ProteinIdentification &prot_id, bool include_unassigned) const |
| Run inference over a ConsensusMap, treating every peptide identification it carries as evidence for the proteins in prot_id. | |
Public Member Functions inherited from DefaultParamHandler | |
| DefaultParamHandler (const std::string &name) | |
| Constructor with name that is displayed in error messages. | |
| DefaultParamHandler (const DefaultParamHandler &rhs) | |
| Copy constructor. | |
| virtual | ~DefaultParamHandler () |
| Destructor. | |
| DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
| Assignment operator. | |
| virtual bool | operator== (const DefaultParamHandler &rhs) const |
| Equality operator. | |
| void | setParameters (const Param &param) |
| Sets the parameters. | |
| const Param & | getParameters () const |
| Non-mutable access to the parameters. | |
| const Param & | getDefaults () const |
| Non-mutable access to the default parameters. | |
| const std::string & | getName () const |
| Non-mutable access to the name. | |
| void | setName (const std::string &name) |
| Mutable access to the name. | |
| const std::vector< std::string > & | getSubsections () const |
| Non-mutable access to the registered subsections. | |
Public Member Functions inherited from ProgressLogger | |
| ProgressLogger () | |
| Constructor. | |
| virtual | ~ProgressLogger () |
| Destructor. | |
| ProgressLogger (const ProgressLogger &other) | |
| Copy constructor. | |
| ProgressLogger & | operator= (const ProgressLogger &other) |
| Assignment Operator. | |
| void | setLogType (LogType type) const |
| Sets the progress log that should be used. The default type is NONE! | |
| LogType | getLogType () const |
| Returns the type of progress log being used. | |
| void | setLogger (ProgressLoggerImpl *logger) |
| Sets the logger to be used for progress logging. | |
| void | startProgress (SignedSize begin, SignedSize end, const std::string &label) const |
| Initializes the progress display. | |
| void | setProgress (SignedSize value) const |
| Sets the current progress. | |
| void | endProgress (UInt64 bytes_processed=0) const |
| void | nextProgress () const |
| increment progress by 1 (according to range begin-end) | |
Private Types | |
| typedef double(* | fptr) (double, double) |
| Function-pointer type for a two-argument score accumulator. | |
Private Member Functions | |
| void | processRun_ (std::unordered_map< std::string, std::pair< ProteinHit *, Size > > &acc_to_protein_hitP_and_count, SequenceToChargeToPSM &best_pep, ProteinIdentification &prot_run, PeptideIdentificationList &pep_ids) const |
| Performs simple aggregation-based inference on one protein run. | |
| void | aggregatePeptideScores_ (SequenceToChargeToPSM &best_pep, PeptideIdentificationList &pep_ids, const std::string &overall_score_type, bool higher_better, const std::string &run_id) const |
| Fills and updates the map of best peptide scores best_pep (keyed by sequence or modified sequence, depending on algorithm settings). | |
| void | updateProteinScores_ (std::unordered_map< std::string, std::pair< ProteinHit *, Size > > &acc_to_protein_hitP_and_count, const SequenceToChargeToPSM &best_pep, bool pep_scores, bool higher_better) const |
| Aggregates and updates protein scores based on the aggregation settings and the peptide-level results in the prefilled best_pep. | |
| AggregationMethod | aggFromString_ (const std::string &method_string) const |
| Map a score_aggregation_method parameter string to the AggregationMethod enum. | |
| void | checkCompat_ (const std::string &score_type, const AggregationMethod &aggregation_method) const |
| Reject score-type / aggregation-method combinations that don't make statistical sense. | |
| void | checkCompat_ (const IDScoreSwitcherAlgorithm::ScoreType &score_type, const AggregationMethod &aggregation_method) const |
| Same as the string overload, but takes a typed IDScoreSwitcherAlgorithm::ScoreType so the check can be done after the score-switcher has classified the score. | |
| double | getInitScoreForAggMethod_ (const AggregationMethod &aggregation_method, bool higher_better) const |
| Return the identity-element initial score for the chosen aggregation method. | |
| fptr | aggFunFromEnum_ (const BasicProteinInferenceAlgorithm::AggregationMethod &agg_method, bool higher_better) const |
| Pick the two-argument accumulator function matching the chosen aggregation method. | |
Additional Inherited Members | |
Static Public Member Functions inherited from DefaultParamHandler | |
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const std::string &key_prefix="") |
| Writes all parameters to meta values. | |
Protected Member Functions inherited from DefaultParamHandler | |
| virtual void | updateMembers_ () |
| This method is used to update extra member variables at the end of the setParameters() method. | |
| void | defaultsToParam_ () |
| Updates the parameters after the defaults have been set in the constructor. | |
Protected Attributes inherited from DefaultParamHandler | |
| Param | param_ |
| Container for current parameters. | |
| Param | defaults_ |
| Container for default parameters. This member should be filled in the constructor of derived classes! | |
| std::vector< std::string > | subsections_ |
| Container for registered subsections. This member should be filled in the constructor of derived classes! | |
| std::string | error_name_ |
| Name that is displayed in error messages during the parameter checking. | |
| bool | check_defaults_ |
| If this member is set to false, no checking of parameters is done. | |
| bool | warn_empty_defaults_ |
| If this member is set to false, no warning is emitted when defaults are empty. | |
Protected Attributes inherited from ProgressLogger | |
| LogType | type_ |
| time_t | last_invoke_ |
| ProgressLoggerImpl * | current_logger_ |
Static Protected Attributes inherited from ProgressLogger | |
| static int | recursion_depth_ |
Simple protein inference by aggregation of per-peptide PSM scores.
First takes the best PSM per spectrum, then keeps the best PSM per peptidoform (where "peptidoform" is widened or narrowed by the treat_charge_variants_separately and treat_modification_variants_separately parameters), and finally aggregates the peptide-level scores onto the proteins using one of the methods exposed via the score_aggregation_method parameter.
Configurable behaviour is exposed through DefaultParamHandler — see the defaults installed by the constructor for the full list of supported keys, in particular:
- "score_aggregation_method": one of "best", "product", "sum", "maximum"; maps onto AggregationMethod via aggFromString_. The "best" / "maximum" strings both produce the BEST mode.
- "treat_charge_variants_separately": distinguish charge variants of the same modified sequence as distinct peptidoforms (default "true").
- "treat_modification_variants_separately": distinguish modified vs. unmodified variants of the same backbone sequence (default "true").
- "use_shared_peptides": if "true", shared peptides count as evidence for every protein they map to (default "true").
- "skip_count_annotation": if set, the per-peptide count annotation on the protein hits is suppressed (default "false").
- "annotate_indistinguishable_groups": compute and annotate indistinguishable protein groups (default "true").
- "greedy_group_resolution": resolve shared peptides to a single best protein (razor-peptide style; default "false").
- "min_peptides_per_protein": minimum peptide count required for a protein to be reported (default 1).
- "score_type": explicit PSM score type to use; empty falls back to the main score.

The algorithm assumes posteriors or posterior error probabilities; PEPs are converted to posteriors as part of scoring. Multiple runs are supported, each processed independently.
|
private |
Function-pointer type for a two-argument score accumulator.
| typedef std::unordered_map<std::string, std::map<Int, PeptideHit*> > SequenceToChargeToPSM |
|
strong |
Default constructor.
|
private |
Map a score_aggregation_method parameter string to the AggregationMethod enum.
Recognised values: "product", "sum", "best", "maximum" ("best" and "maximum" both produce AggregationMethod::BEST).
| [in] | method_string | Parameter string. |
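The string-to-enum mapping described above can be illustrated with a small standalone sketch. This is not the OpenMS implementation: the free function `aggFromString` and the local `AggregationMethod` enum are stand-ins, and throwing `std::invalid_argument` for an unrecognised string is an assumption of this sketch (the documentation does not state how unknown values are handled).

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Hypothetical free-function analogue of aggFromString_.
// Assumption: unknown strings raise std::invalid_argument.
AggregationMethod aggFromString(const std::string& method_string)
{
  if (method_string == "product") return AggregationMethod::PROD;
  if (method_string == "sum") return AggregationMethod::SUM;
  // "best" and "maximum" are documented as synonyms for BEST.
  if (method_string == "best" || method_string == "maximum") return AggregationMethod::BEST;
  throw std::invalid_argument("Unknown aggregation method: " + method_string);
}
```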
|
private |
Pick the two-argument accumulator function matching the chosen aggregation method.
| [in] | agg_method | Aggregation mode chosen via "score_aggregation_method". |
| [in] | higher_better | Whether higher score values are better (used only for BEST, to pick max vs min). |
Returns a function pointer of type double(double,double).
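As a rough illustration of how an accumulator could be chosen from the aggregation mode, the following standalone sketch returns a plain function pointer of the `double(double, double)` shape. The helper names (`multiply`, `add`, `larger`, `smaller`, `aggFunFromEnum`) and the local enum are hypothetical and only mirror the documented behaviour; the actual private member may be written differently.

```cpp
#include <algorithm>
#include <cassert>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Same shape as the documented fptr typedef.
using fptr = double (*)(double, double);

// Hypothetical two-argument accumulators.
double multiply(double a, double b) { return a * b; }
double add(double a, double b) { return a + b; }
double larger(double a, double b) { return std::max(a, b); }
double smaller(double a, double b) { return std::min(a, b); }

// higher_better only matters for BEST, where it selects max vs. min.
fptr aggFunFromEnum(AggregationMethod agg_method, bool higher_better)
{
  switch (agg_method)
  {
    case AggregationMethod::PROD: return &multiply;
    case AggregationMethod::SUM: return &add;
    case AggregationMethod::BEST: return higher_better ? &larger : &smaller;
  }
  return nullptr; // not reached for valid enum values
}
```

A caller would start from an initial score and fold each peptide score into it, for example `score = aggFunFromEnum(method, higher_better)(score, peptide_score)`.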
|
private |
Fills and updates the map of best peptide scores best_pep (keyed by sequence or modified sequence, depending on algorithm settings).
| [in,out] | best_pep | (mod.) sequence to charge to pointer of best PSM (PeptideHit*) |
| [in,out] | pep_ids | the spectra with PSMs |
| [in] | overall_score_type | the pre-determined type name to raise an error if mixed types occur |
| [in] | higher_better | if for this score type higher is better |
| [in] | run_id | only process peptides associated with this run_id (e.g. proteinID run getIdentifier()) |
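The best-PSM-per-peptidoform bookkeeping can be sketched in isolation as follows. The `Psm` struct, the `bestPerPeptidoform` function, and its boolean flags are hypothetical simplifications that stand in for PeptideHit and the "treat_charge_variants_separately" / "treat_modification_variants_separately" parameters; the run_id filtering and the score-type check of the real member are left out. Charge 0 is used as the key when charge is not considered, matching the description of best_pep.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified PSM record (not OpenMS::PeptideHit).
struct Psm
{
  std::string sequence;          // unmodified backbone sequence
  std::string modified_sequence; // sequence including modification annotations
  int charge;
  double score;
};

// Same nesting as SequenceToChargeToPSM, but pointing at the simplified Psm.
using SequenceToChargeToBest = std::unordered_map<std::string, std::map<int, const Psm*>>;

// Keeps, per (sequence key, charge key), a pointer to the best-scoring PSM.
// The pointers refer into `psms`, which must outlive the returned map.
SequenceToChargeToBest bestPerPeptidoform(const std::vector<Psm>& psms,
                                          bool separate_charges,
                                          bool separate_mods,
                                          bool higher_better)
{
  SequenceToChargeToBest best;
  for (const Psm& psm : psms)
  {
    const std::string& key = separate_mods ? psm.modified_sequence : psm.sequence;
    const int charge = separate_charges ? psm.charge : 0; // 0 = charge not considered
    const Psm*& slot = best[key][charge];                 // nullptr on first access
    const bool better = slot == nullptr ||
                        (higher_better ? psm.score > slot->score : psm.score < slot->score);
    if (better) slot = &psm;
  }
  return best;
}
```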
|
private |
Same as the string overload, but takes a typed IDScoreSwitcherAlgorithm::ScoreType so the check can be done after the score-switcher has classified the score.
| [in] | score_type | Typed score classification. |
| [in] | aggregation_method | Aggregation mode chosen via "score_aggregation_method". |
|
private |
Reject score-type / aggregation-method combinations that don't make statistical sense.
Multiplication (AggregationMethod::PROD) is only meaningful for probability-typed scores; other combinations either throw or log a warning. Uses the score-type name.
| [in] | score_type | Name of the PSM score type. |
| [in] | aggregation_method | Aggregation mode chosen via "score_aggregation_method". |
|
private |
Return the identity-element initial score for the chosen aggregation method.
For example, 0 for SUM, 1 for PROD, and the worst-possible score (depending on higher_better) for BEST.
| [in] | aggregation_method | Aggregation mode chosen via "score_aggregation_method". |
| [in] | higher_better | Whether higher score values are better (used only for BEST). |
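The identity elements named above can be written down as a short standalone sketch. The function `initScoreFor` and the local enum are hypothetical, and using positive or negative infinity as the "worst-possible" score for BEST is an assumption of this sketch; the real member may use a different sentinel.

```cpp
#include <cassert>
#include <limits>

// Local stand-in for BasicProteinInferenceAlgorithm::AggregationMethod.
enum class AggregationMethod { PROD, SUM, BEST };

// Returns a start value that leaves the first accumulated score unchanged.
double initScoreFor(AggregationMethod aggregation_method, bool higher_better)
{
  const double inf = std::numeric_limits<double>::infinity();
  switch (aggregation_method)
  {
    case AggregationMethod::PROD: return 1.0; // x * 1 == x
    case AggregationMethod::SUM: return 0.0;  // x + 0 == x
    // Assumption: the worst possible score is modelled as -inf / +inf.
    case AggregationMethod::BEST: return higher_better ? -inf : inf;
  }
  return 0.0; // not reached for valid enum values
}
```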
|
private |
Performs simple aggregation-based inference on one protein run.
| [in,out] | acc_to_protein_hitP_and_count | Maps accessions to a pair of a ProteinHit pointer and the number of peptidoforms encountered |
| [in,out] | best_pep | Maps (un)modified peptide sequence to a map from charge (0 when unconsidered) to the best PeptideHit pointer |
| [in,out] | prot_run | The current run to process |
| [in,out] | pep_ids | Peptides for the current run to process |
| void run | ( | ConsensusMap & | cmap, |
| ProteinIdentification & | prot_id, | ||
| bool | include_unassigned | ||
| ) | const |
Run inference over a ConsensusMap, treating every peptide identification it carries as evidence for the proteins in prot_id.
Differs from the per-run overloads above by ignoring the getIdentifier association between peptides and protein runs — every peptide id in cmap (and optionally in the unassigned list, see include_unassigned) is used. prot_id is expected to be the union of the proteins of all runs in cmap.
| [in,out] | cmap | Consensus map providing the peptide identifications; PSMs may be sorted/filtered in place. |
| [in,out] | prot_id | Protein-identification run to annotate with aggregated scores. |
| [in] | include_unassigned | If true, also include ConsensusMap::getUnassignedPeptideIdentifications. |
| Exception::InvalidParameter | If PSMs of a peptide carry different score types. |
| void run | ( | PeptideIdentificationList & | pep_ids, |
| ProteinIdentification & | prot_id | ||
| ) | const |
Run inference for a single protein-ID run.
Convenience overload of the multi-run version: only peptides whose getIdentifier matches prot_id.getIdentifier() are processed; the others are ignored.
| [in,out] | pep_ids | Peptide identifications for this run; sorted/filtered in place. |
| [in,out] | prot_id | Protein-identification run to annotate with aggregated scores. |
| Exception::InvalidParameter | If PSMs of a peptide carry different score types. |
| void run | ( | PeptideIdentificationList & | pep_ids, |
| std::vector< ProteinIdentification > & | prot_ids | ||
| ) | const |
Run inference per protein-ID run, iterating each prot_ids entry separately.
For every entry in prot_ids, only peptides whose getIdentifier matches that run's getIdentifier are processed (other peptides are ignored for that run). pep_ids is sorted and filtered to best-PSM-per-peptidoform; prot_ids is annotated with the aggregated scores and (unless "skip_count_annotation" is set) with per-protein peptide counts.
| [in,out] | pep_ids | Peptide identifications across all runs; sorted/filtered in place. |
| [in,out] | prot_ids | One protein-identification run per entry; scores and per-peptide counts annotated in place. |
| Exception::InvalidParameter | If PSMs of a peptide carry different score types (mixed score types are not supported). |
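The per-run matching on identifiers can be pictured with a minimal standalone sketch. `PeptideId` and `peptidesForRun` are hypothetical simplifications, not OpenMS types or functions; the sketch only shows the selection of peptide identifications whose run identifier equals that of the protein run being processed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified peptide identification carrying its run identifier.
struct PeptideId
{
  std::string run_identifier; // plays the role of getIdentifier()
  std::string sequence;
};

// Collects pointers to the peptide identifications that belong to one run.
// Identifications from other runs are skipped, not modified.
std::vector<const PeptideId*> peptidesForRun(const std::vector<PeptideId>& pep_ids,
                                             const std::string& run_identifier)
{
  std::vector<const PeptideId*> selected;
  for (const PeptideId& pep : pep_ids)
  {
    if (pep.run_identifier == run_identifier) selected.push_back(&pep);
  }
  return selected;
}
```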
|
private |
Aggregates and updates protein scores based on the aggregation settings and the peptide-level results in the prefilled best_pep.
| [in,out] | acc_to_protein_hitP_and_count | the results to fill |
| [in] | best_pep | Best PSM per peptide, from which the score is read |
| [in] | pep_scores | Whether the score is a posterior error probability; if so, it is auto-converted to a posterior probability |
| [in] | higher_better | Whether a higher score is better. Assumes the score is unconverted. |
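The protein-level aggregation step, including the conversion of posterior error probabilities to posteriors (1 - PEP), might look roughly like the sketch below. All names here (`PeptideEvidence`, `ProteinScores`, `aggregateOntoProteins`, `add`) are hypothetical; the sketch counts a shared peptide for every accession it maps to, which corresponds to "use_shared_peptides" being "true", and it ignores group annotation and peptide-count filtering.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical best-peptide record: one score plus the proteins it maps to.
struct PeptideEvidence
{
  double score;
  std::vector<std::string> accessions;
};

// accession -> (aggregated score, number of contributing peptides)
using ProteinScores = std::unordered_map<std::string, std::pair<double, std::size_t>>;

double add(double a, double b) { return a + b; }

// Folds each peptide score into every protein it maps to.
// `init` and `accumulate` are supplied by the caller for the chosen method.
ProteinScores aggregateOntoProteins(const std::vector<PeptideEvidence>& peptides,
                                    bool pep_scores,
                                    double init,
                                    double (*accumulate)(double, double))
{
  ProteinScores result;
  for (const PeptideEvidence& pep : peptides)
  {
    // PEPs are turned into posterior probabilities before aggregation.
    const double score = pep_scores ? 1.0 - pep.score : pep.score;
    for (const std::string& acc : pep.accessions)
    {
      auto it = result.emplace(acc, std::make_pair(init, std::size_t{0})).first;
      it->second.first = accumulate(it->second.first, score);
      ++it->second.second;
    }
  }
  return result;
}
```

With SUM-style aggregation the caller would pass `0.0` and `&add`; a PROD-style call would pass `1.0` and a multiplying accumulator instead.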