OpenMS
Algorithm for merging multiple protein and peptide identification runs.
#include <OpenMS/ANALYSIS/ID/IDMergerAlgorithm.h>

Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | IDMergerAlgorithm (const String &runIdentifier="merged", bool addTimeStampToID=true) | Constructor for the IDMergerAlgorithm. |
| void | insertRuns (std::vector< ProteinIdentification > &&prots, PeptideIdentificationList &&peps) | Insert runs using move semantics. |
| void | insertRuns (const std::vector< ProteinIdentification > &prots, const PeptideIdentificationList &peps) | Insert runs using copy semantics. |
| void | returnResultsAndClear (ProteinIdentification &prots, PeptideIdentificationList &peps) | Return the merged results and reset internal state. |


Public Member Functions inherited from DefaultParamHandler

| Return type | Member | Description |
|---|---|---|
| | DefaultParamHandler (const String &name) | Constructor with name that is displayed in error messages. |
| | DefaultParamHandler (const DefaultParamHandler &rhs) | Copy constructor. |
| virtual | ~DefaultParamHandler () | Destructor. |
| DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) | Assignment operator. |
| virtual bool | operator== (const DefaultParamHandler &rhs) const | Equality operator. |
| void | setParameters (const Param &param) | Sets the parameters. |
| const Param & | getParameters () const | Non-mutable access to the parameters. |
| const Param & | getDefaults () const | Non-mutable access to the default parameters. |
| const String & | getName () const | Non-mutable access to the name. |
| void | setName (const String &name) | Mutable access to the name. |
| const std::vector< String > & | getSubsections () const | Non-mutable access to the registered subsections. |


Public Member Functions inherited from ProgressLogger

| Return type | Member | Description |
|---|---|---|
| | ProgressLogger () | Constructor. |
| virtual | ~ProgressLogger () | Destructor. |
| | ProgressLogger (const ProgressLogger &other) | Copy constructor. |
| ProgressLogger & | operator= (const ProgressLogger &other) | Assignment operator. |
| void | setLogType (LogType type) const | Sets the progress log that should be used. The default type is NONE. |
| LogType | getLogType () const | Returns the type of progress log being used. |
| void | setLogger (ProgressLoggerImpl *logger) | Sets the logger to be used for progress logging. |
| void | startProgress (SignedSize begin, SignedSize end, const String &label) const | Initializes the progress display. |
| void | setProgress (SignedSize value) const | Sets the current progress. |
| void | endProgress (UInt64 bytes_processed=0) const | |
| void | nextProgress () const | Increments the progress by 1 (according to the range begin-end). |


Private Types

| Kind | Declaration | Description |
|---|---|---|
| using | hash_type = std::size_t(*)(const ProteinHit &) | Type alias for the hash function. |
| using | equal_type = bool(*)(const ProteinHit &, const ProteinHit &) | Type alias for the equality function. |


Private Member Functions

| Return type | Member | Description |
|---|---|---|
| String | getNewIdentifier_ (bool addTimeStampToID) const | Generate a new identifier for the merged run. |
| bool | checkOldRunConsistency_ (const std::vector< ProteinIdentification > &protRuns, const String &experiment_type) const | Check consistency of search engines and settings across runs. |
| bool | checkOldRunConsistency_ (const std::vector< ProteinIdentification > &protRuns, const ProteinIdentification &ref, const String &experiment_type) const | Check consistency of search engines and settings against a reference. |
| void | insertProteinIDs_ (std::vector< ProteinIdentification > &&old_protRuns) | Insert protein identifications into the merged result. |
| void | updateAndMovePepIDs_ (PeptideIdentificationList &&pepIDs, const std::map< String, Size > &runID_to_runIdx, const std::vector< StringList > &originFiles, bool annotate_origin) | Update peptide ID references and move them to the result. |
| void | movePepIDsAndRefProteinsToResultFaster_ (PeptideIdentificationList &&pepIDs, std::vector< ProteinIdentification > &&old_protRuns) | Optimized method to move peptide IDs and reference proteins to result. |


Static Private Member Functions

| Return type | Member | Description |
|---|---|---|
| static void | copySearchParams_ (const ProteinIdentification &from, ProteinIdentification &to) | Copy search parameters between protein identifications. |
| static size_t | accessionHash_ (const ProteinHit &p) | Hash function for protein hits based on accession. |
| static bool | accessionEqual_ (const ProteinHit &p1, const ProteinHit &p2) | Equality function for protein hits based on accession. |


Private Attributes

| Type | Attribute | Description |
|---|---|---|
| ProteinIdentification | prot_result_ | The resulting merged protein identification. |
| PeptideIdentificationList | pep_result_ | The resulting merged peptide identifications. |
| std::unordered_set< ProteinHit, hash_type, equal_type > | collected_protein_hits_ | Set of collected protein hits using custom hash and equality functions. |
| bool | filled_ = false | Flag indicating whether the resulting protein ID is already filled. |
| std::map< String, Size > | file_origin_to_idx_ | Mapping to keep track of the mzML origins of spectra. |
| String | id_ | The new identifier string for the merged run. |
| bool | fixed_identifier_ | Flag indicating whether the identifier should be fixed (i.e., not contain a timestamp). |


Additional Inherited Members

Public Types inherited from ProgressLogger

| Kind | Declaration | Description |
|---|---|---|
| enum | LogType { CMD , GUI , NONE } | Possible log types. |

Static Public Member Functions inherited from DefaultParamHandler

| Return type | Member | Description |
|---|---|---|
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="") | Writes all parameters to meta values. |

Protected Member Functions inherited from DefaultParamHandler

| Return type | Member | Description |
|---|---|---|
| virtual void | updateMembers_ () | This method is used to update extra member variables at the end of the setParameters() method. |
| void | defaultsToParam_ () | Updates the parameters after the defaults have been set in the constructor. |


Protected Attributes inherited from DefaultParamHandler

| Type | Attribute | Description |
|---|---|---|
| Param | param_ | Container for current parameters. |
| Param | defaults_ | Container for default parameters. This member should be filled in the constructor of derived classes. |
| std::vector< String > | subsections_ | Container for registered subsections. This member should be filled in the constructor of derived classes. |
| String | error_name_ | Name that is displayed in error messages during the parameter checking. |
| bool | check_defaults_ | If this member is set to false, no checking of parameters is done. |
| bool | warn_empty_defaults_ | If this member is set to false, no warning is emitted when defaults are empty. |


Protected Attributes inherited from ProgressLogger

| Type | Attribute | Description |
|---|---|---|
| LogType | type_ | |
| time_t | last_invoke_ | |
| ProgressLoggerImpl * | current_logger_ | |

Static Protected Attributes inherited from ProgressLogger

| Type | Attribute | Description |
|---|---|---|
| static int | recursion_depth_ | |

Detailed Description
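This class creates a new protein identification run into which other runs can be inserted. It performs the following operations:

- creates the union of the protein hits of all inserted runs, matching hits by accession;
- concatenates the peptide identifications (PSMs) of all inserted runs and updates their references to point to the new run;
- checks that the search engines and search settings of the inserted runs are consistent.
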
The algorithm differs from the IDMerger tool in two key aspects:
The class merges identification data from different sources while keeping the search settings consistent and the references between proteins and peptides valid. Use it in workflows that combine identification results from multiple files or runs into a single result set.
The algorithm can optionally annotate the origin of each identification to maintain traceability of the merged results back to their source files.
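
Merging, as described on this page, amounts to a union of protein hits and a concatenation of PSMs whose run references are redirected to the merged run. The following self-contained C++ sketch illustrates that idea without OpenMS: `Hit`, `Psm`, `Run`, `Merged` and `mergeRuns` are hypothetical stand-ins, and keeping the first occurrence of a duplicate accession is an assumption of the sketch, not documented OpenMS behaviour.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for OpenMS types (not the OpenMS API).
struct Hit { std::string accession; };
struct Psm { std::string run_id; std::string sequence; };
struct Run {
  std::string identifier;
  std::vector<Hit> hits;
};

struct Merged {
  Run protein_run;        // one merged protein run
  std::vector<Psm> psms;  // all PSMs, concatenated
};

// Sketch: union of protein hits (keyed by accession), concatenation of PSMs.
Merged mergeRuns(const std::vector<Run>& runs, const std::vector<Psm>& psms,
                 const std::string& merged_id) {
  Merged out;
  out.protein_run.identifier = merged_id;
  std::unordered_set<std::string> seen;  // accessions already inserted
  for (const Run& run : runs) {
    for (const Hit& h : run.hits) {
      if (seen.insert(h.accession).second) {  // true only for a new accession
        out.protein_run.hits.push_back(h);
      }
    }
  }
  for (const Psm& p : psms) {
    Psm copy = p;
    copy.run_id = merged_id;  // redirect the reference to the merged run
    out.psms.push_back(std::move(copy));
  }
  return out;
}
```

Duplicate PSMs are deliberately not collapsed: two matches to the same sequence from different runs both survive, which is what "all PSMs from the inserted runs" implies.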

Member Typedef Documentation

using equal_type = bool(*)(const ProteinHit &, const ProteinHit &) [private]
Type alias for the equality function.

using hash_type = std::size_t(*)(const ProteinHit &) [private]
Type alias for the hash function.
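
Constructor & Destructor Documentation

explicit IDMergerAlgorithm (const String &runIdentifier="merged", bool addTimeStampToID=true)
Constructor for the IDMergerAlgorithm.
Initializes a new merger with the specified run identifier.

| Parameter | Description |
|---|---|
| runIdentifier | Base identifier for the merged run (default: "merged") |
| addTimeStampToID | Whether to append a timestamp to the run identifier for uniqueness (default: true) |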
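
Member Function Documentation

static bool accessionEqual_ (const ProteinHit &p1, const ProteinHit &p2) [inline, static, private]
Equality function for protein hits based on accession.

| Parameter | Description |
|---|---|
| p1 | First protein hit to compare |
| p2 | Second protein hit to compare |

References ProteinHit::getAccession().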

static size_t accessionHash_ (const ProteinHit &p) [inline, static, private]
Hash function for protein hits based on accession.

| Parameter | Description |
|---|---|
| p | Protein hit to hash |

References ProteinHit::getAccession().
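
Because `hash_type` and `equal_type` are plain function-pointer types, a set such as `collected_protein_hits_` has to receive the hash and equality functions at construction time. The sketch below shows the same pattern with a hypothetical `Hit` struct in place of `ProteinHit`; `makeHitSet` is hypothetical and the initial bucket count of 16 is arbitrary, not taken from OpenMS.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for OpenMS' ProteinHit.
struct Hit {
  std::string accession;
  double score;
};

// Same shape as the documented aliases.
using hash_type = std::size_t (*)(const Hit&);
using equal_type = bool (*)(const Hit&, const Hit&);

// Hash and equality look only at the accession; the score is ignored.
std::size_t accessionHash(const Hit& h) {
  return std::hash<std::string>{}(h.accession);
}
bool accessionEqual(const Hit& a, const Hit& b) {
  return a.accession == b.accession;
}

using HitSet = std::unordered_set<Hit, hash_type, equal_type>;

// A default-constructed HitSet would hold null function pointers,
// so the functions are passed explicitly (16 = arbitrary bucket count).
HitSet makeHitSet() {
  return HitSet(16, accessionHash, accessionEqual);
}
```

With this set, inserting a second hit with an already known accession is rejected and the first hit, including its score, is kept.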

bool checkOldRunConsistency_ (const std::vector< ProteinIdentification > &protRuns, const ProteinIdentification &ref, const String &experiment_type) const [private]
Check consistency of search engines and settings against a reference.
Verifies that all runs have compatible search engine settings before merging, using an explicitly provided reference run.

| Parameter | Description |
|---|---|
| protRuns | The runs to check |
| ref | An external protein run to use as reference |
| experiment_type | Experiment type to allow certain mismatches (e.g., "SILAC") |

Throws BaseException if the settings disagree.

bool checkOldRunConsistency_ (const std::vector< ProteinIdentification > &protRuns, const String &experiment_type) const [private]
Check consistency of search engines and settings across runs.
Verifies that all runs have compatible search engine settings before merging. Uses the first run as an implicit reference.

| Parameter | Description |
|---|---|
| protRuns | The runs to check (first = implicit reference) |
| experiment_type | Experiment type to allow certain mismatches (e.g., "SILAC") |

Throws BaseException if the settings disagree.
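
This page does not list which settings are compared or which mismatches an experiment type such as SILAC tolerates. The sketch below assumes that engine name, engine version and variable modifications are compared, and that `"SILAC"` relaxes only the modification check (SILAC searches may use different label modifications). `SearchSettings` and `checkConsistency` are hypothetical, and `std::runtime_error` stands in for OpenMS' `BaseException`.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, reduced view of a run's search settings.
struct SearchSettings {
  std::string engine;
  std::string engine_version;
  std::set<std::string> variable_mods;
};

// Compare every run against a reference; throw on disagreement.
// The compared fields and the SILAC exemption are assumptions.
bool checkConsistency(const std::vector<SearchSettings>& runs,
                      const SearchSettings& ref,
                      const std::string& experiment_type) {
  const bool allow_mod_mismatch = (experiment_type == "SILAC");
  for (const SearchSettings& r : runs) {
    if (r.engine != ref.engine || r.engine_version != ref.engine_version) {
      throw std::runtime_error("search engine differs from reference: " + r.engine);
    }
    if (!allow_mod_mismatch && r.variable_mods != ref.variable_mods) {
      throw std::runtime_error("variable modifications differ from reference");
    }
  }
  return true;
}

// Overload without an explicit reference: the first run is the implicit reference.
bool checkConsistency(const std::vector<SearchSettings>& runs,
                      const std::string& experiment_type) {
  if (runs.empty()) return true;
  return checkConsistency(runs, runs.front(), experiment_type);
}
```

Implementing the implicit-reference overload by forwarding to the explicit one keeps a single comparison routine, mirroring the two documented overloads.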
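
static void copySearchParams_ (const ProteinIdentification &from, ProteinIdentification &to) [static, private]
Copy search parameters between protein identifications.
Transfers search parameters from one protein identification to another.

| Parameter | Description |
|---|---|
| from | Source protein identification |
| to | Destination protein identification |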

String getNewIdentifier_ (bool addTimeStampToID) const [private]
Generate a new identifier for the merged run.
Creates a new identifier by combining the base identifier with a timestamp if requested.

| Parameter | Description |
|---|---|
| addTimeStampToID | Whether to append a timestamp to the identifier |
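
A minimal sketch of composing the identifier, assuming a UTC timestamp in `YYYYMMDDTHHMMSS` form joined to the base by an underscore; the format actually used by OpenMS is not documented on this page, and `makeRunIdentifier` is a hypothetical helper.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper: returns "<base>" or "<base>_<UTC timestamp>".
// Timestamp format and separator are assumptions, not OpenMS behaviour.
std::string makeRunIdentifier(const std::string& base, bool add_timestamp) {
  if (!add_timestamp) return base;  // fixed identifier: identical on every call
  std::time_t now = std::time(nullptr);
  char buf[32];
  // Format YYYYMMDDTHHMMSS in UTC; note that std::gmtime is not thread-safe.
  std::strftime(buf, sizeof(buf), "%Y%m%dT%H%M%S", std::gmtime(&now));
  return base + "_" + buf;
}
```

Leaving the timestamp out yields a reproducible identifier, which corresponds to the fixed identifier described by `fixed_identifier_`.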
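
void insertProteinIDs_ (std::vector< ProteinIdentification > &&old_protRuns) [private]
Insert protein identifications into the merged result.
Moves and inserts protein IDs if not yet present, then clears the input.

| Parameter | Description |
|---|---|
| old_protRuns | Vector of protein identifications to insert |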
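
The "move, insert if not yet present, then clear the input" contract can be written as a generic helper. `moveInsertAndClear` is hypothetical; it works with any set whose `insert` returns an `(iterator, bool)` pair, for example `std::unordered_set`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: move each element of `src` into `dst` (the set
// rejects duplicates), then leave `src` empty. Returns the number inserted.
template <typename T, typename Set>
std::size_t moveInsertAndClear(std::vector<T>& src, Set& dst) {
  std::size_t inserted = 0;
  for (T& item : src) {
    if (dst.insert(std::move(item)).second) ++inserted;
  }
  // Moved-from elements are valid but unspecified; clearing makes the
  // "input is consumed" contract explicit.
  src.clear();
  return inserted;
}
```

Clearing unconditionally means the caller never observes half-moved elements, whether or not each one was actually inserted.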

void insertRuns (const std::vector< ProteinIdentification > &prots, const PeptideIdentificationList &peps)
Insert runs using copy semantics.
Inserts (copies) protein and peptide identifications into the internal merged data structures. This version leaves the source data unchanged.
Note: … prots (noop if prots is empty)

| Parameter | Description |
|---|---|
| prots | Vector of protein identifications to be merged |
| peps | Vector of peptide identifications to be merged |

void insertRuns (std::vector< ProteinIdentification > &&prots, PeptideIdentificationList &&peps)
Insert runs using move semantics.
Inserts (moves and clears) protein and peptide identifications into the internal merged data structures. This version avoids copies and is preferable when the caller no longer needs the source data.
Note: … prots (noop if prots is empty)

| Parameter | Description |
|---|---|
| prots | Vector of protein identifications to be merged |
| peps | Vector of peptide identifications to be merged |
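
One common way to provide both `insertRuns` overloads is to implement the move version and let the copy version copy its arguments and forward them. Whether OpenMS structures `insertRuns` this way is not stated on this page. `MiniMerger` is hypothetical, stores plain strings and does not deduplicate proteins. The early return for an empty `prots` reflects the truncated "noop if prots is empty" note, read here (as an assumption) as applying to the whole call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduced merger illustrating the copy/move overload pair.
class MiniMerger {
 public:
  // Move overload: consumes the inputs and leaves them empty.
  void insertRuns(std::vector<std::string>&& prots, std::vector<std::string>&& peps) {
    if (prots.empty()) return;  // assumption: whole call is a no-op for empty prots
    for (auto& p : prots) prots_.push_back(std::move(p));
    for (auto& p : peps) peps_.push_back(std::move(p));
    prots.clear();
    peps.clear();
  }
  // Copy overload: copies the inputs, then reuses the move overload.
  void insertRuns(const std::vector<std::string>& prots, const std::vector<std::string>& peps) {
    std::vector<std::string> prots_copy(prots);
    std::vector<std::string> peps_copy(peps);
    insertRuns(std::move(prots_copy), std::move(peps_copy));
  }
  std::size_t proteinCount() const { return prots_.size(); }
  std::size_t peptideCount() const { return peps_.size(); }

 private:
  std::vector<std::string> prots_;
  std::vector<std::string> peps_;
};
```

Lvalue arguments select the copy overload and leave the caller's data intact; `std::move` or temporaries select the move overload.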
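
void movePepIDsAndRefProteinsToResultFaster_ (PeptideIdentificationList &&pepIDs, std::vector< ProteinIdentification > &&old_protRuns) [private]
Optimized method to move peptide IDs and reference proteins to result.
A faster implementation for moving peptide IDs and their referenced proteins to the result data structures.

| Parameter | Description |
|---|---|
| pepIDs | Vector of peptide identifications to move |
| old_protRuns | Vector of protein identifications to reference |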

void returnResultsAndClear (ProteinIdentification &prots, PeptideIdentificationList &peps)
Return the merged results and reset internal state.
Retrieves the merged protein and peptide identifications and clears all internal data structures, so that the instance can be reused.
Call this method after all runs have been inserted to obtain the final merged result.

| Parameter | Description |
|---|---|
| prots | [out] The merged protein identification containing the union of all protein hits |
| peps | [out] The merged peptide identifications containing all PSMs from the inserted runs |
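
To hand the results out and reset the instance, one option is to move the members into the output arguments and then clear them explicitly, because a moved-from `std::vector` is valid but in an unspecified state. `ResultHolder` is a hypothetical reduction: it stores strings and uses a vector for both outputs, whereas the documented method returns a single `ProteinIdentification`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reduced state holder mirroring prot_result_, pep_result_ and filled_.
class ResultHolder {
 public:
  void add(const std::string& protein, const std::string& psm) {
    proteins_.push_back(protein);
    psms_.push_back(psm);
    filled_ = true;
  }
  // Hand the results to the caller, then reset to a well-defined empty state.
  void returnResultsAndClear(std::vector<std::string>& proteins,
                             std::vector<std::string>& psms) {
    proteins = std::move(proteins_);
    psms = std::move(psms_);
    proteins_.clear();  // moved-from state is unspecified; make "empty" explicit
    psms_.clear();
    filled_ = false;
  }
  bool filled() const { return filled_; }

 private:
  std::vector<std::string> proteins_;
  std::vector<std::string> psms_;
  bool filled_ = false;
};
```

After the call the holder behaves like a freshly constructed one, so a second merge round starts from empty state.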

void updateAndMovePepIDs_ (PeptideIdentificationList &&pepIDs, const std::map< String, Size > &runID_to_runIdx, const std::vector< StringList > &originFiles, bool annotate_origin) [private]
Update peptide ID references and move them to the result.
Updates the references in peptide IDs to point to the new protein ID run, then moves the peptide IDs based on the provided mapping.

| Parameter | Description |
|---|---|
| pepIDs | Vector of peptide identifications to update and move |
| runID_to_runIdx | Mapping from run IDs to run indices |
| originFiles | List of origin files for each run |
| annotate_origin | Whether to annotate peptide IDs with their origin |
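
A sketch of the update-and-move step with optional origin annotation follows. It assumes that `runID_to_runIdx` maps an old run identifier to an index into `originFiles`, that only the first origin file listed for a run is used, and that a `file_origin_to_idx_`-style map assigns indices in first-seen order. `Pep`, its `origin` field and `updateAndMovePeps` are hypothetical; the OpenMS meta-value name used for the annotation is not given on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a peptide identification.
struct Pep {
  std::string run_id;                 // identifier of the run it references
  std::optional<std::size_t> origin;  // merged index of its origin file, if annotated
};

void updateAndMovePeps(std::vector<Pep>&& peps,
                       const std::map<std::string, std::size_t>& runID_to_runIdx,
                       const std::vector<std::vector<std::string>>& originFiles,
                       bool annotate_origin, const std::string& new_id,
                       std::map<std::string, std::size_t>& file_origin_to_idx,
                       std::vector<Pep>& result) {
  for (Pep& p : peps) {
    if (annotate_origin) {
      // .at() throws std::out_of_range for unknown runs or runs without files.
      std::size_t run_idx = runID_to_runIdx.at(p.run_id);
      // Assumption: only the first origin file of a run is used.
      const std::string& file = originFiles.at(run_idx).at(0);
      // A new file gets the next free index; a known file keeps its index.
      auto it = file_origin_to_idx.emplace(file, file_origin_to_idx.size()).first;
      p.origin = it->second;
    }
    p.run_id = new_id;  // point at the merged run
    result.push_back(std::move(p));
  }
  peps.clear();
}
```

Keeping the file-to-index map outside the function lets several insertion rounds share one consistent numbering of origin files.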
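
Member Data Documentation

std::unordered_set< ProteinHit, hash_type, equal_type > collected_protein_hits_ [private]
Set of collected protein hits using custom hash and equality functions.

std::map< String, Size > file_origin_to_idx_ [private]
Mapping to keep track of the mzML origins of spectra.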

bool filled_ = false [private]
Flag indicating whether the resulting protein ID is already filled.

bool fixed_identifier_ [private]
Flag indicating whether the identifier should be fixed (i.e., not contain a timestamp).

String id_ [private]
The new identifier string for the merged run.

PeptideIdentificationList pep_result_ [private]
The resulting merged peptide identifications.

ProteinIdentification prot_result_ [private]
The resulting merged protein identification.