OpenMS
IDMergerAlgorithm Class Reference

Creates a new Protein ID run into which other runs can be inserted. Creates the union of protein hits but concatenates PSMs. Checks search engine consistency of all inserted runs. It differs from the IDMerger tool in that it is an algorithm class and allows inserting multiple peptide hits per peptide sequence (not only the first occurrence).

#include <OpenMS/ANALYSIS/ID/IDMergerAlgorithm.h>


Public Member Functions

 IDMergerAlgorithm (const String &runIdentifier="merged")
 
void insertRuns (std::vector< ProteinIdentification > &&prots, std::vector< PeptideIdentification > &&peps)
 
void insertRuns (const std::vector< ProteinIdentification > &prots, const std::vector< PeptideIdentification > &peps)
 
void returnResultsAndClear (ProteinIdentification &prots, std::vector< PeptideIdentification > &peps)
 Return the merged results and reset/clear all internal data.
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.

 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.

virtual ~DefaultParamHandler ()
 Destructor.

DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.

virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.

void setParameters (const Param &param)
 Sets the parameters.

const Param & getParameters () const
 Non-mutable access to the parameters.

const Param & getDefaults () const
 Non-mutable access to the default parameters.

const String & getName () const
 Non-mutable access to the name.

void setName (const String &name)
 Mutable access to the name.

const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.

- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor.

virtual ~ProgressLogger ()
 Destructor.

 ProgressLogger (const ProgressLogger &other)
 Copy constructor.

ProgressLogger & operator= (const ProgressLogger &other)
 Assignment operator.

void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE!

LogType getLogType () const
 Returns the type of progress log being used.

void startProgress (SignedSize begin, SignedSize end, const String &label) const
 Initializes the progress display.

void setProgress (SignedSize value) const
 Sets the current progress.

void endProgress () const
 Ends the progress display.

void nextProgress () const
 Increments progress by 1 (according to range begin-end).


Private Types

using hash_type = std::size_t(*)(const ProteinHit &)
 
using equal_type = bool(*)(const ProteinHit &, const ProteinHit &)
 

Private Member Functions

String getNewIdentifier_ () const
 Returns the new identifier. The initial identifier plus a timestamp.

bool checkOldRunConsistency_ (const std::vector< ProteinIdentification > &protRuns, const String &experiment_type) const
 
bool checkOldRunConsistency_ (const std::vector< ProteinIdentification > &protRuns, const ProteinIdentification &ref, const String &experiment_type) const
 
void insertProteinIDs_ (std::vector< ProteinIdentification > &&old_protRuns)
 
void updateAndMovePepIDs_ (std::vector< PeptideIdentification > &&pepIDs, const std::map< String, Size > &runID_to_runIdx, const std::vector< StringList > &originFiles, bool annotate_origin)
 
void movePepIDsAndRefProteinsToResultFaster_ (std::vector< PeptideIdentification > &&pepIDs, std::vector< ProteinIdentification > &&old_protRuns)
 

Static Private Member Functions

static void copySearchParams_ (const ProteinIdentification &from, ProteinIdentification &to)
 Copies over search parameters.

static size_t accessionHash_ (const ProteinHit &p)
 
static bool accessionEqual_ (const ProteinHit &p1, const ProteinHit &p2)
 

Private Attributes

ProteinIdentification prot_result_
 the resulting new Protein IDs

std::vector< PeptideIdentification > pep_result_
 the resulting new Peptide IDs

std::unordered_set< ProteinHit, hash_type, equal_type > collected_protein_hits_

bool filled_ = false
 is the resulting protein ID already filled?

std::map< String, Size > file_origin_to_idx_
 to keep track of the mzML origins of spectra

String id_
 the new identifier string


Additional Inherited Members

- Public Types inherited from ProgressLogger
enum LogType { CMD, GUI, NONE }
 Possible log types.

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values.

- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.

void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.

- Static Protected Member Functions inherited from ProgressLogger
static String logTypeToFactoryName_ (LogType type)
 Return the name of the factory product used for this log type.

- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.

Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!

String error_name_
 Name that is displayed in error messages during the parameter checking.

bool check_defaults_
 If this member is set to false, no checking of parameters is done.

bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.

- Protected Attributes inherited from ProgressLogger
LogType type_

time_t last_invoke_

ProgressLoggerImpl * current_logger_

- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_


Detailed Description

Creates a new Protein ID run into which other runs can be inserted. Creates the union of protein hits but concatenates PSMs. Checks search engine consistency of all inserted runs. It differs from the IDMerger tool in that it is an algorithm class and allows inserting multiple peptide hits per peptide sequence (not only the first occurrence).

Todo:
Allow filtering by peptide sequence so that this class can supersede the IDMerger tool. The filter should keep the best PSMs, though.
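
The merge semantics described above (union of protein hits, concatenation of PSMs, then handing out the result and resetting) can be illustrated with a minimal sketch. It does not use OpenMS: `ProteinRun`, `PeptideId`, and `MiniMerger` are hypothetical stand-ins invented for this example, and the real class tracks far more state (search parameters, file origins, consistency checks).

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-ins for ProteinIdentification / PeptideIdentification.
struct ProteinRun { std::string id; std::vector<std::string> accessions; };
struct PeptideId  { std::string run_id; std::vector<std::string> hits; };

class MiniMerger {
public:
  explicit MiniMerger(std::string id = "merged") : id_(std::move(id)) {
    assert(!id_.empty());  // the merged run needs a non-empty identifier
  }

  // Moves the inputs into the merger and leaves them empty.
  void insertRuns(std::vector<ProteinRun>&& prots, std::vector<PeptideId>&& peps) {
    for (auto& run : prots)
      for (auto& acc : run.accessions)
        seen_.insert(std::move(acc));   // union: duplicate accessions collapse
    for (auto& pep : peps) {
      pep.run_id = id_;                 // re-point the PSM at the merged run
      peps_.push_back(std::move(pep));  // concatenation: no PSM is dropped
    }
    prots.clear();
    peps.clear();
  }

  // Hands out the merged data and resets the merger for reuse.
  void returnResultsAndClear(ProteinRun& prot, std::vector<PeptideId>& peps) {
    prot.id = id_;
    prot.accessions.assign(seen_.begin(), seen_.end());
    peps = std::move(peps_);
    seen_.clear();
    peps_.clear();
  }

private:
  std::string id_;
  std::unordered_set<std::string> seen_;
  std::vector<PeptideId> peps_;
};
```

With two runs that share one protein accession, the sketch yields three distinct accessions but keeps every peptide identification, including repeated sequences.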

Member Typedef Documentation

◆ equal_type

using equal_type = bool (*)(const ProteinHit&, const ProteinHit&)
private

◆ hash_type

using hash_type = std::size_t (*)(const ProteinHit&)
private

Constructor & Destructor Documentation

◆ IDMergerAlgorithm()

IDMergerAlgorithm ( const String & runIdentifier = "merged")
explicit

Member Function Documentation

◆ accessionEqual_()

static bool accessionEqual_ ( const ProteinHit & p1,
const ProteinHit & p2 
)
inline static private

◆ accessionHash_()

static size_t accessionHash_ ( const ProteinHit & p)
inline static private

◆ checkOldRunConsistency_() [1/2]

bool checkOldRunConsistency_ ( const std::vector< ProteinIdentification > &  protRuns,
const ProteinIdentification & ref,
const String & experiment_type 
) const
private

Same as the overload without ref, for use when the runs should be checked against a specific reference.

Parameters
    protRuns         The runs to check (first = reference)
    ref              A possibly external protein run reference
    experiment_type  Allow some mismatches in case of other experiment types (e.g. SILAC)
Returns
    Whether all runs have the same settings. TODO: a merged RunDescription about what to put in the new runs (e.g. for SILAC)
Exceptions
    BaseException    for disagreeing settings

◆ checkOldRunConsistency_() [2/2]

bool checkOldRunConsistency_ ( const std::vector< ProteinIdentification > &  protRuns,
const String & experiment_type 
) const
private

Checks consistency of search engines and settings across runs before merging.

Parameters
    protRuns         The runs to check (first = implicit reference)
    experiment_type  Allow some mismatches in case of other experiment types (e.g. SILAC)
Returns
    Whether all runs have the same settings. TODO: a merged RunDescription about what to put in the new runs (e.g. for SILAC)
Exceptions
    BaseException    for disagreeing settings
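
The two overloads follow a common pattern: one compares every run against an explicit reference, the other uses the first run as an implicit reference. The sketch below illustrates that pattern only. `RunSettings` and `checkConsistency` are hypothetical names, the compared fields are assumptions, and `std::invalid_argument` stands in for the OpenMS exception type. How the real method divides cases between returning false and throwing is not documented here, so this sketch simply throws on any mismatch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the search settings stored in a protein run.
struct RunSettings { std::string engine; std::string version; };

// Compares every run against an explicit reference.
// Assumption: any mismatch throws; the real method may tolerate some
// differences depending on the experiment type.
inline bool checkConsistency(const std::vector<RunSettings>& runs,
                             const RunSettings& ref) {
  assert(!ref.engine.empty());  // a reference without an engine is meaningless
  for (const auto& r : runs) {
    if (r.engine != ref.engine || r.version != ref.version) {
      throw std::invalid_argument("search engine mismatch: " + r.engine + " " +
                                  r.version + " vs " + ref.engine + " " +
                                  ref.version);
    }
  }
  return true;
}

// Uses the first run as the implicit reference.
inline bool checkConsistency(const std::vector<RunSettings>& runs) {
  if (runs.empty()) return true;  // nothing to compare
  return checkConsistency(runs, runs.front());
}
```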

◆ copySearchParams_()

static void copySearchParams_ ( const ProteinIdentification & from,
ProteinIdentification & to 
)
static private

Copies over search parameters.

◆ getNewIdentifier_()

String getNewIdentifier_ ( ) const
private

Returns the new identifier. The initial identifier plus a timestamp.
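
As a rough illustration of "initial identifier plus a timestamp", the sketch below appends a formatted UTC time to a base name. The helper name `makeRunIdentifier`, the underscore separator, and the timestamp layout are all assumptions made for this example; the format the class actually produces is not specified on this page.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical helper: separator and timestamp layout are assumptions.
inline std::string makeRunIdentifier(const std::string& base, std::time_t now) {
  assert(!base.empty());
  // std::gmtime uses a shared static buffer, so this sketch is not thread-safe.
  const std::tm* utc = std::gmtime(&now);
  if (utc == nullptr) return base;  // conversion failed: fall back to the base
  char buf[32] = {0};
  std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H:%M:%S", utc);
  return base + "_" + buf;
}
```

Passing the time explicitly, rather than calling std::time inside the helper, keeps the function deterministic and easy to test.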

◆ insertProteinIDs_()

void insertProteinIDs_ ( std::vector< ProteinIdentification > &&  old_protRuns)
private

Moves and inserts protein IDs if not yet present, then clears the input.

◆ insertRuns() [1/2]

void insertRuns ( const std::vector< ProteinIdentification > &  prots,
const std::vector< PeptideIdentification > &  peps 
)

◆ insertRuns() [2/2]

void insertRuns ( std::vector< ProteinIdentification > &&  prots,
std::vector< PeptideIdentification > &&  peps 
)

Inserts (i.e., moves and clears) runs together with their peptide IDs into the internal merged data structures, based on the initial mapping from file origins to the new run.
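
The pair of insertRuns overloads follows a familiar C++ idiom: an rvalue-reference overload that takes ownership of the inputs, and a const-reference overload for callers who want to keep their data. The sketch below shows one way such a pair can be arranged, with the const overload copying and then forwarding. `Sink` is a hypothetical class, plain strings stand in for the identification objects, and whether IDMergerAlgorithm delegates in exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Sink {
public:
  // Rvalue overload: steals the elements and leaves the inputs empty.
  void insertRuns(std::vector<std::string>&& prots, std::vector<std::string>&& peps) {
    for (auto& p : prots) prots_.push_back(std::move(p));
    for (auto& p : peps) peps_.push_back(std::move(p));
    prots.clear();
    peps.clear();
    assert(prots.empty() && peps.empty());  // "move and clear" post-condition
  }

  // Const overload: copies, then forwards the temporaries to the rvalue overload.
  void insertRuns(const std::vector<std::string>& prots,
                  const std::vector<std::string>& peps) {
    insertRuns(std::vector<std::string>(prots), std::vector<std::string>(peps));
  }

  std::size_t proteinCount() const { return prots_.size(); }
  std::size_t peptideCount() const { return peps_.size(); }

private:
  std::vector<std::string> prots_;
  std::vector<std::string> peps_;
};
```

Calling `insertRuns(a, b)` with named vectors selects the copying overload and leaves `a` and `b` intact, while `insertRuns(std::move(a), std::move(b))` empties them.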

◆ movePepIDsAndRefProteinsToResultFaster_()

void movePepIDsAndRefProteinsToResultFaster_ ( std::vector< PeptideIdentification > &&  pepIDs,
std::vector< ProteinIdentification > &&  old_protRuns 
)
private

◆ returnResultsAndClear()

void returnResultsAndClear ( ProteinIdentification & prots,
std::vector< PeptideIdentification > &  peps 
)

Return the merged results and reset/clear all internal data.

◆ updateAndMovePepIDs_()

void updateAndMovePepIDs_ ( std::vector< PeptideIdentification > &&  pepIDs,
const std::map< String, Size > &  runID_to_runIdx,
const std::vector< StringList > &  originFiles,
bool  annotate_origin 
)
private

updates the references in pepIDs to the new protein ID run then moves the peptide IDs based on the mapping in

Member Data Documentation

◆ collected_protein_hits_

std::unordered_set<ProteinHit, hash_type, equal_type> collected_protein_hits_
private
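
Because hash_type and equal_type are plain function-pointer types, a set declared this way has to be given the actual functions at construction time; a default-constructed set would hold null pointers. The sketch below shows the pattern with a hypothetical `Hit` struct standing in for ProteinHit. Hashing and comparing by accession alone is an assumption suggested by the names accessionHash_ and accessionEqual_.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for ProteinHit.
struct Hit { std::string accession; double score; };

using hash_type  = std::size_t (*)(const Hit&);
using equal_type = bool (*)(const Hit&, const Hit&);

// Assumption: identity of a hit is determined by its accession only.
inline std::size_t accessionHash(const Hit& h) {
  return std::hash<std::string>{}(h.accession);
}
inline bool accessionEqual(const Hit& a, const Hit& b) {
  return a.accession == b.accession;
}

// The function pointers must be supplied explicitly, together with an
// initial bucket count.
inline std::unordered_set<Hit, hash_type, equal_type> makeHitSet() {
  return std::unordered_set<Hit, hash_type, equal_type>(16, accessionHash,
                                                        accessionEqual);
}

// Counts distinct accessions; the first hit seen for an accession is kept.
inline std::size_t countUniqueAccessions(const std::vector<Hit>& hits) {
  auto seen = makeHitSet();
  for (const auto& h : hits) seen.insert(h);
  assert(seen.size() <= hits.size());
  return seen.size();
}
```

A second hit with an already-seen accession is rejected by `insert`, so the stored entry keeps the data of the first occurrence.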

◆ file_origin_to_idx_

std::map<String, Size> file_origin_to_idx_
private

to keep track of the mzML origins of spectra

◆ filled_

bool filled_ = false
private

is the resulting protein ID already filled?

◆ id_

String id_
private

the new identifier string

◆ pep_result_

std::vector<PeptideIdentification> pep_result_
private

the resulting new Peptide IDs

◆ prot_result_

ProteinIdentification prot_result_
private

the resulting new Protein IDs