OpenMS
ModifiedPeptideGenerator Class Reference

Generate fixed- and variable-modification variants of an AASequence, lock-free. More...

#include <OpenMS/CHEMISTRY/ModifiedPeptideGenerator.h>


Classes

struct  MapToResidueType
 Cached mapping ResidueModification* -> already-instantiated modified Residue*. More...
 

Static Public Member Functions

static MapToResidueType getModifications (const StringList &modNames)
 Resolve modification names from UniMod terms and pre-cache the corresponding modified Residue pointers.
 
static void applyFixedModifications (const MapToResidueType &fixed_mods, AASequence &peptide)
 Apply all compatible fixed modifications in fixed_mods to peptide in place.
 
static void applyVariableModifications (const MapToResidueType &var_mods, const AASequence &peptide, Size max_variable_mods_per_peptide, std::vector< AASequence > &all_modified_peptides, bool keep_original=true)
 Enumerate all peptide variants obtained by combinatorially placing up to max_variable_mods_per_peptide variable modifications.
 

Static Protected Member Functions

static MapToResidueType createResidueModificationToResidueMap_ (const std::vector< const ResidueModification * > &mods)
 Build the ResidueModification -> Residue cache used by the apply* methods (the lock-free trick — see class brief).
 
static void applyAtMostOneVariableModification_ (const MapToResidueType &var_mods, const AASequence &peptide, std::vector< AASequence > &all_modified_peptides, bool keep_original=true)
 Fast path for the common case where at most one variable modification is placed per peptide.
 

Static Protected Attributes

static const int N_TERM_MODIFICATION_INDEX
 Sentinel index used internally to mark a strict N_TERM-only modification placed at the N-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)
 
static const int C_TERM_MODIFICATION_INDEX
 Sentinel index used internally to mark a strict C_TERM-only modification placed at the C-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)
 

Static Private Member Functions

static void applyAllModsAtIdxAndExtend_ (std::vector< AASequence > &original_sequences, int idx_to_modify, const std::vector< const ResidueModification * > &mods, const MapToResidueType &var_mods)
 For each modification in mods, extend original_sequences with a copy carrying that mod at idx_to_modify; the first mod in mods is applied in-place to the existing entries (avoids one copy)
 
static void applyModToPep_ (AASequence &current_peptide, int current_index, const ResidueModification *m, const MapToResidueType &var_mods)
 Install modification m on current_peptide at residue index current_index, looking up the pre-cached modified Residue via var_mods; overwrites an existing modification if any.
 

Detailed Description

Generate fixed- and variable-modification variants of an AASequence, lock-free.

The class implements the modification-placement stage of database-search engines and assay generators:

  1. getModifications() — resolve a list of UniMod names (e.g. "Oxidation (M)") into a cached MapToResidueType keyed by ResidueModification. This step performs the only ResidueDB lookups; the apply-helpers below are lock-free.
  2. applyFixedModifications() — apply mandatory modifications in place to one peptide.
  3. applyVariableModifications() — enumerate all peptide variants obtained by placing up to max_variable_mods_per_peptide compatible variable modifications.

Why pre-caching matters: ResidueDB is process-wide and locked on every getResidue / getModifiedResidue call. In a peptide-search workload that means contention proportional to the candidate count. getModifications() does the lookups once up front; the apply methods then reuse the cached Residue pointers and never touch the DB.
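
The cost model described above can be illustrated with a small standalone sketch. It does not use OpenMS: `ToyResidueDB`, `buildCache` and `countCachedHits` are hypothetical names, and the toy database only models the property that every lookup takes a lock. The point is that the lock count stays fixed at the number of modifications, however many candidates are processed afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a process-wide, mutex-guarded residue database.
// This is NOT the OpenMS ResidueDB; it only models "every lookup takes a lock".
class ToyResidueDB
{
public:
  // Returns a label for the modified residue and counts the lock acquisition.
  std::string lookup(const std::string& mod_name)
  {
    assert(!mod_name.empty());
    std::lock_guard<std::mutex> guard(mutex_);
    ++lock_count_;
    return "residue<" + mod_name + ">";
  }

  // Read without locking: only meant for single-threaded inspection in this demo.
  std::size_t lockCount() const { return lock_count_; }

private:
  std::mutex mutex_;
  std::size_t lock_count_ = 0;
};

// Resolve every modification once, up front (the role getModifications() plays).
std::unordered_map<std::string, std::string> buildCache(ToyResidueDB& db,
                                                        const std::vector<std::string>& mod_names)
{
  std::unordered_map<std::string, std::string> cache;
  for (const std::string& name : mod_names)
  {
    cache.emplace(name, db.lookup(name));
  }
  return cache;
}

// Per-candidate work reads only the cache, so it never touches the lock.
std::size_t countCachedHits(const std::unordered_map<std::string, std::string>& cache,
                            const std::vector<std::string>& requested)
{
  std::size_t hits = 0;
  for (const std::string& name : requested)
  {
    if (cache.find(name) != cache.end()) ++hits;
  }
  return hits;
}
```

Building the cache for two modifications takes the lock twice; serving a thousand candidate lookups from the cache afterwards leaves that count unchanged.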

All methods are static; the class holds no state and effectively serves as a namespace.


Class Documentation

◆ OpenMS::ModifiedPeptideGenerator::MapToResidueType

struct OpenMS::ModifiedPeptideGenerator::MapToResidueType

Cached mapping ResidueModification* -> already-instantiated modified Residue*.

Built once by getModifications() (or createResidueModificationToResidueMap_()) and consumed by the apply* methods. For modifications that have no associated residue (e.g. strict "Protein N-term" without an amino-acid origin), the residue pointer is nullptr.

The wrapping struct (rather than a bare unordered_map) exists so that pyOpenMS can bind this aggregate as a single type without having to template-instantiate the map.

Class Members
unordered_map< const ResidueModification *, const Residue * > val

Member Function Documentation

◆ applyAllModsAtIdxAndExtend_()

static void applyAllModsAtIdxAndExtend_ ( std::vector< AASequence > &  original_sequences,
int  idx_to_modify,
const std::vector< const ResidueModification * > &  mods,
const MapToResidueType &  var_mods 
)
static private

For each modification in mods, extend original_sequences with a copy carrying that mod at idx_to_modify; the first mod in mods is applied in-place to the existing entries (avoids one copy)
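
The copy-saving pattern can be sketched without OpenMS types. In the sketch below, `ToyPeptide` (one string token per residue) and `applyAllAtIdxAndExtend` are hypothetical stand-ins, and a modification is just a name appended to the token. It is an illustration of the described behaviour, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: one token per residue, e.g. "M" or "M(Oxidation)".
using ToyPeptide = std::vector<std::string>;

// For every entry in `sequences`, produce one variant per modification at `idx`.
// Copies are made for mods[1..]; mods[0] is written into the existing entries.
void applyAllAtIdxAndExtend(std::vector<ToyPeptide>& sequences,
                            std::size_t idx,
                            const std::vector<std::string>& mods)
{
  if (mods.empty()) return;
  const std::size_t original_count = sequences.size();

  // Take the copies first, while the existing entries are still unmodified.
  for (std::size_t m = 1; m < mods.size(); ++m)
  {
    for (std::size_t s = 0; s < original_count; ++s)
    {
      ToyPeptide copy = sequences[s]; // copied before push_back can reallocate
      assert(idx < copy.size());
      copy[idx] += "(" + mods[m] + ")";
      sequences.push_back(std::move(copy));
    }
  }

  // The first modification reuses the existing entries, saving one copy each.
  for (std::size_t s = 0; s < original_count; ++s)
  {
    assert(idx < sequences[s].size());
    sequences[s][idx] += "(" + mods[0] + ")";
  }
}
```

Starting from one sequence and two modifications, the vector ends up with two entries: the original slot carries the first modification and one appended copy carries the second.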

◆ applyAtMostOneVariableModification_()

static void applyAtMostOneVariableModification_ ( const MapToResidueType &  var_mods,
const AASequence &  peptide,
std::vector< AASequence > &  all_modified_peptides,
bool  keep_original = true 
)
static protected

Fast path for the common case where at most one variable modification is placed per peptide.

Avoids the combinatoric enumeration of applyVariableModifications(): every compatible (residue, modification) pair yields exactly one output peptide; already-modified residues are skipped. The original peptide is emitted iff keep_original is true.

Parameters
[in]  var_mods  Cached variable-modification table.
[in]  peptide  Source peptide.
[out]  all_modified_peptides  Generated variants appended here.
[in]  keep_original  If true, also emit peptide unchanged.
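
A minimal standalone sketch of this fast path, assuming a toy peptide model rather than AASequence: `ToyPeptide`, `SiteMod` and `applyAtMostOne` are hypothetical names, a residue counts as unmodified when its token is a single letter, and term-specificity is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: one token per residue, e.g. "M" or "M(Oxidation)".
using ToyPeptide = std::vector<std::string>;

// Hypothetical modification: the one-letter origin it applies to, plus a name.
struct SiteMod
{
  char origin;
  std::string name;
};

// Emit one variant per compatible (residue, modification) pair.
// `peptide` must not refer to an element of `out`, since push_back may reallocate.
void applyAtMostOne(const std::vector<SiteMod>& mods,
                    const ToyPeptide& peptide,
                    std::vector<ToyPeptide>& out,
                    bool keep_original = true)
{
  if (keep_original) out.push_back(peptide);

  for (std::size_t i = 0; i < peptide.size(); ++i)
  {
    assert(!peptide[i].empty());
    if (peptide[i].size() != 1) continue; // already modified: skip this residue

    for (const SiteMod& mod : mods)
    {
      if (peptide[i][0] != mod.origin) continue;
      ToyPeptide variant = peptide;
      variant[i] += "(" + mod.name + ")";
      out.push_back(std::move(variant));
    }
  }
}
```

No combination bookkeeping is needed here: the number of appended variants is simply the number of compatible pairs, plus one if the original is kept.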

◆ applyFixedModifications()

static void applyFixedModifications ( const MapToResidueType &  fixed_mods,
AASequence &  peptide 
)
static

Apply all compatible fixed modifications in fixed_mods to peptide in place.

Two passes:

  • Strict terminal modifications (N_TERM / C_TERM with no specific residue origin) are installed on the peptide's terminal slots if those are still empty.
  • Each residue is then visited: residues already carrying a modification are skipped (fixed mods do not overwrite), residues whose one-letter code matches a fixed mod receive that mod. Term-specificity ANYWHERE applies anywhere; C_TERM / N_TERM only when the residue is the last / first residue of the peptide.
Parameters
[in]  fixed_mods  Cached fixed-modification table (typically from getModifications()).
[in,out]  peptide  Peptide to modify in place.
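
The two passes can be sketched on a toy peptide model. Everything named here (`Term`, `FixedMod`, `TermPeptide`, `applyFixed`) is hypothetical and independent of OpenMS; the sketch assumes that an origin of 'X' marks a modification without an amino-acid origin and that a residue is unmodified when its token is a single letter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Term { Anywhere, NTerm, CTerm };

// Hypothetical modification record; origin 'X' means "no specific residue".
struct FixedMod
{
  std::string name;
  char origin;
  Term term;
};

// Hypothetical peptide: terminal slots plus one token per residue.
struct TermPeptide
{
  std::string n_term;
  std::vector<std::string> residues;
  std::string c_term;
};

void applyFixed(const std::vector<FixedMod>& fixed, TermPeptide& pep)
{
  // Pass 1: strict terminal mods go into the terminal slots if still empty.
  for (const FixedMod& m : fixed)
  {
    if (m.origin != 'X') continue;
    if (m.term == Term::NTerm && pep.n_term.empty()) pep.n_term = m.name;
    else if (m.term == Term::CTerm && pep.c_term.empty()) pep.c_term = m.name;
  }

  // Pass 2: visit each residue; never overwrite an existing modification.
  const std::size_t n = pep.residues.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    std::string& res = pep.residues[i];
    assert(!res.empty());
    if (res.size() != 1) continue; // already modified

    for (const FixedMod& m : fixed)
    {
      if (m.origin == 'X' || m.origin != res[0]) continue;
      const bool position_ok = m.term == Term::Anywhere
                            || (m.term == Term::NTerm && i == 0)
                            || (m.term == Term::CTerm && i + 1 == n);
      if (!position_ok) continue;
      res += "(" + m.name + ")";
      break; // the residue now carries a modification
    }
  }
}
```

In this model a C_TERM modification with origin K is applied only to a K in the last position, while an interior K is left untouched.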

◆ applyModToPep_()

static void applyModToPep_ ( AASequence &  current_peptide,
int  current_index,
const ResidueModification *  m,
const MapToResidueType &  var_mods 
)
static private

Install modification m on current_peptide at residue index current_index, looking up the pre-cached modified Residue via var_mods; overwrites an existing modification if any.

◆ applyVariableModifications()

static void applyVariableModifications ( const MapToResidueType &  var_mods,
const AASequence &  peptide,
Size  max_variable_mods_per_peptide,
std::vector< AASequence > &  all_modified_peptides,
bool  keep_original = true 
)
static

Enumerate all peptide variants obtained by combinatorially placing up to max_variable_mods_per_peptide variable modifications.

Treats each modification site independently and produces every legal combination of compatible variable mods up to the cap. The original (unmodified) peptide is included in the output iff keep_original is true. The output all_modified_peptides is appended to (existing contents are preserved).

Site compatibility is determined by:

  • Residue one-letter code matching the modification's origin.
  • Term-specificity (ANYWHERE / N_TERM / C_TERM) being satisfied by the residue's position in the peptide.
  • The residue not already carrying a fixed modification (those slots are skipped).

The implementation uses a fast specialisation (applyAtMostOneVariableModification_) when max_variable_mods_per_peptide == 1.

Parameters
[in]  var_mods  Cached variable-modification table (from getModifications()).
[in]  peptide  Source peptide; fixed modifications already applied if desired.
[in]  max_variable_mods_per_peptide  Maximum number of variable mods to place; 0 disables enumeration.
[out]  all_modified_peptides  Generated variants are appended here (existing entries preserved).
[in]  keep_original  If true, also emit peptide unchanged as the first of the appended entries.
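
The capped enumeration can be sketched as a backtracking search over compatible sites. This is a standalone illustration, not the OpenMS implementation: `ToyPeptide`, `ToyMod`, `Site`, `placeFrom` and `enumerateVariable` are hypothetical names, term-specificity is omitted, and the order of the emitted variants is a property of this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model: one token per residue, e.g. "M" or "M(Oxidation)".
using ToyPeptide = std::vector<std::string>;

struct ToyMod
{
  char origin;
  std::string name;
};

// A modifiable position together with the names of its compatible mods.
struct Site
{
  std::size_t index;
  std::vector<std::string> mods;
};

// Place one mod at each site from `first_site` onward, then recurse while
// the budget `remaining` allows further placements at later sites.
static void placeFrom(const std::vector<Site>& sites, std::size_t first_site,
                      std::size_t remaining, ToyPeptide& current,
                      std::vector<ToyPeptide>& out)
{
  for (std::size_t s = first_site; s < sites.size(); ++s)
  {
    const std::size_t idx = sites[s].index;
    const std::string saved = current[idx];
    for (const std::string& mod : sites[s].mods)
    {
      current[idx] = saved + "(" + mod + ")";
      out.push_back(current);
      if (remaining > 1) placeFrom(sites, s + 1, remaining - 1, current, out);
    }
    current[idx] = saved; // backtrack before moving to the next site
  }
}

void enumerateVariable(const std::vector<ToyMod>& mods, const ToyPeptide& peptide,
                       std::size_t max_mods, std::vector<ToyPeptide>& out,
                       bool keep_original = true)
{
  if (keep_original) out.push_back(peptide);
  if (max_mods == 0) return;

  std::vector<Site> sites;
  for (std::size_t i = 0; i < peptide.size(); ++i)
  {
    assert(!peptide[i].empty());
    if (peptide[i].size() != 1) continue; // already modified: not a site
    Site site{i, {}};
    for (const ToyMod& mod : mods)
    {
      if (mod.origin == peptide[i][0]) site.mods.push_back(mod.name);
    }
    if (!site.mods.empty()) sites.push_back(site);
  }

  ToyPeptide current = peptide;
  placeFrom(sites, 0, max_mods, current, out);
}
```

In this sketch, a peptide with n compatible sites, one modification per site and a cap of K yields C(n,1) + C(n,2) + ... + C(n,min(K,n)) modified variants, plus the original if it is kept. The count grows quickly with n, which is why a small cap is the usual choice.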

◆ createResidueModificationToResidueMap_()

static MapToResidueType createResidueModificationToResidueMap_ ( const std::vector< const ResidueModification * > &  mods)
static protected

Build the ResidueModification -> Residue cache used by the apply* methods (the lock-free trick — see class brief).

For each modification: looks up the corresponding modified Residue in ResidueDB and stores the pointer. Strict terminal modifications without an amino-acid origin (e.g. "Protein N-term" with origin == 'X') map to nullptr because no residue can carry them — the apply* code handles those via the peptide's terminal slots.

Parameters
[in]  mods  Modifications to cache (typically the resolved list inside getModifications()).
Returns
Cache ready for use by applyFixedModifications() / applyVariableModifications().

◆ getModifications()

static MapToResidueType getModifications ( const StringList &  modNames)
static

Resolve modification names from UniMod terms and pre-cache the corresponding modified Residue pointers.

Performs all ResidueDB / ModificationsDB lookups in one place so that the downstream apply methods can run lock-free. For modifications without an amino-acid origin (strict terminal modifications) the cached residue pointer is nullptr.

Parameters
[in]  modNames  List of UniMod modification names (e.g. "Oxidation (M)").
Returns
Mapping ResidueModification* -> modified Residue* suitable for the apply* methods.

Member Data Documentation

◆ C_TERM_MODIFICATION_INDEX

const int C_TERM_MODIFICATION_INDEX
static protected

Sentinel index used internally to mark a strict C_TERM-only modification placed at the C-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)

◆ N_TERM_MODIFICATION_INDEX

const int N_TERM_MODIFICATION_INDEX
static protected

Sentinel index used internally to mark a strict N_TERM-only modification placed at the N-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)