Generate fixed- and variable-modification variants of an AASequence, lock-free.
#include <OpenMS/CHEMISTRY/ModifiedPeptideGenerator.h>
|
| static const int | N_TERM_MODIFICATION_INDEX |
| | Sentinel index used internally to mark a strict N_TERM-only modification placed at the N-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)
|
| |
| static const int | C_TERM_MODIFICATION_INDEX |
| | Sentinel index used internally to mark a strict C_TERM-only modification placed at the C-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)
|
| |
|
| static void | applyAllModsAtIdxAndExtend_ (std::vector< AASequence > &original_sequences, int idx_to_modify, const std::vector< const ResidueModification * > &mods, const MapToResidueType &var_mods) |
| | For each modification in mods, extend original_sequences with a copy carrying that mod at idx_to_modify; the first mod in mods is applied in-place to the existing entries (avoids one copy)
|
| |
| static void | applyModToPep_ (AASequence &current_peptide, int current_index, const ResidueModification *m, const MapToResidueType &var_mods) |
| | Install modification m on current_peptide at residue index current_index, looking up the pre-cached modified Residue via var_mods; overwrites an existing modification if any.
|
| |
The class implements the modification-placement stage of database-search engines and assay generators:
- getModifications(): resolve a list of UniMod names (e.g. "Oxidation (M)") into a cached MapToResidueType keyed by ResidueModification pointer. This step performs the only ResidueDB lookups; the apply helpers below are lock-free.
- applyFixedModifications(): apply mandatory modifications in place to one peptide.
- applyVariableModifications(): enumerate all peptide variants obtained by placing up to max_variable_mods_per_peptide compatible variable modifications.
Why pre-caching matters: ResidueDB is process-wide and locked on every getResidue / getModifiedResidue call. In a peptide-search workload that means contention proportional to the candidate count. getModifications() does the lookups once up front; the apply methods then reuse the cached Residue pointers and never touch the DB.
All methods are static; the class is a namespace in disguise.
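The pre-caching pattern described above can be illustrated with a minimal, self-contained sketch. It does not use OpenMS: ToyResidueDB, ToyModCache, buildToyCache and cachedResidue are hypothetical names invented for this illustration, and the real ResidueDB interface and locking scheme may differ. The sketch only shows the idea of resolving every modification once through a locked lookup and then reading from the cache without touching the database.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a process-wide, mutex-guarded residue database.
class ToyResidueDB
{
public:
  // Every lookup takes the lock; 'lookups' counts how often that happens.
  const std::string* getModifiedResidue(const std::string& mod_name)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    ++lookups;
    auto it = residues_.find(mod_name);
    return it == residues_.end() ? nullptr : &it->second;
  }

  std::size_t lookups = 0;

private:
  std::mutex mutex_;
  std::unordered_map<std::string, std::string> residues_{
    {"Oxidation (M)", "M(Oxidation)"},
    {"Carbamidomethyl (C)", "C(Carbamidomethyl)"}};
};

// Cache: modification name -> pointer to the already-resolved residue.
using ToyModCache = std::unordered_map<std::string, const std::string*>;

// Resolve every name once; this is the only function that touches the DB.
ToyModCache buildToyCache(ToyResidueDB& db, const std::vector<std::string>& names)
{
  ToyModCache cache;
  for (const std::string& name : names)
  {
    cache[name] = db.getModifiedResidue(name);
  }
  return cache;
}

// Read-only use of the cache: no lock is taken and the DB is not accessed.
const std::string* cachedResidue(const ToyModCache& cache, const std::string& name)
{
  auto it = cache.find(name);
  assert(it != cache.end() && "modification must have been pre-cached");
  return it->second;
}
```

In this toy model the lookup counter stops growing once the cache is built, regardless of how many candidate peptides later read from it.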
◆ OpenMS::ModifiedPeptideGenerator::MapToResidueType
| struct OpenMS::ModifiedPeptideGenerator::MapToResidueType |
Cached mapping ResidueModification* -> already-instantiated modified Residue*.
Built once by getModifications() (or createResidueModificationToResidueMap_()) and consumed by the apply* methods. For modifications that have no associated residue (e.g. strict "Protein N-term" without an amino-acid origin), the residue pointer is nullptr.
The wrapping struct (rather than a bare unordered_map) exists so that pyOpenMS can bind this aggregate as a single type without having to template-instantiate the map.
◆ applyAllModsAtIdxAndExtend_()
For each modification in mods, extend original_sequences with a copy carrying that mod at idx_to_modify; the first mod in mods is applied in-place to the existing entries (avoids one copy)
◆ applyAtMostOneVariableModification_()
| static void applyAtMostOneVariableModification_ (const MapToResidueType &var_mods, const AASequence &peptide, std::vector< AASequence > &all_modified_peptides, bool keep_original=true) |
static protected
Fast path for the common case where at most one variable modification is placed per peptide.
Avoids the combinatorial enumeration of applyVariableModifications(): every compatible (residue, modification) pair yields exactly one output peptide; already-modified residues are skipped. The original peptide is emitted if and only if keep_original is true.
- Parameters
-
| [in] | var_mods | Cached variable-modification table. |
| [in] | peptide | Source peptide. |
| [out] | all_modified_peptides | Generated variants appended here. |
| [in] | keep_original | If true, also emit peptide unchanged. |
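The behaviour described for this fast path can be sketched in a few lines of plain C++. This is a toy model, not the OpenMS implementation: ToyResidue, ToyPeptide, ToyMod and toyApplyAtMostOne are hypothetical names, a peptide is reduced to a vector of (one-letter code, modification name) pairs, term-specificity is ignored, and emitting the original before the variants is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy residue: one-letter code plus a modification name (empty = unmodified).
struct ToyResidue
{
  char code;
  std::string mod;
};
using ToyPeptide = std::vector<ToyResidue>;

// Toy variable modification: name plus the residue code it may be placed on.
struct ToyMod
{
  std::string name;
  char origin;
};

// Emit one variant per compatible (residue, modification) pair.
void toyApplyAtMostOne(const std::vector<ToyMod>& var_mods,
                       const ToyPeptide& peptide,
                       std::vector<ToyPeptide>& out,
                       bool keep_original = true)
{
  if (keep_original)
  {
    out.push_back(peptide); // unchanged copy of the input
  }
  for (std::size_t i = 0; i < peptide.size(); ++i)
  {
    if (!peptide[i].mod.empty())
    {
      continue; // already-modified residues are skipped
    }
    for (const ToyMod& m : var_mods)
    {
      assert(!m.name.empty() && "an empty name means 'unmodified' in this toy model");
      if (m.origin != peptide[i].code)
      {
        continue; // modification does not fit this residue
      }
      ToyPeptide variant = peptide; // copy, then modify exactly one site
      variant[i].mod = m.name;
      out.push_back(variant);
    }
  }
}
```

Because each output differs from the input at exactly one site, the number of variants grows linearly with the number of compatible sites rather than combinatorially.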
◆ applyFixedModifications()
Apply all compatible fixed modifications in fixed_mods to peptide in place.
Two passes:
- Strict terminal modifications (N_TERM / C_TERM with no specific residue origin) are installed on the peptide's terminal slots if those are still empty.
- Each residue is then visited: residues already carrying a modification are skipped (fixed mods do not overwrite), and residues whose one-letter code matches a fixed mod receive that mod. Term-specificity ANYWHERE matches at any position; C_TERM / N_TERM match only when the residue is the last / first residue of the peptide.
- Parameters
-
| [in] | fixed_mods | Cached fixed-modification table (typically from getModifications()). |
| [in,out] | peptide | Peptide to modify in place. |
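The two passes can be sketched with a toy model in plain C++. Nothing here is OpenMS API: ToyTerm, ToyFixedMod, ToyFixedSite, ToyFixedPeptide and toyApplyFixed are hypothetical names, the origin 'X' is used to mean "no specific residue", and letting the first matching modification in the list win is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ToyTerm { Anywhere, NTerm, CTerm };

// Toy fixed modification; origin 'X' means "no specific residue" here.
struct ToyFixedMod
{
  std::string name;
  char origin;
  ToyTerm term;
};

// Toy residue: one-letter code plus modification name (empty = unmodified).
struct ToyFixedSite
{
  char code;
  std::string mod;
};

// Toy peptide with separate terminal slots.
struct ToyFixedPeptide
{
  std::string n_term_mod;
  std::string c_term_mod;
  std::vector<ToyFixedSite> residues;
};

void toyApplyFixed(const std::vector<ToyFixedMod>& fixed_mods, ToyFixedPeptide& pep)
{
  // Pass 1: strict terminal mods go into the terminal slots if still empty.
  for (const ToyFixedMod& m : fixed_mods)
  {
    assert(!m.name.empty() && "an empty name means 'unmodified' in this toy model");
    if (m.origin != 'X') continue;
    if (m.term == ToyTerm::NTerm && pep.n_term_mod.empty()) pep.n_term_mod = m.name;
    if (m.term == ToyTerm::CTerm && pep.c_term_mod.empty()) pep.c_term_mod = m.name;
  }
  // Pass 2: visit each residue; existing modifications are never overwritten.
  const std::size_t n = pep.residues.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    ToyFixedSite& site = pep.residues[i];
    if (!site.mod.empty()) continue;
    for (const ToyFixedMod& m : fixed_mods)
    {
      if (m.origin != site.code) continue;
      const bool position_ok = m.term == ToyTerm::Anywhere
                            || (m.term == ToyTerm::NTerm && i == 0)
                            || (m.term == ToyTerm::CTerm && i + 1 == n);
      if (!position_ok) continue;
      site.mod = m.name;
      break; // assumption: the first matching fixed mod wins
    }
  }
}
```

For a toy peptide C-M-C whose methionine is already oxidized, applying a residue-specific mod on C, a strict N-terminal mod and a second mod on M leaves the methionine untouched, fills the N-terminal slot, and modifies both cysteines.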
◆ applyModToPep_()
Install modification m on current_peptide at residue index current_index, looking up the pre-cached modified Residue via var_mods; overwrites an existing modification if any.
◆ applyVariableModifications()
| static void applyVariableModifications (const MapToResidueType &var_mods, const AASequence &peptide, Size max_variable_mods_per_peptide, std::vector< AASequence > &all_modified_peptides, bool keep_original=true) |
static
Enumerate all peptide variants obtained by combinatorially placing up to max_variable_mods_per_peptide variable modifications.
Treats each modification site independently and produces every legal combination of compatible variable mods up to the cap. The original (unmodified) peptide is included in the output iff keep_original is true. The output all_modified_peptides is appended to (existing contents are preserved).
Site compatibility is determined by:
- Residue one-letter code matching the modification's origin.
- Term-specificity (ANYWHERE / N_TERM / C_TERM) being satisfied by the residue's position in the peptide.
- The residue not already carrying a fixed modification (those slots are skipped).
The implementation uses a fast specialisation (applyAtMostOneVariableModification_) when max_variable_mods_per_peptide == 1.
- Parameters
-
| [in] | var_mods | Cached variable-modification table (from getModifications()). |
| [in] | peptide | Source peptide; fixed modifications already applied if desired. |
| [in] | max_variable_mods_per_peptide | Maximum number of variable mods to place; 0 disables enumeration. |
| [out] | all_modified_peptides | Generated variants are appended here (existing entries preserved). |
| [in] | keep_original | If true, also emit peptide unchanged as the first entry. |
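The capped combinatorial enumeration can be sketched as a small recursion in plain C++. This is a toy model under stated assumptions, not the OpenMS algorithm: ToyVarSite, ToyVarSeq, ToyVarMod, toyEnumerate and toyApplyVariable are hypothetical names, term-specificity is ignored, and the order of the generated variants is not claimed to match the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy residue: one-letter code plus modification name (empty = unmodified).
struct ToyVarSite
{
  char code;
  std::string mod;
};
using ToyVarSeq = std::vector<ToyVarSite>;

struct ToyVarMod
{
  std::string name;
  char origin;
};

// Walk the sites left to right; at each one either leave it alone or place
// one compatible modification, never exceeding max_mods placements.
void toyEnumerate(const std::vector<ToyVarMod>& var_mods, ToyVarSeq& current,
                  std::size_t pos, std::size_t used, std::size_t max_mods,
                  std::vector<ToyVarSeq>& out)
{
  assert(used <= max_mods);
  if (pos == current.size())
  {
    if (used > 0) out.push_back(current); // the original is handled by the caller
    return;
  }
  // Option 1: leave this site as it is.
  toyEnumerate(var_mods, current, pos + 1, used, max_mods, out);
  // Option 2: place a modification, if the cap allows and the site is free.
  if (used < max_mods && current[pos].mod.empty())
  {
    for (const ToyVarMod& m : var_mods)
    {
      assert(!m.name.empty() && "an empty name means 'unmodified' in this toy model");
      if (m.origin != current[pos].code) continue;
      current[pos].mod = m.name;
      toyEnumerate(var_mods, current, pos + 1, used + 1, max_mods, out);
      current[pos].mod.clear(); // backtrack
    }
  }
}

// Appends to 'out'; existing entries are left untouched.
void toyApplyVariable(const std::vector<ToyVarMod>& var_mods, const ToyVarSeq& peptide,
                      std::size_t max_mods, std::vector<ToyVarSeq>& out,
                      bool keep_original = true)
{
  if (keep_original) out.push_back(peptide);
  ToyVarSeq work = peptide;
  toyEnumerate(var_mods, work, 0, 0, max_mods, out);
}
```

For a toy peptide with three modifiable sites, one applicable modification and a cap of 2, this sketch appends 1 + 3 + 3 = 7 entries when keep_original is true; sites that already carry a modification are never offered to the enumeration.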
◆ createResidueModificationToResidueMap_()
Build the ResidueModification -> Residue cache used by the apply* methods (the lock-free trick — see class brief).
For each modification: looks up the corresponding modified Residue in ResidueDB and stores the pointer. Strict terminal modifications without an amino-acid origin (e.g. "Protein N-term" with origin == 'X') map to nullptr because no residue can carry them — the apply* code handles those via the peptide's terminal slots.
- Parameters
-
- Returns
- Cache ready for use by applyFixedModifications() / applyVariableModifications().
◆ getModifications()
Resolve modification names from UniMod terms and pre-cache the corresponding modified Residue pointers.
Performs all ResidueDB / ModificationsDB lookups in one place so that the downstream apply methods can run lock-free. For modifications without an amino-acid origin (strict terminal modifications) the cached residue pointer is nullptr.
- Parameters
-
| [in] | modNames | List of UniMod modification names (e.g. "Oxidation (M)"). |
- Returns
- Mapping ResidueModification* -> modified Residue* suitable for the apply* methods.
◆ C_TERM_MODIFICATION_INDEX
| const int C_TERM_MODIFICATION_INDEX |
static protected
Sentinel index used internally to mark a strict C_TERM-only modification placed at the C-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)
◆ N_TERM_MODIFICATION_INDEX
| const int N_TERM_MODIFICATION_INDEX |
static protected
Sentinel index used internally to mark a strict N_TERM-only modification placed at the N-terminal residue (distinguishes it from an ANYWHERE mod that happens to land there)