OpenMS
Recursive descent parser for ProForma v2 peptidoform notation.
#include <OpenMS/CHEMISTRY/ProFormaParser.h>
Static Public Member Functions | |
| static Peptidoform | parse (const String &input) |
| Parse a ProForma string into a Peptidoform AST. | |
| static PeptidoformIon | parseIon (const String &input) |
| Parse a ProForma string into a PeptidoformIon AST. | |
| static String | toString (const Peptidoform &pf, ProFormaWriteMode mode=ProFormaWriteMode::LOSSLESS) |
| Convert a Peptidoform AST back to ProForma string notation. | |
| static String | toString (const PeptidoformIon &pfi, ProFormaWriteMode mode=ProFormaWriteMode::LOSSLESS) |
| Convert a PeptidoformIon AST back to ProForma string notation. | |
| static void | resolveModifications (Peptidoform &pf) |
| Resolve all modifications in a Peptidoform using ModificationsDB. | |
| static AASequence | toAASequence (const Peptidoform &pf, AASequenceConversionPolicy policy=AASequenceConversionPolicy::FAIL_ON_LOSS) |
| Convert a Peptidoform to an OpenMS AASequence. | |
| static Peptidoform | fromAASequence (const AASequence &seq) |
| Create a Peptidoform from an OpenMS AASequence. | |
| static bool | isRepresentableAsAASequence (const Peptidoform &pf) |
| Check if a Peptidoform can be fully represented as an AASequence. | |
| static std::vector< ConversionIssue > | getAASequenceConversionIssues (const Peptidoform &pf) |
| Get a list of all issues that would arise during AASequence conversion. | |
| static bool | canCalculateMass (const Peptidoform &pf) |
| Check if mass can be calculated for a Peptidoform. | |
| static bool | canCalculateMass (const PeptidoformIon &pfi) |
| Check if mass can be calculated for a PeptidoformIon. | |
| static std::vector< ConversionIssue > | getMassCalculationIssues (const Peptidoform &pf) |
| Get issues preventing mass calculation for a Peptidoform. | |
| static std::vector< ConversionIssue > | getMassCalculationIssues (const PeptidoformIon &pfi) |
| Get issues preventing mass calculation for a PeptidoformIon. | |
| static double | getMonoWeight (const Peptidoform &pf) |
| Calculate monoisotopic mass of a Peptidoform. | |
| static double | getMonoWeight (const PeptidoformIon &pfi) |
| Calculate monoisotopic mass of a PeptidoformIon. | |
| static double | getMZ (const PeptidoformIon &pfi) |
| Calculate m/z for a PeptidoformIon at its specified charge state. | |
| static double | getMZ (const Peptidoform &pf, int charge) |
| Calculate m/z for a Peptidoform at a given charge state. | |
| static std::optional< double > | tryGetMonoWeight (const Peptidoform &pf) |
| Try to calculate monoisotopic mass of a Peptidoform (non-throwing) | |
| static std::optional< double > | tryGetMonoWeight (const Peptidoform &pf, std::vector< ConversionIssue > &issues_out) |
| Try to calculate monoisotopic mass with diagnostic information. | |
| static std::optional< double > | tryGetMonoWeight (const PeptidoformIon &pfi) |
| Try to calculate monoisotopic mass of a PeptidoformIon (non-throwing) | |
| static std::optional< double > | tryGetMonoWeight (const PeptidoformIon &pfi, std::vector< ConversionIssue > &issues_out) |
| Try to calculate monoisotopic mass of PeptidoformIon with diagnostics. | |
| static std::optional< double > | tryGetMZ (const Peptidoform &pf, int charge) |
| Try to calculate m/z for a Peptidoform (non-throwing) | |
| static std::optional< double > | tryGetMZ (const Peptidoform &pf, int charge, std::vector< ConversionIssue > &issues_out) |
| Try to calculate m/z for a Peptidoform with diagnostics. | |
| static std::optional< double > | tryGetMZ (const PeptidoformIon &pfi) |
| Try to calculate m/z for a PeptidoformIon (non-throwing) | |
| static std::optional< double > | tryGetMZ (const PeptidoformIon &pfi, std::vector< ConversionIssue > &issues_out) |
| Try to calculate m/z for a PeptidoformIon with diagnostics. | |
| static bool | canGenerateSpectrum (const Peptidoform &pf) |
| Check if a theoretical spectrum can be generated for a Peptidoform. | |
| static bool | canGenerateSpectrum (const PeptidoformIon &pfi) |
| Check if a theoretical spectrum can be generated for a PeptidoformIon. | |
| static std::vector< ConversionIssue > | getSpectrumGenerationIssues (const Peptidoform &pf) |
| Get issues preventing spectrum generation for a Peptidoform. | |
| static std::vector< ConversionIssue > | getSpectrumGenerationIssues (const PeptidoformIon &pfi) |
| Get issues preventing spectrum generation for a PeptidoformIon. | |
| static MSSpectrum | generateSpectrum (const Peptidoform &pf, int min_charge=1, int max_charge=1, const std::string &ion_types="by", bool add_losses=false, bool add_metainfo=true) |
| Generate a theoretical MS/MS spectrum for a Peptidoform. | |
| static MSSpectrum | generateSpectrum (const PeptidoformIon &pfi, int min_charge=1, int max_charge=1, const std::string &ion_types="by", bool add_losses=false, bool add_metainfo=true) |
| Generate a theoretical MS/MS spectrum for a PeptidoformIon. | |
Private Member Functions | |
| ProFormaParser (std::string_view input) | |
| Private constructor - use static methods. | |
| PeptidoformIon | parsePeptidoformIon_ () |
| Parse a complete PeptidoformIon (multiple chains + charge) | |
| Peptidoform | parsePeptidoform_ () |
| Parse a single Peptidoform (one chain) | |
| Peptidoform | parsePeptidoformWithCharge_ (bool is_chimeric_context) |
| Parse a Peptidoform with optional per-chain charge (for chimeric spectra) | |
| std::vector< GlobalModEntry > | parseGlobalMods_ () |
| Parse global modifications: < ... > | |
| GlobalModEntry | parseGlobalModEntry_ () |
| Parse a single global modification entry. | |
| IsotopeReplacement | parseIsotopeReplacement_ () |
| Parse isotope replacement: <13C>, <15N>, <D> | |
| GlobalModification | parseGlobalModification_ () |
| Parse global modification with locations: <[mod]@locations> | |
| std::vector< UnlocalisedMod > | parseUnlocalisedMods_ () |
| Parse unlocalised modifications: [mod]? | |
| std::vector< LabileModification > | parseLabileModifications_ () |
| Parse labile modifications: {mod}. | |
| std::vector< SequenceSection > | parseSequence_ () |
| Parse the amino acid sequence with modifications. | |
| SequenceElement | parseSequenceElement_ () |
| Parse a single sequence element (amino acid + mods) | |
| AmbiguousRegion | parseAmbiguousRegion_ () |
| Parse an ambiguous region: (?XY) | |
| ModifiedRange | parseModifiedRange_ () |
| Parse a modified range: (XYZ)[mod]. | |
| std::vector< Modification > | parseTerminalMods_ () |
| Parse terminal modifications: [mod1][mod2]... | |
| std::vector< Modification > | parseModificationList_ () |
| Parse a modification list: [mod1, mod2, ...]. | |
| Modification | parseModification_ () |
| Parse a single modification (may have alternatives with |) | |
| std::pair< ModificationTag, std::optional< Label > > | parseModificationTagWithLabel_ () |
| Parse a single modification tag (no alternatives) | |
| ModificationTag | parseModificationTag_ () |
| Parse a modification tag. | |
| NamedMod | parseNamedMod_ () |
| Parse a named modification: Oxidation, U:Oxidation. | |
| NamedMod | parseNamedMod_ (char cv_hint) |
| Parse a named modification with a known CV hint prefix. | |
| CvAccession | parseCvAccession_ () |
| Parse a CV accession: UNIMOD:35, MOD:00046. | |
| MassDelta | parseMassDelta_ () |
| Parse a mass delta: +15.9949, Obs:+79.978. | |
| FormulaTag | parseFormulaTag_ () |
| Parse a formula tag: Formula:C12H20O2. | |
| GlycanComposition | parseGlycanComposition_ () |
| Parse a glycan composition: Glycan:HexNAc1Hex2. | |
| InfoTag | parseInfoTag_ () |
| Parse an info tag: INFO:text. | |
| PositionConstraint | parsePositionConstraint_ () |
| Parse a position constraint: Position:MKC. | |
| Label | parseLabel_ () |
| Parse a label: #XL1, #BRANCH, #g1(0.90) | |
| std::optional< ChargeState > | parseChargeState_ () |
| Parse charge state: /2, /+2, /[Na:z+1]. | |
| std::vector< AdductIon > | parseAdductIons_ () |
| Parse adduct ions: [Na:z+1, H:z+1]. | |
| AdductIon | parseAdductIon_ () |
| Parse a single adduct ion: Na:z+1. | |
| ProFormaTokenizer::Token | current_ () |
| Get the current token. | |
| ProFormaTokenizer::Token | peek_ () |
| Look at the next token without consuming. | |
| ProFormaTokenizer::Token | advance_ () |
| Consume and return the current token. | |
| bool | check_ (ProFormaTokenizer::TokenType type) |
| Check if current token matches expected type. | |
| bool | match_ (ProFormaTokenizer::TokenType type) |
| Check if current token matches expected type, consume if true. | |
| ProFormaTokenizer::Token | expect_ (ProFormaTokenizer::TokenType type, const char *expected_desc) |
| Expect a specific token type, throw error if not found. | |
| bool | isAtEnd_ () |
| Check if at end of input. | |
| void | error_ (ProFormaErrorCode code, const char *message) |
| Throw a parse error at the current position. | |
| void | errorAt_ (ProFormaErrorCode code, size_t pos, const char *message) |
| Throw a parse error at a specific position. | |
| std::optional< CvDatabase > | parseCvDatabasePrefix_ (const std::string_view &id) |
| Parse a CV database prefix from identifier. | |
| bool | looksLikeModificationTagContent_ () |
| Check if the current position could start a modification tag content. | |
| bool | hasNTerminalModPattern_ () |
| Check if current position has N-terminal modification pattern ([mod]-) | |
| ProFormaTokenizer | createLookahead_ () const |
| Create a lookahead tokenizer positioned at the current logical position. | |
Static Private Member Functions | |
| static bool | isAminoAcid_ (char c) |
| Check if a character is a valid amino acid code. | |
Private Attributes | |
| ProFormaTokenizer | tokenizer_ |
| The tokenizer for lexical analysis. | |
| std::string | input_ |
| The original input string (for error messages) | |
| ProFormaTokenizer::Token | current_token_ |
| Current token (cached) | |
| bool | has_current_ = false |
| Whether we have a cached current token. | |
Recursive descent parser for ProForma v2 peptidoform notation.
This class parses ProForma strings into an Abstract Syntax Tree (AST) representation. The AST structures are defined in ProFormaData.h.
The parser implements the ProForma v2 grammar:
Usage example:
|
explicit private |
Private constructor - use static methods.
|
private |
Consume and return the current token.
|
static |
Check if mass can be calculated for a Peptidoform.
Returns true if all components have known masses:
| [in] | pf | The Peptidoform to check (modifications will be resolved if needed) |
|
static |
Check if mass can be calculated for a PeptidoformIon.
Returns true if mass can be calculated for all chains. For cross-linked peptides, each cross-linker mass is counted only once.
| [in] | pfi | The PeptidoformIon to check |
|
static |
Check if a theoretical spectrum can be generated for a Peptidoform.
Returns true if the Peptidoform can be converted to AASequence and fragmented. Use getSpectrumGenerationIssues() to get detailed diagnostics.
| [in] | pf | The Peptidoform to check |
|
static |
Check if a theoretical spectrum can be generated for a PeptidoformIon.
Returns true if the PeptidoformIon can be fragmented. For cross-linked peptides, both chains must be convertible. Chimeric spectra are not supported.
| [in] | pfi | The PeptidoformIon to check |
|
private |
Check if current token matches expected type.
|
private |
Create a lookahead tokenizer positioned at the current logical position.
|
private |
Get the current token.
|
private |
Throw a parse error at the current position.
|
private |
Throw a parse error at a specific position.
|
private |
Expect a specific token type, throw error if not found.
|
static |
Create a Peptidoform from an OpenMS AASequence.
Converts an AASequence with modifications to ProForma notation. Uses CV accessions (UNIMOD) where available, otherwise named modifications.
| [in] | seq | The AASequence to convert |
|
static |
Generate a theoretical MS/MS spectrum for a Peptidoform.
Converts the Peptidoform to AASequence and uses TheoreticalSpectrumGenerator to generate fragment ions based on the specified ion types.
| [in] | pf | The Peptidoform to fragment |
| [in] | min_charge | Minimum fragment ion charge state |
| [in] | max_charge | Maximum fragment ion charge state |
| [in] | ion_types | String specifying which ion types to generate: 'a','b','c','x','y','z' for ion series, 'M' for precursor peaks, 'I' for immonium ions. Example: "by" for b/y ions, "byM" for b/y + precursor |
| [in] | add_losses | If true, include neutral loss peaks (H2O, NH3) |
| [in] | add_metainfo | If true, include ion annotations in spectrum |
| Exception::ConversionError | if spectrum generation fails |
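The `ion_types` argument is a compact string of one-letter flags. The sketch below shows one way such a string could be decoded into a set of booleans. It is not the OpenMS implementation: `IonSelection` and `parseIonTypes` are hypothetical names, and rejecting unknown characters is an assumption about the desired behaviour.

```cpp
#include <optional>
#include <string>

// Hypothetical flag set for illustration only; the real generator is
// configured through TheoreticalSpectrumGenerator.
struct IonSelection
{
  bool a = false, b = false, c = false;
  bool x = false, y = false, z = false;
  bool precursor = false; // 'M'
  bool immonium = false;  // 'I'
};

// Decode an ion-type string such as "by" or "abyM".
// Returns std::nullopt if an unrecognised character is found (assumed policy).
std::optional<IonSelection> parseIonTypes(const std::string& ion_types)
{
  IonSelection sel;
  for (char ch : ion_types)
  {
    switch (ch)
    {
      case 'a': sel.a = true; break;
      case 'b': sel.b = true; break;
      case 'c': sel.c = true; break;
      case 'x': sel.x = true; break;
      case 'y': sel.y = true; break;
      case 'z': sel.z = true; break;
      case 'M': sel.precursor = true; break;
      case 'I': sel.immonium = true; break;
      default: return std::nullopt;
    }
  }
  return sel;
}
```

With this sketch, `parseIonTypes("byM")` selects the b and y series plus precursor peaks, while a string containing any other letter yields an empty optional.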
|
static |
Generate a theoretical MS/MS spectrum for a PeptidoformIon.
For single-chain peptides, uses TheoreticalSpectrumGenerator. For cross-linked peptides (// separator), uses TheoreticalSpectrumGeneratorXLMS. Chimeric spectra are not supported.
| [in] | pfi | The PeptidoformIon to fragment |
| [in] | min_charge | Minimum fragment ion charge state |
| [in] | max_charge | Maximum fragment ion charge state |
| [in] | ion_types | String specifying which ion types to generate: 'a','b','c','x','y','z' for ion series, 'M' for precursor peaks, 'I' for immonium ions. Example: "by" for b/y ions, "abyM" for a/b/y + precursor |
| [in] | add_losses | If true, include neutral loss peaks |
| [in] | add_metainfo | If true, include ion annotations |
| Exception::ConversionError | if spectrum generation fails |
|
static |
Get a list of all issues that would arise during AASequence conversion.
Returns detailed information about every aspect of the Peptidoform that cannot be represented in an AASequence.
| [in] | pf | The Peptidoform to analyze |
|
static |
Get issues preventing mass calculation for a Peptidoform.
Returns detailed information about components that prevent mass calculation.
| [in] | pf | The Peptidoform to analyze |
|
static |
Get issues preventing mass calculation for a PeptidoformIon.
Returns detailed information about components that prevent mass calculation across all chains.
| [in] | pfi | The PeptidoformIon to analyze |
|
static |
Calculate monoisotopic mass of a Peptidoform.
Calculates the neutral monoisotopic mass including:
| [in] | pf | The Peptidoform to calculate mass for |
| Exception::InvalidValue | if mass cannot be calculated (use canCalculateMass() first) |
|
static |
Calculate monoisotopic mass of a PeptidoformIon.
For cross-linked peptides, calculates the combined mass of all chains. Cross-linker masses are counted only once per cross-link group.
| [in] | pfi | The PeptidoformIon to calculate mass for |
| Exception::InvalidValue | if mass cannot be calculated |
| Exception::InvalidValue | if pfi is chimeric (use getMonoWeight on individual chains) |
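The "counted only once per cross-link group" rule can be illustrated with a small stand-alone sketch. The types `CrossLinkRef` and `Chain` and the function `combinedMonoMass` are hypothetical simplifications, not the AST types from ProFormaData.h, and the sketch assumes every reference to a group carries the linker mass.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real AST types.
struct CrossLinkRef
{
  std::string group;  // cross-link group label, e.g. "XL1"
  double linker_mass; // mass of the cross-linker for this group
};

struct Chain
{
  double mass_without_crosslinks; // residues plus ordinary modifications
  std::vector<CrossLinkRef> crosslinks;
};

// Sum the chain masses and add each cross-linker mass once per group,
// no matter how many chains or positions reference that group.
double combinedMonoMass(const std::vector<Chain>& chains)
{
  double total = 0.0;
  std::set<std::string> counted_groups;
  for (const Chain& chain : chains)
  {
    total += chain.mass_without_crosslinks;
    for (const CrossLinkRef& xl : chain.crosslinks)
    {
      // insert().second is true only the first time a label is seen
      if (counted_groups.insert(xl.group).second)
      {
        total += xl.linker_mass;
      }
    }
  }
  return total;
}
```

Two chains that both reference group `XL1` therefore contribute the linker mass a single time to the combined total.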
|
static |
Calculate m/z for a Peptidoform at a given charge state.
| [in] | pf | The Peptidoform to calculate m/z for |
| [in] | charge | The charge state (must be non-zero) |
| Exception::InvalidValue | if mass cannot be calculated or charge is zero |
|
static |
Calculate m/z for a PeptidoformIon at its specified charge state.
Uses the charge state from the PeptidoformIon if present.
| [in] | pfi | The PeptidoformIon with charge state |
| Exception::InvalidValue | if mass cannot be calculated or no charge state is specified |
|
static |
Get issues preventing spectrum generation for a Peptidoform.
| [in] | pf | The Peptidoform to analyze |
|
static |
Get issues preventing spectrum generation for a PeptidoformIon.
| [in] | pfi | The PeptidoformIon to analyze |
|
private |
Check if current position has N-terminal modification pattern ([mod]-)
|
static private |
Check if a character is a valid amino acid code.
|
private |
Check if at end of input.
|
static |
Check if a Peptidoform can be fully represented as an AASequence.
Returns true if all modifications can be resolved and there are no unsupported features (ambiguous regions, cross-links, etc.)
| [in] | pf | The Peptidoform to check |
|
private |
Check if the current position could start a modification tag content.
|
private |
Check if current token matches expected type, consume if true.
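The token helpers (`check_`, `match_`, `expect_`, `advance_`) follow a pattern common to hand-written recursive descent parsers. The sketch below reproduces that pattern over a toy token vector. `TokenCursor`, `Token` and `TokenType` here are hypothetical and unrelated to the real `ProFormaTokenizer`, and a plain `std::runtime_error` stands in for the parser's own error type.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class TokenType { LBracket, RBracket, Text, End };

struct Token
{
  TokenType type;
  std::string text;
};

class TokenCursor
{
public:
  explicit TokenCursor(std::vector<Token> tokens) : tokens_(std::move(tokens))
  {
    // Sentinel so that current() is always valid.
    tokens_.push_back(Token{TokenType::End, ""});
  }

  const Token& current() const
  {
    assert(pos_ < tokens_.size());
    return tokens_[pos_];
  }

  bool isAtEnd() const { return current().type == TokenType::End; }

  // Consume and return the current token (never moves past the sentinel).
  Token advance()
  {
    Token t = current();
    if (!isAtEnd()) ++pos_;
    return t;
  }

  // Test the current token without consuming it.
  bool check(TokenType type) const { return current().type == type; }

  // Consume the current token only if it has the expected type.
  bool match(TokenType type)
  {
    if (!check(type)) return false;
    advance();
    return true;
  }

  // Like match(), but a mismatch is a hard error.
  Token expect(TokenType type, const char* expected_desc)
  {
    if (!check(type))
    {
      throw std::runtime_error(std::string("expected ") + expected_desc);
    }
    return advance();
  }

private:
  std::vector<Token> tokens_;
  std::size_t pos_ = 0;
};
```

A grammar rule typically uses `match` for optional elements and `expect` for mandatory ones, for example `expect(TokenType::RBracket, "']'")` after a modification tag has been read.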
|
static |
Parse a ProForma string into a Peptidoform AST.
This is the main entry point for parsing simple peptidoforms without charge state information.
| [in] | input | The ProForma string to parse |
| ProFormaParseError | if the input is invalid |
|
private |
Parse a single adduct ion: Na:z+1.
|
private |
Parse adduct ions: [Na:z+1, H:z+1].
|
private |
Parse an ambiguous region: (?XY)
|
private |
Parse charge state: /2, /+2, /[Na:z+1].
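To make the two numeric forms concrete, the following sketch parses a charge suffix such as `/2` or `/+2` from a string view. It is a simplified illustration with a hypothetical function name (`parseSimpleCharge`): it works on raw characters rather than tokens, does not guard against overflow, and deliberately does not handle the adduct form `/[Na:z+1]`.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Parse a numeric charge suffix of the form "/N" or "/+N".
// Returns std::nullopt for anything else, including adduct notation.
std::optional<int> parseSimpleCharge(std::string_view s)
{
  if (s.size() < 2 || s[0] != '/') return std::nullopt;

  std::size_t i = 1;
  if (s[i] == '+') ++i;                   // optional explicit plus sign
  if (i == s.size()) return std::nullopt; // sign without digits

  int value = 0;
  for (; i < s.size(); ++i)
  {
    if (!std::isdigit(static_cast<unsigned char>(s[i]))) return std::nullopt;
    value = value * 10 + (s[i] - '0');
  }
  return value;
}
```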
|
private |
Parse a CV accession: UNIMOD:35, MOD:00046.
|
private |
Parse a CV database prefix from identifier.
|
private |
Parse a formula tag: Formula:C12H20O2.
|
private |
Parse a single global modification entry.
|
private |
Parse global modification with locations: <[mod]@locations>
|
private |
Parse global modifications: < ... >
|
private |
Parse a glycan composition: Glycan:HexNAc1Hex2.
|
private |
Parse an info tag: INFO:text.
|
static |
Parse a ProForma string into a PeptidoformIon AST.
This entry point handles the full ProForma notation including:
| [in] | input | The ProForma string to parse |
| ProFormaParseError | if the input is invalid |
|
private |
Parse isotope replacement: <13C>, <15N>, <D>
|
private |
Parse a label: #XL1, #BRANCH, #g1(0.90)
|
private |
Parse labile modifications: {mod}.
|
private |
Parse a mass delta: +15.9949, Obs:+79.978.
|
private |
Parse a single modification (may have alternatives with |)
|
private |
Parse a modification list: [mod1, mod2, ...].
|
private |
Parse a modification tag.
|
private |
Parse a single modification tag (no alternatives)
|
private |
Parse a modified range: (XYZ)[mod].
|
private |
Parse a named modification: Oxidation, U:Oxidation.
|
private |
Parse a named modification with a known CV hint prefix.
|
private |
Parse a single Peptidoform (one chain)
|
private |
Parse a complete PeptidoformIon (multiple chains + charge)
|
private |
Parse a Peptidoform with optional per-chain charge (for chimeric spectra)
| [in] | is_chimeric_context | If true, parse trailing charge as per-chain charge |
|
private |
Parse a position constraint: Position:MKC.
|
private |
Parse the amino acid sequence with modifications.
|
private |
Parse a single sequence element (amino acid + mods)
|
private |
Parse terminal modifications: [mod1][mod2]...
|
private |
Parse unlocalised modifications: [mod]?
|
private |
Look at the next token without consuming.
|
static |
Resolve all modifications in a Peptidoform using ModificationsDB.
Looks up each modification tag (CV accession, named mod, mass delta) in ModificationsDB and stores the resolved ResidueModification pointer.
| [in,out] | pf | The Peptidoform to resolve (modified in place) |
|
static |
Convert a Peptidoform to an OpenMS AASequence.
| [in] | pf | The Peptidoform to convert |
| [in] | policy | How to handle unconvertible modifications |
| Exception::ConversionError | if STRICT policy and conversion not possible |
|
static |
Convert a Peptidoform AST back to ProForma string notation.
| [in] | pf | The Peptidoform to convert |
| [in] | mode | Write mode: LOSSLESS preserves original formatting, CANONICAL produces normalized output |
|
static |
Convert a PeptidoformIon AST back to ProForma string notation.
| [in] | pfi | The PeptidoformIon to convert |
| [in] | mode | Write mode: LOSSLESS preserves original formatting, CANONICAL produces normalized output |
|
static |
Try to calculate monoisotopic mass of a Peptidoform (non-throwing)
Single-pass calculation that resolves modifications and calculates mass. More efficient than calling canCalculateMass() followed by getMonoWeight().
| [in] | pf | The Peptidoform to calculate mass for |
|
static |
Try to calculate monoisotopic mass with diagnostic information.
Single-pass calculation that also collects any issues preventing calculation.
| [in] | pf | The Peptidoform to calculate mass for |
| [out] | issues_out | Vector to receive any issues (cleared first) |
|
static |
Try to calculate monoisotopic mass of a PeptidoformIon (non-throwing)
| [in] | pfi | The PeptidoformIon to calculate mass for |
|
static |
Try to calculate monoisotopic mass of PeptidoformIon with diagnostics.
| [in] | pfi | The PeptidoformIon to calculate mass for |
| [out] | issues_out | Vector to receive any issues (cleared first) |
|
static |
Try to calculate m/z for a Peptidoform (non-throwing)
| [in] | pf | The Peptidoform to calculate m/z for |
| [in] | charge | The charge state (must be non-zero) |
|
static |
Try to calculate m/z for a Peptidoform with diagnostics.
| [in] | pf | The Peptidoform to calculate m/z for |
| [in] | charge | The charge state (must be non-zero) |
| [out] | issues_out | Vector to receive any issues (cleared first) |
|
static |
Try to calculate m/z for a PeptidoformIon (non-throwing)
| [in] | pfi | The PeptidoformIon with charge state |
|
static |
Try to calculate m/z for a PeptidoformIon with diagnostics.
| [in] | pfi | The PeptidoformIon with charge state |
| [out] | issues_out | Vector to receive any issues (cleared first) |
|
private |
Current token (cached)
|
private |
Whether we have a cached current token.
|
private |
The original input string (for error messages)
|
private |
The tokenizer for lexical analysis.