OpenMS
Tokenizer for ProForma v2 peptidoform notation.
#include <OpenMS/CHEMISTRY/ProFormaTokenizer.h>
Classes

| Type | Name | Description |
|---|---|---|
| struct | Token | A single token from the input stream. |
Public Types

| Type | Name | Description |
|---|---|---|
| enum class | TokenType { LBRACKET, RBRACKET, LPAREN, RPAREN, LBRACE, RBRACE, LANGLE, RANGLE, PLUS, MINUS, SLASH, PIPE, HASH, COLON, COMMA, CARET, QUESTION, AT, NUMBER, IDENTIFIER, END } | Token types produced by the tokenizer. |
Public Member Functions

| Return type | Member | Description |
|---|---|---|
| | ProFormaTokenizer (std::string_view input, size_t start_pos = 0) | Construct a tokenizer for the given input string. |
| | ~ProFormaTokenizer () = default | Default destructor. |
| | ProFormaTokenizer (const ProFormaTokenizer &) = default | Copy constructor. |
| | ProFormaTokenizer (ProFormaTokenizer &&) = default | Move constructor. |
| ProFormaTokenizer & | operator= (const ProFormaTokenizer &) = default | Copy assignment operator. |
| ProFormaTokenizer & | operator= (ProFormaTokenizer &&) = default | Move assignment operator. |
| Token | next () | Consume and return the next token. |
| Token | peek () | Look at the next token without consuming it. |
| bool | hasMore () const | Check if more tokens are available. |
| size_t | position () const | Get the current position in the input. |
| std::string_view | getContext (size_t pos, size_t before = 20, size_t after = 20) const | Get a context string around a position for error messages. |
Static Public Member Functions

| Return type | Member | Description |
|---|---|---|
| static const char * | tokenTypeName (TokenType type) | Get a human-readable name for a token type. |
Private Member Functions

| Return type | Member | Description |
|---|---|---|
| Token | scanToken_ () | Scan and return the next token from the current position. |
| Token | scanNumber_ () | Scan a number token (integer or decimal, optionally signed). |
| Token | scanIdentifier_ () | Scan an identifier token (letter sequence). |
| bool | isAtEnd_ () const | Check if we have reached the end of input. |
| char | current_ () const | Get the current character (or '\0' if at end). |
| char | peek_ (size_t offset) const | Get the character at offset from the current position (or '\0' if out of bounds). |
| char | advance_ () | Advance to the next character and return the one just passed over. |
Static Private Member Functions

| Return type | Member | Description |
|---|---|---|
| static bool | isLetter_ (char c) | Check if a character is a letter (A-Za-z). |
| static bool | isDigit_ (char c) | Check if a character is a digit (0-9). |
Private Attributes

| Type | Name | Description |
|---|---|---|
| std::string_view | input_ | The input string (must remain valid for the tokenizer's lifetime). |
| size_t | pos_ = 0 | Current position in the input. |
| std::optional<Token> | peeked_ | Cached peeked token (if any). |
Tokenizer for ProForma v2 peptidoform notation.
This class provides lexical analysis (tokenization) for ProForma strings. It produces tokens suitable for parsing the ProForma grammar, supporting zero-copy operation via std::string_view for performance.
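To make the division of labour concrete, here is a minimal, self-contained sketch of a tokenizer with the same public shape (`next()`, `peek()`, `hasMore()`, `position()`). It is not the OpenMS implementation: the class name `MiniProFormaTokenizer`, the `Tok` fields (`type`, `text`, `pos`), the reduced token set, the absence of whitespace handling and the rule that `+`/`-` is a number sign only when a digit follows are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical reduced token set; the documented TokenType enum has more members.
enum class TokType { LBRACKET, RBRACKET, SLASH, PLUS, MINUS, NUMBER, IDENTIFIER, UNKNOWN, END };

// Assumed token layout: a type, a zero-copy view into the input, and its offset.
struct Tok
{
  TokType type;
  std::string_view text;
  std::size_t pos;
};

class MiniProFormaTokenizer
{
public:
  explicit MiniProFormaTokenizer(std::string_view input, std::size_t start_pos = 0)
    : input_(input), pos_(start_pos) {}

  // Consume and return the next token, reusing a cached peek if there is one.
  Tok next()
  {
    if (peeked_)
    {
      Tok t = *peeked_;
      peeked_.reset();
      return t;
    }
    return scan_();
  }

  // Return the next token without consuming it (scans at most once).
  Tok peek()
  {
    if (!peeked_) peeked_ = scan_();
    return *peeked_;
  }

  // True while next() would return something other than END.
  bool hasMore() const { return peeked_ ? peeked_->type != TokType::END : pos_ < input_.size(); }

  // Offset of the next unconsumed token.
  std::size_t position() const { return peeked_ ? peeked_->pos : pos_; }

private:
  static bool isDigit(char c) { return c >= '0' && c <= '9'; }
  static bool isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

  Tok make_(TokType type, std::size_t start) const
  {
    return Tok{type, input_.substr(start, pos_ - start), start};
  }

  Tok scan_()
  {
    const std::size_t start = pos_;
    if (pos_ >= input_.size()) return Tok{TokType::END, {}, pos_};
    const char c = input_[pos_];
    // Assumption: '+'/'-' start a signed number only when a digit follows.
    const bool signed_number =
        (c == '+' || c == '-') && pos_ + 1 < input_.size() && isDigit(input_[pos_ + 1]);
    if (isDigit(c) || signed_number)
    {
      ++pos_;  // consume the sign or the first digit
      while (pos_ < input_.size() && isDigit(input_[pos_])) ++pos_;
      if (pos_ + 1 < input_.size() && input_[pos_] == '.' && isDigit(input_[pos_ + 1]))
      {
        ++pos_;  // consume '.'
        while (pos_ < input_.size() && isDigit(input_[pos_])) ++pos_;
      }
      return make_(TokType::NUMBER, start);
    }
    if (isLetter(c))
    {
      while (pos_ < input_.size() && isLetter(input_[pos_])) ++pos_;
      return make_(TokType::IDENTIFIER, start);
    }
    ++pos_;  // every remaining token in this reduced set is one character long
    switch (c)
    {
      case '[': return make_(TokType::LBRACKET, start);
      case ']': return make_(TokType::RBRACKET, start);
      case '/': return make_(TokType::SLASH, start);
      case '+': return make_(TokType::PLUS, start);
      case '-': return make_(TokType::MINUS, start);
      default:  return make_(TokType::UNKNOWN, start);
    }
  }

  std::string_view input_;
  std::size_t pos_ = 0;
  std::optional<Tok> peeked_;
};
```

A driver loop would call `next()` until it returns END (or `hasMore()` turns false) and consult `peek()` wherever a grammar rule needs one token of lookahead. For `PEP[+79.966]TIDE/2` this sketch yields IDENTIFIER, LBRACKET, NUMBER, RBRACKET, IDENTIFIER, SLASH, NUMBER, END, with every token text pointing back into the original string.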
The tokenizer handles:
Usage example:
enum class TokenType [strong]
Token types produced by the tokenizer.
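Most enumerators name a single punctuation character. The sketch below shows one plausible mapping; the character assigned to each enumerator is inferred from its name (for example `LANGLE` as `<` and `CARET` as `^`) and is an assumption, not taken from the OpenMS source. NUMBER, IDENTIFIER and END come from the scanning routines instead, so the hypothetical helper `punctuationType` returns `std::nullopt` for any character outside the punctuation set.

```cpp
#include <optional>

// Mirrors the documented enumerator list of ProFormaTokenizer::TokenType.
enum class TokenType
{
  LBRACKET, RBRACKET, LPAREN, RPAREN, LBRACE, RBRACE, LANGLE, RANGLE,
  PLUS, MINUS, SLASH, PIPE, HASH, COLON, COMMA, CARET, QUESTION, AT,
  NUMBER, IDENTIFIER, END
};

// Hypothetical helper: classify a single punctuation character.
// Returns std::nullopt for letters, digits and anything else outside the set.
std::optional<TokenType> punctuationType(char c)
{
  switch (c)
  {
    case '[': return TokenType::LBRACKET;
    case ']': return TokenType::RBRACKET;
    case '(': return TokenType::LPAREN;
    case ')': return TokenType::RPAREN;
    case '{': return TokenType::LBRACE;
    case '}': return TokenType::RBRACE;
    case '<': return TokenType::LANGLE;
    case '>': return TokenType::RANGLE;
    case '+': return TokenType::PLUS;   // may instead begin a signed number
    case '-': return TokenType::MINUS;  // likewise
    case '/': return TokenType::SLASH;
    case '|': return TokenType::PIPE;
    case '#': return TokenType::HASH;
    case ':': return TokenType::COLON;
    case ',': return TokenType::COMMA;
    case '^': return TokenType::CARET;
    case '?': return TokenType::QUESTION;
    case '@': return TokenType::AT;
    default:  return std::nullopt;
  }
}
```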
ProFormaTokenizer (std::string_view input, size_t start_pos = 0) [explicit]
Construct a tokenizer for the given input string.
| input | The ProForma string to tokenize. Must remain valid for the lifetime of this tokenizer. |
| start_pos | Optional starting position (default 0). Used for efficient lookahead without re-scanning from the beginning. |
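The `start_pos` parameter lets a caller resume scanning from a remembered offset instead of re-scanning from the beginning. The reduced sketch below uses a hypothetical `SpanLexer` that only splits input into letter runs and single other characters; clamping an out-of-range `start_pos` to the input length is an assumption. A second lexer constructed at the first lexer's saved position yields the same remaining lexemes.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical reduced lexer: a lexeme is either a run of ASCII letters
// or a single other character.
class SpanLexer
{
public:
  explicit SpanLexer(std::string_view input, std::size_t start_pos = 0)
    : input_(input), pos_(start_pos < input.size() ? start_pos : input.size()) {}

  bool hasMore() const { return pos_ < input_.size(); }
  std::size_t position() const { return pos_; }

  // Return the next lexeme as a view into the input (empty at end of input).
  std::string_view next()
  {
    const std::size_t start = pos_;
    if (pos_ >= input_.size()) return {};
    if (isLetter(input_[pos_]))
    {
      while (pos_ < input_.size() && isLetter(input_[pos_])) ++pos_;
    }
    else
    {
      ++pos_;
    }
    return input_.substr(start, pos_ - start);
  }

private:
  static bool isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

  std::string_view input_;
  std::size_t pos_;
};
```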
~ProFormaTokenizer () [default]
Default destructor.
ProFormaTokenizer (const ProFormaTokenizer &) [default]
Copy constructor.
ProFormaTokenizer (ProFormaTokenizer &&) [default]
Move constructor.
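Since the documented state is a `std::string_view`, an offset and an optional cached token, copies should be inexpensive, which makes speculative lookahead by copy practical: parse ahead on a copy and keep the original untouched if the attempt fails. The sketch illustrates the pattern with a hypothetical `Cursor` stand-in rather than the real class; `tryConsumeWord` is likewise an invented helper.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in with the same kind of state: a view and an offset.
struct Cursor
{
  std::string_view input;
  std::size_t pos = 0;

  // Consume `expected` if it is the current character.
  bool consume(char expected)
  {
    if (pos < input.size() && input[pos] == expected)
    {
      ++pos;
      return true;
    }
    return false;
  }
};

// Try to match `word` at the cursor; on failure the caller's cursor is unchanged.
bool tryConsumeWord(Cursor& cur, std::string_view word)
{
  Cursor attempt = cur;  // cheap copy: a view plus an integer
  for (char c : word)
  {
    if (!attempt.consume(c)) return false;  // discard the copy; cur is untouched
  }
  cur = attempt;  // commit via copy assignment
  return true;
}
```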
char advance_ () [private]
Advance to the next character and return the one just passed over (the character that was current before the call).
char current_ () const [private]
Get the current character (or '\0' if at end)
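The private character helpers follow a common sentinel pattern: reads past the end yield `'\0'` rather than throwing, so scanning loops can test characters without separate bounds checks. A minimal sketch of `current_()`, `peek_(offset)`, `advance_()` and `isAtEnd_()` as members of a hypothetical `CharCursor`, assuming that advancing at the end of input returns `'\0'` without moving (the documentation does not say):

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical cursor mirroring current_(), peek_(offset), advance_() and isAtEnd_().
struct CharCursor
{
  std::string_view input;
  std::size_t pos = 0;

  bool isAtEnd() const { return pos >= input.size(); }

  // Current character, or '\0' at end of input.
  char current() const { return isAtEnd() ? '\0' : input[pos]; }

  // Character `offset` positions ahead, or '\0' if that is out of bounds.
  char peekAt(std::size_t offset) const
  {
    if (isAtEnd() || offset >= input.size() - pos) return '\0';
    return input[pos + offset];
  }

  // Return the current character and move past it; '\0' (no move) at end.
  char advance()
  {
    if (isAtEnd()) return '\0';
    return input[pos++];
  }
};
```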
std::string_view getContext (size_t pos, size_t before = 20, size_t after = 20) const
Get a context string around a position for error messages.
Returns a substring of the input centered around the given position, useful for providing context in error messages.
| pos | The position to center the context around |
| before | Maximum number of characters to include before pos |
| after | Maximum number of characters to include after pos |
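One way to compute such a window is to clamp both ends to the input bounds, so positions near the start or end (or past the end) still yield a valid, shorter view. The free function `contextAround` below is a hypothetical sketch; that the real `getContext` clamps the same way, and that the "after" window starts at `pos` itself, are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Return up to `before` characters before `pos` and up to `after` characters
// starting at `pos`, clamped to the bounds of `input`.
std::string_view contextAround(std::string_view input, std::size_t pos,
                               std::size_t before = 20, std::size_t after = 20)
{
  pos = std::min(pos, input.size());                          // tolerate positions past the end
  const std::size_t start = pos > before ? pos - before : 0;  // no unsigned underflow
  const std::size_t remaining = input.size() - pos;           // characters from pos to the end
  const std::size_t end = pos + std::min(after, remaining);   // cannot overflow past size()
  return input.substr(start, end - start);
}
```

Returning a `std::string_view` keeps the call allocation-free; the view is only valid while the input is, matching the lifetime requirement stated for the constructor.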
bool hasMore () const
Check if more tokens are available.
bool isAtEnd_ () const [private]
Check if we have reached the end of input.
static bool isDigit_ (char c) [static, private]
Check if a character is a digit (0-9)
static bool isLetter_ (char c) [static, private]
Check if a character is a letter (A-Za-z)
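Writing these tests as explicit ASCII range comparisons, rather than calling `std::isalpha`, keeps the letter test independent of the current C locale and avoids the undefined behaviour that `<cctype>` functions have when given a negative `char` value other than `EOF`. A sketch, assuming the helpers are implemented this way (the free-function names are hypothetical):

```cpp
// Hypothetical free-function versions of isLetter_ and isDigit_ (ASCII only).
constexpr bool isLetter(char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

constexpr bool isDigit(char c)
{
  return c >= '0' && c <= '9';  // '0'..'9' are contiguous in every C++ character set
}

// Checked at compile time.
static_assert(isLetter('K') && isLetter('x') && !isLetter('1') && !isLetter('['));
static_assert(isDigit('0') && isDigit('9') && !isDigit('.'));
```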
Token next ()
Consume and return the next token.
Advances the tokenizer position past the returned token.
ProFormaTokenizer & operator= (const ProFormaTokenizer &) [default]
Copy assignment operator.
ProFormaTokenizer & operator= (ProFormaTokenizer &&) [default]
Move assignment operator.
Token peek ()
Look at the next token without consuming it.
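The `peeked_` member, together with `peek()` not being declared `const`, suggests the usual one-token lookahead cache: `peek()` scans once and stores the result, and the following `next()` hands back the stored token instead of scanning again. A minimal sketch over single characters, with a hypothetical class name, `'\0'` standing in for END, and a scan counter added only to make the caching observable:

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical one-token-lookahead wrapper; each "token" is one character here.
class PeekableChars
{
public:
  explicit PeekableChars(std::string_view input) : input_(input) {}

  char next()
  {
    if (peeked_)
    {
      const char t = *peeked_;
      peeked_.reset();  // the cached token is now consumed
      return t;
    }
    return scan_();
  }

  char peek()
  {
    if (!peeked_) peeked_ = scan_();  // scan at most once per lookahead
    return *peeked_;
  }

  std::size_t scans() const { return scans_; }  // instrumentation for the example

private:
  char scan_()
  {
    ++scans_;
    return pos_ < input_.size() ? input_[pos_++] : '\0';
  }

  std::string_view input_;
  std::size_t pos_ = 0;
  std::size_t scans_ = 0;
  std::optional<char> peeked_;
};
```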
char peek_ (size_t offset) const [private]
Get the character at offset from current position (or '\0' if out of bounds)
size_t position () const
Get the current position in the input.
Token scanIdentifier_ () [private]
Scan an identifier token (letter sequence)
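Because an identifier is documented as a letter sequence, scanning one reduces to extending a span while the next character is an ASCII letter. A sketch as a free function with the hypothetical name `scanIdentifierAt`, returning a view into the input so the operation stays zero-copy:

```cpp
#include <cstddef>
#include <string_view>

// Return the maximal run of ASCII letters starting at `pos` (empty if none).
std::string_view scanIdentifierAt(std::string_view input, std::size_t pos)
{
  if (pos > input.size()) return {};  // avoid substr throwing std::out_of_range
  std::size_t end = pos;
  while (end < input.size() &&
         ((input[end] >= 'A' && input[end] <= 'Z') || (input[end] >= 'a' && input[end] <= 'z')))
  {
    ++end;
  }
  return input.substr(pos, end - pos);
}
```

Under this reading an accession such as `UNIMOD:35` would come out as IDENTIFIER, COLON, NUMBER, leaving the parser to reassemble it.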
Token scanNumber_ () [private]
Scan a number token (integer, decimal, optionally signed)
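The documented number forms (integer or decimal, optionally signed) can be scanned in three steps: an optional sign, one or more digits, then an optional fractional part. In this sketch, a hypothetical free function `scanNumberAt`, the sign must be followed by a digit and a `.` is consumed only when a digit follows it; both rules, and the lack of exponent notation, are assumptions.

```cpp
#include <cstddef>
#include <string_view>

// Return the number starting at `pos`, or an empty view if none starts there.
std::string_view scanNumberAt(std::string_view input, std::size_t pos)
{
  auto isDigitAt = [&](std::size_t i) {
    return i < input.size() && input[i] >= '0' && input[i] <= '9';
  };

  std::size_t end = pos;
  if (end < input.size() && (input[end] == '+' || input[end] == '-')) ++end;  // optional sign
  if (!isDigitAt(end)) return {};                                            // need a digit
  while (isDigitAt(end)) ++end;                                              // integer part
  if (end < input.size() && input[end] == '.' && isDigitAt(end + 1))         // optional fraction
  {
    ++end;  // consume '.'
    while (isDigitAt(end)) ++end;
  }
  return input.substr(pos, end - pos);
}
```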
Token scanToken_ () [private]
Scan and return the next token from the current position.
static const char * tokenTypeName (TokenType type) [static]
Get a human-readable name for a token type.
| type | The token type |
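A name lookup of this kind is commonly a `switch` over the enumerators returning string literals, which have static storage duration and are therefore safe to hand out as `const char *`. The spelling of the returned names (the enumerator identifiers themselves) is an assumption; the real function may use other labels.

```cpp
// Mirrors the documented enumerator list of ProFormaTokenizer::TokenType.
enum class TokenType
{
  LBRACKET, RBRACKET, LPAREN, RPAREN, LBRACE, RBRACE, LANGLE, RANGLE,
  PLUS, MINUS, SLASH, PIPE, HASH, COLON, COMMA, CARET, QUESTION, AT,
  NUMBER, IDENTIFIER, END
};

// Hypothetical implementation: string literals live for the whole program.
const char* tokenTypeName(TokenType type)
{
  switch (type)  // no default label, so -Wswitch can flag unnamed enumerators
  {
    case TokenType::LBRACKET:   return "LBRACKET";
    case TokenType::RBRACKET:   return "RBRACKET";
    case TokenType::LPAREN:     return "LPAREN";
    case TokenType::RPAREN:     return "RPAREN";
    case TokenType::LBRACE:     return "LBRACE";
    case TokenType::RBRACE:     return "RBRACE";
    case TokenType::LANGLE:     return "LANGLE";
    case TokenType::RANGLE:     return "RANGLE";
    case TokenType::PLUS:       return "PLUS";
    case TokenType::MINUS:      return "MINUS";
    case TokenType::SLASH:      return "SLASH";
    case TokenType::PIPE:       return "PIPE";
    case TokenType::HASH:       return "HASH";
    case TokenType::COLON:      return "COLON";
    case TokenType::COMMA:      return "COMMA";
    case TokenType::CARET:      return "CARET";
    case TokenType::QUESTION:   return "QUESTION";
    case TokenType::AT:         return "AT";
    case TokenType::NUMBER:     return "NUMBER";
    case TokenType::IDENTIFIER: return "IDENTIFIER";
    case TokenType::END:        return "END";
  }
  return "UNKNOWN";  // only reachable with an out-of-range enum value
}
```

Leaving out a `default:` label means GCC and Clang (with `-Wall`) warn when a new enumerator is added to the enum but not given a name here.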
std::string_view input_ [private]
The input string (must remain valid for tokenizer lifetime)
std::optional<Token> peeked_ [private]
Cached peeked token (if any)
size_t pos_ = 0 [private]
Current position in the input.