OpenMS
Class for the enzymatic digestion of RNAs.
#include <OpenMS/CHEMISTRY/RNaseDigestion.h>
Classes | |
| struct | CleavageSensitiveModGroups |
| Cleavage-sensitive modification groups split by cleavage direction. | |
| struct | DigestionProduct |
| Detailed digestion product including sequence and parent coordinates. | |
Public Types | |
| using | ConstRibonucleotidePtr = const Ribonucleotide * |
Public Types inherited from EnzymaticDigestion | |
| enum | Specificity { SPEC_NONE = 0 , SPEC_SEMI = 1 , SPEC_FULL = 2 , SPEC_UNKNOWN = 3 , SPEC_NOCTERM = 8 , SPEC_NONTERM = 9 , SIZE_OF_SPECIFICITY = 10 } |
| Determines whether the specificity of the two peptide ends is considered when querying for valid digestion products. | |
Public Member Functions | |
| void | setEnzyme (const DigestionEnzyme *enzyme) override |
| Sets the enzyme for the digestion. | |
| void | setEnzyme (const String &name) |
| Sets the enzyme for the digestion (by name) | |
| void | digest (const NASequence &rna, std::vector< NASequence > &output, Size min_length=0, Size max_length=0) const |
| Performs the enzymatic digestion of a (potentially modified) RNA. | |
| void | digest (const NASequence &rna, std::vector< DigestionProduct > &output, Size min_length=0, Size max_length=0) const |
| Performs the enzymatic digestion of an RNA and returns fragments with parent coordinates. | |
| std::vector< std::pair< Size, Size > > | getFragmentPositions (const NASequence &rna, Size min_length=0, Size max_length=0) const |
| Returns the positions of digestion products in the RNA as pairs: (start, length) | |
| CleavageSensitiveModGroups | inferCleavageSensitiveMods (const std::set< ConstRibonucleotidePtr > &variable_modifications) const |
| Infer which variable modifications can block cleavage for the configured enzyme. | |
| void | digestWithCleavageSensitiveMods (const NASequence &rna, const CleavageSensitiveModGroups &cleavage_sensitive_mods, Size max_sensitive_mods_per_fragment, std::vector< DigestionProduct > &output, Size min_length=0, Size max_length=0) const |
| Digest RNA while allowing cleavage-sensitive modifications to block adjacent cuts. | |
| void | digest (IdentificationData &id_data, Size min_length=0, Size max_length=0) const |
| Performs the enzymatic digestion of all RNA parent sequences in IdentificationData. | |
Public Member Functions inherited from EnzymaticDigestion | |
| EnzymaticDigestion () | |
| Default constructor. | |
| EnzymaticDigestion (const EnzymaticDigestion &rhs) | |
| Copy constructor. | |
| EnzymaticDigestion & | operator= (const EnzymaticDigestion &rhs) |
| Assignment operator. | |
| virtual | ~EnzymaticDigestion () |
| Destructor. | |
| Size | getMissedCleavages () const |
| Returns the number of missed cleavages for the digestion. | |
| void | setMissedCleavages (Size missed_cleavages) |
| Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when the log model is used. | |
| String | getEnzymeName () const |
| Returns the name of the enzyme used for the digestion. | |
| Specificity | getSpecificity () const |
| Returns the specificity for the digestion. | |
| void | setSpecificity (Specificity spec) |
| Sets the specificity for the digestion (default is SPEC_FULL). | |
| Size | digestUnmodified (const StringView &sequence, std::vector< StringView > &output, Size min_length=1, Size max_length=0) const |
| Performs the enzymatic digestion of an unmodified sequence. | |
| Size | digestUnmodified (const StringView &sequence, std::vector< std::pair< Size, Size > > &output, Size min_length=1, Size max_length=0) const |
| Performs the enzymatic digestion of an unmodified sequence. | |
| bool | isValidProduct (const String &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true) const |
| Checks whether the peptide fragment starting at position pep_pos with length pep_length within the sequence protein is generated by the current enzyme. | |
| Size | countInternalCleavageSites (const String &sequence) const |
| Counts the number of internal cleavage sites (missed cleavages) in a protein sequence. | |
| bool | filterByMissedCleavages (const String &sequence, const std::function< bool(const Int)> &filter) const |
| Filter based on the number of missed cleavages. | |
Protected Member Functions | |
| std::vector< std::pair< Size, Size > > | getFragmentPositions_ (const NASequence &rna, Size min_length, Size max_length) const |
| Returns the positions of digestion products in the RNA as pairs: (start, length) | |
| void | applyTerminalGains_ (NASequence &fragment, const std::pair< Size, Size > &pos, Size parent_size) const |
| Apply enzyme-specific 5'/3' terminal gains to a fragment based on its parent coordinates. | |
Protected Member Functions inherited from EnzymaticDigestion | |
| bool | isValidProduct_ (const String &sequence, int pos, int length, bool ignore_missed_cleavages, bool allow_nterm_protein_cleavage, bool allow_random_asp_pro_cleavage) const |
| Also supports functionality for ProteaseDigestion, which is deeply woven into this function. To avoid code duplication, the logic is stored here and called by wrappers. Do not duplicate the code just for the sake of semantics (unless a clean separation can be found). Note: the overhead of allow_nterm_protein_cleavage and allow_random_asp_pro_cleavage is marginal; most of the runtime is spent in tokenize_(). | |
| std::vector< int > | tokenize_ (const String &sequence, int start=0, int end=-1) const |
| Digests the sequence using the enzyme's regular expression. | |
| Size | semiSpecificDigestion_ (const std::vector< int > &cleavage_positions, std::vector< std::pair< Size, Size > > &output, Size min_length=0, Size max_length=-1) const |
| Generates semi-specific digestion products. | |
| Size | digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< StringView > &output, Size min_length=0, Size max_length=-1) const |
| Helper function for digestUnmodified() | |
| Size | digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< std::pair< Size, Size > > &output, Size min_length=0, Size max_length=-1) const |
| Size | countMissedCleavages_ (const std::vector< int > &cleavage_positions, Size seq_start, Size seq_end) const |
| Counts the number of missed cleavages in a sequence fragment. | |
Protected Attributes | |
| const Ribonucleotide * | five_prime_gain_ |
| 5' modification added by the enzyme. | |
| const Ribonucleotide * | three_prime_gain_ |
| 3' modification added by the enzyme. | |
| std::vector< boost::regex > | cuts_after_regexes_ |
| A vector of regular expressions for the enzyme's cutting pattern; each regex represents a single nucleotide. | |
| std::vector< boost::regex > | cuts_before_regexes_ |
| A vector of regular expressions for the enzyme's cutting pattern. | |
Protected Attributes inherited from EnzymaticDigestion | |
| Size | missed_cleavages_ |
| Number of missed cleavages. | |
| const DigestionEnzyme * | enzyme_ |
| Used enzyme. | |
| std::unique_ptr< boost::regex > | re_ |
| Regex for tokenizing (huge speedup by making this a member instead of stack object in tokenize_()) | |
| Specificity | specificity_ |
| Specificity of the enzyme. | |
Additional Inherited Members | |
Static Public Member Functions inherited from EnzymaticDigestion | |
| static Specificity | getSpecificityByName (const String &name) |
Static Public Attributes inherited from EnzymaticDigestion | |
| static const std::string | NamesOfSpecificity [SIZE_OF_SPECIFICITY] |
| Names of the Specificity. | |
| static const std::string | NoCleavage |
| Name for no cleavage. | |
| static const std::string | UnspecificCleavage |
| Name for unspecific cleavage. | |
Class for the enzymatic digestion of RNAs.
| struct OpenMS::RNaseDigestion::DigestionProduct |
Detailed digestion product including sequence and parent coordinates.
| Class Members | ||
|---|---|---|
| NASequence | fragment | |
| pair< Size, Size > | position | |
| using ConstRibonucleotidePtr = const Ribonucleotide* |
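| void applyTerminalGains_ (NASequence &fragment, const std::pair< Size, Size > &pos, Size parent_size) const |
| protected |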
Apply enzyme-specific 5'/3' terminal gains to a fragment based on its parent coordinates.
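One plausible reading of "based on its parent coordinates" is that a terminus receives the enzyme's gain only when it was created by a cut, so the ends of the parent RNA stay untouched. The following is a minimal sketch of that reading, not the OpenMS implementation: `ToyFragment`, `toyApplyTerminalGains` and the string labels for the gains are hypothetical, and the rule itself is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Toy stand-in for a digestion fragment (hypothetical, not an OpenMS type).
struct ToyFragment
{
  std::string sequence;
  std::string five_prime_mod;  // empty = no terminal modification
  std::string three_prime_mod; // empty = no terminal modification
};

// Assumption: a terminus gets the enzyme's gain only if it was created by a
// cut, i.e. it does not coincide with the corresponding end of the parent.
void toyApplyTerminalGains(ToyFragment& fragment,
                           const std::pair<std::size_t, std::size_t>& pos, // (start, length)
                           std::size_t parent_size,
                           const std::string& five_prime_gain,
                           const std::string& three_prime_gain)
{
  // the fragment must lie entirely inside the parent
  assert(pos.first <= parent_size && pos.second <= parent_size - pos.first);
  if (pos.first > 0)
  {
    fragment.five_prime_mod = five_prime_gain;
  }
  if (pos.first + pos.second < parent_size)
  {
    fragment.three_prime_mod = three_prime_gain;
  }
}
```

Under this assumption, the first fragment of a parent keeps its original 5' end and the last fragment keeps its original 3' end.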
| void digest (const NASequence &rna, std::vector< DigestionProduct > &output, Size min_length=0, Size max_length=0) const |
Performs the enzymatic digestion of an RNA and returns fragments with parent coordinates.
Only fragments of appropriate length (between min_length and max_length) are returned. Enzyme-specific terminal gains are applied to the reported fragment sequences.
| void digest (const NASequence &rna, std::vector< NASequence > &output, Size min_length=0, Size max_length=0) const |
Performs the enzymatic digestion of a (potentially modified) RNA.
Only fragments of appropriate length (between min_length and max_length) are returned.
Referenced by NucleicAcidSearchEngine::main_().
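The length filter can be illustrated with a small, self-contained sketch. It does not use OpenMS: `toyFragmentPositions` is a hypothetical helper that cuts a plain string after one fixed character, and it assumes that `max_length = 0` means "no upper bound", which the description above does not state.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model, not the OpenMS implementation: cut after every occurrence of
// `cut_after` and report (start, length) pairs that pass the length filter.
// Assumption: max_length == 0 means "no upper bound".
std::vector<std::pair<std::size_t, std::size_t>> toyFragmentPositions(
    const std::string& rna, char cut_after,
    std::size_t min_length = 0, std::size_t max_length = 0)
{
  std::vector<std::pair<std::size_t, std::size_t>> result;
  std::size_t start = 0;
  for (std::size_t i = 0; i < rna.size(); ++i)
  {
    const bool at_end = (i + 1 == rna.size());
    if (rna[i] == cut_after || at_end)
    {
      // fragment covers [start, i]
      const std::size_t length = i + 1 - start;
      const bool long_enough = length >= min_length;
      const bool short_enough = (max_length == 0) || (length <= max_length);
      if (long_enough && short_enough)
      {
        result.emplace_back(start, length);
      }
      start = i + 1;
    }
  }
  return result;
}
```

With the toy input `"AUGCCAGUA"` and cuts after `G`, the unfiltered fragments are `AUG`, `CCAG` and `UA`; setting `min_length = 3` drops `UA`, and setting `max_length = 3` drops `CCAG`.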
| void digest (IdentificationData &id_data, Size min_length=0, Size max_length=0) const |
Performs the enzymatic digestion of all RNA parent sequences in IdentificationData.
Digestion products are stored as IdentifiedOligos with corresponding ParentMatch annotations. Only fragments of appropriate length (between min_length and max_length) are included.
| void digestWithCleavageSensitiveMods (const NASequence &rna, const CleavageSensitiveModGroups &cleavage_sensitive_mods, Size max_sensitive_mods_per_fragment, std::vector< DigestionProduct > &output, Size min_length=0, Size max_length=0) const |
Digest RNA while allowing cleavage-sensitive modifications to block adjacent cuts.
Starting from the regular digest fragments, additional fragments are generated recursively by applying cleavage-sensitive modifications at fragment boundaries. The number of such applied modifications is limited by max_sensitive_mods_per_fragment. Enzyme terminal gains are applied to all returned fragment sequences.
Referenced by NucleicAcidSearchEngine::main_().
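The effect of blocked cuts can be sketched on plain (start, length) pairs. This is a simplified model and not the OpenMS algorithm: `toyMergeAcrossBlockedCuts` is a hypothetical helper, it enumerates merged fragments iteratively instead of recursively, and it assumes that each blocked boundary consumes exactly one cleavage-sensitive modification.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy model: `regular` holds contiguous (start, length) pairs from a regular
// digest. boundary_blockable[i] says whether the cut between regular[i] and
// regular[i + 1] could be blocked by a cleavage-sensitive modification.
// Returns the regular fragments plus every merged fragment that spans at most
// `max_blocked` blocked cuts.
std::vector<std::pair<std::size_t, std::size_t>> toyMergeAcrossBlockedCuts(
    const std::vector<std::pair<std::size_t, std::size_t>>& regular,
    const std::vector<bool>& boundary_blockable,
    std::size_t max_blocked)
{
  // one boundary between each pair of neighbouring fragments
  assert(regular.empty() ? boundary_blockable.empty()
                         : boundary_blockable.size() + 1 == regular.size());
  std::vector<std::pair<std::size_t, std::size_t>> out;
  for (std::size_t first = 0; first < regular.size(); ++first)
  {
    std::size_t length = regular[first].second;
    out.emplace_back(regular[first].first, length); // no blocked cut
    std::size_t blocked = 0;
    for (std::size_t next = first + 1; next < regular.size(); ++next)
    {
      // stop at the modification limit or at a cut that cannot be blocked
      if (blocked == max_blocked || !boundary_blockable[next - 1]) break;
      ++blocked;
      length += regular[next].second;
      out.emplace_back(regular[first].first, length);
    }
  }
  return out;
}
```

For the regular fragments (0, 3), (3, 4), (7, 2), a blockable first boundary and a limit of one blocked cut, the sketch adds the merged fragment (0, 7) to the three regular fragments.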
| std::vector< std::pair< Size, Size > > getFragmentPositions (const NASequence &rna, Size min_length=0, Size max_length=0) const |
Returns the positions of digestion products in the RNA as pairs: (start, length)
This is useful when callers need to associate digested fragments with parent coordinates.
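As a minimal illustration of how (start, length) pairs tie fragments back to their parent, the sketch below slices a plain string with such pairs. `extractFragments` is a hypothetical helper on `std::string`, not an OpenMS function, and the pairs are assumed to use 0-based start positions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of OpenMS): cut a plain-text parent sequence
// into substrings using (start, length) pairs with 0-based starts.
std::vector<std::string> extractFragments(
    const std::string& parent,
    const std::vector<std::pair<std::size_t, std::size_t>>& positions)
{
  std::vector<std::string> fragments;
  fragments.reserve(positions.size());
  for (const auto& pos : positions)
  {
    // each pair must lie entirely inside the parent sequence
    assert(pos.first <= parent.size() && pos.second <= parent.size() - pos.first);
    fragments.push_back(parent.substr(pos.first, pos.second));
  }
  return fragments;
}
```

Keeping the pairs next to the extracted fragments lets a caller report where each fragment sits in the parent without searching for it again.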
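| std::vector< std::pair< Size, Size > > getFragmentPositions_ (const NASequence &rna, Size min_length, Size max_length) const |
| protected |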
Returns the positions of digestion products in the RNA as pairs: (start, length)
| CleavageSensitiveModGroups inferCleavageSensitiveMods (const std::set< ConstRibonucleotidePtr > &variable_modifications) const |
Infer which variable modifications can block cleavage for the configured enzyme.
A modification is classified as cleavage-sensitive if its origin residue matches the enzyme cleavage regex at a boundary position, but the modified residue code no longer matches.
Referenced by NucleicAcidSearchEngine::main_().
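The classification rule can be sketched with `std::regex` on plain strings. This is an illustration only: `ToyModification` and `toyInferSensitive` are hypothetical, the modification codes in the usage note are made up, and a single regex stands in for the enzyme's cleavage patterns.

```cpp
#include <regex>
#include <string>
#include <vector>

// Toy stand-in for a modified ribonucleotide (hypothetical, not an OpenMS type).
struct ToyModification
{
  std::string code;   // code of the modified residue
  std::string origin; // code of the unmodified residue it derives from
};

// A modification counts as cleavage-sensitive if its origin matches the
// cleavage regex but its own code does not.
std::vector<std::string> toyInferSensitive(
    const std::vector<ToyModification>& mods, const std::regex& cleavage_regex)
{
  std::vector<std::string> sensitive;
  for (const auto& mod : mods)
  {
    const bool origin_matches = std::regex_match(mod.origin, cleavage_regex);
    const bool modified_matches = std::regex_match(mod.code, cleavage_regex);
    if (origin_matches && !modified_matches)
    {
      sensitive.push_back(mod.code);
    }
  }
  return sensitive;
}
```

With the regex `G|g2` and the made-up modifications `g1` (origin `G`), `g2` (origin `G`) and `c1` (origin `C`), only `g1` is reported: `g2` still matches the regex, and the origin of `c1` never did.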
| void setEnzyme (const DigestionEnzyme *enzyme) |
| override virtual |
Sets the enzyme for the digestion.
Reimplemented from EnzymaticDigestion.
Referenced by NucleicAcidSearchEngine::main_().
| void setEnzyme (const String &name) |
Sets the enzyme for the digestion (by name).
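| std::vector< boost::regex > cuts_after_regexes_ |
| protected |
A vector of regular expressions for the enzyme's cutting pattern; each regex represents a single nucleotide.
| std::vector< boost::regex > cuts_before_regexes_ |
| protected |
A vector of regular expressions for the enzyme's cutting pattern.
| const Ribonucleotide * five_prime_gain_ |
| protected |
5' modification added by the enzyme.
| const Ribonucleotide * three_prime_gain_ |
| protected |
3' modification added by the enzyme.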