OpenMS
RNaseDigestion Class Reference

Class for the enzymatic digestion of RNAs.

#include <OpenMS/CHEMISTRY/RNaseDigestion.h>


Classes

struct  CleavageSensitiveModGroups
 Cleavage-sensitive modification groups split by cleavage direction. More...
 
struct  DigestionProduct
 Detailed digestion product including sequence and parent coordinates. More...
 

Public Types

using ConstRibonucleotidePtr = const Ribonucleotide *
 
- Public Types inherited from EnzymaticDigestion
enum  Specificity {
  SPEC_NONE = 0 , SPEC_SEMI = 1 , SPEC_FULL = 2 , SPEC_UNKNOWN = 3 ,
  SPEC_NOCTERM = 8 , SPEC_NONTERM = 9 , SIZE_OF_SPECIFICITY = 10
}
 When querying for valid digestion products, this determines whether the specificity of the two peptide ends is considered important.
 

Public Member Functions

void setEnzyme (const DigestionEnzyme *enzyme) override
 Sets the enzyme for the digestion.
 
void setEnzyme (const String &name)
 Sets the enzyme for the digestion (by name)
 
void digest (const NASequence &rna, std::vector< NASequence > &output, Size min_length=0, Size max_length=0) const
 Performs the enzymatic digestion of a (potentially modified) RNA.
 
void digest (const NASequence &rna, std::vector< DigestionProduct > &output, Size min_length=0, Size max_length=0) const
 Performs the enzymatic digestion of an RNA and returns fragments with parent coordinates.
 
std::vector< std::pair< Size, Size > > getFragmentPositions (const NASequence &rna, Size min_length=0, Size max_length=0) const
 Returns the positions of digestion products in the RNA as pairs: (start, length)
 
CleavageSensitiveModGroups inferCleavageSensitiveMods (const std::set< ConstRibonucleotidePtr > &variable_modifications) const
 Infer which variable modifications can block cleavage for the configured enzyme.
 
void digestWithCleavageSensitiveMods (const NASequence &rna, const CleavageSensitiveModGroups &cleavage_sensitive_mods, Size max_sensitive_mods_per_fragment, std::vector< DigestionProduct > &output, Size min_length=0, Size max_length=0) const
 Digest RNA while allowing cleavage-sensitive modifications to block adjacent cuts.
 
void digest (IdentificationData &id_data, Size min_length=0, Size max_length=0) const
 Performs the enzymatic digestion of all RNA parent sequences in IdentificationData.
 
- Public Member Functions inherited from EnzymaticDigestion
 EnzymaticDigestion ()
 Default constructor.
 
 EnzymaticDigestion (const EnzymaticDigestion &rhs)
 Copy constructor.
 
EnzymaticDigestion & operator= (const EnzymaticDigestion &rhs)
 Assignment operator.
 
virtual ~EnzymaticDigestion ()
 Destructor.
 
Size getMissedCleavages () const
 Returns the number of missed cleavages for the digestion.
 
void setMissedCleavages (Size missed_cleavages)
 Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when log model is used.
 
String getEnzymeName () const
 Returns the enzyme for the digestion.
 
Specificity getSpecificity () const
 Returns the specificity for the digestion.
 
void setSpecificity (Specificity spec)
 Sets the specificity for the digestion (default is SPEC_FULL).
 
Size digestUnmodified (const StringView &sequence, std::vector< StringView > &output, Size min_length=1, Size max_length=0) const
 Performs the enzymatic digestion of an unmodified sequence.
 
Size digestUnmodified (const StringView &sequence, std::vector< std::pair< Size, Size > > &output, Size min_length=1, Size max_length=0) const
 Performs the enzymatic digestion of an unmodified sequence.
 
bool isValidProduct (const String &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true) const
 Is the peptide fragment starting at position pep_pos with length pep_length within the sequence protein generated by the current enzyme?
 
Size countInternalCleavageSites (const String &sequence) const
 Counts the number of internal cleavage sites (missed cleavages) in a protein sequence.
 
bool filterByMissedCleavages (const String &sequence, const std::function< bool(const Int)> &filter) const
 Filter based on the number of missed cleavages.
 

Protected Member Functions

std::vector< std::pair< Size, Size > > getFragmentPositions_ (const NASequence &rna, Size min_length, Size max_length) const
 Returns the positions of digestion products in the RNA as pairs: (start, length)
 
void applyTerminalGains_ (NASequence &fragment, const std::pair< Size, Size > &pos, Size parent_size) const
 Apply enzyme-specific 5'/3' terminal gains to a fragment based on its parent coordinates.
 
- Protected Member Functions inherited from EnzymaticDigestion
bool isValidProduct_ (const String &sequence, int pos, int length, bool ignore_missed_cleavages, bool allow_nterm_protein_cleavage, bool allow_random_asp_pro_cleavage) const
 Supports functionality for ProteaseDigestion as well (which is deeply woven into the function). To avoid code duplication, the logic is stored here and called by wrappers. Do not duplicate the code just for the sake of semantics (unless we can come up with a clean separation). Note: the overhead of allow_nterm_protein_cleavage and allow_random_asp_pro_cleavage is marginal; the main runtime is spent in tokenize_().
 
std::vector< int > tokenize_ (const String &sequence, int start=0, int end=-1) const
 Digests the sequence using the enzyme's regular expression.
 
Size semiSpecificDigestion_ (const std::vector< int > &cleavage_positions, std::vector< std::pair< Size, Size > > &output, Size min_length=0, Size max_length=-1) const
 Generates semi-specific digestion products.
 
Size digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< StringView > &output, Size min_length=0, Size max_length=-1) const
 Helper function for digestUnmodified()
 
Size digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< std::pair< Size, Size > > &output, Size min_length=0, Size max_length=-1) const
 
Size countMissedCleavages_ (const std::vector< int > &cleavage_positions, Size seq_start, Size seq_end) const
 Counts the number of missed cleavages in a sequence fragment.
 

Protected Attributes

const Ribonucleotide * five_prime_gain_
 5' mod added by the enzyme
 
const Ribonucleotide * three_prime_gain_
 3' mod added by the enzyme
 
std::vector< boost::regex > cuts_after_regexes_
 a vector of reg. exp. for enzyme cutting pattern, each regex represents a single nucleotide
 
std::vector< boost::regex > cuts_before_regexes_
 a vector of reg. exp. for enzyme cutting pattern
 
- Protected Attributes inherited from EnzymaticDigestion
Size missed_cleavages_
 Number of missed cleavages.
 
const DigestionEnzyme * enzyme_
 Used enzyme.
 
std::unique_ptr< boost::regex > re_
 Regex for tokenizing (huge speedup by making this a member instead of stack object in tokenize_())
 
Specificity specificity_
 specificity of enzyme
 

Additional Inherited Members

- Static Public Member Functions inherited from EnzymaticDigestion
static Specificity getSpecificityByName (const String &name)
 
- Static Public Attributes inherited from EnzymaticDigestion
static const std::string NamesOfSpecificity [SIZE_OF_SPECIFICITY]
 Names of the Specificity.
 
static const std::string NoCleavage
 Name for no cleavage.
 
static const std::string UnspecificCleavage
 Name for unspecific cleavage.
 

Detailed Description

Class for the enzymatic digestion of RNAs.

See also
DigestionEnzymeRNA

Class Documentation

◆ OpenMS::RNaseDigestion::DigestionProduct

struct OpenMS::RNaseDigestion::DigestionProduct

Detailed digestion product including sequence and parent coordinates.

Class Members
NASequence fragment
pair< Size, Size > position

Member Typedef Documentation

◆ ConstRibonucleotidePtr

using ConstRibonucleotidePtr = const Ribonucleotide*

Member Function Documentation

◆ applyTerminalGains_()

void applyTerminalGains_ ( NASequence &  fragment,
const std::pair< Size, Size > &  pos,
Size  parent_size 
) const
protected

Apply enzyme-specific 5'/3' terminal gains to a fragment based on its parent coordinates.
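
As a rough illustration of what "based on its parent coordinates" could mean, the stand-alone sketch below attaches a terminal gain only to fragment ends that were created by a cut, leaving the parent's own termini untouched. This rule is an assumption made for illustration, not a statement about the OpenMS implementation. `SketchFragment`, `sketchApplyTerminalGains` and the gain labels passed in are hypothetical and do not use any OpenMS types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a fragment with optional terminal modifications
// (not the OpenMS NASequence class).
struct SketchFragment
{
  std::string sequence;
  std::string five_prime_mod;  // empty = no 5' modification
  std::string three_prime_mod; // empty = no 3' modification
};

// Assumption: an end receives the enzyme's gain only if it was created by a
// cut, i.e. it does not coincide with a terminus of the parent sequence.
// 'pos' is (start, length) in parent coordinates.
void sketchApplyTerminalGains(SketchFragment& fragment,
                              const std::pair<std::size_t, std::size_t>& pos,
                              std::size_t parent_size,
                              const std::string& five_prime_gain,
                              const std::string& three_prime_gain)
{
  assert(pos.first + pos.second <= parent_size); // fragment must lie inside the parent
  if (pos.first > 0)
  {
    fragment.five_prime_mod = five_prime_gain; // 5' end produced by a cut
  }
  if (pos.first + pos.second < parent_size)
  {
    fragment.three_prime_mod = three_prime_gain; // 3' end produced by a cut
  }
}
```

Under this assumption, an internal fragment receives both gains, while the first and last fragments of the parent receive only one each.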

◆ digest() [1/3]

void digest ( const NASequence &  rna,
std::vector< DigestionProduct > &  output,
Size  min_length = 0,
Size  max_length = 0 
) const

Performs the enzymatic digestion of an RNA and returns fragments with parent coordinates.

Only fragments of appropriate length (between min_length and max_length) are returned. Enzyme-specific terminal gains are applied to the reported fragment sequences.

◆ digest() [2/3]

void digest ( const NASequence &  rna,
std::vector< NASequence > &  output,
Size  min_length = 0,
Size  max_length = 0 
) const

Performs the enzymatic digestion of a (potentially modified) RNA.

Only fragments of appropriate length (between min_length and max_length) are returned.

Referenced by NucleicAcidSearchEngine::main_().
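
The length filtering described above can be illustrated with a minimal, stand-alone sketch. It works on plain strings rather than NASequence, hard-codes a simplified "cut after every G" rule, ignores missed cleavages and modifications, and assumes that max_length = 0 means "no upper limit". The function name `sketchDigest` is hypothetical; this is not the OpenMS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: digest a plain RNA string by cutting after every 'G'
// and keep only fragments whose length lies in [min_length, max_length].
// Assumption: max_length == 0 disables the upper limit.
std::vector<std::string> sketchDigest(const std::string& rna,
                                      std::size_t min_length = 0,
                                      std::size_t max_length = 0)
{
  assert(max_length == 0 || min_length <= max_length);
  std::vector<std::string> output;
  std::size_t start = 0;
  for (std::size_t i = 0; i < rna.size(); ++i)
  {
    const bool is_last = (i + 1 == rna.size());
    if (rna[i] == 'G' || is_last) // cut after G, or close the final fragment
    {
      const std::size_t length = i + 1 - start;
      if (length >= min_length && (max_length == 0 || length <= max_length))
      {
        output.push_back(rna.substr(start, length));
      }
      start = i + 1;
    }
  }
  return output;
}
```

For the input "AUGCGAAG" the sketch yields the fragments AUG, CG and AAG; with min_length = 3 only AUG and AAG remain.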

◆ digest() [3/3]

void digest ( IdentificationData &  id_data,
Size  min_length = 0,
Size  max_length = 0 
) const

Performs the enzymatic digestion of all RNA parent sequences in IdentificationData.

Digestion products are stored as IdentifiedOligos with corresponding ParentMatch annotations. Only fragments of appropriate length (between min_length and max_length) are included.

◆ digestWithCleavageSensitiveMods()

void digestWithCleavageSensitiveMods ( const NASequence &  rna,
const CleavageSensitiveModGroups &  cleavage_sensitive_mods,
Size  max_sensitive_mods_per_fragment,
std::vector< DigestionProduct > &  output,
Size  min_length = 0,
Size  max_length = 0 
) const

Digest RNA while allowing cleavage-sensitive modifications to block adjacent cuts.

Starting from the regular digest fragments, additional fragments are generated recursively by applying cleavage-sensitive modifications at fragment boundaries. The number of such applied modifications is limited by max_sensitive_mods_per_fragment. Enzyme terminal gains are applied to all returned fragment sequences.

Referenced by NucleicAcidSearchEngine::main_().
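
The recursive extension described above can be sketched on coordinates alone. The stand-alone example below assumes the regular fragments are contiguous, ordered (start, length) pairs, and that the caller already knows, per fragment, whether the cut at its 3' boundary can be blocked by a cleavage-sensitive modification (the hypothetical `blockable` flags). It models only the cuts-after direction and does not attach modifications or terminal gains, so it is a simplification rather than the OpenMS algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Position = std::pair<std::size_t, std::size_t>; // (start, length)

// Recursively extend 'current', which ends with regular fragment 'last', by
// blocking the cut after it and merging in the next regular fragment.
void sketchExtend(const std::vector<Position>& regular,
                  const std::vector<bool>& blockable,
                  std::size_t last, Position current, std::size_t used,
                  std::size_t max_blocked, std::vector<Position>& output)
{
  if (used == max_blocked || last + 1 >= regular.size() || !blockable[last])
  {
    return; // budget exhausted, parent end reached, or cut cannot be blocked
  }
  current.second += regular[last + 1].second; // merge with the next fragment
  output.push_back(current);
  sketchExtend(regular, blockable, last + 1, current, used + 1, max_blocked, output);
}

// Hypothetical driver: returns the regular fragments followed by all merged
// fragments that need at most 'max_blocked' blocked cuts.
std::vector<Position> sketchDigestWithBlockedCuts(const std::vector<Position>& regular,
                                                  const std::vector<bool>& blockable,
                                                  std::size_t max_blocked)
{
  assert(blockable.size() == regular.size());
  std::vector<Position> output(regular);
  for (std::size_t i = 0; i < regular.size(); ++i)
  {
    sketchExtend(regular, blockable, i, regular[i], 0, max_blocked, output);
  }
  return output;
}
```

Each blocked cut consumes one unit of the budget, which mirrors the role of max_sensitive_mods_per_fragment in the description; a budget of zero returns the regular fragments unchanged.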

◆ getFragmentPositions()

std::vector< std::pair< Size, Size > > getFragmentPositions ( const NASequence &  rna,
Size  min_length = 0,
Size  max_length = 0 
) const

Returns the positions of digestion products in the RNA as pairs: (start, length)

This is useful when callers need to associate digested fragments with parent coordinates.
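
To make the (start, length) convention concrete, the following stand-alone sketch maps such pairs back onto a parent sequence. It uses plain std::string instead of NASequence, and the helper name `sketchSlice` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extract the substrings of 'parent' described by
// (start, length) pairs, in the order given.
std::vector<std::string> sketchSlice(
    const std::string& parent,
    const std::vector<std::pair<std::size_t, std::size_t>>& positions)
{
  std::vector<std::string> fragments;
  fragments.reserve(positions.size());
  for (const auto& pos : positions)
  {
    // first = start index in the parent, second = fragment length
    assert(pos.first + pos.second <= parent.size());
    fragments.push_back(parent.substr(pos.first, pos.second));
  }
  return fragments;
}
```

Because the second element is a length rather than an end index, the end of a fragment in parent coordinates is start + length.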

◆ getFragmentPositions_()

std::vector< std::pair< Size, Size > > getFragmentPositions_ ( const NASequence &  rna,
Size  min_length,
Size  max_length 
) const
protected

Returns the positions of digestion products in the RNA as pairs: (start, length)

◆ inferCleavageSensitiveMods()

CleavageSensitiveModGroups inferCleavageSensitiveMods ( const std::set< ConstRibonucleotidePtr > &  variable_modifications) const

Infer which variable modifications can block cleavage for the configured enzyme.

A modification is classified as cleavage-sensitive if its origin residue matches the enzyme cleavage regex at a boundary position, but the modified residue code no longer matches.

Referenced by NucleicAcidSearchEngine::main_().
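
The classification rule above can be expressed as a small predicate. The sketch below is stand-alone and uses std::regex with a single regex for the boundary position; `SketchMod`, `sketchIsCleavageSensitive`, the full-match semantics and the modification codes used in the usage note are assumptions chosen for illustration and are not taken from the OpenMS sources.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical description of a modified ribonucleotide: its own code and
// the code of the unmodified residue it derives from.
struct SketchMod
{
  std::string code;   // e.g. "m1G" (illustrative)
  std::string origin; // e.g. "G"
};

// A modification counts as cleavage-sensitive if the unmodified origin would
// be recognized at the boundary position but the modified code is not.
// Assumption: the regex must match the whole code (std::regex_match).
bool sketchIsCleavageSensitive(const std::regex& boundary_regex, const SketchMod& mod)
{
  assert(!mod.code.empty() && !mod.origin.empty());
  const bool origin_matches = std::regex_match(mod.origin, boundary_regex);
  const bool modified_matches = std::regex_match(mod.code, boundary_regex);
  return origin_matches && !modified_matches;
}
```

With a boundary regex of "G", a modification with code "m1G" and origin "G" is classified as sensitive, whereas one with origin "A" is not, because the enzyme would not have cut at that residue in the first place.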

◆ setEnzyme() [1/2]

void setEnzyme ( const DigestionEnzyme *  enzyme)
overridevirtual

Sets the enzyme for the digestion.

Reimplemented from EnzymaticDigestion.

Referenced by NucleicAcidSearchEngine::main_().

◆ setEnzyme() [2/2]

void setEnzyme ( const String &  name)

Sets the enzyme for the digestion (by name)

Member Data Documentation

◆ cuts_after_regexes_

std::vector<boost::regex> cuts_after_regexes_
protected

a vector of reg. exp. for enzyme cutting pattern, each regex represents a single nucleotide

◆ cuts_before_regexes_

std::vector<boost::regex> cuts_before_regexes_
protected

a vector of reg. exp. for enzyme cutting pattern

◆ five_prime_gain_

const Ribonucleotide* five_prime_gain_
protected

5' mod added by the enzyme

◆ three_prime_gain_

const Ribonucleotide* three_prime_gain_
protected

3' mod added by the enzyme