OpenMS
ProteaseDigestion Class Reference

Class for the enzymatic digestion of proteins represented as AASequence or String.

#include <OpenMS/CHEMISTRY/ProteaseDigestion.h>


Public Member Functions

void setEnzyme (const String &name)
 Sets the enzyme for the digestion (by name)
 
Size digest (const AASequence &protein, std::vector< AASequence > &output, Size min_length=1, Size max_length=0) const
 Performs the enzymatic digestion of a protein represented as AASequence.
 
Size digest (const AASequence &protein, std::vector< std::pair< size_t, size_t >> &output, Size min_length=1, Size max_length=0) const
 Performs the enzymatic digestion of a protein represented as AASequence.
 
Size peptideCount (const AASequence &protein)
 Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings.
 
bool isValidProduct (const String &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true, bool allow_nterm_protein_cleavage=false, bool allow_random_asp_pro_cleavage=false) const
 Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage.
 
bool isValidProduct (const AASequence &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true, bool allow_nterm_protein_cleavage=false, bool allow_random_asp_pro_cleavage=false) const
 forwards to isValidProduct using protein.toUnmodifiedString()
 
virtual void setEnzyme (const DigestionEnzyme *enzyme)
 Sets the enzyme for the digestion.
 
- Public Member Functions inherited from EnzymaticDigestion
 EnzymaticDigestion ()
 Default constructor.
 
 EnzymaticDigestion (const EnzymaticDigestion &rhs)
 Copy constructor.
 
EnzymaticDigestion & operator= (const EnzymaticDigestion &rhs)
 Assignment operator.
 
virtual ~EnzymaticDigestion ()
 Destructor.
 
Size getMissedCleavages () const
 Returns the number of missed cleavages for the digestion.
 
void setMissedCleavages (Size missed_cleavages)
 Sets the number of missed cleavages for the digestion (default is 0). This setting is ignored when log model is used.
 
String getEnzymeName () const
 Returns the enzyme for the digestion.
 
virtual void setEnzyme (const DigestionEnzyme *enzyme)
 Sets the enzyme for the digestion.
 
Specificity getSpecificity () const
 Returns the specificity for the digestion.
 
void setSpecificity (Specificity spec)
 Sets the specificity for the digestion (default is SPEC_FULL).
 
Size digestUnmodified (const StringView &sequence, std::vector< StringView > &output, Size min_length=1, Size max_length=0) const
 Performs the enzymatic digestion of an unmodified sequence.
 
Size digestUnmodified (const StringView &sequence, std::vector< std::pair< Size, Size >> &output, Size min_length=1, Size max_length=0) const
 Performs the enzymatic digestion of an unmodified sequence.
 
bool isValidProduct (const String &protein, int pep_pos, int pep_length, bool ignore_missed_cleavages=true) const
 Is the peptide fragment starting at position pep_pos with length pep_length within the sequence protein generated by the current enzyme?
 
Size countInternalCleavageSites (const String &sequence) const
 Counts the number of internal cleavage sites (missed cleavages) in a protein sequence.
 
bool filterByMissedCleavages (const String &sequence, const std::function< bool(const Int)> &filter) const
 Filter based on the number of missed cleavages.
 

Additional Inherited Members

- Public Types inherited from EnzymaticDigestion
enum  Specificity {
  SPEC_NONE = 0 , SPEC_SEMI = 1 , SPEC_FULL = 2 , SPEC_UNKNOWN = 3 ,
  SPEC_NOCTERM = 8 , SPEC_NONTERM = 9 , SIZE_OF_SPECIFICITY = 10
}
 When querying for valid digestion products, this determines whether the specificity of the two peptide ends is considered important.
 
- Static Public Member Functions inherited from EnzymaticDigestion
static Specificity getSpecificityByName (const String &name)
 
- Static Public Attributes inherited from EnzymaticDigestion
static const std::string NamesOfSpecificity [SIZE_OF_SPECIFICITY]
 Names of the Specificity.
 
static const std::string NoCleavage
 Name for no cleavage.
 
static const std::string UnspecificCleavage
 Name for unspecific cleavage.
 
- Protected Member Functions inherited from EnzymaticDigestion
bool isValidProduct_ (const String &sequence, int pos, int length, bool ignore_missed_cleavages, bool allow_nterm_protein_cleavage, bool allow_random_asp_pro_cleavage) const
 Also implements the checks needed by ProteaseDigestion, which are deeply woven into this function. To avoid code duplication, the logic is stored here and called by wrappers. Do not duplicate the code just for the sake of semantics (unless a clean separation can be found). Note: the overhead of allow_nterm_protein_cleavage and allow_random_asp_pro_cleavage is marginal; most of the runtime is spent in tokenize_().
 
std::vector< int > tokenize_ (const String &sequence, int start=0, int end=-1) const
 Digests the sequence using the enzyme's regular expression.
 
Size digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< StringView > &output, Size min_length=0, Size max_length=-1) const
 Helper function for digestUnmodified()
 
Size digestAfterTokenize_ (const std::vector< int > &fragment_positions, const StringView &sequence, std::vector< std::pair< Size, Size >> &output, Size min_length=0, Size max_length=-1) const
 
Size countMissedCleavages_ (const std::vector< int > &cleavage_positions, Size seq_start, Size seq_end) const
 Counts the number of missed cleavages in a sequence fragment.
 
- Protected Attributes inherited from EnzymaticDigestion
Size missed_cleavages_
 Number of missed cleavages.
 
const DigestionEnzyme * enzyme_
 Used enzyme.
 
std::unique_ptr< boost::regex > re_
 Regex for tokenizing (huge speedup by making this a member instead of stack object in tokenize_())
 
Specificity specificity_
 specificity of enzyme
 

Detailed Description

Class for the enzymatic digestion of proteins represented as AASequence or String.

Digestion can be performed using simple regular expressions, e.g. [KR] | [^P] for trypsin. Missed cleavages can also be modeled, i.e. adjacent peptides that stay joined because the enzyme failed to cut or could not access the site. If n missed cleavages are allowed, all possible resulting peptides (cleaved and uncleaved) with up to n missed cleavages are returned. No random selection of just n specific missed cleavage sites is performed.

An alternative model is also available in EnzymaticDigestionLogModel.
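
The missed cleavage model described above can be illustrated with a small standalone sketch. It does not use OpenMS: the trypsin rule (cut after K or R unless followed by P) is hard-coded instead of being read from an enzyme's regular expression, and the names fragmentStarts and digestWithMissedCleavages are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Start offsets of the fragments of a complete tryptic cleavage:
// cut after K or R unless the next residue is P (hard-coded assumption).
std::vector<std::size_t> fragmentStarts(const std::string& protein)
{
  std::vector<std::size_t> starts;
  if (protein.empty()) return starts;
  starts.push_back(0);
  for (std::size_t i = 0; i + 1 < protein.size(); ++i)
  {
    const bool cut_residue = protein[i] == 'K' || protein[i] == 'R';
    if (cut_residue && protein[i + 1] != 'P') starts.push_back(i + 1);
  }
  return starts;
}

// Enumerates every peptide spanning 1 .. max_missed + 1 adjacent fragments,
// i.e. all peptides with up to max_missed missed cleavages.
std::vector<std::string> digestWithMissedCleavages(const std::string& protein,
                                                   std::size_t max_missed)
{
  const std::vector<std::size_t> starts = fragmentStarts(protein);
  std::vector<std::string> peptides;
  for (std::size_t first = 0; first < starts.size(); ++first)
  {
    for (std::size_t mc = 0; mc <= max_missed && first + mc < starts.size(); ++mc)
    {
      const std::size_t last = first + mc; // last fragment covered by this peptide
      const std::size_t end =
          (last + 1 < starts.size()) ? starts[last + 1] : protein.size();
      assert(end > starts[first]); // peptides are never empty
      peptides.push_back(protein.substr(starts[first], end - starts[first]));
    }
  }
  return peptides;
}
```

For the sequence AKRPGKDE the complete cleavage gives AK, RPGK and DE (the R|P site is not cut). Allowing one missed cleavage additionally yields AKRPGK and RPGKDE, so every combination is enumerated rather than a random subset.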

Member Function Documentation

◆ digest() [1/2]

Size digest ( const AASequence &  protein,
std::vector< AASequence > &  output,
Size  min_length = 1,
Size  max_length = 0 
) const

Performs the enzymatic digestion of a protein represented as AASequence.

Parameters
protein      Sequence to digest
output       Digestion products (peptides)
min_length   Minimal length of reported products
max_length   Maximal length of reported products (0 = no restriction)
Returns
Number of discarded digestion products (those that do not match the length restrictions)
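
The length restriction and the return value can be sketched independently of OpenMS as follows. filterByLength is a hypothetical helper, not an OpenMS function; it only mirrors the documented conventions that max_length = 0 means "no restriction" and that the return value counts the discarded products. Clearing the output vector first is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of OpenMS): keeps the products whose length
// lies within [min_length, max_length] and returns how many were discarded.
// max_length == 0 disables the upper bound, as in the parameter list above.
std::size_t filterByLength(const std::vector<std::string>& products,
                           std::vector<std::string>& output,
                           std::size_t min_length = 1,
                           std::size_t max_length = 0)
{
  assert(&products != &output); // output is cleared, so it must not alias the input
  output.clear();               // assumption: earlier content of output is dropped
  std::size_t discarded = 0;
  for (const std::string& peptide : products)
  {
    const bool too_short = peptide.size() < min_length;
    const bool too_long = max_length != 0 && peptide.size() > max_length;
    if (too_short || too_long)
    {
      ++discarded;
    }
    else
    {
      output.push_back(peptide);
    }
  }
  return discarded;
}
```

Note that the return value is the number of rejected products, not the number of products written to output; the latter is output.size().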

◆ digest() [2/2]

Size digest ( const AASequence &  protein,
std::vector< std::pair< size_t, size_t >> &  output,
Size  min_length = 1,
Size  max_length = 0 
) const

Performs the enzymatic digestion of a protein represented as AASequence.

Parameters
protein      Sequence to digest
output       Digestion products (start and end indices of peptides)
min_length   Minimal length of reported products
max_length   Maximal length of reported products (0 = no restriction)
Returns
Number of discarded digestion products (those that do not match the length restrictions)
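
Index pairs avoid copying sequence data until a peptide is actually needed. The sketch below shows one way such pairs could be turned back into peptide strings. It is not OpenMS code: materialize is a hypothetical helper, and it assumes 0-based, half-open spans [start, end). The parameter list above only says "start and end indices" and does not state whether the end is inclusive, so this convention should be checked against real output before being relied on.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of OpenMS): converts (start, end) spans into
// peptide strings. Assumes 0-based, half-open spans [start, end).
std::vector<std::string> materialize(
    const std::string& protein,
    const std::vector<std::pair<std::size_t, std::size_t>>& spans)
{
  std::vector<std::string> peptides;
  peptides.reserve(spans.size());
  for (const auto& span : spans)
  {
    // Each span must lie inside the protein and must not be reversed.
    assert(span.first <= span.second && span.second <= protein.size());
    peptides.push_back(protein.substr(span.first, span.second - span.first));
  }
  return peptides;
}
```

Under this assumed convention, the spans (0, 2), (2, 6) and (6, 8) on AKRPGKDE correspond to AK, RPGK and DE.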

◆ isValidProduct() [1/2]

bool isValidProduct ( const AASequence &  protein,
int  pep_pos,
int  pep_length,
bool  ignore_missed_cleavages = true,
bool  allow_nterm_protein_cleavage = false,
bool  allow_random_asp_pro_cleavage = false 
) const

forwards to isValidProduct using protein.toUnmodifiedString()

◆ isValidProduct() [2/2]

bool isValidProduct ( const String &  protein,
int  pep_pos,
int  pep_length,
bool  ignore_missed_cleavages = true,
bool  allow_nterm_protein_cleavage = false,
bool  allow_random_asp_pro_cleavage = false 
) const

Variant of EnzymaticDigestion::isValidProduct() with support for n-term protein cleavage and random D|P cleavage.

Checks if peptide is a valid digestion product of the enzyme, taking into account specificity and the flags provided here.

Parameters
protein                        Protein sequence
pep_pos                        Starting index of potential peptide
pep_length                     Length of potential peptide
ignore_missed_cleavages        Do not compare the missed cleavages of the potential peptide to the maximum allowed number of missed cleavages
allow_nterm_protein_cleavage   Regard the peptide as N-terminal of the protein if it starts at pos=1 or 2 and the protein starts with 'M'
allow_random_asp_pro_cleavage  Allow cleavage at D|P sites to count as N-/C-terminal
Returns
True if peptide has correct n/c terminals (according to enzyme, specificity and above flags)

Referenced by IDFilter::DigestionFilter::operator()().
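
The terminal checks described above can be sketched without OpenMS as follows. This is a simplified illustration under several assumptions, not the library's implementation: the trypsin rule is hard-coded, only full specificity is modeled, missed cleavages are not counted, and "pos=1 or 2" is interpreted as 1-based, i.e. a 0-based start of 0 or 1. The names isCleavageSite and isValidTrypticProduct are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if a cut is accepted between residues pos-1 and pos (0-based).
// The protein termini (pos == 0 and pos == size) always count as valid ends.
bool isCleavageSite(const std::string& protein, std::size_t pos,
                    bool allow_random_asp_pro_cleavage)
{
  assert(pos <= protein.size());
  if (pos == 0 || pos == protein.size()) return true;
  const char before = protein[pos - 1];
  const char after = protein[pos];
  // Hard-coded trypsin rule: after K or R, but not before P.
  if ((before == 'K' || before == 'R') && after != 'P') return true;
  // Optional: accept D|P sites as well.
  return allow_random_asp_pro_cleavage && before == 'D' && after == 'P';
}

// Hypothetical, simplified counterpart of isValidProduct for full specificity.
bool isValidTrypticProduct(const std::string& protein, std::size_t pep_pos,
                           std::size_t pep_length,
                           bool allow_nterm_protein_cleavage = false,
                           bool allow_random_asp_pro_cleavage = false)
{
  // Reject empty peptides and peptides that do not fit into the protein.
  if (pep_length == 0 || pep_pos >= protein.size() ||
      pep_length > protein.size() - pep_pos)
  {
    return false;
  }
  bool nterm_ok = isCleavageSite(protein, pep_pos, allow_random_asp_pro_cleavage);
  // Optional: peptide starts directly after a removed initiator methionine.
  if (!nterm_ok && allow_nterm_protein_cleavage && pep_pos == 1 && protein[0] == 'M')
  {
    nterm_ok = true;
  }
  const bool cterm_ok =
      isCleavageSite(protein, pep_pos + pep_length, allow_random_asp_pro_cleavage);
  return nterm_ok && cterm_ok;
}
```

With the protein MAKDPGR, the peptide AK (start 1, length 2) is rejected by default because no tryptic site precedes it, but is accepted once allow_nterm_protein_cleavage is set. Likewise PGR (start 4, length 3) is accepted only when D|P cleavage is allowed.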

◆ peptideCount()

Size peptideCount ( const AASequence &  protein)

Returns the number of peptides a digestion of protein would yield under the current enzyme and missed cleavage settings.
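
The count follows from the missed cleavage model: if a complete cleavage yields f fragments, there are f - k peptides that contain exactly k missed cleavages, so allowing up to m missed cleavages gives the sum of f - k for k = 0 .. min(m, f - 1). The sketch below implements this arithmetic only. countPeptides is a hypothetical helper, it takes the fragment count as input instead of a protein, and it does not cover special cases such as unspecific cleavage.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenMS): number of peptides obtained from
// `fragments` fully cleaved fragments when up to `max_missed` missed
// cleavages are allowed.
std::size_t countPeptides(std::size_t fragments, std::size_t max_missed)
{
  std::size_t count = 0;
  // A peptide with mc missed cleavages spans mc + 1 adjacent fragments,
  // which leaves (fragments - mc) possible start positions.
  for (std::size_t mc = 0; mc <= max_missed && mc < fragments; ++mc)
  {
    count += fragments - mc;
  }
  assert(fragments == 0 || count >= fragments); // at least the fully cleaved peptides
  return count;
}
```

For example, 3 fragments with at most 1 missed cleavage give 3 + 2 = 5 peptides, and allowing 2 or more missed cleavages gives 3 + 2 + 1 = 6.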

◆ setEnzyme() [1/2]

virtual void setEnzyme ( const DigestionEnzyme *  enzyme)

Sets the enzyme for the digestion.

◆ setEnzyme() [2/2]

void setEnzyme ( const String &  name)

Sets the enzyme for the digestion (by name)