OpenMS  2.8.0
DecoyHelper Class Reference

Helper class for calculations on decoy proteins.

#include <OpenMS/DATASTRUCTURES/FASTAContainer.h>

Classes

struct  Result
 

Static Public Member Functions

template<typename T >
static Result findDecoyString (FASTAContainer< T > &proteins)
 Heuristic to determine the decoy string given a set of protein names.
 

Static Public Attributes

static const std::vector< std::string > affixes = { "decoy", "dec", "reverse", "rev", "reversed", "__id_decoy", "xxx", "shuffled", "shuffle", "pseudo", "random" }
 
static const std::string regexstr_prefix = std::string("^(") + ListUtils::concatenate<std::string>(affixes, "_*|") + "_*)"
 
static const std::string regexstr_suffix = std::string("(_") + ListUtils::concatenate<std::string>(affixes, "*|_") + ")$"
 

Private Types

using DecoyStringToAffixCount = std::unordered_map< std::string, std::pair< Size, Size > >
 
using CaseInsensitiveToCaseSensitiveDecoy = std::unordered_map< std::string, std::string >
 

Detailed Description

Helper class for calculations on decoy proteins.


Class Documentation

◆ OpenMS::DecoyHelper::Result

struct OpenMS::DecoyHelper::Result
Class Members
bool    is_prefix   On success: true if the decoy string was found as a prefix, false if as a suffix.
String  name        On success: the decoy string that was found.
bool    success     True if more than 40% of the proteins have the *same* prefix or suffix.

Member Typedef Documentation

◆ CaseInsensitiveToCaseSensitiveDecoy

using CaseInsensitiveToCaseSensitiveDecoy = std::unordered_map<std::string, std::string>
private

◆ DecoyStringToAffixCount

using DecoyStringToAffixCount = std::unordered_map<std::string, std::pair<Size, Size> >
private

Member Function Documentation

◆ findDecoyString()

static Result findDecoyString ( FASTAContainer< T > &  proteins)
inline static

Heuristic to determine the decoy string given a set of protein names.

Tested decoy strings are "decoy", "dec", "reverse", "rev", "reversed", "__id_decoy", "xxx", "shuffled", "shuffle", "pseudo" and "random" (see DecoyHelper::affixes). Each candidate is tested both as a prefix and as a suffix of the protein names. If one candidate is found in at least 40% of all proteins, it is returned as the winner (see DecoyHelper::Result).
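The counting idea behind the heuristic can be illustrated with a minimal, self-contained sketch. This is not the OpenMS implementation: DemoResult and demoFindDecoyString are hypothetical names, the affix list is shortened to three entries, the input is a plain vector of identifiers rather than a FASTAContainer, and the only acceptance rule is the 40% threshold described above. The actual function may apply further checks.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical result type for this sketch (mirrors the documented fields of
// DecoyHelper::Result, but is not the OpenMS struct).
struct DemoResult
{
  bool success;      // did one affix reach the 40% threshold?
  std::string name;  // the winning decoy string as found in the identifiers
  bool is_prefix;    // true: prefix, false: suffix
};

// Simplified stand-in for the heuristic: count prefix and suffix hits per
// matched string and accept the most frequent one if it occurs in at least
// 40% of all identifiers.
DemoResult demoFindDecoyString(const std::vector<std::string>& ids)
{
  // Shortened, assumed patterns; the real ones are built from DecoyHelper::affixes.
  static const std::regex prefix_re("^(decoy_*|rev_*|xxx_*)", std::regex::icase);
  static const std::regex suffix_re("(_decoy|_rev|_xxx)$", std::regex::icase);

  std::unordered_map<std::string, std::size_t> prefix_hits, suffix_hits;
  for (const std::string& id : ids)
  {
    std::smatch m;
    if (std::regex_search(id, m, prefix_re)) ++prefix_hits[m.str(1)];
    if (std::regex_search(id, m, suffix_re)) ++suffix_hits[m.str(1)];
  }

  DemoResult best{false, "", true};
  std::size_t best_count = 0;
  for (const auto& kv : prefix_hits)
  {
    if (kv.second > best_count) { best_count = kv.second; best.name = kv.first; best.is_prefix = true; }
  }
  for (const auto& kv : suffix_hits)
  {
    if (kv.second > best_count) { best_count = kv.second; best.name = kv.first; best.is_prefix = false; }
  }

  // Accept only if the winner covers at least 40% of all identifiers.
  best.success = best_count > 0 &&
                 static_cast<double>(best_count) >= 0.4 * static_cast<double>(ids.size());
  if (!best.success) { best.name.clear(); best.is_prefix = true; }
  return best;
}
```

With this sketch, a list such as "DECOY_P1", "DECOY_P2", "P1", "P2" yields the prefix "DECOY_", because two of the four identifiers (50%) carry it. A list in which only one of three identifiers ends in "_rev" stays below the threshold and is reported as unsuccessful.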

References OPENMS_LOG_DEBUG, OPENMS_LOG_ERROR, OPENMS_LOG_WARN, OpenMS::StringUtils::prefix(), DecoyHelper::regexstr_prefix, DecoyHelper::regexstr_suffix, OpenMS::StringUtils::suffix(), and String::toLower().

Member Data Documentation

◆ affixes

const std::vector<std::string> affixes = { "decoy", "dec", "reverse", "rev", "reversed", "__id_decoy", "xxx", "shuffled", "shuffle", "pseudo", "random" }
inline static

◆ regexstr_prefix

const std::string regexstr_prefix = std::string("^(") + ListUtils::concatenate<std::string>(affixes, "_*|") + "_*)"
inline static

◆ regexstr_suffix

const std::string regexstr_suffix = std::string("(_") + ListUtils::concatenate<std::string>(affixes, "*|_") + ")$"
inline static
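To see what the two regular-expression strings expand to, the following sketch rebuilds them for a shortened affix list. It assumes that ListUtils::concatenate joins the list elements with the given glue string. The join helper, demo_affixes, demoPrefixRegex and demoSuffixRegex are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for ListUtils::concatenate: joins items with a glue string.
std::string join(const std::vector<std::string>& items, const std::string& glue)
{
  std::string out;
  for (std::size_t i = 0; i < items.size(); ++i)
  {
    if (i > 0) out += glue;
    out += items[i];
  }
  return out;
}

// Shortened affix list for readability (the documented list has eleven entries).
const std::vector<std::string> demo_affixes = {"decoy", "rev", "xxx"};

// Same construction as the documented regexstr_prefix.
std::string demoPrefixRegex()
{
  return std::string("^(") + join(demo_affixes, "_*|") + "_*)";
}

// Same construction as the documented regexstr_suffix.
std::string demoSuffixRegex()
{
  return std::string("(_") + join(demo_affixes, "*|_") + ")$";
}
```

Under that assumption the prefix pattern becomes `^(decoy_*|rev_*|xxx_*)`, so every affix may be followed by any number of underscores. The suffix pattern becomes `(_decoy*|_rev*|_xxx)$`: each affix must be preceded by one underscore, and the `*` binds to the last letter of every affix except the final one in the list.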