OpenMS
MRMRTNormalizer Class Reference

The MRMRTNormalizer will find retention time peptides in data.

#include <OpenMS/ANALYSIS/OPENSWATH/MRMRTNormalizer.h>

Static Public Member Functions

static std::vector< std::pair< double, double > > removeOutliersRANSAC (const std::vector< std::pair< double, double > > &pairs, double rsq_limit, double coverage_limit, size_t max_iterations, double max_rt_threshold, size_t sampling_size)
 This function removes potential outliers in a linear regression dataset.
 
static std::vector< std::pair< double, double > > removeOutliersIterative (const std::vector< std::pair< double, double > > &pairs, double rsq_limit, double coverage_limit, bool use_chauvenet, const std::string &method)
 This function removes potential outliers in a linear regression dataset.
 
static double chauvenet_probability (const std::vector< double > &residuals, int pos)
 This function computes the Chauvenet's criterion probability for the value at position pos of the residuals vector.
 
static bool chauvenet (const std::vector< double > &residuals, int pos)
 This function applies Chauvenet's criterion to the value at position pos of the residuals vector.
 
static bool computeBinnedCoverage (const std::pair< double, double > &rtRange, const std::vector< std::pair< double, double > > &pairs, int nrBins, int minPeptidesPerBin, int minBinsFilled)
 Computes coverage of the RT normalization peptides over the whole RT range, ensuring that each bin has enough peptides.
 

Static Protected Member Functions

static int jackknifeOutlierCandidate_ (const std::vector< double > &x, const std::vector< double > &y)
 This function computes a candidate outlier peptide by iteratively leaving one peptide out to find the one which results in the maximum R^2 of a first order linear regression of the remaining ones. The data points are submitted as two vectors of doubles (x- and y-coordinates).
 
static int residualOutlierCandidate_ (const std::vector< double > &x, const std::vector< double > &y)
 This function computes a candidate outlier peptide by computing the residuals of all points to the linear fit and selecting the one with the largest deviation. The data points are submitted as two vectors of doubles (x- and y-coordinates).
 

Detailed Description

The MRMRTNormalizer will find retention time peptides in data.

This tool takes a description of RT peptides and their normalized retention times and writes out a transformation file that describes how to transform the RT space into the normalized space.

The principle is adapted from the following publication: Escher, C. et al. (2012), Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics, 12: 1111-1121.
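
To illustrate the kind of transformation described above, the following minimal sketch fits a first-order mapping from experimental RT to normalized RT by ordinary least squares. It is not the OpenMS implementation: LinearMap and fitLinearMap are hypothetical names introduced here, and the purely linear model is an assumption chosen to match the first-order regression that the outlier functions of this class refer to.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of OpenMS): a first-order RT transformation.
struct LinearMap
{
  double slope;
  double intercept;
  // Transform an experimental RT into the normalized space.
  double apply(double rt) const { return slope * rt + intercept; }
};

// Ordinary least-squares fit of normalized_rt = slope * experimental_rt + intercept.
// Each pair is <experimental_rt, normalized_rt>.
LinearMap fitLinearMap(const std::vector<std::pair<double, double>>& pairs)
{
  assert(pairs.size() >= 2);
  const double n = static_cast<double>(pairs.size());
  double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
  for (const auto& p : pairs)
  {
    sx += p.first;
    sy += p.second;
    sxx += p.first * p.first;
    sxy += p.first * p.second;
  }
  const double denom = n * sxx - sx * sx;
  assert(denom != 0.0); // identical experimental RTs define no unique line
  const double slope = (n * sxy - sx * sy) / denom;
  return {slope, (sy - slope * sx) / n};
}
```

With three peptides observed at 600, 1200 and 1800 seconds and assigned normalized values 0, 50 and 100, the fitted map sends an experimental RT of 900 seconds to a normalized value of 25.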

Member Function Documentation

◆ chauvenet()

static bool chauvenet (const std::vector< double > &residuals, int pos)
static

This function applies Chauvenet's criterion to the value at position pos of the residuals vector.

Returns
TRUE, if Chauvenet's criterion is fulfilled and the outlier can be removed.

◆ chauvenet_probability()

static double chauvenet_probability (const std::vector< double > &residuals, int pos)
static

This function computes the Chauvenet's criterion probability for the value at position pos of the residuals vector.

Returns
Chauvenet's criterion probability
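
The relationship between the probability and the criterion can be sketched as follows. This is the textbook form of Chauvenet's criterion, written as a standalone example rather than copied from OpenMS; the function names, the use of std::size_t for the position, and the choice of the population standard deviation are assumptions made here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-sided tail probability of residuals[pos] under a normal distribution
// whose mean and population standard deviation are estimated from residuals.
double chauvenetProbabilitySketch(const std::vector<double>& residuals, std::size_t pos)
{
  assert(pos < residuals.size());
  const double n = static_cast<double>(residuals.size());
  double mean = 0.0;
  for (double r : residuals) mean += r / n;
  double variance = 0.0;
  for (double r : residuals) variance += (r - mean) * (r - mean) / n;
  // Number of standard deviations between the value and the mean.
  const double z = std::fabs(residuals[pos] - mean) / std::sqrt(variance);
  // P(|Z| >= z) for a standard normal variable Z.
  return std::erfc(z / std::sqrt(2.0));
}

// Chauvenet's criterion: the value counts as an outlier if fewer than half
// a point out of n would be expected this far out, i.e. P < 1 / (2 n).
bool chauvenetSketch(const std::vector<double>& residuals, std::size_t pos)
{
  const double criterion = 1.0 / (2.0 * static_cast<double>(residuals.size()));
  return chauvenetProbabilitySketch(residuals, pos) < criterion;
}
```

If all residuals are identical, the standard deviation is zero and the probability is not a number; in this sketch the comparison then evaluates to false, so no point is flagged.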

◆ computeBinnedCoverage()

static bool computeBinnedCoverage (const std::pair< double, double > &rtRange, const std::vector< std::pair< double, double > > &pairs, int nrBins, int minPeptidesPerBin, int minBinsFilled)
static

Computes coverage of the RT normalization peptides over the whole RT range, ensuring that each bin has enough peptides.

Parameters
rtRange            The (estimated) full RT range in iRT space (theoretical RT)
pairs              The RT normalization peptide pairs (pair = experimental RT / theoretical RT)
nrBins             The number of bins to be used
minPeptidesPerBin  The minimum number of peptides a bin must contain to be considered full
minBinsFilled      The minimum number of bins that need to be full
Returns
Whether more than the minimal number of bins are covered
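
The binning idea can be sketched as below. This is a standalone illustration, not the OpenMS source: the function name is hypothetical, values outside rtRange are clamped to the outer bins by assumption, and the sketch treats the requirement as met when at least minBinsFilled bins are full, which is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical standalone version of a binned coverage check.
bool binnedCoverageSketch(const std::pair<double, double>& rt_range,
                          const std::vector<std::pair<double, double>>& pairs,
                          int nr_bins, int min_peptides_per_bin, int min_bins_filled)
{
  assert(nr_bins > 0 && rt_range.second > rt_range.first);
  std::vector<int> counts(static_cast<std::size_t>(nr_bins), 0);
  const double width = rt_range.second - rt_range.first;
  for (const auto& p : pairs)
  {
    // Scale the theoretical RT (second element) to the interval [0, nr_bins].
    double scaled = (p.second - rt_range.first) / width * nr_bins;
    // Assumption: values on or beyond the range edges go to the outer bins.
    if (scaled < 0.0) scaled = 0.0;
    if (scaled > nr_bins - 1) scaled = nr_bins - 1;
    ++counts[static_cast<std::size_t>(scaled)];
  }
  // Count the bins that hold enough peptides to be considered full.
  int filled = 0;
  for (int c : counts)
  {
    if (c >= min_peptides_per_bin) ++filled;
  }
  return filled >= min_bins_filled;
}
```

For a range of 0 to 100 split into four bins, peptides at theoretical RTs 5, 15, 55, 95 and 100 fill the first, third and fourth bins, so a requirement of three bins with one peptide each is met while a requirement of four is not.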

◆ jackknifeOutlierCandidate_()

static int jackknifeOutlierCandidate_ (const std::vector< double > &x, const std::vector< double > &y)
static protected

This function computes a candidate outlier peptide by iteratively leaving one peptide out to find the one which results in the maximum R^2 of a first order linear regression of the remaining ones. The data points are submitted as two vectors of doubles (x- and y-coordinates).

Returns
The position of the candidate outlier peptide within the supplied vectors.
Exceptions
Exception::UnableToFit  is thrown if fitting cannot be performed
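
A leave-one-out search of this kind might look like the sketch below. It is written independently of OpenMS: both function names are hypothetical, R^2 is computed as the squared Pearson correlation (which equals the R^2 of a first-order least-squares fit), and degenerate fits are skipped here rather than reported through an exception.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// R^2 of a first-order least-squares fit over all points except index skip,
// computed as the squared Pearson correlation of x and y.
double rsqLeavingOut(const std::vector<double>& x, const std::vector<double>& y,
                     std::size_t skip)
{
  double n = 0.0, sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    if (i == skip) continue;
    n += 1.0;
    sx += x[i];
    sy += y[i];
    sxx += x[i] * x[i];
    syy += y[i] * y[i];
    sxy += x[i] * y[i];
  }
  const double cov = n * sxy - sx * sy;
  const double var_x = n * sxx - sx * sx;
  const double var_y = n * syy - sy * sy;
  return (cov * cov) / (var_x * var_y);
}

// Index whose removal yields the highest R^2 for the remaining points.
std::size_t jackknifeCandidateSketch(const std::vector<double>& x,
                                     const std::vector<double>& y)
{
  // At least four points, so every leave-one-out fit uses three or more.
  assert(x.size() == y.size() && x.size() >= 4);
  std::size_t best = 0;
  double best_rsq = -1.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    const double rsq = rsqLeavingOut(x, y, i);
    // A NaN from a degenerate fit never compares greater and is skipped.
    if (rsq > best_rsq)
    {
      best_rsq = rsq;
      best = i;
    }
  }
  return best;
}
```

The search costs one regression per point, so it is quadratic in the number of peptides; for the small peptide sets typical of RT normalization this is usually not a concern.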

◆ removeOutliersIterative()

static std::vector< std::pair< double, double > > removeOutliersIterative (const std::vector< std::pair< double, double > > &pairs, double rsq_limit, double coverage_limit, bool use_chauvenet, const std::string &method)
static

This function removes potential outliers in a linear regression dataset.

Two thresholds need to be defined: first, a lower R^2 limit to accept the regression for the RT normalization, and second, the lower limit of peptide coverage. The algorithm then selects candidate outlier peptides and applies Chauvenet's criterion, on the assumption that the residuals are normally distributed, to determine whether the peptides can be removed. This is done iteratively until the R^2 limit is met or the coverage limit is reached.

Parameters
pairs           Input data (paired data of type <experimental_rt, theoretical_rt>)
rsq_limit       Minimal R^2 required
coverage_limit  Minimal coverage required (if the number of points falls below this fraction, the algorithm aborts)
use_chauvenet   Whether to only remove outliers that fulfill Chauvenet's criterion for outliers (otherwise it will remove any outlier candidate regardless of the criterion)
method          Outlier detection method ("iter_jackknife" or "iter_residual")
Returns
The remaining vector of pairs, if the R^2 limit was reached before the number of points fell below the coverage limit. Otherwise, an exception is thrown.
Exceptions
Exception::UnableToFit  is thrown if fitting cannot be performed (rsq_limit and coverage_limit cannot both be fulfilled)
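
The iterative loop described above can be sketched as follows. This is a simplified standalone example, not the OpenMS implementation: all names are hypothetical, only the residual-based candidate selection is shown (the method parameter is omitted), and std::runtime_error stands in for Exception::UnableToFit.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<double, double>>;

struct Fit { double slope, intercept, rsq; };

// First-order least-squares fit of second = intercept + slope * first.
Fit fitLine(const Pairs& p)
{
  const double n = static_cast<double>(p.size());
  double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
  for (const auto& q : p)
  {
    sx += q.first; sy += q.second;
    sxx += q.first * q.first; syy += q.second * q.second; sxy += q.first * q.second;
  }
  const double cov = n * sxy - sx * sy;
  const double var_x = n * sxx - sx * sx;
  const double var_y = n * syy - sy * sy;
  const double slope = cov / var_x;
  return {slope, (sy - slope * sx) / n, cov * cov / (var_x * var_y)};
}

// Chauvenet's criterion on the residual at position pos.
bool isChauvenetOutlier(const std::vector<double>& r, std::size_t pos)
{
  const double n = static_cast<double>(r.size());
  double mean = 0.0, var = 0.0;
  for (double v : r) mean += v / n;
  for (double v : r) var += (v - mean) * (v - mean) / n;
  const double z = std::fabs(r[pos] - mean) / std::sqrt(var);
  return std::erfc(z / std::sqrt(2.0)) < 1.0 / (2.0 * n);
}

Pairs removeOutliersIterativeSketch(const Pairs& pairs, double rsq_limit,
                                    double coverage_limit, bool use_chauvenet)
{
  Pairs kept = pairs;
  const double min_size = coverage_limit * static_cast<double>(pairs.size());
  while (kept.size() >= 3 && static_cast<double>(kept.size()) >= min_size)
  {
    const Fit fit = fitLine(kept);
    if (fit.rsq >= rsq_limit) return kept; // both limits are satisfied

    // Candidate outlier: the point with the largest absolute residual.
    std::vector<double> residuals;
    std::size_t cand = 0;
    for (std::size_t i = 0; i < kept.size(); ++i)
    {
      residuals.push_back(kept[i].second - (fit.intercept + fit.slope * kept[i].first));
      if (std::fabs(residuals[i]) > std::fabs(residuals[cand])) cand = i;
    }
    // With use_chauvenet set, stop once the candidate is not a Chauvenet outlier.
    if (use_chauvenet && !isChauvenetOutlier(residuals, cand)) break;
    kept.erase(kept.begin() + static_cast<std::ptrdiff_t>(cand));
  }
  throw std::runtime_error("rsq_limit and coverage_limit cannot both be fulfilled");
}
```

Calling the sketch inside a try block mirrors how a caller of the documented function would need to handle Exception::UnableToFit when the data cannot be cleaned up within the coverage limit.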

◆ removeOutliersRANSAC()

static std::vector< std::pair< double, double > > removeOutliersRANSAC (const std::vector< std::pair< double, double > > &pairs, double rsq_limit, double coverage_limit, size_t max_iterations, double max_rt_threshold, size_t sampling_size)
static

This function removes potential outliers in a linear regression dataset.

Two thresholds need to be defined: first, a lower R^2 limit to accept the regression for the RT normalization, and second, the lower limit of peptide coverage. The algorithm then selects candidate outlier peptides using the RANSAC outlier detection algorithm and returns the corrected set of peptides if the two thresholds are satisfied.

Parameters
pairs             Input data (paired data of type <experimental_rt, theoretical_rt>)
rsq_limit         Minimal R^2 required
coverage_limit    Minimal coverage required (if the number of points falls below this fraction, the algorithm aborts)
max_iterations    Maximum iterations for the RANSAC algorithm
max_rt_threshold  Maximum deviation from fit for the retention time. This must be in the unit of the second dimension (e.g. theoretical_rt).
sampling_size     The number of data points to sample for the RANSAC algorithm.
Returns
The remaining vector of pairs, if the R^2 limit was reached while the number of points stayed at or above the coverage limit. Otherwise, an exception is thrown.
Exceptions
Exception::UnableToFit  is thrown if fitting cannot be performed (rsq_limit and coverage_limit cannot both be fulfilled)
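
For readers unfamiliar with RANSAC, the sketch below shows a generic variant for line fitting that uses the same parameters. It is not the OpenMS RANSAC code and may differ from it in how the best model is chosen: here the largest consensus set wins. All names, the seed parameter, and the use of std::runtime_error in place of Exception::UnableToFit are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<double, double>>;

struct Fit { double slope, intercept, rsq; };

// First-order least-squares fit of second = intercept + slope * first.
Fit fitLine(const Pairs& p)
{
  const double n = static_cast<double>(p.size());
  double sx = 0.0, sy = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0;
  for (const auto& q : p)
  {
    sx += q.first; sy += q.second;
    sxx += q.first * q.first; syy += q.second * q.second; sxy += q.first * q.second;
  }
  const double cov = n * sxy - sx * sy;
  const double var_x = n * sxx - sx * sx;
  const double var_y = n * syy - sy * sy;
  const double slope = cov / var_x;
  return {slope, (sy - slope * sx) / n, cov * cov / (var_x * var_y)};
}

Pairs removeOutliersRansacSketch(const Pairs& pairs, double rsq_limit, double coverage_limit,
                                 std::size_t max_iterations, double max_rt_threshold,
                                 std::size_t sampling_size, unsigned seed = 42)
{
  assert(sampling_size >= 2 && sampling_size <= pairs.size());
  std::mt19937 rng(seed); // fixed seed (assumption) keeps runs reproducible
  Pairs shuffled = pairs;
  Pairs best;
  for (std::size_t it = 0; it < max_iterations; ++it)
  {
    // Draw a random sample and fit a candidate model to it.
    std::shuffle(shuffled.begin(), shuffled.end(), rng);
    const Pairs sample(shuffled.begin(),
                       shuffled.begin() + static_cast<std::ptrdiff_t>(sampling_size));
    const Fit model = fitLine(sample);
    if (!std::isfinite(model.slope)) continue; // degenerate sample

    // Consensus set: all points within max_rt_threshold of the candidate line.
    Pairs inliers;
    for (const auto& q : pairs)
    {
      const double dev = std::fabs(q.second - (model.intercept + model.slope * q.first));
      if (dev <= max_rt_threshold) inliers.push_back(q);
    }
    if (inliers.size() > best.size()) best = inliers;
  }
  const bool covered = static_cast<double>(best.size()) >=
                       coverage_limit * static_cast<double>(pairs.size());
  if (best.size() < 3 || !covered || !(fitLine(best).rsq >= rsq_limit))
  {
    throw std::runtime_error("rsq_limit and coverage_limit cannot both be fulfilled");
  }
  return best;
}
```

Because the deviation is measured along the second element of each pair, max_rt_threshold is expressed in the unit of theoretical_rt, as the parameter description requires.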

◆ residualOutlierCandidate_()

static int residualOutlierCandidate_ (const std::vector< double > &x, const std::vector< double > &y)
static protected

This function computes a candidate outlier peptide by computing the residuals of all points to the linear fit and selecting the one with the largest deviation. The data points are submitted as two vectors of doubles (x- and y-coordinates).

Returns
The position of the candidate outlier peptide within the supplied vectors.
Exceptions
Exception::UnableToFit  is thrown if fitting cannot be performed
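
The residual-based selection can be sketched as a single fit followed by an argmax over absolute residuals. The function name is hypothetical and the code is a standalone illustration rather than the OpenMS source; a degenerate fit is guarded by an assertion here instead of an exception.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the point that deviates most from the least-squares line
// fitted through all points.
std::size_t residualCandidateSketch(const std::vector<double>& x,
                                    const std::vector<double>& y)
{
  assert(x.size() == y.size() && x.size() >= 3);
  const double n = static_cast<double>(x.size());
  double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sx += x[i];
    sy += y[i];
    sxx += x[i] * x[i];
    sxy += x[i] * y[i];
  }
  const double denom = n * sxx - sx * sx;
  assert(denom != 0.0); // identical x values define no unique line
  const double slope = (n * sxy - sx * sy) / denom;
  const double intercept = (sy - slope * sx) / n;

  // Select the largest absolute residual to the fitted line.
  std::size_t worst = 0;
  double worst_dev = -1.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    const double dev = std::fabs(y[i] - (intercept + slope * x[i]));
    if (dev > worst_dev)
    {
      worst_dev = dev;
      worst = i;
    }
  }
  return worst;
}
```

Compared with a leave-one-out search, this needs only one regression, at the cost of letting the outlier itself pull the fitted line toward it before the residuals are computed.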