OpenMS

Create and apply models of a mass recalibration function.
#include <OpenMS/PROCESSING/CALIBRATION/MZTrafoModel.h>
Classes  
struct  RTLess 
Comparator by position. As this class has dimension 1, this is basically an alias for MZLess.
Public Types  
enum  MODELTYPE { LINEAR , LINEAR_WEIGHTED , QUADRATIC , QUADRATIC_WEIGHTED , SIZE_OF_MODELTYPE } 
Public Member Functions  
MZTrafoModel ()  
Default constructor.
MZTrafoModel (bool ppm_model)  
Constructor that sets whether the model operates on ppm or absolute values.
bool  isTrained () const 
Does the model have coefficients (i.e. was trained successfully).
double  getRT () const 
Get RT associated with the model (training region)
double  predict (double mz) const 
Apply the model to an uncalibrated m/z value.
bool  train (const CalibrationData &cd, MODELTYPE md, bool use_RANSAC, double rt_left=std::numeric_limits< double >::max(), double rt_right=std::numeric_limits< double >::max()) 
Train a model using calibrant data.
bool  train (std::vector< double > error_mz, std::vector< double > theo_mz, std::vector< double > weights, MODELTYPE md, bool use_RANSAC) 
Train a model using calibrant data.
void  getCoefficients (double &intercept, double &slope, double &power) 
Get model coefficients.
void  setCoefficients (const MZTrafoModel &rhs) 
Copy model coefficients from another model.
void  setCoefficients (double intercept, double slope, double power) 
Manually set model coefficients.
String  toString () const 
String representation of the model parameters.
Static Public Member Functions  
static MODELTYPE  nameToEnum (const std::string &name) 
Convert string to enum.
static const std::string &  enumToName (MODELTYPE mt) 
Convert enum to string.
static void  setRANSACParams (const Math::RANSACParam &p) 
Set the global (program wide) parameters for RANSAC.
static void  setRANSACSeed (int seed) 
Set RANSAC seed.
static void  setCoefficientLimits (double offset, double scale, double power) 
Set the boundaries that the model coefficients must not exceed for the model to be considered valid.
static bool  isValidModel (const MZTrafoModel &trafo) 
Predicate to decide if the model has valid parameters, i.e. coefficients.
static Size  findNearest (const std::vector< MZTrafoModel > &tms, double rt) 
Binary search for the model nearest to a specific RT.
Static Public Attributes  
static const std::string  names_of_modeltype [] 
strings corresponding to enum MODELTYPE
Private Attributes  
std::vector< double >  coeff_ 
Model coefficients (for both linear and quadratic models), estimated from the data.
bool  use_ppm_ 
during training, the model is built on absolute or relative (ppm) predictions. predict(), i.e. applying the model, requires this information too
double  rt_ 
retention time associated with the model (i.e. where the calibrant data was taken from)
Static Private Attributes  
static Math::RANSACParam *  ransac_params_ 
global pointer, initialized to NULL at startup; holds the class-wide RANSAC params
static int  ransac_seed_ 
seed used for all RANSAC invocations
static double  limit_offset_ 
acceptable boundary for the estimated offset; if estimated offset is larger (absolute) the model does not validate (isValidModel())
static double  limit_scale_ 
acceptable boundary for the estimated scale; if estimated scale is larger (absolute) the model does not validate (isValidModel())
static double  limit_power_ 
acceptable boundary for the estimated power; if estimated power is larger (absolute) the model does not validate (isValidModel())
Create and apply models of a mass recalibration function.
The input is a list of calibration points (ideally spanning a wide m/z range to prevent extrapolation when applying the model).
Models (LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED) can be trained using CalibrationData points (or a subset of them). Calibration points can have different retention times, and a model should be built such that it captures the local (in time) decalibration of the instrument, i.e. choose appropriate time windows along RT to calibrate the spectra in each RT region. A model is built from the available calibrant data. Afterwards, any uncalibrated m/z value can be fed to the model to obtain a calibrated m/z.
The input domain can either be absolute mass differences in [Th] or relative differences in [ppm]. The models are built based on this input.
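To make the two input domains concrete, here is a minimal sketch of how an absolute error in Th and a relative error in ppm relate to each other. The helper names and the sign convention (observed minus theoretical) are assumptions made for illustration; they are not taken from the OpenMS API.

```cpp
// Hypothetical helpers, not part of the OpenMS API documented on this page.
// ASSUMPTION: errors are computed as observed minus theoretical m/z.

// Absolute mass error in Th.
double errorInTh(double observed_mz, double theoretical_mz)
{
  return observed_mz - theoretical_mz;
}

// Relative mass error in ppm: absolute error scaled by the theoretical m/z.
double errorInPpm(double observed_mz, double theoretical_mz)
{
  return (observed_mz - theoretical_mz) / theoretical_mz * 1e6;
}
```

For a calibrant at a theoretical m/z of 500, an absolute error of 0.005 Th corresponds to roughly 10 ppm. The same absolute error at m/z 1000 is only about 5 ppm, which is why the choice of domain matters for the fitted model.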
Outlier detection before model building via the RANSAC algorithm is supported for LINEAR and QUADRATIC models.
enum MODELTYPE 
MZTrafoModel  (  ) 
Default constructor.
MZTrafoModel  (  bool  ppm_model  ) 
Constructor with an explicit ppm setting.
If you have external coefficients, use this constructor and the setCoefficients() method to build a 'manual' model. Afterwards, use applyTransformation() or predict() to calibrate your data. If you call train(), the ppm setting will be overwritten, depending on the type of training data.
ppm_model  Are the coefficients derived from ppm calibration data, or from absolute deltas? 

static const std::string & enumToName  (  MODELTYPE  mt  ) 
static 
Convert enum to string.
mt  The enum value 

static Size findNearest  (  const std::vector< MZTrafoModel > &  tms, 
double  rt  
) 
static 
Binary search for the model nearest to a specific RT.
tms  Vector of models, sorted by RT 
rt  The target retention time 
Exception::Precondition  is thrown if the vector is empty (not only in debug mode) 
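The lookup described above can be sketched on a plain vector of retention times. This is an illustrative stand-in, not the OpenMS implementation: the function name, the use of `std::invalid_argument` in place of `Exception::Precondition`, and the tie-breaking rule (the earlier RT wins) are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in: returns the index of the RT closest to 'rt'.
// 'sorted_rts' must be sorted in ascending order.
std::size_t findNearestRT(const std::vector<double>& sorted_rts, double rt)
{
  if (sorted_rts.empty())
  {
    throw std::invalid_argument("findNearestRT: input must not be empty");
  }
  // Binary search: first element that is not less than 'rt'.
  auto it = std::lower_bound(sorted_rts.begin(), sorted_rts.end(), rt);
  if (it == sorted_rts.begin()) return 0;
  if (it == sorted_rts.end()) return sorted_rts.size() - 1;
  // 'rt' lies between two entries: pick the closer neighbour.
  // ASSUMPTION: on a tie, the earlier entry is returned.
  auto prev = it - 1;
  auto nearest = (rt - *prev <= *it - rt) ? prev : it;
  return static_cast<std::size_t>(nearest - sorted_rts.begin());
}
```

With models trained at RT 10, 20 and 30, a spectrum at RT 24 would map to the model at index 1, and a spectrum at RT 100 to the last model.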
void getCoefficients  (  double &  intercept, 
double &  slope,  
double &  power  
) 
Get model coefficients.
The arguments are filled with the internal model coefficients. The model must be trained beforehand; otherwise an Exception::Precondition is thrown.
intercept  The intercept 
slope  The slope 
power  The coefficient for x*x (will be 0 for linear models) 
Exception::Precondition  if model is not trained yet 
double getRT  (  )  const 
Get RT associated with the model (training region)
bool isTrained  (  )  const 
Does the model have coefficients (i.e. was trained successfully).
Having coefficients does not mean the model is valid (see isValidModel()), since the coefficients might be too large.

static bool isValidModel  (  const MZTrafoModel &  trafo  ) 
static 
Predicate to decide if the model has valid parameters, i.e. coefficients.
If the model coefficients are empty, no model was trained yet (or training was unsuccessful), causing a return value of 'false'.
If the model has coefficients, they are additionally checked against the acceptable boundaries (if boundaries were given via setCoefficientLimits()).
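The validity rule can be sketched as follows. The struct and function names are hypothetical, and treating a coefficient exactly equal to its limit as valid is an assumption; only the overall logic (empty coefficients are invalid, otherwise compare absolute values against the limits) follows the description above.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical container for the three limits; max() means "no limit".
struct CoefficientLimits
{
  double offset = std::numeric_limits<double>::max();
  double scale = std::numeric_limits<double>::max();
  double power = std::numeric_limits<double>::max();
};

// 'coeff' is expected to hold {intercept, slope, power}, or to be empty
// if the model was never trained.
bool isValidSketch(const std::vector<double>& coeff, const CoefficientLimits& lim)
{
  if (coeff.size() != 3) return false; // untrained or malformed model
  // Compare in absolute terms; negative limits are treated like positive ones.
  // ASSUMPTION: a coefficient equal to its limit still counts as valid.
  return std::fabs(coeff[0]) <= std::fabs(lim.offset)
      && std::fabs(coeff[1]) <= std::fabs(lim.scale)
      && std::fabs(coeff[2]) <= std::fabs(lim.power);
}
```

A predicate of this shape can be used to drop failed or implausible models before calibration, for example with `std::remove_if` over a vector of models.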

static MODELTYPE nameToEnum  (  const std::string &  name  ) 
static 
Convert string to enum.
Returns 'SIZE_OF_MODELTYPE' if string is unknown.
name  A string from names_of_modeltype[]. 
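A lookup of this kind can be sketched as a linear scan over the name table. The enum below mirrors the MODELTYPE enumerators listed on this page, but the string spellings in the table are an assumption for illustration, since the contents of names_of_modeltype[] are not shown here.

```cpp
#include <cstddef>
#include <string>

// Mirrors the MODELTYPE enumerators listed on this page.
enum ModelTypeSketch { LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED, SIZE_OF_MODELTYPE };

// ASSUMPTION: illustrative spellings; the real names_of_modeltype[] may differ.
const std::string kNamesSketch[] = {"linear", "linear_weighted", "quadratic", "quadratic_weighted"};

// Returns the matching enumerator, or SIZE_OF_MODELTYPE for unknown names.
ModelTypeSketch nameToEnumSketch(const std::string& name)
{
  const std::size_t count = static_cast<std::size_t>(SIZE_OF_MODELTYPE);
  for (std::size_t i = 0; i < count; ++i)
  {
    if (kNamesSketch[i] == name)
    {
      return static_cast<ModelTypeSketch>(i);
    }
  }
  return SIZE_OF_MODELTYPE; // sentinel for "unknown"
}
```

Callers should compare the result against SIZE_OF_MODELTYPE before passing it on, for example to train().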
double predict  (  double  mz  )  const 
Apply the model to an uncalibrated m/z value.
Make sure the model was trained (train()) and is valid (isValidModel()) before calling this function!
Applies the function y = intercept + slope*mz + power*mz^2 and returns y.
mz  The uncalibrated m/z value 
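The documented function y = intercept + slope*mz + power*mz^2 can be evaluated as in the sketch below. The free function and its name are hypothetical; it only illustrates the polynomial stated above and does not cover how the ppm setting of the model is taken into account.

```cpp
// Hypothetical free function evaluating the documented polynomial
// y = intercept + slope*mz + power*mz^2 using Horner's scheme.
double evaluateModel(double intercept, double slope, double power, double mz)
{
  return intercept + mz * (slope + mz * power);
}
```

Setting power to zero reduces this to a linear model, and setting slope to zero as well yields a constant offset.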

static void setCoefficientLimits  (  double  offset, 
double  scale,  
double  power  
) 
static 
Set the boundaries that the model coefficients must not exceed for the model to be considered valid.
Use std::numeric_limits<double>::max() for no limit (default). These limits are checked when isValidModel() is called. Negative inputs are passed through fabs() to obtain positive values (since the comparison is done in absolute terms).
void setCoefficients  (  const MZTrafoModel &  rhs  ) 
Copy model coefficients from another model.
void setCoefficients  (  double  intercept, 
double  slope,  
double  power  
) 
Manually set model coefficients.
Can be used instead of train() to set the coefficients manually. Exactly three values must be given. For a linear model, set 'power' to zero. For a constant model, additionally set 'slope' to zero.
intercept  The offset 
slope  The slope 
power  The x*x coefficient (for quadratic models) 

static void setRANSACParams  (  const Math::RANSACParam &  p  ) 
static 
Set the global (program wide) parameters for RANSAC.
This is not done via a member, to keep the memory footprint small, since hundreds of MZTrafoModels are expected to be built at the same time and the RANSAC params should be identical for all of them.
p  RANSAC params 

static void setRANSACSeed  (  int  seed  ) 
static 
Set RANSAC seed.
String toString  (  )  const 
String representation of the model parameters.
Empty if model is not trained.
bool train  (  const CalibrationData &  cd, 
MODELTYPE  md,  
bool  use_RANSAC,  
double  rt_left = std::numeric_limits< double >::max() , 
double  rt_right = std::numeric_limits< double >::max() 
) 
Train a model using calibrant data.
If the CalibrationData was created using peak groups (usually corresponding to mass traces), the median for each group is used as a group representative. This is more robust, and reduces the number of data points drastically, i.e. one value per group.
Internally, these steps take place:
cd  List of calibrants 
md  Type of model (linear, quadratic, ...) 
use_RANSAC  Remove outliers before computing the model? 
rt_left  Filter 'cd' by RT; all calibrants with RT < 'rt_left' are removed 
rt_right  Filter 'cd' by RT; all calibrants with RT > 'rt_right' are removed 
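The two preprocessing ideas described above, restricting calibrants to an RT window and representing each peak group by its median, can be sketched as follows. The record type and function names are hypothetical, since the layout of CalibrationData is not shown on this page, and averaging the two middle values for even-sized groups is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical calibrant record; not the real CalibrationData layout.
struct CalibrantSketch
{
  double rt;
  double mz_error;
};

// Keep only calibrants with rt_left <= rt <= rt_right.
std::vector<CalibrantSketch> filterByRT(const std::vector<CalibrantSketch>& in,
                                        double rt_left, double rt_right)
{
  std::vector<CalibrantSketch> out;
  for (const CalibrantSketch& c : in)
  {
    if (c.rt >= rt_left && c.rt <= rt_right) out.push_back(c);
  }
  return out;
}

// Median of a non-empty list. The argument is taken by value on purpose,
// because std::nth_element reorders the elements.
double medianSketch(std::vector<double> values)
{
  if (values.empty()) throw std::invalid_argument("medianSketch: empty input");
  const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(values.size() / 2);
  std::nth_element(values.begin(), values.begin() + mid, values.end());
  const double upper = values[static_cast<std::size_t>(mid)];
  if (values.size() % 2 == 1) return upper;
  // ASSUMPTION: for an even count, average the two middle values.
  const double lower = *std::max_element(values.begin(), values.begin() + mid);
  return 0.5 * (lower + upper);
}
```

Using the median rather than the mean keeps a single badly measured peak in a mass trace from shifting the group representative.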
bool train  (  std::vector< double >  error_mz, 
std::vector< double >  theo_mz,  
std::vector< double >  weights,  
MODELTYPE  md,  
bool  use_RANSAC  
) 
Train a model using calibrant data.
Given theoretical and observed mass values (and corresponding weights), a model (linear, quadratic, ...) is built. Outlier removal (if requested via 'use_RANSAC') is applied beforehand. The 'error_mz' values can be given either as absolute mass errors in [Th] or as relative deviations in [ppm]. The MZTrafoModel must be constructed accordingly (see constructor). This has no influence on the model building itself, but rather on how 'predict()' works internally.
Outlier detection before model building via the RANSAC algorithm is supported for LINEAR and QUADRATIC models.
Internally, these steps take place:
error_mz  Observed Mass error (in ppm or Th) 
theo_mz  Theoretical m/z values, corresponding to 'error_mz' 
weights  For weighted models only: weight of calibrants; ignored otherwise 
md  Type of model (linear, quadratic, ...) 
use_RANSAC  Remove outliers before computing the model? 
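For intuition about the LINEAR_WEIGHTED case, the following sketch fits error = intercept + slope * theo_mz by weighted least squares using the closed-form normal equations. It is a stand-alone illustration under stated assumptions, not the OpenMS implementation: the struct, the function name, and the error handling are hypothetical, and no outlier removal is performed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical result type for the sketch.
struct LinearFit
{
  double intercept;
  double slope;
};

// Weighted least-squares fit of y = intercept + slope * x.
// Here x plays the role of 'theo_mz', y of 'error_mz', w of 'weights'.
LinearFit fitWeightedLine(const std::vector<double>& x,
                          const std::vector<double>& y,
                          const std::vector<double>& w)
{
  if (x.size() != y.size() || x.size() != w.size() || x.size() < 2)
  {
    throw std::invalid_argument("fitWeightedLine: need >= 2 points of equal length");
  }
  double sw = 0.0, swx = 0.0, swy = 0.0, swxx = 0.0, swxy = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    sw += w[i];
    swx += w[i] * x[i];
    swy += w[i] * y[i];
    swxx += w[i] * x[i] * x[i];
    swxy += w[i] * x[i] * y[i];
  }
  const double denom = sw * swxx - swx * swx;
  if (denom == 0.0)
  {
    throw std::invalid_argument("fitWeightedLine: degenerate input");
  }
  const double slope = (sw * swxy - swx * swy) / denom;
  const double intercept = (swy - slope * swx) / sw;
  return LinearFit{intercept, slope};
}
```

Passing equal weights gives an ordinary least-squares line. A weight of zero removes a calibrant from the fit entirely, which shows how weights can suppress unreliable points.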

std::vector< double > coeff_ 
private 
Model coefficients (for both linear and quadratic models), estimated from the data.

static double limit_offset_ 
static private 
acceptable boundary for the estimated offset; if estimated offset is larger (absolute) the model does not validate (isValidModel())

static double limit_power_ 
static private 
acceptable boundary for the estimated power; if estimated power is larger (absolute) the model does not validate (isValidModel())

static double limit_scale_ 
static private 
acceptable boundary for the estimated scale; if estimated scale is larger (absolute) the model does not validate (isValidModel())

static const std::string names_of_modeltype [] 
static 
strings corresponding to enum MODELTYPE

static Math::RANSACParam * ransac_params_ 
static private 
global pointer, initialized to NULL at startup; holds the class-wide RANSAC params

static int ransac_seed_ 
static private 
seed used for all RANSAC invocations

double rt_ 
private 
retention time associated with the model (i.e. where the calibrant data was taken from)
Referenced by MZTrafoModel::RTLess::operator()().

bool use_ppm_ 
private 
during training, the model is built on absolute or relative (ppm) predictions. predict(), i.e. applying the model, requires this information too