OpenMS
2.5.0
|
Create and apply models of a mass recalibration function. More...
#include <OpenMS/FILTERING/CALIBRATION/MZTrafoModel.h>
Classes | |
struct | RTLess |
Comparator by position. As this class has dimension 1, this is basically an alias for MZLess. More... | |
Public Types | |
enum | MODELTYPE { LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED, SIZE_OF_MODELTYPE } |
Public Member Functions | |
MZTrafoModel () | |
Default constructor. More... | |
MZTrafoModel (bool ppm_model) | |
Constructor with an explicit ppm setting. More... | |
bool | isTrained () const |
Does the model have coefficients (i.e. was trained successfully). More... | |
double | getRT () const |
Get RT associated with the model (training region) More... | |
double | predict (double mz) const |
Apply the model to an uncalibrated m/z value. More... | |
bool | train (const CalibrationData &cd, MODELTYPE md, bool use_RANSAC, double rt_left=-std::numeric_limits< double >::max(), double rt_right=std::numeric_limits< double >::max()) |
Train a model using calibrant data. More... | |
bool | train (std::vector< double > error_mz, std::vector< double > theo_mz, std::vector< double > weights, MODELTYPE md, bool use_RANSAC) |
Train a model using calibrant data. More... | |
void | getCoefficients (double &intercept, double &slope, double &power) |
Get model coefficients. More... | |
void | setCoefficients (const MZTrafoModel &rhs) |
Copy model coefficients from another model. More... | |
void | setCoefficients (double intercept, double slope, double power) |
Manually set model coefficients. More... | |
String | toString () const |
String representation of the model parameters. More... | |
Static Public Member Functions | |
static MODELTYPE | nameToEnum (const std::string &name) |
Convert string to enum. More... | |
static const std::string & | enumToName (MODELTYPE mt) |
Convert enum to string. More... | |
static void | setRANSACParams (const Math::RANSACParam &p) |
Set the global (program wide) parameters for RANSAC. More... | |
static void | setCoefficientLimits (double offset, double scale, double power) |
Set the coefficient boundaries that the model coefficients must not exceed for the model to be considered valid. More... | |
static bool | isValidModel (const MZTrafoModel &trafo) |
Predicate to decide if the model has valid parameters, i.e. coefficients. More... | |
static Size | findNearest (const std::vector< MZTrafoModel > &tms, double rt) |
Binary search for the model nearest to a specific RT. More... | |
Static Public Attributes | |
static const std::string | names_of_modeltype [] |
strings corresponding to enum MODELTYPE More... | |
Private Attributes | |
std::vector< double > | coeff_ |
Model coefficients (for both linear and quadratic models), estimated from the data. More... | |
bool | use_ppm_ |
During training, the model is built on absolute or relative (ppm) predictions. predict(), i.e. applying the model, requires this information too. More... | |
double | rt_ |
retention time associated with the model (i.e. where the calibrant data was taken from) More... | |
Static Private Attributes | |
static Math::RANSACParam * | ransac_params_ |
global pointer, init to NULL at startup; set class-global RANSAC params More... | |
static double | limit_offset_ |
acceptable boundary for the estimated offset; if estimated offset is larger (absolute) the model does not validate (isValidModel()) More... | |
static double | limit_scale_ |
acceptable boundary for the estimated scale; if estimated scale is larger (absolute) the model does not validate (isValidModel()) More... | |
static double | limit_power_ |
acceptable boundary for the estimated power; if estimated power is larger (absolute) the model does not validate (isValidModel()) More... | |
Create and apply models of a mass recalibration function.
The input is a list of calibration points (ideally spanning a wide m/z range to prevent extrapolation when applying the model).
Models (LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED) can be trained using CalibrationData points (or a subset of them). Calibration points can have different retention times, and a model should be built such that it captures the local (in time) decalibration of the instrument, i.e. choose appropriate time windows along RT to calibrate the spectra in each RT region. A model is built from the available calibrant data. Afterwards, any uncalibrated m/z value can be fed to the model to obtain a calibrated m/z.
The input domain can be either absolute mass differences in [Th] or relative differences in [ppm]. The models are built on this input.
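To make the two input domains concrete, here is a minimal sketch of the conversion between a relative deviation in ppm and an absolute deviation in Th. The helper names `ppmToTh` and `thToPpm` are hypothetical and are not part of the class documented here; only the definition of ppm (one part per million of the m/z value) is assumed.

```cpp
#include <cassert>

// Hypothetical helper (not an OpenMS function): convert a relative
// deviation in ppm into an absolute deviation in Th at the given m/z.
double ppmToTh(double ppm, double mz)
{
  assert(mz > 0.0);
  return ppm * mz * 1e-6; // 1 ppm = 1e-6 of the m/z value
}

// Hypothetical helper: the inverse conversion, Th to ppm.
double thToPpm(double delta_th, double mz)
{
  assert(mz > 0.0);
  return delta_th / mz * 1e6;
}
```

For instance, a deviation of 10 ppm at m/z 500 corresponds to roughly 0.005 Th, which illustrates why the same absolute error means different relative errors across the m/z range.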
Outlier detection before model building via the RANSAC algorithm is supported for LINEAR and QUADRATIC models.
enum MODELTYPE |
MZTrafoModel | ( | ) |
Default constructor.
MZTrafoModel | ( | bool | ppm_model | ) |
Constructor with an explicit ppm setting.
If you have external coefficients, use this constructor and the setCoefficients() method to build a 'manual' model. Afterwards, use applyTransformation() or predict() to calibrate your data. If you call train(), the ppm-setting will be overwritten, depending on the type of training data.
ppm_model | Are the coefficients derived from ppm calibration data, or from absolute deltas? |
const std::string & enumToName | ( | MODELTYPE | mt | ) |
static |
Convert enum to string.
mt | The enum value |
Size findNearest | ( | const std::vector< MZTrafoModel > & | tms, |
double | rt | ||
) |
static |
Binary search for the model nearest to a specific RT.
tms | Vector of models, sorted by RT |
rt | The target retention time |
Exception::Precondition | is thrown if the vector is empty (not only in debug mode) |
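The lookup described above can be sketched with `std::lower_bound` on a vector sorted by RT. This is a standalone illustration, not the OpenMS implementation: `ModelStub` and `findNearestIndex` are hypothetical names, a standard exception stands in for Exception::Precondition, and the tie-breaking rule (prefer the earlier model) is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a model; only the RT matters for the search.
struct ModelStub
{
  double rt;
};

// Return the index of the model whose RT is closest to 'rt'.
// 'models' must be sorted by ascending RT; throws if it is empty.
std::size_t findNearestIndex(const std::vector<ModelStub>& models, double rt)
{
  if (models.empty()) throw std::invalid_argument("no models given");
  // Binary search for the first model with RT >= rt.
  auto it = std::lower_bound(models.begin(), models.end(), rt,
                             [](const ModelStub& m, double value) { return m.rt < value; });
  if (it == models.begin()) return 0;
  if (it == models.end()) return models.size() - 1;
  // Compare the neighbours on both sides of the insertion point;
  // on a tie the earlier model wins (an arbitrary choice in this sketch).
  auto prev = it - 1;
  return (std::fabs(it->rt - rt) < std::fabs(rt - prev->rt))
           ? static_cast<std::size_t>(it - models.begin())
           : static_cast<std::size_t>(prev - models.begin());
}
```

The sortedness requirement is what makes the binary search valid; with an unsorted vector the returned index would be meaningless.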
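void getCoefficients | ( | double & | intercept, |
double & | slope, | ||
double & | power | ||
) |
Get model coefficients.
The parameters will be filled with the internal model coefficients. The model must be trained beforehand; otherwise an Exception::Precondition is thrown.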
intercept | The intercept |
slope | The slope |
power | The coefficient for x*x (will be 0 for linear models) |
Exception::Precondition | if model is not trained yet |
double getRT | ( | ) | const |
Get RT associated with the model (training region)
bool isTrained | ( | ) | const |
Does the model have coefficients (i.e. was trained successfully).
Having coefficients does not mean the model is valid (see isValidModel()), since the coefficients might be too large.
bool isValidModel | ( | const MZTrafoModel & | trafo | ) |
static |
Predicate to decide if the model has valid parameters, i.e. coefficients.
If the model coefficients are empty, no model was trained yet (or training was unsuccessful), causing a return value of 'false'.
Also, if the model has coefficients, we check whether they are within the acceptable boundaries (if boundaries were given via setCoefficientLimits()).
MODELTYPE nameToEnum | ( | const std::string & | name | ) |
static |
Convert string to enum.
Returns 'SIZE_OF_MODELTYPE' if string is unknown.
name | A string from names_of_modeltype[]. |
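The string-to-enum lookup with SIZE_OF_MODELTYPE as the "unknown" sentinel can be sketched as follows. The name table `kNames` is an assumption made for illustration, since the actual contents of names_of_modeltype[] are not listed on this page, and `nameToEnumSketch` is a hypothetical function name.

```cpp
#include <string>

enum MODELTYPE { LINEAR, LINEAR_WEIGHTED, QUADRATIC, QUADRATIC_WEIGHTED, SIZE_OF_MODELTYPE };

// Assumed name table for illustration only; the real strings in
// names_of_modeltype[] may differ.
static const std::string kNames[] = { "linear", "linear_weighted", "quadratic", "quadratic_weighted" };

// Linear scan over the name table; returns the sentinel for unknown names.
MODELTYPE nameToEnumSketch(const std::string& name)
{
  for (int i = 0; i < SIZE_OF_MODELTYPE; ++i)
  {
    if (kNames[i] == name) return static_cast<MODELTYPE>(i);
  }
  return SIZE_OF_MODELTYPE; // unknown name
}
```

Callers would compare the result against SIZE_OF_MODELTYPE before using it as a model type.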
double predict | ( | double | mz | ) | const |
Apply the model to an uncalibrated m/z value.
Make sure the model was trained (train()) and is valid (isValidModel()) before calling this function!
Applies the function y = intercept + slope*mz + power*mz^2 and returns y.
mz | The uncalibrated m/z value |
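The polynomial stated above, y = intercept + slope*mz + power*mz^2, can be evaluated as in the following sketch. `evaluateModel` is a hypothetical free function that only mirrors the formula; it does not reproduce any additional handling the class may perform internally (for example, anything related to the ppm setting).

```cpp
// Evaluate y = intercept + slope*mz + power*mz^2 using Horner's scheme.
// Hypothetical helper mirroring the documented formula only.
double evaluateModel(double intercept, double slope, double power, double mz)
{
  return intercept + mz * (slope + power * mz);
}
```

With power set to zero the expression reduces to a linear model, and with both slope and power set to zero it reduces to a constant offset.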
void setCoefficientLimits | ( | double | offset, |
double | scale, | ||
double | power | ||
) |
static |
Set the coefficient boundaries that the model coefficients must not exceed for the model to be considered valid.
Use std::numeric_limits<double>::max() for no limit (default). These limits are checked when isValidModel() is called. Negative inputs are passed through fabs() to obtain positive values (since the comparison is done in absolute terms).
Referenced by UTILProteomicsLFQ::recalibrateMasses_().
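The interplay between the coefficient limits and the validity check can be sketched as below. This is an illustration under assumptions, not the class's code: `CoefficientLimits`, `makeLimits` and `isWithinLimits` are hypothetical names, and the assumption that a trained model holds exactly three coefficients is taken from the setCoefficients() description.

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical container for the three limits; the default means "no limit".
struct CoefficientLimits
{
  double offset = std::numeric_limits<double>::max();
  double scale = std::numeric_limits<double>::max();
  double power = std::numeric_limits<double>::max();
};

// Store the limits as absolute values, so negative input is accepted.
CoefficientLimits makeLimits(double offset, double scale, double power)
{
  CoefficientLimits lim;
  lim.offset = std::fabs(offset);
  lim.scale = std::fabs(scale);
  lim.power = std::fabs(power);
  return lim;
}

// A model without coefficients is never valid; otherwise every coefficient
// must not exceed its limit in absolute terms.
bool isWithinLimits(const std::vector<double>& coeff, const CoefficientLimits& lim)
{
  if (coeff.size() != 3) return false; // untrained (assumes three coefficients)
  return std::fabs(coeff[0]) <= lim.offset
      && std::fabs(coeff[1]) <= lim.scale
      && std::fabs(coeff[2]) <= lim.power;
}
```

Because the limits are class-wide, setting them once affects the validity of every model checked afterwards.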
void setCoefficients | ( | const MZTrafoModel & | rhs | ) |
Copy model coefficients from another model.
void setCoefficients | ( | double | intercept, |
double | slope, | ||
double | power | ||
) |
Manually set model coefficients.
Can be used instead of train() to set the coefficients manually. Exactly three values must be given. If you want a linear model, set 'power' to zero. If you additionally want a constant model, set 'slope' to zero as well.
intercept | The offset |
slope | The slope |
power | The x*x coefficient (for quadratic models) |
void setRANSACParams | ( | const Math::RANSACParam & | p | ) |
static |
Set the global (program wide) parameters for RANSAC.
This is not done via a member, to keep the memory footprint small: hundreds of MZTrafoModels are expected to be built at the same time, and the RANSAC params should be identical for all of them.
p | RANSAC params |
Referenced by UTILProteomicsLFQ::recalibrateMasses_().
String toString | ( | ) | const |
String representation of the model parameters.
Empty if model is not trained.
bool train | ( | const CalibrationData & | cd, |
MODELTYPE | md, | ||
bool | use_RANSAC, | ||
double | rt_left = -std::numeric_limits< double >::max() , |
||
double | rt_right = std::numeric_limits< double >::max() |
||
) |
Train a model using calibrant data.
If the CalibrationData were created using peak groups (usually corresponding to mass traces), the median for each group is used as a group representative. This is more robust, and reduces the number of data points drastically, i.e. one value per group.
Internally, these steps take place:
cd | List of calibrants |
md | Type of model (linear, quadratic, ...) |
use_RANSAC | Remove outliers before computing the model? |
rt_left | Filter 'cd' by RT; all calibrants with RT < 'rt_left' are removed |
rt_right | Filter 'cd' by RT; all calibrants with RT > 'rt_right' are removed |
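The reduction of each peak group to a single representative via its median can be sketched as follows. `groupMedian` and `medianPerGroup` are hypothetical helpers; which quantity the median is taken over, and the convention for groups with an even number of values (here: mean of the two middle values), are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of a non-empty group; takes a copy because sorting reorders it.
double groupMedian(std::vector<double> values)
{
  assert(!values.empty());
  std::sort(values.begin(), values.end());
  const std::size_t n = values.size();
  return (n % 2 == 1) ? values[n / 2]
                      : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// One representative value per group, so the number of data points
// drops from the total number of peaks to the number of groups.
std::vector<double> medianPerGroup(const std::vector<std::vector<double>>& groups)
{
  std::vector<double> representatives;
  representatives.reserve(groups.size());
  for (const std::vector<double>& g : groups)
  {
    representatives.push_back(groupMedian(g));
  }
  return representatives;
}
```

A single extreme value in a group (e.g. 1, 2, 100) leaves the median at 2, which is the robustness the description refers to.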
bool train | ( | std::vector< double > | error_mz, |
std::vector< double > | theo_mz, | ||
std::vector< double > | weights, | ||
MODELTYPE | md, | ||
bool | use_RANSAC | ||
) |
Train a model using calibrant data.
Given theoretical m/z values and observed mass errors (and corresponding weights), a model (linear, quadratic, ...) is built. If 'use_RANSAC' is set, outlier removal is applied beforehand. The 'error_mz' values can be given either as absolute errors in [Th] or as relative deviations in [ppm]. The MZTrafoModel must be constructed accordingly (see constructor). This has no influence on the model building itself, but rather on how 'predict()' works internally.
Outlier detection before model building via the RANSAC algorithm is supported for LINEAR and QUADRATIC models.
Internally, these steps take place:
error_mz | Observed Mass error (in ppm or Th) |
theo_mz | Theoretical m/z values, corresponding to 'error_mz' |
weights | For weighted models only: weight of calibrants; ignored otherwise |
md | Type of model (linear, quadratic, ...) |
use_RANSAC | Remove outliers before computing the model? |
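As an illustration of what building a LINEAR model from these vectors could look like, here is an unweighted ordinary least-squares fit of the mass error against the theoretical m/z. This is a simplified sketch and not the class's implementation: it omits weights and RANSAC, `LinearFit` and `fitLinear` are hypothetical names, and regressing the error on the theoretical m/z is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type; 'ok' is false if no line could be fitted.
struct LinearFit
{
  double intercept;
  double slope;
  bool ok;
};

// Ordinary least squares: error_mz ~ intercept + slope * theo_mz.
LinearFit fitLinear(const std::vector<double>& theo_mz, const std::vector<double>& error_mz)
{
  assert(theo_mz.size() == error_mz.size());
  const std::size_t n = theo_mz.size();
  LinearFit fit{0.0, 0.0, false};
  if (n < 2) return fit; // a line needs at least two points

  double mean_x = 0.0, mean_y = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    mean_x += theo_mz[i];
    mean_y += error_mz[i];
  }
  mean_x /= static_cast<double>(n);
  mean_y /= static_cast<double>(n);

  double sxx = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < n; ++i)
  {
    const double dx = theo_mz[i] - mean_x;
    sxx += dx * dx;
    sxy += dx * (error_mz[i] - mean_y);
  }
  if (sxx == 0.0) return fit; // all m/z values identical: slope undefined

  fit.slope = sxy / sxx;
  fit.intercept = mean_y - fit.slope * mean_x;
  fit.ok = true;
  return fit;
}
```

The early returns correspond to the situation in which training fails and the model ends up without usable coefficients; a wide m/z range in the input makes the slope estimate better conditioned.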
std::vector< double > coeff_ |
private |
Model coefficients (for both linear and quadratic models), estimated from the data.
double limit_offset_ |
staticprivate |
acceptable boundary for the estimated offset; if estimated offset is larger (absolute) the model does not validate (isValidModel())
double limit_power_ |
staticprivate |
acceptable boundary for the estimated power; if estimated power is larger (absolute) the model does not validate (isValidModel())
double limit_scale_ |
staticprivate |
acceptable boundary for the estimated scale; if estimated scale is larger (absolute) the model does not validate (isValidModel())
const std::string names_of_modeltype[] |
static |
strings corresponding to enum MODELTYPE
Math::RANSACParam * ransac_params_ |
staticprivate |
global pointer, init to NULL at startup; set class-global RANSAC params
double rt_ |
private |
retention time associated with the model (i.e. where the calibrant data was taken from)
Referenced by MZTrafoModel::RTLess::operator()().
bool use_ppm_ |
private |
During training, the model is built on absolute or relative (ppm) predictions. predict(), i.e. applying the model, requires this information too.