OpenMS
MapAlignerSpectrum

Corrects retention time distortions between maps by aligning spectra.

Potential predecessor tools: FileConverter
Potential successor tools: FeatureFinderCentroided (or another feature finding algorithm)

This tool provides an algorithm to align the retention time scales of multiple input files, correcting shifts and distortions between them. Retention time adjustment may be necessary to correct for chromatography differences, e.g. before data from multiple LC-MS runs can be combined (feature grouping), or when one run should be annotated with peptide identifications obtained in a different run.

All map alignment tools (MapAligner...) collect retention time data from the input files and - by fitting a model to this data - compute transformations that map all runs to a common retention time scale. They can apply the transformations right away and return output files with aligned time scales (parameter out), and/or return descriptions of the transformations in trafoXML format (parameter trafo_out). Transformations stored as trafoXML can be applied to arbitrary files with the MapRTTransformer tool.
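
As a rough illustration of the two output modes described above, the following Python sketch assembles a MapAlignerSpectrum command line that requests both aligned files (out) and transformation descriptions (trafo_out). The helper name build_command and all file names are hypothetical; only the options in, out and trafo_out are taken from this page. The sketch prints the command rather than running it, because running it requires an OpenMS installation.

```python
import shlex


def build_command(inputs, outputs=None, trafos=None):
    """Assemble an argument list for MapAlignerSpectrum (hypothetical helper)."""
    # The tool requires 'out', 'trafo_out', or both.
    if not outputs and not trafos:
        raise ValueError("provide outputs, trafos, or both")
    cmd = ["MapAlignerSpectrum", "-in", *inputs]
    if outputs:
        cmd += ["-out", *outputs]
    if trafos:
        cmd += ["-trafo_out", *trafos]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_command(
    ["run1.mzML", "run2.mzML"],
    outputs=["run1_aligned.mzML", "run2_aligned.mzML"],
    trafos=["run1.trafoXML", "run2.trafoXML"],
)
print(shlex.join(cmd))
# To execute the tool: subprocess.run(cmd, check=True)
```

The trafoXML files requested this way are the ones that could later be applied to other files with MapRTTransformer.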

The map alignment tools differ in how they obtain retention time data for the modeling of transformations, and consequently what types of data they can be applied to. Here, an experimental algorithm based on spectrum alignment is implemented. It is only applicable to peak maps (mzML format). For more details and algorithm-specific parameters (set in the INI file) see "Detailed Description" in the algorithm documentation.
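
The algorithm-specific parameters include gapcost and affinegapcost (documented in the INI section of this page), which penalize opening and extending gaps in the spectrum alignment. The Python sketch below illustrates the usual affine gap convention with the default values. It assumes that the first position of a gap costs gapcost and every further position costs affinegapcost; this is an assumption about the scoring, not something taken from the OpenMS source. The function name gap_penalty is hypothetical.

```python
def gap_penalty(length, gapcost=1.0, affinegapcost=0.5):
    """Score contribution (negative) of one contiguous gap of `length` spectra."""
    if length <= 0:
        return 0.0
    # Assumed convention: opening costs gapcost, each extension costs affinegapcost.
    # The costs are given as positive numbers and applied as a negative score.
    return -(gapcost + (length - 1) * affinegapcost)


one_long_gap = gap_penalty(4)         # four consecutive gap positions
four_short_gaps = 4 * gap_penalty(1)  # four isolated gaps separated by matches
print(one_long_gap, four_short_gaps)
```

Under this convention the single long gap is penalized less than the four isolated gaps, which is the behaviour the affine gap cost is meant to encourage.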

See also
MapAlignerIdentification, MapAlignerPoseClustering, MapRTTransformer

Since OpenMS 1.8, the extraction of data for the alignment has been separate from the modeling of RT transformations based on that data. It is now possible to use different models independently of the chosen algorithm. This algorithm has been tested mostly with the "interpolated" model. The available models, selected with the "type" parameter, are: linear, b_spline, lowess and interpolated.

The following parameters control the modeling of RT transformations (they can be set in the "model" section of the INI file):

Name | Type | Default | Restrictions | Description
--- | --- | --- | --- | ---
type | string | interpolated | linear, b_spline, lowess, interpolated | Type of model
linear:symmetric_regression | string | false | true, false | Perform linear regression on 'y - x' vs. 'y + x', instead of on 'y' vs. 'x'.
linear:x_weight | string | x | 1/x, 1/x2, ln(x), x | Weight x values
linear:y_weight | string | y | 1/y, 1/y2, ln(y), y | Weight y values
linear:x_datum_min | float | 1.0e-15 | | Minimum x value
linear:x_datum_max | float | 1.0e15 | | Maximum x value
linear:y_datum_min | float | 1.0e-15 | | Minimum y value
linear:y_datum_max | float | 1.0e15 | | Maximum y value
b_spline:wavelength | float | 0.0 | min: 0.0 | Determines the amount of smoothing by setting the number of nodes for the B-spline. The number is chosen so that the spline approximates a low-pass filter with this cutoff wavelength. The wavelength is given in the same units as the data; a higher value means more smoothing. '0' sets the number of nodes to twice the number of input points.
b_spline:num_nodes | int | 5 | min: 0 | Number of nodes for B-spline fitting. Overrides 'wavelength' if set (to two or greater). A lower value means more smoothing.
b_spline:extrapolate | string | linear | linear, b_spline, constant, global_linear | Method to use for extrapolation beyond the original data range. 'linear': Linear extrapolation using the slope of the B-spline at the corresponding endpoint. 'b_spline': Use the B-spline (as for interpolation). 'constant': Use the constant value of the B-spline at the corresponding endpoint. 'global_linear': Use a linear fit through the data (which will most probably introduce discontinuities at the ends of the data range).
b_spline:boundary_condition | int | 2 | min: 0 max: 2 | Boundary condition at B-spline endpoints: 0 (value zero), 1 (first derivative zero) or 2 (second derivative zero)
lowess:span | float | 0.666666666666667 | min: 0.0 max: 1.0 | Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range 0.2 to 0.8 usually results in a good fit.
lowess:num_iterations | int | 3 | min: 0 | Number of robustifying iterations for lowess fitting.
lowess:delta | float | -1.0 | | Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.
lowess:interpolation_type | string | cspline | linear, cspline, akima | Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation.
lowess:extrapolation_type | string | four-point-linear | two-point-linear, four-point-linear, global-linear | Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate at the front and a line through the last and second-to-last point at the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation.
interpolated:interpolation_type | string | cspline | linear, cspline, akima | Type of interpolation to apply.
interpolated:extrapolation_type | string | two-point-linear | two-point-linear, four-point-linear, global-linear | Type of extrapolation to apply. 'two-point-linear': use the first and last data point to build a single linear model. 'four-point-linear': build two linear models on both ends using the first two / last two points. 'global-linear': use all points to build a single linear model. Note that global-linear may not be continuous at the border.
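
The linear:symmetric_regression option above regresses 'y - x' on 'y + x' instead of 'y' on 'x'. The Python sketch below shows one way this could work: fit d = a + b * s with d = y - x and s = y + x, then solve for y to recover an ordinary slope and intercept. The back-conversion is plain algebra and is an assumption about how the result is used, not a description of the OpenMS implementation; the function names are hypothetical, and the weighting options (x_weight, y_weight) are not modeled.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit ys ~ intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_linear(xs, ys, symmetric_regression=False):
    """Return (intercept, slope) of y = intercept + slope * x."""
    if not symmetric_regression:
        return least_squares(xs, ys)
    # Regress d = y - x on s = y + x, giving d = a + b * s.
    sums = [y + x for x, y in zip(xs, ys)]
    diffs = [y - x for x, y in zip(xs, ys)]
    a, b = least_squares(sums, diffs)
    # Solve y - x = a + b * (y + x) for y.
    return a / (1.0 - b), (1.0 + b) / (1.0 - b)


# Hypothetical retention time pairs lying exactly on y = 2x + 1.
xs = [0.0, 10.0, 20.0, 30.0]
ys = [1.0, 21.0, 41.0, 61.0]
plain = fit_linear(xs, ys)
symmetric = fit_linear(xs, ys, symmetric_regression=True)
print(plain, symmetric)
```

For noise-free data both variants recover the same line. They differ for noisy data, because the symmetric variant does not treat x as error-free.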

The command line parameters of this tool are:

MapAlignerSpectrum -- Corrects retention time distortions between maps by spectrum alignment.
Full documentation: http://www.openms.de/doxygen/release/3.1.0/html/TOPP_MapAlignerSpectrum.html
Version: 3.1.0 Oct 18 2023, 10:27:18, Revision: 17a07f8
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for 
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  MapAlignerSpectrum <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option

Options (mandatory options marked with '*'):
  -in <files>*        Input files to align (all must have the same file type) (valid formats: 'mzML')
  -out <files>        Output files (same file type as 'in'). This option or 'trafo_out' has to be provided; 
                      they can be used together. (valid formats: 'mzML')
  -trafo_out <files>  Transformation output files. This option or 'out' has to be provided; they can be used 
                      together. (valid formats: 'trafoXML')
                      
Common TOPP options:
  -ini <file>         Use the given TOPP INI file
  -threads <n>        Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>   Writes the default configuration file
  --help              Shows options
  --helphelp          Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Algorithm parameters section
 - model       Options to control the modeling of retention time transformations from data

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:
  - http://www.openms.de/doxygen/release/3.1.0/html/TOPP_MapAlignerSpectrum.html

INI file documentation of this tool:

+MapAlignerSpectrum: Corrects retention time distortions between maps by spectrum alignment.
version (default: 3.1.0): Version of the tool that generated this parameters file.
++1: Instance '1' section for 'MapAlignerSpectrum'
in (default: []): Input files to align (all must have the same file type). Tags: input file. Restrictions: *.mzML
out (default: []): Output files (same file type as 'in'). This option or 'trafo_out' has to be provided; they can be used together. Tags: output file. Restrictions: *.mzML
trafo_out (default: []): Transformation output files. This option or 'out' has to be provided; they can be used together. Tags: output file. Restrictions: *.trafoXML
log (default: empty): Name of log file (created only when specified)
debug (default: 0): Sets the debug level
threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool
no_progress (default: false): Disables progress logging to command line. Restrictions: true, false
force (default: false): Overrides tool-specific checks. Restrictions: true, false
test (default: false): Enables the test mode (needed for internal use only). Restrictions: true, false
+++algorithm: Algorithm parameters section
gapcost (default: 1.0): The cost of opening a gap in the alignment. A gap means that a spectrum cannot be aligned directly to a spectrum in the other map, which happens when the similarity of the two spectra is too low or absent. It is comparable to the insertion or deletion of a spectrum in the map (similar to sequence alignment). Opening a gap can allow another spectrum to be aligned correctly with a higher score than without the gap. However, opening a gap is penalized, so a gap should only be opened if the benefits outweigh the cost. The parameter is given as a positive number; the implementation converts it to a negative number. Restrictions: 0.0:∞
affinegapcost (default: 0.5): The cost of extending an already open gap. The idea behind the affine gap cost is that one long run of connected gaps is preferable to gaps interspersed with matches (gap, match, gap, match, etc.). The penalty for extending a gap should therefore generally be lower than the gap opening cost. If the result of the alignment shows high compression, it is a good idea to lower either the affine gap cost or the gap opening cost. Restrictions: 0.0:∞
cutoff_score (default: 0.7): The threshold used to filter pairs of spectra; the pairs that pass are strong candidates for delimiting the interval of a sub-alignment. Only pairs of spectra with a score greater than or equal to the threshold are selected. Restrictions: 0.0:1.0
bucketsize (default: 100): Defines the number of buckets into which the interval of the points that define the main alignment (match points) is divided. These points have to be filtered to reduce their number, so that a smoother spline curve can be calculated. Restrictions: 1:∞
anchorpoints (default: 100): Defines the percentage of match points that are selected from each bucket. The highest-scoring pairs are selected first. Reducing the number of match points helps to obtain a smoother spline curve. Restrictions: 1:100
debug (default: false): Activates the debug mode, in which files starting with the prefix 'debug' are written. Restrictions: true, false
mismatchscore (default: -5.0): Defines the score of two spectra that have no similarity to each other. Restrictions: -∞:0.0
scorefunction (default: SteinScottImproveScore): The score function is the core of an alignment; the success of an alignment depends mostly on the chosen score function. The score function returns the similarity of two spectra, and the scores later determine the possible traceback paths. Multiple spectrum similarity scores are available. Restrictions: SteinScottImproveScore, ZhangSimilarityScore
+++model: Options to control the modeling of retention time transformations from data
type (default: interpolated): Type of model. Restrictions: linear, b_spline, lowess, interpolated
++++linear: Parameters for 'linear' model
symmetric_regression (default: false): Perform linear regression on 'y - x' vs. 'y + x', instead of on 'y' vs. 'x'. Restrictions: true, false
x_weight (default: x): Weight x values. Restrictions: 1/x, 1/x2, ln(x), x
y_weight (default: y): Weight y values. Restrictions: 1/y, 1/y2, ln(y), y
x_datum_min (default: 1.0e-15): Minimum x value
x_datum_max (default: 1.0e15): Maximum x value
y_datum_min (default: 1.0e-15): Minimum y value
y_datum_max (default: 1.0e15): Maximum y value
++++b_spline: Parameters for 'b_spline' model
wavelength (default: 0.0): Determines the amount of smoothing by setting the number of nodes for the B-spline. The number is chosen so that the spline approximates a low-pass filter with this cutoff wavelength. The wavelength is given in the same units as the data; a higher value means more smoothing. '0' sets the number of nodes to twice the number of input points. Restrictions: 0.0:∞
num_nodes (default: 5): Number of nodes for B-spline fitting. Overrides 'wavelength' if set (to two or greater). A lower value means more smoothing. Restrictions: 0:∞
extrapolate (default: linear): Method to use for extrapolation beyond the original data range. 'linear': Linear extrapolation using the slope of the B-spline at the corresponding endpoint. 'b_spline': Use the B-spline (as for interpolation). 'constant': Use the constant value of the B-spline at the corresponding endpoint. 'global_linear': Use a linear fit through the data (which will most probably introduce discontinuities at the ends of the data range). Restrictions: linear, b_spline, constant, global_linear
boundary_condition (default: 2): Boundary condition at B-spline endpoints: 0 (value zero), 1 (first derivative zero) or 2 (second derivative zero). Restrictions: 0:2
++++lowess: Parameters for 'lowess' model
span (default: 0.666666666666667): Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range 0.2 to 0.8 usually results in a good fit. Restrictions: 0.0:1.0
num_iterations (default: 3): Number of robustifying iterations for lowess fitting. Restrictions: 0:∞
delta (default: -1.0): Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this.
interpolation_type (default: cspline): Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation. Restrictions: linear, cspline, akima
extrapolation_type (default: four-point-linear): Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate at the front and a line through the last and second-to-last point at the end. 'global-linear': Uses a linear regression to fit a line through all data points and uses it for extrapolation. Restrictions: two-point-linear, four-point-linear, global-linear
++++interpolated: Parameters for 'interpolated' model
interpolation_type (default: cspline): Type of interpolation to apply. Restrictions: linear, cspline, akima
extrapolation_type (default: two-point-linear): Type of extrapolation to apply. 'two-point-linear': use the first and last data point to build a single linear model. 'four-point-linear': build two linear models on both ends using the first two / last two points. 'global-linear': use all points to build a single linear model. Note that global-linear may not be continuous at the border. Restrictions: two-point-linear, four-point-linear, global-linear
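
To make the difference between the extrapolation types of the 'interpolated' model concrete, here is a minimal Python sketch. It uses 'linear' interpolation between anchor points (not the default 'cspline') and implements only 'two-point-linear' and 'four-point-linear' extrapolation as described above. The function names and the anchor points are hypothetical, and the sketch assumes at least two anchor points with distinct x values; it is not the OpenMS implementation.

```python
from bisect import bisect_right


def line_through(p, q):
    """Return the line through points p and q as a function of x."""
    (x0, y0), (x1, y1) = p, q
    slope = (y1 - y0) / (x1 - x0)
    return lambda x: y0 + slope * (x - x0)


def make_model(points, extrapolation_type="two-point-linear"):
    """Build an RT transformation from (x, y) anchor points."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    if extrapolation_type == "two-point-linear":
        # One line through the first and last point, used on both sides.
        front = back = line_through(pts[0], pts[-1])
    elif extrapolation_type == "four-point-linear":
        # Separate lines through the first two and the last two points.
        front = line_through(pts[0], pts[1])
        back = line_through(pts[-2], pts[-1])
    else:
        raise ValueError("unsupported extrapolation type in this sketch")

    def transform(x):
        if x < xs[0]:
            return front(x)
        if x > xs[-1]:
            return back(x)
        # Linear interpolation on the segment that contains x.
        i = min(bisect_right(xs, x), len(xs) - 1)
        return line_through(pts[i - 1], pts[i])(x)

    return transform


anchors = [(0.0, 0.0), (10.0, 12.0), (20.0, 20.0)]  # hypothetical RT pairs
two_point = make_model(anchors, "two-point-linear")
four_point = make_model(anchors, "four-point-linear")
print(two_point(30.0), four_point(30.0))
```

Inside the data range both models agree, since they share the same interpolation. Outside it, 'two-point-linear' follows the overall trend of the data, while 'four-point-linear' continues the local slope at each end.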