Group corresponding features across label-free experiments.

This tool produces results similar to those of FeatureLinkerUnlabeledQT (FLQT), since it optimizes a similar objective. However, it is more efficient than FLQT: it uses a kd-tree for fast 2D region queries in m/z-RT space, and a sorted binary search tree from which the best of the remaining clusters can be chosen in O(1). Insertion and search take O(log n) time in both the binary search tree and the kd-tree. Overall, the algorithm runs in O(n log n) time and O(n) space.
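The 2D region query mentioned above can be sketched in Python as follows. This is a minimal illustration, not the OpenMS implementation: the names Node, build and neighbors are hypothetical, the tolerances are assumed to be applied as plus/minus windows around the query feature, and the tree is built in bulk by sorting rather than by repeated insertion.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (rt_seconds, mz)


class Node:
    """One kd-tree node; 'axis' is 0 for RT and 1 for m/z."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced kd-tree by splitting at the median, alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: List[Point]) -> None:
    """Collect all points inside the axis-aligned box [lo, hi]."""
    if node is None:
        return
    p, a = node.point, node.axis
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    if lo[a] <= p[a]:   # the box may overlap the left half-space
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:   # the box may overlap the right half-space
        range_query(node.right, lo, hi, out)


def neighbors(tree: Optional[Node], rt: float, mz: float,
              rt_tol: float, mz_tol_ppm: float) -> List[Point]:
    """Features within +/- rt_tol seconds and +/- mz_tol_ppm of (rt, mz)."""
    mz_win = mz * mz_tol_ppm * 1e-6  # ppm tolerance converted to Da
    found: List[Point] = []
    range_query(tree, (rt - rt_tol, mz - mz_win), (rt + rt_tol, mz + mz_win), found)
    return found


features = [(100.0, 500.000), (110.0, 500.002), (400.0, 500.001), (105.0, 600.0)]
tree = build(features)
hits = sorted(neighbors(tree, rt=100.0, mz=500.0, rt_tol=30.0, mz_tol_ppm=10.0))
```

Here hits contains only the two features that lie close to the query in both dimensions; the feature at RT 400 and the feature at m/z 600 are pruned.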

In practice, the runtime of FeatureLinkerUnlabeledQT is often not significantly worse than that of FeatureLinkerUnlabeledKD if the datasets are relatively small and/or the value of the -nr_partitions parameter is chosen large enough. If, however, the datasets are very large, and especially if they are so dense that partitioning based on the specified m/z tolerance is no longer possible, this algorithm becomes orders of magnitude faster than FLQT.
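Why dense data defeats partitioning can be illustrated with a small Python sketch. It assumes (this is not taken from the tool's source) that a partition boundary may only be placed where two neighboring m/z values are further apart than the tolerance, so that no potential link is cut. The function partition_by_mz is hypothetical and takes the tolerance in Da for simplicity.

```python
from typing import List


def partition_by_mz(mzs: List[float], mz_tol_da: float,
                    nr_partitions: int) -> List[List[float]]:
    """Split m/z values into chunks, cutting only at gaps wider than mz_tol_da."""
    values = sorted(mzs)
    if not values:
        return []
    # Aim for roughly equal chunk sizes (assumed heuristic).
    target = max(1, len(values) // nr_partitions)
    parts = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if len(parts[-1]) >= target and cur - prev > mz_tol_da:
            parts.append([cur])      # safe cut: gap exceeds the tolerance
        else:
            parts[-1].append(cur)    # too close to cut, keep in same chunk
    return parts


sparse = [100.0, 100.001, 200.0, 200.001, 300.0, 300.001]
dense = [100.0 + 0.005 * i for i in range(6)]

sparse_parts = partition_by_mz(sparse, mz_tol_da=0.01, nr_partitions=3)
dense_parts = partition_by_mz(dense, mz_tol_da=0.01, nr_partitions=3)
```

The sparse values split into three partitions, whereas the dense values, whose spacing is below the tolerance everywhere, stay in one partition regardless of the requested number.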

Notably, this algorithm can be used to align featureXML files containing unassembled mass traces (as produced by MassTraceExtractor), which is often impossible for reasonably large datasets using other aligners, as these datasets tend to be too dense and hence cannot be partitioned.

Prior to feature linking, this tool performs an (optional) retention time transformation on the features using LOWESS regression in order to minimize retention time differences between corresponding features across different maps. These transformed RTs are used only internally. In the results, original RTs will be reported.
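The idea behind the LOWESS-based transformation can be sketched as follows. This is a simplified illustration under stated assumptions, not the tool's implementation: it fits a tricube-weighted local linear regression through hypothetical anchor pairs (RT in one map, reference RT) and omits the robustifying iterations, the delta shortcut and the interpolation/extrapolation steps that the LOWESS parameters control. The name lowess_fit is hypothetical.

```python
from typing import List, Sequence


def lowess_fit(xs: Sequence[float], ys: Sequence[float],
               span: float = 2 / 3) -> List[float]:
    """Local linear regression with tricube weights, evaluated at each x."""
    n = len(xs)
    k = min(n, max(2, round(span * n)))  # number of points per local fit
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1e-12        # bandwidth: distance to k-th neighbor
        w = [(1.0 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Hypothetical anchors: this map elutes 1% slower with a 5 s offset.
map_rt = [100.0 * i for i in range(1, 11)]
reference_rt = [1.01 * rt + 5.0 for rt in map_rt]

internal_rt = lowess_fit(map_rt, reference_rt)  # used only for linking
reported_rt = map_rt                            # original RTs are reported
```

Because the anchors in this toy example lie exactly on a line, the fitted internal RTs reproduce the reference RTs, while the reported RTs remain the original ones.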

The linking behavior can be influenced by separately specifying how the available charge and adduct information is used. The options either restrict linking to features with the same charge/adduct (or the same lack thereof, i.e. features with charge zero or without adduct annotation), additionally allow charged/adduct-annotated features to be linked with features that have no charge/adduct information, or allow all features to be linked irrespective of charge state and adduct information.
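The three options described above can be written down as simple compatibility checks. The following Python sketch is illustrative only: the mode names and defaults are those of the charge_merging and adduct_merging parameters, while the function names and the use of None for a missing adduct annotation are assumptions made for this example.

```python
from typing import Optional

CHARGE_MODES = ("Identical", "With_charge_zero", "Any")
ADDUCT_MODES = ("Identical", "With_unknown_adducts", "Any")


def charges_compatible(c1: int, c2: int, mode: str = "With_charge_zero") -> bool:
    """Charge 0 stands for an unknown charge state."""
    if mode not in CHARGE_MODES:
        raise ValueError(f"unknown charge_merging mode: {mode}")
    if mode == "Any" or c1 == c2:
        return True
    # Mismatching charges are only tolerated if one of them is unknown.
    return mode == "With_charge_zero" and 0 in (c1, c2)


def adducts_compatible(a1: Optional[str], a2: Optional[str],
                       mode: str = "Any") -> bool:
    """None stands for a feature without adduct annotation (assumption)."""
    if mode not in ADDUCT_MODES:
        raise ValueError(f"unknown adduct_merging mode: {mode}")
    if mode == "Any" or a1 == a2:
        return True
    return mode == "With_unknown_adducts" and (a1 is None or a2 is None)
```

For example, charges 2 and 0 are compatible under With_charge_zero but not under Identical, whereas charges 2 and 3 are only compatible under Any.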

Note that the more relaxed the allowed grouping criteria, the larger the internally used connected components become, which increases memory usage. More stringent m/z or retention time tolerances might then be required.

The command line parameters of this tool are:

FeatureLinkerUnlabeledKD -- Groups corresponding features from multiple maps.
Full documentation:
Version: 3.2.0-pre-nightly-2024-07-21 Jul 22 2024, 02:13:52, Revision: b650df0
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

  FeatureLinkerUnlabeledKD <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed
description or use the --helphelp option

Options (mandatory options marked with '*'):
  -in <files>*        Input files separated by blanks (valid formats: 'featureXML', 'consensusXML')
  -out <file>*        Output file (valid formats: 'consensusXML')
  -design <file>      Input file containing the experimental design (valid formats: 'tsv')
  -keep_subelements   For consensusXML input only: If set, the sub-features of the inputs are transferred to 
                      the output.
Common TOPP options:
  -ini <file>         Use the given TOPP INI file
  -threads <n>        Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>   Writes the default configuration file
  --help              Shows options
  --helphelp          Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   Algorithm parameters section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
For more information, please consult the online documentation for this tool:

INI file documentation of this tool:

+FeatureLinkerUnlabeledKD   Groups corresponding features from multiple maps.
  version             Version of the tool that generated this parameters file. (default: '3.2.0-pre-nightly-2024-07-21')
++1                   Instance '1' section for 'FeatureLinkerUnlabeledKD'
  in                  Input files separated by blanks (default: '[]'; tag: input file; valid formats: '*.featureXML', '*.consensusXML')
  out                 Output file (tag: output file; valid formats: '*.consensusXML')
  design              Input file containing the experimental design (tag: input file; valid formats: '*.tsv')
  keep_subelements    For consensusXML input only: If set, the sub-features of the inputs are transferred to the output. (default: 'false'; valid: 'true', 'false')
  log                 Name of log file (created only when specified)
  debug               Sets the debug level (default: '0')
  threads             Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  no_progress         Disables progress logging to command line (default: 'false'; valid: 'true', 'false')
  force               Overrides tool-specific checks (default: 'false'; valid: 'true', 'false')
  test                Enables the test mode (needed for internal use only) (default: 'false'; valid: 'true', 'false')
+++algorithm          Algorithm parameters section
  mz_unit             Unit of m/z tolerance (default: 'ppm'; valid: 'ppm', 'Da')
  nr_partitions       Number of partitions in m/z space (default: '100'; range: 1:∞)
++++warp
  enabled               Whether or not to internally warp feature RTs using LOWESS transformation before linking (reported RTs in results will always be the original RTs) (default: 'true'; valid: 'true', 'false')
  rt_tol                Width of RT tolerance window (sec) (default: '100.0'; range: 0.0:∞)
  mz_tol                m/z tolerance (in ppm or Da) (default: '5.0'; range: 0.0:∞)
  max_pairwise_log_fc   Maximum absolute log10 fold change between two compatible signals during compatibility graph construction. Two signals from different maps will not be connected by an edge in the compatibility graph if the absolute log fold change exceeds this limit (they might still end up in the same connected component, however). Note: this does not limit fold changes in the linking stage, only during RT alignment, where we try to find high-quality alignment anchor points. Setting this to a value < 0 disables the FC check. (default: '0.5')
  min_rel_cc_size       Only connected components containing compatible features from at least max(2, (warp_min_occur * number_of_input_maps)) input maps are considered for computing the warping function (default: '0.5'; range: 0.0:1.0)
  max_nr_conflicts      Allow up to this many conflicts (features from the same map) per connected component to be used for alignment (-1 means allow any number of conflicts) (default: '0'; range: -1:∞)
++++link
  rt_tol                Width of RT tolerance window (sec) (default: '30.0'; range: 0.0:∞)
  mz_tol                m/z tolerance (in ppm or Da) (default: '10.0'; range: 0.0:∞)
  charge_merging        Whether to disallow charge mismatches (Identical), allow linking charge zero (i.e., unknown charge state) with every charge state (With_charge_zero), or disregard charges (Any). (default: 'With_charge_zero'; valid: 'Identical', 'With_charge_zero', 'Any')
  adduct_merging        Whether to only allow the same adduct for linking (Identical), also allow linking features with adduct-free ones (With_unknown_adducts), or disregard adducts (Any). (default: 'Any'; valid: 'Identical', 'With_unknown_adducts', 'Any')
++++distance_RT         Distance component based on RT differences
  exponent              Normalized RT differences ([0-1], relative to 'max_difference') are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow) (default: '1.0'; range: 0.0:∞)
  weight                Final RT distances are weighted by this factor (default: '1.0'; range: 0.0:∞)
++++distance_MZ         Distance component based on m/z differences
  exponent              Normalized ([0-1], relative to 'max_difference') m/z differences are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow) (default: '2.0'; range: 0.0:∞)
  weight                Final m/z distances are weighted by this factor (default: '1.0'; range: 0.0:∞)
++++distance_intensity  Distance component based on differences in relative intensity (usually relative to highest peak in the whole data set)
  exponent              Differences in relative intensity ([0-1]) are raised to this power (using 1 or 2 will be fast, everything else is REALLY slow) (default: '1.0'; range: 0.0:∞)
  weight                Final intensity distances are weighted by this factor (default: '1.0'; range: 0.0:∞)
  log_transform         Log-transform intensities? If disabled, d = |int_f2 - int_f1| / int_max. If enabled, d = |log(int_f2 + 1) - log(int_f1 + 1)| / log(int_max + 1) (default: 'enabled'; valid: 'enabled', 'disabled')
++++LOWESS              LOWESS parameters for internal RT transformations (only relevant if 'warp:enabled' is set to 'true')
  span                  Fraction of datapoints (f) to use for each local regression (determines the amount of smoothing). Choosing this parameter in the range 0.2 to 0.8 usually results in a good fit. (default: '0.666666666666667'; range: 0.0:1.0)
  num_iterations        Number of robustifying iterations for lowess fitting. (default: '3'; range: 0:∞)
  delta                 Nonnegative parameter which may be used to save computations (recommended value is 0.01 of the range of the input, e.g. for data ranging from 1000 seconds to 2000 seconds, it could be set to 10). Setting a negative value will automatically do this. (default: '-1.0')
  interpolation_type    Method to use for interpolation between datapoints computed by lowess. 'linear': Linear interpolation. 'cspline': Use the cubic spline for interpolation. 'akima': Use an akima spline for interpolation. (default: 'cspline'; valid: 'linear', 'cspline', 'akima')
  extrapolation_type    Method to use for extrapolation outside the data range. 'two-point-linear': Uses a line through the first and last point to extrapolate. 'four-point-linear': Uses a line through the first and second point to extrapolate in front and a line through the last and second-to-last point in the end. 'global-linear': Uses a linear regression to fit a line through all data points and use it for extrapolation. (default: 'four-point-linear'; valid: 'two-point-linear', 'four-point-linear', 'global-linear')
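
The distance parameters above can be made concrete with a short Python sketch. The intensity formula and the "normalize, raise to exponent, multiply by weight" scheme follow the parameter descriptions; the way the three components are combined (weighted sum divided by the sum of weights) is an assumption made for this illustration, and all function names are hypothetical.

```python
import math
from typing import Sequence


def intensity_distance(int_f1: float, int_f2: float, int_max: float,
                       log_transform: bool = True) -> float:
    """Intensity difference in [0, 1], as described for 'log_transform'."""
    if log_transform:
        return abs(math.log(int_f2 + 1) - math.log(int_f1 + 1)) / math.log(int_max + 1)
    return abs(int_f2 - int_f1) / int_max


def component(diff: float, max_difference: float,
              exponent: float, weight: float) -> float:
    """Normalize a difference to [0, 1], raise it to 'exponent', apply 'weight'."""
    normalized = abs(diff) / max_difference
    return weight * normalized ** exponent


def combined_distance(components: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Assumed combination rule: weighted components divided by total weight."""
    return sum(components) / sum(weights)


# Two features 15 s and 0.0025 Da apart, with tolerances of 30 s and 0.005 Da,
# using the default exponents (RT: 1, m/z: 2, intensity: 1) and weights (all 1).
d_rt = component(15.0, 30.0, exponent=1.0, weight=1.0)
d_mz = component(0.0025, 0.005, exponent=2.0, weight=1.0)
d_int = component(intensity_distance(50.0, 100.0, 100.0, log_transform=False),
                  1.0, exponent=1.0, weight=1.0)
total = combined_distance([d_rt, d_mz, d_int], [1.0, 1.0, 1.0])
```

With the default exponent of 2 for m/z, a feature pair at half the m/z tolerance contributes only a quarter of the maximum m/z distance, so small m/z deviations are penalized less than equally large relative RT deviations.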