This class implements a pair finding algorithm for consensus features.
It offers a method to determine pairs across two consensus maps. The corresponding consensus features must be aligned, but may have small position deviations.
The distance measure is implemented in class FeatureDistance (see there for details).
Additional criteria for pairing
Depending on parameter use_identifications, the peptide identifications annotated to the two features may have to be compatible for a pairing to occur. Compatible means that at least one feature has no annotation, or both have the same annotation.
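The compatibility rule can be written as a small predicate. This is a minimal sketch with several assumptions:

- The helper name `ids_compatible` is hypothetical.
- Each feature's identifications are assumed to be already reduced to their best-hit peptide sequences, following the use_identifications description in the parameter table.
- "The same annotation" is assumed to mean equal sets of best-hit sequences.

```python
from typing import Iterable


def ids_compatible(best_hits_a: Iterable[str], best_hits_b: Iterable[str],
                   use_identifications: bool = False) -> bool:
    """Hypothetical helper: may two features be paired, judging by their peptide IDs?

    Each argument lists the best-hit peptide sequence of every peptide
    identification annotated to one feature. Assumption: "the same annotation"
    means that both features carry the same set of best-hit sequences.
    """
    if not use_identifications:
        return True  # criterion disabled: identifications never block a pairing
    hits_a, hits_b = set(best_hits_a), set(best_hits_b)
    if not hits_a or not hits_b:
        return True  # a feature without identifications matches anything
    return hits_a == hits_b


print(ids_compatible(["PEPTIDEK"], ["PEPTIDEK"], use_identifications=True))  # True
print(ids_compatible(["PEPTIDEK"], [], use_identifications=True))           # True
print(ids_compatible(["PEPTIDEK"], ["SAMPLER"], use_identifications=True))  # False
print(ids_compatible(["PEPTIDEK"], ["SAMPLER"]))                            # True
```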
Stability criterion: The distance to the nearest neighbor must be smaller than the distance to the second-nearest neighbor by at least a factor of second_nearest_gap, for both elements of the pair.

There is a non-trivial relation between this parameter and the maximum allowed difference (in RT or m/z) of the distance measure. If second_nearest_gap is greater than one, lowering max_difference may in fact lead to more (rather than fewer) pairings. Lowering it can increase the distance to the second-nearest neighbor relative to the distance to the nearest neighbor, so the constraint imposed by second_nearest_gap is fulfilled more often.
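The counterintuitive effect of max_difference can be reproduced numerically. This is a sketch under these assumptions:

- The function names are hypothetical.
- The per-dimension differences are normalized by max_difference and raised to their exponents: 1 for RT and 2 for m/z, the defaults in the parameter table.
- The two terms are averaged with unit weights. Any common scaling of the two distances would not change the outcome, because the criterion only depends on their ratio.
- Only the check from element A's side is shown.

In both settings, all differences stay within the max_difference limits. So the change comes purely from the normalization: lowering the RT limit inflates the RT term, which penalizes the RT-separated second neighbor C more than the nearly co-eluting nearest neighbor B.

```python
def combined_distance(d_rt: float, d_mz: float, max_rt: float,
                      max_mz: float = 0.3, exp_rt: float = 1.0,
                      exp_mz: float = 2.0) -> float:
    """Hypothetical RT + m/z distance with unit weights, averaged over both terms."""
    rt_term = (abs(d_rt) / max_rt) ** exp_rt
    mz_term = (abs(d_mz) / max_mz) ** exp_mz
    return (rt_term + mz_term) / 2.0


def is_stable(d_nearest: float, d_second: float, gap: float = 2.0) -> bool:
    """Stability criterion from one side: gap * d_nearest <= d_second."""
    return gap * d_nearest <= d_second


# Element A has two candidate partners (RT difference in s, m/z difference in Da):
#   B: nearly co-eluting, small m/z offset -> nearest neighbor
#   C: same m/z, but 20 s apart in RT    -> second-nearest neighbor
B = (5.0, 0.09)
C = (20.0, 0.0)

for max_rt in (100.0, 50.0):
    d_b = combined_distance(*B, max_rt=max_rt)
    d_c = combined_distance(*C, max_rt=max_rt)
    print(max_rt, round(d_b, 3), round(d_c, 3), is_stable(d_b, d_c))
# 100.0 0.07 0.1 False   (2 * 0.07 = 0.14 > 0.1)
# 50.0 0.095 0.2 True    (2 * 0.095 = 0.19 <= 0.2)
```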
Quality calculation
The quality of a pairing is computed from the distance between the paired elements (nearest neighbors) and the distances to the second-nearest neighbors of both elements, according to the formula:
\[ q_{i,j} = \big( 1 - d_{i,j} \big) \cdot \big( 1 - \frac{g \cdot d_{i,j}}{d_{2,i}} \big) \cdot \big( 1 - \frac{g \cdot d_{i,j}}{d_{2,j}} \big) \]
Here, \( q_{i,j} \) is the quality of the pairing of elements i and j, and \( d_{i,j} \) is the distance between the two. \( d_{2,i} \) and \( d_{2,j} \) are the distances to the second-nearest neighbors of i and j, respectively, and g is the factor defined by parameter second_nearest_gap.
Note that by the definition of the distance measure, \( 0 \leq d_{i,j} \leq 1 \) if i and j are to form a pair. The criteria for pairing further require that \( g \cdot d_{i,j} \leq d_{2,i} \) and \( g \cdot d_{i,j} \leq d_{2,j} \). This ensures that the resulting quality is always between one (best) and zero (worst).
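The pairing-quality formula translates directly into code. This is a sketch with a few choices of its own:

- The function name is hypothetical.
- `math.inf` stands in for "no second-nearest neighbor". This is an assumption about how a missing neighbor would be represented.
- The special case for \( d_{i,j} = 0 \) is a choice made for this sketch, to avoid 0/0.

The input checks enforce the two conditions stated above, so every returned value lies in [0, 1].

```python
import math


def pair_quality(d_ij: float, d2_i: float, d2_j: float, gap: float) -> float:
    """q_ij = (1 - d_ij) * (1 - g*d_ij/d2_i) * (1 - g*d_ij/d2_j)."""
    if not 0.0 <= d_ij <= 1.0:
        raise ValueError("paired elements must satisfy 0 <= d_ij <= 1")
    if gap * d_ij > d2_i or gap * d_ij > d2_j:
        raise ValueError("stability criterion violated: g * d_ij > second-nearest distance")
    if d_ij == 0.0:
        # Perfect match: every factor equals 1 (sketch choice; avoids 0/0 if d2 is also 0).
        return 1.0
    return (1.0 - d_ij) * (1.0 - gap * d_ij / d2_i) * (1.0 - gap * d_ij / d2_j)


print(round(pair_quality(0.1, 0.4, 0.5, gap=2.0), 6))  # 0.27  (= 0.9 * 0.5 * 0.6)
print(pair_quality(0.1, math.inf, math.inf, gap=2.0))   # 0.9   (no second-nearest neighbors)
print(round(pair_quality(0.1, 0.2, 0.5, gap=2.0), 6))   # 0.0   (d_2,i equals g * d_ij)
```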
For the final quality q of the consensus feature produced by merging two paired elements (i and j), the existing quality values of the two elements are taken into account. The final quality is a weighted average of the existing qualities ( \( q_i \) and \( q_j \)) and the quality of the pairing ( \( q_{i,j} \), see above):
\[ q = \frac{q_{i,j} + (s_i - 1) \cdot q_i + (s_j - 1) \cdot q_j}{s_i + s_j - 1} \]
The weighting factors \( s_i \) and \( s_j \) are the sizes (i.e. numbers of sub-elements) of the two consensus features i and j. That way, it is possible to link several feature maps to a growing consensus map in a stepwise fashion (as done by FeatureGroupingAlgorithmUnlabeled). In the end, this yields quality values that incorporate the qualities of all pairings that occurred during the generation of a consensus feature. Note that "missing" elements (if a consensus feature does not contain sub-features from all input maps) are not punished in this definition of quality.
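The weights have a simple reading:

- A consensus feature of size s results from s - 1 pairings.
- Merging i and j adds one more pairing, giving (s_i - 1) + (s_j - 1) + 1 = s_i + s_j - 1 pairings in total.
- If \( q_i \) and \( q_j \) are already averages over their own pairings, q is the mean quality of all pairings.

The sketch below walks through two merge steps. The function name is hypothetical, and the quality of a single feature is set to 1.0 only as a placeholder: its weight is s - 1 = 0, so the value does not matter.

```python
def merged_quality(q_ij: float, q_i: float, s_i: int, q_j: float, s_j: int) -> float:
    """q = (q_ij + (s_i - 1) * q_i + (s_j - 1) * q_j) / (s_i + s_j - 1)."""
    if s_i < 1 or s_j < 1:
        raise ValueError("consensus features contain at least one sub-element")
    return (q_ij + (s_i - 1) * q_i + (s_j - 1) * q_j) / (s_i + s_j - 1)


# Step 1: two single features (s = 1) are paired with quality 0.8.
# Their own qualities get weight 0, so the result is just the pairing quality.
q_12 = merged_quality(0.8, q_i=1.0, s_i=1, q_j=1.0, s_j=1)
print(q_12)  # 0.8

# Step 2: the size-2 consensus feature is paired with a feature from a third map (quality 0.5).
q_123 = merged_quality(0.5, q_i=q_12, s_i=2, q_j=1.0, s_j=1)
print(q_123)  # 0.65  (= (0.5 + 0.8) / 2, the mean of both pairing qualities)
```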
Parameters of this class are:

| Name | Type | Default | Restrictions | Description |
|---|---|---|---|---|
| second_nearest_gap | float | 2.0 | min: 1.0 | Only link features whose distance to the second-nearest neighbors (for both sides) is larger by a factor of 'second_nearest_gap' than the distance between the matched pair itself. |
| use_identifications | string | false | true, false | Never link features that are annotated with different peptides (features without IDs always match; only the best hit per peptide identification is considered). |
| ignore_charge | string | false | true, false | false [default]: pairing requires equal charge state (or at least one unknown charge '0'); true: pairing irrespective of charge state. |
| ignore_adduct | string | true | true, false | false: pairing requires equal adducts (or at least one without adduct annotation); true [default]: pairing irrespective of adducts. |
| distance_RT:max_difference | float | 100.0 | min: 0.0 | Never pair features with a larger RT distance (in seconds). |
| distance_RT:exponent | float | 1.0 | min: 0.0 | Normalized RT differences ([0, 1], relative to 'max_difference') are raised to this power (using 1 or 2 is fast; any other value is much slower). |
| distance_RT:weight | float | 1.0 | min: 0.0 | Final RT distances are weighted by this factor. |
| distance_MZ:max_difference | float | 0.3 | min: 0.0 | Never pair features with a larger m/z distance (unit defined by 'unit'). |
| distance_MZ:unit | string | Da | Da, ppm | Unit of the 'max_difference' parameter. |
| distance_MZ:exponent | float | 2.0 | min: 0.0 | Normalized m/z differences ([0, 1], relative to 'max_difference') are raised to this power (using 1 or 2 is fast; any other value is much slower). |
| distance_MZ:weight | float | 1.0 | min: 0.0 | Final m/z distances are weighted by this factor. |
| distance_intensity:exponent | float | 1.0 | min: 0.0 | Differences in relative intensity ([0, 1]) are raised to this power (using 1 or 2 is fast; any other value is much slower). |
| distance_intensity:weight | float | 0.0 | min: 0.0 | Final intensity distances are weighted by this factor. |
| distance_intensity:log_transform | string | disabled | enabled, disabled | Log-transform intensities? If disabled, d = abs(int_f2 - int_f1) / int_max. If enabled, d = abs(log(int_f2 + 1) - log(int_f1 + 1)) / log(int_max + 1). |
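The distance parameters above can be read as a recipe for the distance between two features. This is a minimal sketch, not the FeatureDistance implementation (see that class for the actual behavior). All names are hypothetical, and the sketch assumes that:

- The weighted terms are divided by the sum of the weights, so the distance of an allowed pair stays in [0, 1].
- `int_max` is the largest intensity used for normalization.
- ppm differences are taken relative to the first feature's m/z.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DimParams:
    """Hypothetical mirror of the distance_RT:* or distance_MZ:* parameters."""
    max_difference: float
    exponent: float = 1.0
    weight: float = 1.0


def dim_term(diff: float, p: DimParams) -> float:
    # Normalize by max_difference (within [0, 1] for an allowed pair),
    # raise to the exponent, then apply the weight.
    return p.weight * (abs(diff) / p.max_difference) ** p.exponent


def intensity_term(int1: float, int2: float, int_max: float, exponent: float,
                   weight: float, log_transform: bool) -> float:
    # Relative intensity difference in [0, 1], following distance_intensity:log_transform.
    if log_transform:
        d = abs(math.log(int2 + 1) - math.log(int1 + 1)) / math.log(int_max + 1)
    else:
        d = abs(int2 - int1) / int_max
    return weight * d ** exponent


def feature_distance(rt1: float, mz1: float, int1: float,
                     rt2: float, mz2: float, int2: float, int_max: float,
                     rt: DimParams = DimParams(100.0, 1.0, 1.0),
                     mz: DimParams = DimParams(0.3, 2.0, 1.0),
                     mz_unit: str = "Da",
                     int_exponent: float = 1.0, int_weight: float = 0.0,
                     log_transform: bool = False) -> Tuple[bool, float]:
    """Return (valid, distance); defaults follow the parameter table."""
    rt_diff = abs(rt1 - rt2)
    mz_diff = abs(mz1 - mz2)
    if mz_unit == "ppm":
        mz_diff = mz_diff / mz1 * 1e6  # assumption: ppm relative to the first m/z
    # A pair is only allowed if both differences are within max_difference.
    valid = rt_diff <= rt.max_difference and mz_diff <= mz.max_difference
    total = (dim_term(rt_diff, rt) + dim_term(mz_diff, mz)
             + intensity_term(int1, int2, int_max, int_exponent, int_weight, log_transform))
    # Assumption: dividing by the summed weights keeps allowed distances in [0, 1].
    return valid, total / (rt.weight + mz.weight + int_weight)


valid, dist = feature_distance(1000.0, 500.0, 1e5, 1030.0, 500.15, 2e5, int_max=1e6)
print(valid, round(dist, 3))  # True 0.275  (= (30/100 + (0.15/0.3)**2) / 2)
```

With the default intensity weight of 0.0, intensities do not affect the distance at all. They only contribute once distance_intensity:weight is raised above zero.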
