MapAlignmentAlgorithmSpectrumAlignment Class Reference

A map alignment algorithm based on spectrum similarity (dynamic programming).

#include <OpenMS/ANALYSIS/MAPMATCHING/MapAlignmentAlgorithmSpectrumAlignment.h>



class  Compare
 Inner class necessary for using the sort algorithm.

Public Member Functions

 MapAlignmentAlgorithmSpectrumAlignment ()
 Default constructor.
 ~MapAlignmentAlgorithmSpectrumAlignment () override
 Destructor.
virtual void align (std::vector< PeakMap > &, std::vector< TransformationDescription > &)
 Align peak maps.
Detailed Description

A map alignment algorithm based on spectrum similarity (dynamic programming).

Parameters of this class are:

gapcost (float, default: 1.0, min: 0.0): The cost of opening a gap in the alignment. A gap means that one spectrum cannot be aligned directly to another spectrum in the map. This happens when the similarity of the two spectra is too low or absent. Think of it as an insertion or deletion of the spectrum in the map (similar to sequence alignment). Gaps are necessary for the alignment: opening a gap may allow another spectrum to be aligned correctly with a higher score than without the gap. Opening a gap is nevertheless a negative event and carries a penalty, so a gap should only be opened if the benefits outweigh the downsides. Give the parameter as a positive number; the implementation converts it to a negative number.
affinegapcost (float, default: 0.5, min: 0.0): The cost of extending an already open gap. The idea behind the affine gap cost is the assumption that one long stretch of connected gaps is better than a structure of gaps interspersed with matches (gap, match, gap, match, etc.). Therefore the penalty for extending a gap should generally be lower than the normal gapcost. If the result of the alignment shows high compression, it is a good idea to lower either the affine gap cost or the gap opening cost.
cutoff_score (float, default: 0.7, min: 0.0, max: 1.0): The threshold used to filter spectra. The spectra that pass are strong candidates for delimiting the interval of a sub-alignment. Only pairs of spectra with a score greater than or equal to the threshold are selected.
bucketsize (int, default: 100, min: 1): The number of buckets. The buckets quantize the interval of the points that define the main alignment (match points). These points have to be filtered to reduce the number of points used for calculating a smoother spline curve.
anchorpoints (int, default: 100, min: 1, max: 100): The percentage of match points that are selected from one bucket. High-scoring pairs are selected first. Reducing the number of match points helps to get a smoother spline curve.
debug (string, default: false, valid: true, false): Activates the debug mode, in which files starting with the prefix debug are written.
mismatchscore (float, default: -5.0, max: 0.0): The score of two spectra that have no similarity to each other.
scorefunction (string, default: SteinScottImproveScore, valid: SteinScottImproveScore, ZhangSimilarityScore): The score function is the core of an alignment. The success of an alignment depends mostly on the selected score function. The score function returns the similarity of two spectra. The scores later determine the possible paths of the traceback. Multiple spectrum similarity scores are available.
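
To make the interplay of gapcost and affinegapcost concrete, the following minimal sketch compares one contiguous gap with several isolated gaps. It assumes the common affine model (opening cost for the first gap position, extension cost for every further position). The helper name gapPenalty and the exact formula are illustrative assumptions and are not taken from the OpenMS implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of OpenMS): penalty of one contiguous gap
// under an assumed affine model. Costs are passed as positive numbers and
// turned into a negative score, as described for gapcost.
float gapPenalty(std::size_t length, float gapcost, float affinegapcost)
{
  assert(gapcost >= 0.0f && affinegapcost >= 0.0f);
  if (length == 0) return 0.0f;  // no gap, no penalty
  return -(gapcost + static_cast<float>(length - 1) * affinegapcost);
}
```

With the default values, gapPenalty(3, 1.0f, 0.5f) gives -2.0 for one gap of length three, while three isolated gaps of length one cost 3 * gapPenalty(1, 1.0f, 0.5f) = -3.0. Under this model the connected gap is therefore preferred.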

Experimental classes:
This algorithm is work in progress and might change.

Constructor & Destructor Documentation

◆ MapAlignmentAlgorithmSpectrumAlignment() [1/2]

Default constructor.

◆ ~MapAlignmentAlgorithmSpectrumAlignment()


◆ MapAlignmentAlgorithmSpectrumAlignment() [2/2]

Copy constructor is not implemented -> private.

Member Function Documentation

◆ affineGapalign_()

void affineGapalign_ ( Size  xbegin,
Size  ybegin,
Size  xend,
Size  yend,
const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::vector< int > &  xcoordinate,
std::vector< float > &  ycoordinate,
std::vector< int > &  xcoordinatepattern )

Alignment with affine gap costs.

This alignment is based on the Needleman-Wunsch algorithm. To improve the time complexity, a banded version, known as k-alignment, is implemented. To save space, the alignment is calculated only from position xbegin to xend of one sequence and from ybegin to yend of the other sequence. The result of the alignment is stored in the second argument. The first sequence is used as the template for the alignment.

Parameters
xbegin: coordinate of the beginning of the template sequence.
ybegin: coordinate of the beginning of the aligned sequence.
xend: coordinate of the end of the template sequence.
yend: coordinate of the end of the aligned sequence.
pattern: template map.
aligned: map to be aligned.
xcoordinate: stores the positions of the anchor points.
ycoordinate: stores the retention times of the anchor points.
xcoordinatepattern: stores the reference positions of the anchor points from the pattern.

Exceptions
Exception::OutOfRange: thrown if an out-of-bounds access occurs in pattern or aligned.
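
The banded alignment with affine gap costs described here can be sketched as follows. This is a minimal, self-contained illustration and not the OpenMS implementation: it works on a precomputed similarity matrix instead of spectra, returns only the best score without a traceback, and uses the three-matrix formulation that is common for affine gaps. The function name bandedAffineScore, the sentinel NEG, and the requirement that the rows index the longer sequence are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch only: global alignment score with affine gaps, restricted to a band.
// sim[i][j] is the similarity of element i of the template and element j of
// the sequence to be aligned. open and extend are positive costs.
float bandedAffineScore(const std::vector<std::vector<float>>& sim,
                        float open, float extend, long k)
{
  const long n = static_cast<long>(sim.size());
  const long m = n > 0 ? static_cast<long>(sim[0].size()) : 0;
  assert(n >= m && k >= 0);  // assumption: the longer sequence indexes the rows
  const float NEG = -1e30f;  // marks cells that are not reachable
  using Matrix = std::vector<std::vector<float>>;
  Matrix M(n + 1, std::vector<float>(m + 1, NEG));  // path ends in a match
  Matrix X = M;  // path ends in a gap that skips a template element
  Matrix Y = M;  // path ends in a gap that skips an element of the other sequence
  M[0][0] = 0.0f;
  for (long i = 0; i <= n; ++i)
  {
    for (long j = 0; j <= m; ++j)
    {
      // Band condition -k <= i - j <= k + n - m; cells outside stay at NEG.
      if (i - j < -k || i - j > k + n - m) continue;
      if (i > 0 && j > 0)
        M[i][j] = std::max({M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]})
                  + sim[i - 1][j - 1];
      if (i > 0)  // open a new gap or extend an existing one
        X[i][j] = std::max(std::max(M[i - 1][j], Y[i - 1][j]) - open,
                           X[i - 1][j] - extend);
      if (j > 0)
        Y[i][j] = std::max(std::max(M[i][j - 1], X[i][j - 1]) - open,
                           Y[i][j - 1] - extend);
    }
  }
  return std::max({M[n][m], X[n][m], Y[n][m]});
}
```

Only cells inside the band are filled, so the work grows with the band width rather than with the full n x m matrix. The sketch still allocates the full matrices for readability. A space-saving variant would store only the band.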

◆ align()

virtual void align ( std::vector< PeakMap > &  ,
std::vector< TransformationDescription > &  )

Align peak maps.

◆ bestk_()

Int bestk_ ( const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::map< Size, std::map< Size, float > > &  buffer,
bool  column_row_orientation,
Size  xbegin,
Size  xend,
Size  ybegin,
Size  yend )

Calculates the size of the band for the alignment of two given sequences.

This function calculates the size of the band for the alignment. It takes three samples from the aligned sequence and tries to find the high-score pairs (by matching against the template sequence). The high-score pair with the worst distance is chosen as the size of k.

Parameters
pattern: vector of pointers of the template sequence.
aligned: vector of pointers of the aligned sequence.
buffer: holds the calculated score of index i,j.
column_row_orientation: indicates the order of the matrix.
xbegin: indicates the beginning of the template sequence.
xend: indicates the end of the template sequence.
ybegin: indicates the beginning of the aligned sequence.
yend: indicates the end of the aligned sequence.
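
The band-size estimate described above can be sketched as follows. The sketch is not the OpenMS code. It assumes that the three samples are the first, middle and last spectrum of the aligned sequence, that a high-score pair is the best-scoring partner at or above a cutoff, and that the "worst distance" is the largest index distance. The name estimateBandSize and the similarity-matrix input are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Sketch only: estimate the band size k from three sampled rows.
// sim[a][p] is the similarity of spectrum a of the aligned sequence and
// spectrum p of the template sequence.
long estimateBandSize(const std::vector<std::vector<float>>& sim, float cutoff)
{
  assert(!sim.empty() && !sim[0].empty());
  const long rows = static_cast<long>(sim.size());
  const long samples[3] = {0, rows / 2, rows - 1};  // assumed sample positions
  long k = 0;
  for (long a : samples)
  {
    const std::vector<float>& row = sim[a];
    // Best-scoring template partner of the sampled spectrum.
    const long p = static_cast<long>(
        std::max_element(row.begin(), row.end()) - row.begin());
    if (row[p] < cutoff) continue;      // not a high-score pair
    k = std::max(k, std::labs(a - p));  // keep the largest index distance
  }
  return k;
}
```

Choosing the largest distance makes the band wide enough to contain every sampled high-score pair, at the price of more cells to compute.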

◆ bucketFilter_()

void bucketFilter_ ( const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::vector< Int > &  xcoordinate,
std::vector< float > &  ycoordinate,
std::vector< Int > &  xcoordinatepattern )

Prepares the data points from which the spline function is later constructed.

This function reduces the number of data values for the next step. The reduction is done with a number of buckets from which the data points are selected. Within each bucket, only a defined number of points is selected and written back as data points. The selection within the buckets is done by score.

Parameters
pattern: template map.
aligned: map to be aligned.
xcoordinate: stores the positions of the anchor points.
ycoordinate: stores the retention times of the anchor points.
xcoordinatepattern: stores the reference positions of the anchor points from the pattern.
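
The bucket-based reduction can be illustrated with a small, self-contained sketch. It is not the OpenMS implementation. The Anchor struct, the function name bucketFilter, the reading of the bucket parameter as a number of buckets, and the rule of keeping at least one point per bucket are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical anchor point: position in the template, retention time, score.
struct Anchor { int x; float rt; float score; };

// Sketch only: split the points into buckets along x and keep the
// best-scoring share (in percent) of every bucket.
std::vector<Anchor> bucketFilter(std::vector<Anchor> points,
                                 std::size_t num_buckets, std::size_t percent)
{
  assert(num_buckets >= 1 && percent >= 1 && percent <= 100);
  std::vector<Anchor> kept;
  if (points.empty()) return kept;
  auto by_x = [](const Anchor& a, const Anchor& b) { return a.x < b.x; };
  std::sort(points.begin(), points.end(), by_x);
  // Points per bucket, rounded up so that all points are covered.
  const std::size_t per_bucket = (points.size() + num_buckets - 1) / num_buckets;
  for (std::size_t begin = 0; begin < points.size(); begin += per_bucket)
  {
    const std::size_t end = std::min(begin + per_bucket, points.size());
    std::vector<Anchor> bucket(points.begin() + begin, points.begin() + end);
    // Highest scores first, then keep the requested share (at least one point).
    std::sort(bucket.begin(), bucket.end(),
              [](const Anchor& a, const Anchor& b) { return a.score > b.score; });
    const std::size_t keep =
        std::max<std::size_t>(1, bucket.size() * percent / 100);
    bucket.erase(bucket.begin() + keep, bucket.end());
    std::sort(bucket.begin(), bucket.end(), by_x);  // restore positional order
    kept.insert(kept.end(), bucket.begin(), bucket.end());
  }
  return kept;
}
```

Fewer, better-scoring anchor points per bucket give the later spline fit less noise to follow, which matches the stated goal of a smoother curve.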

◆ debugFileCreator_()

void debugFileCreator_ ( const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned )

Creates files for debugging.

This function is only active if the debug_ flag is true. debugFileCreator_ creates the following files:

  • debugtraceback.txt(gnuplotScript),
  • debugscoreheatmap.r and
  • debugRscript.

debugscoreheatmap.r contains the scores of the spectra against each other from the alignment, together with the traceback. debugRscript is the R script that reads these data, so both files only work with R. Start R and type main(location of debugscoreheatmap.r). The output is a heatmap of each sub-alignment. debugtraceback.txt shows the path of the traceback and is meant to be plotted with gnuplot.

Parameters
pattern: template map.
aligned: map to be aligned.

◆ debugscoreDistributionCalculation_()

void debugscoreDistributionCalculation_ ( float  score)

Rounds the score of two spectra; only necessary for debugging.

This function rounds the score of two spectra. This is necessary for some functions in debug mode.

◆ insideBand_()

bool insideBand_ ( Size  i,
Size  j,
Size  n,
Size  m,
Int  k_ )

Tests whether cell i,j of the grid is inside the band.

The function returns true if the cell satisfies the condition -k <= i - j <= k + n - m; otherwise it returns false.

Parameters
i: coordinate i.
j: coordinate j.
n: size of column.
m: size of row.
k_: size of k_.
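
The band condition can be written down directly. The sketch below uses signed integers so that the difference i - j may become negative. The free function insideBand and its signed parameter types are assumptions made for this illustration.

```cpp
#include <cassert>

// Sketch of the band test: true if -k <= i - j <= k + n - m.
// Signed types are used on purpose so that i - j cannot wrap around.
bool insideBand(long i, long j, long n, long m, long k)
{
  assert(k >= 0);
  return -k <= i - j && i - j <= k + n - m;
}
```

For n = 5, m = 3 and k = 1 the band covers all cells with -1 <= i - j <= 3. Cell (2, 1) is inside (difference 1), while cells (0, 2) and (4, 0) are outside (differences -2 and 4).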

◆ msFilter_()

void msFilter_ ( PeakMap peakmap,
std::vector< MSSpectrum * > &  spectrum_pointer_container )

Filters the spectra by MS level so that only MS level 1 remains.

The alignment works only on MS level 1 data, so a filter has to be run first.

Parameters
peakmap: map which has to be filtered.
spectrum_pointer_container: output container in which pointers to the MSSpectrum objects are stored (only those with MS level 1).

Exceptions
Exception::IllegalArgument: thrown if no spectra are contained in peakmap.
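
The filtering step can be sketched without OpenMS as follows. The Spectrum struct is a hypothetical stand-in for MSSpectrum, filterMsLevel1 is an illustrative name, and std::invalid_argument stands in for Exception::IllegalArgument. The sketch takes the map by reference so that the stored pointers remain valid after the call.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for MSSpectrum with only the fields needed here.
struct Spectrum { int ms_level; double rt; };

// Sketch only: collect pointers to all MS level 1 spectra of a map.
void filterMsLevel1(std::vector<Spectrum>& map, std::vector<Spectrum*>& out)
{
  assert(out.empty());  // the sketch expects an empty output container
  if (map.empty()) throw std::invalid_argument("no spectra contained in map");
  for (Spectrum& s : map)
  {
    if (s.ms_level == 1) out.push_back(&s);  // keep a pointer, do not copy
  }
}
```

Storing pointers instead of copies keeps the filtered view cheap and lets later steps modify the original spectra, for example to write back new retention times.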

◆ operator=()

Assignment operator is not implemented -> private.

◆ prepareAlign_()

void prepareAlign_ ( const std::vector< MSSpectrum * > &  pattern,
PeakMap aligned,
std::vector< TransformationDescription > &  transformation )

Prepares the sequences for the alignment and internally calls the main alignment function.

This function takes two arguments. These argument types are two MSExperiments. The first argument should already have been filtered so that the sequence contains only MS level 1 spectra. The second argument does not have to fulfill this restriction; it is filtered automatically. With these two arguments, a pre-calculation is done to find corresponding data points (at most 4) for building alignment blocks. After the alignment, a re-transformation is done so that the new retention times appear in the original data.

The parameters are MSExperiments.

Parameters
pattern: template map.
aligned: map which has to be aligned.
transformation: container for rebuilding the alignment only by specific data points.

◆ scoreCalculation_()

float scoreCalculation_ ( Size  i,
Size  j,
Size  patternbegin,
Size  alignbegin,
const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::map< Size, std::map< Size, float > > &  buffer,
bool  column_row_orientation )

Calculates the score of two given MSSpectra; internally calls scoring_.

This function calculates the score of two MSSpectra. The two MSSpectra are chosen by the coordinates i,j, which are indices into the matrix. To find the right index in each sequence, the beginning of each sequence is also passed to the function. A flag indicates the labeling of the axes. The buffer matrix stores the results of the scoring. If the band expands, already known scores are only looked up.

Parameters
i: an index of the matrix.
j: an index of the matrix.
patternbegin: indicates the beginning of the template sequence.
alignbegin: indicates the beginning of the aligned sequence.
pattern: vector of pointers of the template sequence.
aligned: vector of pointers of the aligned sequence.
buffer: holds the calculated score of index i,j.
column_row_orientation: indicates the order of the matrix.
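
The buffering of scores can be sketched as a simple lookup before computation. The sketch uses the same nested map type as the buffer parameter, but it leaves out the sequence offsets and the orientation flag. The name bufferedScore and the callback parameter are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>

using ScoreBuffer = std::map<std::size_t, std::map<std::size_t, float>>;

// Sketch only: return the score of cell (i, j), computing it at most once.
float bufferedScore(std::size_t i, std::size_t j, ScoreBuffer& buffer,
                    const std::function<float(std::size_t, std::size_t)>& score)
{
  assert(static_cast<bool>(score));  // a scoring function must be supplied
  auto row = buffer.find(i);
  if (row != buffer.end())
  {
    auto cell = row->second.find(j);
    if (cell != row->second.end()) return cell->second;  // lookup only
  }
  const float s = score(i, j);  // stands for the spectrum comparison
  buffer[i][j] = s;             // remember the result for later calls
  return s;
}
```

Because a nested map stores only the cells that were actually scored, memory stays proportional to the band, and repeating the alignment with a wider band reuses all previously computed scores.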

◆ scoring_()

float scoring_ ( const MSSpectrum a,
MSSpectrum b )

Returns the score of two given MSSpectra by calling the score function.

◆ updateMembers_()

void updateMembers_ ( )

This method is used to update extra member variables at the end of the setParameters() method.

Also call it at the end of the derived classes' copy constructor and assignment operator.

The default implementation is empty.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ anchorPoints_

Size anchorPoints_

Defines the amount of anchor points which are selected within one bucket.

◆ bucketsize_

Size bucketsize_

Defines the size of one bucket.

◆ c1_

Pointer holds the scoring function, which can be selected.

◆ cutoffScore_

float cutoffScore_

The minimal score to be counted as a mismatch (range 0.0 - 1.0).

◆ debug_

bool debug_

Debug mode flag; default: false.

◆ debugmatrix_

std::vector<std::vector<float> > debugmatrix_

Container holding the score of the matchmatrix and also the insertmatrix.

◆ debugscorematrix_

std::vector<std::vector<float> > debugscorematrix_

Container holding only the scores of the spectra.

◆ debugtraceback_

std::vector<std::pair<float, float> > debugtraceback_

Container holding the path of the traceback.

◆ e_

float e_

Extension cost after a gap is open.

◆ gap_

float gap_

Represents the gap cost for opening or closing a gap in the alignment.

◆ mismatchscore_

float mismatchscore_

Represents the cost of a mismatch in the alignment.

◆ scoredistribution_

std::vector<float> scoredistribution_

Container holding the score of each cell (matchmatrix, insertmatrix, traceback).

◆ threshold_

float threshold_

The minimum score for counting as a match (1 - cutoffScore_).