OpenMS
MapAlignmentAlgorithmSpectrumAlignment Class Reference

A map alignment algorithm based on spectrum similarity (dynamic programming). More...

#include <OpenMS/ANALYSIS/MAPMATCHING/MapAlignmentAlgorithmSpectrumAlignment.h>


Classes

class  Compare
 inner class necessary for using the sort algorithm. More...
 

Public Member Functions

 MapAlignmentAlgorithmSpectrumAlignment ()
 Default constructor. More...
 
 ~MapAlignmentAlgorithmSpectrumAlignment () override
 Destructor. More...
 
virtual void align (std::vector< PeakMap > &, std::vector< TransformationDescription > &)
 Align peak maps. More...
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages. More...
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor. More...
 
virtual ~DefaultParamHandler ()
 Destructor. More...
 
DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator. More...
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator. More...
 
void setParameters (const Param &param)
 Sets the parameters. More...
 
const Param & getParameters () const
 Non-mutable access to the parameters. More...
 
const Param & getDefaults () const
 Non-mutable access to the default parameters. More...
 
const String & getName () const
 Non-mutable access to the name. More...
 
void setName (const String &name)
 Mutable access to the name. More...
 
const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections. More...
 
- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor. More...
 
virtual ~ProgressLogger ()
 Destructor. More...
 
 ProgressLogger (const ProgressLogger &other)
 Copy constructor. More...
 
ProgressLogger & operator= (const ProgressLogger &other)
 Assignment Operator. More...
 
void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE! More...
 
LogType getLogType () const
 Returns the type of progress log being used. More...
 
void startProgress (SignedSize begin, SignedSize end, const String &label) const
 Initializes the progress display. More...
 
void setProgress (SignedSize value) const
 Sets the current progress. More...
 
void endProgress (UInt64 bytes_processed=0) const
 
void nextProgress () const
 increment progress by 1 (according to range begin-end) More...
 

Private Member Functions

 MapAlignmentAlgorithmSpectrumAlignment (const MapAlignmentAlgorithmSpectrumAlignment &)
 Copy constructor is not implemented -> private. More...
 
MapAlignmentAlgorithmSpectrumAlignment & operator= (const MapAlignmentAlgorithmSpectrumAlignment &)
 Assignment operator is not implemented -> private. More...
 
void prepareAlign_ (const std::vector< MSSpectrum * > &pattern, PeakMap &aligned, std::vector< TransformationDescription > &transformation)
 Prepares the sequences for the alignment and internally calls the main alignment function. More...
 
void msFilter_ (PeakMap &peakmap, std::vector< MSSpectrum * > &spectrum_pointer_container)
 Filters the map so that only MS level 1 spectra remain. More...
 
bool insideBand_ (Size i, Size j, Size n, Size m, Int k_)
 Tests whether cell (i, j) of the grid is inside the band. More...
 
Int bestk_ (const std::vector< MSSpectrum * > &pattern, std::vector< MSSpectrum * > &aligned, std::map< Size, std::map< Size, float > > &buffer, bool column_row_orientation, Size xbegin, Size xend, Size ybegin, Size yend)
 Calculates the size of the band for the alignment of two given sequences. More...
 
float scoreCalculation_ (Size i, Size j, Size patternbegin, Size alignbegin, const std::vector< MSSpectrum * > &pattern, std::vector< MSSpectrum * > &aligned, std::map< Size, std::map< Size, float > > &buffer, bool column_row_orientation)
 Calculates the score of two given MSSpectra by calling scoring_ internally. More...
 
float scoring_ (const MSSpectrum &a, MSSpectrum &b)
 Returns the score of two given MSSpectra by calling the score function. More...
 
void affineGapalign_ (Size xbegin, Size ybegin, Size xend, Size yend, const std::vector< MSSpectrum * > &pattern, std::vector< MSSpectrum * > &aligned, std::vector< int > &xcoordinate, std::vector< float > &ycoordinate, std::vector< int > &xcoordinatepattern)
 Affine gap cost alignment. More...
 
void bucketFilter_ (const std::vector< MSSpectrum * > &pattern, std::vector< MSSpectrum * > &aligned, std::vector< Int > &xcoordinate, std::vector< float > &ycoordinate, std::vector< Int > &xcoordinatepattern)
 Prepares the data points used later to construct the spline function. More...
 
void debugFileCreator_ (const std::vector< MSSpectrum * > &pattern, std::vector< MSSpectrum * > &aligned)
 Creates files for the debugging. More...
 
void debugscoreDistributionCalculation_ (float score)
 Rounds the score of two spectra; only necessary for debugging. More...
 
void updateMembers_ () override
 This method is used to update extra member variables at the end of the setParameters() method. More...
 

Private Attributes

float gap_
 Represents the gap cost for opening or closing a gap in the alignment. More...
 
float e_
 Extension cost after a gap is open. More...
 
PeakSpectrumCompareFunctor * c1_
 Pointer to the selected scoring function. More...
 
float cutoffScore_
 Minimal score to be counted as a mismatch (range 0.0 - 1.0). More...
 
Size bucketsize_
 Defines the size of one bucket. More...
 
Size anchorPoints_
 Defines the amount of anchor points which are selected within one bucket. More...
 
bool debug_
 Debug mode flag (default: false). More...
 
float mismatchscore_
 Represents the cost of a mismatch in the alignment. More...
 
float threshold_
 Minimum score for counting as a match (1 - cutoffScore_). More...
 
std::vector< std::vector< float > > debugmatrix_
 Container holding the score of the matchmatrix and also the insertmatrix. More...
 
std::vector< std::vector< float > > debugscorematrix_
 Container holding only the scores of the spectra. More...
 
std::vector< std::pair< float, float > > debugtraceback_
 Container holding the path of the traceback. More...
 
std::vector< float > scoredistribution_
 Container holding the score of each cell (matchmatrix, insertmatrix, traceback). More...
 

Additional Inherited Members

- Public Types inherited from ProgressLogger
enum  LogType { CMD , GUI , NONE }
 Possible log types. More...
 
- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values. More...
 
- Protected Member Functions inherited from DefaultParamHandler
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor. More...
 
- Static Protected Member Functions inherited from ProgressLogger
static String logTypeToFactoryName_ (LogType type)
 Return the name of the factory product used for this log type. More...
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters. More...
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes! More...
 
std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes! More...
 
String error_name_
 Name that is displayed in error messages during the parameter checking. More...
 
bool check_defaults_
 If this member is set to false, no parameter checking is done. More...
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty. More...
 
- Protected Attributes inherited from ProgressLogger
LogType type_
 
time_t last_invoke_
 
ProgressLoggerImpl * current_logger_
 
- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_
 

Detailed Description

A map alignment algorithm based on spectrum similarity (dynamic programming).

Parameters of this class are:

Name | Type | Default | Restrictions | Description
gapcost | float | 1.0 | min: 0.0 | The cost of opening a gap in the alignment. A gap means that a spectrum cannot be aligned directly to another spectrum in the map. This happens when the similarity of the two spectra is too low or absent. Think of it as an insertion or deletion of the spectrum in the map (similar to sequence alignment). Gaps are necessary for aligning: opening a gap may allow another spectrum to be aligned correctly with a higher score than without the gap. However, opening a gap is a negative event and must carry a penalty, so a gap should only be opened if the benefits outweigh the downsides. The parameter is given as a positive number; the implementation converts it to a negative number.
affinegapcost | float | 0.5 | min: 0.0 | The cost of extending an already open gap. The idea behind the affine gap cost is the assumption that one long stretch of connected gaps is preferable to gaps interspersed with matches (gap match gap match etc.). Therefore, the penalty for extending a gap should generally be lower than the normal gapcost. If the result of the alignment shows high compression, it is a good idea to lower either the affine gap cost or the gap opening cost.
cutoff_score | float | 0.7 | min: 0.0 max: 1.0 | Threshold used to filter spectrum pairs. Pairs that pass the filter are strong candidates for delimiting the interval of a sub-alignment. Only pairs of spectra whose score is greater than or equal to the threshold are selected.
bucketsize | int | 100 | min: 1 | Defines the number of buckets. It quantizes the interval of the points that define the main alignment (match points). These points have to be filtered to reduce the number of points used for calculating a smoother spline curve.
anchorpoints | int | 100 | min: 1 max: 100 | Defines the percentage of match points that are selected from one bucket. High-scoring pairs are selected first. Reducing the number of match points helps to obtain a smoother spline curve.
debug | string | false | true, false | Activates the debug mode, in which files starting with the prefix "debug" are written.
mismatchscore | float | -5.0 | max: 0.0 | Defines the score of two spectra that have no similarity to each other.
scorefunction | string | SteinScottImproveScore | SteinScottImproveScore, ZhangSimilarityScore | The score function is the core of an alignment, and the success of an alignment depends mostly on the selected score function. The score function returns the similarity of two spectra. The scores later determine the possible path of the traceback. Multiple spectrum similarity scores are available.
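
The relationship between gapcost and affinegapcost can be illustrated with a little arithmetic. The sketch below assumes one common affine convention (the opening cost is charged for the first gap position, the extension cost for every further position); the helper name affineGapPenalty is hypothetical and not part of OpenMS, and the class's internal bookkeeping may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of OpenMS: penalty of one contiguous gap of
// `length` positions under the assumed convention open + (length - 1) * extend.
double affineGapPenalty(std::size_t length, double gap_open, double gap_extend)
{
  assert(gap_open >= 0.0 && gap_extend >= 0.0); // costs are given as positive numbers
  if (length == 0)
  {
    return 0.0; // no gap, no penalty
  }
  return gap_open + static_cast<double>(length - 1) * gap_extend;
}
```

Under this convention and the default values (gapcost 1.0, affinegapcost 0.5), one connected gap of length 4 costs 2.5, whereas four isolated gaps of length 1 cost 4.0, which is why a long stretch of gaps is favored over gaps interspersed with matches.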

Experimental classes:
This algorithm is work in progress and might change.

Constructor & Destructor Documentation

◆ MapAlignmentAlgorithmSpectrumAlignment() [1/2]

Default constructor.

◆ ~MapAlignmentAlgorithmSpectrumAlignment()

Destructor.

◆ MapAlignmentAlgorithmSpectrumAlignment() [2/2]

Copy constructor is not implemented -> private.

Member Function Documentation

◆ affineGapalign_()

void affineGapalign_ ( Size  xbegin,
Size  ybegin,
Size  xend,
Size  yend,
const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::vector< int > &  xcoordinate,
std::vector< float > &  ycoordinate,
std::vector< int > &  xcoordinatepattern 
)
private

Affine gap cost alignment.

This alignment is based on the Needleman-Wunsch algorithm. To improve the time complexity, a banded version known as k-alignment is implemented. To save space, the alignment is computed only from position xbegin to xend of one sequence and from ybegin to yend of the other. The result of the alignment is stored in the second sequence (aligned); the first sequence (pattern) serves as the template for the alignment.

Parameters
xbegin              coordinate for the beginning of the template sequence.
ybegin              coordinate for the beginning of the aligned sequence.
xend                coordinate for the end of the template sequence.
yend                coordinate for the end of the aligned sequence.
pattern             template map.
aligned             map to be aligned.
xcoordinate         stores the positions of the anchor points
ycoordinate         stores the retention times of the anchor points
xcoordinatepattern  stores the reference positions of the anchor points from the pattern
Exceptions
Exception::OutOfRange  if an index into pattern or aligned is out of bounds
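
The general technique described above can be sketched independently of OpenMS. The following is a minimal, score-only banded global alignment with affine gap costs (Gotoh-style recurrences) on a precomputed similarity matrix. All names (Matrix, inBand, bandedAffineScore) are hypothetical, the band is assumed to be a corridor of width k around the diagonal joining the two corners, and no traceback or anchor-point extraction is performed; it is not the OpenMS implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Assumed band shape: within k of the corridor joining cells (0, 0) and (n, m).
bool inBand(long i, long j, long n, long m, long k)
{
  const long d = i - j;
  return d >= std::min(0L, n - m) - k && d <= std::max(0L, n - m) + k;
}

// Global alignment score with affine gaps, restricted to the band.
// sim[i][j] is the similarity of template element i and aligned element j;
// gap_open and gap_extend are positive costs that are subtracted.
double bandedAffineScore(const Matrix& sim, double gap_open, double gap_extend, long k)
{
  const long n = static_cast<long>(sim.size());
  assert(n > 0 && k >= 0);
  const long m = static_cast<long>(sim[0].size());
  const double NEG = -std::numeric_limits<double>::infinity();

  // M: path ends in a match; X: ends in a gap in the aligned sequence;
  // Y: ends in a gap in the template. Cells outside the band stay at -infinity.
  Matrix M(n + 1, std::vector<double>(m + 1, NEG)), X = M, Y = M;
  M[0][0] = 0.0;
  for (long i = 1; i <= n && inBand(i, 0, n, m, k); ++i)
  {
    X[i][0] = -gap_open - (i - 1) * gap_extend;
  }
  for (long j = 1; j <= m && inBand(0, j, n, m, k); ++j)
  {
    Y[0][j] = -gap_open - (j - 1) * gap_extend;
  }

  for (long i = 1; i <= n; ++i)
  {
    for (long j = 1; j <= m; ++j)
    {
      if (!inBand(i, j, n, m, k)) continue; // skip cells outside the band
      M[i][j] = std::max({M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]}) + sim[i - 1][j - 1];
      X[i][j] = std::max({M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open});
      Y[i][j] = std::max({M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open});
    }
  }
  return std::max({M[n][m], X[n][m], Y[n][m]});
}
```

Only cells inside the band are evaluated, so the work grows with the band width rather than with the full n x m grid; the price is that alignments leaving the band cannot be found.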

◆ align()

virtual void align ( std::vector< PeakMap > &  ,
std::vector< TransformationDescription > &   
)
virtual

Align peak maps.

◆ bestk_()

Int bestk_ ( const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::map< Size, std::map< Size, float > > &  buffer,
bool  column_row_orientation,
Size  xbegin,
Size  xend,
Size  ybegin,
Size  yend 
)
private

Calculates the size of the band for the alignment of two given sequences.

This function calculates the size of the band for the alignment. It takes three samples from the aligned sequence and searches for high-scoring pairs by matching them against the template sequence. The high-scoring pair with the worst (largest) distance determines the size of k.

Parameters
pattern                 vector of pointers of the template sequence
aligned                 vector of pointers of the aligned sequence
buffer                  holds the calculated score of index i,j.
column_row_orientation  indicates the order of the matrix
xbegin                  indicates the beginning of the template sequence
xend                    indicates the end of the template sequence
ybegin                  indicates the beginning of the aligned sequence
yend                    indicates the end of the aligned sequence
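
The sampling idea can be sketched as follows. This is an illustration under stated assumptions, not the OpenMS code: the function name estimateBand, the sample positions (1/4, 1/2 and 3/4 of the aligned sequence), the score callback and the threshold argument are all hypothetical choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical sketch: estimate the band size k from three sampled positions.
// score(i, j) returns the similarity of template element i and aligned element j.
long estimateBand(std::size_t n_template, std::size_t n_aligned,
                  const std::function<double(std::size_t, std::size_t)>& score,
                  double threshold)
{
  assert(n_template > 0 && n_aligned > 0);
  long k = 0;
  for (std::size_t s = 1; s <= 3; ++s)
  {
    const std::size_t j = s * n_aligned / 4; // assumed sample positions
    long best_i = -1;
    double best = 0.0;
    // find the best-scoring template partner for sample j
    for (std::size_t i = 0; i < n_template; ++i)
    {
      const double sc = score(i, j);
      if (sc >= threshold && (best_i < 0 || sc > best))
      {
        best = sc;
        best_i = static_cast<long>(i);
      }
    }
    // the pair that lies farthest from the diagonal determines k
    if (best_i >= 0)
    {
      k = std::max(k, std::labs(best_i - static_cast<long>(j)));
    }
  }
  return k;
}
```

If no sample finds a partner above the threshold, this sketch returns 0; a real implementation would need a fallback for that case.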

◆ bucketFilter_()

void bucketFilter_ ( const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::vector< Int > &  xcoordinate,
std::vector< float > &  ycoordinate,
std::vector< Int > &  xcoordinatepattern 
)
private

Prepares the data points used later to construct the spline function.

This function reduces the number of data points for the next step. The data points are distributed over a number of buckets, and within each bucket only a defined number of points is selected and written back. The selection within a bucket is based on score.

Parameters
pattern             template map.
aligned             map to be aligned.
xcoordinate         stores the positions of the anchor points
ycoordinate         stores the retention times of the anchor points
xcoordinatepattern  stores the reference positions of the anchor points from the pattern
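
A bucket-based reduction of this kind might look like the sketch below. The Anchor struct, the function name filterAnchors, the interpretation of bucket_size as "points per bucket" and the rounding rule are assumptions made for illustration only; the actual member function works on the coordinate vectors listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical anchor point: position, retention time and similarity score.
struct Anchor
{
  int x;
  float rt;
  float score;
};

// Keep only the best-scoring `percent` of the points in each bucket.
// `points` is assumed to be ordered by position x.
std::vector<Anchor> filterAnchors(const std::vector<Anchor>& points,
                                  std::size_t bucket_size, std::size_t percent)
{
  assert(bucket_size >= 1 && percent >= 1 && percent <= 100);
  std::vector<Anchor> kept;
  for (std::size_t begin = 0; begin < points.size(); begin += bucket_size)
  {
    const std::size_t end = std::min(begin + bucket_size, points.size());
    std::vector<Anchor> bucket(points.begin() + begin, points.begin() + end);
    // number of points to keep, rounded up so every bucket contributes
    const std::size_t keep = (bucket.size() * percent + 99) / 100;
    std::partial_sort(bucket.begin(), bucket.begin() + keep, bucket.end(),
                      [](const Anchor& a, const Anchor& b) { return a.score > b.score; });
    bucket.erase(bucket.begin() + keep, bucket.end());
    // restore positional order before writing the points back
    std::sort(bucket.begin(), bucket.end(),
              [](const Anchor& a, const Anchor& b) { return a.x < b.x; });
    kept.insert(kept.end(), bucket.begin(), bucket.end());
  }
  return kept;
}
```

Selecting per bucket rather than globally keeps the surviving points spread over the whole retention time range, which is what a spline fit needs.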

◆ debugFileCreator_()

void debugFileCreator_ ( const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned 
)
private

Creates files for debugging.

This function is only active if the debug_ flag is true. debugFileCreator_ creates the following files:

  • debugtraceback.txt (gnuplot script),
  • debugscoreheatmap.r and
  • debugRscript.

debugscoreheatmap.r contains the scores of the spectra against each other from the alignment, as well as the traceback. debugRscript is the R script that reads these data, so both files work only with R. Start R and type main(location of debugscoreheatmap.r). The output is a heatmap of each sub-alignment. debugtraceback.txt shows the path of the traceback and is plotted with gnuplot.

Parameters
patterntemplate map.
alignedmap to be aligned.

◆ debugscoreDistributionCalculation_()

void debugscoreDistributionCalculation_ ( float  score)
private

Rounds the score of two spectra; only necessary for debugging.

This function rounds the score of two spectra. This is needed by some functions in debug mode.

◆ insideBand_()

bool insideBand_ ( Size  i,
Size  j,
Size  n,
Size  m,
Int  k_ 
)
private

Tests whether cell (i, j) of the grid is inside the band.

The function returns true if the cell satisfies -k <= i - j <= k + n - m, and false otherwise.

Parameters
i   coordinate i
j   coordinate j
n   size of column
m   size of row
k_  size of k_
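
The band condition can be written down directly. The sketch below uses a hypothetical free function insideBand with signed arguments; the member function takes unsigned Size values, for which a plain i - j would wrap around whenever j > i, so signed arithmetic is used here to keep the inequality literal. How the OpenMS implementation handles this is not shown on this page.

```cpp
#include <cassert>

// Sketch of the band test -k <= i - j <= k + n - m with signed arithmetic.
bool insideBand(long i, long j, long n, long m, long k)
{
  assert(k >= 0);      // the band width cannot be negative
  const long d = i - j; // signed offset from the main diagonal
  return -k <= d && d <= k + n - m;
}
```

For a square grid (n == m) the band is symmetric around the main diagonal; for n > m the upper bound is widened by the difference in size.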

◆ msFilter_()

void msFilter_ ( PeakMap &  peakmap,
std::vector< MSSpectrum * > &  spectrum_pointer_container 
)
private

Filters the map so that only MS level 1 spectra remain.

The alignment works only on MS level 1 data, so a filter has to be run first.

Parameters
peakmap                     map which has to be filtered
spectrum_pointer_container  output container in which pointers to the MS level 1 spectra are stored
Exceptions
Exception::IllegalArgument  is thrown if no spectra are contained in peakmap
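
The filtering step can be pictured with a small stand-alone sketch. The Spectrum struct is a hypothetical stand-in for MSSpectrum, filterMs1 is not an OpenMS function, and std::invalid_argument takes the place of Exception::IllegalArgument.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for MSSpectrum: only the fields needed here.
struct Spectrum
{
  unsigned ms_level;
  double rt;
};

// Collect pointers to all MS level 1 spectra; throws if the map is empty.
std::vector<Spectrum*> filterMs1(std::vector<Spectrum>& map)
{
  if (map.empty())
  {
    throw std::invalid_argument("no spectra contained in map");
  }
  std::vector<Spectrum*> ms1;
  for (Spectrum& s : map)
  {
    if (s.ms_level == 1)
    {
      ms1.push_back(&s); // store a pointer, the spectrum itself is not copied
    }
  }
  assert(ms1.size() <= map.size()); // sanity check: a filter never adds elements
  return ms1;
}
```

Because only pointers are stored, the container must not outlive the map it points into, and the map must not be resized while the pointers are in use.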

◆ operator=()

Assignment operator is not implemented -> private.

◆ prepareAlign_()

void prepareAlign_ ( const std::vector< MSSpectrum * > &  pattern,
PeakMap &  aligned,
std::vector< TransformationDescription > &  transformation 
)
private

Prepares the sequences for the alignment and internally calls the main alignment function.

This function takes two maps (MSExperiments). The first (pattern) must already be filtered so that it contains only MS level 1 spectra. The second (aligned) does not have to fulfill this restriction; it is filtered automatically. From these two maps, a pre-calculation finds corresponding data points (at most 4), which are used to build alignment blocks. After the alignment, a back-transformation is applied so that the new retention times appear in the original data.

Parameters
pattern         template map.
aligned         map which has to be aligned.
transformation  container for rebuilding the alignment only by specific data points

◆ scoreCalculation_()

float scoreCalculation_ ( Size  i,
Size  j,
Size  patternbegin,
Size  alignbegin,
const std::vector< MSSpectrum * > &  pattern,
std::vector< MSSpectrum * > &  aligned,
std::map< Size, std::map< Size, float > > &  buffer,
bool  column_row_orientation 
)
private

Calculates the score of two given MSSpectra by calling scoring_ internally.

This function calculates the score of two MSSpectra. The two spectra are selected by the coordinates i and j, which are indices into the matrix. To find the corresponding indices in the sequences, the beginning of each sequence is also passed to the function. A flag indicates the labeling of the axes. The buffer matrix stores the results of the scoring, so that if the band expands, known scores only need to be looked up.

Parameters
i                       is an index from the matrix.
j                       is an index from the matrix.
patternbegin            indicates the beginning of the template sequence
alignbegin              indicates the beginning of the aligned sequence
pattern                 vector of pointers of the template sequence
aligned                 vector of pointers of the aligned sequence
buffer                  holds the calculated score of index i,j.
column_row_orientation  indicates the order of the matrix
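
The buffering described above is a form of memoization. The sketch below shows the idea with the same nested-map buffer type; the function name cachedScore, the score callback, the meaning assigned to the orientation flag and the 1-based matrix indexing (row and column 0 reserved for the gap border) are assumptions for illustration and may not match the OpenMS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>

using ScoreBuffer = std::map<std::size_t, std::map<std::size_t, float>>;

// Hypothetical sketch: return the score for matrix cell (i, j), computing it
// only on a cache miss. score(p, a) compares template spectrum p with aligned
// spectrum a.
float cachedScore(std::size_t i, std::size_t j,
                  std::size_t pattern_begin, std::size_t align_begin,
                  ScoreBuffer& buffer, bool column_row_orientation,
                  const std::function<float(std::size_t, std::size_t)>& score)
{
  assert(i >= 1 && j >= 1); // assumed: index 0 is the gap border of the matrix
  // Assumed meaning of the flag: it tells which matrix axis carries the template.
  const std::size_t p = pattern_begin + (column_row_orientation ? j : i) - 1;
  const std::size_t a = align_begin + (column_row_orientation ? i : j) - 1;

  auto row = buffer.find(p);
  if (row != buffer.end())
  {
    auto cell = row->second.find(a);
    if (cell != row->second.end())
    {
      return cell->second; // known score: lookup only
    }
  }
  const float s = score(p, a); // cache miss: compute and remember
  buffer[p][a] = s;
  return s;
}
```

Keying the buffer by sequence indices rather than matrix indices means a score stays valid when the band is widened and the same pair of spectra is visited again.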

◆ scoring_()

float scoring_ ( const MSSpectrum &  a,
MSSpectrum &  b 
)
private

Returns the score of two given MSSpectra by calling the score function.

◆ updateMembers_()

void updateMembers_ ( )
override private virtual

This method is used to update extra member variables at the end of the setParameters() method.

Also call it at the end of the derived classes' copy constructor and assignment operator.

The default implementation is empty.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ anchorPoints_

Size anchorPoints_
private

Defines the amount of anchor points which are selected within one bucket.

◆ bucketsize_

Size bucketsize_
private

Defines the size of one bucket.

◆ c1_

Pointer to the selected scoring function.

◆ cutoffScore_

float cutoffScore_
private

Minimal score to be counted as a mismatch (range 0.0 - 1.0).

◆ debug_

bool debug_
private

Debug mode flag (default: false).

◆ debugmatrix_

std::vector<std::vector<float> > debugmatrix_
private

Container holding the score of the matchmatrix and also the insertmatrix.

◆ debugscorematrix_

std::vector<std::vector<float> > debugscorematrix_
private

Container holding only the scores of the spectra.

◆ debugtraceback_

std::vector<std::pair<float, float> > debugtraceback_
private

Container holding the path of the traceback.

◆ e_

float e_
private

Extension cost after a gap is open.

◆ gap_

float gap_
private

Represents the gap cost for opening or closing a gap in the alignment.

◆ mismatchscore_

float mismatchscore_
private

Represents the cost of a mismatch in the alignment.

◆ scoredistribution_

std::vector<float> scoredistribution_
private

Container holding the score of each cell (matchmatrix, insertmatrix, traceback).

◆ threshold_

float threshold_
private

Minimum score for counting as a match (1 - cutoffScore_).