OpenMS
SimpleSVM Class Reference

Simple interface to support vector machines for classification and regression (via LIBSVM).

#include <OpenMS/ANALYSIS/SVM/SimpleSVM.h>

Classes

struct  Prediction
 SVM/SVR prediction result.


Public Types

typedef std::map< String, std::vector< double > > PredictorMap
 Mapping from predictor name to vector of predictor values.

typedef std::map< String, std::pair< double, double > > ScaleMap
 Mapping from predictor name to predictor min and max.
 

Public Member Functions

 SimpleSVM ()
 Default constructor.

 ~SimpleSVM () override
 Destructor.

void setup (PredictorMap &predictors, const std::map< Size, double > &outcomes, bool classification=true)
 Load data and train a model.

void predict (std::vector< Prediction > &predictions, std::vector< Size > indexes=std::vector< Size >()) const
 Predict class labels or regression values (and probabilities).

void predict (PredictorMap &predictors, std::vector< Prediction > &predictions) const
 Predict class labels or regression values (and probabilities).

void getFeatureWeights (std::map< String, double > &feature_weights) const
 Get the weights used for features (predictors) in the SVM model.

void writeXvalResults (const String &path) const
 Write cross-validation (parameter optimization) results to a CSV file.

const ScaleMap & getScaling () const
 Get data range of predictors before scaling to [0, 1].
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const String &name)
 Constructor with name that is displayed in error messages.

 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.

virtual ~DefaultParamHandler ()
 Destructor.

DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.

virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.

void setParameters (const Param &param)
 Sets the parameters.

const Param & getParameters () const
 Non-mutable access to the parameters.

const Param & getDefaults () const
 Non-mutable access to the default parameters.

const String & getName () const
 Non-mutable access to the name.

void setName (const String &name)
 Mutable access to the name.

const std::vector< String > & getSubsections () const
 Non-mutable access to the registered subsections.
 

Protected Types

typedef std::vector< std::vector< std::vector< double > > > SVMPerformance
 Classification (or regression) performance for different parameter combinations (C/gamma/p).
 

Protected Member Functions

void clear_ ()

void scaleData_ (PredictorMap &predictors)
 Scale predictor values to range 0-1.

void convertData_ (const PredictorMap &predictors)
 Convert predictors to LIBSVM format.

std::tuple< double, double, double > chooseBestParameters_ (bool higher_better) const
 Choose best SVM parameters based on cross-validation results.

void optimizeParameters_ (bool classification)
 Run cross-validation to optimize SVM parameters.

- Protected Member Functions inherited from DefaultParamHandler
virtual void updateMembers_ ()
 This method is used to update extra member variables at the end of the setParameters() method.

void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.
 

Static Protected Member Functions

static void printNull_ (const char *)
 Dummy function to suppress LIBSVM output.
 

Protected Attributes

std::vector< std::vector< struct svm_node > > nodes_
 Values of predictors (LIBSVM format)

struct svm_problem data_
 SVM training data (LIBSVM format)

struct svm_parameter svm_params_
 SVM parameters (LIBSVM format)

struct svm_model * model_
 Pointer to SVM model (LIBSVM format)

std::vector< String > predictor_names_
 Names of predictors in the model (excluding uninformative ones)

Size n_parts_
 Number of partitions for cross-validation.

std::vector< double > log2_C_
 Parameter values to try during optimization.

std::vector< double > log2_gamma_

std::vector< double > log2_p_

ScaleMap scaling_
 Mapping from predictor name to predictor min and max.

SVMPerformance performance_
 Cross-validation results.

- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.

Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!

std::vector< String > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!

String error_name_
 Name that is displayed in error messages during the parameter checking.

bool check_defaults_
 If this member is set to false, no checking of parameters is done.

bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="")
 Writes all parameters to meta values.
 

Detailed Description

Simple interface to support vector machines for classification and regression (via LIBSVM).

This class supports:

  • (multi-class) classification and regression with an epsilon-SVR.
  • linear or RBF kernel.

It uses cross-validation to optimize the respective SVM/SVR parameters: C, p (SVR only), and gamma (RBF kernel only).

Usage: SVM models are generated by the setup() method. For convenience, SimpleSVM supports two common use cases:

  • all data: Passing all data (training + test) as predictors to setup() and training on a subset.
  • training only: Passing exclusively training data as predictors to setup().

In both cases, the outcomes parameter of setup() defines the training set; it contains the indexes of observations (corresponding to positions in the vectors in predictors) together with the class labels (or regression values) for training.

Given N observations of M predictors, the data are coded as a map of predictors (size M), each a numeric vector of values for different observations (size N).
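
The layout described above can be sketched with standard-library stand-ins. This is an illustration only, not OpenMS code: the PredictorMap alias below uses std::string in place of OpenMS's String, the helpers numObservations() and layoutIsValid() are hypothetical, and the predictor names and values are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Stand-in for SimpleSVM::PredictorMap (std::string replaces OpenMS::String).
using PredictorMap = std::map<std::string, std::vector<double>>;

// N = length of the first predictor vector; the map must not be empty.
std::size_t numObservations(const PredictorMap& predictors)
{
  assert(!predictors.empty());
  return predictors.begin()->second.size();
}

// Hypothetical check: true if all M vectors have the same length N and
// every training index in `outcomes` refers to an existing observation.
bool layoutIsValid(const PredictorMap& predictors,
                   const std::map<std::size_t, double>& outcomes)
{
  if (predictors.empty()) return false;
  const std::size_t n_obs = numObservations(predictors);
  for (const auto& entry : predictors)
  {
    if (entry.second.size() != n_obs) return false;
  }
  for (const auto& outcome : outcomes)
  {
    if (outcome.first >= n_obs) return false;
  }
  return true;
}

// Made-up example: M = 2 predictors, N = 4 observations.
PredictorMap examplePredictors()
{
  return {{"mass_error", {0.1, 0.4, 0.2, 0.9}},
          {"rt_deviation", {1.5, 3.0, 2.2, 7.1}}};
}

// Observations 0, 1 and 3 form the training set (labels 1, 1 and 0);
// observation 2 has no label and would only be used for prediction.
std::map<std::size_t, double> exampleOutcomes()
{
  return {{0, 1.0}, {1, 1.0}, {3, 0.0}};
}
```

In this sketch, an index in outcomes that is not smaller than N makes the layout invalid, which corresponds to the "invalid index" condition documented for setup().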

To predict class labels (or regression values) based on a model, use one of the predict() methods:

  • if set up with all data: pass a vector of observation indexes as the indexes parameter of predict(); predictions are made for these observations. (With an empty vector, the default, predictions are made for all observations, including those used for training.)
  • if set up with training data only: pass a new PredictorMap to predict().

Parameters of this class are:

Name | Type | Default | Restrictions | Description
kernel | string | RBF | RBF, linear | SVM kernel
xval | int | 5 | min: 1 | Number of partitions for cross-validation (parameter optimization)
log2_C | float list | [-5.0, -3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0] |  | Values to try for the SVM parameter 'C' during parameter optimization. A value 'x' is used as 'C = 2^x'.
log2_gamma | float list | [-15.0, -13.0, -11.0, -9.0, -7.0, -5.0, -3.0, -1.0, 1.0, 3.0] |  | Values to try for the SVM parameter 'gamma' during parameter optimization (RBF kernel only). A value 'x' is used as 'gamma = 2^x'.
log2_p | float list | [-15.0, -12.0, -9.0, -6.0, -3.32192809489, 0.0, 3.32192809489, 6.0, 9.0, 12.0, 15.0] |  | Values to try for the SVM parameter 'epsilon' during parameter optimization (epsilon-SVR only). A value 'x' is used as 'epsilon = 2^x'.
epsilon | float | 1.0e-03 | min: 0.0 | Stopping criterion
cache_size | float | 100.0 | min: 1.0 | Size of the kernel cache (in MB)
no_shrinking | string | false | true, false | Disable the shrinking heuristics
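
The log2_* parameters hold exponents, not the values themselves. The following sketch shows the conversion 'value = 2^x' stated in the table; expandLog2Grid() is a hypothetical helper name, not part of SimpleSVM. Note that the log2_p defaults of -3.32192809489 and 3.32192809489 correspond to epsilon values of approximately 0.1 and 10.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert grid entries x (log2 scale) into the values 2^x that are tried
// during parameter optimization.
std::vector<double> expandLog2Grid(const std::vector<double>& log2_values)
{
  std::vector<double> values;
  values.reserve(log2_values.size());
  for (double x : log2_values)
  {
    const double value = std::exp2(x);
    // Sanity check for this sketch: exponents far outside the default
    // ranges would overflow to infinity or underflow to zero.
    assert(std::isfinite(value) && value > 0.0);
    values.push_back(value);
  }
  return values;
}
```

For example, the first three default entries of log2_C (-5.0, -3.0, -1.0) expand to C = 0.03125, 0.125 and 0.5.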

Class Documentation

◆ OpenMS::SimpleSVM::Prediction

struct OpenMS::SimpleSVM::Prediction

SVM/SVR prediction result.

Class Members
double outcome Predicted class label (or regression value)
map< double, double > probabilities Class labels (or regression values) and their predicted probabilities.

Member Typedef Documentation

◆ PredictorMap

typedef std::map<String, std::vector<double> > PredictorMap

Mapping from predictor name to vector of predictor values.

◆ ScaleMap

typedef std::map<String, std::pair<double, double> > ScaleMap

Mapping from predictor name to predictor min and max.

◆ SVMPerformance

typedef std::vector<std::vector<std::vector<double> > > SVMPerformance
protected

Classification (or regression) performance for different parameter combinations (C/gamma/p).

Constructor & Destructor Documentation

◆ SimpleSVM()

SimpleSVM ( )

Default constructor.

◆ ~SimpleSVM()

~SimpleSVM ( )
override

Destructor.

Member Function Documentation

◆ chooseBestParameters_()

std::tuple<double, double, double> chooseBestParameters_ ( bool  higher_better) const
protected

Choose best SVM parameters based on cross-validation results.
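
The selection step can be illustrated as a search over a three-dimensional performance grid. This is a sketch under assumptions, not the OpenMS implementation: the grid is assumed to be indexed as [C][gamma][p], ties are resolved in favor of the first entry encountered, and bestGridPoint() is a hypothetical name that returns grid indexes rather than parameter values. The higher_better flag covers both directions, e.g. an accuracy-like score (higher is better) versus an error-like score (lower is better).

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Stand-in for the protected typedef SimpleSVM::SVMPerformance.
using SVMPerformance = std::vector<std::vector<std::vector<double>>>;

// Return the grid indexes (i_C, i_gamma, i_p) of the best entry.
// Assumption: performance[i_C][i_gamma][i_p]; the grid must not be empty.
std::tuple<std::size_t, std::size_t, std::size_t>
bestGridPoint(const SVMPerformance& performance, bool higher_better)
{
  assert(!performance.empty() && !performance[0].empty() &&
         !performance[0][0].empty());
  std::tuple<std::size_t, std::size_t, std::size_t> best{0, 0, 0};
  double best_value = performance[0][0][0];
  for (std::size_t i = 0; i < performance.size(); ++i)
  {
    for (std::size_t j = 0; j < performance[i].size(); ++j)
    {
      for (std::size_t k = 0; k < performance[i][j].size(); ++k)
      {
        const double value = performance[i][j][k];
        const bool better = higher_better ? (value > best_value)
                                          : (value < best_value);
        if (better)
        {
          best_value = value;
          best = std::make_tuple(i, j, k);
        }
      }
    }
  }
  return best;
}
```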

◆ clear_()

void clear_ ( )
protected

◆ convertData_()

void convertData_ ( const PredictorMap &  predictors)
protected

Convert predictors to LIBSVM format.
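
LIBSVM represents each observation as an array of (index, value) nodes that ends with a node whose index is -1. The sketch below shows what a conversion from column-wise predictors to this row-wise format could look like. It is not the OpenMS implementation: Node is a local mirror of LIBSVM's struct svm_node so that the example compiles without LIBSVM, toNodes() is a hypothetical name, and storing every value (including zeros) with feature indexes starting at 1 is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Local mirror of LIBSVM's `struct svm_node` (int index; double value).
struct Node
{
  int index;
  double value;
};

// Stand-in for SimpleSVM::PredictorMap (std::string replaces OpenMS::String).
using PredictorMap = std::map<std::string, std::vector<double>>;

// Transpose M predictor vectors of length N into N node vectors, each
// holding M nodes plus the terminating node with index -1.
std::vector<std::vector<Node>> toNodes(const PredictorMap& predictors)
{
  assert(!predictors.empty());
  const std::size_t n_obs = predictors.begin()->second.size();
  std::vector<std::vector<Node>> nodes(n_obs);
  int feature_index = 1; // assumption: 1-based feature indexes
  for (const auto& entry : predictors)
  {
    assert(entry.second.size() == n_obs);
    for (std::size_t obs = 0; obs < n_obs; ++obs)
    {
      nodes[obs].push_back(Node{feature_index, entry.second[obs]});
    }
    ++feature_index;
  }
  for (auto& row : nodes)
  {
    row.push_back(Node{-1, 0.0}); // end-of-vector marker expected by LIBSVM
  }
  return nodes;
}
```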

◆ getFeatureWeights()

void getFeatureWeights ( std::map< String, double > &  feature_weights) const

Get the weights used for features (predictors) in the SVM model.

Currently only supported for two-class classification. If a linear kernel is used, the weights are informative for ranking features.

Exceptions
Exception::Precondition if no model has been trained, or if the classification involves more than two classes
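
For a linear kernel, the decision function has the form f(x) = w·x + b, where the weight vector w is the sum of the support vectors, each multiplied by its coefficient (alpha_i * y_i). This is why the weights can be used to rank features in that case. The following sketch computes such a weight vector from plain data; whether getFeatureWeights() computes its result in exactly this way is an assumption, and SupportVector and linearFeatureWeights() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical plain-data view of one support vector of a two-class model.
struct SupportVector
{
  double coefficient;         // alpha_i * y_i
  std::vector<double> values; // predictor values, in the order of `names`
};

// w_j = sum over support vectors i of coefficient_i * values_i[j]
std::map<std::string, double>
linearFeatureWeights(const std::vector<std::string>& names,
                     const std::vector<SupportVector>& support_vectors)
{
  std::map<std::string, double> weights;
  for (const std::string& name : names) weights[name] = 0.0;
  for (const SupportVector& sv : support_vectors)
  {
    assert(sv.values.size() == names.size());
    for (std::size_t j = 0; j < names.size(); ++j)
    {
      weights[names[j]] += sv.coefficient * sv.values[j];
    }
  }
  return weights;
}
```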

◆ getScaling()

const ScaleMap& getScaling ( ) const

Get data range of predictors before scaling to [0, 1].

◆ optimizeParameters_()

void optimizeParameters_ ( bool  classification)
protected

Run cross-validation to optimize SVM parameters.
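
Conceptually, the optimization evaluates every combination of the candidate values for C, gamma and p and records one performance value per combination. The sketch below shows such a grid evaluation in isolation. It is not the OpenMS implementation: evaluateGrid() and Scorer are hypothetical names, the [C][gamma][p] index order is an assumption, and the scorer callback stands in for an actual cross-validation run on the training data.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Stand-in for the protected typedef SimpleSVM::SVMPerformance.
using SVMPerformance = std::vector<std::vector<std::vector<double>>>;

// Placeholder for "run cross-validation with these parameters and return
// the resulting performance value".
using Scorer = std::function<double(double C, double gamma, double p)>;

// Evaluate all combinations of 2^x values from the three log2 grids.
// Assumption: results are stored as performance[i_C][i_gamma][i_p].
SVMPerformance evaluateGrid(const std::vector<double>& log2_C,
                            const std::vector<double>& log2_gamma,
                            const std::vector<double>& log2_p,
                            const Scorer& score)
{
  SVMPerformance performance(
      log2_C.size(),
      std::vector<std::vector<double>>(
          log2_gamma.size(), std::vector<double>(log2_p.size(), 0.0)));
  for (std::size_t i = 0; i < log2_C.size(); ++i)
  {
    for (std::size_t j = 0; j < log2_gamma.size(); ++j)
    {
      for (std::size_t k = 0; k < log2_p.size(); ++k)
      {
        performance[i][j][k] = score(std::exp2(log2_C[i]),
                                     std::exp2(log2_gamma[j]),
                                     std::exp2(log2_p[k]));
      }
    }
  }
  return performance;
}
```

The number of evaluations is the product of the three grid sizes, so long candidate lists increase the training time multiplicatively.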

◆ predict() [1/2]

void predict ( PredictorMap &  predictors,
std::vector< Prediction > &  predictions 
) const

Predict class labels or regression values (and probabilities).

Parameters
predictors Mapping from predictor name to vector of predictor values (for different observations). All vectors should have the same length; values will be changed by the scaling that was applied to the training data in setup().
predictions Output vector of prediction results (same order as the observations in predictors).
Exceptions
Exception::Precondition if no model has been trained
Exception::InvalidValue if an invalid index is used in indexes

◆ predict() [2/2]

void predict ( std::vector< Prediction > &  predictions,
std::vector< Size >  indexes = std::vector< Size >() 
) const

Predict class labels or regression values (and probabilities).

Parameters
predictions Output vector of prediction results (same order as indexes).
indexes Vector of observation indexes for which predictions are desired. If empty (default), predictions are made for all observations.
Exceptions
Exception::Precondition if no model has been trained
Exception::InvalidValue if an invalid index is used in indexes
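
The handling of the indexes parameter described above (empty means "all observations", out-of-range entries are an error) can be sketched as follows. resolveIndexes() is a hypothetical helper, and std::out_of_range stands in for the OpenMS exception.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Return the observation indexes to predict for, given the caller's
// selection and the number of observations that were passed to setup().
std::vector<std::size_t> resolveIndexes(const std::vector<std::size_t>& indexes,
                                        std::size_t n_obs)
{
  if (indexes.empty())
  {
    // Default: predict for every observation, 0 .. n_obs - 1.
    std::vector<std::size_t> all(n_obs);
    std::iota(all.begin(), all.end(), std::size_t(0));
    return all;
  }
  for (std::size_t index : indexes)
  {
    if (index >= n_obs)
    {
      throw std::out_of_range("observation index out of range");
    }
  }
  return indexes; // the caller's order is kept
}
```

Keeping the caller's order matters because the prediction results are documented to be in the same order as indexes.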

◆ printNull_()

static void printNull_ ( const char *  )
inline static protected

Dummy function to suppress LIBSVM output.

◆ scaleData_()

void scaleData_ ( PredictorMap &  predictors)
protected

Scale predictor values to range 0-1.
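
Scaling to the range 0-1 is commonly done per predictor as (value - min) / (max - min), with the observed min and max stored so that the same transformation can later be applied to new data. The sketch below illustrates this with standard-library stand-ins for PredictorMap and ScaleMap. It is not the OpenMS implementation: scaleToUnitRange() and applyScaling() are hypothetical names, and leaving constant predictors (min equal to max) unchanged is an assumption made for this example.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-ins for the SimpleSVM typedefs (std::string replaces OpenMS::String).
using PredictorMap = std::map<std::string, std::vector<double>>;
using ScaleMap = std::map<std::string, std::pair<double, double>>;

// Scale each predictor in place to [0, 1]; return the (min, max) per predictor.
ScaleMap scaleToUnitRange(PredictorMap& predictors)
{
  ScaleMap scaling;
  for (auto& entry : predictors)
  {
    std::vector<double>& values = entry.second;
    if (values.empty()) continue;
    const auto min_max = std::minmax_element(values.begin(), values.end());
    const double lo = *min_max.first;
    const double hi = *min_max.second;
    scaling[entry.first] = std::make_pair(lo, hi);
    if (hi == lo) continue; // constant predictor: left unchanged here
    for (double& value : values) value = (value - lo) / (hi - lo);
  }
  return scaling;
}

// Apply a stored scaling to new data. Values outside the original range
// end up outside [0, 1]; predictors without a stored range are skipped.
void applyScaling(PredictorMap& predictors, const ScaleMap& scaling)
{
  for (auto& entry : predictors)
  {
    const auto range = scaling.find(entry.first);
    if (range == scaling.end()) continue;
    const double lo = range->second.first;
    const double hi = range->second.second;
    if (hi == lo) continue;
    for (double& value : entry.second) value = (value - lo) / (hi - lo);
  }
}
```

Because both functions modify their input in place, callers that still need the original values should keep a copy, which matches the note in this documentation that predictor values "will be changed by scaling".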

◆ setup()

void setup ( PredictorMap &  predictors,
const std::map< Size, double > &  outcomes,
bool  classification = true 
)

Load data and train a model.

Parameters
predictors Mapping from predictor name to vector of predictor values (for different observations). All vectors should have the same length; values will be changed by scaling.
outcomes Mapping from observation index to class label or regression value in the training set.
classification true (default) if SVM classification should be used, SVR otherwise
Exceptions
Exception::IllegalArgument if predictors is empty
Exception::InvalidValue if an invalid index is used in outcomes
Exception::MissingInformation if there are fewer than two class labels in outcomes, or if there are not enough observations for cross-validation

◆ writeXvalResults()

void writeXvalResults ( const String &  path) const

Write cross-validation (parameter optimization) results to a CSV file.

Member Data Documentation

◆ data_

struct svm_problem data_
protected

SVM training data (LIBSVM format)

◆ log2_C_

std::vector<double> log2_C_
protected

Parameter values to try during optimization.

◆ log2_gamma_

std::vector<double> log2_gamma_
protected

◆ log2_p_

std::vector<double> log2_p_
protected

◆ model_

struct svm_model* model_
protected

Pointer to SVM model (LIBSVM format)

◆ n_parts_

Size n_parts_
protected

Number of partitions for cross-validation.

◆ nodes_

std::vector<std::vector<struct svm_node> > nodes_
protected

Values of predictors (LIBSVM format)

◆ performance_

SVMPerformance performance_
protected

Cross-validation results.

◆ predictor_names_

std::vector<String> predictor_names_
protected

Names of predictors in the model (excluding uninformative ones)

◆ scaling_

ScaleMap scaling_
protected

Mapping from predictor name to predictor min and max.

◆ svm_params_

struct svm_parameter svm_params_
protected

SVM parameters (LIBSVM format)