BALL  1.4.2
BALL::QSAR::FeatureSelection Class Reference

#include <BALL/QSAR/featureSelection.h>

Public Member Functions

Constructors and Destructors
 FeatureSelection (Model &m)
 
 FeatureSelection (KernelModel &m)
 
 ~FeatureSelection ()
 

Accessors

void setModel (Model &m)
 
void setModel (KernelModel &km)
 
void forwardSelection (int k=4, bool optPar=0)
 
void backwardSelection (int k=4, bool optPar=0)
 
void stepwiseSelection (int k=4, bool optPar=0)
 
void twinScan (int k, bool optPar=0)
 
void implicitSelection (LinearModel &lm, int act=1, double d=1)
 
void removeHighlyCorrelatedFeatures (double &cor_threshold)
 
void removeLowResponseCorrelation (double &min_correlation)
 
void removeEmptyDescriptors ()
 
void selectStat (int s)
 
void setQualityIncreaseCutoff (double &d)
 
void updateWeights (std::multiset< unsigned int > &oldDescIDs, std::multiset< unsigned int > &newDescIDs, Vector< double > &oldWeights)
 

Attributes

Model * model_
 
Vector< double > * weights_
 
double quality_increase_cutoff_
 
std::multiset< unsigned int > * findIrrelevantDescriptors ()
 
void forward (bool stepwise, int k, bool optPar)
 

Detailed Description

Definition at line 48 of file featureSelection.h.

Constructor & Destructor Documentation

BALL::QSAR::FeatureSelection::FeatureSelection ( Model &  m)
BALL::QSAR::FeatureSelection::FeatureSelection ( KernelModel &  m)
BALL::QSAR::FeatureSelection::~FeatureSelection ( )

Member Function Documentation

void BALL::QSAR::FeatureSelection::backwardSelection ( int  k = 4,
bool  optPar = 0 
)
starts backward selection.

In order to evaluate how much a descriptor increases the accuracy of the model, cross-validation is started in each step using descriptor_matrix from class QSARData as data source.

Parameters
optPar  1: Model.optimizeParameters() is used to try to find the optimal parameters during each step of feature selection.
0: Model.optimizeParameters() is not used during feature selection.
std::multiset<unsigned int>* BALL::QSAR::FeatureSelection::findIrrelevantDescriptors ( )
private
searches for empty or irrelevant descriptors and returns a sorted list containing their IDs.


If more than one feature selection method is applied, all descriptors that have not been selected by the previous method are considered to be irrelevant.

void BALL::QSAR::FeatureSelection::forward ( bool  stepwise,
int  k,
bool  optPar 
)
private

implements forward selection; if stepwise==1, backwardSelection() is called after each forward step, i.e. after adding a feature.

void BALL::QSAR::FeatureSelection::forwardSelection ( int  k = 4,
bool  optPar = 0 
)
starts forward selection.

In order to evaluate how much a descriptor increases the accuracy of the model, cross-validation is started in each step using descriptor_matrix from class QSARData as data source.

Parameters
optPar  1: Model.optimizeParameters() is used to try to find the optimal parameters during each step of feature selection.
0: Model.optimizeParameters() is not used during feature selection.
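
The greedy scheme described for forwardSelection() can be illustrated with a small standalone sketch. It does not use BALL itself: the names `QualityFn` and `forwardSelect` are hypothetical, the callback stands in for "cross-validate the model on these descriptor columns and return its quality", and the quality of the empty descriptor set is assumed to be 0. The `cutoff` argument plays the role described for setQualityIncreaseCutoff().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a cross-validation run restricted to the given
// descriptor columns; returns the predictive quality (higher is better).
using QualityFn = std::function<double(const std::vector<std::size_t>&)>;

// Greedy forward selection: in each step, add the descriptor that raises the
// quality the most; stop once the best gain falls below `cutoff`.
std::vector<std::size_t> forwardSelect(std::size_t n_features,
                                       const QualityFn& quality,
                                       double cutoff)
{
    assert(static_cast<bool>(quality));  // a callback must be supplied
    std::vector<std::size_t> selected;
    std::vector<bool> used(n_features, false);
    double best_quality = 0.0;  // assumption: the empty set has quality 0

    while (selected.size() < n_features)
    {
        double step_quality = best_quality;
        std::size_t step_feature = n_features;  // sentinel: nothing found yet

        // Try every unused descriptor on top of the current selection.
        for (std::size_t f = 0; f < n_features; ++f)
        {
            if (used[f]) continue;
            selected.push_back(f);
            const double q = quality(selected);
            selected.pop_back();
            if (q > step_quality)
            {
                step_quality = q;
                step_feature = f;
            }
        }

        // Stop if no descriptor helps, or if the best gain is too small.
        if (step_feature == n_features || step_quality - best_quality < cutoff)
        {
            break;
        }
        selected.push_back(step_feature);
        used[step_feature] = true;
        best_quality = step_quality;
    }
    return selected;
}
```

Passing the quality measure as a callback keeps the sketch independent of any particular model or validation scheme.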
void BALL::QSAR::FeatureSelection::implicitSelection ( LinearModel &  lm,
int  act = 1,
double  d = 1 
)
uses the coefficients generated by a linear regression model (LinearModel.training_result) in order to select features.

All descriptors whose coefficients are within 0 +/- d*stddev are considered to be unimportant and are not selected.
Furthermore, if feature selection has already been done on FeatureSelection->model, only those descriptors that are already part of lm AND of FeatureSelection->model are tested.

Parameters
act  determines which coefficients are to be used, i.e. which column of LinearModel.training_result
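
The 0 +/- d*stddev rule can be sketched as follows. This is a standalone illustration, not the BALL implementation: `selectByCoefficient` is a hypothetical name, and the sketch assumes that stddev is the sample standard deviation of the coefficients in the chosen column, which the description above does not state explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the indices of all coefficients that lie outside 0 +/- d*stddev.
// Assumption: stddev is the sample standard deviation of `coef` itself.
std::vector<std::size_t> selectByCoefficient(const std::vector<double>& coef,
                                             double d)
{
    assert(!coef.empty());
    const double n = static_cast<double>(coef.size());

    double mean = 0.0;
    for (double c : coef) mean += c;
    mean /= n;

    double sum_sq = 0.0;
    for (double c : coef) sum_sq += (c - mean) * (c - mean);
    // With a single coefficient there is no spread, so stddev is taken as 0.
    const double stddev = coef.size() > 1 ? std::sqrt(sum_sq / (n - 1.0)) : 0.0;

    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < coef.size(); ++i)
    {
        if (std::fabs(coef[i]) > d * stddev) kept.push_back(i);
    }
    return kept;
}
```

A larger d widens the band around zero, so fewer descriptors are selected.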
void BALL::QSAR::FeatureSelection::removeEmptyDescriptors ( )

removes descriptors whose values are 0 in all substances from the list of selected features
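
As a minimal standalone sketch of this filter (not BALL code): the function name `nonEmptyDescriptors` and the row-per-substance matrix layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// X[i][j] is the value of descriptor j for substance i (hypothetical layout).
// Returns the indices of descriptors that are non-zero for at least one
// substance, i.e. the descriptors that survive the filter.
std::vector<std::size_t> nonEmptyDescriptors(
    const std::vector<std::vector<double>>& X)
{
    std::vector<std::size_t> kept;
    if (X.empty()) return kept;

    const std::size_t n_desc = X.front().size();
    for (std::size_t j = 0; j < n_desc; ++j)
    {
        for (const auto& row : X)
        {
            assert(row.size() == n_desc);  // all rows must have equal length
            if (row[j] != 0.0)
            {
                kept.push_back(j);
                break;  // one non-zero value is enough to keep the column
            }
        }
    }
    return kept;
}
```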

void BALL::QSAR::FeatureSelection::removeHighlyCorrelatedFeatures ( double &  cor_threshold)
removes features that are highly correlated to another feature.
Parameters
cor_threshold  all features that have a correlation (to another feature) > cor_threshold or < -cor_threshold are removed
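
A correlation filter of this kind can be sketched in standalone C++ as shown below. This is not the BALL implementation. The names `pearson` and `dropCorrelated` are hypothetical, the Pearson coefficient is assumed as the correlation measure, and the sketch assumes that of two highly correlated features the one with the lower index is kept.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equally long value vectors.
// Returns 0 if either vector is constant (correlation undefined).
double pearson(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= n;
    mean_b /= n;

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0) return 0.0;
    return cov / std::sqrt(var_a * var_b);
}

// columns[j] holds the values of feature j over all substances.
// A feature is dropped if |r| to an already kept feature exceeds the threshold.
std::vector<std::size_t> dropCorrelated(
    const std::vector<std::vector<double>>& columns, double cor_threshold)
{
    std::vector<std::size_t> kept;
    for (std::size_t j = 0; j < columns.size(); ++j)
    {
        bool redundant = false;
        for (std::size_t k : kept)
        {
            if (std::fabs(pearson(columns[j], columns[k])) > cor_threshold)
            {
                redundant = true;
                break;
            }
        }
        if (!redundant) kept.push_back(j);
    }
    return kept;
}
```

Comparing the absolute value against the threshold covers both strongly positive and strongly negative correlations.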
void BALL::QSAR::FeatureSelection::removeLowResponseCorrelation ( double &  min_correlation)

removes those features that do not have a correlation greater than the specified value to any of the response variables
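
The following standalone sketch illustrates such a response-correlation filter. It is not BALL code: `pearson` and `keepResponseCorrelated` are hypothetical names, the Pearson coefficient is assumed as the correlation measure, and comparing the absolute value of the correlation against the threshold is an assumption that the description above leaves open.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equally long value vectors.
// Returns 0 if either vector is constant (correlation undefined).
double pearson(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double mean_a = 0.0, mean_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        mean_a += a[i];
        mean_b += b[i];
    }
    mean_a /= n;
    mean_b /= n;

    double cov = 0.0, var_a = 0.0, var_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov += da * db;
        var_a += da * da;
        var_b += db * db;
    }
    if (var_a == 0.0 || var_b == 0.0) return 0.0;
    return cov / std::sqrt(var_a * var_b);
}

// Keeps feature j if |r| to at least one response variable exceeds
// min_correlation (assumption: the absolute value is what matters).
std::vector<std::size_t> keepResponseCorrelated(
    const std::vector<std::vector<double>>& features,
    const std::vector<std::vector<double>>& responses,
    double min_correlation)
{
    std::vector<std::size_t> kept;
    for (std::size_t j = 0; j < features.size(); ++j)
    {
        for (const auto& y : responses)
        {
            if (std::fabs(pearson(features[j], y)) > min_correlation)
            {
                kept.push_back(j);
                break;  // one sufficiently correlated response is enough
            }
        }
    }
    return kept;
}
```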

void BALL::QSAR::FeatureSelection::selectStat ( int  s)
void BALL::QSAR::FeatureSelection::setModel ( Model &  m)

sets the model for which feature selection is to be done

void BALL::QSAR::FeatureSelection::setModel ( KernelModel &  km)
void BALL::QSAR::FeatureSelection::setQualityIncreaseCutoff ( double &  d)
Sets a cutoff value for feature selections.

If the predictive quality is increased by less than d after adding/removing a descriptor, feature selection is stopped.

void BALL::QSAR::FeatureSelection::stepwiseSelection ( int  k = 4,
bool  optPar = 0 
)
void BALL::QSAR::FeatureSelection::twinScan ( int  k,
bool  optPar = 0 
)
Does a simple check consisting of two successive scans of all features.

In the first scan, the best feature to start with is searched for.
In the second scan, each remaining (non-empty) descriptor is checked to see whether it can increase the prediction quality. The features are tested in descending order of their predictive qualities as determined in the first scan.
Thus, this method is particularly suited for models that consider all features to be independent of each other (e.g. Bayesian classification models).
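
The two scans can be sketched in standalone C++ as follows. This is an illustration under stated assumptions rather than the BALL implementation: `QualityFn` and `twinScan` are hypothetical names, and the callback stands in for a cross-validation run restricted to the given descriptor columns. The sketch accepts any strict improvement in the second scan and does not apply a quality-increase cutoff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cross-validation run restricted to the given
// descriptor columns; returns the predictive quality (higher is better).
using QualityFn = std::function<double(const std::vector<std::size_t>&)>;

std::vector<std::size_t> twinScan(std::size_t n_features,
                                  const QualityFn& quality)
{
    assert(static_cast<bool>(quality));  // a callback must be supplied

    // Scan 1: evaluate every feature on its own.
    std::vector<std::pair<double, std::size_t>> single;
    for (std::size_t f = 0; f < n_features; ++f)
    {
        single.emplace_back(quality(std::vector<std::size_t>{f}), f);
    }
    // Order the features by descending single-feature quality.
    std::stable_sort(single.begin(), single.end(),
                     [](const std::pair<double, std::size_t>& a,
                        const std::pair<double, std::size_t>& b)
                     { return a.first > b.first; });

    std::vector<std::size_t> selected;
    if (single.empty()) return selected;

    // Start with the best single feature.
    selected.push_back(single[0].second);
    double best = single[0].first;

    // Scan 2: test each remaining feature once, in that order, and keep it
    // only if it improves the quality of the current selection.
    for (std::size_t i = 1; i < single.size(); ++i)
    {
        selected.push_back(single[i].second);
        const double q = quality(selected);
        if (q > best)
        {
            best = q;
        }
        else
        {
            selected.pop_back();
        }
    }
    return selected;
}
```

Each feature is evaluated at most twice, so the number of quality evaluations grows linearly with the number of features, whereas a greedy forward selection re-tests every unused feature in each step.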

void BALL::QSAR::FeatureSelection::updateWeights ( std::multiset< unsigned int > &  oldDescIDs,
std::multiset< unsigned int > &  newDescIDs,
Vector< double > &  oldWeights 
)
private

Member Data Documentation

Model* BALL::QSAR::FeatureSelection::model_
private

pointer to the model for which feature selection is to be done

Definition at line 133 of file featureSelection.h.

double BALL::QSAR::FeatureSelection::quality_increase_cutoff_
private

if the predictive quality is increased by less than this value after adding/removing a descriptor, feature selection is stopped.

Definition at line 142 of file featureSelection.h.

Vector<double>* BALL::QSAR::FeatureSelection::weights_
private

pointer to KernelModel.weights (if the model to be optimized is a KernelModel)

Definition at line 136 of file featureSelection.h.