OpenMS  2.5.0
RTModel

Used to train a model for peptide retention time prediction or peptide separation prediction.

For retention time prediction, a support vector machine is trained with peptide sequences and their measured retention times. For peptide separation prediction, two files have to be given: one file contains the positive examples (the peptides which are collected) and the other contains the negative examples (the flowthrough peptides).

These methods and applications of this model are described in the following publications:

Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Statistical learning of peptide retention behavior in chromatographic separations: A new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8:468.

Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Improving Peptide Identification in Proteome Analysis by a Two-Dimensional Retention Time Filtering Approach. J. Proteome Res. 2009, 8(8):4109-15.

Several parameters of the svm can be changed (in the ini file or on the command line):

  • svm_type: the type of the svm (can be NU_SVR or EPSILON_SVR for RT prediction and is C_SVC for separation prediction)
  • kernel_type: the kernel function (e.g., POLY for the polynomial kernel, LINEAR for the linear kernel or RBF for the Gaussian kernel); we recommend OLIGO (SVMWrapper::OLIGO), which selects our paired oligo-border kernel (POBK)
  • border_length: border length for the POBK
  • k_mer_length: length of the signals considered in the POBK
  • sigma: the amount of positional smoothing for the POBK
  • degree: the degree parameter for the polynomial kernel
  • c: the penalty parameter of the svm
  • nu: the nu parameter for nu-SVR
  • p: the epsilon parameter for epsilon-SVR
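
To make the roles of border_length, k_mer_length and sigma concrete, the following is a minimal Python sketch of an oligo-style border kernel. It is an illustration under stated assumptions, not the OpenMS implementation: the function name pobk_sketch is hypothetical, constant prefactors and any normalization are omitted, and the Gaussian weighting exp(-(p - q)^2 / (4 * sigma^2)) follows the general oligo kernel form. Left borders are compared with left borders and right borders with right borders.

```python
import math


def pobk_sketch(s, t, border_length=22, k_mer_length=1, sigma=5.0):
    """Simplified paired oligo-border kernel value for two peptide strings."""

    def signals(border):
        # Every k-mer in the border together with its offset from the border start.
        count = max(len(border) - k_mer_length + 1, 0)
        return [(border[i:i + k_mer_length], i) for i in range(count)]

    def border_similarity(a, b):
        total = 0.0
        for kmer_a, pos_a in signals(a):
            for kmer_b, pos_b in signals(b):
                if kmer_a == kmer_b:
                    # Positional smoothing: identical k-mers at nearby
                    # positions still contribute, weighted by a Gaussian.
                    total += math.exp(-((pos_a - pos_b) ** 2) / (4.0 * sigma ** 2))
        return total

    # Left borders are read from the N-terminus, right borders from the
    # C-terminus. Both right borders are reversed, so k-mers still match.
    left = border_similarity(s[:border_length], t[:border_length])
    right = border_similarity(s[::-1][:border_length], t[::-1][:border_length])
    return left + right
```

In this sketch a larger sigma lets identical k-mers at more distant positions contribute, while a smaller border_length restricts the comparison to the residues closest to the peptide termini.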


The last five parameters (sigma, degree, c, nu and p) can be tuned in a cross validation (CV) to find the best values for the training set. For each parameter you specify a start value, a step size by which the parameter is increased, and a final value; the tested value is never bigger than the given final value. If you want to perform a cross validation, for example for the parameter c, set c_start, c_step_size and c_stop and set skip_cv to false in the INI file, which enables CV across all five parameters. This can easily be done using the INIFileEditor.
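
The start/step/stop scheme can be illustrated with a short Python sketch. It assumes the behavior described above (values never exceed the final value) together with the additive_cv option documented below: additive steps when it is set, multiplicative steps otherwise. The function name cv_grid is hypothetical and the code is not taken from OpenMS.

```python
def cv_grid(start, step_size, stop, additive=False):
    """Return the values tested for one parameter, never exceeding `stop`."""
    if additive and step_size <= 0:
        raise ValueError("additive step size must be positive")
    if not additive and (step_size <= 1 or start <= 0):
        raise ValueError("multiplicative search needs start > 0 and step size > 1")

    values = []
    value = start
    while value <= stop:
        values.append(value)
        # Additive: add the step size. Otherwise: multiply by the step size.
        value = value + step_size if additive else value * step_size
    return values


# Default range for c (start 1.0, step size 10.0, stop 1000.0), multiplicative:
print(cv_grid(1.0, 10.0, 1000.0))          # [1.0, 10.0, 100.0, 1000.0]
# Default range for degree (start 1, step size 2, stop 4), interpreted additively:
print(cv_grid(1, 2, 4, additive=True))     # [1, 3]
```

Under these assumptions the default ranges of c test four values, so a grid over several parameters grows as the product of the per-parameter counts.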

Furthermore, you can specify the number of partitions for the CV with number_of_partitions in the ini file and the number of runs with number_of_runs.
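
As a rough illustration of how number_of_partitions and number_of_runs interact, the sketch below draws a fresh random partition of the sample indices for every run. The function name cv_partitions, the seed argument and the round-robin assignment are assumptions made for this example; the actual partitioning in OpenMS may differ.

```python
import random


def cv_partitions(n_samples, number_of_partitions=10, number_of_runs=1, seed=0):
    """Return, for every run, a list of partitions (lists of sample indices)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(number_of_runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # each run uses a new random partition
        # Deal the shuffled indices round-robin into the partitions.
        folds = [indices[i::number_of_partitions]
                 for i in range(number_of_partitions)]
        runs.append(folds)
    return runs


runs = cv_partitions(n_samples=25, number_of_partitions=10, number_of_runs=2)
for folds in runs:
    # In every run, each partition is held out once while the rest is used for training.
    print([len(fold) for fold in folds])
```

More runs average the CV performance over several random partitions, at the cost of proportionally more training.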


Consequently, there are two ways to use this application:

  1. Set the parameters of the svm: The RTModel application will train the svm with the training data and store the svm model
  2. Give a range of parameters for which a CV should be performed: The RTModel application will perform a CV to find the best parameter combination in the given range and afterwards train the svm with the best parameters and the whole training data. Then the model is stored.
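
The two modes listed above could be driven from a script along the following lines. This is a sketch only: the helper rtmodel_command and the file names are hypothetical, the option names are the ones listed in the command line help below, and it assumes that -cv:skip_cv is passed as a bare flag.

```python
def rtmodel_command(in_file, out_file, fixed=None, cv_ranges=None):
    """Build an RTModel argument list for fixed parameters or for a CV search."""
    cmd = ["RTModel", "-in", in_file, "-out", out_file]

    # Parameters that are set directly, e.g. {"svm_type": "NU_SVR", "c": 10}.
    for name, value in (fixed or {}).items():
        cmd += [f"-{name}", str(value)]

    if cv_ranges:
        # Mode 2: search each listed parameter over (start, step size, stop).
        for name, (start, step_size, stop) in cv_ranges.items():
            cmd += [f"-cv:{name}_start", str(start),
                    f"-cv:{name}_step_size", str(step_size),
                    f"-cv:{name}_stop", str(stop)]
    else:
        # Mode 1: train once with the given parameters and skip the CV.
        cmd.append("-cv:skip_cv")
    return cmd


# Mode 1: fixed parameters (hypothetical file names).
print(rtmodel_command("train.idXML", "model.txt", fixed={"c": 10}))
# Mode 2: cross validation over c.
print(rtmodel_command("train.idXML", "model.txt", cv_ranges={"c": (1.0, 10.0, 1000.0)}))
```

If RTModel is on the PATH, the resulting list can be passed to subprocess.run(cmd, check=True).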


The model can then be used in RTPredict to predict peptide retention times or peptide separation, depending on how the model was trained.

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
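
A conversion step before training might look like the sketch below. It assumes that IDFileConverter accepts -in and -out like other TOPP tools and derives the output format from the file extension; these options are not documented on this page, and the helper name and file name are hypothetical.

```python
from pathlib import Path


def mzid_to_idxml_command(mzid_path):
    """Build an IDFileConverter call that writes an idXML file next to the input."""
    src = Path(mzid_path)
    dst = src.with_suffix(".idXML")  # same name, idXML extension
    return ["IDFileConverter", "-in", str(src), "-out", str(dst)]


print(mzid_to_idxml_command("run1.mzid"))
```

The converted idXML file can then be given to RTModel via -in.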

The command line parameters of this tool are:

RTModel -- Trains a model for the retention time prediction of peptides from a training set.
Full documentation: http://www.openms.de/documentation/TOPP_RTModel.html
Version: 2.5.0 Feb 20 2020, 20:13:06, Revision: f649042
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  RTModel <options>

Options (mandatory options marked with '*'):
  -in <file>                      This is the name of the input file (RT prediction). It is assumed that the 
                                  file type is idXML. Alternatively you can provide a .txt file having a
                                  sequence and the corresponding rt per line.
                                  (valid formats: 'idXML', 'txt')
  -in_positive <file>             Input file with positive examples (peptide separation prediction)
                                  (valid formats: 'idXML')
  -in_negative <file>             Input file with negative examples (peptide separation prediction)
                                  (valid formats: 'idXML')
  -out <file>*                    Output file: the model in libsvm format (valid formats: 'txt')
  -out_oligo_params <file>        Output file with additional model parameters when using the OLIGO kernel 
                                  (valid formats: 'paramXML')
  -out_oligo_trainset <file>      Output file with the used training dataset when using the OLIGO kernel
                                  (valid formats: 'txt')
  -svm_type <type>                The type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically
                                  set to C_SVC for separation prediction)
                                  (default: 'NU_SVR' valid: 'NU_SVR', 'NU_SVC', 'EPSILON_SVR', 'C_SVC')
  -nu <float>                     The nu parameter [0..1] of the svm (for nu-SVR) (default: '0.5' min: '0.0' 
                                  max: '1.0')
  -p <float>                      The epsilon parameter of the svm (for epsilon-SVR) (default: '0.1')
  -c <float>                      The penalty parameter of the svm (default: '1.0')
  -kernel_type <type>             The kernel type of the svm (default: 'OLIGO' valid: 'LINEAR', 'RBF',
                                  'POLY', 'OLIGO')
  -degree <int>                   The degree parameter of the kernel function of the svm (POLY kernel)
                                  (default: '1' min: '1')
  -border_length <int>            Length of the POBK (default: '22' min: '1')
  -max_std <float>                Max standard deviation for a peptide to be included (if there are several 
                                  ones for one peptide string)(median is taken) (default: '10.0' min: '0.0')
  -k_mer_length <int>             K_mer length of the POBK (default: '1' min: '1')
  -sigma <float>                  Sigma of the POBK (default: '5.0')
  -total_gradient_time <time>     The time (in seconds) of the gradient (only for RT prediction) (default: 
                                  '1.0' min: '1.0e-05')
  -first_dim_rt                   If set the model will be built for first_dim_rt
  -additive_cv                    If the step sizes should be interpreted additively (otherwise the actual
                                  value is multiplied with the step size to get the new value)
                                  

Parameters for the grid search / cross validation:
  -cv:skip_cv                     Set this flag to skip cross-validation and train the model with 1 set of
                                  specified parameters; leave it unset to run cross-validation.
  -cv:number_of_runs <int>        Number of runs for the CV (each run creates a new random partition of the 
                                  data) (default: '1' min: '1')
  -cv:number_of_partitions <int>  Number of CV partitions (default: '10' min: '2')
  -cv:degree_start <int>          Starting point of degree (default: '1' min: '1')
  -cv:degree_step_size <int>      Step size of degree (default: '2')
  -cv:degree_stop <int>           Stopping point of degree (default: '4')
  -cv:p_start <float>             Starting point of p (default: '1.0')
  -cv:p_step_size <float>         Step size of p (default: '10.0')
  -cv:p_stop <float>              Stopping point of p (default: '1000.0')
  -cv:c_start <float>             Starting point of c (default: '1.0')
  -cv:c_step_size <float>         Step size of c (default: '10.0')
  -cv:c_stop <float>              Stopping point of c (default: '1000.0')
  -cv:nu_start <float>            Starting point of nu (default: '0.3' min: '0.0' max: '1.0')
  -cv:nu_step_size <float>        Step size of nu (default: '1.2')
  -cv:nu_stop <float>             Stopping point of nu (default: '0.7' min: '0.0' max: '1.0')
  -cv:sigma_start <float>         Starting point of sigma (default: '1.0')
  -cv:sigma_step_size <float>     Step size of sigma (default: '1.3')
  -cv:sigma_stop <float>          Stopping point of sigma (default: '15.0')

                                  
Common TOPP options:
  -ini <file>                     Use the given TOPP INI file
  -threads <n>                    Sets the number of threads allowed to be used by the TOPP tool (default: 
                                  '1')
  -write_ini <file>               Writes the default configuration file
  --help                          Shows options
  --helphelp                      Shows all options (including advanced)
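
The -max_std option listed above describes how repeated measurements of the same peptide string are handled: peptides whose retention times scatter too much are excluded, and the median is taken otherwise. A minimal sketch of such a filter follows. The function name is hypothetical, and the use of the population standard deviation is an assumption, since the help text does not say which estimator is used.

```python
from collections import defaultdict
from statistics import median, pstdev


def collapse_repeated_peptides(observations, max_std=10.0):
    """Map each peptide string to its median RT, dropping inconsistent peptides."""
    by_peptide = defaultdict(list)
    for sequence, rt in observations:
        by_peptide[sequence].append(rt)

    training_set = {}
    for sequence, rts in by_peptide.items():
        # Exclude peptides whose repeated measurements scatter too much.
        if len(rts) > 1 and pstdev(rts) > max_std:
            continue
        training_set[sequence] = median(rts)
    return training_set


observations = [("AAK", 10.0), ("AAK", 12.0), ("AAK", 11.0),
                ("PEPK", 5.0), ("PEPK", 100.0), ("LLR", 42.0)]
print(collapse_repeated_peptides(observations))  # {'AAK': 11.0, 'LLR': 42.0}
```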

INI file documentation of this tool:

+ RTModel: Trains a model for the retention time prediction of peptides from a training set.
  version                 Version of the tool that generated this parameters file. (value: '2.5.0')

++ 1: Instance '1' section for 'RTModel'
  in                      This is the name of the input file (RT prediction). It is assumed that the file
                          type is idXML. Alternatively you can provide a .txt file having a sequence and
                          the corresponding rt per line. (input file; valid formats: '*.idXML', '*.txt')
  in_positive             input file with positive examples (peptide separation prediction)
                          (input file; valid formats: '*.idXML')
  in_negative             input file with negative examples (peptide separation prediction)
                          (input file; valid formats: '*.idXML')
  out                     output file: the model in libsvm format (output file; valid formats: '*.txt')
  out_oligo_params        output file with additional model parameters when using the OLIGO kernel
                          (output file; valid formats: '*.paramXML')
  out_oligo_trainset      output file with the used training dataset when using the OLIGO kernel
                          (output file; valid formats: '*.txt')
  svm_type                the type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically set
                          to C_SVC for separation prediction)
                          (default: 'NU_SVR' valid: 'NU_SVR', 'NU_SVC', 'EPSILON_SVR', 'C_SVC')
  nu                      the nu parameter [0..1] of the svm (for nu-SVR)
                          (default: '0.5' min: '0.0' max: '1.0')
  p                       the epsilon parameter of the svm (for epsilon-SVR) (default: '0.1')
  c                       the penalty parameter of the svm (default: '1.0')
  kernel_type             the kernel type of the svm
                          (default: 'OLIGO' valid: 'LINEAR', 'RBF', 'POLY', 'OLIGO')
  degree                  the degree parameter of the kernel function of the svm (POLY kernel)
                          (default: '1' min: '1')
  border_length           length of the POBK (default: '22' min: '1')
  max_std                 max standard deviation for a peptide to be included (if there are several ones
                          for one peptide string)(median is taken) (default: '10.0' min: '0.0')
  k_mer_length            k_mer length of the POBK (default: '1' min: '1')
  sigma                   sigma of the POBK (default: '5.0')
  total_gradient_time     the time (in seconds) of the gradient (only for RT prediction)
                          (default: '1.0' min: '1.0e-05')
  first_dim_rt            if set the model will be built for first_dim_rt
                          (default: 'false' valid: 'true', 'false')
  additive_cv             if the step sizes should be interpreted additively (otherwise the actual value
                          is multiplied with the step size to get the new value)
                          (default: 'false' valid: 'true', 'false')
  log                     Name of log file (created only when specified)
  debug                   Sets the debug level (default: '0')
  threads                 Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  no_progress             Disables progress logging to command line
                          (default: 'false' valid: 'true', 'false')
  force                   Overwrite tool specific checks. (default: 'false' valid: 'true', 'false')
  test                    Enables the test mode (needed for internal use only)
                          (default: 'false' valid: 'true', 'false')

+++ cv: Parameters for the grid search / cross validation:
  skip_cv                 Set to true to skip cross-validation and train the model with 1 set of
                          specified parameters; set to false to run cross-validation.
                          (default: 'false' valid: 'true', 'false')
  number_of_runs          number of runs for the CV (each run creates a new random partition of the data)
                          (default: '1' min: '1')
  number_of_partitions    number of CV partitions (default: '10' min: '2')
  degree_start            starting point of degree (default: '1' min: '1')
  degree_step_size        step size of degree (default: '2')
  degree_stop             stopping point of degree (default: '4')
  p_start                 starting point of p (default: '1.0')
  p_step_size             step size of p (default: '10.0')
  p_stop                  stopping point of p (default: '1000.0')
  c_start                 starting point of c (default: '1.0')
  c_step_size             step size of c (default: '10.0')
  c_stop                  stopping point of c (default: '1000.0')
  nu_start                starting point of nu (default: '0.3' min: '0.0' max: '1.0')
  nu_step_size            step size of nu (default: '1.2')
  nu_stop                 stopping point of nu (default: '0.7' min: '0.0' max: '1.0')
  sigma_start             starting point of sigma (default: '1.0')
  sigma_step_size         step size of sigma (default: '1.3')
  sigma_stop              stopping point of sigma (default: '15.0')