OpenMS
2.5.0
Used to train a model for peptide retention time prediction or peptide separation prediction.
For retention time prediction, a support vector machine is trained with peptide sequences and their measured retention times. For peptide separation prediction, two files have to be given: one file contains the positive examples (the peptides which are collected) and the other contains the negative examples (the flow-through peptides).
These methods and applications of this model are described in the following publications:
Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Statistical learning of peptide retention behavior in chromatographic separations: A new kernel-based approach for computational proteomics. BMC Bioinformatics 2007, 8:468.
Nico Pfeifer, Andreas Leinenbach, Christian G. Huber and Oliver Kohlbacher. Improving Peptide Identification in Proteome Analysis by a Two-Dimensional Retention Time Filtering Approach. J. Proteome Res. 2009, 8(8):4109-15.
A number of SVM parameters can be changed (specified in the INI file or on the command line):
The last five parameters (sigma, degree, c, nu and p) can be tuned by cross-validation (CV) to find the best values for the training set. To do so, specify for each parameter a start value, a step size by which the value is increased, and a final value; the tested value is never larger than the given final value. For example, to perform a cross-validation for the parameter c, enable CV (across all five parameters) and set skip_cv to false in the INI file. This is easily done with the INIFileEditor.
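The start, step size and final value described above can be turned into a list of candidate values. The following is a minimal sketch, not the OpenMS implementation: the function name `cv_grid` is hypothetical, and the loop assumes that the step is either added to or multiplied with the current value, as the help text of `-additive_cv` suggests.

```python
def cv_grid(start, step, stop, additive=False):
    """Return candidate values from start upward, never exceeding stop.

    Assumption: with additive=False the current value is multiplied by
    the step size; with additive=True the step size is added.
    """
    if additive and step <= 0:
        raise ValueError("an additive grid needs step > 0")
    if not additive and (step <= 1 or start <= 0):
        raise ValueError("a multiplicative grid needs step > 1 and start > 0")

    values = []
    value = start
    while value <= stop:
        values.append(value)
        value = value + step if additive else value * step
    return values


if __name__ == "__main__":
    # Defaults for c from the help text: start 1.0, step 10.0, stop 1000.0
    print(cv_grid(1.0, 10.0, 1000.0))        # [1.0, 10.0, 100.0, 1000.0]
    # Defaults for degree, interpreted additively: start 1, step 2, stop 4
    print(cv_grid(1, 2, 4, additive=True))   # [1, 3]
```

Under these assumptions, a multiplicative grid covers several orders of magnitude with few candidates, which suits a parameter such as c, while an additive grid suits integer parameters such as degree.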
Furthermore, you can specify the number of partitions for the CV with number_of_partitions in the INI file and the number of runs with number_of_runs.
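To illustrate how number_of_partitions and number_of_runs interact, here is a minimal sketch in plain Python. It is not the OpenMS code: the function names `cv_partitions` and `train_test_splits` and the `seed` argument are hypothetical, and the only behaviour taken from the tool's help text is that each run creates a new random partition of the data.

```python
import random


def cv_partitions(n_samples, number_of_partitions=10, number_of_runs=1, seed=0):
    """Return, for each run, a random split of sample indices into partitions."""
    if number_of_partitions < 2:
        raise ValueError("number_of_partitions must be at least 2")
    if n_samples < number_of_partitions:
        raise ValueError("need at least one sample per partition")

    rng = random.Random(seed)  # seed is an illustrative assumption
    runs = []
    for _ in range(number_of_runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a new random partition for every run
        parts = [indices[i::number_of_partitions]
                 for i in range(number_of_partitions)]
        runs.append(parts)
    return runs


def train_test_splits(parts):
    """Yield (train, test) index lists, holding out one partition at a time."""
    for held_out, test in enumerate(parts):
        train = [i for j, part in enumerate(parts) if j != held_out for i in part]
        yield train, test


if __name__ == "__main__":
    for run_number, parts in enumerate(cv_partitions(25, 5, 2), start=1):
        for train, test in train_test_splits(parts):
            print(run_number, len(train), len(test))
```

With 10 partitions and 1 run (the defaults listed in the help text), every candidate parameter set would be trained and evaluated 10 times in such a scheme; additional runs multiply that cost.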
Consequently, there are two ways to use this application:
The model can be used in RTPredict to predict either retention times or peptide separation, depending on how the model was trained.
The command line parameters of this tool are:
RTModel -- Trains a model for the retention time prediction of peptides from a training set.
Full documentation: http://www.openms.de/documentation/TOPP_RTModel.html
Version: 2.5.0 Feb 20 2020, 20:13:06, Revision: f649042
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  RTModel <options>

Options (mandatory options marked with '*'):
  -in <file>                      This is the name of the input file (RT prediction). It is assumed that the file type is idXML. Alternatively you can provide a .txt file having a sequence and the corresponding rt per line. (valid formats: 'idXML', 'txt')
  -in_positive <file>             Input file with positive examples (peptide separation prediction) (valid formats: 'idXML')
  -in_negative <file>             Input file with negative examples (peptide separation prediction) (valid formats: 'idXML')
  -out <file>*                    Output file: the model in libsvm format (valid formats: 'txt')
  -out_oligo_params <file>        Output file with additional model parameters when using the OLIGO kernel (valid formats: 'paramXML')
  -out_oligo_trainset <file>      Output file with the used training dataset when using the OLIGO kernel (valid formats: 'txt')
  -svm_type <type>                The type of the svm (NU_SVR or EPSILON_SVR for RT prediction, automatically set to C_SVC for separation prediction) (default: 'NU_SVR' valid: 'NU_SVR', 'NU_SVC', 'EPSILON_SVR', 'C_SVC')
  -nu <float>                     The nu parameter [0..1] of the svm (for nu-SVR) (default: '0.5' min: '0.0' max: '1.0')
  -p <float>                      The epsilon parameter of the svm (for epsilon-SVR) (default: '0.1')
  -c <float>                      The penalty parameter of the svm (default: '1.0')
  -kernel_type <type>             The kernel type of the svm (default: 'OLIGO' valid: 'LINEAR', 'RBF', 'POLY', 'OLIGO')
  -degree <int>                   The degree parameter of the kernel function of the svm (POLY kernel) (default: '1' min: '1')
  -border_length <int>            Length of the POBK (default: '22' min: '1')
  -max_std <float>                Max standard deviation for a peptide to be included (if there are several ones for one peptide string)(median is taken) (default: '10.0' min: '0.0')
  -k_mer_length <int>             K_mer length of the POBK (default: '1' min: '1')
  -sigma <float>                  Sigma of the POBK (default: '5.0')
  -total_gradient_time <time>     The time (in seconds) of the gradient (only for RT prediction) (default: '1.0' min: '1.0e-05')
  -first_dim_rt                   If set the model will be built for first_dim_rt
  -additive_cv                    If the step sizes should be interpreted additively (otherwise the actual value is multiplied with the step size to get the new value)

Parameters for the grid search / cross validation::
  -cv:skip_cv                     Set to enable Cross-Validation or set to true if the model should just be trained with 1 set of specified parameters.
  -cv:number_of_runs <int>        Number of runs for the CV (each run creates a new random partition of the data) (default: '1' min: '1')
  -cv:number_of_partitions <int>  Number of CV partitions (default: '10' min: '2')
  -cv:degree_start <int>          Starting point of degree (default: '1' min: '1')
  -cv:degree_step_size <int>      Step size point of degree (default: '2')
  -cv:degree_stop <int>           Stopping point of degree (default: '4')
  -cv:p_start <float>             Starting point of p (default: '1.0')
  -cv:p_step_size <float>         Step size point of p (default: '10.0')
  -cv:p_stop <float>              Stopping point of p (default: '1000.0')
  -cv:c_start <float>             Starting point of c (default: '1.0')
  -cv:c_step_size <float>         Step size of c (default: '10.0')
  -cv:c_stop <float>              Stopping point of c (default: '1000.0')
  -cv:nu_start <float>            Starting point of nu (default: '0.3' min: '0.0' max: '1.0')
  -cv:nu_step_size <float>        Step size of nu (default: '1.2')
  -cv:nu_stop <float>             Stopping point of nu (default: '0.7' min: '0.0' max: '1.0')
  -cv:sigma_start <float>         Starting point of sigma (default: '1.0')
  -cv:sigma_step_size <float>     Step size of sigma (default: '1.3')
  -cv:sigma_stop <float>          Stopping point of sigma (default: '15.0')

Common TOPP options:
  -ini <file>                     Use the given TOPP INI file
  -threads <n>                    Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>               Writes the default configuration file
  --help                          Shows options
  --helphelp                      Shows all options (including advanced)
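The options listed above can be assembled into a command line programmatically. The sketch below only builds argument lists and does not run the tool. The helper names `build_rt_command` and `build_separation_command` and all file names are hypothetical; the flags and their valid values are copied from the help output, and the sketch assumes an `RTModel` executable is available on the PATH when the command is eventually run.

```python
# Valid values copied from the help output above.
VALID_SVM_TYPES = {"NU_SVR", "NU_SVC", "EPSILON_SVR", "C_SVC"}
VALID_KERNEL_TYPES = {"LINEAR", "RBF", "POLY", "OLIGO"}


def build_rt_command(in_file, out_file, svm_type="NU_SVR",
                     kernel_type="OLIGO", c=1.0, threads=1):
    """Build an RTModel argument list for retention time prediction."""
    if svm_type not in VALID_SVM_TYPES:
        raise ValueError(f"unknown svm_type: {svm_type}")
    if kernel_type not in VALID_KERNEL_TYPES:
        raise ValueError(f"unknown kernel_type: {kernel_type}")
    return ["RTModel",
            "-in", in_file,
            "-out", out_file,
            "-svm_type", svm_type,
            "-kernel_type", kernel_type,
            "-c", str(c),
            "-threads", str(threads)]


def build_separation_command(positive_file, negative_file, out_file):
    """Build an RTModel argument list for peptide separation prediction."""
    return ["RTModel",
            "-in_positive", positive_file,
            "-in_negative", negative_file,
            "-out", out_file]


if __name__ == "__main__":
    # File names are placeholders for illustration only.
    print(" ".join(build_rt_command("peptides.idXML", "rt_model.txt")))
    print(" ".join(build_separation_command("collected.idXML",
                                            "flowthrough.idXML",
                                            "separation_model.txt")))
```

A list built this way could be passed to `subprocess.run(cmd, check=True)` once the input files exist. Passing the arguments as a list rather than a single string avoids shell quoting problems with file paths.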
INI file documentation of this tool: