OpenMS 2.7.0
Used to train a model for the prediction of proteotypic peptides.
The input consists of two files: one contains the positive examples (the peptides that are proteotypic) and the other contains the negative examples (the non-proteotypic peptides).
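The two inputs end up as one labeled training set. According to the options listed with the command line parameters, each class is capped by max_positive_count or max_negative_count (randomly chosen if more examples are available), and duplicates are kept only when the redundant flag is set. The following is a minimal Python sketch of that step. It assumes that the peptides have already been read from the idXML files as plain sequence strings, that labels are +1/-1, and that deduplication happens before subsampling. The name `build_training_set` is hypothetical, and OpenMS's own implementation is not reproduced here.

```python
import random
from typing import List, Optional, Tuple


def build_training_set(
    positives: List[str],
    negatives: List[str],
    max_positive_count: int = 1000,
    max_negative_count: int = 1000,
    redundant: bool = False,
    seed: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Return (peptide, label) pairs: +1 = proteotypic, -1 = non-proteotypic (assumed labels)."""
    rng = random.Random(seed)

    def prepare(peptides: List[str], max_count: int) -> List[str]:
        if not redundant:
            # Keep only the first occurrence of each sequence.
            peptides = list(dict.fromkeys(peptides))
        if len(peptides) > max_count:
            # Randomly choose max_count examples without replacement.
            peptides = rng.sample(peptides, max_count)
        return peptides

    pos = prepare(positives, max_positive_count)
    neg = prepare(negatives, max_negative_count)
    return [(p, 1) for p in pos] + [(n, -1) for n in neg]
```

For example, `build_training_set(["PEPTIDEK", "PEPTIDEK", "ELVISK"], ["AAAK"])` keeps `PEPTIDEK` once. With `redundant=True`, it appears twice.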
Parts of this model have been described in the following publication:
Ole Schulz-Trieglaff, Nico Pfeifer, Clemens Gröpl, Oliver Kohlbacher and Knut Reinert. LC-MSsim - a simulation software for Liquid Chromatography Mass Spectrometry data. BMC Bioinformatics 2008, 9:423.
Several parameters of the SVM can be changed in the ini file. They correspond to the command line options listed below.
The parameters sigma, degree, c, nu and p are used in a cross validation (CV) to find the best parameters for the training set. For each of these parameters you specify three values: a start value, a step size by which the value is increased, and a stop value. The tested value never exceeds the stop value. By default, each new value is obtained by multiplying the previous value by the step size. If the additive_cv flag is set, the step size is added instead. For example, suppose you want to perform a CV for c from 0.1 to 2 with a step size of 0.1. Open your ini file with INIFileEditor, set cv:c_start to 0.1, cv:c_step_size to 0.1 and cv:c_stop to 2, and enable additive_cv.
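The difference between the two step interpretations is easiest to see by listing the tested values. The following minimal sketch enumerates one parameter's candidate values. The function name `cv_grid` and the small floating-point tolerance at the stop value are assumptions for illustration, not OpenMS code.

```python
from typing import List


def cv_grid(start: float, step: float, stop: float, additive: bool) -> List[float]:
    """Values tried for one CV parameter, from start up to (at most) stop."""
    if additive and step <= 0:
        raise ValueError("an additive step size must be positive")
    if not additive and (start <= 0 or step <= 1):
        raise ValueError("multiplicative stepping needs start > 0 and step > 1")
    values: List[float] = []
    i = 0
    while True:
        # Compute from the index to avoid accumulating floating-point error.
        value = start + i * step if additive else start * step ** i
        # Small tolerance so that a value like 2.0000000000000004 still counts as 2.0.
        if value > stop + 1e-9:
            break
        values.append(value)
        i += 1
    return values
```

With additive stepping, `cv_grid(0.1, 0.1, 2.0, additive=True)` yields 20 values from 0.1 to 2.0, up to floating-point rounding. The default c settings (1.0, 100.0, 1000.0) without additive_cv give `[1.0, 100.0]`, because the next value, 10000.0, exceeds the stop value.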
If the CV should test additional parameters over a range, include them in the same way as in the example above. You can also set the number of CV partitions with cv:number_of_partitions and the number of CV runs with cv:number_of_runs in the ini file.
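The following minimal sketch shows what cv:number_of_partitions and cv:number_of_runs control in a plain repeated k-fold scheme. Each run reshuffles the samples and splits them into that many partitions, and each partition serves once as the test set. A parameter combination's score would then be averaged over all runs and partitions. This is an assumed, generic scheme. The function name is hypothetical, and whether OpenMS stratifies by class or splits in exactly this way is not confirmed here.

```python
import random
from typing import Iterator, List, Optional, Tuple


def repeated_partitions(
    n_samples: int,
    number_of_partitions: int = 10,
    number_of_runs: int = 10,
    seed: Optional[int] = None,
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (run, train_indices, test_indices) for every partition of every run."""
    if number_of_partitions < 2 or number_of_partitions > n_samples:
        raise ValueError("need 2 <= number_of_partitions <= n_samples")
    rng = random.Random(seed)
    for run in range(number_of_runs):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # fresh random split for every run
        # Partition k takes every number_of_partitions-th shuffled index.
        folds = [indices[k::number_of_partitions] for k in range(number_of_partitions)]
        for k, test in enumerate(folds):
            # All other partitions form the training part.
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield run, train, test
```

With the defaults (10 partitions, 10 runs), each tested parameter combination would be trained and evaluated 100 times.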
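Consequently, there are two ways to use this application. You can run the CV to find the best parameters for the training set and train the model with them. Alternatively, you can set the cv:skip_cv flag to skip the CV and train the model directly with the specified parameters.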
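Both modes can also be driven from a script. The following sketch only assembles the argument list from options that appear in the tool's help output (-in_positive, -in_negative, -out, -additive_cv, -cv:skip_cv and the -cv:* grid options). The function name and the file names are hypothetical. The resulting list could be passed to `subprocess.run(cmd, check=True)` if PTModel is on the PATH.

```python
from typing import Dict, List, Optional


def ptmodel_command(
    in_positive: str,
    in_negative: str,
    out: str,
    svm_params: Optional[Dict[str, object]] = None,
    skip_cv: bool = False,
    additive_cv: bool = False,
    cv_params: Optional[Dict[str, float]] = None,
) -> List[str]:
    """Assemble a PTModel argument list from options shown in the tool's help."""
    cmd = ["PTModel", "-in_positive", in_positive,
           "-in_negative", in_negative, "-out", out]
    # Plain SVM options, e.g. {"c": 10.0} becomes "-c 10.0".
    for name, value in (svm_params or {}).items():
        cmd += [f"-{name}", str(value)]
    if skip_cv:
        # Mode 2: no grid search, train once with the parameters given above.
        cmd.append("-cv:skip_cv")
    else:
        # Mode 1: grid search; step sizes are multiplicative unless -additive_cv is set.
        if additive_cv:
            cmd.append("-additive_cv")
        for name, value in (cv_params or {}).items():
            # e.g. "c_start" becomes "-cv:c_start"
            cmd += [f"-cv:{name}", str(value)]
    return cmd
```

For instance, `ptmodel_command("pos.idXML", "neg.idXML", "model.txt", additive_cv=True, cv_params={"c_start": 0.1, "c_step_size": 0.1, "c_stop": 2.0})` expresses the c example from above on the command line instead of in the ini file.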
The resulting model can be used in PTPredict to predict the likelihood of peptides being proteotypic.
The command line parameters of this tool are:
PTModel -- Trains a model for the prediction of proteotypic peptides from a training set.
Full documentation: http://www.openms.de/doxygen/release/2.7.0/html/TOPP_PTModel.html
Version: 2.7.0 Sep 13 2021, 20:58:47, Revision: 9110e58
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  PTModel <options>

Options (mandatory options marked with '*'):
  -in_positive <file>*                Input file with positive examples (valid formats: 'idXML')
  -in_negative <file>*                Input file with negative examples (valid formats: 'idXML')
  -out <file>*                        Output file: the model in libsvm format (valid formats: 'txt')
  -out_oligo_params <file>            Output file with additional model parameters when using the OLIGO kernel (valid formats: 'paramXML')
  -out_oligo_trainset <file>          Output file with the used training dataset when using the OLIGO kernel (valid formats: 'txt')
  -c <float>                          The penalty parameter of the svm (default: '1.0')
  -svm_type <type>                    The type of the svm (NU_SVC or C_SVC) (default: 'C_SVC' valid: 'NU_SVC', 'C_SVC')
  -nu <float>                         The nu parameter [0..1] of the svm (for nu-SVR) (default: '0.5' min: '0.0' max: '1.0')
  -kernel_type <type>                 The kernel type of the svm (default: 'OLIGO' valid: 'LINEAR', 'RBF', 'POLY', 'OLIGO')
  -degree <int>                       The degree parameter of the kernel function of the svm (POLY kernel) (default: '1' min: '1')
  -border_length <int>                Length of the POBK (default: '22' min: '1')
  -k_mer_length <int>                 K_mer length of the POBK (default: '1' min: '1')
  -sigma <float>                      Sigma of the POBK (default: '5.0')
  -max_positive_count <int>           Quantity of positive samples for training (randomly chosen if smaller than available quantity) (default: '1000' min: '1')
  -max_negative_count <int>           Quantity of positive samples for training (randomly chosen if smaller than available quantity) (default: '1000' min: '1')
  -redundant                          If the input sets are redundant and the redundant peptides should occur more than once in the training set, this flag has to be set
  -additive_cv                        If the step sizes should be interpreted additively (otherwise the actual value is multiplied with the step size to get the new value

Parameters for the grid search / cross validation::
  -cv:skip_cv                         Has to be set if the cv should be skipped and the model should just be trained with the specified parameters.
  -cv:number_of_runs <int>            Number of runs for the CV (default: '10' min: '1')
  -cv:number_of_partitions <int>      Number of CV partitions (default: '10' min: '2')
  -cv:degree_start <int>              Starting point of degree (default: '1' min: '1')
  -cv:degree_step_size <int>          Step size point of degree (default: '2')
  -cv:degree_stop <int>               Stopping point of degree (default: '4')
  -cv:c_start <float>                 Starting point of c (default: '1.0')
  -cv:c_step_size <float>             Step size of c (default: '100.0')
  -cv:c_stop <float>                  Stopping point of c (default: '1000.0')
  -cv:nu_start <float>                Starting point of nu (default: '0.1' min: '0.0' max: '1.0')
  -cv:nu_step_size <float>            Step size of nu (default: '1.3')
  -cv:nu_stop <float>                 Stopping point of nu (default: '0.9' min: '0.0' max: '1.0')
  -cv:sigma_start <float>             Starting point of sigma (default: '1.0')
  -cv:sigma_step_size <float>         Step size of sigma (default: '1.3')
  -cv:sigma_stop <float>              Stopping point of sigma (default: '15.0')

Common TOPP options:
  -ini <file>                         Use the given TOPP INI file
  -threads <n>                        Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>                   Writes the default configuration file
  --help                              Shows options
  --helphelp                          Shows all options (including advanced)
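The default OLIGO kernel is configured through border_length, k_mer_length and sigma of the POBK (paired oligo-border kernel). As a rough illustration only, the following sketch implements the plain oligo kernel, in which every pair of identical k-mers at positions p and q contributes exp(-(p - q)^2 / (4 sigma^2)). The sketch also includes a hypothetical border variant that compares only the first and last border_length residues, with C-terminal positions counted from the C-terminus. This border handling is an assumption made for illustration. OpenMS's actual POBK may pair and weight the borders differently.

```python
import math
from collections import defaultdict
from typing import Dict, List


def kmer_positions(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer to the positions where it starts."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for p in range(len(seq) - k + 1):
        positions[seq[p:p + k]].append(p)
    return positions


def oligo_kernel(s: str, t: str, k: int = 1, sigma: float = 5.0) -> float:
    """Oligo kernel: shared k-mers contribute a Gaussian of their position difference."""
    ps, pt = kmer_positions(s, k), kmer_positions(t, k)
    total = 0.0
    for kmer, s_pos in ps.items():
        for p in s_pos:
            for q in pt.get(kmer, []):
                total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return math.sqrt(math.pi) * sigma * total


def border_oligo_kernel(s: str, t: str, border_length: int = 22,
                        k: int = 1, sigma: float = 5.0) -> float:
    """Hypothetical border variant: N-terminal and C-terminal borders compared separately."""
    n_term = oligo_kernel(s[:border_length], t[:border_length], k, sigma)
    # Reverse the C-terminal borders so positions are counted from the C-terminus.
    c_term = oligo_kernel(s[-border_length:][::-1], t[-border_length:][::-1], k, sigma)
    return n_term + c_term
```

A smaller sigma makes the kernel more sensitive to where a k-mer occurs, while a larger sigma tolerates positional shifts between peptides.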
INI file documentation of this tool: