OpenMS
2.7.0
A peptide-centric algorithm for protein inference.
pot. predecessor tools | ProteinResolver | pot. successor tools |
IDFilter | (external) |
This tool is an implementation of
Meyer-Arendt K, Old WM, et al. (2011)
IsoformResolver: A peptide-centric algorithm for protein inference
Journal of Proteome Research 10 (7): 3060-75, DOI: 10.1021/pr200039p
The algorithm tries to assign to each protein its experimentally validated peptides (meaning you should supply peptides which have undergone FDR filtering or the like). Proteins are grouped into ISD groups (in-silico derived) and MSD groups (MS/MS derived) if they have in-silico derived or MS/MS derived peptides in common. Proteins and peptides span a bipartite graph: there is an edge between a protein node and a peptide node if and only if the protein contains the peptide. ISD groups are the connected components of the aforementioned bipartite graph. MSD groups are subgraphs of ISD groups. For further information, see the paper cited above.
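The grouping described above can be illustrated with a small sketch. It is not the OpenMS implementation: the function names, the toy protein and peptide labels, and the choice to drop proteins without any identified peptide before forming MSD groups are all assumptions made for illustration.

```python
from collections import defaultdict


def connected_groups(protein_to_peptides):
    """Return (proteins, peptides) pairs, one per connected component
    of the bipartite protein/peptide graph."""
    adjacency = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        p_node = ("protein", protein)
        adjacency[p_node]  # register proteins even if they have no peptide
        for peptide in peptides:
            q_node = ("peptide", peptide)
            adjacency[p_node].add(q_node)
            adjacency[q_node].add(p_node)

    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        proteins = sorted(name for kind, name in component if kind == "protein")
        peptides = sorted(name for kind, name in component if kind == "peptide")
        groups.append((proteins, peptides))
    return sorted(groups)


def restrict_to_identified(protein_to_peptides, identified):
    """Keep only identified peptides (assumption: proteins left without
    any identified peptide are dropped)."""
    identified = set(identified)
    restricted = {}
    for protein, peptides in protein_to_peptides.items():
        kept = set(peptides) & identified
        if kept:
            restricted[protein] = kept
    return restricted


# Hypothetical in-silico digest: P1 and P2 share peptide B.
digest = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"D"}}
isd_groups = connected_groups(digest)

# Hypothetical MS/MS identifications: the shared peptide B was not observed.
msd_groups = connected_groups(restrict_to_identified(digest, {"A", "C", "D"}))
```

In this toy example P1 and P2 fall into one ISD group because they share peptide B in silico, but they end up in separate MSD groups because B was not identified. Each MSD group is therefore a subgraph of an ISD group.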
Remark: If parameter in is given, in_path is ignored. Parameter in_path is considered only if in is empty.
Input
Since the ProteinResolver offers two different input parameters, there are several ways to use this TOPP tool.
(in) The ProteinResolver simply performs the protein inference based on the above-mentioned algorithm of Meyer-Arendt et al. (2011) for that specific file.
(in or in_path)
Output
Four possible outputs are available:
The results for different input files are appended and written into the same output file. In other words, no matter how many input files you have, you will end up with one single output file.
Text file format of the quantitative experimental design:
The text file has to be column-based and must contain exactly one additional line as header. The header must specify two columns that hold the file name and an identifier for the experimental setup. The names of these two header columns can be set via parameters and must be unique (default: "File" and "ExperimentalSetting"). The columns can be separated in one of four ways: tab, comma, semi-colon, or whitespace.
Example for text file format:
Slice | File | ExperimentalSetting |
1 | SILAC_2_1 | S1224 |
4 | SILAC_3_4 | D1224 |
2 | SILAC_10_2 | S1224 |
7 | SILAC_8_7 | S1224 |
In this case, the default values of the parameters "experiment" and "file" ("ExperimentalSetting" and "File", respectively) already match the header, so they do not need to be changed. If you use other column headers, you need to change these parameters.
The separator should be changed if the file is not tab separated. Every other column (here: first column) is just ignored. Not every file mentioned in the design file has to be given as input file; and every input file that has no match in the design file is ignored for the computation.
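The format rules above can be made concrete with a short sketch of a reader for such a design file. This only illustrates the described format and is not how ProteinResolver parses the file internally. The function name `read_design` and its keyword arguments are hypothetical, although the defaults mirror the documented parameter defaults.

```python
import csv

# Separator names as used by the tool's separator parameter.
SEPARATORS = {"tab": "\t", "comma": ",", "semi-colon": ";"}


def read_design(text, file_column="File",
                experiment_column="ExperimentalSetting", separator="tab"):
    """Map each file name in the design text to its experimental setting."""
    lines = [line for line in text.splitlines() if line.strip()]
    if separator == "whitespace":
        rows = [line.split() for line in lines]
    else:
        rows = list(csv.reader(lines, delimiter=SEPARATORS[separator]))

    header, body = rows[0], rows[1:]
    # Both header identifiers must occur exactly once.
    for name in (file_column, experiment_column):
        if header.count(name) != 1:
            raise ValueError(f"header must contain exactly one {name!r} column")

    file_idx = header.index(file_column)
    setting_idx = header.index(experiment_column)
    # All other columns (for example "Slice") are ignored.
    return {row[file_idx]: row[setting_idx] for row in body}


example = (
    "Slice\tFile\tExperimentalSetting\n"
    "1\tSILAC_2_1\tS1224\n"
    "4\tSILAC_3_4\tD1224\n"
    "2\tSILAC_10_2\tS1224\n"
    "7\tSILAC_8_7\tS1224\n"
)
design = read_design(example)
```

For the example table, `design` maps SILAC_2_1, SILAC_10_2 and SILAC_8_7 to S1224, and SILAC_3_4 to D1224.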
Consider the following scenario:
Input files: SILAC_2_1.consensusXML, SILAC_3_4.consensusXML, SILAC_10_2.consensusXML and SILAC_8_7_.consensusXML
First step: Data from SILAC_2_1.consensusXML and SILAC_10_2.consensusXML is merged, because both files can be mapped to the same setting S1224. SILAC_8_7_.consensusXML is ignored, since SILAC_8_7_ is no match to SILAC_8_7.
Second step: ProteinResolver computes results for the merged data, and the data from the file SILAC_3_4.
Third step: ProteinResolver writes the results for experimental setting S1224 and D1224 to the same output file.
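The first step of this scenario, matching input files to experimental settings, can be sketched as follows. The helper `group_inputs_by_setting` is hypothetical, and the matching rule (file name without its extension must equal the design entry exactly) is an assumption inferred from the scenario, not taken from the tool's source.

```python
from pathlib import Path


def group_inputs_by_setting(input_files, design):
    """Group input files by experimental setting; collect unmatched files."""
    groups = {}
    ignored = []
    for path in input_files:
        base = Path(path).stem  # file name without the final extension
        setting = design.get(base)
        if setting is None:
            ignored.append(path)  # no match in the design file
        else:
            groups.setdefault(setting, []).append(path)
    return groups, ignored


# Design entries from the example table above.
design = {
    "SILAC_2_1": "S1224",
    "SILAC_3_4": "D1224",
    "SILAC_10_2": "S1224",
    "SILAC_8_7": "S1224",
}
inputs = [
    "SILAC_2_1.consensusXML",
    "SILAC_3_4.consensusXML",
    "SILAC_10_2.consensusXML",
    "SILAC_8_7_.consensusXML",
]
groups, ignored = group_inputs_by_setting(inputs, design)
```

Here `groups` holds two files for S1224 and one for D1224, while SILAC_8_7_.consensusXML lands in `ignored` because of the trailing underscore. SILAC_8_7 is listed in the design but has no input file, which is allowed.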
The command line parameters of this tool are:
ProteinResolver -- protein inference
Full documentation: http://www.openms.de/doxygen/release/2.7.0/html/TOPP_ProteinResolver.html
Version: 2.7.0 Sep 13 2021, 20:58:47, Revision: 9110e58
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  ProteinResolver <options>

Options (mandatory options marked with '*'):
  -fasta <file>*                       Input database file (valid formats: 'fasta')
  -in <file(s)>                        Input file(s) holding experimental data (valid formats: 'idXML', 'consensusXML')
  -in_path <file>                      Path to idXMLs or consensusXMLs files. Ignored if 'in' is given.
  -design <file>                       Text file containing the experimental design. See documentation for specific format requirements (valid formats: 'txt')
  -protein_groups <file>               Output file. Contains all protein groups (valid formats: 'csv')
  -peptide_table <file>                Output file. Contains one peptide per line and all proteins which contain that peptide (valid formats: 'csv')
  -protein_table <file>                Output file. Contains one protein per line (valid formats: 'csv')

Additional options for algorithm:
  -resolver:missed_cleavages <number>  Number of allowed missed cleavages (default: '2' min: '0')
  -resolver:min_length <number>        Minimum length of peptide (default: '6' min: '1')
  -resolver:enzyme <choice>            Digestion enzyme (default: 'Trypsin' valid: 'Trypsin')

Additional options for quantitative experimental design:
  -designer:experiment <text>          Identifier for the experimental design. (default: 'ExperimentalSetting')
  -designer:file <text>                Identifier for the file name. (default: 'File')
  -designer:separator <choice>         Separator, which should be used to split a row into columns (default: 'tab' valid: 'tab', 'semi-colon', 'comma', 'whitespace')

Common TOPP options:
  -ini <file>                          Use the given TOPP INI file
  -threads <n>                         Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>                    Writes the default configuration file
  --help                               Shows options
  --helphelp                           Shows all options (including advanced)
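As an illustration of how these options combine, the following sketch assembles an argument list for a ProteinResolver call. It only builds the command and does not run it. The helper `build_command` and all file names are hypothetical, while the option names are the ones listed in the help text above. Following the remark on in and in_path, the sketch drops in_path whenever input files are given.

```python
import shlex


def build_command(fasta, in_files=(), in_path=None, design=None,
                  protein_groups=None, peptide_table=None, protein_table=None):
    """Assemble the argument list for a ProteinResolver invocation."""
    cmd = ["ProteinResolver", "-fasta", fasta]
    if in_files:
        # 'in' takes precedence; 'in_path' would be ignored by the tool anyway.
        cmd += ["-in", *in_files]
    elif in_path:
        cmd += ["-in_path", in_path]
    optional = (
        ("-design", design),
        ("-protein_groups", protein_groups),
        ("-peptide_table", peptide_table),
        ("-protein_table", protein_table),
    )
    for flag, value in optional:
        if value:
            cmd += [flag, value]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_command(
    "database.fasta",
    in_files=["SILAC_2_1.consensusXML", "SILAC_10_2.consensusXML"],
    in_path="data",
    design="design.txt",
    protein_groups="groups.csv",
)
print(shlex.join(cmd))
```

If OpenMS is installed and ProteinResolver is on the PATH, the resulting list could be passed to `subprocess.run(cmd, check=True)`.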
INI file documentation of this tool: