OpenMS
Digestor

Digests a protein database in-silico.

Potential predecessor tools   →  Digestor  →  Potential successor tools
none (FASTA input)                            IDFilter (peptide blacklist)

This application digests the proteins of a FASTA database in silico and reports all peptides produced by the chosen cleavage enzyme, subject to the missed-cleavage and peptide-length settings.
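To make the digestion step concrete, here is a minimal sketch of how a rule-based in-silico digest can enumerate peptides using the -enzyme, -missed_cleavages, -min_length and -max_length parameters documented on this page. It is not Digestor's implementation. It assumes simplified cleavage rules ('Trypsin' cuts after K/R unless the next residue is P; 'Trypsin/P' cuts after K/R regardless) and treats the length bounds as inclusive. OpenMS's own enzyme definitions may differ in detail.

```python
import re

# Simplified cleavage rules written as zero-width regular expressions.
# Assumption: 'Trypsin' cuts after K/R unless the next residue is P,
# 'Trypsin/P' cuts after K/R regardless of the next residue.
CLEAVAGE_RULES = {
    "Trypsin": re.compile(r"(?<=[KR])(?!P)"),
    "Trypsin/P": re.compile(r"(?<=[KR])"),
}


def digest(sequence: str, enzyme: str = "Trypsin", missed_cleavages: int = 1,
           min_length: int = 6, max_length: int = 40) -> list[str]:
    """Return all peptides with at most `missed_cleavages` uncut internal sites.

    Defaults mirror Digestor's documented defaults; treating the length
    bounds as inclusive is an assumption of this sketch.
    """
    pattern = CLEAVAGE_RULES[enzyme]
    # Cut positions strictly inside the sequence (a match at the end is not a cut).
    sites = [m.start() for m in pattern.finditer(sequence)
             if 0 < m.start() < len(sequence)]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # A peptide spanning bounds[i]..bounds[j] contains j - i - 1 missed cleavages.
        for j in range(i + 1, min(i + missed_cleavages + 2, len(bounds))):
            peptide = sequence[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides
```

For the toy sequence `MAKPLRGEKAR`, `digest("MAKPLRGEKAR", missed_cleavages=0, min_length=1)` yields `["MAKPLR", "GEK", "AR"]` (the K before P is not cut), while the same call with `enzyme="Trypsin/P"` yields `["MAK", "PLR", "GEK", "AR"]`. With the defaults, only `MAKPLR` and the one-missed-cleavage peptide `MAKPLRGEK` reach the minimum length of 6.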

The output can be used, e.g., as a peptide blacklist for IDFilter, to remove the listed peptides from identification results.
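Conceptually, blacklist filtering drops every identified peptide whose sequence occurs in the digest. The sketch below illustrates only that idea and is not IDFilter's implementation. It reads sequences from FASTA text such as Digestor's FASTA output and compares plain sequence strings; ignoring modifications is an assumption, and the helper names `read_fasta_sequences` and `remove_blacklisted` are hypothetical.

```python
def read_fasta_sequences(text: str) -> set[str]:
    """Collect all sequences from FASTA-formatted text (multi-line records allowed)."""
    sequences, current = set(), []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header ends the previous record.
            if current:
                sequences.add("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.add("".join(current))
    return sequences


def remove_blacklisted(identified: list[str], blacklist: set[str]) -> list[str]:
    """Keep only identified sequences that are absent from the blacklist."""
    return [seq for seq in identified if seq not in blacklist]
```

Using a set for the blacklist keeps each lookup constant-time on average, which matters when a whole proteome digest serves as the blacklist.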

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

Digestor -- Digests a protein database in-silico.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_Digestor.html
Version: 3.2.0-pre-nightly-2024-07-21 Jul 22 2024, 02:13:52, Revision: b650df0
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  Digestor <options>

Options (mandatory options marked with '*'):
  -in <file>*                  Input file (valid formats: 'fasta')
  -out <file>*                 Output file (peptides) (valid formats: 'idXML', 'fasta')
  -out_type <type>             Set this if you cannot control the filename of 'out', e.g., in TOPPAS. (valid:
                                'idXML', 'fasta')
  -missed_cleavages <number>   The number of allowed missed cleavages (default: '1') (min: '0')
  -min_length <number>         Minimum length of peptide (default: '6')
  -max_length <number>         Maximum length of peptide (default: '40')
  -enzyme <string>             The type of digestion enzyme (default: 'Trypsin') (valid: 'Arg-C/P', 'Asp-N',
                               'Asp-N/B', 'Clostripain/P', 'elastase-trypsin-chymotrypsin', 'no cleavage',
                               'unspecific cleavage', 'Trypsin', 'Arg-C', 'staphylococcal protease/D',
                               'proline-endopeptidase/HKR', 'Glu-C+P', 'PepsinA + P', 'cyanogen-bromide',
                               'leukocyte elastase', 'proline endopeptidase', 'Asp-N_ambic', 'Chymotrypsin',
                               'Chymotrypsin/P', 'CNBr', 'Formic_acid', 'Lys-C', 'Lys-N', 'Lys-C/P', 'PepsinA',
                               'TrypChymo', 'Trypsin/P', 'V8-DE', 'V8-E', 'glutamyl endopeptidase',
                               'Alpha-lytic protease', '2-iodobenzoate', 'iodosobenzoate')

Options for FASTA output files:
  -FASTA:ID <option>           Identifier to use for each peptide: copy from parent protein (parent); a
                               consecutive number (number); parent ID + consecutive number (both) (default:
                               'parent') (valid: 'parent', 'number', 'both')
  -FASTA:description <option>  Keep or remove the (possibly lengthy) FASTA header description. Keeping it
                               can significantly increase the size of the resulting FASTA file. (default:
                               'remove') (valid: 'remove', 'keep')

Common TOPP options:
  -ini <file>                  Use the given TOPP INI file
  -threads <n>                 Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>            Writes the default configuration file
  --help                       Shows options
  --helphelp                   Shows all options (including advanced)
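The -write_ini and -ini options exchange parameters through an XML INI file whose tree mirrors the sections documented below (Digestor, then instance '1', then FASTA), with names joined by colons as in -FASTA:ID. The sketch below looks up a value by its colon-separated path. It assumes a simplified excerpt of that XML (nested NODE and ITEM elements carrying `name` and `value` attributes); real files written by -write_ini carry more attributes and may differ in detail, and `get_param` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed excerpt of an INI file written by `Digestor -write_ini`;
# real files contain more attributes (description, restrictions, ...).
SAMPLE_INI = """<PARAMETERS>
  <NODE name="Digestor">
    <NODE name="1">
      <ITEM name="missed_cleavages" value="1" type="int"/>
      <ITEM name="enzyme" value="Trypsin" type="string"/>
      <NODE name="FASTA">
        <ITEM name="ID" value="parent" type="string"/>
      </NODE>
    </NODE>
  </NODE>
</PARAMETERS>
"""


def get_param(root: ET.Element, path: str) -> str:
    """Resolve a colon-separated path such as 'Digestor:1:enzyme' to its value."""
    *nodes, item = path.split(":")
    element = root
    # Walk down one NODE per path component.
    for name in nodes:
        element = element.find(f"NODE[@name='{name}']")
        if element is None:
            raise KeyError(path)
    found = element.find(f"ITEM[@name='{item}']")
    if found is None:
        raise KeyError(path)
    return found.get("value")
```

For example, `get_param(ET.fromstring(SAMPLE_INI), "Digestor:1:FASTA:ID")` returns `"parent"`, and a path that does not exist raises `KeyError`.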

INI file documentation of this tool:

Legend:
  *  required parameter

+Digestor: Digests a protein database in-silico.
  version = 3.2.0-pre-nightly-2024-07-21
      Version of the tool that generated this parameters file.
  ++1: Instance '1' section for 'Digestor'
    in *
        input file [tags: input file] (valid: *.fasta)
    out *
        Output file (peptides) [tags: output file] (valid: *.idXML, *.fasta)
    out_type
        Set this if you cannot control the filename of 'out', e.g., in TOPPAS. (valid: idXML, fasta)
    missed_cleavages = 1
        The number of allowed missed cleavages (valid: 0:∞)
    min_length = 6
        Minimum length of peptide
    max_length = 40
        Maximum length of peptide
    enzyme = Trypsin
        The type of digestion enzyme (valid: Arg-C/P, Asp-N, Asp-N/B, Clostripain/P,
        elastase-trypsin-chymotrypsin, no cleavage, unspecific cleavage, Trypsin, Arg-C,
        staphylococcal protease/D, proline-endopeptidase/HKR, Glu-C+P, PepsinA + P,
        cyanogen-bromide, leukocyte elastase, proline endopeptidase, Asp-N_ambic, Chymotrypsin,
        Chymotrypsin/P, CNBr, Formic_acid, Lys-C, Lys-N, Lys-C/P, PepsinA, TrypChymo, Trypsin/P,
        V8-DE, V8-E, glutamyl endopeptidase, Alpha-lytic protease, 2-iodobenzoate, iodosobenzoate)
    log
        Name of log file (created only when specified)
    debug = 0
        Sets the debug level
    threads = 1
        Sets the number of threads allowed to be used by the TOPP tool
    no_progress = false
        Disables progress logging to command line (valid: true, false)
    force = false
        Overrides tool-specific checks (valid: true, false)
    test = false
        Enables the test mode (needed for internal use only) (valid: true, false)
    +++FASTA: Options for FASTA output files
      ID = parent
          Identifier to use for each peptide: copy from parent protein (parent); a consecutive
          number (number); parent ID + consecutive number (both) (valid: parent, number, both)
      description = remove
          Keep or remove the (possibly lengthy) FASTA header description. Keeping it can
          significantly increase the size of the resulting FASTA file. (valid: remove, keep)
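To show what the FASTA:ID and FASTA:description options control, here is a minimal sketch that renders peptide records under each setting. It is illustrative only: the `<parent>_<n>` form used for 'both' and the single 1-based counter over all peptides are hypothetical choices, not Digestor's documented output format, and `to_fasta` is a hypothetical helper.

```python
def to_fasta(entries: list[tuple[str, str, str]], id_mode: str = "parent",
             description: str = "remove") -> str:
    """Render (parent_id, parent_description, peptide) triples as FASTA text.

    Hypothetical: the '<parent>_<n>' form for 'both' and the single 1-based
    counter over all peptides are illustrative choices, not Digestor's
    documented output format.
    """
    lines = []
    for n, (parent_id, parent_desc, peptide) in enumerate(entries, start=1):
        if id_mode == "parent":
            ident = parent_id              # copy the parent protein's identifier
        elif id_mode == "number":
            ident = str(n)                 # consecutive number
        elif id_mode == "both":
            ident = f"{parent_id}_{n}"     # parent ID + consecutive number
        else:
            raise ValueError(f"unknown id_mode: {id_mode!r}")
        header = f">{ident}"
        if description == "keep" and parent_desc:
            header += f" {parent_desc}"    # repeated once per peptide
        lines.extend([header, peptide])
    return "\n".join(lines) + "\n"
```

Because keeping the description repeats the parent protein's full header on every peptide record, a protein that yields many peptides copies its description many times, which is why description = keep can inflate the output considerably.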