Digests a protein database in-silico.

potential predecessor tools	→ Digestor →	potential successor tools
none (FASTA input)	→ Digestor →	IDFilter (peptide blacklist)

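This application digests a protein database in silico and reports all peptides produced by the chosen cleavage enzyme, subject to the allowed number of missed cleavages and the minimum and maximum peptide length.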
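As a rough illustration of what in-silico digestion involves, the Python sketch below cleaves a sequence with a simplified trypsin rule (cut after K or R unless the next residue is P), joins neighbouring fragments to model missed cleavages, and applies a length window. The function name `digest` and the regular expression are assumptions made for this example; it is not the OpenMS implementation, and Digestor may treat edge cases differently.

```python
import re


def digest(sequence, missed_cleavages=1, min_length=6, max_length=40):
    """Simplified tryptic digestion (illustrative only, not OpenMS code)."""
    # Assumed rule: cleave C-terminal of K or R, except when followed by P.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries: protein start, internal cleavage sites, protein end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]

    peptides = []
    for i in range(len(bounds) - 1):
        # Spanning k extra boundaries corresponds to k missed cleavages.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = sequence[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides
```

The default arguments mirror the tool defaults listed in the options (one missed cleavage, lengths 6 to 40). Calling `digest(seq, missed_cleavages=0)` yields only fully cleaved peptides.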

The output can be used, for example, as a blacklist input to IDFilter to remove certain peptides from identification results.
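Conceptually, blacklist filtering is a set-membership test. The sketch below shows only that idea in Python. The function name `remove_blacklisted` is hypothetical, and how IDFilter actually matches peptides (for example, with respect to modifications) is not described here.

```python
def remove_blacklisted(identified, blacklist):
    """Return the identified peptides that do not occur in the blacklist."""
    blocked = set(blacklist)  # set lookup keeps the filter linear in input size
    return [peptide for peptide in identified if peptide not in blocked]
```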

Note: Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

Digestor -- Digests a protein database in-silico.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_Digestor.html
Version: 3.6.0-pre-nightly-2026-01-31 Jan 31 2026, 01:46:01, Revision: d8ac3d6
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al. OpenMS 3 enables reproducible analysis of large-scale mass
   spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  Digestor <options>

Options (mandatory options marked with '*'):
  -in <file>*                  Input file (valid formats: 'fasta')
  -out <file>*                 Output file (peptides) (valid formats: 'idXML', 'fasta')
  -out_type <type>             Set this if you cannot control the filename of 'out', e.g., in TOPPAS. (valid:
                                'idXML', 'fasta')
  -missed_cleavages <number>   The number of allowed missed cleavages (default: '1') (min: '0')
  -min_length <number>         Minimum length of peptide (default: '6')
  -max_length <number>         Maximum length of peptide (default: '40')
  -enzyme <string>             The type of digestion enzyme (default: 'Trypsin') (valid: 'Clostripain/P',
                               'elastase-trypsin-chymotrypsin', 'no cleavage', 'unspecific cleavage',
                               'Trypsin', 'Arg-C', 'Arg-C/P', 'Asp-N', 'Asp-N/B', 'staphylococcal protease/D',
                               'proline-endopeptidase/HKR', 'Glu-C+P', 'PepsinA + P', 'cyanogen-bromide',
                               'leukocyte elastase', 'proline endopeptidase', 'Asp-N_ambic', 'Chymotrypsin',
                               'Chymotrypsin/P', 'CNBr', 'Formic_acid', 'Lys-C', 'Lys-N', 'Lys-C/P', 'PepsinA',
                               'TrypChymo', 'Trypsin/P', 'V8-DE', 'V8-E', 'glutamyl endopeptidase',
                               'Alpha-lytic protease', '2-iodobenzoate', 'iodosobenzoate')

Options for FASTA output files:
  -FASTA:ID <option>           Identifier to use for each peptide: copy from parent protein (parent); a
                               consecutive number (number); parent ID + consecutive number (both)
                               (default: 'parent') (valid: 'parent', 'number', 'both')
  -FASTA:description <option>  Keep or remove the (possibly lengthy) FASTA header description. Keeping it
                               can increase resulting FASTA file significantly. (default: 'remove')
                               (valid: 'remove', 'keep')

  -replace_ambiguous           Replace ambiguous amino acids with a random unambiguous amino acid. This is
                               useful for generating an output file that mimics a search engine result
                               (since they usually do not contain ambiguous amino acids).
                               
Common TOPP options:
  -ini <file>                  Use the given TOPP INI file
  -threads <n>                 Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>            Writes the default configuration file
  --help                       Shows options
  --helphelp                   Shows all options (including advanced)
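
The options above can be combined into a single call. The following Python sketch assembles such a command line using only flags listed in the help text. The helper name `build_digestor_command` and the file names in the usage note are hypothetical; the sketch only builds the argument list and does not start the tool.

```python
def build_digestor_command(fasta_in, out, enzyme="Trypsin", missed_cleavages=1,
                           min_length=6, max_length=40, fasta_id=None):
    """Assemble a Digestor argument list from the documented options."""
    cmd = [
        "Digestor",
        "-in", str(fasta_in),                        # mandatory FASTA input
        "-out", str(out),                            # mandatory output (idXML or fasta)
        "-enzyme", enzyme,
        "-missed_cleavages", str(missed_cleavages),
        "-min_length", str(min_length),
        "-max_length", str(max_length),
    ]
    if fasta_id is not None:
        # Documented values: 'parent', 'number', 'both' (FASTA output only).
        cmd += ["-FASTA:ID", fasta_id]
    return cmd
```

For example, `build_digestor_command("proteins.fasta", "peptides.fasta", enzyme="Lys-C", missed_cleavages=2)` returns a list that can be passed to `subprocess.run(cmd, check=True)` on a system where Digestor is on the PATH.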

INI file documentation of this tool:

+Digestor: Digests a protein database in-silico.

version (default: 3.6.0-pre-nightly-2026-01-31): Version of the tool that generated this parameters file.

++1: Instance '1' section for 'Digestor'

in (required): input file (type: input file; valid: *.fasta)

out (required): Output file (peptides) (type: output file; valid: *.idXML, *.fasta)

out_type: Set this if you cannot control the filename of 'out', e.g., in TOPPAS. (valid: idXML, fasta)

missed_cleavages (default: 1): The number of allowed missed cleavages (range: 0:∞)

min_length (default: 6): Minimum length of peptide

max_length (default: 40): Maximum length of peptide

enzyme (default: Trypsin): The type of digestion enzyme (valid: Clostripain/P, elastase-trypsin-chymotrypsin, no cleavage, unspecific cleavage, Trypsin, Arg-C, Arg-C/P, Asp-N, Asp-N/B, staphylococcal protease/D, proline-endopeptidase/HKR, Glu-C+P, PepsinA + P, cyanogen-bromide, leukocyte elastase, proline endopeptidase, Asp-N_ambic, Chymotrypsin, Chymotrypsin/P, CNBr, Formic_acid, Lys-C, Lys-N, Lys-C/P, PepsinA, TrypChymo, Trypsin/P, V8-DE, V8-E, glutamyl endopeptidase, Alpha-lytic protease, 2-iodobenzoate, iodosobenzoate)

replace_ambiguous (default: false): Replace ambiguous amino acids with a random unambiguous amino acid. This is useful for generating an output file that mimics a search engine result (since they usually do not contain ambiguous amino acids). (valid: true, false)

log: Name of log file (created only when specified)

debug (default: 0): Sets the debug level

threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool

no_progress (default: false): Disables progress logging to command line (valid: true, false)

force (default: false): Overrides tool-specific checks (valid: true, false)

test (default: false): Enables the test mode (needed for internal use only) (valid: true, false)

+++FASTA: Options for FASTA output files

ID (default: parent): Identifier to use for each peptide: copy from parent protein (parent); a consecutive number (number); parent ID + consecutive number (both) (valid: parent, number, both)

description (default: remove): Keep or remove the (possibly lengthy) FASTA header description. Keeping it can increase resulting FASTA file significantly. (valid: remove, keep)
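
To make the `replace_ambiguous` parameter described above more concrete, the Python sketch below replaces ambiguous one-letter codes with a randomly chosen unambiguous residue. The mapping (B to D or N, Z to E or Q, J to I or L, X to any of the 20 standard residues) follows the conventional one-letter ambiguity codes and is an assumption here; which symbols Digestor treats as ambiguous and how it draws replacements is not stated in this documentation.

```python
import random

# The 20 standard amino acids (one-letter codes).
UNAMBIGUOUS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed ambiguity codes and their candidate replacements.
AMBIGUOUS = {"B": "DN", "Z": "EQ", "J": "IL", "X": UNAMBIGUOUS}


def replace_ambiguous(sequence, rng=None):
    """Replace each ambiguous residue with a random unambiguous candidate."""
    rng = rng or random.Random()
    return "".join(
        rng.choice(AMBIGUOUS[aa]) if aa in AMBIGUOUS else aa
        for aa in sequence
    )
```

Passing a seeded generator, such as `random.Random(0)`, makes the replacement reproducible across runs.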