OpenMS
TextExporter

This application converts several OpenMS XML formats (featureXML, consensusXML, and idXML) to text files.

potential predecessor tools (almost any TOPP tool) → TextExporter → potential successor tools (external tools: MS Excel, OpenOffice, Notepad)

The goal of this tool is to create output in a table format that is easily readable in Excel or OpenOffice. Lines in the output correspond to rows in the table; the individual columns are delineated by a separator, e.g. tab (default, TSV format) or comma (CSV format).

Output files begin with comment lines, starting with the special character "#". The last such line(s) will be a header with column names, but this may be preceded by more general comments.

Because the OpenMS XML formats contain different kinds of data in a hierarchical structure, TextExporter produces somewhat unusual TSV/CSV files for many inputs: Different lines in the output may belong to different types of data, and the number of columns and the meanings of the individual fields depend on the type. In such cases, the first column always contains an indicator (in capital letters) for the data type of the current line. In addition, some lines have to be understood relative to a previous line, if there is a hierarchical relationship in the data. (See below for details and examples.)

Missing values are represented by "-1" or "nan" in numeric fields and by blanks in character/text fields.
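
The conventions above (comment lines starting with "#", a type indicator in the first column, and "-1"/"nan" as missing-value markers) can be handled with a few lines of Python. This is a minimal sketch, not part of OpenMS: the function names are hypothetical, the sample lines are made up, and it assumes tab-separated output written with quoting set to 'none'.

```python
import math

def parse_export(lines, sep="\t"):
    """Separate '#' comment/header lines from typed data rows."""
    comments, rows = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Comment or header line: keep its fields without the leading '#'.
            comments.append(line[1:].split(sep))
        else:
            # Data line: first field is the type indicator (e.g. FEATURE).
            indicator, *fields = line.split(sep)
            rows.append((indicator, fields))
    return comments, rows

def numeric_or_none(text):
    """Return a float, or None for the documented missing markers '-1' and 'nan'."""
    value = float(text)
    return None if math.isnan(value) or value == -1.0 else value

# Made-up lines for illustration only; real output has more columns.
sample = [
    "#FEATURE\trt\tmz\tintensity",
    "FEATURE\t1200.5\t445.12\tnan",
]
comments, rows = parse_export(sample)
```

Treating every "-1" as missing is an assumption that only suits fields where -1 cannot be a real value; adjust it per column if needed.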

Depending on the input and the parameters, the output contains the following columns:

featureXML input:

  • first column: RUN / PROTEIN / UNASSIGNEDPEPTIDE / FEATURE / PEPTIDE (indicator for the type of data in the current row)
  • a RUN line contains information about a protein identification run; further columns: run_id, score_type, score_direction, date_time, search_engine_version, parameters
  • a PROTEIN line contains data of a protein identified in the previously listed run; further columns: score, rank, accession, coverage, sequence
  • an UNASSIGNEDPEPTIDE line contains data of a peptide hit that was not assigned to any feature; further columns: rt, mz, score, rank, sequence, charge, aa_before, aa_after, score_type, search_identifier, accessions
  • a FEATURE line contains data of a single feature; further columns: rt, mz, intensity, charge, width, quality, rt_quality, mz_quality, rt_start, rt_end
  • a PEPTIDE line contains data of a peptide hit annotated to the previous feature; further columns: same as for UNASSIGNEDPEPTIDE
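
Because PROTEIN lines refer to the previously listed RUN and PEPTIDE lines to the previous FEATURE, a reader has to rebuild the hierarchy while scanning the file in order. The following is a minimal sketch of that bookkeeping, assuming the rows have already been split into (indicator, fields) pairs; the function name and the sample values are hypothetical.

```python
def group_hierarchy(rows):
    """Attach PROTEIN rows to the preceding RUN and PEPTIDE rows to the preceding FEATURE.

    rows: iterable of (indicator, fields) pairs in file order.
    Raises IndexError if a child row appears before any parent row.
    """
    runs, features, unassigned = [], [], []
    for kind, fields in rows:
        if kind == "RUN":
            runs.append({"fields": fields, "proteins": []})
        elif kind == "PROTEIN":
            runs[-1]["proteins"].append(fields)
        elif kind == "FEATURE":
            features.append({"fields": fields, "peptides": []})
        elif kind == "PEPTIDE":
            features[-1]["peptides"].append(fields)
        elif kind == "UNASSIGNEDPEPTIDE":
            unassigned.append(fields)
    return runs, features, unassigned

# Made-up, shortened rows for illustration only.
example_rows = [
    ("RUN", ["run_1"]),
    ("PROTEIN", ["P1"]),
    ("UNASSIGNEDPEPTIDE", ["pep_0"]),
    ("FEATURE", ["100.0", "500.0"]),
    ("PEPTIDE", ["pep_1"]),
    ("PEPTIDE", ["pep_2"]),
    ("FEATURE", ["200.0", "600.0"]),
]
runs, features, unassigned = group_hierarchy(example_rows)
```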

With the no_ids flag, only FEATURE lines (without the FEATURE indicator) are written.

With the feature:minimal flag, only the rt, mz, and intensity columns of FEATURE lines are written.

consensusXML input:

Output format produced for the out parameter:

  • first column: MAP / RUN / PROTEIN / UNASSIGNEDPEPTIDE / CONSENSUS / PEPTIDE (indicator for the type of data in the current row)
  • a MAP line contains information about a sub-map; further columns: id, filename, label, size (potentially followed by further columns containing meta data, depending on the input)
  • a CONSENSUS line contains data of a single consensus feature; further columns: rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf, rt_X0, mz_X0, ..., rt_X1, mz_X1, ...
  • "..._cf" columns refer to the consensus feature itself, "..._Xi" columns refer to a sub-feature from the map with ID "Xi" (no quality column in this case); missing sub-features are indicated by "nan" values
  • see above for the formats of RUN, PROTEIN, UNASSIGNEDPEPTIDE, PEPTIDE lines

With the no_ids flag, only MAP and CONSENSUS lines are written.
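
A CONSENSUS line mixes columns for the consensus feature ("..._cf") with columns for each sub-map ("..._Xi"), so it is often convenient to split a row by column suffix. The sketch below assumes the column names are taken from the header line and follow the naming pattern described above; the function name, the map labels "X0"/"X1", and the values are illustrative placeholders.

```python
import math

def split_consensus_row(column_names, values):
    """Split one CONSENSUS row into consensus-level and per-map attributes.

    Columns ending in '_cf' describe the consensus feature; any other suffix
    is treated as a map ID. Maps whose values are all 'nan' are dropped,
    since 'nan' marks a missing sub-feature.
    """
    consensus, sub_features = {}, {}
    for name, text in zip(column_names, values):
        attribute, _, suffix = name.rpartition("_")
        value = float(text)
        if suffix == "cf":
            consensus[attribute] = value
        else:
            sub_features.setdefault(suffix, {})[attribute] = value
    present = {
        map_id: attrs
        for map_id, attrs in sub_features.items()
        if not all(math.isnan(v) for v in attrs.values())
    }
    return consensus, present

# Placeholder column names and made-up values for illustration only.
names = ["rt_cf", "mz_cf", "intensity_cf", "rt_X0", "mz_X0", "rt_X1", "mz_X1"]
row = ["100.0", "500.25", "1e6", "99.5", "500.24", "nan", "nan"]
consensus, present = split_consensus_row(names, row)
```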

Output format produced for the consensus:centroids parameter:

  • one line per consensus centroid
  • columns: rt, mz, intensity, charge, width, quality

Output format produced for the consensus:elements parameter:

  • one line per sub-feature (element) of a consensus feature
  • first column: H / L (indicator for new/repeated element)
  • H indicates a new element, L indicates the replication of the first element of the current consensus feature (for plotting)
  • further columns: rt, mz, intensity, charge, width, rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf
  • "..._cf" columns refer to the consensus feature, the other columns refer to the sub-feature

With the consensus:add_metavalues flag, meta values for each consensus feature are written.

Output format produced for the consensus:features parameter:

  • one line per consensus feature (suitable for processing with e.g. R)
  • columns: same as for a CONSENSUS line above, followed by additional columns for identification data
  • additional columns: peptide_N0, n_diff_peptides_N0, protein_N0, n_diff_proteins_N0, peptide_N1, ...
  • "..._Ni" columns refer to the identification run with index "Ni", n_diff_... stands for "number of different ..."; different peptides/proteins in one column are separated by "/"

With the no_ids flag, the additional columns are not included.
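
Since several peptides or proteins can share one "..._Ni" cell, separated by "/", a reader typically splits those cells before further processing. A minimal sketch follows, with a hypothetical function name and made-up sequences; it assumes the entries themselves contain no "/" and that an empty cell means no identification.

```python
def split_identifications(cell):
    """Split a peptide_Ni or protein_Ni cell into its '/'-separated entries."""
    cell = cell.strip()
    return cell.split("/") if cell else []

# Made-up cell contents for illustration only.
peptides = split_identifications("PEPTIDEK/ANOTHERPEPTIDER")
none_found = split_identifications("")
```

The length of the resulting list can be compared with the matching n_diff_... column as a consistency check.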

idXML input:

  • first column: RUN / PROTEIN / PEPTIDE (indicator for the type of data in the current row)
  • see above for the formats of RUN, PROTEIN, PEPTIDE lines
  • additional column for PEPTIDE lines: predicted_rt (predicted retention time)
  • additional column for PEPTIDE lines: predicted_pt (predicted proteotypicity)

With the id:proteins_only flag, only RUN and PROTEIN lines are written.

With the id:peptides_only flag, only PEPTIDE lines (without the PEPTIDE indicator) are written.

With the id:first_dim_rt flag, the additional columns rt_first_dim and predicted_rt_first_dim are included for PEPTIDE lines.

Note
Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.

The command line parameters of this tool are:

TextExporter -- Exports various XML formats to a text file.
Full documentation: http://www.openms.de/doxygen/nightly/html/TOPP_TextExporter.html
Version: 3.4.0-pre-nightly-2024-12-16 Dec 17 2024, 02:41:12, Revision: 96ad74c
To cite OpenMS:
 + Pfeuffer, J., Bielow, C., Wein, S. et al.. OpenMS 3 enables reproducible analysis of large-scale mass spectrometry data. Nat Methods (2024). doi:10.1038/s41592-024-02197-7.

Usage:
  TextExporter <options>

Options (mandatory options marked with '*'):
  -in <file>*                                     Input file  (valid formats: 'featureXML', 'consensusXML', 'idXML', 'mzML')
  -out <file>*                                    Output file. (valid formats: 'tsv', 'csv', 'txt')
  -out_type <type>                                Output file type -- default: determined from file extension, ambiguous file extensions are interpreted as tsv (valid: 'tsv', 'csv', 'txt')
  -replacement <string>                           Used to replace occurrences of the separator in strings before writing, if 'quoting' is 'none' (default: '_')
  -quoting <method>                               Method for quoting of strings: 'none' for no quoting, 'double' for quoting with doubling of embedded quotes,
                                                  'escape' for quoting with backslash-escaping of embedded quotes (default: 'none') (valid: 'none', 'double', 'escape')
  -no_ids                                         Suppresses output of identification data.
                                                  

Options for featureXML input files:
  -feature:minimal                                Set this flag to write only three attributes: RT, m/z, and intensity.
  -feature:add_metavalues <min_frequency>         Add columns for meta values which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (default: '-1') (min: '-1' max: '100')

                                                  

Options for idXML input files:
  -id:proteins_only                               Set this flag if you want only protein information from an idXML file
  -id:peptides_only                               Set this flag if you want only peptide information from an idXML file
  -id:protein_groups                              Set this flag if you want to also write indist. group information from an idXML file
  -id:first_dim_rt                                If this flag is set the first_dim RT of the peptide hits will also be printed (if present).
  -id:add_metavalues <min_frequency>              Add columns for meta values of PeptideID (=spectrum) entries which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (default: '-1') (min: '-1' max: '100')
  -id:add_hit_metavalues <min_frequency>          Add columns for meta values of PeptideHit (=PSM) entries which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (default: '-1') (min: '-1' max: '100')
  -id:add_protein_hit_metavalues <min_frequency>  Add columns for meta values on protein level which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (default: '-1') (min: '-1' max: '100')

                                                  

Options for consensusXML input files:
  -consensus:centroids <file>                     Output file for centroids of consensus features (valid formats: 'csv')
  -consensus:elements <file>                      Output file for elements of consensus features (valid formats: 'csv')
  -consensus:features <file>                      Output file for consensus features and contained elements from all maps (writes 'nan's if elements are missing) (valid formats: 'csv')
  -consensus:sorting_method <method>              Sorting options can be combined. The precedence is: sort_by_size, sort_by_maps, sorting_method (default: 'none') (valid: 'none', 'RT', 'MZ', 'RT_then_MZ', 'intensity', 'quality_decreasing', 'quality_increasing')
  -consensus:sort_by_maps                         Apply a stable sort by the covered maps, lexicographically
  -consensus:sort_by_size                         Apply a stable sort by decreasing size (i.e., the number of elements)
  -consensus:add_metavalues                       Add columns for ConsensusFeature meta values.

                                                  
Common TOPP options:
  -ini <file>                                     Use the given TOPP INI file
  -threads <n>                                    Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>                               Writes the default configuration file
  --help                                          Shows options
  --helphelp                                      Shows all options (including advanced)
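
When the tool is called from a script, the options listed above can be assembled into an argument list. The sketch below only builds the command and does not run it; the helper name is hypothetical, it uses only flags from the listing above, and it assumes that a TextExporter executable is available on the PATH when the command is eventually executed.

```python
import shlex

VALID_OUT_TYPES = ("tsv", "csv", "txt")
VALID_QUOTING = ("none", "double", "escape")

def build_textexporter_command(in_file, out_file, *, out_type=None,
                               quoting=None, no_ids=False, extra=()):
    """Return the argument list for a TextExporter call (nothing is executed)."""
    cmd = ["TextExporter", "-in", in_file, "-out", out_file]
    if out_type is not None:
        if out_type not in VALID_OUT_TYPES:
            raise ValueError(f"invalid out_type: {out_type!r}")
        cmd += ["-out_type", out_type]
    if quoting is not None:
        if quoting not in VALID_QUOTING:
            raise ValueError(f"invalid quoting: {quoting!r}")
        cmd += ["-quoting", quoting]
    if no_ids:
        cmd.append("-no_ids")
    cmd += list(extra)  # e.g. ["-feature:minimal"]
    return cmd

# File names are placeholders.
cmd = build_textexporter_command(
    "input.featureXML", "output.csv",
    quoting="double", no_ids=True, extra=["-feature:minimal"],
)
print(shlex.join(cmd))
# To run it: subprocess.run(cmd, check=True)
```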

INI file documentation of this tool:

+ TextExporter: Exports various XML formats to a text file.
  version (value: '3.4.0-pre-nightly-2024-12-16'): Version of the tool that generated this parameters file.
  ++ 1: Instance '1' section for 'TextExporter'
    in: Input file (input file; valid formats: *.featureXML, *.consensusXML, *.idXML, *.mzML)
    out: Output file. (output file; valid formats: *.tsv, *.csv, *.txt)
    out_type: Output file type -- default: determined from file extension, ambiguous file extensions are interpreted as tsv (valid: tsv, csv, txt)
    replacement (default: '_'): Used to replace occurrences of the separator in strings before writing, if 'quoting' is 'none'
    quoting (default: 'none'): Method for quoting of strings: 'none' for no quoting, 'double' for quoting with doubling of embedded quotes, 'escape' for quoting with backslash-escaping of embedded quotes (valid: none, double, escape)
    no_ids (default: 'false'): Suppresses output of identification data. (valid: true, false)
    log: Name of log file (created only when specified)
    debug (default: '0'): Sets the debug level
    threads (default: '1'): Sets the number of threads allowed to be used by the TOPP tool
    no_progress (default: 'false'): Disables progress logging to command line (valid: true, false)
    force (default: 'false'): Overrides tool-specific checks (valid: true, false)
    test (default: 'false'): Enables the test mode (needed for internal use only) (valid: true, false)
    +++ feature: Options for featureXML input files
      minimal (default: 'false'): Set this flag to write only three attributes: RT, m/z, and intensity. (valid: true, false)
      add_metavalues (default: '-1'): Add columns for meta values which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (valid range: -1 to 100)
    +++ id: Options for idXML input files
      proteins_only (default: 'false'): Set this flag if you want only protein information from an idXML file (valid: true, false)
      peptides_only (default: 'false'): Set this flag if you want only peptide information from an idXML file (valid: true, false)
      protein_groups (default: 'false'): Set this flag if you want to also write indist. group information from an idXML file (valid: true, false)
      first_dim_rt (default: 'false'): If this flag is set the first_dim RT of the peptide hits will also be printed (if present). (valid: true, false)
      add_metavalues (default: '-1'): Add columns for meta values of PeptideID (=spectrum) entries which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (valid range: -1 to 100)
      add_hit_metavalues (default: '-1'): Add columns for meta values of PeptideHit (=PSM) entries which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (valid range: -1 to 100)
      add_protein_hit_metavalues (default: '-1'): Add columns for meta values on protein level which occur with a certain frequency (0-100%). Set to -1 to omit meta values (default). (valid range: -1 to 100)
    +++ consensus: Options for consensusXML input files
      centroids: Output file for centroids of consensus features (output file; valid formats: *.csv)
      elements: Output file for elements of consensus features (output file; valid formats: *.csv)
      features: Output file for consensus features and contained elements from all maps (writes 'nan's if elements are missing) (output file; valid formats: *.csv)
      sorting_method (default: 'none'): Sorting options can be combined. The precedence is: sort_by_size, sort_by_maps, sorting_method (valid: none, RT, MZ, RT_then_MZ, intensity, quality_decreasing, quality_increasing)
      sort_by_maps (default: 'false'): Apply a stable sort by the covered maps, lexicographically (valid: true, false)
      sort_by_size (default: 'false'): Apply a stable sort by decreasing size (i.e., the number of elements) (valid: true, false)
      add_metavalues (default: 'false'): Add columns for ConsensusFeature meta values. (valid: true, false)
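
The interplay of the quoting and replacement parameters described above can be illustrated with a small Python function. This is one reading of the parameter descriptions, not the OpenMS implementation: the function name is hypothetical, and escaping of backslashes in 'escape' mode is an assumption that the descriptions do not confirm.

```python
def format_field(text, sep="\t", quoting="none", replacement="_"):
    """Format one string field following the documented quoting methods."""
    if quoting == "none":
        # No quoting: separator characters inside the string are replaced.
        return text.replace(sep, replacement)
    if quoting == "double":
        # Quote the string and double any embedded quotes.
        return '"' + text.replace('"', '""') + '"'
    if quoting == "escape":
        # Quote the string and backslash-escape embedded quotes.
        # Assumption: backslashes themselves are escaped first.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    raise ValueError(f"unknown quoting method: {quoting!r}")

plain = format_field("a\tb")
doubled = format_field('say "hi"', quoting="double")
escaped = format_field('say "hi"', quoting="escape")
```

Files written with 'double' quoting follow the common CSV convention, so they can usually be read with standard CSV parsers without extra configuration.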