OpenMS 2.7.0
This application converts several OpenMS XML formats (featureXML, consensusXML, and idXML) to text files.
Potential predecessor tools: almost any TOPP tool
Potential successor tools: external tools (MS Excel, OpenOffice, Notepad)
The goal of this tool is to create output in a table format that is easily readable in Excel or OpenOffice. Lines in the output correspond to rows in the table; the individual columns are delineated by a separator, e.g. tab (default, TSV format) or comma (CSV format).
Output files begin with comment lines, starting with the special character "#". The last such line(s) will be a header with column names, but this may be preceded by more general comments.
Because the OpenMS XML formats contain different kinds of data in a hierarchical structure, TextExporter produces somewhat unusual TSV/CSV files for many inputs: Different lines in the output may belong to different types of data, and the number of columns and the meanings of the individual fields depend on the type. In such cases, the first column always contains an indicator (in capital letters) for the data type of the current line. In addition, some lines have to be understood relative to a previous line, if there is a hierarchical relationship in the data. (See below for details and examples.)
Missing values are represented by "-1" or "nan" in numeric fields and by blanks in character/text fields.
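The conventions above (comment lines starting with "#", a type indicator in the first column, and special markers for missing values) can be handled with a small reader. The following is a minimal sketch, not part of OpenMS: the function names and the sample data are invented for illustration, and it assumes that each header line consists of "#" followed by the type indicator and the column names.

```python
import csv
import io
import math
from collections import defaultdict


def read_typed_rows(handle, delimiter="\t"):
    """Group the lines of a TextExporter-style file by their type indicator."""
    headers = {}               # indicator -> column names taken from "#..." lines
    rows = defaultdict(list)   # indicator -> data rows (indicator column removed)
    for fields in csv.reader(handle, delimiter=delimiter):
        if not fields:
            continue  # skip empty lines
        tag = fields[0]
        if tag.startswith("#"):
            # Assumption: a header line looks like "#FEATURE<sep>rt<sep>mz..."
            headers[tag[1:]] = fields[1:]
        else:
            rows[tag].append(fields[1:])
    return headers, dict(rows)


def numeric_or_none(text):
    """Map the missing-value markers ("-1", "nan", blank) to None."""
    if text.strip() == "":
        return None
    value = float(text)
    if math.isnan(value) or value == -1:
        return None
    return value


# Invented sample data, shortened to a few columns for readability.
SAMPLE = (
    "#FEATURE\trt\tmz\tintensity\n"
    "#PEPTIDE\trt\tmz\tsequence\n"
    "FEATURE\t100.5\t500.25\t12345\n"
    "PEPTIDE\t101.0\t-1\tPEPTIDEK\n"
    "FEATURE\t200.0\tnan\t678\n"
)

headers, rows = read_typed_rows(io.StringIO(SAMPLE))
print(headers["FEATURE"], len(rows["FEATURE"]))
```

Treating every "-1" as missing is a simplification: in a column where -1 could be a real value, the check would have to be restricted to the fields in which it is used as a marker.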
Depending on the input and the parameters, the output contains the following columns:
featureXML input:
- first column: RUN / PROTEIN / UNASSIGNEDPEPTIDE / FEATURE / PEPTIDE (indicator for the type of data in the current row)
- a RUN line contains information about a protein identification run; further columns: run_id, score_type, score_direction, date_time, search_engine_version, parameters
- a PROTEIN line contains data of a protein identified in the previously listed run; further columns: score, rank, accession, coverage, sequence
- an UNASSIGNEDPEPTIDE line contains data of a peptide hit that was not assigned to any feature; further columns: rt, mz, score, rank, sequence, charge, aa_before, aa_after, score_type, search_identifier, accessions
- a FEATURE line contains data of a single feature; further columns: rt, mz, intensity, charge, width, quality, rt_quality, mz_quality, rt_start, rt_end
- a PEPTIDE line contains data of a peptide hit annotated to the previous feature; further columns: same as for UNASSIGNEDPEPTIDE
- With the no_ids flag, only FEATURE lines (without the FEATURE indicator) are written.
- With the feature:minimal flag, only the rt, mz, and intensity columns of FEATURE lines are written.
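Because a PEPTIDE line has to be read relative to the preceding FEATURE line, a consumer of the featureXML export needs to keep track of the most recent feature. A minimal sketch of that bookkeeping follows; the function name and the sample values are invented for illustration, and RUN, PROTEIN, and UNASSIGNEDPEPTIDE lines are simply skipped.

```python
def nest_peptides(lines, delimiter="\t"):
    """Attach each PEPTIDE line to the FEATURE line that precedes it."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and header lines
        fields = line.rstrip("\n").split(delimiter)
        kind, values = fields[0], fields[1:]
        if kind == "FEATURE":
            features.append({"feature": values, "peptides": []})
        elif kind == "PEPTIDE":
            if not features:
                raise ValueError("PEPTIDE line found before any FEATURE line")
            features[-1]["peptides"].append(values)
        # RUN, PROTEIN, and UNASSIGNEDPEPTIDE lines are ignored in this sketch.
    return features


# Invented sample rows that follow the column order listed above.
SAMPLE_LINES = [
    "#FEATURE\trt\tmz\tintensity\tcharge\twidth\tquality\trt_quality\tmz_quality\trt_start\trt_end\n",
    "FEATURE\t100.5\t500.25\t12345\t2\t8.0\t0.9\t0.8\t0.7\t96.0\t104.0\n",
    "PEPTIDE\t100.7\t500.25\t0.01\t1\tPEPTIDEK\t2\tK\tA\tq-value\trun_1\tP12345\n",
    "UNASSIGNEDPEPTIDE\t50.0\t400.1\t0.2\t1\tSAMPLER\t2\tR\tG\tq-value\trun_1\tP67890\n",
    "FEATURE\t200.0\t600.5\t678\t3\t5.0\t0.5\t0.4\t0.6\t198.0\t203.0\n",
]

features = nest_peptides(SAMPLE_LINES)
print(len(features), [len(f["peptides"]) for f in features])
```

Keeping the nesting explicit avoids a common mistake with this format, namely loading the whole file into a single table and losing the link between a peptide hit and its feature.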
consensusXML input:
Output format produced for the out parameter:
- first column: MAP / RUN / PROTEIN / UNASSIGNEDPEPTIDE / CONSENSUS / PEPTIDE (indicator for the type of data in the current row)
- a MAP line contains information about a sub-map; further columns: id, filename, label, size (potentially followed by further columns containing meta data, depending on the input)
- a CONSENSUS line contains data of a single consensus feature; further columns: rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf, rt_X0, mz_X0, ..., rt_X1, mz_X1, ...
- "..._cf" columns refer to the consensus feature itself, "..._Xi" columns refer to a sub-feature from the map with ID "Xi" (no quality column in this case); missing sub-features are indicated by "nan" values
- RUN, PROTEIN, UNASSIGNEDPEPTIDE, and PEPTIDE lines: same format as described above for featureXML input
- With the no_ids flag, only MAP and CONSENSUS lines are written.
Output format produced for the consensus:centroids parameter:
- columns: rt, mz, intensity, charge, width, quality
Output format produced for the consensus:elements parameter:
- first column: H / L (indicator for new/repeated element)
- H indicates a new element, L indicates the replication of the first element of the current consensus feature (for plotting)
- further columns: rt, mz, intensity, charge, width, rt_cf, mz_cf, intensity_cf, charge_cf, width_cf, quality_cf
- "..._cf" columns refer to the consensus feature, the other columns refer to the sub-feature
Output format produced for the consensus:features parameter:
- first columns: same as for a CONSENSUS line above, followed by additional columns for identification data
- additional columns: peptide_N0, n_diff_peptides_N0, protein_N0, n_diff_proteins_N0, peptide_N1, ...
- "..._Ni" columns refer to the identification run with index "Ni", "n_diff_..." stands for "number of different ..."; different peptides/proteins in one column are separated by "/"
- With the no_ids flag, the additional columns are not included.
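The column naming scheme of CONSENSUS lines ("..._cf" for the consensus feature, "..._Xi" for the sub-feature from map "Xi") lends itself to splitting a row by suffix. The sketch below assumes numeric map IDs ("0", "1") that contain no underscore, and it uses an invented header and row; the function name is hypothetical as well.

```python
import math


def split_consensus_row(header, row):
    """Separate consensus-feature values from per-map sub-feature values."""
    consensus = {}
    sub_features = {}
    for name, text in zip(header, row):
        # "intensity_cf" -> ("intensity", "cf"); "rt_0" -> ("rt", "0")
        attribute, _, suffix = name.rpartition("_")
        value = float(text)
        if suffix == "cf":
            consensus[attribute] = value
        else:
            sub_features.setdefault(suffix, {})[attribute] = value
    # A sub-feature whose values are all "nan" is missing from that map.
    present = {
        map_id: values
        for map_id, values in sub_features.items()
        if not all(math.isnan(v) for v in values.values())
    }
    return consensus, present


# Invented example with two maps; the sub-feature from map "1" is missing.
HEADER = [
    "rt_cf", "mz_cf", "intensity_cf", "charge_cf", "width_cf", "quality_cf",
    "rt_0", "mz_0", "intensity_0", "charge_0", "width_0",
    "rt_1", "mz_1", "intensity_1", "charge_1", "width_1",
]
ROW = [
    "100.0", "500.25", "3000", "2", "10.0", "0.9",
    "99.5", "500.2", "3000", "2", "10.0",
    "nan", "nan", "nan", "nan", "nan",
]

consensus, present = split_consensus_row(HEADER, ROW)
print(consensus["rt"], sorted(present))
```

If map IDs in a real file contain underscores, the suffix split would need to be driven by the IDs listed in the MAP lines instead of by the last underscore.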
idXML input:
- first column: RUN / PROTEIN / PEPTIDE (indicator for the type of data in the current row)
- RUN, PROTEIN, and PEPTIDE lines: same format as described above for featureXML input
- additional column for PEPTIDE lines: predicted_rt (predicted retention time)
- additional column for PEPTIDE lines: predicted_pt (predicted proteotypicity)
- With the id:proteins_only flag, only RUN and PROTEIN lines are written.
- With the id:peptides_only flag, only PEPTIDE lines (without the PEPTIDE indicator) are written.
- With the id:first_dim_rt flag, the additional columns rt_first_dim and predicted_rt_first_dim are included for PEPTIDE lines.
The command line parameters of this tool are:
TextExporter -- Exports various XML formats to a text file.
Full documentation: http://www.openms.de/doxygen/release/2.7.0/html/TOPP_TextExporter.html
Version: 2.7.0 Sep 13 2021, 20:58:47, Revision: 9110e58
To cite OpenMS:
  Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  TextExporter <options>

Options (mandatory options marked with '*'):
  -in <file>*            Input file (valid formats: 'featureXML', 'consensusXML', 'idXML', 'mzML')
  -out <file>*           Output file. (valid formats: 'tsv', 'csv', 'txt')
  -out_type <type>       Output file type -- default: determined from file extension, ambiguous
                         file extensions are interpreted as tsv (valid: 'tsv', 'csv', 'txt')
  -replacement <string>  Used to replace occurrences of the separator in strings before writing,
                         if 'quoting' is 'none' (default: '_')
  -quoting <method>      Method for quoting of strings: 'none' for no quoting, 'double' for
                         quoting with doubling of embedded quotes, 'escape' for quoting with
                         backslash-escaping of embedded quotes (default: 'none' valid: 'none',
                         'double', 'escape')
  -no_ids                Suppresses output of identification data.

Options for featureXML input files:
  -feature:minimal
                         Set this flag to write only three attributes: RT, m/z, and intensity.
  -feature:add_metavalues <min_frequency>
                         Add columns for meta values which occur with a certain frequency
                         (0-100%). Set to -1 to omit meta values (default). (default: '-1'
                         min: '-1' max: '100')

Options for idXML input files:
  -id:proteins_only
                         Set this flag if you want only protein information from an idXML file
  -id:peptides_only
                         Set this flag if you want only peptide information from an idXML file
  -id:protein_groups
                         Set this flag if you want to also write indist. group information from
                         an idXML file
  -id:first_dim_rt
                         If this flag is set the first_dim RT of the peptide hits will also be
                         printed (if present).
  -id:add_metavalues <min_frequency>
                         Add columns for meta values of PeptideID (=spectrum) entries which
                         occur with a certain frequency (0-100%). Set to -1 to omit meta values
                         (default). (default: '-1' min: '-1' max: '100')
  -id:add_hit_metavalues <min_frequency>
                         Add columns for meta values of PeptideHit (=PSM) entries which occur
                         with a certain frequency (0-100%). Set to -1 to omit meta values
                         (default). (default: '-1' min: '-1' max: '100')
  -id:add_protein_hit_metavalues <min_frequency>
                         Add columns for meta values on protein level which occur with a certain
                         frequency (0-100%). Set to -1 to omit meta values (default).
                         (default: '-1' min: '-1' max: '100')

Options for consensusXML input files:
  -consensus:centroids <file>
                         Output file for centroids of consensus features (valid formats: 'csv')
  -consensus:elements <file>
                         Output file for elements of consensus features (valid formats: 'csv')
  -consensus:features <file>
                         Output file for consensus features and contained elements from all maps
                         (writes 'nan's if elements are missing) (valid formats: 'csv')
  -consensus:sorting_method <method>
                         Sorting options can be combined. The precedence is: sort_by_size,
                         sort_by_maps, sorting_method (default: 'none' valid: 'none', 'RT', 'MZ',
                         'RT_then_MZ', 'intensity', 'quality_decreasing', 'quality_increasing')
  -consensus:sort_by_maps
                         Apply a stable sort by the covered maps, lexicographically
  -consensus:sort_by_size
                         Apply a stable sort by decreasing size (i.e., the number of elements)

Common TOPP options:
  -ini <file>            Use the given TOPP INI file
  -threads <n>           Sets the number of threads allowed to be used by the TOPP tool
                         (default: '1')
  -write_ini <file>      Writes the default configuration file
  --help                 Shows options
  --helphelp             Shows all options (including advanced)
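When TextExporter is called from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that only uses options from the listing above; the helper name and the file names are invented for illustration, and the sketch prints the command instead of running it.

```python
import shlex


def build_textexporter_command(in_file, out_file, quoting="none",
                               no_ids=False, feature_minimal=False):
    """Assemble an argument list for TextExporter (hypothetical helper)."""
    if quoting not in ("none", "double", "escape"):
        raise ValueError("quoting must be 'none', 'double', or 'escape'")
    command = ["TextExporter", "-in", in_file, "-out", out_file,
               "-quoting", quoting]
    if no_ids:
        command.append("-no_ids")
    if feature_minimal:
        command.append("-feature:minimal")
    return command


# Invented file names; the output type is determined from the ".csv" extension.
command = build_textexporter_command(
    "features.featureXML", "features.csv", quoting="double", no_ids=True
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a system where TextExporter is installed and on the search path.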
INI file documentation of this tool: