Writes a SIRIUS .ms file from an MSExperiment, optionally enriched with feature/adduct/formula annotations.
#include <OpenMS/ANALYSIS/ID/SiriusMSConverter.h>
| class | AccessionInfo |
| | Source-file accession metadata of the input mzML, captured for the mzTab-M MS_RUN section. |
| class | CompoundInfo |
| | Per-compound metadata accumulated while writing the .ms file. |
| static void | writeMsFile_ (std::ofstream &os, const MSExperiment &spectra, const std::vector< size_t > &ms2_spectra_index, const SiriusMSFile::AccessionInfo &ainfo, const StringList &adducts, const std::vector< std::string > &v_description, const std::vector< std::string > &v_sumformula, const std::vector< std::pair< double, double > > &f_isotopes, int &feature_charge, uint64_t &feature_id, const double &feature_rt, const double &feature_mz, bool &writecompound, const bool &no_masstrace_info_isotope_pattern, const int &isotope_pattern_iterations, int &count_skipped_spectra, int &count_assume_mono, int &count_no_ms1, std::vector< SiriusMSFile::CompoundInfo > &v_cmpinfo, const size_t &file_index) |
| | Internal helper that writes the .ms file content (called by store). |
| static Int | getHighestIntensityPeakInMZRange_ (double test_mz, const MSSpectrum &spectrum, double tolerance, bool ppm) |
| | Return the index of the most-intense peak of spectrum whose m/z lies within a tolerance window around test_mz. |
| static std::vector< Peak1D > | extractPrecursorIsotopePattern_ (const double &precursor_mz, const MSSpectrum &precursor_spectrum, int &iterations, const int &charge) |
| | Walk an isotope ladder from the precursor m/z, picking the most-intense peak at each C12-C13 step. |
Writes a SIRIUS .ms file from an MSExperiment, optionally enriched with feature/adduct/formula annotations.
Used by SiriusExport to translate centroided MS2 data (and, optionally, feature annotations from FeatureFindingMetabo, MetaboliteAdductDecharger, and/or AccurateMassSearch) into the .ms compound format consumed by SIRIUS.
The writer chooses one of three layouts based on the input FeatureMapping::FeatureToMs2Indices:
- feature-driven (assignedMS2 non-empty): one compound block per feature, carrying its associated MS2 spectra plus adduct / formula / description metadata if those are present on the feature.
- unassigned-MS2 (only when unassignedMS2 is non-empty and feature_only is false): an additional compound block per unassigned MS2 spectrum with UNKNOWN description / sumformula / adducts.
- no-feature-information (both maps empty): every MS2 spectrum in spectra is emitted as its own UNKNOWN compound. This is the mzML-only fallback.
The first two layouts can both fire in the same call (feature-driven followed by unassigned-MS2). For each compound emitted, a matching CompoundInfo entry is appended to v_cmpinfo for downstream mzTab-M export.
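The branching rules above can be sketched as a small decision function. This is only an illustration under stated assumptions: MappingSketch is a hypothetical, simplified stand-in for FeatureMapping::FeatureToMs2Indices (its key types are invented for the example), and selectLayouts is not part of OpenMS.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for FeatureMapping::FeatureToMs2Indices: only the two
// containers named in the text, with simplified key types (assumption).
struct MappingSketch
{
  std::map<std::size_t, std::vector<std::size_t>> assignedMS2; // feature -> MS2 indices
  std::vector<std::size_t> unassignedMS2;                      // MS2 indices without a feature
};

// Returns the layouts that would be emitted, in emission order.
std::vector<std::string> selectLayouts(const MappingSketch& mapping, bool feature_only)
{
  std::vector<std::string> layouts;

  // Both maps empty: mzML-only fallback, every MS2 spectrum becomes a compound.
  if (mapping.assignedMS2.empty() && mapping.unassignedMS2.empty())
  {
    layouts.push_back("no-feature-information");
    return layouts;
  }

  // Feature-driven blocks come first ...
  if (!mapping.assignedMS2.empty())
  {
    layouts.push_back("feature-driven");
  }
  // ... optionally followed by one block per unassigned MS2 spectrum.
  if (!mapping.unassignedMS2.empty() && !feature_only)
  {
    layouts.push_back("unassigned-MS2");
  }

  assert(layouts.size() <= 2); // at most two layouts fire in one call
  return layouts;
}
```

With both maps filled and feature_only set to false, the sketch returns the feature-driven layout followed by the unassigned-MS2 layout; with feature_only set to true, only the feature-driven layout remains.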
Constraints enforced during store —
◆ OpenMS::SiriusMSFile::AccessionInfo
| class OpenMS::SiriusMSFile::AccessionInfo |
Source-file accession metadata of the input mzML, captured for the mzTab-M MS_RUN section.
| Class Members |
| string | native_id_accession | nativeID accession for mzTab-M |
| string | native_id_type | nativeID type for mzTab-M |
| string | sf_accession | source-file accession for mzTab-M |
| string | sf_filename | source-file name for mzTab-M |
| string | sf_path | source-file path for mzTab-M |
| string | sf_type | source-file type for mzTab-M |
◆ OpenMS::SiriusMSFile::CompoundInfo
| class OpenMS::SiriusMSFile::CompoundInfo |
Per-compound metadata accumulated while writing the .ms file.
One entry is appended for every compound block emitted by store. The same data is later serialised via saveFeatureCompoundInfoAsTSV and used by downstream tools to map SIRIUS results back to their originating spectra / features and to populate the mzTab-M SmallMolecule / SmallMoleculeFeature sections.
| Class Members |
| int | charge | precursor/feature charge |
| string | cmp | query_id used for the compound in the .ms file |
| string | des | description/name of the compound |
| string | fid | annotated feature_id (if available) |
| int | file_index | source file index |
| double | fmz | annotated mass of a feature (if available) |
| string | formula | sum formula of the compound |
| string | ionization | adduct information |
| vector< string > | m_ids | native ids and identifiers for multiple possible identifications via AMS ("|" separator) |
| string | m_ids_id | concatenated list of native ids and identifiers for multiple possible identifications via AMS ("|" separator), used for mapping compounds to the annotated spectrum |
| vector< string > | native_ids | native ids of the associated spectra |
| string | native_ids_id | concatenated list of the native ids of the associated spectra |
| double | pint_mono | parent/precursor intensity of the compound |
| double | pmass | parent/precursor mass of the compound |
| double | rt | retention time of the compound |
| vector< string > | scan_indices | indices of the associated spectra |
| string | source_file | source file for mzTab-M |
| string | source_format | format of the source file for mzTab-M |
| string | specref_format | spectra reference format for mzTab-M |
| vector< string > | specrefs | spectra references for mzTab-M |
◆ extractPrecursorIsotopePattern_()
| static std::vector< Peak1D > extractPrecursorIsotopePattern_ (const double &precursor_mz, const MSSpectrum &precursor_spectrum, int &iterations, const int &charge) |
static protected
Walk an isotope ladder from the precursor m/z, picking the most-intense peak at each C12-C13 step.
Used by store when no per-feature mass-trace information is available. The monoisotopic peak is located inside a fixed 10 ppm window around precursor_mz; subsequent traces are located at precursor_mz + k * C13C12_MASSDIFF_U / |charge| inside a 1 ppm window, stopping when iterations reaches zero or when no peak is found at the current step.
- Parameters
-
| [in] | precursor_mz | Precursor m/z to start the ladder from. |
| [in] | precursor_spectrum | Spectrum to search. |
| [in,out] | iterations | Maximum number of isotope-trace steps; decremented per successful step. |
| [in] | charge | Precursor charge (used to scale the C12-C13 distance; 0 disables scaling). |
- Returns
- Sequence of picked isotope peaks, starting with the monoisotopic peak (empty if not even that is found).
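The ladder walk described above can be sketched as follows. This is a minimal illustration, not the OpenMS implementation: PeakSketch stands in for Peak1D, mostIntenseInWindow is a hypothetical linear-scan helper, and the constant kC13C12MassDiff is an approximate value assumed for C13C12_MASSDIFF_U. The 10 ppm and 1 ppm windows and the charge handling follow the description.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct PeakSketch { double mz; double intensity; }; // hypothetical stand-in for Peak1D

// Approximate 13C - 12C mass difference in u (assumed value of C13C12_MASSDIFF_U).
constexpr double kC13C12MassDiff = 1.0033548378;

// Index of the most intense peak within +/- tol_ppm of target_mz, or -1 if none.
int mostIntenseInWindow(const std::vector<PeakSketch>& spectrum, double target_mz, double tol_ppm)
{
  assert(tol_ppm >= 0.0);
  const double half_width = target_mz * tol_ppm * 1e-6;
  int best = -1;
  for (std::size_t i = 0; i < spectrum.size(); ++i)
  {
    if (std::fabs(spectrum[i].mz - target_mz) > half_width) continue;
    if (best < 0 || spectrum[i].intensity > spectrum[best].intensity)
    {
      best = static_cast<int>(i);
    }
  }
  return best;
}

std::vector<PeakSketch> walkIsotopeLadder(double precursor_mz,
                                          const std::vector<PeakSketch>& spectrum,
                                          int& iterations,
                                          int charge)
{
  std::vector<PeakSketch> pattern;

  // Monoisotopic peak: fixed 10 ppm window around the precursor m/z.
  const int mono = mostIntenseInWindow(spectrum, precursor_mz, 10.0);
  if (mono < 0) return pattern; // not even the monoisotopic peak was found
  pattern.push_back(spectrum[mono]);

  // Charge 0 disables scaling of the C12-C13 distance.
  const double step = (charge == 0) ? kC13C12MassDiff : kC13C12MassDiff / std::abs(charge);

  // Subsequent traces: 1 ppm window at precursor_mz + k * step.
  for (int k = 1; iterations > 0; ++k)
  {
    const int idx = mostIntenseInWindow(spectrum, precursor_mz + k * step, 1.0);
    if (idx < 0) break;       // ladder ends at the first missing step
    pattern.push_back(spectrum[idx]);
    --iterations;             // decremented per successful step
  }
  return pattern;
}
```

Because iterations is passed by reference, the caller can read back how many steps of the budget were left unused after the call.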
◆ getHighestIntensityPeakInMZRange_()
| static Int getHighestIntensityPeakInMZRange_ (double test_mz, const MSSpectrum &spectrum, double tolerance, bool ppm) |
static protected
Return the index of the most-intense peak of spectrum whose m/z lies within a tolerance window around test_mz.
- Parameters
-
| [in] | test_mz | Target m/z; the search window is built around this value. |
| [in] | spectrum | Spectrum to scan. |
| [in] | tolerance | Half-width of the tolerance window (in Da or ppm, depending on ppm). |
| [in] | ppm | If true, tolerance is interpreted as ppm; otherwise as Da. |
- Returns
- Index of the most-intense peak inside the window, or -1 if the window contains no peak.
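A window search of this kind can be sketched with the standard library. The sketch assumes that the spectrum is sorted by ascending m/z and that the window borders are inclusive; neither is stated above. PeakSketch and highestIntensityPeakInRange are hypothetical names, not OpenMS API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct PeakSketch { double mz; double intensity; }; // hypothetical stand-in for a spectrum peak

// Assumes 'spectrum' is sorted by ascending m/z (assumption of this sketch).
int highestIntensityPeakInRange(double test_mz,
                                const std::vector<PeakSketch>& spectrum,
                                double tolerance,
                                bool ppm)
{
  assert(tolerance >= 0.0);

  // Half-width of the window: relative (ppm) or absolute (Da).
  const double half_width = ppm ? test_mz * tolerance * 1e-6 : tolerance;
  const double low = test_mz - half_width;
  const double high = test_mz + half_width;

  // Binary search for the first peak >= low and the first peak > high.
  auto first = std::lower_bound(spectrum.begin(), spectrum.end(), low,
                                [](const PeakSketch& p, double mz) { return p.mz < mz; });
  auto last = std::upper_bound(first, spectrum.end(), high,
                               [](double mz, const PeakSketch& p) { return mz < p.mz; });
  if (first == last) return -1; // the window contains no peak

  auto best = std::max_element(first, last,
                               [](const PeakSketch& a, const PeakSketch& b) { return a.intensity < b.intensity; });
  return static_cast<int>(best - spectrum.begin());
}
```

The same target can give different results depending on the unit: a 0.01 Da window around m/z 100 is ten times wider than a 10 ppm window, which spans only 0.001 Da at that m/z.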
◆ saveFeatureCompoundInfoAsTSV()
| static void saveFeatureCompoundInfoAsTSV (const std::vector< SiriusMSFile::CompoundInfo > &v_cmpinfo, const std::string &filename) |
static
Serialise the CompoundInfo records collected during store as a TSV file.
The file is written with a fixed 16-column header line in the order: cmp, file_index, pmass, pint_mono, rt, fmz, fid, formula, charge, ionization, des, specref_format, source_file, source_format, native_ids_id, m_ids_id. The raw vector members native_ids and m_ids are not written; only their already-concatenated _id forms appear in the file.
- Parameters
-
| [in] | v_cmpinfo | Compound records to write (typically the vector populated by store). |
| [in] | filename | Destination path of the TSV file. |
- Exceptions
-
| std::runtime_error | if filename cannot be opened for writing. |
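The column layout above can be illustrated with a small writer. This is a sketch under assumptions: CompoundInfoSketch is a reduced, hypothetical stand-in for SiriusMSFile::CompoundInfo holding only the 16 written fields, and the numeric formatting (default stream precision) is not taken from OpenMS.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for SiriusMSFile::CompoundInfo: only the
// 16 fields that are listed as TSV columns.
struct CompoundInfoSketch
{
  std::string cmp;
  int file_index = 0;
  double pmass = 0.0, pint_mono = 0.0, rt = 0.0, fmz = 0.0;
  std::string fid, formula;
  int charge = 0;
  std::string ionization, des, specref_format, source_file, source_format;
  std::string native_ids_id, m_ids_id;
};

// Column names in the documented order.
std::vector<std::string> tsvHeader()
{
  return {"cmp", "file_index", "pmass", "pint_mono", "rt", "fmz", "fid", "formula",
          "charge", "ionization", "des", "specref_format", "source_file",
          "source_format", "native_ids_id", "m_ids_id"};
}

void saveCompoundInfoTsv(const std::vector<CompoundInfoSketch>& records, const std::string& filename)
{
  std::ofstream out(filename);
  if (!out) throw std::runtime_error("cannot open '" + filename + "' for writing");

  const std::vector<std::string> header = tsvHeader();
  assert(header.size() == 16);
  for (std::size_t i = 0; i < header.size(); ++i)
  {
    out << (i == 0 ? "" : "\t") << header[i];
  }
  out << '\n';

  // One row per record, fields in header order (default numeric formatting: assumption).
  for (const CompoundInfoSketch& c : records)
  {
    out << c.cmp << '\t' << c.file_index << '\t' << c.pmass << '\t' << c.pint_mono << '\t'
        << c.rt << '\t' << c.fmz << '\t' << c.fid << '\t' << c.formula << '\t'
        << c.charge << '\t' << c.ionization << '\t' << c.des << '\t' << c.specref_format << '\t'
        << c.source_file << '\t' << c.source_format << '\t' << c.native_ids_id << '\t'
        << c.m_ids_id << '\n';
  }
}
```

Keeping the header in one function makes it easy to check that the header and the row writer stay in the same order when a column is added.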
◆ store()
Write one mzML/featureXML pair to a SIRIUS .ms output stream.
Selects the .ms layout (feature-driven, unassigned-MS2, or no-feature-information) according to the contents of feature_mapping; see the class documentation for the branching rules. For every compound block emitted, a CompoundInfo record is appended to v_cmpinfo for downstream mzTab-M export. When adduct information is missing for a spectrum, no adduct line is written and SIRIUS falls back to its defaults.
The accession-CV term for the sourcefile is resolved by loading the PSI-MS OBO (CV/psi-ms.obo via File::find) and locating the term whose name matches the SourceFile type. The output stream os is written to but not closed; ownership remains with the caller.
- Parameters
-
| [in] | spectra | Peakmap from the input mzML. The first spectrum must be centroided. |
| [in] | os | Open output stream the .ms text is written to. |
| [in] | feature_mapping | Result of FeatureMapping::assignMS2IndexToFeature; selects the output layout. |
| [in] | feature_only | If true, MS2 spectra that are not associated with any feature are dropped instead of being emitted as additional UNKNOWN compounds. |
| [in] | isotope_pattern_iterations | Upper bound on the number of isotope-trace peaks searched per precursor when no feature mass-traces are available. |
| [in] | no_masstrace_info_isotope_pattern | If true, fall back to spectrum-based isotope-pattern extraction when feature mass-traces are absent. |
| [in,out] | v_cmpinfo | Receives one CompoundInfo per emitted compound (appended; existing entries preserved). |
| [in] | file_index | Numeric identifier mixed into compound IDs to disambiguate entries from different mzML files. |
- Exceptions
-
- Note
- SIRIUS supports only singly charged precursors: features (and spectra) with |charge| > 1 are skipped. A summary of the skipped/assumed counts is emitted via OPENMS_LOG_WARN at the end of the call (even when all counts are zero).
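The charge rule in the note can be sketched as a small filter with counters. The skip for |charge| > 1 follows the note. Treating an unknown charge (0) as singly charged and counting it as "assumed" is an assumption inferred from the counter names, not confirmed behaviour. ChargeCounters and acceptCharge are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical counters mirroring count_skipped_spectra / count_assume_mono.
struct ChargeCounters
{
  int skipped = 0;
  int assumed_mono = 0;
};

// Returns true if a compound with this charge would be written.
bool acceptCharge(int& charge, ChargeCounters& counters)
{
  if (charge == 0)
  {
    // ASSUMPTION: an unknown charge is treated as singly charged and counted.
    charge = 1;
    ++counters.assumed_mono;
    return true;
  }
  if (std::abs(charge) > 1)
  {
    // Multiply charged precursors are skipped, as stated in the note.
    ++counters.skipped;
    return false;
  }
  assert(std::abs(charge) == 1);
  return true;
}
```

A caller would report both counters once at the end, which matches the single summary message mentioned in the note.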
◆ writeMsFile_()
| static void writeMsFile_ (std::ofstream &os, const MSExperiment &spectra, const std::vector< size_t > &ms2_spectra_index, const SiriusMSFile::AccessionInfo &ainfo, const StringList &adducts, const std::vector< std::string > &v_description, const std::vector< std::string > &v_sumformula, const std::vector< std::pair< double, double > > &f_isotopes, int &feature_charge, uint64_t &feature_id, const double &feature_rt, const double &feature_mz, bool &writecompound, const bool &no_masstrace_info_isotope_pattern, const int &isotope_pattern_iterations, int &count_skipped_spectra, int &count_assume_mono, int &count_no_ms1, std::vector< SiriusMSFile::CompoundInfo > &v_cmpinfo, const size_t &file_index) |
static protected
Internal helper that writes the .ms file content (called by store).
- Parameters
-
| [out] | os | Output stream the .ms text is written to. |
| [in] | spectra | Peak map containing the spectra. |
| [in] | ms2_spectra_index | Indices of the MS2 spectra (belonging to the feature). |
| [in] | ainfo | Accession information of the source file. |
| [in] | adducts | List of adducts. |
| [in] | v_description | List of descriptions. |
| [in] | v_sumformula | List of sum formulas. |
| [in] | f_isotopes | Isotope pattern of the feature. |
| [in] | feature_charge | Feature charge. |
| [in] | feature_id | Feature id. |
| [in] | feature_rt | Feature retention time. |
| [in] | feature_mz | Feature m/z. |
| [out] | writecompound | Whether a new compound should be written to the .ms file. |
| [in] | no_masstrace_info_isotope_pattern | Whether the isotope pattern should be extracted from the spectrum (if it is not available from the feature). |
| [in] | isotope_pattern_iterations | Number of iterations when trying to find a C13 pattern. |
| [in,out] | count_skipped_spectra | Running count of skipped spectra. |
| [in,out] | count_assume_mono | Running count of features where a mono charge was assumed. |
| [in,out] | count_no_ms1 | Running count of compounds without a valid MS1 spectrum. |
| [in,out] | v_cmpinfo | Vector of CompoundInfo records; one entry is appended per written compound. |
| [in] | file_index | File index (differentiates entries derived from different mzML files and resolves ambiguities). |