OpenMS
SiriusMSFile Class Reference

Writes a SIRIUS .ms file from an MSExperiment, optionally enriched with feature/adduct/formula annotations.

#include <OpenMS/ANALYSIS/ID/SiriusMSConverter.h>

Classes

class  AccessionInfo
 Source-file accession metadata of the input mzML, captured for the mzTab-M MS_RUN section.
 
class  CompoundInfo
 Per-compound metadata accumulated while writing the .ms file.
 

Static Public Member Functions

static void store (const MSExperiment &spectra, std::ofstream &os, const FeatureMapping::FeatureToMs2Indices &feature_mapping, const bool &feature_only, const int &isotope_pattern_iterations, const bool no_masstrace_info_isotope_pattern, std::vector< SiriusMSFile::CompoundInfo > &v_cmpinfo, const size_t &file_index)
 Write one mzML/featureXML pair to a SIRIUS .ms output stream.
 
static void saveFeatureCompoundInfoAsTSV (const std::vector< SiriusMSFile::CompoundInfo > &v_cmpinfo, const std::string &filename)
 Serialise the CompoundInfo records collected during store as a TSV file.
 

Static Protected Member Functions

static void writeMsFile_ (std::ofstream &os, const MSExperiment &spectra, const std::vector< size_t > &ms2_spectra_index, const SiriusMSFile::AccessionInfo &ainfo, const StringList &adducts, const std::vector< std::string > &v_description, const std::vector< std::string > &v_sumformula, const std::vector< std::pair< double, double > > &f_isotopes, int &feature_charge, uint64_t &feature_id, const double &feature_rt, const double &feature_mz, bool &writecompound, const bool &no_masstrace_info_isotope_pattern, const int &isotope_pattern_iterations, int &count_skipped_spectra, int &count_assume_mono, int &count_no_ms1, std::vector< SiriusMSFile::CompoundInfo > &v_cmpinfo, const size_t &file_index)
 Internal helper that writes the compound blocks of the .ms file (called from store).
 
static Int getHighestIntensityPeakInMZRange_ (double test_mz, const MSSpectrum &spectrum, double tolerance, bool ppm)
 Return the index of the most-intense peak of spectrum whose m/z lies within a tolerance window around test_mz.
 
static std::vector< Peak1D > extractPrecursorIsotopePattern_ (const double &precursor_mz, const MSSpectrum &precursor_spectrum, int &iterations, const int &charge)
 Walk an isotope ladder from the precursor m/z, picking the most-intense peak at each C12-C13 step.
 

Detailed Description

Writes a SIRIUS .ms file from an MSExperiment, optionally enriched with feature/adduct/formula annotations.

Used by SiriusExport to translate centroided MS2 data (and, optionally, feature annotations from FeatureFindingMetabo, MetaboliteAdductDecharger, and/or AccurateMassSearch) into the .ms compound format consumed by SIRIUS.

The writer chooses one of three layouts based on the input FeatureMapping::FeatureToMs2Indices:

  • feature-driven (assignedMS2 non-empty): one compound block per feature, carrying its associated MS2 spectra plus adduct / formula / description metadata if those are present on the feature.
  • unassigned-MS2 (only when unassignedMS2 is non-empty and feature_only is false): an additional compound block per unassigned MS2 spectrum with UNKNOWN description / sumformula / adducts.
  • no-feature-information (both maps empty): every MS2 spectrum in spectra is emitted as its own UNKNOWN compound. This is the mzML-only fallback.

The first two layouts can both fire in the same call (feature-driven followed by unassigned-MS2). For each compound emitted, a matching CompoundInfo entry is appended to v_cmpinfo for downstream mzTab-M export.
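The branching rules above can be sketched as a small, self-contained function. This is an illustration only: MappingSizes and selectLayouts are hypothetical names that stand in for FeatureMapping::FeatureToMs2Indices and for the decision made inside store, and the sketch models only whether the two maps are empty, not their contents.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for FeatureMapping::FeatureToMs2Indices: only the
// sizes of the two maps matter for the layout decision.
struct MappingSizes
{
  std::size_t assigned_ms2;    // features that carry assigned MS2 spectra
  std::size_t unassigned_ms2;  // MS2 spectra not associated with any feature
};

// Returns the layouts that would be written, in output order.
std::vector<std::string> selectLayouts(const MappingSizes& m, bool feature_only)
{
  std::vector<std::string> layouts;
  if (m.assigned_ms2 == 0 && m.unassigned_ms2 == 0)
  {
    // mzML-only fallback: every MS2 spectrum becomes its own UNKNOWN compound
    layouts.push_back("no-feature-information");
    return layouts;
  }
  if (m.assigned_ms2 > 0)
  {
    layouts.push_back("feature-driven");
  }
  if (m.unassigned_ms2 > 0 && !feature_only)
  {
    layouts.push_back("unassigned-MS2");
  }
  return layouts;
}
```

With both maps non-empty and feature_only set to false, the sketch returns the feature-driven layout followed by the unassigned-MS2 layout, matching the order described above.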

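Constraints enforced during store

  • The first spectrum of spectra must be centroided; profile data raises OpenMS::Exception::IllegalArgument.
  • spectra must carry a SourceFile annotation; otherwise OpenMS::Exception::IllegalArgument is raised.
  • SIRIUS supports only singly charged precursors: features (and spectra) with |charge| > 1 are skipped.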


Class Documentation

◆ OpenMS::SiriusMSFile::AccessionInfo

class OpenMS::SiriusMSFile::AccessionInfo

Source-file accession metadata of the input mzML, captured for the mzTab-M MS_RUN section.

Class Members
string native_id_accession nativeID accession for mzTab-M
string native_id_type nativeID type for mzTab-M
string sf_accession source file accession for mzTab-M
string sf_filename source file name for mzTab-M
string sf_path source file path for mzTab-M
string sf_type source file type for mzTab-M

◆ OpenMS::SiriusMSFile::CompoundInfo

class OpenMS::SiriusMSFile::CompoundInfo

Per-compound metadata accumulated while writing the .ms file.

One entry is appended for every compound block emitted by store. The same data is later serialised via saveFeatureCompoundInfoAsTSV and used by downstream tools to map SIRIUS results back to their originating spectra / features and to populate the mzTab-M SmallMolecule / SmallMoleculeFeature sections.

Class Members
int charge precursor/feature charge
string cmp query_id of the compound as used in the .ms file
string des description/name of the compound
string fid annotated feature_id (if available)
int file_index source file index
double fmz annotated mass of a feature (if available)
string formula sumformula of the compound
string ionization adduct information
vector< string > m_ids native ids and identifier for multiple possible identification via AMS ("|" separator)
string m_ids_id concatenated list of native ids and identifier for multiple possible identification via AMS ("|" separator) used for mapping of compounds and the annotated spectrum.
vector< string > native_ids native ids of the associated spectra
string native_ids_id concatenated list of the associated spectra
double pint_mono parent/precursor intensity of the compound
double pmass parent/precursor mass of the compound
double rt retention time of the compound
vector< string > scan_indices indices of the associated spectra
string source_file source file for mzTab-M
string source_format format of the source file for mzTab-M
string specref_format spectra reference format for mzTab-M
vector< string > specrefs spectra references for mzTab-M

Member Function Documentation

◆ extractPrecursorIsotopePattern_()

static std::vector< Peak1D > extractPrecursorIsotopePattern_ ( const double &  precursor_mz,
const MSSpectrum &  precursor_spectrum,
int &  iterations,
const int &  charge 
)
static protected

Walk an isotope ladder from the precursor m/z, picking the most-intense peak at each C12-C13 step.

Used by store when no per-feature mass-trace information is available. The monoisotopic peak is located inside a fixed 10 ppm window around precursor_mz; subsequent traces are located at precursor_mz + k * C13C12_MASSDIFF_U / |charge| inside a 1 ppm window, stopping when iterations reaches zero or when no peak is found at the current step.

Parameters
[in] precursor_mz: Precursor m/z to start the ladder from.
[in] precursor_spectrum: Spectrum to search.
[in,out] iterations: Maximum number of isotope-trace steps; decremented per successful step.
[in] charge: Precursor charge (used to scale the C12-C13 distance; 0 disables scaling).
Returns
Sequence of picked isotope peaks, starting with the monoisotopic peak (empty if not even that is found).
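
The ladder walk described above can be sketched without any OpenMS types. This is a minimal illustration under stated assumptions, not the library's implementation: SimplePeak, mostIntenseWithinPpm, walkIsotopeLadder and kC13C12MassDiff are hypothetical names, the constant is an approximate C13-C12 mass difference, window bounds are treated as inclusive, and the expected position of step k is taken as precursor_mz + k * step exactly as stated in the description.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical minimal peak type (stand-in for Peak1D).
struct SimplePeak { double mz; double intensity; };

// Approximate C13-C12 mass difference in u (assumed value for illustration).
const double kC13C12MassDiff = 1.0033548378;

// Index of the most intense peak within +/- tol_ppm of mz, or -1 if none.
int mostIntenseWithinPpm(const std::vector<SimplePeak>& s, double mz, double tol_ppm)
{
  const double half_width = mz * tol_ppm * 1e-6;
  int best = -1;
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    if (std::fabs(s[i].mz - mz) > half_width) continue;
    if (best < 0 || s[i].intensity > s[best].intensity) best = static_cast<int>(i);
  }
  return best;
}

// Walks the isotope ladder; iterations is decremented per successful step.
std::vector<SimplePeak> walkIsotopeLadder(double precursor_mz,
                                          const std::vector<SimplePeak>& spectrum,
                                          int& iterations,
                                          int charge)
{
  std::vector<SimplePeak> pattern;

  // Monoisotopic peak: fixed 10 ppm window around the precursor m/z.
  const int mono = mostIntenseWithinPpm(spectrum, precursor_mz, 10.0);
  if (mono < 0) return pattern;
  pattern.push_back(spectrum[mono]);

  // A charge of 0 disables scaling of the C12-C13 distance.
  const double step = (charge == 0) ? kC13C12MassDiff
                                    : kC13C12MassDiff / std::abs(charge);

  // Subsequent traces: 1 ppm window around precursor_mz + k * step.
  for (int k = 1; iterations > 0; ++k)
  {
    const int idx = mostIntenseWithinPpm(spectrum, precursor_mz + k * step, 1.0);
    if (idx < 0) break;  // no peak at this step: stop the ladder
    pattern.push_back(spectrum[idx]);
    --iterations;
  }
  return pattern;
}
```

Because iterations is passed by reference, the caller can tell afterwards how many of the allowed steps were left unused.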

◆ getHighestIntensityPeakInMZRange_()

static Int getHighestIntensityPeakInMZRange_ ( double  test_mz,
const MSSpectrum &  spectrum,
double  tolerance,
bool  ppm 
)
static protected

Return the index of the most-intense peak of spectrum whose m/z lies within a tolerance window around test_mz.

Parameters
[in] test_mz: Target m/z; the search window is built around this value.
[in] spectrum: Spectrum to scan.
[in] tolerance: Half-width of the tolerance window (in Da or ppm, depending on ppm).
[in] ppm: If true, tolerance is interpreted as ppm; otherwise as Da.
Returns
Index of the most-intense peak inside the window, or -1 if the window contains no peak.
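
The window search can be illustrated with a plain linear scan. This is a sketch under assumptions rather than the OpenMS implementation: SimplePeak and highestIntensityPeakInRange are hypothetical names, the window bounds are treated as inclusive, and no use is made of the m/z ordering of the spectrum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal peak type (stand-in for the peaks of an MSSpectrum).
struct SimplePeak { double mz; double intensity; };

// Returns the index of the most intense peak inside the window around
// test_mz, or -1 if the window contains no peak.
int highestIntensityPeakInRange(double test_mz,
                                const std::vector<SimplePeak>& spectrum,
                                double tolerance,
                                bool ppm)
{
  assert(tolerance >= 0.0);

  // Convert a ppm tolerance into an absolute half-width in Da.
  const double half_width = ppm ? test_mz * tolerance * 1e-6 : tolerance;
  const double lower = test_mz - half_width;
  const double upper = test_mz + half_width;

  int best = -1;
  for (std::size_t i = 0; i < spectrum.size(); ++i)
  {
    const SimplePeak& p = spectrum[i];
    if (p.mz < lower || p.mz > upper) continue;  // outside the window
    if (best < 0 || p.intensity > spectrum[best].intensity)
    {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

At m/z 100 a tolerance of 10 ppm corresponds to a half-width of 0.001 Da, so the same numeric tolerance selects very different windows depending on the ppm flag.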

◆ saveFeatureCompoundInfoAsTSV()

static void saveFeatureCompoundInfoAsTSV ( const std::vector< SiriusMSFile::CompoundInfo > &  v_cmpinfo,
const std::string &  filename 
)
static

Serialise the CompoundInfo records collected during store as a TSV file.

The file is written with a fixed 16-column header line in the order: cmp, file_index, pmass, pint_mono, rt, fmz, fid, formula, charge, ionization, des, specref_format, source_file, source_format, native_ids_id, m_ids_id. The columns native_ids and m_ids (the raw vector versions) are not written — only their already-concatenated _id forms.

Parameters
[in] v_cmpinfo: Compound records to write (typically the vector populated by store).
[in] filename: Destination path of the TSV file.
Exceptions
std::runtime_error if filename cannot be opened for writing.
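
The column layout can be made concrete with a stream-based sketch. Everything here is illustrative: CompoundRow is a hypothetical, reduced mirror of CompoundInfo that holds only the 16 written fields, writeCompoundRowsTSV is not an OpenMS function, and the numeric formatting is simply the default of std::ostream, which may differ from what OpenMS writes.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mirror of the 16 written CompoundInfo fields, in column order.
struct CompoundRow
{
  std::string cmp; int file_index; double pmass; double pint_mono;
  double rt; double fmz; std::string fid; std::string formula;
  int charge; std::string ionization; std::string des;
  std::string specref_format; std::string source_file;
  std::string source_format; std::string native_ids_id; std::string m_ids_id;
};

// Column names in the documented order.
const std::vector<std::string>& tsvColumns()
{
  static const std::vector<std::string> cols = {
    "cmp", "file_index", "pmass", "pint_mono", "rt", "fmz", "fid", "formula",
    "charge", "ionization", "des", "specref_format", "source_file",
    "source_format", "native_ids_id", "m_ids_id"};
  return cols;
}

// Writes the header line followed by one tab-separated line per record.
void writeCompoundRowsTSV(std::ostream& os, const std::vector<CompoundRow>& rows)
{
  const std::vector<std::string>& cols = tsvColumns();
  for (std::size_t i = 0; i < cols.size(); ++i)
  {
    os << (i ? "\t" : "") << cols[i];
  }
  os << '\n';

  for (const CompoundRow& r : rows)
  {
    os << r.cmp << '\t' << r.file_index << '\t' << r.pmass << '\t'
       << r.pint_mono << '\t' << r.rt << '\t' << r.fmz << '\t' << r.fid << '\t'
       << r.formula << '\t' << r.charge << '\t' << r.ionization << '\t'
       << r.des << '\t' << r.specref_format << '\t' << r.source_file << '\t'
       << r.source_format << '\t' << r.native_ids_id << '\t' << r.m_ids_id
       << '\n';
  }
}
```

Writing to a std::ostream rather than a file name keeps the sketch testable with std::ostringstream; opening the file and reporting failures would be the caller's job in this arrangement.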

◆ store()

static void store ( const MSExperiment &  spectra,
std::ofstream &  os,
const FeatureMapping::FeatureToMs2Indices &  feature_mapping,
const bool &  feature_only,
const int &  isotope_pattern_iterations,
const bool  no_masstrace_info_isotope_pattern,
std::vector< SiriusMSFile::CompoundInfo > &  v_cmpinfo,
const size_t &  file_index 
)
static

Write one mzML/featureXML pair to a SIRIUS .ms output stream.

Selects the .ms layout (feature-driven, unassigned-MS2, or no-feature-information) according to the contents of feature_mapping; see the class documentation for the branching rules. For every compound block emitted, a CompoundInfo record is appended to v_cmpinfo for downstream mzTab-M export. When adduct information is missing for a spectrum no adduct line is written — SIRIUS will then assume defaults.

The accession-CV term for the sourcefile is resolved by loading the PSI-MS OBO (CV/psi-ms.obo via File::find) and locating the term whose name matches the SourceFile type. The output stream os is written to but not closed; ownership remains with the caller.

Parameters
[in] spectra: Peakmap from the input mzML. The first spectrum must be centroided.
[in] os: Open output stream the .ms text is written to.
[in] feature_mapping: Result of FeatureMapping::assignMS2IndexToFeature; selects the output layout.
[in] feature_only: If true, MS2 spectra that are not associated with any feature are dropped instead of being emitted as additional UNKNOWN compounds.
[in] isotope_pattern_iterations: Upper bound on the number of isotope-trace peaks searched per precursor when no feature mass-traces are available.
[in] no_masstrace_info_isotope_pattern: If true, fall back to spectrum-based isotope-pattern extraction when feature mass-traces are absent.
[in,out] v_cmpinfo: Receives one CompoundInfo per emitted compound (appended; existing entries preserved).
[in] file_index: Numeric identifier mixed into compound IDs to disambiguate entries from different mzML files.
Exceptions
OpenMS::Exception::IllegalArgument if spectra contains profile data (centroiding is required).
OpenMS::Exception::IllegalArgument if spectra carries no SourceFile annotation.
Note
SIRIUS supports only singly charged precursors: features (and spectra) with |charge| > 1 are skipped. A summary of the skipped/assumed counts is emitted via OPENMS_LOG_WARN at the end of the call (even when all counts are zero).

◆ writeMsFile_()

static void writeMsFile_ ( std::ofstream &  os,
const MSExperiment &  spectra,
const std::vector< size_t > &  ms2_spectra_index,
const SiriusMSFile::AccessionInfo &  ainfo,
const StringList &  adducts,
const std::vector< std::string > &  v_description,
const std::vector< std::string > &  v_sumformula,
const std::vector< std::pair< double, double > > &  f_isotopes,
int &  feature_charge,
uint64_t &  feature_id,
const double &  feature_rt,
const double &  feature_mz,
bool &  writecompound,
const bool &  no_masstrace_info_isotope_pattern,
const int &  isotope_pattern_iterations,
int &  count_skipped_spectra,
int &  count_assume_mono,
int &  count_no_ms1,
std::vector< SiriusMSFile::CompoundInfo > &  v_cmpinfo,
const size_t &  file_index 
)
static protected

Internal helper that writes the compound blocks of the .ms file (called from store).

Parameters
[out] os: Output stream the .ms text is written to.
[in] spectra: Peak map containing the spectra.
[in] ms2_spectra_index: Indices of the MS2 spectra (belonging to the feature).
[in] ainfo: Accession information.
[in] adducts: Vector of adducts.
[in] v_description: Vector of descriptions.
[in] v_sumformula: Vector of sum formulas.
[in] f_isotopes: Isotope pattern of the feature.
[in] feature_charge: Feature charge.
[in] feature_id: Feature ID.
[in] feature_rt: Retention time of the feature.
[in] feature_mz: Mass-to-charge ratio of the feature.
[out] writecompound: Whether a new compound should be written to the .ms file.
[in] no_masstrace_info_isotope_pattern: Whether the isotope pattern should be extracted from the spectrum (if it is not available from the feature).
[in] isotope_pattern_iterations: Number of iterations used when trying to find a C13 pattern.
[in] count_skipped_spectra: Counter for the number of skipped spectra.
[in] count_assume_mono: Counter for the number of features where a single charge was assumed.
[in] count_no_ms1: Counter for the number of compounds without a valid MS1 spectrum.
[in] v_cmpinfo: Vector of CompoundInfo records.
[in] file_index: File index (differentiates entries derived from different mzML files and resolves ambiguities).