OpenMS
OPXLSpectrumProcessingAlgorithms Class Reference

Spectrum preprocessing and theoretical-vs-experimental peak alignment helpers used by the OpenPepXL cross-link search engines.

#include <OpenMS/ANALYSIS/XLMS/OPXLSpectrumProcessingAlgorithms.h>

Static Public Member Functions

static PeakSpectrum mergeAnnotatedSpectra (PeakSpectrum &first_spectrum, PeakSpectrum &second_spectrum)
 Merge two annotated spectra into one peak list, preserving paired DataArrays.
 
static PeakMap preprocessSpectra (PeakMap &exp, double fragment_mass_tolerance, bool fragment_mass_tolerance_unit_ppm, Size peptide_min_size, Int min_precursor_charge, Int max_precursor_charge, bool deisotope, bool labeled)
 Preprocess an MSExperiment for cross-link search and return the surviving MS2 spectra.
 
static void getSpectrumAlignmentFastCharge (std::vector< std::pair< Size, Size > > &alignment, double fragment_mass_tolerance, bool fragment_mass_tolerance_unit_ppm, const PeakSpectrum &theo_spectrum, const PeakSpectrum &exp_spectrum, const DataArrays::IntegerDataArray &theo_charges, const DataArrays::IntegerDataArray &exp_charges, DataArrays::FloatDataArray &ppm_error_array, double intensity_cutoff=0.0)
 Align a theoretical and an experimental fragment spectrum using charge annotations and an intensity-ratio cut-off.
 
static void getSpectrumAlignmentSimple (std::vector< std::pair< Size, Size > > &alignment, double fragment_mass_tolerance, bool fragment_mass_tolerance_unit_ppm, const std::vector< SimpleTSGXLMS::SimplePeak > &theo_spectrum, const PeakSpectrum &exp_spectrum, const DataArrays::IntegerDataArray &exp_charges)
 Align a SimplePeak-based theoretical spectrum to an experimental spectrum using charge annotations only.
 

Detailed Description

Spectrum preprocessing and theoretical-vs-experimental peak alignment helpers used by the OpenPepXL cross-link search engines.

Static utilities (the class carries no state) that the cross-link identification workflows in OpenPepXL build on:

  • Pre-filter an MS2 dataset before searching — drop spectra with too few peaks or with precursor charges outside the configured range, normalise intensities, and (optionally) deisotope.
  • Merge two annotated spectra into a single peak list while preserving paired FloatDataArray / StringDataArray / IntegerDataArray annotations.
  • Align a theoretical and an experimental fragment spectrum honouring per-peak charge annotations, with optional intensity-ratio filtering.

Member Function Documentation

◆ getSpectrumAlignmentFastCharge()

static void getSpectrumAlignmentFastCharge ( std::vector< std::pair< Size, Size > > &  alignment,
double  fragment_mass_tolerance,
bool  fragment_mass_tolerance_unit_ppm,
const PeakSpectrum &  theo_spectrum,
const PeakSpectrum &  exp_spectrum,
const DataArrays::IntegerDataArray &  theo_charges,
const DataArrays::IntegerDataArray &  exp_charges,
DataArrays::FloatDataArray &  ppm_error_array,
double  intensity_cutoff = 0.0 
)
static

Align a theoretical and an experimental fragment spectrum using charge annotations and an intensity-ratio cut-off.

For each theoretical peak, the closest experimental peak inside the mass-tolerance window is picked, restricted to peaks whose charge and intensity also pass the per-peak filters.

Tolerance window half-width: theo_mz * fragment_mass_tolerance * 1e-6 when fragment_mass_tolerance_unit_ppm is true, else fragment_mass_tolerance Da.
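
The half-width rule can be sketched in isolation as follows. This is a minimal illustration, not OpenMS code: the helper name toleranceHalfWidth is hypothetical, and the non-negativity check is an assumption added for the example.

```cpp
#include <cassert>

// Hypothetical helper (not part of OpenMS): half-width of the tolerance
// window around one theoretical m/z value.
double toleranceHalfWidth(double theo_mz, double fragment_mass_tolerance,
                          bool fragment_mass_tolerance_unit_ppm)
{
  // Assumption for this sketch: tolerances are never negative.
  assert(fragment_mass_tolerance >= 0.0);
  // A ppm tolerance scales with m/z; an absolute (Da) tolerance does not.
  return fragment_mass_tolerance_unit_ppm
           ? theo_mz * fragment_mass_tolerance * 1e-6
           : fragment_mass_tolerance;
}
```

With a 20 ppm tolerance, a theoretical peak at m/z 1000 gets a half-width of about 0.02, while a 0.2 Da tolerance gives 0.2 at every m/z.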

Charge filter: a pair (theoretical charge tz, experimental charge ez) matches when tz == ez or either side is 0 (treated as "unknown"). If theo_charges or exp_charges is empty, the charge filter is disabled and every pair passes it.
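
The charge rule can be written as a small predicate. The sketch below is a standalone illustration that uses std::vector<int> in place of DataArrays::IntegerDataArray; the function name chargesMatch and the bounds check are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of OpenMS): charge filter for one
// (theoretical, experimental) peak pair.
bool chargesMatch(const std::vector<int>& theo_charges, std::size_t theo_index,
                  const std::vector<int>& exp_charges, std::size_t exp_index)
{
  // An empty array on either side disables the filter: every pair passes.
  if (theo_charges.empty() || exp_charges.empty())
  {
    return true;
  }
  // Assumption for this sketch: non-empty arrays hold one charge per peak.
  assert(theo_index < theo_charges.size() && exp_index < exp_charges.size());
  const int tz = theo_charges[theo_index];
  const int ez = exp_charges[exp_index];
  // Equal charges match; a charge of 0 means "unknown" and matches anything.
  return tz == ez || tz == 0 || ez == 0;
}
```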

Intensity filter: a pair (theoretical intensity ti, experimental intensity ei) matches when min(ti,ei) / max(ti,ei) > intensity_cutoff. Pass 0 to disable.
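
As a sketch, the intensity rule reduces to one comparison. The helper name intensitiesMatch is hypothetical, and rejecting a pair in which both intensities are zero is an assumption of this example (it avoids a 0/0 division); the text above does not specify that case.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of OpenMS): intensity-ratio filter for one
// (theoretical, experimental) peak pair.
bool intensitiesMatch(double theo_intensity, double exp_intensity,
                      double intensity_cutoff)
{
  // Assumption for this sketch: intensities are never negative.
  assert(theo_intensity >= 0.0 && exp_intensity >= 0.0);
  const double larger = std::max(theo_intensity, exp_intensity);
  const double smaller = std::min(theo_intensity, exp_intensity);
  if (larger <= 0.0)
  {
    return false; // assumption: two zero-intensity peaks never match
  }
  // Smaller-over-larger ratio lies in [0, 1]; it must exceed the cut-off.
  return smaller / larger > intensity_cutoff;
}
```

For peaks with positive intensity the ratio is always greater than 0, so a cut-off of 0 lets every such pair pass.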

Both spectra must be sorted by m/z, and alignment and ppm_error_array must be empty on entry (precondition). When either spectrum is empty, the function returns immediately and leaves both outputs empty.

Parameters
[out]  alignment  Receives (theo-index, exp-index) match pairs. Must be empty on entry.
[in]  fragment_mass_tolerance  Tolerance window half-width.
[in]  fragment_mass_tolerance_unit_ppm  Interpret fragment_mass_tolerance as ppm (true) or Da (false).
[in]  theo_spectrum  Theoretical spectrum (sorted by m/z).
[in]  exp_spectrum  Experimental spectrum (sorted by m/z).
[in]  theo_charges  Per-peak charges for theo_spectrum; an empty array disables the charge filter.
[in]  exp_charges  Per-peak charges for exp_spectrum; an empty array disables the charge filter.
[out]  ppm_error_array  Receives per-match ppm errors (exp_mz - theo_mz) / theo_mz * 1e6. Must be empty on entry.
[in]  intensity_cutoff  Minimum smaller-over-larger intensity ratio for a match; 0 disables the intensity filter.
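
The matching rules described for this function can be put together in a standalone sketch. It is not the OpenMS implementation: SketchPeak, SketchAlignment and alignSketch are hypothetical names, charges are stored inside each peak rather than in separate arrays, the window bounds are assumed to be inclusive, and a plain linear scan replaces whatever traversal the library uses on the sorted spectra. It only illustrates "closest experimental peak inside the window that passes both filters" and the ppm-error formula.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an annotated peak; not an OpenMS type.
struct SketchPeak
{
  double mz;
  double intensity;
  int charge; // 0 means "unknown"
};

struct SketchAlignment
{
  std::vector<std::pair<std::size_t, std::size_t>> pairs; // (theo-index, exp-index)
  std::vector<double> ppm_errors;                         // one entry per pair
};

SketchAlignment alignSketch(const std::vector<SketchPeak>& theo_peaks,
                            const std::vector<SketchPeak>& exp_peaks,
                            double tolerance, bool tolerance_is_ppm,
                            double intensity_cutoff = 0.0)
{
  assert(tolerance >= 0.0); // assumption: tolerances are never negative
  SketchAlignment result;
  for (std::size_t t = 0; t < theo_peaks.size(); ++t)
  {
    const SketchPeak& tp = theo_peaks[t];
    const double half_width = tolerance_is_ppm ? tp.mz * tolerance * 1e-6 : tolerance;
    bool found = false;
    std::size_t best = 0;
    double best_distance = 0.0;
    // Linear scan for clarity; sorted input would allow a faster traversal.
    for (std::size_t e = 0; e < exp_peaks.size(); ++e)
    {
      const SketchPeak& ep = exp_peaks[e];
      const double distance = std::fabs(ep.mz - tp.mz);
      if (distance > half_width) continue; // outside the tolerance window
      const bool charge_ok = tp.charge == ep.charge || tp.charge == 0 || ep.charge == 0;
      const double larger = std::max(tp.intensity, ep.intensity);
      const double smaller = std::min(tp.intensity, ep.intensity);
      const bool intensity_ok = larger > 0.0 && smaller / larger > intensity_cutoff;
      if (charge_ok && intensity_ok && (!found || distance < best_distance))
      {
        found = true;
        best = e;
        best_distance = distance;
      }
    }
    if (found)
    {
      result.pairs.emplace_back(t, best);
      // Signed ppm error relative to the theoretical m/z.
      result.ppm_errors.push_back((exp_peaks[best].mz - tp.mz) / tp.mz * 1e6);
    }
  }
  return result;
}
```

In this sketch an experimental peak that lies closer in m/z but carries a conflicting charge is skipped in favour of a more distant peak that passes both filters.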

◆ getSpectrumAlignmentSimple()

static void getSpectrumAlignmentSimple ( std::vector< std::pair< Size, Size > > &  alignment,
double  fragment_mass_tolerance,
bool  fragment_mass_tolerance_unit_ppm,
const std::vector< SimpleTSGXLMS::SimplePeak > &  theo_spectrum,
const PeakSpectrum &  exp_spectrum,
const DataArrays::IntegerDataArray &  exp_charges 
)
static

Align a SimplePeak-based theoretical spectrum to an experimental spectrum using charge annotations only.

This function mirrors getSpectrumAlignmentFastCharge, with three differences: there is no intensity-ratio filter, there is no ppm-error output, and the theoretical side is a std::vector<SimpleTSGXLMS::SimplePeak> (charges are carried per peak inside the SimplePeak struct, not in a separate DataArray). alignment is cleared on entry, so it does not need to start empty.

The charge filter rule is identical to the fast-charge variant: a pair matches when theo_charge == exp_charge or either charge is 0. An empty exp_charges array disables the charge filter.

Parameters
[out]  alignment  Receives (theo-index, exp-index) match pairs. Cleared on entry.
[in]  fragment_mass_tolerance  Tolerance window half-width.
[in]  fragment_mass_tolerance_unit_ppm  Interpret fragment_mass_tolerance as ppm (true) or Da (false).
[in]  theo_spectrum  Theoretical spectrum (vector of SimplePeak; per-peak m/z and charge).
[in]  exp_spectrum  Experimental spectrum (sorted by m/z).
[in]  exp_charges  Per-peak charges for exp_spectrum; an empty array disables the charge filter.

◆ mergeAnnotatedSpectra()

static PeakSpectrum mergeAnnotatedSpectra ( PeakSpectrum &  first_spectrum,
PeakSpectrum &  second_spectrum 
)
static

Merge two annotated spectra into one peak list, preserving paired DataArrays.

Peaks of first_spectrum and second_spectrum are concatenated and the result is sorted by m/z. For each kind of DataArray (Float / String / Integer) the i-th array of first_spectrum is paired with the i-th array of second_spectrum and their contents are concatenated; the output array inherits its name from first_spectrum's i-th array. Extra arrays present only in second_spectrum are dropped — pairing is positional, not by name.

Despite the non-const references in the signature, neither input is modified.

Parameters
[in,out]  first_spectrum  Spectrum whose DataArray names and ordering define the output.
[in,out]  second_spectrum  Spectrum whose peaks and (positionally paired) DataArrays are appended.
Returns
Merged spectrum, sorted by m/z.
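
The positional pairing can be illustrated with a standalone sketch that models only m/z values and float arrays. NamedArray, SketchSpectrum and mergeSketch are hypothetical names, not OpenMS types. Two points are assumptions of the sketch rather than statements about the library: arrays of the first spectrum that have no partner in the second are left out as well, and array entries are reordered together with their peaks when the result is sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-ins; not OpenMS types.
struct NamedArray
{
  std::string name;
  std::vector<float> values; // one value per peak
};

struct SketchSpectrum
{
  std::vector<double> mz;
  std::vector<NamedArray> float_arrays;
};

SketchSpectrum mergeSketch(const SketchSpectrum& first, const SketchSpectrum& second)
{
  // Concatenate the peaks: first spectrum, then second spectrum.
  std::vector<double> mz = first.mz;
  mz.insert(mz.end(), second.mz.begin(), second.mz.end());

  // Pair the i-th array of `first` with the i-th array of `second`.
  std::vector<NamedArray> arrays;
  const std::size_t paired = std::min(first.float_arrays.size(), second.float_arrays.size());
  for (std::size_t i = 0; i < paired; ++i)
  {
    NamedArray joined = first.float_arrays[i]; // the name comes from `first`
    const std::vector<float>& tail = second.float_arrays[i].values;
    joined.values.insert(joined.values.end(), tail.begin(), tail.end());
    assert(joined.values.size() == mz.size()); // assumption: one value per peak
    arrays.push_back(joined);
  }

  // Sort by m/z via an index permutation so array entries follow their peaks.
  std::vector<std::size_t> order(mz.size());
  std::iota(order.begin(), order.end(), std::size_t(0));
  std::stable_sort(order.begin(), order.end(),
                   [&mz](std::size_t a, std::size_t b) { return mz[a] < mz[b]; });

  SketchSpectrum merged;
  for (std::size_t idx : order) merged.mz.push_back(mz[idx]);
  for (const NamedArray& arr : arrays)
  {
    NamedArray reordered;
    reordered.name = arr.name;
    for (std::size_t idx : order) reordered.values.push_back(arr.values[idx]);
    merged.float_arrays.push_back(reordered);
  }
  return merged;
}
```

Because pairing is by position, a second array of second_spectrum that has no counterpart in first_spectrum does not appear in the result, whatever its name.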

◆ preprocessSpectra()

static PeakMap preprocessSpectra ( PeakMap &  exp,
double  fragment_mass_tolerance,
bool  fragment_mass_tolerance_unit_ppm,
Size  peptide_min_size,
Int  min_precursor_charge,
Int  max_precursor_charge,
bool  deisotope,
bool  labeled 
)
static

Preprocess an MSExperiment for cross-link search and return the surviving MS2 spectra.

Preprocessing runs in two phases. First, the input exp is modified in place: zero-intensity peaks are removed (ThresholdMower), intensities are normalised, and the spectra are sorted by retention time. Then the MS2 spectra are iterated (OpenMP-parallelised loop) and those that pass the per-spectrum filters are copied into a freshly built PeakMap that is returned. MS1 spectra are not copied to the output.

For unlabeled data (labeled is false) a spectrum is retained only if it has a single precursor whose charge lies in [min_precursor_charge, max_precursor_charge] and at least 2 * peptide_min_size peaks. Such spectra are further reduced by a WindowMower with hardcoded settings (window size 100, keep 20 peaks per window, "jump" mode). The final peak count must again exceed 2 * peptide_min_size.
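
The initial retention check for unlabeled data can be sketched as a predicate over a spectrum summary. Ms2Summary and passesUnlabeledPrefilter are hypothetical names introduced for this example; the sketch covers only the precursor and first peak-count check, and does not model the WindowMower step or the second peak-count check that follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one MS2 spectrum; not an OpenMS type.
struct Ms2Summary
{
  std::vector<int> precursor_charges; // one entry per precursor
  std::size_t peak_count;
};

bool passesUnlabeledPrefilter(const Ms2Summary& spectrum,
                              std::size_t peptide_min_size,
                              int min_precursor_charge,
                              int max_precursor_charge)
{
  // Assumption for this sketch: the configured charge range is not inverted.
  assert(min_precursor_charge <= max_precursor_charge);
  // Exactly one precursor is required.
  if (spectrum.precursor_charges.size() != 1)
  {
    return false;
  }
  // At least 2 * peptide_min_size peaks are required.
  if (spectrum.peak_count < 2 * peptide_min_size)
  {
    return false;
  }
  // The precursor charge must lie in the closed range [min, max].
  const int z = spectrum.precursor_charges.front();
  return z >= min_precursor_charge && z <= max_precursor_charge;
}
```

With peptide_min_size 6 and a charge range of [2, 8], a spectrum with one 3+ precursor and 12 peaks passes this first check, while the same spectrum with 11 peaks does not.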

For labeled data (labeled is true) the precursor and peak-count filters are bypassed, the WindowMower is not applied, and every MS2 spectrum of the input is present in the output — keeping spectrum indices stable across the heavy/light pairing performed downstream via consensusXML.

When deisotope is true, each surviving spectrum is run through Deisotoper::deisotopeAndSingleCharge with a simple averagine model, charge range [1,7], and isopeak counts in [3,10]; charge and isotopic-peak counts are annotated and monoisotopic intensity is summed. The deisotoped result is kept only if its peak count still exceeds 2 * peptide_min_size or labeled is true.

Parameters
[in,out]  exp  Input data (MS1 + MS2). Modified in place.
[in]  fragment_mass_tolerance  Peak mass tolerance used by deisotoping (ignored if deisotope is false).
[in]  fragment_mass_tolerance_unit_ppm  Interpret fragment_mass_tolerance as ppm (true) or Da (false).
[in]  peptide_min_size  Lower bound on peak count: spectra must have at least 2 * peptide_min_size peaks both before and after the WindowMower / Deisotoper step.
[in]  min_precursor_charge  Minimum allowed precursor charge for unlabeled data.
[in]  max_precursor_charge  Maximum allowed precursor charge for unlabeled data.
[in]  deisotope  If true, deisotope each surviving spectrum.
[in]  labeled  If true, bypass precursor/peak-count filters and the WindowMower, keeping every MS2 spectrum.
Returns
PeakMap containing only the preprocessed MS2 spectra that passed all filters; for labeled inputs this contains every input MS2 spectrum.