OpenMS
NuXLFDR Class Reference

Class-separated FDR calculation for nucleic-acid cross-links (NuXL).

#include <OpenMS/ANALYSIS/NUXL/NuXLFDR.h>

Public Member Functions

 NuXLFDR (size_t report_top_hits)
 Construct with a top-hit cap that controls how FalseDiscoveryRate is parametrised.
 
void splitIntoPeptidesAndXLs (const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep_pi, PeptideIdentificationList &xl_pi) const
 Partition peptide_ids into plain-peptide and cross-link PSM lists.
 
void mergePeptidesAndXLs (const PeptideIdentificationList &pep_pi, const PeptideIdentificationList &xl_pi, PeptideIdentificationList &peptide_ids) const
 Re-merge a previously split (splitIntoPeptidesAndXLs) pair of lists into one PSM list.
 
void QValueAtPSMLevel (PeptideIdentificationList &peptide_ids) const
 Compute PSM-level q-values across peptide_ids without splitting by class.
 
void calculatePeptideAndXLQValueAtPSMLevel (const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep_pi, PeptideIdentificationList &xl_pi) const
 Compute PSM- and peptide-level q-values separately for plain and cross-linked PSMs.
 
void calculatePeptideAndXLQValueAndFilterAtPSMLevel (const std::vector< ProteinIdentification > &protein_ids, const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep, double peptide_PSM_qvalue_threshold, double peptide_peptide_qvalue_threshold, PeptideIdentificationList &xl_pi, std::vector< double > xl_PSM_qvalue_thresholds, std::vector< double > xl_peptidelevel_qvalue_thresholds, const std::string &out_idxml, int decoy_factor) const
 Compute separated FDRs, apply class-specific filters, and write per-threshold idXML / TSV reports.
 

Private Attributes

size_t report_top_hits_
 Top-hit cap consulted by the FDR-applying methods to toggle use_all_hits.
 

Detailed Description

Class-separated FDR calculation for nucleic-acid cross-links (NuXL).

Splits PSMs into a "plain peptide" class (NuXL:isXL == 0) and a "cross-link" class (NuXL:isXL != 0), computes FDR separately for each, and (optionally) writes one filtered idXML per requested XL q-value threshold plus a TSV protein report for stringent (<= 10%) XL FDRs.

Built on top of FalseDiscoveryRate. Decoys are kept in the FDR result (so downstream Percolator and similar tools can still see them); the convenience filter method calls IDFilter::removeDecoyHits before writing.

Constructor & Destructor Documentation

◆ NuXLFDR()

NuXLFDR ( size_t  report_top_hits)
explicit

Construct with a top-hit cap that controls how FalseDiscoveryRate is parametrised.

report_top_hits >= 2 toggles use_all_hits = true on the underlying FalseDiscoveryRate so q-values are computed across all hits per PSM, not only the top hit. The value is otherwise stored verbatim and consulted by the QValueAtPSMLevel and calculatePeptideAndXLQValueAtPSMLevel methods.

Parameters
[in] report_top_hits: Top-hit cap; values >= 2 enable the underlying FDR engine's use_all_hits mode.

Member Function Documentation

◆ calculatePeptideAndXLQValueAndFilterAtPSMLevel()

void calculatePeptideAndXLQValueAndFilterAtPSMLevel ( const std::vector< ProteinIdentification > &  protein_ids,
const PeptideIdentificationList &  peptide_ids,
PeptideIdentificationList &  pep,
double  peptide_PSM_qvalue_threshold,
double  peptide_peptide_qvalue_threshold,
PeptideIdentificationList &  xl_pi,
std::vector< double >  xl_PSM_qvalue_thresholds,
std::vector< double >  xl_peptidelevel_qvalue_thresholds,
const std::string &  out_idxml,
int  decoy_factor 
) const

Compute separated FDRs, apply class-specific filters, and write per-threshold idXML / TSV reports.

Pipeline (every step is observable behaviour, not implementation detail):

  1. Calls calculatePeptideAndXLQValueAtPSMLevel to populate pep and xl_pi with FDR-annotated PSMs.
  2. When decoy_factor != 1, divides every cross-link hit's score by decoy_factor (peptide hits are not touched).
  3. Adds a deterministic score-range-normalised tie-breaker of magnitude <= 1e-5 to every cross-link score so PSMs that originally tied on the q-value resolve to a unique ordering. Uses "svm_score" when at least one XL hit carries it, otherwise "NuXL:score".
    Warning: The tie-break divides by the (max - min) score range and is undefined when all XL hits share the same score.
  4. Calls IDFilter::removeDecoyHits on both pep and xl_pi.
  5. Filters pep on PEPTIDE_Q_VALUE when peptide_peptide_qvalue_threshold is in (0, 1); values <= 0 or >= 1 disable the filter.
  6. Filters pep PSMs by score when peptide_PSM_qvalue_threshold is in (0, 1).
  7. Writes the plain-peptide idXML to {out_idxml}{threshold:.4f}_peptides.idXML (with out_idxml the literal prefix and threshold the configured cut formatted to four decimal places), with unreferenced proteins pruned.
  8. Normalises xl_PSM_qvalue_thresholds: 0.0 entries are rewritten to 1.0 (disabled filter = 100% FDR) and the vector is then sorted in descending order so increasingly stringent filters are applied progressively to the same xl_pi (filters compound across iterations).
  9. For each (XL-PSM threshold, XL-peptide-level threshold) pair (paired by index; sizes must match — otherwise Exception::InvalidValue is thrown):
    • filter cross-links on PEPTIDE_Q_VALUE when the peptide-level threshold is in (0, 1);
    • filter cross-links by PSM-level q-value when the XL-PSM threshold is in (0, 1);
    • prune unreferenced proteins from a fresh copy of protein_ids;
    • compute coverage by cross-linked peptides on the first ProteinIdentification of that copy;
    • write the XL idXML to {out_idxml}{threshold:.4f}_XLs.idXML;
    • if the XL FDR is <= 0.1, also write a TSV protein report to {out_idxml}proteins{threshold:.4f}_XLs.tsv (skipped for permissive FDRs to bound output size).
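
The threshold handling in steps 5 to 9 can be sketched in standard C++ as follows. This is a minimal illustration, not the OpenMS implementation: the helper names (filterEnabled, normaliseThresholds, checkPaired, outputName) are hypothetical, and std::invalid_argument stands in for Exception::InvalidValue.

```cpp
#include <algorithm>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// A threshold acts as a filter only when it lies strictly inside (0, 1).
bool filterEnabled(double threshold)
{
  return threshold > 0.0 && threshold < 1.0;
}

// Step 8: rewrite 0.0 to 1.0 (disabled = 100% FDR), then sort descending so
// the most permissive cut comes first. Works on a by-value copy.
std::vector<double> normaliseThresholds(std::vector<double> thresholds)
{
  std::replace(thresholds.begin(), thresholds.end(), 0.0, 1.0);
  std::sort(thresholds.begin(), thresholds.end(), std::greater<double>());
  return thresholds;
}

// Step 9 precondition: the two XL threshold vectors are paired by index.
// Assumption: std::invalid_argument replaces Exception::InvalidValue here.
void checkPaired(const std::vector<double>& psm_level,
                 const std::vector<double>& peptide_level)
{
  if (psm_level.size() != peptide_level.size())
  {
    throw std::invalid_argument("XL threshold vectors differ in size");
  }
}

// Steps 7 and 9: literal prefix + threshold with four decimals + suffix.
std::string outputName(const std::string& prefix, double threshold,
                       const std::string& suffix)
{
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.4f", threshold);
  return prefix + buf + suffix;
}
```

With these helpers, outputName("run_", 0.01, "_XLs.idXML") yields run_0.0100_XLs.idXML, and the TSV name follows the same pattern with "proteins" appended to the prefix. Because the list is processed from the most permissive to the most stringent cut, each iteration only ever removes hits from what the previous iteration left behind.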
Parameters
[in] protein_ids: Protein identifications used as the source for the per-file protein lists; unreferenced proteins are pruned per output.
[in] peptide_ids: Input PSMs (mixed plain + XL).
[out] pep: Receives the filtered plain-peptide PSMs (after step 6).
[in] peptide_PSM_qvalue_threshold: Plain-peptide PSM-level q-value cut; values outside (0, 1) disable the filter.
[in] peptide_peptide_qvalue_threshold: Plain-peptide peptide-level q-value cut; values outside (0, 1) disable the filter.
[out] xl_pi: Receives the filtered cross-link PSMs (state at the most stringent XL threshold).
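[in] xl_PSM_qvalue_thresholds: Cross-link PSM-level q-value thresholds; 0.0 is rewritten to 1.0 (disabled = 100% FDR). The vector is passed by value, so the rewrite and the descending sort act on the function's own copy and the caller's vector is left unchanged.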
[in] xl_peptidelevel_qvalue_thresholds: Cross-link peptide-level q-value thresholds paired by index with xl_PSM_qvalue_thresholds (must have the same size).
[in] out_idxml: Output filename prefix; the FDR threshold and a suffix (_peptides.idXML / _XLs.idXML / proteins...XLs.tsv) are appended.
[in] decoy_factor: Score scale-down factor for the XL class; ignored when equal to 1.
Exceptions
OpenMS::Exception::InvalidValue when xl_PSM_qvalue_thresholds and xl_peptidelevel_qvalue_thresholds differ in size.
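
To make the tie-breaker of step 3 and its warning concrete, here is a minimal sketch in standard C++. The exact formula used by NuXLFDR is not given on this page, so the offset below (scaled by the distance from the best auxiliary score) is an assumption, as are the XLHit struct and the addTieBreaker name. The zero-range guard is an addition of this sketch and not documented behaviour of the class.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a cross-link hit.
struct XLHit
{
  double q_value;    // score after FDR calculation, lower is better
  double aux_score;  // stands in for "svm_score" or "NuXL:score", higher is better
};

// Adds an offset in [0, magnitude] to every q-value so that hits tied on the
// q-value are ordered by their auxiliary score.
// Returns false and leaves the hits untouched when the score range is zero,
// which is the case the warning describes as undefined.
bool addTieBreaker(std::vector<XLHit>& hits, double magnitude = 1e-5)
{
  if (hits.empty()) return false;

  auto mm = std::minmax_element(hits.begin(), hits.end(),
    [](const XLHit& a, const XLHit& b) { return a.aux_score < b.aux_score; });
  const double min_score = mm.first->aux_score;
  const double max_score = mm.second->aux_score;
  const double range = max_score - min_score;
  if (range <= 0.0) return false;  // guard: avoid dividing by zero

  for (XLHit& h : hits)
  {
    // Assumed form: the best auxiliary score gets offset 0, the worst gets
    // the full magnitude.
    h.q_value += magnitude * (max_score - h.aux_score) / range;
  }
  return true;
}
```

Callers that cannot rule out identical scores across all XL hits may want a comparable check before invoking the filtering method.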

◆ calculatePeptideAndXLQValueAtPSMLevel()

void calculatePeptideAndXLQValueAtPSMLevel ( const PeptideIdentificationList &  peptide_ids,
PeptideIdentificationList &  pep_pi,
PeptideIdentificationList &  xl_pi 
) const

Compute PSM- and peptide-level q-values separately for plain and cross-linked PSMs.

Internally calls splitIntoPeptidesAndXLs and then applies FalseDiscoveryRate to each output list with the same parameters as QValueAtPSMLevel. The q-values are computed independently on each class so XL and non-XL FDR control are not confounded.

Parameters
[in] peptide_ids: Input mixed plain-peptide / cross-link PSMs.
[out] pep_pi: Plain-peptide PSMs with PSM- and peptide-level q-values annotated.
[out] xl_pi: Cross-link PSMs with PSM- and peptide-level q-values annotated.
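
The idea of computing q-values independently per class can be illustrated with a textbook target-decoy estimate in standard C++. This sketch does not reproduce FalseDiscoveryRate's actual formula or options; the Psm struct and both function names are hypothetical, and scores are assumed to be higher-is-better.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PSM with a single hit.
struct Psm
{
  double score;    // assumption: higher is better
  bool is_decoy;
  bool is_xl;      // stands in for NuXL:isXL != 0
  double q_value;  // filled in by classSeparatedQValues
};

// Target-decoy q-values for one class: sort best-first, estimate the FDR at
// each rank as decoys / targets, then take the running minimum from the
// worst rank upwards so q-values are monotone.
void assignQValues(std::vector<Psm*>& cls)
{
  std::sort(cls.begin(), cls.end(),
            [](const Psm* a, const Psm* b) { return a->score > b->score; });

  std::vector<double> fdr(cls.size());
  std::size_t targets = 0;
  std::size_t decoys = 0;
  for (std::size_t i = 0; i < cls.size(); ++i)
  {
    if (cls[i]->is_decoy) ++decoys; else ++targets;
    fdr[i] = (targets == 0)
               ? 1.0
               : std::min(1.0, static_cast<double>(decoys) / static_cast<double>(targets));
  }

  double running_min = 1.0;
  for (std::size_t i = cls.size(); i-- > 0;)
  {
    running_min = std::min(running_min, fdr[i]);
    cls[i]->q_value = running_min;
  }
}

// Split by class first, then estimate q-values on each class separately.
void classSeparatedQValues(std::vector<Psm>& psms)
{
  std::vector<Psm*> pep;
  std::vector<Psm*> xl;
  for (Psm& p : psms) (p.is_xl ? xl : pep).push_back(&p);
  assignQValues(pep);
  assignQValues(xl);
}
```

Because each class has its own target and decoy counts, a class with few, low-confidence matches does not borrow significance from a large, well-separated class, which is the confounding the separation is meant to avoid.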

◆ mergePeptidesAndXLs()

void mergePeptidesAndXLs ( const PeptideIdentificationList &  pep_pi,
const PeptideIdentificationList &  xl_pi,
PeptideIdentificationList &  peptide_ids 
) const

Re-merge a previously split (splitIntoPeptidesAndXLs) pair of lists into one PSM list.

pep_pi entries are appended verbatim. For each xl_pi entry, the meta value "spectrum_reference" is used as the merge key:

  • if no plain-peptide entry shares this spectrum reference, the XL entry is appended as a new PeptideIdentification;
  • if a plain-peptide entry already exists for this spectrum, the XL hits are appended to its hit list and that hit list is re-sorted in place.

peptide_ids is cleared on entry.

Parameters
[in] pep_pi: Plain-peptide PSMs (typically from splitIntoPeptidesAndXLs).
[in] xl_pi: Cross-link PSMs (typically from splitIntoPeptidesAndXLs).
[out] peptide_ids: Receives the merged PSM list keyed by "spectrum_reference".
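
The merge rules above can be sketched in standard C++ with a hash map keyed by the spectrum reference. Hit, Identification and mergeSketch are hypothetical stand-ins for the OpenMS types and method, and the re-sort direction (higher score first) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for PeptideHit and PeptideIdentification.
struct Hit
{
  double score;
  int is_xl;  // stands in for the "NuXL:isXL" meta value
};

struct Identification
{
  std::string spectrum_reference;  // merge key
  std::vector<Hit> hits;
};

void mergeSketch(const std::vector<Identification>& pep,
                 const std::vector<Identification>& xl,
                 std::vector<Identification>& merged)
{
  merged.clear();  // the output is cleared on entry

  // Plain-peptide entries are appended verbatim; remember where each
  // spectrum reference landed.
  std::unordered_map<std::string, std::size_t> index;
  for (const Identification& id : pep)
  {
    index.emplace(id.spectrum_reference, merged.size());
    merged.push_back(id);
  }

  for (const Identification& id : xl)
  {
    auto it = index.find(id.spectrum_reference);
    if (it == index.end())
    {
      merged.push_back(id);  // no plain-peptide entry for this spectrum
      continue;
    }
    // Append the XL hits to the existing entry and re-sort its hit list.
    std::vector<Hit>& hits = merged[it->second].hits;
    hits.insert(hits.end(), id.hits.begin(), id.hits.end());
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.score > b.score; });
  }
}
```

Keying on the spectrum reference means a spectrum that produced both a plain-peptide and a cross-link candidate ends up as a single entry with two competing hits rather than two separate entries.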

◆ QValueAtPSMLevel()

void QValueAtPSMLevel ( PeptideIdentificationList &  peptide_ids) const

Compute PSM-level q-values across peptide_ids without splitting by class.

Wraps FalseDiscoveryRate with add_decoy_proteins = true, add_decoy_peptides = true (decoys are retained in the result for downstream tools such as Percolator), and use_all_hits = true if and only if the report_top_hits value passed to the constructor (stored in report_top_hits_) is >= 2. Also computes the peptide-level q-value meta value Constants::UserParam::PEPTIDE_Q_VALUE on each hit.

Parameters
[in,out] peptide_ids: PSMs to score in place; each hit receives PSM- and peptide-level q-value meta values.

◆ splitIntoPeptidesAndXLs()

void splitIntoPeptidesAndXLs ( const PeptideIdentificationList &  peptide_ids,
PeptideIdentificationList &  pep_pi,
PeptideIdentificationList &  xl_pi 
) const

Partition peptide_ids into plain-peptide and cross-link PSM lists.

For each input PeptideIdentification, walks its hits and keeps only the first hit encountered for each class, even when report_top_hits >= 2 was passed to the constructor. The classification is based on the integer meta value "NuXL:isXL" (zero → plain peptide; anything else → cross-link). An input identification therefore contributes at most one entry to pep_pi and at most one entry to xl_pi (one to each if it has hits of both classes).

pep_pi and xl_pi are cleared on entry.

Parameters
[in] peptide_ids: Input mixed plain-peptide / cross-link PSMs.
[out] pep_pi: Receives the plain-peptide PSMs (one hit per PSM, the first encountered with NuXL:isXL == 0).
[out] xl_pi: Receives the cross-link PSMs (one hit per PSM, the first encountered with NuXL:isXL != 0).
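
The partitioning rule (first hit of each class only) can be sketched in standard C++ as follows. Hit, Identification and splitSketch are hypothetical stand-ins, with the is_xl field playing the role of the "NuXL:isXL" meta value; the sketch does not use the OpenMS types.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for PeptideHit and PeptideIdentification.
struct Hit
{
  double score;
  int is_xl;  // stands in for "NuXL:isXL": 0 = plain peptide, else cross-link
};

struct Identification
{
  std::string spectrum_reference;
  std::vector<Hit> hits;
};

void splitSketch(const std::vector<Identification>& ids,
                 std::vector<Identification>& pep,
                 std::vector<Identification>& xl)
{
  pep.clear();  // both outputs are cleared on entry
  xl.clear();

  for (const Identification& id : ids)
  {
    bool have_pep = false;
    bool have_xl = false;
    for (const Hit& h : id.hits)
    {
      const bool hit_is_xl = (h.is_xl != 0);
      bool& seen = hit_is_xl ? have_xl : have_pep;
      if (seen) continue;  // only the first hit of each class is kept
      seen = true;

      // Emit a copy of the identification that carries just this one hit.
      Identification single{id.spectrum_reference, {h}};
      (hit_is_xl ? xl : pep).push_back(single);

      if (have_pep && have_xl) break;  // nothing more to collect
    }
  }
}
```

An identification with hits of both classes contributes one single-hit entry to each output list, while lower-ranked hits of a class that has already been seen are dropped.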

Member Data Documentation

◆ report_top_hits_

size_t report_top_hits_
private

Top-hit cap consulted by the FDR-applying methods to toggle use_all_hits.