OpenMS
Class-separated FDR calculation for nucleic-acid cross-links (NuXL).
#include <OpenMS/ANALYSIS/NUXL/NuXLFDR.h>
Public Member Functions

| Return type | Signature | Description |
|---|---|---|
| | NuXLFDR (size_t report_top_hits) | Construct with a top-hit cap that controls how FalseDiscoveryRate is parametrised. |
| void | splitIntoPeptidesAndXLs (const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep_pi, PeptideIdentificationList &xl_pi) const | Partition peptide_ids into plain-peptide and cross-link PSM lists. |
| void | mergePeptidesAndXLs (const PeptideIdentificationList &pep_pi, const PeptideIdentificationList &xl_pi, PeptideIdentificationList &peptide_ids) const | Re-merge a previously split (splitIntoPeptidesAndXLs) pair of lists into one PSM list. |
| void | QValueAtPSMLevel (PeptideIdentificationList &peptide_ids) const | Compute PSM-level q-values across peptide_ids without splitting by class. |
| void | calculatePeptideAndXLQValueAtPSMLevel (const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep_pi, PeptideIdentificationList &xl_pi) const | Compute PSM- and peptide-level q-values separately for plain and cross-linked PSMs. |
| void | calculatePeptideAndXLQValueAndFilterAtPSMLevel (const std::vector< ProteinIdentification > &protein_ids, const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep, double peptide_PSM_qvalue_threshold, double peptide_peptide_qvalue_threshold, PeptideIdentificationList &xl_pi, std::vector< double > xl_PSM_qvalue_thresholds, std::vector< double > xl_peptidelevel_qvalue_thresholds, const std::string &out_idxml, int decoy_factor) const | Compute separated FDRs, apply class-specific filters, and write per-threshold idXML / TSV reports. |
Private Attributes

| Type | Name | Description |
|---|---|---|
| size_t | report_top_hits_ | Top-hit cap consulted by the FDR-applying methods to toggle use_all_hits. |
Detailed Description
Splits PSMs into a "plain peptide" class (NuXL:isXL == 0) and a "cross-link" class (NuXL:isXL != 0), computes FDR separately for each, and (optionally) writes one filtered idXML per requested XL q-value threshold plus a TSV protein report for stringent (<= 10%) XL FDRs.
Built on top of FalseDiscoveryRate. Decoys are kept in the FDR result so that downstream tools such as Percolator can still see them; the convenience filter method, calculatePeptideAndXLQValueAndFilterAtPSMLevel, calls IDFilter::removeDecoyHits before writing its output files.
NuXLFDR (size_t report_top_hits) [explicit]
Construct with a top-hit cap that controls how FalseDiscoveryRate is parametrised.
A report_top_hits value >= 2 sets use_all_hits = true on the underlying FalseDiscoveryRate, so q-values are computed over all hits of each PSM rather than only the top hit. In either case the value is stored unchanged (in report_top_hits_) and read by the QValueAtPSMLevel and calculatePeptideAndXLQValueAtPSMLevel methods.
| [in] | report_top_hits | Top-hit cap; values >= 2 enable the underlying FDR engine's use_all_hits mode. |
void calculatePeptideAndXLQValueAndFilterAtPSMLevel (const std::vector< ProteinIdentification > &protein_ids, const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep, double peptide_PSM_qvalue_threshold, double peptide_peptide_qvalue_threshold, PeptideIdentificationList &xl_pi, std::vector< double > xl_PSM_qvalue_thresholds, std::vector< double > xl_peptidelevel_qvalue_thresholds, const std::string &out_idxml, int decoy_factor) const
Compute separated FDRs, apply class-specific filters, and write per-threshold idXML / TSV reports.
Pipeline (every step is observable behaviour, not implementation detail):
1. Populates pep and xl_pi with FDR-annotated PSMs.
2. If decoy_factor != 1, divides every cross-link hit's score by decoy_factor (peptide hits are not touched).
3. Adds a tie-breaking offset of <= 1e-5 to every cross-link score so that PSMs that originally tied on the q-value resolve to a unique ordering. Uses "svm_score" when at least one XL hit carries it, otherwise "NuXL:score". The offset is normalised by the (max - min) score range and is undefined when all XL hits share the same score.
4. Removes decoy hits from pep and xl_pi.
5. Filters pep on PEPTIDE_Q_VALUE when peptide_peptide_qvalue_threshold is in (0, 1); values <= 0 or >= 1 disable the filter.
6. Filters pep PSMs by score when peptide_PSM_qvalue_threshold is in (0, 1).
7. Writes pep to {out_idxml}{threshold:.4f}_peptides.idXML (with out_idxml the literal prefix and threshold the configured cut formatted to four decimal places), with unreferenced proteins pruned.
8. Normalises xl_PSM_qvalue_thresholds: 0.0 entries are rewritten to 1.0 (disabled filter = 100% FDR) and the vector is then sorted in descending order, so increasingly stringent filters are applied progressively to the same xl_pi (filters compound across iterations).
9. For each XL threshold pair (the two threshold vectors must have the same size, otherwise Exception::InvalidValue is thrown):
   - filter xl_pi on PEPTIDE_Q_VALUE when the peptide-level threshold is in (0, 1);
   - filter xl_pi PSMs by score when the PSM-level threshold is in (0, 1);
   - make a copy of protein_ids;
   - prune unreferenced proteins from the ProteinIdentification of that copy;
   - write xl_pi to {out_idxml}{threshold:.4f}_XLs.idXML;
   - if the threshold is <= 0.1, also write a TSV protein report to {out_idxml}proteins{threshold:.4f}_XLs.tsv (skipped for permissive FDRs to bound output size).

| [in] | protein_ids | Protein identifications used as the source for the per-file protein lists; unreferenced proteins are pruned per output. |
| [in] | peptide_ids | Input PSMs (mixed plain + XL). |
| [out] | pep | Receives the filtered plain-peptide PSMs (after step 6). |
| [in] | peptide_PSM_qvalue_threshold | Plain-peptide PSM-level q-value cut; values outside (0, 1) disable the filter. |
| [in] | peptide_peptide_qvalue_threshold | Plain-peptide peptide-level q-value cut; values outside (0, 1) disable the filter. |
| [out] | xl_pi | Receives the filtered cross-link PSMs (state at the most stringent XL threshold). |
| [in] | xl_PSM_qvalue_thresholds | Cross-link PSM-level q-value thresholds; 0.0 is rewritten to 1.0 (disabled = 100% FDR). The vector is taken by value and sorted in descending order internally, so the caller's vector is not modified. |
| [in] | xl_peptidelevel_qvalue_thresholds | Cross-link peptide-level q-value thresholds paired by index with xl_PSM_qvalue_thresholds (must have the same size). |
| [in] | out_idxml | Output filename prefix; the FDR threshold (four decimal places) and a suffix (_peptides.idXML / _XLs.idXML) are appended. The TSV protein report is named {out_idxml}proteins{threshold:.4f}_XLs.tsv. |
| [in] | decoy_factor | Score scale-down factor for the XL class; ignored when equal to 1. |
| OpenMS::Exception::InvalidValue | when xl_PSM_qvalue_thresholds and xl_peptidelevel_qvalue_thresholds differ in size. |
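
The threshold handling, file naming and decoy_factor scaling described above can be sketched in plain C++. This is a minimal sketch under stated assumptions, not the OpenMS implementation: every function name below is hypothetical, q-values are bare doubles instead of PeptideIdentificationList entries, std::invalid_argument stands in for Exception::InvalidValue, and a PSM is assumed to pass a cut when its q-value is <= the threshold.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: a threshold is active only strictly inside (0, 1).
inline bool thresholdEnabled(double t) { return t > 0.0 && t < 1.0; }

// Hypothetical: 0.0 means "disabled" and becomes 1.0 (100% FDR), then the
// thresholds are sorted descending. Taken by value, like the real parameter.
inline std::vector<double> normalizeXlThresholds(std::vector<double> t)
{
  for (double& v : t)
  {
    if (v == 0.0) v = 1.0;
  }
  std::sort(t.begin(), t.end(), std::greater<double>());
  return t;
}

// Hypothetical stand-in for the size check that throws Exception::InvalidValue.
inline void requireSameSize(const std::vector<double>& psm_t, const std::vector<double>& pep_t)
{
  if (psm_t.size() != pep_t.size())
  {
    throw std::invalid_argument("XL PSM and peptide-level threshold vectors differ in size");
  }
}

// Hypothetical: builds "{prefix}{threshold:.4f}{suffix}".
inline std::string outputName(const std::string& prefix, double threshold, const std::string& suffix)
{
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.4f", threshold);
  return prefix + buf + suffix;
}

// Hypothetical: builds "{prefix}proteins{threshold:.4f}_XLs.tsv".
inline std::string proteinReportName(const std::string& prefix, double threshold)
{
  return outputName(prefix + "proteins", threshold, "_XLs.tsv");
}

// Hypothetical: divides XL scores by decoy_factor; no-op when it equals 1.
inline void scaleXlScores(std::vector<double>& xl_scores, int decoy_factor)
{
  if (decoy_factor == 1) return;
  for (double& s : xl_scores) s /= decoy_factor;
}

// Hypothetical: applies already-normalised (descending) thresholds to the SAME
// list, so each pass only narrows what earlier passes kept. Returns the number
// of surviving XL PSMs after each pass.
inline std::vector<std::size_t> progressiveCounts(std::vector<double> xl_qvalues,
                                                  const std::vector<double>& thresholds)
{
  std::vector<std::size_t> counts;
  for (double t : thresholds)
  {
    if (thresholdEnabled(t))
    {
      xl_qvalues.erase(std::remove_if(xl_qvalues.begin(), xl_qvalues.end(),
                                      [t](double q) { return q > t; }),
                       xl_qvalues.end());
    }
    counts.push_back(xl_qvalues.size());
  }
  return counts;
}
```

With XL q-values {0.001, 0.02, 0.04, 0.3} and thresholds {0.0, 0.01, 0.05}, normalizeXlThresholds yields {1.0, 0.05, 0.01} and progressiveCounts returns {4, 3, 1}: the 1.0 pass keeps everything, and each stricter pass only narrows what the previous one kept. A protein report would be written for the 0.05 and 0.01 passes only.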
void calculatePeptideAndXLQValueAtPSMLevel (const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep_pi, PeptideIdentificationList &xl_pi) const
Compute PSM- and peptide-level q-values separately for plain and cross-linked PSMs.
Internally calls splitIntoPeptidesAndXLs and then applies FalseDiscoveryRate to each output list with the same parameters as QValueAtPSMLevel. The q-values are computed independently on each class so XL and non-XL FDR control are not confounded.
| [in] | peptide_ids | Input mixed plain-peptide / cross-link PSMs. |
| [out] | pep_pi | Plain-peptide PSMs with PSM- and peptide-level q-values annotated. |
| [out] | xl_pi | Cross-link PSMs with PSM- and peptide-level q-values annotated. |
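
Why separate q-values matter can be shown with a generic target-decoy estimate (FDR at a score = decoys / targets at or above it, turned into q-values by a running minimum from the bottom). This is a minimal sketch under stated assumptions: Psm, assignQValues and classSeparatedQValues are hypothetical names, scores are higher-is-better, ties are not treated specially, and the estimator is a textbook one rather than FalseDiscoveryRate's exact formula.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified PSM.
struct Psm
{
  double score;   // higher is better
  bool is_decoy;
  int is_xl;      // value of "NuXL:isXL": 0 = plain peptide, otherwise cross-link
  double q;       // filled in by assignQValues
};

// Generic target-decoy q-values on one list of PSMs.
inline void assignQValues(std::vector<Psm*>& psms)
{
  std::sort(psms.begin(), psms.end(),
            [](const Psm* a, const Psm* b) { return a->score > b->score; });
  std::vector<double> fdr(psms.size());
  std::size_t targets = 0, decoys = 0;
  for (std::size_t i = 0; i < psms.size(); ++i)
  {
    (psms[i]->is_decoy ? decoys : targets) += 1;
    fdr[i] = targets == 0 ? 1.0 : std::min(1.0, static_cast<double>(decoys) / targets);
  }
  // q-value = smallest FDR at this rank or any lower-scoring rank.
  double running = 1.0;
  for (std::size_t i = psms.size(); i-- > 0;)
  {
    running = std::min(running, fdr[i]);
    psms[i]->q = running;
  }
}

// Class-separated FDR: split by is_xl, then estimate q-values per class.
inline void classSeparatedQValues(std::vector<Psm>& all)
{
  std::vector<Psm*> pep, xl;
  for (Psm& p : all) (p.is_xl == 0 ? pep : xl).push_back(&p);
  assignQValues(pep);
  assignQValues(xl);
}
```

With four plain-peptide targets (scores 10 to 7) and cross-links scored 6 (target), 5 (decoy) and 4 (target), the score-4 cross-link gets q = 0.5 when cross-links are scored on their own, but would get 1/6 (about 0.17) if pooled with the plain peptides: the easy peptide targets dilute the cross-link decoy.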
void mergePeptidesAndXLs (const PeptideIdentificationList &pep_pi, const PeptideIdentificationList &xl_pi, PeptideIdentificationList &peptide_ids) const
Re-merge a previously split (splitIntoPeptidesAndXLs) pair of lists into one PSM list.
pep_pi entries are appended verbatim. For each xl_pi entry, the meta value "spectrum_reference" is used as the merge key:
- … PeptideIdentification;

peptide_ids is cleared on entry.
| [in] | pep_pi | Plain-peptide PSMs (typically from splitIntoPeptidesAndXLs). |
| [in] | xl_pi | Cross-link PSMs (typically from splitIntoPeptidesAndXLs). |
| [out] | peptide_ids | Receives the merged PSM list keyed by "spectrum_reference". |
void QValueAtPSMLevel (PeptideIdentificationList &peptide_ids) const
Compute PSM-level q-values across peptide_ids without splitting by class.
Wraps FalseDiscoveryRate with add_decoy_proteins = true and add_decoy_peptides = true (decoys are retained in the result for downstream tools such as Percolator), and with use_all_hits = true iff the report_top_hits value passed to the constructor (stored in report_top_hits_) is >= 2. Also annotates each hit with the peptide-level q-value meta value Constants::UserParam::PEPTIDE_Q_VALUE.
| [in,out] | peptide_ids | PSMs to score in place; each hit receives PSM- and peptide-level q-value meta values. |
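
The parameter choice above boils down to a small mapping from the constructor argument. The sketch below is hypothetical: FdrSettings and settingsFor are illustrative names, and the documented behaviour configures FalseDiscoveryRate itself rather than a struct like this.

```cpp
#include <cstddef>

// Hypothetical mirror of the three FalseDiscoveryRate options named in the text.
struct FdrSettings
{
  bool add_decoy_proteins;  // keep decoy proteins in the result
  bool add_decoy_peptides;  // keep decoy PSMs for downstream tools (e.g. Percolator)
  bool use_all_hits;        // compute q-values over every hit, not only the top one
};

// report_top_hits is the value passed to the NuXLFDR constructor.
inline FdrSettings settingsFor(std::size_t report_top_hits)
{
  return FdrSettings{true, true, report_top_hits >= 2};
}
```

settingsFor(1).use_all_hits is false and settingsFor(2).use_all_hits is true; the two decoy flags are always on.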
void splitIntoPeptidesAndXLs (const PeptideIdentificationList &peptide_ids, PeptideIdentificationList &pep_pi, PeptideIdentificationList &xl_pi) const
Partition peptide_ids into plain-peptide and cross-link PSM lists.
For each input PeptideIdentification, walks its hits and keeps only the first hit encountered for each class, even when report_top_hits >= 2 was passed to the constructor. Hits are classified by the integer meta value "NuXL:isXL" (zero → plain peptide; anything else → cross-link). An input identification therefore contributes at most one entry to pep_pi and at most one entry to xl_pi (one to each if it has hits of both classes).
pep_pi and xl_pi are cleared on entry.
| [in] | peptide_ids | Input mixed plain-peptide / cross-link PSMs. |
| [out] | pep_pi | Receives the plain-peptide PSMs (one hit per PSM, the first encountered with NuXL:isXL == 0). |
| [out] | xl_pi | Receives the cross-link PSMs (one hit per PSM, the first encountered with NuXL:isXL != 0). |
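
The partition rule can be sketched with simplified stand-in types. Hit, Identification and splitByClass are hypothetical names; each hit carries only a sequence and its NuXL:isXL value, and every other field of a real PeptideIdentification is ignored.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for PeptideHit / PeptideIdentification.
struct Hit
{
  std::string sequence;
  int is_xl;  // value of the "NuXL:isXL" meta value
};

struct Identification
{
  std::string spectrum_reference;
  std::vector<Hit> hits;
};

// Keeps only the first hit of each class per identification; outputs are cleared first.
inline void splitByClass(const std::vector<Identification>& ids,
                         std::vector<Identification>& pep_pi,
                         std::vector<Identification>& xl_pi)
{
  pep_pi.clear();
  xl_pi.clear();
  for (const Identification& id : ids)
  {
    const Hit* first_pep = nullptr;
    const Hit* first_xl = nullptr;
    for (const Hit& h : id.hits)
    {
      if (h.is_xl == 0)
      {
        if (!first_pep) first_pep = &h;
      }
      else if (!first_xl)
      {
        first_xl = &h;
      }
    }
    // One entry per class at most, each holding a single hit.
    if (first_pep) pep_pi.push_back(Identification{id.spectrum_reference, {*first_pep}});
    if (first_xl) xl_pi.push_back(Identification{id.spectrum_reference, {*first_xl}});
  }
}
```

For an identification whose hits are labelled 0, 1, 0, only the first 0-hit goes to pep_pi and the 1-hit goes to xl_pi; the second 0-hit is dropped.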
size_t report_top_hits_ [private]
Top-hit cap consulted by the FDR-applying methods to toggle use_all_hits.