OpenMS
ID-guided MS1 feature finder; the algorithm behind FeatureFinderIdentification.
#include <OpenMS/FEATUREFINDER/FeatureFinderIdentificationAlgorithm.h>
Classes | |
| struct | FeatureCompare |
| comparison functor for features | |
| struct | FeatureFilterPeptides |
| predicate for filtering features by assigned peptides | |
| struct | FeatureFilterQuality |
| predicate for filtering features by overall quality | |
| struct | IMStats |
| Ion mobility statistics for a peptide in a specific RT region and charge state. | |
| struct | PeptideCompare |
| comparison functor for (unassigned) peptide IDs | |
| struct | RTRegion |
| region in RT in which a peptide elutes | |
Public Member Functions | |
| FeatureFinderIdentificationAlgorithm () | |
| Default constructor; installs the FFid parameters (see class docs) | |
| void | run (PeptideIdentificationList peptides, const std::vector< ProteinIdentification > &proteins, PeptideIdentificationList peptides_ext, std::vector< ProteinIdentification > proteins_ext, FeatureMap &features, const FeatureMap &seeds=FeatureMap(), const std::string &spectra_file="") |
| Run the FFid pipeline; for FAIMS data this dispatches one run per CV group and merges results. | |
| void | runOnCandidates (FeatureMap &features) |
| Re-score / filter an existing candidate FeatureMap in place using the configured classifier and quality cutoffs (entry point used to re-process candidates exported via the candidates_out parameter). | |
| PeakMap & | getMSData () |
| Mutable access to the cached MS1 data. | |
| const PeakMap & | getMSData () const |
| Read-only access to the cached MS1 data. | |
| void | setMSData (const PeakMap &ms_data) |
| Copy the MS data into the algorithm instance; useful from pyOpenMS where moving is awkward. | |
| void | setMSData (PeakMap &&ms_data) |
| Move the MS data into the algorithm instance (no copy). | |
| PeakMap & | getChromatograms () |
| Mutable access to the accumulated extracted chromatograms (XICs) from the last run. | |
| const PeakMap & | getChromatograms () const |
| Read-only access to the accumulated extracted chromatograms. | |
| ProgressLogger & | getProgressLogger () |
| Mutable access to the progress logger used by the algorithm. | |
| const ProgressLogger & | getProgressLogger () const |
| Read-only access to the progress logger. | |
| TargetedExperiment & | getLibrary () |
| Mutable access to the assay library used / produced by the last run. | |
| const TargetedExperiment & | getLibrary () const |
| Read-only access to the assay library used / produced by the last run. | |
Public Member Functions inherited from DefaultParamHandler | |
| DefaultParamHandler (const std::string &name) | |
| Constructor with name that is displayed in error messages. | |
| DefaultParamHandler (const DefaultParamHandler &rhs) | |
| Copy constructor. | |
| virtual | ~DefaultParamHandler () |
| Destructor. | |
| DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
| Assignment operator. | |
| virtual bool | operator== (const DefaultParamHandler &rhs) const |
| Equality operator. | |
| void | setParameters (const Param &param) |
| Sets the parameters. | |
| const Param & | getParameters () const |
| Non-mutable access to the parameters. | |
| const Param & | getDefaults () const |
| Non-mutable access to the default parameters. | |
| const std::string & | getName () const |
| Non-mutable access to the name. | |
| void | setName (const std::string &name) |
| Mutable access to the name. | |
| const std::vector< std::string > & | getSubsections () const |
| Non-mutable access to the registered subsections. | |
Protected Types | |
| typedef FeatureFinderAlgorithmPickedHelperStructs::MassTrace | MassTrace |
| typedef FeatureFinderAlgorithmPickedHelperStructs::MassTraces | MassTraces |
| typedef std::multimap< double, PeptideIdentification * > | RTMap |
| mapping: RT (not necessarily unique) -> pointer to peptide | |
| typedef std::map< Int, std::pair< RTMap, RTMap > > | ChargeMap |
| mapping: charge -> internal/external: (RT -> pointer to peptide) | |
| typedef std::map< AASequence, ChargeMap > | PeptideMap |
| mapping: sequence -> charge -> internal/external ID information | |
| typedef std::map< std::string, std::pair< RTMap, RTMap > > | PeptideRefRTMap |
| mapping: peptide ref. -> int./ext.: (RT -> pointer to peptide) | |
Protected Member Functions | |
| void | updateMembers_ () override |
| This method is used to update extra member variables at the end of the setParameters() method. | |
| void | generateTransitions_ (const std::string &peptide_id, double mz, Int charge, const IsotopeDistribution &iso_dist) |
| generate transitions (isotopic traces) for a peptide ion and add them to the library: | |
| void | addPeptideRT_ (TargetedExperiment::Peptide &peptide, double rt) const |
| void | getRTRegions_ (ChargeMap &peptide_data, std::vector< RTRegion > &rt_regions, bool clear_IDs=true) const |
| get regions in which a peptide elutes (ideally only one) by clustering RT elution times | |
| IMStats | getRTRegionIMStats_ (const RTRegion &r) |
| Calculate ion mobility statistics for peptide identifications in an RT region. | |
| void | calculateGlobalIMStats_ () |
| Calculate global IM statistics from MS data and peptide identifications. | |
| void | annotateFeaturesFinalizeAssay_ (FeatureMap &features, std::map< Size, std::vector< PeptideIdentification * > > &feat_ids, RTMap &rt_internal) |
| void | annotateFeatures_ (FeatureMap &features, PeptideRefRTMap &ref_rt_map) |
| annotate identified features with m/z, isotope probabilities, etc. | |
| void | ensureConvexHulls_ (Feature &feature) const |
| void | postProcess_ (FeatureMap &features, bool with_external_ids) |
| void | validateSVMParameters_ () const |
| Helper functions for run() | |
| void | initializeFeatureFinder_ () |
| double | calculateRTWindow_ (double rt_uncertainty) const |
| void | removeSeedPseudoIDs_ (FeatureMap &features) |
| std::pair< double, double > | calculateRTBounds_ (double rt_min, double rt_max) const |
| Calculate RT bounds with optional tolerance expansion. | |
| void | statistics_ (const FeatureMap &features) const |
| some statistics on detected features | |
| void | createAssayLibrary_ (const PeptideMap::iterator &begin, const PeptideMap::iterator &end, PeptideRefRTMap &ref_rt_map, bool clear_IDs=true) |
| void | addPeptideToMap_ (PeptideIdentification &peptide, PeptideMap &peptide_map, bool external=false) |
| void | filterFeatures_ (FeatureMap &features, bool classified) |
| void | runSingleGroup_ (PeptideIdentificationList peptides, const std::vector< ProteinIdentification > &proteins, PeptideIdentificationList peptides_ext, std::vector< ProteinIdentification > proteins_ext, FeatureMap &features, const FeatureMap &seeds, const std::string &spectra_file) |
| Size | addSeeds_ (PeptideIdentificationList &peptides, const FeatureMap &seeds) |
| Size | addOffsetPeptides_ (PeptideIdentificationList &peptides, double offset) |
| template<typename It > | |
| std::vector< std::pair< It, It > > | chunk_ (It range_from, It range_to, const std::ptrdiff_t batch_size) |
Protected Member Functions inherited from DefaultParamHandler | |
| void | defaultsToParam_ () |
| Updates the parameters after the defaults have been set in the constructor. | |
Static Protected Member Functions | |
| static bool | isSeedPseudoHit_ (const PeptideHit &hit) |
| Helper function to check if a peptide hit is a seed pseudo-ID. | |
Protected Attributes | |
| PeptideMap | peptide_map_ |
| Size | n_internal_peps_ |
| number of internal peptides | |
| Size | n_external_peps_ |
| number of external peptides | |
| Size | batch_size_ |
| nr of peptides to use at the same time during chromatogram extraction | |
| double | rt_window_ |
| RT window width. | |
| double | mz_window_ |
| m/z window width | |
| bool | mz_window_ppm_ |
| m/z window width is given in PPM (not Da)? | |
| double | mapping_tolerance_ |
| RT tolerance for mapping IDs to features. | |
| double | isotope_pmin_ |
| min. isotope probability for peptide assay | |
| Size | n_isotopes_ |
| number of isotopes for peptide assay | |
| double | rt_quantile_ |
| double | peak_width_ |
| double | min_peak_width_ |
| double | signal_to_noise_ |
| std::string | elution_model_ |
| double | svm_min_prob_ |
| StringList | svm_predictor_names_ |
| std::string | svm_xval_out_ |
| double | svm_quality_cutoff |
| Size | svm_n_parts_ |
| number of partitions for SVM cross-validation | |
| Size | svm_n_samples_ |
| number of samples for SVM training | |
| std::string | candidates_out_ |
| Size | debug_level_ |
| struct OpenMS::FeatureFinderIdentificationAlgorithm::FeatureFilterQuality | feature_filter_quality_ |
| struct OpenMS::FeatureFinderIdentificationAlgorithm::FeatureFilterPeptides | feature_filter_peptides_ |
| struct OpenMS::FeatureFinderIdentificationAlgorithm::PeptideCompare | peptide_compare_ |
| struct OpenMS::FeatureFinderIdentificationAlgorithm::FeatureCompare | feature_compare_ |
| PeakMap | ms_data_ |
| input LC-MS data | |
| PeakMap | chrom_data_ |
| accumulated chromatograms (XICs) | |
| TargetedExperiment | library_ |
| assays for peptides (cleared per chunk during processing) | |
| TargetedExperiment | output_library_ |
| accumulated assays for output (populated from library_ before clearing) | |
| bool | quantify_decoys_ |
| double | add_mass_offset_peptides_ {0.0} |
| non-zero if an additional offset feature should be extracted for every feature | |
| double | seed_apex_rt_tolerance_ {5.0} |
| max allowed RT deviation (s) between seed apex and detected feature apex; seed-derived features exceeding this are removed (0 = disabled) | |
| bool | use_psm_cutoff_ |
| double | psm_score_cutoff_ |
| PeptideIdentificationList | unassignedIDs_ |
| const double | seed_rt_window_ = 60.0 |
| extraction window used for seeds (smaller than rt_window_ as we know the exact apex positions) | |
| std::map< double, std::pair< Size, Size > > | svm_probs_internal_ |
| SVM probability -> number of pos./neg. features (for FDR calculation): | |
| std::multiset< double > | svm_probs_external_ |
| SVM probabilities for "external" features (for FDR calculation): | |
| Size | n_internal_features_ |
| internal feature counter (for FDR calculation) | |
| Size | n_external_features_ |
| std::map< std::string, double > | isotope_probs_ |
| isotope probabilities of transitions | |
| std::map< std::string, IMStats > | im_stats_ |
| Ion mobility statistics per peptide reference (peptide sequence/charge:region) | |
| IMStats | global_im_stats_ |
| Global ion mobility statistics from all peptide identifications. | |
| MRMFeatureFinderScoring | feat_finder_ |
| OpenSWATH feature finder. | |
| Internal::FFIDAlgoExternalIDHandler | external_id_handler_ |
| Handler for external peptide IDs. | |
| ProgressLogger | prog_log_ |
Protected Attributes inherited from DefaultParamHandler | |
| Param | param_ |
| Container for current parameters. | |
| Param | defaults_ |
| Container for default parameters. This member should be filled in the constructor of derived classes! | |
| std::vector< std::string > | subsections_ |
| Container for registered subsections. This member should be filled in the constructor of derived classes! | |
| std::string | error_name_ |
| Name that is displayed in error messages during the parameter checking. | |
| bool | check_defaults_ |
| If this member is set to false, no checking of parameters is done. | |
| bool | warn_empty_defaults_ |
| If this member is set to false, no warning is emitted when defaults are empty. | |
Additional Inherited Members | |
Static Public Member Functions inherited from DefaultParamHandler | |
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const std::string &key_prefix="") |
| Writes all parameters to meta values. | |
ID-guided MS1 feature finder; the algorithm behind FeatureFinderIdentification.
Uses peptide identifications as targets to drive chromatogram extraction from MS1 data, builds a per-peptide assay library on the fly, scores extracted peak groups with MRMFeatureFinderScoring, and emits a FeatureMap of quantified features.
Two ID layers are accepted:
- Internal IDs (peptides / proteins): the high-confidence anchors used to define candidate elution windows and to train the SVM classifier (when classification is enabled).
- External IDs (peptides_ext / proteins_ext): optional, lower-confidence transfer IDs (e.g. from match-between-runs) used to extend the search space without contributing to SVM training. When the external lists are empty, the algorithm skips machine learning and FDR estimation entirely.

Optional seeds from an upstream untargeted feature finder can also be added.
If the input PeakMap carries multiple FAIMS compensation voltages, run splits the data via IMDataConverter::splitByFAIMSCV, runs the per-CV pipeline in a fresh algorithm instance per group (peptide IDs filtered by FAIMS_CV through FAIMSHelper::filterPeptidesByFAIMSCV), annotates the resulting features with the FAIMS_CV meta value, and merges everything back into the caller's features. IDs without a FAIMS_CV are processed against every CV group for backward compatibility. In the multi-FAIMS case, getLibrary returns an empty library because each CV group has its own assay library.
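The per-CV dispatch described above can be pictured with a small standalone sketch. It does not use the OpenMS API: `IdCvSketch`, `FeatureCvSketch`, `runSingleGroupSketch` and `runPerCvSketch` are hypothetical stand-ins, and the per-group "pipeline" simply emits one feature per ID. Only the routing rules (IDs filtered by CV, IDs without a CV sent to every group, features annotated and merged) follow the description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not OpenMS types.
struct IdCvSketch { std::string sequence; bool has_cv; double cv; };
struct FeatureCvSketch { std::string sequence; double faims_cv; };

// Placeholder for the per-group pipeline: emits one feature per ID.
std::vector<FeatureCvSketch> runSingleGroupSketch(const std::vector<IdCvSketch>& ids)
{
  std::vector<FeatureCvSketch> features;
  for (const IdCvSketch& id : ids)
  {
    features.push_back(FeatureCvSketch{id.sequence, 0.0});
  }
  return features;
}

// One pass per CV group; results are annotated with the CV and merged.
std::vector<FeatureCvSketch> runPerCvSketch(const std::vector<double>& cv_groups,
                                            const std::vector<IdCvSketch>& ids)
{
  assert(!cv_groups.empty());
  std::vector<FeatureCvSketch> merged;
  for (double cv : cv_groups)
  {
    // IDs without a CV annotation are processed against every group.
    std::vector<IdCvSketch> group_ids;
    for (const IdCvSketch& id : ids)
    {
      if (!id.has_cv || id.cv == cv) group_ids.push_back(id);
    }
    std::vector<FeatureCvSketch> group_features = runSingleGroupSketch(group_ids);
    for (FeatureCvSketch& f : group_features)
    {
      f.faims_cv = cv; // mirrors the FAIMS_CV meta value annotation
      merged.push_back(f);
    }
  }
  return merged;
}
```

With two CV groups and three IDs, one of which has no CV, the unannotated ID contributes a feature to both groups, so the merged result holds four features.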
Both ID lists are taken by value (see the run() signature), so the caller's containers are never observably modified. The transformations described here happen on the function's local copies and influence which IDs reach features:
- The local copy of peptides is shrunk to the best hit per identification.
- FFid meta values are added to the IDs before they are written into features.
- If seeds carries IDs, those are appended to the local copy of peptides before scoring.

After feature detection, the FeatureMap's primaryMSRunPath is set to the run path recorded in ms_data_; if that is not a valid / readable mzML, spectra_file is used as a fallback annotation (see the MSExperiment overload of FeatureMap::setPrimaryMSRunPath).
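The by-value behaviour can be illustrated without OpenMS. In this sketch `HitSketch`, `PeptideIdSketch` and `shrinkToBestHit` are hypothetical, and "best hit" is assumed to mean the highest score; the real score orientation depends on the search engine and is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins; these are not OpenMS types.
struct HitSketch { double score; };
struct PeptideIdSketch { std::vector<HitSketch> hits; };

// The parameter is taken by value, so every edit below touches only the
// local copy. Assumption: a higher score is better.
std::vector<PeptideIdSketch> shrinkToBestHit(std::vector<PeptideIdSketch> peptides)
{
  for (PeptideIdSketch& id : peptides)
  {
    if (id.hits.empty()) continue;
    const auto best = std::max_element(
      id.hits.begin(), id.hits.end(),
      [](const HitSketch& a, const HitSketch& b) { return a.score < b.score; });
    const HitSketch best_hit = *best; // copy before the vector is overwritten
    id.hits.assign(1, best_hit);
  }
  return peptides;
}
```

A caller that passes a list with three hits per ID still sees three hits afterwards; only the returned copy is reduced to one hit per ID.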
| struct OpenMS::FeatureFinderIdentificationAlgorithm::IMStats |
Ion mobility statistics for a peptide in a specific RT region and charge state.
This structure stores statistical measures of ion mobility values collected from peptide identifications within a single RT region. Among other uses, these statistics are written to features as the IM_median, IM_min, and IM_max meta values (see im_stats_).
All values default to -1.0 to indicate missing/unavailable IM data.
| struct OpenMS::FeatureFinderIdentificationAlgorithm::RTRegion |
region in RT in which a peptide elutes
mapping: charge -> internal/external: (RT -> pointer to peptide)
|
protected |
|
protected |
|
protected |
mapping: sequence -> charge -> internal/external ID information
|
protected |
mapping: peptide ref. -> int./ext.: (RT -> pointer to peptide)
|
protected |
mapping: RT (not necessarily unique) -> pointer to peptide
Default constructor; installs the FFid parameters (see class docs)
|
protected |
|
protected |
|
protected |
CAUTION: This method stores a pointer to the given peptide in internal data structures. Make sure the peptide stays valid until the algorithm instance is destroyed.
|
protected |
|
protected |
annotate identified features with m/z, isotope probabilities, etc.
|
protected |
|
protected |
Calculate global IM statistics from MS data and peptide identifications.
Uses MSExperiment::getMinMobility()/getMaxMobility() to get the full IM range from raw data (min/max), and calculates median from peptide identifications for robust central tendency. Must be called BEFORE addSeeds_() to ensure global statistics are based only on identified peptides.
Seeds may or may not have IM annotation depending on the feature finder. Seeds with IM annotation use their own IM value; seeds without IM are extracted across the full IM range of the dataset.
|
protected |
Calculate RT bounds with optional tolerance expansion.
|
protected |
|
inline protected |
Chunks an iterator range (allowing advance and distance) into batches of size batch_size. Last batch might be smaller.
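A minimal sketch of such a chunking helper, written as a free function named `chunk` (a hypothetical name; the member's actual implementation is not shown on this page). It relies only on `std::distance` and `std::advance`, as the description suggests, and assumes a positive batch size.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Split [range_from, range_to) into consecutive sub-ranges of at most
// batch_size elements; the final sub-range may be shorter.
template <typename It>
std::vector<std::pair<It, It>> chunk(It range_from, It range_to,
                                     const std::ptrdiff_t batch_size)
{
  assert(batch_size > 0); // a non-positive batch size would never make progress
  std::vector<std::pair<It, It>> chunks;
  std::ptrdiff_t remaining = std::distance(range_from, range_to);
  while (remaining > 0)
  {
    const std::ptrdiff_t step = (remaining < batch_size) ? remaining : batch_size;
    It chunk_end = range_from;
    std::advance(chunk_end, step);
    chunks.emplace_back(range_from, chunk_end);
    range_from = chunk_end;
    remaining -= step;
  }
  return chunks;
}
```

Splitting seven elements with a batch size of three yields three iterator pairs covering three, three and one element; an empty range yields no pairs.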
|
protected |
Creates an assay library out of the peptide sequences and their RT elution windows. The PeptideMap is mutable because it is cleared on the go. Set clear_IDs to false to keep the IDs in the internal charge maps (only needed for debugging purposes).
|
protected |
|
protected |
|
protected |
generate transitions (isotopic traces) for a peptide ion and add them to the library:
| PeakMap & getChromatograms | ( | ) |
| const PeakMap & getChromatograms | ( | ) | const |
Read-only access to the accumulated extracted chromatograms.
| TargetedExperiment & getLibrary | ( | ) |
Mutable access to the assay library used / produced by the last run.
| const TargetedExperiment & getLibrary | ( | ) | const |
Read-only access to the assay library used / produced by the last run.
| PeakMap & getMSData | ( | ) |
Mutable access to the cached MS1 data.
Referenced by NuXLRTPrediction::train().
| const PeakMap & getMSData | ( | ) | const |
Read-only access to the cached MS1 data.
| ProgressLogger & getProgressLogger | ( | ) |
Mutable access to the progress logger used by the algorithm.
| const ProgressLogger & getProgressLogger | ( | ) | const |
Read-only access to the progress logger.
Calculate ion mobility statistics for peptide identifications in an RT region.
Computes median, min, and max IM values from peptide identifications within the given RT region (across all charge states). Individual IDs lacking IM annotation are skipped (with warning), and statistics are calculated from the remaining IDs with valid IM data. The median is used for robust central tendency estimation and is more resistant to outliers than the mean.
Seeds from untargeted feature finders may or may not have an IM meta value set, depending on the feature finder. If IM is annotated on the seed, it is used for targeted extraction. If not, the seed is extracted across the full IM range (ChromatogramExtractor disables IM filtering when ion_mobility < 0).
Note: RT region boundaries are determined from ALL IDs (including those without IM), so this only affects IM statistics calculation, not RT extraction.
| [in] | r | RT region containing peptide identifications grouped by charge state |
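The statistics described above amount to a median/min/max over the IDs that carry an IM value. The sketch below is standalone: `ImStatsSketch` and `summarizeIM` are hypothetical names, a negative value is assumed to mark a missing IM annotation, and averaging the two middle values for an even count is an assumption rather than a documented detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for IMStats; all fields default to -1.0 (no IM data).
struct ImStatsSketch
{
  double median = -1.0;
  double min = -1.0;
  double max = -1.0;
};

// Assumption: a negative entry means "this ID has no IM annotation".
ImStatsSketch summarizeIM(const std::vector<double>& im_values)
{
  std::vector<double> valid;
  for (double v : im_values)
  {
    if (v >= 0.0) valid.push_back(v); // IDs without IM are skipped
  }
  ImStatsSketch stats;
  if (valid.empty()) return stats; // keep the -1.0 defaults
  std::sort(valid.begin(), valid.end());
  stats.min = valid.front();
  stats.max = valid.back();
  const std::size_t n = valid.size();
  stats.median = (n % 2 == 1) ? valid[n / 2]
                              : 0.5 * (valid[n / 2 - 1] + valid[n / 2]);
  return stats;
}
```

An input in which every ID lacks IM leaves all three fields at -1.0, which matches the documented convention for unavailable IM data.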
|
protected |
get regions in which a peptide elutes (ideally only one) by clustering RT elution times
|
protected |
|
static protected |
Helper function to check if a peptide hit is a seed pseudo-ID.
|
protected |
|
protected |
| void run | ( | PeptideIdentificationList | peptides, |
| const std::vector< ProteinIdentification > & | proteins, | ||
| PeptideIdentificationList | peptides_ext, | ||
| std::vector< ProteinIdentification > | proteins_ext, | ||
| FeatureMap & | features, | ||
| const FeatureMap & | seeds = FeatureMap(), |
||
| const std::string & | spectra_file = "" |
||
| ) |
Run the FFid pipeline; for FAIMS data this dispatches one run per CV group and merges results.
See the class brief for the role of external IDs, the seeds list, the FAIMS handling, the side effects on the input ID lists, and the primary-MS-run-path fallback semantics.
| [in] | peptides | Internal peptide IDs (taken by value). The local copy is shrunk to best hit per identification and FFid meta values are added before being written into features; the caller's container is not modified. |
| [in] | proteins | Protein IDs corresponding to peptides. |
| [in] | peptides_ext | External peptide IDs, optional (may be empty); taken by value with the same shrink-and-annotate treatment as peptides. |
| [in] | proteins_ext | Protein IDs corresponding to peptides_ext. |
| [out] | features | Quantified feature map; pre-existing contents are cleared for FAIMS data and replaced. |
| [in] | seeds | Optional pre-detected features from an untargeted feature finder. |
| [in] | spectra_file | Source mzML path used as a fallback for primaryMSRunPath annotation when the MSExperiment's own path isn't usable. |
Referenced by NuXLRTPrediction::train().
| void runOnCandidates | ( | FeatureMap & | features | ) |
Re-score / filter an existing candidate FeatureMap in place using the configured classifier and quality cutoffs (entry point used to re-process candidates exported via the candidates_out parameter).
|
protected |
Core processing logic for a single dataset (non-FAIMS data or a single FAIMS CV group). Called by run() either directly or once for each FAIMS CV group.
| void setMSData | ( | const PeakMap & | ms_data | ) |
Copy the MS data into the algorithm instance; useful from pyOpenMS where moving is awkward.
| [in] | ms_data | Source PeakMap; copied into the internal cache. |
| void setMSData | ( | PeakMap && | ms_data | ) |
|
protected |
some statistics on detected features
|
override protected virtual |
This method is used to update extra member variables at the end of the setParameters() method.
Also call it at the end of the derived classes' copy constructor and assignment operator.
The default implementation is empty.
Reimplemented from DefaultParamHandler.
|
protected |
Helper functions for run()
|
protected |
non-zero if an additional offset feature should be extracted for every feature
|
protected |
nr of peptides to use at the same time during chromatogram extraction
|
protected |
|
protected |
accumulated chromatograms (XICs)
|
protected |
|
protected |
|
protected |
Handler for external peptide IDs.
|
protected |
OpenSWATH feature finder.
|
protected |
|
protected |
|
protected |
|
protected |
Global ion mobility statistics from all peptide identifications.
Calculated from peptide identifications BEFORE seeds are added (ensuring we only learn from real IDs with IM annotation). Provides context for the typical IM range in the dataset.
|
protected |
Ion mobility statistics per peptide reference (peptide sequence/charge:region)
Maps from full peptide reference (e.g., "PEPTIDE/2:1") to IM statistics. Populated during createAssayLibrary_() and used during annotateFeatures_() to add IM_median, IM_min, and IM_max meta-values to features.
|
protected |
min. isotope probability for peptide assay
|
protected |
isotope probabilities of transitions
|
protected |
assays for peptides (cleared per chunk during processing)
|
protected |
RT tolerance for mapping IDs to features.
|
protected |
|
protected |
input LC-MS data
|
protected |
m/z window width
|
protected |
m/z window width is given in PPM (not Da)?
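For readers unfamiliar with ppm windows, the following sketch converts a window width to Da at a given m/z. `mzWindowInDa` is a hypothetical helper, not part of the class; how the algorithm places the window around the target m/z is not specified on this page.

```cpp
#include <cassert>

// Convert a window width to Da. A ppm width scales with the m/z value
// (1 ppm = 1e-6 * m/z); a Da width is returned unchanged.
double mzWindowInDa(double mz, double window, bool window_is_ppm)
{
  assert(mz > 0.0 && window >= 0.0);
  return window_is_ppm ? mz * window * 1e-6 : window;
}
```

At m/z 500, a 10 ppm window corresponds to about 0.005 Da, whereas at m/z 1000 the same ppm setting gives about 0.01 Da.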
|
protected |
external feature counter (for FDR calculation)
|
protected |
number of external peptides
|
protected |
internal feature counter (for FDR calculation)
|
protected |
number of internal peptides
|
protected |
number of isotopes for peptide assay
|
protected |
accumulated assays for output (populated from library_ before clearing)
|
protected |
|
protected |
|
protected |
|
protected |
|
protected |
|
protected |
|
protected |
|
protected |
RT window width.
|
protected |
max allowed RT deviation (s) between seed apex and detected feature apex; seed-derived features exceeding this are removed (0 = disabled)
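The filter rule stated above can be sketched as follows. `SeedFeatureSketch` and `filterBySeedApex` are hypothetical; the sketch assumes that each seed-derived feature carries both the seed apex RT and the detected apex RT, which may not match how the class stores them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a detected feature and the seed it came from.
struct SeedFeatureSketch
{
  double seed_apex_rt;     // apex RT reported by the seed (s)
  double detected_apex_rt; // apex RT of the detected feature (s)
  bool from_seed;          // false for ID-derived features
};

// Remove seed-derived features whose apex deviates from the seed apex by
// more than the tolerance; a tolerance of 0 disables the filter.
void filterBySeedApex(std::vector<SeedFeatureSketch>& features, double tolerance)
{
  assert(tolerance >= 0.0);
  if (tolerance == 0.0) return;
  features.erase(
    std::remove_if(features.begin(), features.end(),
                   [tolerance](const SeedFeatureSketch& f) {
                     return f.from_seed &&
                            std::fabs(f.detected_apex_rt - f.seed_apex_rt) > tolerance;
                   }),
    features.end());
}
```

With the default tolerance of 5.0 s, a seed-derived feature whose apex lies 10 s from its seed is dropped, while ID-derived features are left untouched.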
|
protected |
extraction window used for seeds (smaller than rt_window_ as we know the exact apex positions)
|
protected |
|
protected |
|
protected |
number of partitions for SVM cross-validation
|
protected |
number of samples for SVM training
|
protected |
|
protected |
SVM probabilities for "external" features (for FDR calculation):
SVM probability -> number of pos./neg. features (for FDR calculation):
|
protected |
|
protected |
|
protected |
|
protected |