OpenMS
Fragment-index-based peptide database search algorithm (experimental). More...
#include <OpenMS/ANALYSIS/ID/ProSEAlgorithm.h>
Classes | |
| struct | AnnotatedHit_ |
| Slimmer hit structure, used because storing all scored candidates in PeptideHit objects takes too much space. More... | |
| struct | CalibrationResult_ |
| Result of a calibration pass. More... | |
| struct | MultiFileSearchResult |
| Multi-file search result bundle. More... | |
| struct | SearchContext |
| Prepared per-database state shared across multiple spectrum files. More... | |
| struct | SearchResult |
| Comprehensive search result including modification analysis. More... | |
Public Types | |
| enum class | ExitCodes { EXECUTION_OK , INPUT_FILE_EMPTY , UNEXPECTED_RESULT , UNKNOWN_ERROR , ILLEGAL_PARAMETERS } |
| Exit codes. More... | |
Public Types inherited from ProgressLogger | |
| enum | LogType { CMD , GUI , NONE } |
| Possible log types. More... | |
Public Member Functions | |
| ProSEAlgorithm () | |
| ExitCodes | search (const String &in_spectra, const String &in_db, std::vector< ProteinIdentification > &prot_ids, PeptideIdentificationList &pep_ids) const |
| Search spectra in a spectrum file (mzML or Bruker .d) against a protein database using a fragment-index-backed (FI) workflow. | |
| SearchResult | searchWithModificationAnalysis (const String &in_spectra, const String &in_db, const String &output_base_name="") const |
| Search with comprehensive results including modification analysis tables. | |
| ExitCodes | search (PeakMap &spectra, const std::vector< FASTAFile::FASTAEntry > &fasta_db, std::vector< ProteinIdentification > &prot_ids, PeptideIdentificationList &pep_ids) const |
| In-memory search: search spectra against a protein database without file I/O. | |
| SearchContext | prepareContext (const std::vector< FASTAFile::FASTAEntry > &fasta_db) const |
| Build a SearchContext (decoy-augmented database + FragmentIndex) for reuse. | |
| ExitCodes | search (PeakMap &spectra, SearchContext &ctx, std::vector< ProteinIdentification > &prot_ids, PeptideIdentificationList &pep_ids) const |
| In-memory search using a pre-built SearchContext. | |
| SearchResult | searchWithModificationAnalysis (PeakMap &spectra, const std::vector< FASTAFile::FASTAEntry > &fasta_db, const String &output_base_name="") const |
| In-memory search with modification analysis: no file I/O required. | |
| MultiFileSearchResult | searchWithModificationAnalysis (const std::vector< String > &in_spectra_files, const std::vector< FASTAFile::FASTAEntry > &fasta_db, const std::vector< String > &output_base_names={}, const String &aggregate_base_name="") const |
| Multi-file search with modification analysis (in-memory FASTA). | |
| MultiFileSearchResult | searchWithModificationAnalysis (const std::vector< String > &in_spectra_files, const String &in_db, const std::vector< String > &output_base_names={}, const String &aggregate_base_name="") const |
| Multi-file search with modification analysis (FASTA file path). | |
Public Member Functions inherited from DefaultParamHandler | |
| DefaultParamHandler (const String &name) | |
| Constructor with name that is displayed in error messages. | |
| DefaultParamHandler (const DefaultParamHandler &rhs) | |
| Copy constructor. | |
| virtual | ~DefaultParamHandler () |
| Destructor. | |
| DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
| Assignment operator. | |
| virtual bool | operator== (const DefaultParamHandler &rhs) const |
| Equality operator. | |
| void | setParameters (const Param &param) |
| Sets the parameters. | |
| const Param & | getParameters () const |
| Non-mutable access to the parameters. | |
| const Param & | getDefaults () const |
| Non-mutable access to the default parameters. | |
| const String & | getName () const |
| Non-mutable access to the name. | |
| void | setName (const String &name) |
| Mutable access to the name. | |
| const std::vector< String > & | getSubsections () const |
| Non-mutable access to the registered subsections. | |
Public Member Functions inherited from ProgressLogger | |
| ProgressLogger () | |
| Constructor. | |
| virtual | ~ProgressLogger () |
| Destructor. | |
| ProgressLogger (const ProgressLogger &other) | |
| Copy constructor. | |
| ProgressLogger & | operator= (const ProgressLogger &other) |
| Assignment Operator. | |
| void | setLogType (LogType type) const |
| Sets the progress log that should be used. The default type is NONE! | |
| LogType | getLogType () const |
| Returns the type of progress log being used. | |
| void | setLogger (ProgressLoggerImpl *logger) |
| Sets the logger to be used for progress logging. | |
| void | startProgress (SignedSize begin, SignedSize end, const String &label) const |
| Initializes the progress display. | |
| void | setProgress (SignedSize value) const |
| Sets the current progress. | |
| void | endProgress (UInt64 bytes_processed=0) const |
| void | nextProgress () const |
| Increments the progress by 1 (within the begin-end range). | |
Protected Member Functions | |
| void | updateMembers_ () override |
| This method is used to update extra member variables at the end of the setParameters() method. | |
| void | postProcessHits_ (const PeakMap &exp, std::vector< std::vector< ProSEAlgorithm::AnnotatedHit_ > > &annotated_hits, std::vector< ProteinIdentification > &protein_ids, PeptideIdentificationList &peptide_ids, Size top_hits, const StringList &modifications_fixed, const StringList &modifications_variable, Int peptide_missed_cleavages, double precursor_mass_tolerance, double fragment_mass_tolerance, const String &precursor_mass_tolerance_unit_ppm, const String &fragment_mass_tolerance_unit_ppm, const Int precursor_min_charge, const Int precursor_max_charge, const String &enzyme, const String &database_name) const |
| Filter and annotate search results. | |
| double | computeModMatchTolerance_ () const |
| CalibrationResult_ | runCalibrationPass_ (PeakMap &spectra, FragmentIndex &fragment_index, const std::vector< FASTAFile::FASTAEntry > &db) const |
| Run a fast calibration pass on a subset of spectra to estimate mass accuracy. | |
| void | logModificationAnalysisSummary_ (const SearchResult &result, const String &output_base_name) const |
| Helper: log the modification analysis summary (shared by in-memory and file-based paths) | |
| void | logSearchDiagnostics_ (const PeakMap &spectra, const std::vector< ProteinIdentification > &protein_ids, const PeptideIdentificationList &peptide_ids) const |
| Helper: log search summary statistics and per-run tolerance estimation. | |
| bool | isOpenSearchMode_ () const |
| Helper function to determine if open search should be used based on tolerance. | |
Protected Member Functions inherited from DefaultParamHandler | |
| void | defaultsToParam_ () |
| Updates the parameters after the defaults have been set in the constructor. | |
Static Protected Member Functions | |
| static void | preprocessSpectra_ (PeakMap &exp, double fragment_mass_tolerance, bool fragment_mass_tolerance_unit_ppm) |
| Filter, deisotope, and decharge spectra. | |
Additional Inherited Members | |
Static Public Member Functions inherited from DefaultParamHandler | |
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &key_prefix="") |
| Writes all parameters to meta values. | |
Static Protected Attributes inherited from ProgressLogger | |
| static int | recursion_depth_ |
Fragment-index-based peptide database search algorithm (experimental).
Provides a self-contained search engine that matches MS/MS spectra against a protein database using a fragment index (FI). Typical usage:
Notes:
| struct OpenMS::ProSEAlgorithm::CalibrationResult_ |
Result of a calibration pass.
Holds the estimated precursor and fragment tolerances computed from confident PSMs during the calibration pass. When success is false, the tolerance values are undefined and should not be used.
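The rule that tolerances must be ignored when success is false can be sketched as a small guard. This is only an illustration: ToyCalibration, Tolerances, effectiveTolerances, and the field names precursor_tol and fragment_tol are hypothetical (only success is named above), and falling back to the configured tolerances is an assumption of the sketch, not documented behavior.

```cpp
#include <cassert>

// Hypothetical mirror of CalibrationResult_; only `success` is named in the text.
struct ToyCalibration
{
  bool success = false;
  double precursor_tol = 0.0; // assumed field name
  double fragment_tol = 0.0;  // assumed field name
};

struct Tolerances
{
  double precursor;
  double fragment;
};

// Use the calibrated values only when the pass succeeded. Otherwise keep the
// configured ones (this fallback is an assumption of the sketch).
Tolerances effectiveTolerances(const ToyCalibration& cal, const Tolerances& configured)
{
  if (!cal.success) return configured; // calibrated values are undefined here
  assert(cal.precursor_tol > 0.0 && cal.fragment_tol > 0.0);
  return Tolerances{cal.precursor_tol, cal.fragment_tol};
}
```

The point of the guard is that no code path reads the calibrated fields before checking success.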
| struct OpenMS::ProSEAlgorithm::MultiFileSearchResult |
Multi-file search result bundle.
Returned by the file-list searchWithModificationAnalysis() overloads. Holds one SearchResult per input file (in per_file, in input order) and a single aggregate result whose peptide_ids are the concatenation of all per-file PSMs and whose modification_analysis is computed once on the pooled set of PSMs.
Special cases for aggregate:
- With a single input file, aggregate is left almost empty (only is_open_search and exit_code are set), because a single-file pooled aggregate would just duplicate per_file[0] and re-run modification analysis on the same PSMs. Callers should use per_file[0] for the result in this case.
- aggregate.exit_code is set to the first non-OK per-file exit code, so callers can inspect it without walking the per_file vector.
- The aggregate's protein_ids template is taken from the first successful per-file result (search parameters are identical across files by construction), with the primary MS run path overwritten to list every input file.
| Class Members | ||
|---|---|---|
| SearchResult | aggregate | |
| vector< SearchResult > | per_file | |
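The pooling and exit-code rules for aggregate can be sketched with simplified stand-in types. This is not the OpenMS implementation: ToyResult, ToyMultiResult, and bundle are hypothetical, PSMs are reduced to plain strings, and the single-file shortcut reflects one reading of the special case described above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class ExitCodes { EXECUTION_OK, INPUT_FILE_EMPTY, UNEXPECTED_RESULT, UNKNOWN_ERROR, ILLEGAL_PARAMETERS };

// Simplified stand-ins for SearchResult / MultiFileSearchResult.
struct ToyResult
{
  ExitCodes exit_code = ExitCodes::EXECUTION_OK;
  std::vector<std::string> peptide_ids; // PSMs reduced to strings
};

struct ToyMultiResult
{
  std::vector<ToyResult> per_file;
  ToyResult aggregate;
};

ToyMultiResult bundle(std::vector<ToyResult> per_file)
{
  ToyMultiResult out;
  out.per_file = std::move(per_file);
  std::size_t total = 0;
  for (const ToyResult& r : out.per_file)
  {
    total += r.peptide_ids.size();
    // Remember only the first non-OK exit code.
    if (out.aggregate.exit_code == ExitCodes::EXECUTION_OK && r.exit_code != ExitCodes::EXECUTION_OK)
    {
      out.aggregate.exit_code = r.exit_code;
    }
  }
  // Single input file: leave the pooled PSM list empty; per_file[0] is the result.
  if (out.per_file.size() == 1) return out;
  // Concatenate per-file PSMs in input order.
  for (const ToyResult& r : out.per_file)
  {
    out.aggregate.peptide_ids.insert(out.aggregate.peptide_ids.end(), r.peptide_ids.begin(), r.peptide_ids.end());
  }
  assert(out.aggregate.peptide_ids.size() == total);
  return out;
}
```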
| struct OpenMS::ProSEAlgorithm::SearchContext |
Prepared per-database state shared across multiple spectrum files.
Holds the (decoy-augmented) protein database and the built FragmentIndex so that searching N spectrum files against the same FASTA pays the index build cost only once. Construct via prepareContext() and pass to the context-taking search() overload.
| Class Members | ||
|---|---|---|
| vector< FASTAEntry > | db | |
| FragmentIndex | fragment_index | |
| struct OpenMS::ProSEAlgorithm::SearchResult |
Comprehensive search result including modification analysis.
This structure contains all outputs from an open search including:
| Class Members | ||
|---|---|---|
| ExitCodes | exit_code = ExitCodes::EXECUTION_OK | |
| bool | is_open_search = false | |
| OpenSearchAnalysisResult | modification_analysis | |
| PeptideIdentificationList | peptide_ids | |
| vector< ProteinIdentification > | protein_ids | |
| enum class ExitCodes |
strong |
| ProSEAlgorithm | ( | ) |
| double computeModMatchTolerance_ | ( | ) | const |
inline, protected |
Scalar tolerance passed to OpenSearchModificationAnalysis under asymmetric bounds. Uses the tighter of the two positive magnitudes, which is the semantically correct choice for UniMod Δmass matching precision. OpenSearchModificationAnalysis internally clamps this at MAX_MOD_MAPPING_TOL_ = 0.02 Da; see spec §7 for rationale.
Zero on one side is a legal one-sided window (e.g., [0, 500] Da = "search only positive mass shifts"). In that case std::min() would collapse to 0 and pass a useless zero tolerance into the mod analyzer. The internal clamp masks this in ppm mode, but in Da mode the result is genuinely broken. The function therefore falls back to the non-zero side so that the mod-matching precision reflects the configured tolerance.
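The selection rule described above can be sketched in isolation. This is a minimal illustration, not the OpenMS implementation: the function name modMatchTolerance and its two magnitude arguments are hypothetical, and the 0.02 Da clamp applied inside OpenSearchModificationAnalysis is deliberately not modeled.

```cpp
#include <algorithm>
#include <cassert>

// Pick a scalar mod-matching tolerance from an asymmetric precursor window
// [-lower_mag, +upper_mag]. Both arguments are non-negative magnitudes.
// (Hypothetical helper; names are not taken from OpenMS.)
double modMatchTolerance(double lower_mag, double upper_mag)
{
  assert(lower_mag >= 0.0 && upper_mag >= 0.0);
  if (lower_mag == 0.0) return upper_mag; // one-sided window: use the non-zero side
  if (upper_mag == 0.0) return lower_mag;
  return std::min(lower_mag, upper_mag);  // otherwise the tighter magnitude
}
```

For a window of [0, 500] Da a plain std::min() would return 0, while this sketch returns 500.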
| bool isOpenSearchMode_ | ( | ) | const |
inline, protected |
Helper function to determine if open search should be used based on tolerance.
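| void logModificationAnalysisSummary_ | ( | const SearchResult & | result, |
| const String & | output_base_name | ||
| ) | const |
protected |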
Helper: log the modification analysis summary (shared by in-memory and file-based paths)
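| void logSearchDiagnostics_ | ( | const PeakMap & | spectra, |
| const std::vector< ProteinIdentification > & | protein_ids, | ||
| const PeptideIdentificationList & | peptide_ids | ||
| ) | const |
protected |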
Helper: log search summary statistics and per-run tolerance estimation.
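| void postProcessHits_ | ( | const PeakMap & | exp, |
| std::vector< std::vector< ProSEAlgorithm::AnnotatedHit_ > > & | annotated_hits, | ||
| std::vector< ProteinIdentification > & | protein_ids, | ||
| PeptideIdentificationList & | peptide_ids, | ||
| Size | top_hits, | ||
| const StringList & | modifications_fixed, | ||
| const StringList & | modifications_variable, | ||
| Int | peptide_missed_cleavages, | ||
| double | precursor_mass_tolerance, | ||
| double | fragment_mass_tolerance, | ||
| const String & | precursor_mass_tolerance_unit_ppm, | ||
| const String & | fragment_mass_tolerance_unit_ppm, | ||
| const Int | precursor_min_charge, | ||
| const Int | precursor_max_charge, | ||
| const String & | enzyme, | ||
| const String & | database_name | ||
| ) | const |
protected |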
Filter and annotate search results.
Trims per-spectrum candidate hits to the top N and converts them into PeptideIdentification objects, adding requested PSM annotations and populating protein-level search metadata.
| [in] | exp | Input MS experiment providing spectra/metadata for annotation. |
| [in,out] | annotated_hits | Per-spectrum candidate hits (trimmed to top_hits in-place). |
| [out] | protein_ids | Output container for protein-level identification and search metadata. |
| [out] | peptide_ids | Output container for spectrum-level peptide identifications (PSMs). |
| [in] | top_hits | Number of top-scoring hits to retain per spectrum (report_top_hits_). |
| [in] | modifications_fixed | Fixed modifications (by name) used during the search. |
| [in] | modifications_variable | Variable modifications (by name) used during the search. |
| [in] | peptide_missed_cleavages | Allowed missed cleavages in digestion. |
| [in] | precursor_mass_tolerance | Precursor mass tolerance value. |
| [in] | fragment_mass_tolerance | Fragment mass tolerance value. |
| [in] | precursor_mass_tolerance_unit_ppm | Precursor tolerance unit ("true"->ppm, "false"->Da). |
| [in] | fragment_mass_tolerance_unit_ppm | Fragment tolerance unit ("true"->ppm, "false"->Da). |
| [in] | precursor_min_charge | Minimum precursor charge considered. |
| [in] | precursor_max_charge | Maximum precursor charge considered. |
| [in] | enzyme | Digestion enzyme name. |
| [in] | database_name | Database file name used for the search (stored in protein_ids). |
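The in-place trimming of annotated_hits to the top N per spectrum can be sketched as follows. ToyHit and trimToTopHits are hypothetical names, and the sketch assumes that a higher score is better; neither detail is confirmed by the documentation above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced stand-in for AnnotatedHit_.
struct ToyHit
{
  int peptide_index;
  double score; // assumption: higher is better
};

// Keep only the top_hits best-scoring candidates of every spectrum, in place.
void trimToTopHits(std::vector<std::vector<ToyHit>>& annotated_hits, std::size_t top_hits)
{
  assert(top_hits > 0);
  for (auto& hits : annotated_hits)
  {
    const std::size_t keep = std::min(top_hits, hits.size());
    // Sort only the part that survives; the tail order does not matter.
    std::partial_sort(hits.begin(), hits.begin() + keep, hits.end(),
                      [](const ToyHit& a, const ToyHit& b) { return a.score > b.score; });
    hits.resize(keep);
  }
}
```

std::partial_sort avoids fully sorting long candidate lists when only a few hits per spectrum are reported.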
| SearchContext prepareContext | ( | const std::vector< FASTAFile::FASTAEntry > & | fasta_db | ) | const |
Build a SearchContext (decoy-augmented database + FragmentIndex) for reuse.
Performs the database preparation and FragmentIndex construction steps so that subsequent calls to search(spectra, ctx, ...) can reuse the same index across many spectrum files. If decoy generation is enabled (parameter "decoys"), decoys are generated and shuffled into the returned context's db member exactly once here.
| [in] | fasta_db | Protein sequence database as FASTA entries. |
Thread-safety: the returned context's FragmentIndex is read-only during subsequent search() calls; concurrent search() calls reading the same SearchContext are safe (per FragmentIndex query thread-safety contract). Do not call prepareContext() concurrently on the same algorithm instance.
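| static void preprocessSpectra_ | ( | PeakMap & | exp, |
| double | fragment_mass_tolerance, | ||
| bool | fragment_mass_tolerance_unit_ppm | ||
| ) |
static, protected |
Filter, deisotope, and decharge spectra.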
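| CalibrationResult_ runCalibrationPass_ | ( | PeakMap & | spectra, |
| FragmentIndex & | fragment_index, | ||
| const std::vector< FASTAFile::FASTAEntry > & | db | ||
| ) | const |
protected |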
Run a fast calibration pass on a subset of spectra to estimate mass accuracy.
Scores a TIC-ranked subset of spectra against the fragment index, collects precursor and fragment mass errors from high-confidence PSMs, and returns calibrated tolerances using median + 3*MAD estimation.
| [in] | spectra | Preprocessed MS/MS spectra (subset is selected internally by TIC). |
| [in,out] | fragment_index | Pre-built fragment index for candidate lookup. |
| [in] | db | Protein database (for sequence reconstruction of candidates). |
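The "median + 3*MAD" estimate mentioned above can be written down generically. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and it is applied to absolute mass errors, whereas the documentation does not say whether the real pass uses signed or absolute errors.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample (taken by value because nth_element reorders).
double median(std::vector<double> v)
{
  assert(!v.empty());
  const std::size_t mid = v.size() / 2;
  std::nth_element(v.begin(), v.begin() + mid, v.end());
  const double upper = v[mid];
  if (v.size() % 2 == 1) return upper;
  // Even size: average the two central values.
  const double lower = *std::max_element(v.begin(), v.begin() + mid);
  return 0.5 * (lower + upper);
}

// Median absolute deviation around the median.
double medianAbsDeviation(const std::vector<double>& v)
{
  const double m = median(v);
  std::vector<double> dev;
  dev.reserve(v.size());
  for (double x : v) dev.push_back(std::fabs(x - m));
  return median(dev);
}

// Tolerance estimate from mass errors of confident PSMs: median + 3 * MAD.
double toleranceFromErrors(const std::vector<double>& abs_errors)
{
  return median(abs_errors) + 3.0 * medianAbsDeviation(abs_errors);
}
```

Median and MAD are robust to a few badly matched PSMs, which is the usual reason to prefer them over mean and standard deviation for this kind of estimate.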
| ExitCodes search | ( | const String & | in_spectra, |
| const String & | in_db, | ||
| std::vector< ProteinIdentification > & | prot_ids, | ||
| PeptideIdentificationList & | pep_ids | ||
| ) | const |
Search spectra in a spectrum file (mzML or Bruker .d) against a protein database using a fragment-index-backed (FI) workflow.
Populates protein and peptide identifications, including search metadata, PSM hits, and search engine annotations. Parameters are taken from this instance (DefaultParamHandler).
| [in] | in_spectra | Input path to the spectra file (mzML or Bruker .d) containing MS/MS spectra to search. |
| [in] | in_db | Input path to the protein sequence database in FASTA format. |
| [out] | prot_ids | Output container receiving search metadata and protein-level information. |
| [out] | pep_ids | Output container receiving spectrum-level peptide identifications (PSMs). |
Side effects:
Errors:
| ExitCodes search | ( | PeakMap & | spectra, |
| const std::vector< FASTAFile::FASTAEntry > & | fasta_db, | ||
| std::vector< ProteinIdentification > & | prot_ids, | ||
| PeptideIdentificationList & | pep_ids | ||
| ) | const |
In-memory search: search spectra against a protein database without file I/O.
Same as the file-based search() but takes pre-loaded spectra and FASTA entries directly. Spectra are preprocessed in-place (filtered, deisotoped, normalized).
| [in,out] | spectra | MS/MS spectra to search (preprocessed in-place). |
| [in] | fasta_db | Protein sequence database as FASTA entries. |
| [out] | prot_ids | Output protein-level identifications. |
| [out] | pep_ids | Output spectrum-level peptide identifications (PSMs). |
Internally this is a thin wrapper around prepareContext() + the context-taking search() overload, so the FragmentIndex is rebuilt on every call. For repeated searches against the same database, prefer calling prepareContext() once and reusing the resulting SearchContext.
| ExitCodes search | ( | PeakMap & | spectra, |
| SearchContext & | ctx, | ||
| std::vector< ProteinIdentification > & | prot_ids, | ||
| PeptideIdentificationList & | pep_ids | ||
| ) | const |
In-memory search using a pre-built SearchContext.
Searches spectra against the database and FragmentIndex held in ctx. The fragment index build cost (decoy generation, peptide/fragment generation, sorting, bucketing) is paid by prepareContext() and is not repeated here, making this overload the right choice when searching many spectrum files against the same database.
| [in,out] | spectra | MS/MS spectra to search (preprocessed in-place). |
| [in,out] | ctx | Pre-built SearchContext from prepareContext(). Taken by non-const reference because the underlying FragmentIndex query API is non-const, even though the index content is not modified during the search; the db member is also handed non-const to the downstream PeptideIndexing step (which requires a non-const reference). |
| [out] | prot_ids | Output protein-level identifications. |
| [out] | pep_ids | Output spectrum-level peptide identifications (PSMs). |
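The build-once, query-many pattern behind SearchContext can be illustrated with a toy index. This is not OpenMS's FragmentIndex: ToyContext, prepareToyContext, and countMatches are hypothetical, and the "index" is just a mass-sorted list of (fragment mass, peptide id) pairs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for SearchContext: fragments sorted by mass.
struct ToyContext
{
  std::vector<std::pair<double, int>> fragments; // (fragment mass, peptide id)
};

// The expensive step (here only a sort) is paid once per database.
ToyContext prepareToyContext(std::vector<std::pair<double, int>> fragments)
{
  std::sort(fragments.begin(), fragments.end());
  return ToyContext{std::move(fragments)};
}

// Read-only query: number of indexed fragments within +/- tol of a peak mass.
std::size_t countMatches(const ToyContext& ctx, double mass, double tol)
{
  assert(tol >= 0.0);
  auto first = std::lower_bound(ctx.fragments.begin(), ctx.fragments.end(), mass - tol,
                                [](const std::pair<double, int>& f, double m) { return f.first < m; });
  auto last = std::upper_bound(ctx.fragments.begin(), ctx.fragments.end(), mass + tol,
                               [](double m, const std::pair<double, int>& f) { return m < f.first; });
  return static_cast<std::size_t>(last - first);
}
```

Every call to countMatches reuses the same sorted data, which mirrors why the context-taking search() overload suits many spectrum files searched against one database.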
| MultiFileSearchResult searchWithModificationAnalysis | ( | const std::vector< String > & | in_spectra_files, |
| const std::vector< FASTAFile::FASTAEntry > & | fasta_db, | ||
| const std::vector< String > & | output_base_names = {}, |
||
| const String & | aggregate_base_name = "" |
||
| ) | const |
Multi-file search with modification analysis (in-memory FASTA).
Builds a single SearchContext (decoy generation + FragmentIndex) from fasta_db and reuses it across all input spectrum files. Each input file produces its own SearchResult including a per-file modification analysis (TSV written if a non-empty per-file base name is provided). An additional aggregate SearchResult is computed by pooling all per-file peptide identifications and running modification analysis once on the pooled set.
| [in] | in_spectra_files | Spectrum file paths (mzML or Bruker .d). |
| [in] | fasta_db | Protein sequence database as FASTA entries. |
| [in] | output_base_names | Optional per-file base names for modification-analysis TSV outputs. Must be empty or have the same length as in_spectra_files. Empty entries skip TSV writing for that file. |
| [in] | aggregate_base_name | Optional base name for the aggregate modification-analysis TSV output. Empty disables aggregate TSV writing (the aggregate analysis is still computed). |
Errors:
- output_base_names is non-empty and its size differs from in_spectra_files.
| MultiFileSearchResult searchWithModificationAnalysis | ( | const std::vector< String > & | in_spectra_files, |
| const String & | in_db, | ||
| const std::vector< String > & | output_base_names = {}, |
||
| const String & | aggregate_base_name = "" |
||
| ) | const |
Multi-file search with modification analysis (FASTA file path).
Convenience overload that loads the FASTA database from in_db and delegates to the in-memory multi-file overload. The database file path is recorded in each per-file ProteinIdentification's SearchParameters (and on the aggregate result).
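The size rule for output_base_names in the multi-file overloads can be checked up front. The helpers below are hypothetical and use std::string in place of OpenMS String; how the real method reports a violation (exception or exit code) is not stated here and is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if output_base_names is empty or has one entry per input file.
bool baseNamesValid(const std::vector<std::string>& in_spectra_files,
                    const std::vector<std::string>& output_base_names)
{
  return output_base_names.empty() || output_base_names.size() == in_spectra_files.size();
}

// Base name for input file i; an empty string means "skip TSV writing".
std::string baseNameFor(const std::vector<std::string>& output_base_names, std::size_t i)
{
  assert(output_base_names.empty() || i < output_base_names.size());
  return output_base_names.empty() ? std::string() : output_base_names[i];
}
```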
| SearchResult searchWithModificationAnalysis | ( | const String & | in_spectra, |
| const String & | in_db, | ||
| const String & | output_base_name = "" |
||
| ) | const |
Search with comprehensive results including modification analysis tables.
This method performs a peptide database search and additionally returns structured modification analysis results for open search mode. This is the recommended method for modification discovery workflows.
The method automatically:
| in_spectra | Input path to the spectra file (mzML or Bruker .d) containing MS/MS spectra |
| in_db | Input path to the protein sequence database in FASTA format |
| output_base_name | Optional base name for output files (TSV tables) |
Example usage:
| SearchResult searchWithModificationAnalysis | ( | PeakMap & | spectra, |
| const std::vector< FASTAFile::FASTAEntry > & | fasta_db, | ||
| const String & | output_base_name = "" |
||
| ) | const |
In-memory search with modification analysis: no file I/O required.
Same as the file-based searchWithModificationAnalysis() but takes pre-loaded data.
| [in,out] | spectra | MS/MS spectra (preprocessed in-place). |
| [in] | fasta_db | Protein sequence database as FASTA entries. |
| [in] | output_base_name | Optional base name for TSV output files. |
| void updateMembers_ | ( | ) |
override, protected, virtual |
This method is used to update extra member variables at the end of the setParameters() method.
Also call it at the end of the derived classes' copy constructor and assignment operator.
The default implementation is empty.
Reimplemented from DefaultParamHandler.
|
mutable, protected |
Most recent calibration result (valid after any search that invoked runCalibrationPass_). Stored for test observability and diagnostics. Marked mutable because it is pure diagnostic/telemetry state that doesn't affect the logical const-ness of search().
|
mutable, protected |
Scalar tolerance passed to OpenSearchModificationAnalysis on the most recent search() call. Stored for test observability. The calibration writeback restores the tolerance members on exit (to avoid per-file state leaks in the multi-file wrapper), so a test that wants to verify that the mod analyzer received the calibrated value rather than the user-configured one cannot simply read the members after the search; it has to see what was actually passed to the analyzer. Defaults to -1.0, a sentinel meaning that no search has run yet.
|
mutable, protected |
positive magnitude