OpenMS
Minimal in-memory peptide-spectrum search engine.
#include <OpenMS/ANALYSIS/ID/SimpleSearchEngineAlgorithm.h>
Classes | |
| struct | AnnotatedHit_ |
| Compact internal record for one scored peptide candidate against one spectrum. | |
Public Types | |
| enum class | ExitCodes { EXECUTION_OK , INPUT_FILE_EMPTY , UNEXPECTED_RESULT , UNKNOWN_ERROR , ILLEGAL_PARAMETERS } |
| Outcome of search(), distinguishing recoverable input issues from execution errors. | |
Public Types inherited from ProgressLogger | |
| enum | LogType { CMD , GUI , NONE } |
| Possible log types. | |
Public Member Functions | |
| SimpleSearchEngineAlgorithm () | |
| Default constructor; installs the search parameters (see class docs) | |
| ExitCodes | search (const std::string &in_spectra, const std::string &in_db, std::vector< ProteinIdentification > &prot_ids, PeptideIdentificationList &pep_ids) const |
| Search the MS2 spectra in in_spectra against the protein database in in_db. | |
Public Member Functions inherited from DefaultParamHandler | |
| DefaultParamHandler (const std::string &name) | |
| Constructor with name that is displayed in error messages. | |
| DefaultParamHandler (const DefaultParamHandler &rhs) | |
| Copy constructor. | |
| virtual | ~DefaultParamHandler () |
| Destructor. | |
| DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) |
| Assignment operator. | |
| virtual bool | operator== (const DefaultParamHandler &rhs) const |
| Equality operator. | |
| void | setParameters (const Param &param) |
| Sets the parameters. | |
| const Param & | getParameters () const |
| Non-mutable access to the parameters. | |
| const Param & | getDefaults () const |
| Non-mutable access to the default parameters. | |
| const std::string & | getName () const |
| Non-mutable access to the name. | |
| void | setName (const std::string &name) |
| Mutable access to the name. | |
| const std::vector< std::string > & | getSubsections () const |
| Non-mutable access to the registered subsections. | |
Public Member Functions inherited from ProgressLogger | |
| ProgressLogger () | |
| Constructor. | |
| virtual | ~ProgressLogger () |
| Destructor. | |
| ProgressLogger (const ProgressLogger &other) | |
| Copy constructor. | |
| ProgressLogger & | operator= (const ProgressLogger &other) |
| Assignment Operator. | |
| void | setLogType (LogType type) const |
| Sets the progress log that should be used. The default type is NONE! | |
| LogType | getLogType () const |
| Returns the type of progress log being used. | |
| void | setLogger (ProgressLoggerImpl *logger) |
| Sets the logger to be used for progress logging. | |
| void | startProgress (SignedSize begin, SignedSize end, const std::string &label) const |
| Initializes the progress display. | |
| void | setProgress (SignedSize value) const |
| Sets the current progress. | |
| void | endProgress (UInt64 bytes_processed=0) const |
| void | nextProgress () const |
| Increments the progress by 1 (according to the range begin-end). | |
Protected Member Functions | |
| void | updateMembers_ () override |
| This method is used to update extra member variables at the end of the setParameters() method. | |
| void | postProcessHits_ (const PeakMap &exp, std::vector< std::vector< SimpleSearchEngineAlgorithm::AnnotatedHit_ > > &annotated_hits, std::vector< ProteinIdentification > &protein_ids, PeptideIdentificationList &peptide_ids, Size top_hits, const ModifiedPeptideGenerator::MapToResidueType &fixed_modifications, const ModifiedPeptideGenerator::MapToResidueType &variable_modifications, Size max_variable_mods_per_peptide, const StringList &modifications_fixed, const StringList &modifications_variable, Int peptide_missed_cleavages, double precursor_mass_tolerance, double fragment_mass_tolerance, const std::string &precursor_mass_tolerance_unit_ppm, const std::string &fragment_mass_tolerance_unit_ppm, const Int precursor_min_charge, const Int precursor_max_charge, const std::string &enzyme, const std::string &database_name) const |
| Materialise top-N scored candidates per spectrum into PeptideHit / ProteinIdentification objects. | |
Protected Member Functions inherited from DefaultParamHandler | |
| void | defaultsToParam_ () |
| Updates the parameters after the defaults have been set in the constructor. | |
Static Protected Member Functions | |
| static void | preprocessSpectra_ (PeakMap &exp, double fragment_mass_tolerance, bool fragment_mass_tolerance_unit_ppm) |
| Preprocess MS2 spectra in place: filter, deisotope, decharge. | |
Protected Attributes | |
| double | precursor_mass_tolerance_ |
| Precursor mass tolerance (value); unit in precursor_mass_tolerance_unit_. | |
| std::string | precursor_mass_tolerance_unit_ |
| "ppm" or "Da" | |
| Size | precursor_min_charge_ |
| Minimum precursor charge considered. | |
| Size | precursor_max_charge_ |
| Maximum precursor charge considered. | |
| IntList | precursor_isotopes_ |
| Allowed precursor isotope offsets (0 = monoisotopic peak, 1 = first 13C isotope peak at about +1.003 Da, etc.). | |
| double | fragment_mass_tolerance_ |
| Fragment mass tolerance (value); unit in fragment_mass_tolerance_unit_. | |
| std::string | fragment_mass_tolerance_unit_ |
| "ppm" or "Da" | |
| StringList | modifications_fixed_ |
| UniMod names of fixed modifications. | |
| StringList | modifications_variable_ |
| UniMod names of variable modifications. | |
| Size | modifications_max_variable_mods_per_peptide_ |
| Cap on simultaneous variable modifications per peptide. | |
| std::string | enzyme_ |
| Enzyme name as recognised by EnzymaticDigestion. | |
| bool | decoys_ |
| If true, generate target/decoy results. | |
| double | fdr_psm_ |
| q-value threshold for PSM filtering (0 = disabled); requires decoys_. | |
| StringList | annotate_psm_ |
| PSM meta-value annotations to add (see annotate:PSM defaults). | |
| Size | peptide_min_size_ |
| Minimum peptide length after digestion. | |
| Size | peptide_max_size_ |
| Maximum peptide length after digestion (0 = unlimited) | |
| Size | peptide_missed_cleavages_ |
| Allowed missed cleavages in digestion. | |
| EnzymaticDigestion::Specificity | peptide_enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL} |
| Enzyme specificity: full, semi, or none. | |
| std::string | peptide_motif_ |
| Optional regex motif; only peptides matching are considered. | |
| Size | report_top_hits_ |
| Number of top-scoring PSMs reported per spectrum. | |
Protected Attributes inherited from DefaultParamHandler | |
| Param | param_ |
| Container for current parameters. | |
| Param | defaults_ |
| Container for default parameters. This member should be filled in the constructor of derived classes! | |
| std::vector< std::string > | subsections_ |
| Container for registered subsections. This member should be filled in the constructor of derived classes! | |
| std::string | error_name_ |
| Name that is displayed in error messages during the parameter checking. | |
| bool | check_defaults_ |
| If this member is set to false, no checking of parameters is done. | |
| bool | warn_empty_defaults_ |
| If this member is set to false, no warning is emitted when defaults are empty. | |
Protected Attributes inherited from ProgressLogger | |
| LogType | type_ |
| time_t | last_invoke_ |
| ProgressLoggerImpl * | current_logger_ |
Additional Inherited Members | |
Static Public Member Functions inherited from DefaultParamHandler | |
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const std::string &key_prefix="") |
| Writes all parameters to meta values. | |
Static Protected Attributes inherited from ProgressLogger | |
| static int | recursion_depth_ |
Minimal in-memory peptide-spectrum search engine.
Searches MS2 spectra against a protein FASTA database and produces protein- and peptide-level identifications. Designed as a self-contained reference / teaching implementation; it is not intended as a feature-complete replacement for external engines such as MSGF+, Comet, Sage, or MSFragger.
The pipeline run by search():
1. Load the MS2 spectra from in_spectra and the protein database from in_db.
2. Digest the proteins, generate modified peptide candidates (at most modifications:variable_max_per_peptide variable modifications per peptide), and score them against MS2 spectra whose precursor mass (computed from the precursor m/z and charge) matches the candidate peptide mass within precursor:mass_tolerance (with optional 1/-1 isotope correction per precursor:isotopes).
3. Keep the top-scoring PSMs per spectrum (report:top_hits), optionally generate target/decoy results (decoys), filter by FDR (FDR:PSM, q-value), and annotate PSMs (annotate:PSM) in postProcessHits_().

Configuration is exposed through the DefaultParamHandler base; see the defaults installed by the constructor for the full list of supported keys (precursor/fragment tolerances and units, charge range, modifications, enzyme, peptide size/motif/missed-cleavage filters, FDR threshold, top-hits, etc.).
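The precursor-matching step of the pipeline can be illustrated with a small standalone sketch. It does not call OpenMS: the helpers `neutralMassFromMz`, `withinTolerance`, and `precursorMatches` are hypothetical, the proton mass and 13C-12C spacing are standard physical constants hard-coded for the example, and it assumes that an isotope offset k shifts the observed neutral mass by k times the 13C-12C spacing.

```cpp
#include <cmath>
#include <vector>

// Standard physical constants in Da (hard-coded for this sketch).
constexpr double kProtonMass = 1.007276466812;
constexpr double kC13C12Diff = 1.0033548378;

// Neutral monoisotopic mass implied by an observed m/z at charge z.
double neutralMassFromMz(double mz, int z) {
  return (mz - kProtonMass) * z;
}

// True if observed lies within tol of theoretical; tol is ppm of the theoretical mass or absolute Da.
bool withinTolerance(double observed, double theoretical, double tol, bool tol_is_ppm) {
  const double allowed = tol_is_ppm ? theoretical * tol * 1e-6 : tol;
  return std::fabs(observed - theoretical) <= allowed;
}

// Hypothetical helper: does a precursor (m/z, charge) match a candidate peptide's neutral mass?
bool precursorMatches(double precursor_mz, int charge, double peptide_mass,
                      int min_charge, int max_charge, const std::vector<int>& isotopes,
                      double tol, bool tol_is_ppm) {
  if (charge < min_charge || charge > max_charge) return false;  // outside the charge range
  const double observed = neutralMassFromMz(precursor_mz, charge);
  for (int k : isotopes) {
    // k = 1: the instrument picked the first 13C isotope peak instead of the monoisotopic one.
    if (withinTolerance(observed - k * kC13C12Diff, peptide_mass, tol, tol_is_ppm)) return true;
  }
  return false;
}
```

For example, a doubly charged precursor recorded at the first isotope peak of a 1000 Da peptide is rejected with isotope offsets {0} but accepted with {0, 1}.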
enum class ExitCodes | strong |
Outcome of search(), distinguishing recoverable input issues from execution errors.
| Enumerator | |
|---|---|
| EXECUTION_OK | Search completed successfully. |
| INPUT_FILE_EMPTY | Spectrum input contained no usable MS2 spectra after loading/filtering. |
| UNEXPECTED_RESULT | Internal post-condition violated (e.g. no candidates scored at all). |
| UNKNOWN_ERROR | Caught a generic exception; details written to the log. |
| ILLEGAL_PARAMETERS | Configuration is internally inconsistent or unsupported. |
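Since search() reports its outcome through the returned ExitCodes value, callers typically branch on it before using prot_ids and pep_ids. The sketch below declares a local copy of the enumerators only so that it compiles without OpenMS; in real code the nested SimpleSearchEngineAlgorithm::ExitCodes would be used, and the helpers `describe` and `succeeded` are hypothetical.

```cpp
#include <string>

// Local stand-in mirroring SimpleSearchEngineAlgorithm::ExitCodes (hypothetical copy for a standalone example).
enum class ExitCodes { EXECUTION_OK, INPUT_FILE_EMPTY, UNEXPECTED_RESULT, UNKNOWN_ERROR, ILLEGAL_PARAMETERS };

// Hypothetical helper: map each outcome to a short log message.
std::string describe(ExitCodes code) {
  switch (code) {
    case ExitCodes::EXECUTION_OK:       return "search completed";
    case ExitCodes::INPUT_FILE_EMPTY:   return "no usable MS2 spectra in input";
    case ExitCodes::UNEXPECTED_RESULT:  return "internal post-condition violated";
    case ExitCodes::UNKNOWN_ERROR:      return "unexpected exception (see log)";
    case ExitCodes::ILLEGAL_PARAMETERS: return "inconsistent or unsupported parameters";
  }
  return "unrecognised exit code";  // not reached for valid enumerators
}

// Treat anything other than EXECUTION_OK as a failure.
bool succeeded(ExitCodes code) { return code == ExitCodes::EXECUTION_OK; }
```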
SimpleSearchEngineAlgorithm ()
Default constructor; installs the search parameters (see class docs).
postProcessHits_ | protected |
Materialise top-N scored candidates per spectrum into PeptideHit / ProteinIdentification objects.
Converts the in-memory AnnotatedHit_ records produced by the scoring loop into first-class identification objects: re-applies the modification variant indicated by peptide_mod_index, annotates PSM meta values (per annotate:PSM), populates protein references, and stamps search-engine settings (tolerances, modifications, enzyme, etc.) onto the resulting ProteinIdentification so the output is self-describing.
Most parameters mirror the algorithm's own configuration and are passed in explicitly so this routine can also be reused outside member context.
| [in] | exp | Preprocessed spectra used as scoring input. |
| [in] | annotated_hits | Per-spectrum vectors of scored candidates (already top-N filtered). |
| [out] | protein_ids | Protein identifications to populate. |
| [out] | peptide_ids | Peptide-spectrum matches to populate. |
| [in] | top_hits | Maximum number of PSMs per spectrum to materialise. |
| [in] | fixed_modifications | Resolved fixed-modification table. |
| [in] | variable_modifications | Resolved variable-modification table. |
| [in] | max_variable_mods_per_peptide | Cap on simultaneous variable modifications. |
| [in] | modifications_fixed | UniMod names of fixed modifications (for the ID metadata stamp). |
| [in] | modifications_variable | UniMod names of variable modifications (for the ID metadata stamp). |
| [in] | peptide_missed_cleavages | Allowed missed cleavages (for the ID metadata stamp). |
| [in] | precursor_mass_tolerance | Precursor mass tolerance value. |
| [in] | fragment_mass_tolerance | Fragment mass tolerance value. |
| [in] | precursor_mass_tolerance_unit_ppm | "ppm" or "Da"; recorded in the ID metadata. |
| [in] | fragment_mass_tolerance_unit_ppm | "ppm" or "Da"; recorded in the ID metadata. |
| [in] | precursor_min_charge | Minimum precursor charge considered. |
| [in] | precursor_max_charge | Maximum precursor charge considered. |
| [in] | enzyme | Enzyme name (for the ID metadata stamp). |
| [in] | database_name | FASTA database path/name (for the ID metadata stamp). |
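The annotated_hits argument is documented as already top-N filtered, so that selection happens before this routine runs. A minimal way to do such per-spectrum selection is std::partial_sort, sketched below on a hypothetical `ScoredCandidate` record that stands in for AnnotatedHit_ (whose fields are not documented here); higher scores are assumed to be better.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an internal hit record; not the real AnnotatedHit_ layout.
struct ScoredCandidate {
  std::size_t peptide_index;  // index into a candidate peptide list
  double score;               // assumed: higher is better
};

// Keep at most top_n candidates per spectrum, best first.
void keepTopHits(std::vector<std::vector<ScoredCandidate>>& hits_per_spectrum, std::size_t top_n) {
  const auto better = [](const ScoredCandidate& a, const ScoredCandidate& b) { return a.score > b.score; };
  for (auto& hits : hits_per_spectrum) {
    const std::size_t n = std::min<std::size_t>(top_n, hits.size());
    // Only the first n positions are put in order; the remainder is discarded below.
    std::partial_sort(hits.begin(), hits.begin() + static_cast<std::ptrdiff_t>(n), hits.end(), better);
    hits.resize(n);
  }
}
```

partial_sort only orders the first n positions, which is cheaper than a full sort when each spectrum has many candidates and n is small.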
preprocessSpectra_ | static protected |
Preprocess MS2 spectra in place: filter, deisotope, decharge.
Applies the standard search-engine spectrum normalisation used before scoring: filtering out low-quality peaks, charge state deconvolution, and isotope-pattern deisotoping using the supplied fragment tolerance.
| [in,out] | exp | Spectra to preprocess in place. |
| [in] | fragment_mass_tolerance | Tolerance for deisotoping and decharging. |
| [in] | fragment_mass_tolerance_unit_ppm | If true, fragment_mass_tolerance is ppm; otherwise Th. |
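A much simplified version of the deisotoping idea is sketched below. It is not the preprocessSpectra_() implementation: it assumes m/z-sorted centroided peaks, treats any lower-intensity peak spaced by (13C-12C mass difference)/z from a surviving peak as an isotope, tries charges from high to low, and omits the filtering and decharging steps. The `Peak` struct and the `allowedError` and `deisotope` functions are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Peak {
  double mz;
  double intensity;
};

constexpr double kC13C12Diff = 1.0033548378;  // 13C - 12C mass difference (Da)

// Absolute error allowed around a target m/z; tol is in Th or in ppm.
double allowedError(double target_mz, double tol, bool tol_is_ppm) {
  return tol_is_ppm ? target_mz * tol * 1e-6 : tol;
}

// Peaks must be sorted by m/z. Returns the peaks that were not explained as isotopes.
std::vector<Peak> deisotope(const std::vector<Peak>& peaks, int max_charge, double tol, bool tol_is_ppm) {
  std::vector<bool> is_isotope(peaks.size(), false);
  for (std::size_t i = 0; i < peaks.size(); ++i) {
    if (is_isotope[i]) continue;
    for (int z = max_charge; z >= 1; --z) {  // try high charges first
      double expected = peaks[i].mz + kC13C12Diff / z;
      std::size_t last = i;
      bool found = false;
      for (std::size_t j = i + 1; j < peaks.size(); ++j) {
        const double err = allowedError(expected, tol, tol_is_ppm);
        if (peaks[j].mz > expected + err) break;  // past the window: the chain ends
        if (std::fabs(peaks[j].mz - expected) <= err && peaks[j].intensity < peaks[last].intensity) {
          is_isotope[j] = true;  // drop this peak as an isotope of peak i
          found = true;
          last = j;
          expected = peaks[j].mz + kC13C12Diff / z;  // look for the next isotope in the chain
        }
      }
      if (found) break;  // a charge explained the pattern; skip lower charges
    }
  }
  std::vector<Peak> result;
  for (std::size_t i = 0; i < peaks.size(); ++i)
    if (!is_isotope[i]) result.push_back(peaks[i]);
  return result;
}
```

Requiring strictly decreasing intensity along the chain is a crude heuristic that suits small fragments; larger ions can have a second isotope peak more intense than the monoisotopic one, which this sketch would keep as a separate peak.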
| ExitCodes search | ( | const std::string & | in_spectra, |
| const std::string & | in_db, | ||
| std::vector< ProteinIdentification > & | prot_ids, | ||
| PeptideIdentificationList & | pep_ids | ||
| ) | const |
Search the MS2 spectra in in_spectra against the protein database in in_db.
Spectra and database are loaded from disk; the result is written into the two output arguments. Existing contents of prot_ids and pep_ids are not cleared by this call. The current parameter set (see the class brief) controls tolerances, modifications, enzyme, FDR, etc.
| [in] | in_spectra | Path to the spectrum input (mzML or any format readable by FileHandler). |
| [in] | in_db | Path to the protein FASTA database to search against. |
| [out] | prot_ids | Protein identifications produced by the search (one run per call). |
| [out] | pep_ids | Peptide-spectrum matches (PSMs) produced by the search. |
updateMembers_ | override protected virtual |
This method is used to update extra member variables at the end of the setParameters() method.
Also call it at the end of the derived classes' copy constructor and assignment operator.
The default implementation is empty.
Reimplemented from DefaultParamHandler.
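The division of labour between the parameter container and the protected attributes listed above can be sketched without OpenMS. In the hypothetical classes below, parameters are stored as strings in a map and updateMembers_() copies them into typed members whenever setParameters() is called. The classes, the string-typed container, the key `precursor:mass_tolerance_unit`, and the default values are illustrative assumptions, not the DefaultParamHandler API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the parameter-handler pattern (not the OpenMS classes).
class ParamHandlerSketch {
public:
  virtual ~ParamHandlerSketch() = default;
  // Overwrite the given keys, then refresh the cached members.
  void setParameters(const std::map<std::string, std::string>& p) {
    for (const auto& kv : p) param_[kv.first] = kv.second;
    updateMembers_();
  }
protected:
  virtual void updateMembers_() {}  // default implementation is empty
  std::map<std::string, std::string> param_;
};

class SearchSketch : public ParamHandlerSketch {
public:
  SearchSketch() {
    // Illustrative defaults; the constructor installs them and then refreshes the members.
    param_["precursor:mass_tolerance"] = "10.0";
    param_["precursor:mass_tolerance_unit"] = "ppm";
    updateMembers_();
  }
  double precursorTolerance() const { return precursor_mass_tolerance_; }
  const std::string& precursorUnit() const { return precursor_mass_tolerance_unit_; }
protected:
  void updateMembers_() override {
    // Copy string parameters into typed members so the search code need not re-parse them.
    precursor_mass_tolerance_ = std::stod(param_["precursor:mass_tolerance"]);
    precursor_mass_tolerance_unit_ = param_["precursor:mass_tolerance_unit"];
  }
private:
  double precursor_mass_tolerance_ = 0.0;
  std::string precursor_mass_tolerance_unit_;
};
```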
StringList annotate_psm_ | protected |
PSM meta-value annotations to add (see annotate:PSM defaults).
bool decoys_ | protected |
If true, generate target/decoy results.
std::string enzyme_ | protected |
Enzyme name as recognised by EnzymaticDigestion.
double fdr_psm_ | protected |
q-value threshold for PSM filtering (0 = disabled); requires decoys_.
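One common way to turn target/decoy PSMs into q-values is sketched below: sort by score, estimate FDR at each rank as decoys divided by targets at or above it, and make the estimate monotone with a running minimum from the bottom. This textbook estimator is an assumption; the page does not say which estimator the engine uses, and `Psm`, `assignQValues`, and `filterByQValue` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Psm {
  double score;          // assumed: higher is better
  bool is_decoy;
  double q_value = 1.0;  // filled in by assignQValues
};

// Target-decoy q-values; ties in score are not grouped in this sketch.
void assignQValues(std::vector<Psm>& psms) {
  std::sort(psms.begin(), psms.end(), [](const Psm& a, const Psm& b) { return a.score > b.score; });
  std::vector<double> fdr(psms.size());
  std::size_t targets = 0;
  std::size_t decoys = 0;
  for (std::size_t i = 0; i < psms.size(); ++i) {
    if (psms[i].is_decoy) ++decoys; else ++targets;
    fdr[i] = targets == 0 ? 1.0 : static_cast<double>(decoys) / static_cast<double>(targets);
  }
  // q-value = smallest FDR at which this PSM would still be accepted.
  double running_min = 1.0;
  for (std::size_t i = psms.size(); i-- > 0;) {
    running_min = std::min(running_min, fdr[i]);
    psms[i].q_value = running_min;
  }
}

// Keep PSMs with q <= threshold; a threshold of 0 disables filtering, as documented for fdr_psm_.
std::vector<Psm> filterByQValue(const std::vector<Psm>& psms, double threshold) {
  if (threshold <= 0.0) return psms;
  std::vector<Psm> kept;
  for (const Psm& p : psms)
    if (p.q_value <= threshold) kept.push_back(p);
  return kept;
}
```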
double fragment_mass_tolerance_ | protected |
Fragment mass tolerance (value); unit in fragment_mass_tolerance_unit_.
std::string fragment_mass_tolerance_unit_ | protected |
"ppm" or "Da"
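To show how fragment_mass_tolerance_ and its unit interact, the sketch below counts theoretical fragment m/z values that have an experimental peak within tolerance, using a single forward pass over two sorted lists. It is a generic shared-peak count, not the scoring function used by this engine (which this page does not name); `countMatchedFragments` is hypothetical, and in ppm mode the window is taken relative to the theoretical m/z.

```cpp
#include <cstddef>
#include <vector>

// Both inputs must be sorted ascending. tol is absolute (Da/Th) or ppm of the theoretical m/z.
// An experimental peak may be counted for more than one theoretical ion.
std::size_t countMatchedFragments(const std::vector<double>& theoretical,
                                  const std::vector<double>& experimental,
                                  double tol, bool tol_is_ppm) {
  std::size_t matched = 0;
  std::size_t j = 0;
  for (double t : theoretical) {
    const double err = tol_is_ppm ? t * tol * 1e-6 : tol;
    // Skip experimental peaks too low to match this or any later (larger) theoretical m/z.
    while (j < experimental.size() && experimental[j] < t - err) ++j;
    if (j < experimental.size() && experimental[j] <= t + err) ++matched;
  }
  return matched;
}
```

The same peak list can match or miss depending on the unit: a 0.005 Da offset at m/z 100 is inside a 0.02 Da window but outside a 10 ppm window (0.001 Th at that m/z).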
StringList modifications_fixed_ | protected |
UniMod names of fixed modifications.
Size modifications_max_variable_mods_per_peptide_ | protected |
Cap on simultaneous variable modifications per peptide.
StringList modifications_variable_ | protected |
UniMod names of variable modifications.
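EnzymaticDigestion::Specificity peptide_enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL} | protected |
Enzyme specificity: full, semi, or none.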
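The specificity setting decides how many peptide termini must coincide with enzyme cleavage sites. The sketch below assumes the usual convention (full: both termini, semi: at least one, none: no constraint, with protein termini always counting as sites) and a trypsin-like rule (cleave after K or R unless the next residue is P). `Specificity`, `isCleavageSite`, and `satisfiesSpecificity` are hypothetical names, not the EnzymaticDigestion API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical local enum mirroring the documented full / semi / none choices.
enum class Specificity { Full, Semi, None };

// Assumed trypsin-like rule: a site lies before position pos if protein[pos-1] is K or R
// and protein[pos] is not P. Protein N- and C-termini always count as sites.
bool isCleavageSite(const std::string& protein, std::size_t pos) {
  if (pos == 0 || pos == protein.size()) return true;
  const char before = protein[pos - 1];
  return (before == 'K' || before == 'R') && protein[pos] != 'P';
}

// Does the peptide protein[begin, end) satisfy the requested specificity?
bool satisfiesSpecificity(const std::string& protein, std::size_t begin, std::size_t end,
                          Specificity spec) {
  const int specific_termini = static_cast<int>(isCleavageSite(protein, begin)) +
                               static_cast<int>(isCleavageSite(protein, end));
  switch (spec) {
    case Specificity::Full: return specific_termini == 2;
    case Specificity::Semi: return specific_termini >= 1;
    case Specificity::None: return true;
  }
  return false;
}
```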
Size peptide_max_size_ | protected |
Maximum peptide length after digestion (0 = unlimited).
Size peptide_min_size_ | protected |
Minimum peptide length after digestion.
Size peptide_missed_cleavages_ | protected |
Allowed missed cleavages in digestion.
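How the missed-cleavage and length limits combine can be seen in a simplified in-silico digestion. The sketch hard-codes a trypsin-like rule (cleave after K or R unless followed by P) as an assumption, since EnzymaticDigestion supports many enzymes and is not reproduced here. It forms peptides by joining up to missed_cleavages + 1 consecutive cleavage products and applies the minimum and maximum length, with a maximum of 0 meaning no upper limit as documented for peptide_max_size_. The `digest` function is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified trypsin-like digestion; a stand-in for EnzymaticDigestion, not its implementation.
std::vector<std::string> digest(const std::string& protein, std::size_t missed_cleavages,
                                std::size_t min_size, std::size_t max_size) {
  // Boundaries: protein start, every cleavage site, protein end.
  std::vector<std::size_t> cuts{0};
  for (std::size_t i = 0; i + 1 < protein.size(); ++i) {
    const char c = protein[i];
    if ((c == 'K' || c == 'R') && protein[i + 1] != 'P') cuts.push_back(i + 1);
  }
  cuts.push_back(protein.size());

  std::vector<std::string> peptides;
  for (std::size_t b = 0; b + 1 < cuts.size(); ++b) {
    // e - b consecutive products joined means e - b - 1 missed cleavages.
    for (std::size_t e = b + 1; e < cuts.size() && e - b <= missed_cleavages + 1; ++e) {
      const std::size_t len = cuts[e] - cuts[b];
      if (len < min_size || (max_size != 0 && len > max_size)) continue;
      peptides.push_back(protein.substr(cuts[b], len));
    }
  }
  return peptides;
}
```

With "GASPKLLRPSTRDE", the R-P bond is not cleaved, so zero missed cleavages yields GASPK, LLRPSTR, and DE; allowing one missed cleavage adds GASPKLLRPSTR and LLRPSTRDE.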
std::string peptide_motif_ | protected |
Optional regex motif; only peptides matching are considered.
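A possible reading of the motif filter is sketched below with std::regex. Two details are assumptions rather than documented behaviour: that a peptide passes when the motif occurs anywhere in it (std::regex_search rather than a whole-sequence match), and that an empty motif disables the filter. `filterByMotif` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Keep peptides in which the motif occurs; an empty motif keeps everything.
// Constructing std::regex throws std::regex_error if the motif is not a valid pattern.
std::vector<std::string> filterByMotif(const std::vector<std::string>& peptides, const std::string& motif) {
  if (motif.empty()) return peptides;
  const std::regex re(motif);
  std::vector<std::string> kept;
  for (const auto& p : peptides)
    if (std::regex_search(p, re)) kept.push_back(p);
  return kept;
}
```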
IntList precursor_isotopes_ | protected |
Allowed precursor isotope offsets (0 = monoisotopic peak, 1 = first 13C isotope peak at about +1.003 Da, etc.).
double precursor_mass_tolerance_ | protected |
Precursor mass tolerance (value); unit in precursor_mass_tolerance_unit_.
std::string precursor_mass_tolerance_unit_ | protected |
"ppm" or "Da"
Size precursor_max_charge_ | protected |
Maximum precursor charge considered.
Size precursor_min_charge_ | protected |
Minimum precursor charge considered.
Size report_top_hits_ | protected |
Number of top-scoring PSMs reported per spectrum.