OpenMS
SimpleSearchEngineAlgorithm Class Reference

Minimal in-memory peptide-spectrum search engine. More...

#include <OpenMS/ANALYSIS/ID/SimpleSearchEngineAlgorithm.h>

Inherits from DefaultParamHandler and ProgressLogger.

Classes

struct  AnnotatedHit_
 Compact internal record for one scored peptide candidate against one spectrum. More...
 

Public Types

enum class  ExitCodes {
  EXECUTION_OK , INPUT_FILE_EMPTY , UNEXPECTED_RESULT , UNKNOWN_ERROR ,
  ILLEGAL_PARAMETERS
}
 Outcome of search(), distinguishing recoverable input issues from execution errors. More...
 
- Public Types inherited from ProgressLogger
enum  LogType { CMD , GUI , NONE }
 Possible log types. More...
 

Public Member Functions

 SimpleSearchEngineAlgorithm ()
 Default constructor; installs the search parameters (see class docs)
 
ExitCodes search (const std::string &in_spectra, const std::string &in_db, std::vector< ProteinIdentification > &prot_ids, PeptideIdentificationList &pep_ids) const
 Search the MS2 spectra in in_spectra against the protein database in in_db.
 
- Public Member Functions inherited from DefaultParamHandler
 DefaultParamHandler (const std::string &name)
 Constructor with name that is displayed in error messages.
 
 DefaultParamHandler (const DefaultParamHandler &rhs)
 Copy constructor.
 
virtual ~DefaultParamHandler ()
 Destructor.
 
DefaultParamHandler & operator= (const DefaultParamHandler &rhs)
 Assignment operator.
 
virtual bool operator== (const DefaultParamHandler &rhs) const
 Equality operator.
 
void setParameters (const Param &param)
 Sets the parameters.
 
const Param & getParameters () const
 Non-mutable access to the parameters.
 
const Param & getDefaults () const
 Non-mutable access to the default parameters.
 
const std::string & getName () const
 Non-mutable access to the name.
 
void setName (const std::string &name)
 Mutable access to the name.
 
const std::vector< std::string > & getSubsections () const
 Non-mutable access to the registered subsections.
 
- Public Member Functions inherited from ProgressLogger
 ProgressLogger ()
 Constructor.
 
virtual ~ProgressLogger ()
 Destructor.
 
 ProgressLogger (const ProgressLogger &other)
 Copy constructor.
 
ProgressLogger & operator= (const ProgressLogger &other)
 Assignment Operator.
 
void setLogType (LogType type) const
 Sets the progress log that should be used. The default type is NONE!
 
LogType getLogType () const
 Returns the type of progress log being used.
 
void setLogger (ProgressLoggerImpl *logger)
 Sets the logger to be used for progress logging.
 
void startProgress (SignedSize begin, SignedSize end, const std::string &label) const
 Initializes the progress display.
 
void setProgress (SignedSize value) const
 Sets the current progress.
 
void endProgress (UInt64 bytes_processed=0) const
 
void nextProgress () const
 Increments the progress by 1 (within the range begin-end)
 

Protected Member Functions

void updateMembers_ () override
 This method is used to update extra member variables at the end of the setParameters() method.
 
void postProcessHits_ (const PeakMap &exp, std::vector< std::vector< SimpleSearchEngineAlgorithm::AnnotatedHit_ > > &annotated_hits, std::vector< ProteinIdentification > &protein_ids, PeptideIdentificationList &peptide_ids, Size top_hits, const ModifiedPeptideGenerator::MapToResidueType &fixed_modifications, const ModifiedPeptideGenerator::MapToResidueType &variable_modifications, Size max_variable_mods_per_peptide, const StringList &modifications_fixed, const StringList &modifications_variable, Int peptide_missed_cleavages, double precursor_mass_tolerance, double fragment_mass_tolerance, const std::string &precursor_mass_tolerance_unit_ppm, const std::string &fragment_mass_tolerance_unit_ppm, const Int precursor_min_charge, const Int precursor_max_charge, const std::string &enzyme, const std::string &database_name) const
 Materialise top-N scored candidates per spectrum into PeptideHit / ProteinIdentification objects.
 
- Protected Member Functions inherited from DefaultParamHandler
void defaultsToParam_ ()
 Updates the parameters after the defaults have been set in the constructor.
 

Static Protected Member Functions

static void preprocessSpectra_ (PeakMap &exp, double fragment_mass_tolerance, bool fragment_mass_tolerance_unit_ppm)
 Preprocess MS2 spectra in place: filter, deisotope, decharge.
 

Protected Attributes

double precursor_mass_tolerance_
 Precursor mass tolerance (value); unit in precursor_mass_tolerance_unit_.
 
std::string precursor_mass_tolerance_unit_
 "ppm" or "Da"
 
Size precursor_min_charge_
 Minimum precursor charge considered.
 
Size precursor_max_charge_
 Maximum precursor charge considered.
 
IntList precursor_isotopes_
 Allowed precursor isotope offsets (0 = monoisotopic, 1 = +1 Da, etc.)
 
double fragment_mass_tolerance_
 Fragment mass tolerance (value); unit in fragment_mass_tolerance_unit_.
 
std::string fragment_mass_tolerance_unit_
 "ppm" or "Da"
 
StringList modifications_fixed_
 UniMod names of fixed modifications.
 
StringList modifications_variable_
 UniMod names of variable modifications.
 
Size modifications_max_variable_mods_per_peptide_
 Cap on simultaneous variable modifications per peptide.
 
std::string enzyme_
 Enzyme name as recognised by EnzymaticDigestion.
 
bool decoys_
 If true, generate target/decoy results.
 
double fdr_psm_
 q-value threshold for PSM filtering (0 = disabled); requires decoys_
 
StringList annotate_psm_
 PSM meta-value annotations to add (see annotate:PSM defaults)
 
Size peptide_min_size_
 Minimum peptide length after digestion.
 
Size peptide_max_size_
 Maximum peptide length after digestion (0 = unlimited)
 
Size peptide_missed_cleavages_
 Allowed missed cleavages in digestion.
 
EnzymaticDigestion::Specificity peptide_enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL}
 full / semi / none
 
std::string peptide_motif_
 Optional regex motif; only peptides matching are considered.
 
Size report_top_hits_
 Number of top-scoring PSMs reported per spectrum.
 
- Protected Attributes inherited from DefaultParamHandler
Param param_
 Container for current parameters.
 
Param defaults_
 Container for default parameters. This member should be filled in the constructor of derived classes!
 
std::vector< std::string > subsections_
 Container for registered subsections. This member should be filled in the constructor of derived classes!
 
std::string error_name_
 Name that is displayed in error messages during the parameter checking.
 
bool check_defaults_
 If this member is set to false, no parameter checking is done.
 
bool warn_empty_defaults_
 If this member is set to false, no warning is emitted when defaults are empty.
 
- Protected Attributes inherited from ProgressLogger
LogType type_
 
time_t last_invoke_
 
ProgressLoggerImpl * current_logger_
 

Additional Inherited Members

- Static Public Member Functions inherited from DefaultParamHandler
static void writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const std::string &key_prefix="")
 Writes all parameters to meta values.
 
- Static Protected Attributes inherited from ProgressLogger
static int recursion_depth_
 

Detailed Description

Minimal in-memory peptide-spectrum search engine.

Searches MS2 spectra against a protein FASTA database and produces protein- and peptide-level identifications. Designed as a self-contained reference / teaching implementation; it is not intended as a feature-complete replacement for external engines such as MSGF+, Comet, Sage, or MSFragger.

The pipeline run by search():

  1. Load spectra from in_spectra and the protein database from in_db.
  2. Preprocess MS2 spectra (filtering, deisotoping, decharging) via preprocessSpectra_() using the configured fragment tolerance.
  3. In-silico digest each protein with the configured enzyme/specificity/missed cleavages, generate all combinatorial fixed/variable modification variants (capped at modifications:variable_max_per_peptide), and score them against MS2 spectra whose precursor m/z matches the candidate peptide mass within precursor:mass_tolerance (with optional 1/-1 isotope correction per precursor:isotopes).
  4. Collect the top-N scoring candidates per spectrum (report:top_hits), optionally generate target/decoy results (decoys), filter by FDR (FDR:PSM, q-value) and annotate PSMs (annotate:PSM) in postProcessHits_().
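
The precursor filter in step 3 can be illustrated with a small standalone sketch. It is not the OpenMS implementation: the function names, the constants, and the choice to scale the ppm window by the candidate mass are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumed constants (not taken from the OpenMS sources).
constexpr double kProtonMass = 1.007276466;  // Da
constexpr double kC13C12Diff = 1.0033548;    // Da, spacing between isotope peaks

// Neutral mass of a precursor observed at `mz` with charge `z`.
double neutralMass(double mz, int z)
{
  assert(z > 0);
  return mz * z - z * kProtonMass;
}

// True if `candidate_mass` (neutral, monoisotopic) explains the precursor
// within `tol` for at least one of the allowed isotope offsets.
bool precursorMatches(double mz, int z, double candidate_mass,
                      double tol, bool tol_is_ppm,
                      const std::vector<int>& isotope_offsets)
{
  const double window = tol_is_ppm ? candidate_mass * tol * 1e-6 : tol;
  for (int iso : isotope_offsets)
  {
    // Undo the assumed isotope mis-assignment before comparing.
    const double corrected = neutralMass(mz, z) - iso * kC13C12Diff;
    if (std::fabs(corrected - candidate_mass) <= window) return true;
  }
  return false;
}
```

With offsets {0} only the monoisotopic assignment is tested; adding 1 also accepts a precursor that was picked on the first isotope peak.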

Configuration is exposed through the DefaultParamHandler base — see the defaults installed by the constructor for the full list of supported keys (precursor/fragment tolerances and units, charge range, modifications, enzyme, peptide size/motif/missed-cleavage filters, FDR threshold, top-hits, etc.).

Member Enumeration Documentation

◆ ExitCodes

enum class ExitCodes
strong

Outcome of search(), distinguishing recoverable input issues from execution errors.

Enumerator
EXECUTION_OK 

Search completed; prot_ids and pep_ids contain the result.

INPUT_FILE_EMPTY 

Spectrum input contained no usable MS2 spectra after loading/filtering.

UNEXPECTED_RESULT 

Internal post-condition violated (e.g. no candidates scored at all)

UNKNOWN_ERROR 

Caught a generic exception; details written to the log.

ILLEGAL_PARAMETERS 

Configuration is internally inconsistent or unsupported.
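
A caller would typically branch on the returned status. The sketch below uses a local stand-in enum that mirrors the enumerators listed above, so that it compiles without OpenMS; describe() and isInputProblem() are hypothetical helpers, and the grouping of codes into "input problems" is an assumption rather than documented behaviour.

```cpp
#include <string>

// Local stand-in mirroring the enumerators above; real code would use
// SimpleSearchEngineAlgorithm::ExitCodes instead.
enum class ExitCodes
{
  EXECUTION_OK, INPUT_FILE_EMPTY, UNEXPECTED_RESULT, UNKNOWN_ERROR, ILLEGAL_PARAMETERS
};

// Hypothetical helper: short message for a status code.
std::string describe(ExitCodes code)
{
  switch (code)
  {
    case ExitCodes::EXECUTION_OK:       return "search completed";
    case ExitCodes::INPUT_FILE_EMPTY:   return "no usable MS2 spectra in input";
    case ExitCodes::UNEXPECTED_RESULT:  return "internal post-condition violated";
    case ExitCodes::UNKNOWN_ERROR:      return "generic exception, see log";
    case ExitCodes::ILLEGAL_PARAMETERS: return "inconsistent or unsupported configuration";
  }
  return "unrecognised status";
}

// Assumed policy: these two codes can be fixed by the caller
// (different input file or corrected parameters).
bool isInputProblem(ExitCodes code)
{
  return code == ExitCodes::INPUT_FILE_EMPTY || code == ExitCodes::ILLEGAL_PARAMETERS;
}
```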

Constructor & Destructor Documentation

◆ SimpleSearchEngineAlgorithm()

Default constructor; installs the search parameters (see class docs)

Member Function Documentation

◆ postProcessHits_()

void postProcessHits_ ( const PeakMap &  exp,
std::vector< std::vector< SimpleSearchEngineAlgorithm::AnnotatedHit_ > > &  annotated_hits,
std::vector< ProteinIdentification > &  protein_ids,
PeptideIdentificationList &  peptide_ids,
Size  top_hits,
const ModifiedPeptideGenerator::MapToResidueType &  fixed_modifications,
const ModifiedPeptideGenerator::MapToResidueType &  variable_modifications,
Size  max_variable_mods_per_peptide,
const StringList &  modifications_fixed,
const StringList &  modifications_variable,
Int  peptide_missed_cleavages,
double  precursor_mass_tolerance,
double  fragment_mass_tolerance,
const std::string &  precursor_mass_tolerance_unit_ppm,
const std::string &  fragment_mass_tolerance_unit_ppm,
const Int  precursor_min_charge,
const Int  precursor_max_charge,
const std::string &  enzyme,
const std::string &  database_name 
) const
protected

Materialise top-N scored candidates per spectrum into PeptideHit / ProteinIdentification objects.

Converts the in-memory AnnotatedHit_ records produced by the scoring loop into first-class identification objects: re-applies the modification variant indicated by peptide_mod_index, annotates PSM meta values (per annotate:PSM), populates protein references, and stamps search-engine settings (tolerances, modifications, enzyme, etc.) onto the resulting ProteinIdentification so the output is self-describing.

Most parameters mirror the algorithm's own configuration and are passed in explicitly so this routine can also be reused outside member context.

Parameters
[in]  exp  Preprocessed spectra used as scoring input.
[in]  annotated_hits  Per-spectrum vectors of scored candidates (already top-N filtered).
[out]  protein_ids  Protein identifications to populate.
[out]  peptide_ids  Peptide-spectrum matches to populate.
[in]  top_hits  Maximum number of PSMs per spectrum to materialise.
[in]  fixed_modifications  Resolved fixed-modification table.
[in]  variable_modifications  Resolved variable-modification table.
[in]  max_variable_mods_per_peptide  Cap on simultaneous variable modifications.
[in]  modifications_fixed  UniMod names of fixed modifications (for the ID metadata stamp).
[in]  modifications_variable  UniMod names of variable modifications (for the ID metadata stamp).
[in]  peptide_missed_cleavages  Allowed missed cleavages (for the ID metadata stamp).
[in]  precursor_mass_tolerance  Precursor mass tolerance value.
[in]  fragment_mass_tolerance  Fragment mass tolerance value.
[in]  precursor_mass_tolerance_unit_ppm  "ppm" or "Da"; recorded in the ID metadata.
[in]  fragment_mass_tolerance_unit_ppm  "ppm" or "Da"; recorded in the ID metadata.
[in]  precursor_min_charge  Minimum precursor charge considered.
[in]  precursor_max_charge  Maximum precursor charge considered.
[in]  enzyme  Enzyme name (for the ID metadata stamp).
[in]  database_name  FASTA database path/name (for the ID metadata stamp).
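
The top-N reduction that annotated_hits has already undergone before this call can be pictured as follows. This is a minimal sketch under stated assumptions: Hit is a hypothetical, simplified stand-in for AnnotatedHit_, and keepTopHits() is not an OpenMS function.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for AnnotatedHit_.
struct Hit
{
  std::size_t peptide_index;  // which candidate peptide was scored
  double score;               // higher is better
};

// Keep only the `top_hits` best-scoring candidates of each spectrum,
// ordered from best to worst.
void keepTopHits(std::vector<std::vector<Hit>>& hits_per_spectrum, std::size_t top_hits)
{
  for (auto& hits : hits_per_spectrum)
  {
    const std::size_t n = std::min(top_hits, hits.size());
    const auto middle = hits.begin() + static_cast<std::ptrdiff_t>(n);
    // partial_sort only orders the first n elements, which is all we keep.
    std::partial_sort(hits.begin(), middle, hits.end(),
                      [](const Hit& a, const Hit& b) { return a.score > b.score; });
    hits.erase(middle, hits.end());
  }
}
```

Using std::partial_sort instead of a full sort avoids ordering candidates that are discarded anyway, which matters when many peptides match one spectrum.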

◆ preprocessSpectra_()

static void preprocessSpectra_ ( PeakMap &  exp,
double  fragment_mass_tolerance,
bool  fragment_mass_tolerance_unit_ppm 
)
static protected

Preprocess MS2 spectra in place: filter, deisotope, decharge.

Applies the standard search-engine spectrum normalisation used before scoring: removal of low-quality peaks, charge-state deconvolution, and deisotoping, with the supplied fragment tolerance used for the latter two.

Parameters
[in,out]  exp  Spectra to preprocess in place.
[in]  fragment_mass_tolerance  Tolerance for deisotoping and decharging.
[in]  fragment_mass_tolerance_unit_ppm  If true, fragment_mass_tolerance is ppm; otherwise Th.
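
To show how the tolerance and its unit flag enter deisotoping and decharging, here is a standalone sketch. It is not the routine OpenMS uses internally; the helper names and constants are assumptions, and real deisotoping also has to consider peak intensities and longer isotope chains.

```cpp
#include <cassert>
#include <cmath>

// Assumed constants (not taken from the OpenMS sources).
constexpr double kProtonMass = 1.007276466;  // Da
constexpr double kC13C12Diff = 1.0033548;    // Da

// m/z distance between neighbouring isotope peaks of an ion with charge z.
double isotopeSpacing(int z)
{
  assert(z > 0);
  return kC13C12Diff / z;
}

// True if `mz_b` sits one isotope step above `mz_a` for charge `z`.
// With unit_ppm the window scales with m/z; otherwise `tol` is absolute.
bool isIsotopePair(double mz_a, double mz_b, int z, double tol, bool unit_ppm)
{
  const double expected = mz_a + isotopeSpacing(z);
  const double window = unit_ppm ? expected * tol * 1e-6 : tol;
  return std::fabs(mz_b - expected) <= window;
}

// "Decharging": the m/z the same ion would show with a single charge.
double toSinglyCharged(double mz, int z)
{
  assert(z > 0);
  return mz * z - (z - 1) * kProtonMass;
}
```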

◆ search()

ExitCodes search ( const std::string &  in_spectra,
const std::string &  in_db,
std::vector< ProteinIdentification > &  prot_ids,
PeptideIdentificationList &  pep_ids 
) const

Search the MS2 spectra in in_spectra against the protein database in in_db.

Spectra and database are loaded from disk; the result is written into the two output arguments. Existing contents of prot_ids and pep_ids are not cleared by this call. The current parameter set (see the class brief) controls tolerances, modifications, enzyme, FDR, etc.

Parameters
[in]  in_spectra  Path to the spectrum input (mzML or any format readable by FileHandler).
[in]  in_db  Path to the protein FASTA database to search against.
[out]  prot_ids  Protein identifications produced by the search (one run per call).
[out]  pep_ids  Peptide-spectrum matches (PSMs) produced by the search.
Returns
Status code; see ExitCodes.

◆ updateMembers_()

void updateMembers_ ( )
override protected virtual

This method is used to update extra member variables at the end of the setParameters() method.

Also call it at the end of the derived classes' copy constructor and assignment operator.

The default implementation is empty.

Reimplemented from DefaultParamHandler.

Member Data Documentation

◆ annotate_psm_

StringList annotate_psm_
protected

PSM meta-value annotations to add (see annotate:PSM defaults)

◆ decoys_

bool decoys_
protected

If true, generate target/decoy results.

◆ enzyme_

std::string enzyme_
protected

Enzyme name as recognised by EnzymaticDigestion.

◆ fdr_psm_

double fdr_psm_
protected

q-value threshold for PSM filtering (0 = disabled); requires decoys_
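
For readers unfamiliar with q-values, the following sketch shows one common target-decoy estimate. It is an assumption that this matches the exact formula used by OpenMS; Psm, computeQValues() and passesFdr() are hypothetical names introduced here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical minimal PSM record.
struct Psm
{
  double score;   // higher is better
  bool is_decoy;  // true if the match is to a decoy sequence
};

// Sorts `psms` by descending score and returns q-values aligned with that order.
std::vector<double> computeQValues(std::vector<Psm>& psms)
{
  std::sort(psms.begin(), psms.end(),
            [](const Psm& a, const Psm& b) { return a.score > b.score; });

  std::vector<double> q(psms.size());
  std::size_t targets = 0;
  std::size_t decoys = 0;
  for (std::size_t i = 0; i < psms.size(); ++i)
  {
    if (psms[i].is_decoy) ++decoys; else ++targets;
    // FDR estimate when accepting everything down to this score.
    q[i] = targets == 0 ? 1.0
         : std::min(1.0, static_cast<double>(decoys) / static_cast<double>(targets));
  }
  // q-value: the smallest FDR at which this PSM would still be accepted.
  for (std::size_t i = psms.size(); i > 1; --i)
  {
    q[i - 2] = std::min(q[i - 2], q[i - 1]);
  }
  return q;
}

// Follows the documented convention that a threshold of 0 disables filtering.
bool passesFdr(double q_value, double threshold)
{
  return threshold == 0.0 || q_value <= threshold;
}
```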

◆ fragment_mass_tolerance_

double fragment_mass_tolerance_
protected

Fragment mass tolerance (value); unit in fragment_mass_tolerance_unit_.

◆ fragment_mass_tolerance_unit_

std::string fragment_mass_tolerance_unit_
protected

"ppm" or "Da"

◆ modifications_fixed_

StringList modifications_fixed_
protected

UniMod names of fixed modifications.

◆ modifications_max_variable_mods_per_peptide_

Size modifications_max_variable_mods_per_peptide_
protected

Cap on simultaneous variable modifications per peptide.
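
The effect of this cap on the number of generated variants can be seen in a small enumeration sketch. It only tracks which candidate sites are modified and is an illustrative assumption, not the algorithm used by ModifiedPeptideGenerator; enumerateVariants() is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Enumerate every choice of modified sites that uses at most `max_mods` sites.
// `sites` holds the residue positions that could carry a variable modification.
std::vector<std::vector<std::size_t>> enumerateVariants(
    const std::vector<std::size_t>& sites, std::size_t max_mods)
{
  // Start with a single empty selection: the unmodified peptide.
  std::vector<std::vector<std::size_t>> variants(1);
  for (std::size_t site : sites)
  {
    const std::size_t existing = variants.size();
    for (std::size_t i = 0; i < existing; ++i)
    {
      if (variants[i].size() < max_mods)
      {
        // Copy first: push_back below may reallocate `variants`.
        std::vector<std::size_t> extended = variants[i];
        extended.push_back(site);
        variants.push_back(std::move(extended));
      }
    }
  }
  return variants;
}
```

For three candidate sites, a cap of 2 yields 1 + 3 + 3 = 7 variants instead of the 8 possible without a cap; the saving grows quickly with the number of sites.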

◆ modifications_variable_

StringList modifications_variable_
protected

UniMod names of variable modifications.

◆ peptide_enzyme_specificity_

EnzymaticDigestion::Specificity peptide_enzyme_specificity_ {EnzymaticDigestion::SPEC_FULL}
protected

full / semi / none

◆ peptide_max_size_

Size peptide_max_size_
protected

Maximum peptide length after digestion (0 = unlimited)

◆ peptide_min_size_

Size peptide_min_size_
protected

Minimum peptide length after digestion.

◆ peptide_missed_cleavages_

Size peptide_missed_cleavages_
protected

Allowed missed cleavages in digestion.
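
How missed cleavages and the size limits interact during digestion is sketched below. The cleavage rule is a simplified trypsin rule (cleave after K or R unless followed by P) chosen as an assumption for illustration; digest() is a hypothetical function and does not reproduce EnzymaticDigestion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Digest `protein` with a simplified trypsin rule and return all peptides
// with up to `missed_cleavages` missed sites. max_size == 0 means unlimited.
std::vector<std::string> digest(const std::string& protein,
                                std::size_t missed_cleavages,
                                std::size_t min_size,
                                std::size_t max_size)
{
  std::vector<std::string> peptides;
  if (protein.empty()) return peptides;

  // Start positions of the fully cleaved fragments, plus an end sentinel.
  std::vector<std::size_t> cuts{0};
  for (std::size_t i = 0; i + 1 < protein.size(); ++i)
  {
    const bool basic = protein[i] == 'K' || protein[i] == 'R';
    if (basic && protein[i + 1] != 'P') cuts.push_back(i + 1);
  }
  cuts.push_back(protein.size());

  for (std::size_t s = 0; s + 1 < cuts.size(); ++s)
  {
    // Joining k + 1 neighbouring fragments corresponds to k missed cleavages.
    for (std::size_t k = 0; k <= missed_cleavages && s + 1 + k < cuts.size(); ++k)
    {
      const std::size_t len = cuts[s + 1 + k] - cuts[s];
      if (len >= min_size && (max_size == 0 || len <= max_size))
      {
        peptides.push_back(protein.substr(cuts[s], len));
      }
    }
  }
  return peptides;
}
```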

◆ peptide_motif_

std::string peptide_motif_
protected

Optional regex motif; only peptides matching are considered.

◆ precursor_isotopes_

IntList precursor_isotopes_
protected

Allowed precursor isotope offsets (0 = monoisotopic, 1 = +1 Da, etc.)

◆ precursor_mass_tolerance_

double precursor_mass_tolerance_
protected

Precursor mass tolerance (value); unit in precursor_mass_tolerance_unit_.

◆ precursor_mass_tolerance_unit_

std::string precursor_mass_tolerance_unit_
protected

"ppm" or "Da"

◆ precursor_max_charge_

Size precursor_max_charge_
protected

Maximum precursor charge considered.

◆ precursor_min_charge_

Size precursor_min_charge_
protected

Minimum precursor charge considered.

◆ report_top_hits_

Size report_top_hits_
protected

Number of top-scoring PSMs reported per spectrum.