OpenMS
2.5.0
Collection of functions for filtering peptide and protein identifications. More...
#include <OpenMS/FILTERING/ID/IDFilter.h>
Classes | |
struct | DigestionFilter |
Is peptide evidence digestion product of some protein. More... | |
struct | GetMatchingItems |
Builds a map index of data that have a String index to find matches and return the objects. More... | |
struct | HasDecoyAnnotation |
Is this a decoy hit? More... | |
struct | HasGoodScore |
Is the score of this hit at least as good as the given value? More... | |
struct | HasMatchingAccession |
Given a list of protein accessions, do any occur in the annotation(s) of this hit? More... | |
struct | HasMatchingAccessionUnordered |
Given a list of protein accessions, do any occur in the annotation(s) of this hit? More... | |
struct | HasMaxMetaValue |
Does a meta value of this hit have at most the given value? More... | |
struct | HasMaxRank |
Is the rank of this hit below or at the given cut-off? More... | |
struct | HasMetaValue |
Is a meta value with given key and value set on this hit? More... | |
struct | HasNoHits |
Is the list of hits of this peptide/protein ID empty? More... | |
class | PeptideDigestionFilter |
Filter Peptide Hit by its digestion product. More... | |
Public Types | |
typedef std::map< Int, PeptideHit * > | ChargeToPepHitP |
Typedefs. More... | |
typedef std::unordered_map< std::string, ChargeToPepHitP > | SequenceToChargeToPepHitP |
typedef std::map< std::string, SequenceToChargeToPepHitP > | RunToSequenceToChargeToPepHitP |
Public Member Functions | |
IDFilter () | |
Constructor. More... | |
virtual | ~IDFilter () |
Destructor. More... | |
Static Public Member Functions | |
Higher-order filter functions | |
Functions for filtering a container based on a predicate | |
template<class Container , class Predicate > | |
static void | removeMatchingItems (Container &items, const Predicate &pred) |
Remove items that satisfy a condition from a container (e.g. vector) More... | |
template<class Container , class Predicate > | |
static void | keepMatchingItems (Container &items, const Predicate &pred) |
Keep items that satisfy a condition in a container (e.g. vector), removing all others. More... | |
template<class IDContainer , class Predicate > | |
static void | removeMatchingItemsUnroll (IDContainer &items, const Predicate &pred) |
Remove Hit items that satisfy a condition in one of our ID containers (e.g. vector of Peptide or ProteinIDs) More... | |
template<class IDContainer , class Predicate > | |
static void | keepMatchingItemsUnroll (IDContainer &items, const Predicate &pred) |
Keep Hit items that satisfy a condition in one of our ID containers (e.g. vector of Peptide or ProteinIDs) More... | |
template<class MapType , class Predicate > | |
static void | keepMatchingPeptideHits (MapType &prot_and_pep_ids, Predicate &pred) |
template<class MapType , class Predicate > | |
static void | removeMatchingPeptideHits (MapType &prot_and_pep_ids, Predicate &pred) |
Helper functions | |
template<class IdentificationType > | |
static Size | countHits (const std::vector< IdentificationType > &ids) |
Returns the total number of peptide/protein hits in a vector of peptide/protein identifications. More... | |
template<class IdentificationType > | |
static bool | getBestHit (const std::vector< IdentificationType > &identifications, bool assume_sorted, typename IdentificationType::HitType &best_hit) |
Finds the best-scoring hit in a vector of peptide or protein identifications. More... | |
static void | extractPeptideSequences (const std::vector< PeptideIdentification > &peptides, std::set< String > &sequences, bool ignore_mods=false) |
Extracts all unique peptide sequences from a list of peptide IDs. More... | |
template<class EvidenceFilter > | |
static void | FilterPeptideEvidences (EvidenceFilter &filter, std::vector< PeptideIdentification > &peptides) |
Removes peptide evidences based on a filter. More... | |
Clean-up functions | |
template<class IdentificationType > | |
static void | updateHitRanks (std::vector< IdentificationType > &ids) |
Updates the hit ranks on all peptide or protein IDs. More... | |
static void | removeUnreferencedProteins (std::vector< ProteinIdentification > &proteins, const std::vector< PeptideIdentification > &peptides) |
Removes protein hits from proteins that are not referenced by a peptide in peptides . More... | |
static void | updateProteinReferences (std::vector< PeptideIdentification > &peptides, const std::vector< ProteinIdentification > &proteins, bool remove_peptides_without_reference=false) |
Removes references to missing proteins. More... | |
static void | updateProteinReferences (ConsensusMap &cmap, bool remove_peptides_without_reference=false) |
Removes references to missing proteins. More... | |
static bool | updateProteinGroups (std::vector< ProteinIdentification::ProteinGroup > &groups, const std::vector< ProteinHit > &hits) |
Update protein groups after protein hits were filtered. More... | |
Filter functions for peptide or protein IDs | |
template<class IdentificationType > | |
static void | removeEmptyIdentifications (std::vector< IdentificationType > &ids) |
Removes peptide or protein identifications that have no hits in them. More... | |
template<class IdentificationType > | |
static void | filterHitsByScore (std::vector< IdentificationType > &ids, double threshold_score) |
Filters peptide or protein identifications according to the score of the hits. More... | |
template<class IdentificationType > | |
static void | filterHitsByScore (IdentificationType &id, double threshold_score) |
Filters peptide or protein identifications according to the score of the hits. More... | |
template<class IdentificationType > | |
static void | keepNBestHits (std::vector< IdentificationType > &ids, Size n) |
Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID. More... | |
template<class IdentificationType > | |
static void | filterHitsByRank (std::vector< IdentificationType > &ids, Size min_rank, Size max_rank) |
Filters peptide or protein identifications according to the ranking of the hits. More... | |
template<class IdentificationType > | |
static void | removeDecoyHits (std::vector< IdentificationType > &ids) |
Removes hits annotated as decoys from peptide or protein identifications. More... | |
template<class IdentificationType > | |
static void | removeHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > accessions) |
Filters peptide or protein identifications according to the given proteins (negative). More... | |
template<class IdentificationType > | |
static void | keepHitsMatchingProteins (std::vector< IdentificationType > &ids, const std::set< String > &accessions) |
Filters peptide or protein identifications according to the given proteins (positive). More... | |
Filter functions for peptide IDs only | |
static void | keepBestPeptideHits (std::vector< PeptideIdentification > &peptides, bool strict=false) |
Filters peptide identifications keeping only the single best-scoring hit per ID. More... | |
static void | filterPeptidesByLength (std::vector< PeptideIdentification > &peptides, Size min_length, Size max_length=UINT_MAX) |
Filters peptide identifications according to peptide sequence length. More... | |
static void | filterPeptidesByCharge (std::vector< PeptideIdentification > &peptides, Int min_charge, Int max_charge) |
Filters peptide identifications according to charge state. More... | |
static void | filterPeptidesByRT (std::vector< PeptideIdentification > &peptides, double min_rt, double max_rt) |
Filters peptide identifications by precursor RT, keeping only IDs in the given range. More... | |
static void | filterPeptidesByMZ (std::vector< PeptideIdentification > &peptides, double min_mz, double max_mz) |
Filters peptide identifications by precursor m/z, keeping only IDs in the given range. More... | |
static void | filterPeptidesByMZError (std::vector< PeptideIdentification > &peptides, double mass_error, bool unit_ppm) |
Filter peptide identifications according to mass deviation. More... | |
template<class Filter > | |
static void | filterPeptideEvidences (Filter &filter, std::vector< PeptideIdentification > &peptides) |
Digests a collection of proteins and filters PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence. More... | |
static void | filterPeptidesByRTPredictPValue (std::vector< PeptideIdentification > &peptides, const String &metavalue_key, double threshold=0.05) |
Filters peptide identifications according to p-values from RTPredict. More... | |
static void | removePeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications) |
Removes all peptide hits that have at least one of the given modifications. More... | |
static void | keepPeptidesWithMatchingModifications (std::vector< PeptideIdentification > &peptides, const std::set< String > &modifications) |
Keeps only peptide hits that have at least one of the given modifications. More... | |
static void | removePeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &bad_peptides, bool ignore_mods=false) |
Removes all peptide hits with a sequence that matches one in bad_peptides . More... | |
static void | keepPeptidesWithMatchingSequences (std::vector< PeptideIdentification > &peptides, const std::vector< PeptideIdentification > &good_peptides, bool ignore_mods=false) |
Removes all peptide hits with a sequence that does not match one in good_peptides . More... | |
static void | keepUniquePeptidesPerProtein (std::vector< PeptideIdentification > &peptides) |
Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer) More... | |
static void | removeDuplicatePeptideHits (std::vector< PeptideIdentification > &peptides, bool seq_only=false) |
Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID). More... | |
Filter functions for MS/MS experiments | |
static void | filterHitsByScore (PeakMap &experiment, double peptide_threshold_score, double protein_threshold_score) |
Filters an MS/MS experiment according to score thresholds. More... | |
static void | keepNBestHits (PeakMap &experiment, Size n) |
Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum. More... | |
template<class MapType > | |
static void | keepNBestPeptideHits (MapType &map, Size n) |
Filters a Consensus/FeatureMap by keeping the N best peptide hits for every spectrum. More... | |
template<class MapType > | |
static void | removeEmptyIdentifications (MapType &prot_and_pep_ids) |
static void | keepBestPerPeptide (std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence. More... | |
static void | keepBestPerPeptidePerRun (std::vector< ProteinIdentification > &prot_ids, std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
template<class MapType > | |
static void | keepBestPerPeptidePerRun (MapType &prot_and_pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
static void | annotateBestPerPeptidePerRun (const std::vector< ProteinIdentification > &prot_ids, std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
static void | annotateBestPerPeptidePerRunWithData (RunToSequenceToChargeToPepHitP &best_peps_per_run, std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
static void | annotateBestPerPeptide (std::vector< PeptideIdentification > &pep_ids, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
static void | annotateBestPerPeptideWithData (SequenceToChargeToPepHitP &best_pep, PeptideIdentification &pep, bool ignore_mods, bool ignore_charges, Size nr_best_spectrum) |
static void | keepHitsMatchingProteins (PeakMap &experiment, const std::vector< FASTAFile::FASTAEntry > &proteins) |
Filters an MS/MS experiment according to the given proteins. More... | |
Filter functions for class IdentificationData | |
static void | keepBestMatchPerQuery (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref) |
static void | filterQueryMatchesByScore (IdentificationData &id_data, IdentificationData::ScoreTypeRef score_ref, double cutoff) |
static void | removeDecoys (IdentificationData &id_data) |
Collection of functions for filtering peptide and protein identifications.
This class provides functions for filtering collections of peptide or protein identifications according to various criteria. It also contains helper functions and classes (functors that implement predicates) that are used in this context.
The filter functions modify their inputs, rather than creating filtered copies.
Most filters work on the hit level, i.e. they remove peptide or protein hits from peptide or protein identifications (IDs). A few filters work on the ID level instead, i.e. they remove peptide or protein IDs from vectors thereof. Independent of this, the inputs for all filter functions are vectors of IDs, because the data most often comes in this form. This design also allows many helper objects to be set up only once per vector, rather than once per ID.
The filter functions for vectors of peptide/protein IDs do not include clean-up steps (e.g. removal of IDs without hits, reassignment of hit ranks, ...). They only carry out their specific filtering operations. This is so filters can be chained without having to repeat clean-up operations. The group of clean-up functions provides helpers that are useful to ensure data integrity after filters have been applied, but it is up to the individual developer to use them when necessary.
The filter functions for MS/MS experiments do include clean-up steps, because they filter peptide and protein IDs in conjunction and potential contradictions between the two must be eliminated.
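The in-place, chainable design described above can be sketched with the erase-remove idiom. This is a minimal illustration, not the OpenMS implementation: `SketchHit`, `SketchID`, `removeMatching`, `keepMatching` and `filterThenCleanUp` are hypothetical stand-ins for the OpenMS types and for removeMatchingItems / keepMatchingItems.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for PeptideHit / PeptideIdentification (not the OpenMS types).
struct SketchHit { std::string sequence; double score; };
struct SketchID { std::vector<SketchHit> hits; };

// Remove all items for which pred(item) is true, modifying the container in place.
template <class Container, class Predicate>
void removeMatching(Container& items, const Predicate& pred)
{
  items.erase(std::remove_if(items.begin(), items.end(), pred), items.end());
}

// Keep only items for which pred(item) is true, modifying the container in place.
template <class Container, class Predicate>
void keepMatching(Container& items, const Predicate& pred)
{
  using Item = typename Container::value_type;
  removeMatching(items, [&pred](const Item& item) { return !pred(item); });
}

// Two hit-level filters are chained; the ID-level clean-up runs once at the end.
void filterThenCleanUp(std::vector<SketchID>& ids, double min_score, std::size_t min_length)
{
  for (SketchID& id : ids)
  {
    keepMatching(id.hits, [min_score](const SketchHit& h) { return h.score >= min_score; });
    keepMatching(id.hits, [min_length](const SketchHit& h) { return h.sequence.size() >= min_length; });
  }
  // Clean-up step: drop IDs that were left without any hits.
  removeMatching(ids, [](const SketchID& id) { return id.hits.empty(); });
}
```

Running the clean-up once after the whole chain, rather than inside each filter, is the point of the design: intermediate filters never pay for work that a later filter would invalidate.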
typedef std::map<Int, PeptideHit*> ChargeToPepHitP |
Typedefs.
typedef std::map<std::string, SequenceToChargeToPepHitP> RunToSequenceToChargeToPepHitP |
typedef std::unordered_map<std::string, ChargeToPepHitP> SequenceToChargeToPepHitP |
IDFilter | ( | ) |
Constructor.
|
virtual |
Destructor.
|
inlinestatic |
Annotates each PeptideHit of a PeptideIdentification if it is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering. Does not check run information and simply goes over all peptide IDs.
|
inlinestatic |
Annotates each PeptideHit of a PeptideIdentification if it is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering.
|
inlinestatic |
Annotates each PeptideHit of a PeptideIdentification if it is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering. To be used when a RunToSequenceToChargeToPepHitP map is already available.
|
inlinestatic |
Annotates each PeptideHit of a PeptideIdentification if it is the best peptide hit for its peptide sequence. Adds the meta value "bestForItsPeps", which can be used for additional filtering. Does not check run information and simply goes over all peptide IDs. To be used when a SequenceToChargeToPepHitP map is already available.
References PeptideHit::getCharge(), PeptideIdentification::getHits(), PeptideHit::getScore(), PeptideHit::getSequence(), PeptideIdentification::isHigherScoreBetter(), MetaInfoInterface::setMetaValue(), PeptideIdentification::sort(), AASequence::toString(), and AASequence::toUnmodifiedString().
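The bookkeeping behind these annotation functions can be sketched with a nested map shaped like SequenceToChargeToPepHitP. The sketch below makes several assumptions: `AnnotHit` is a hypothetical stand-in for PeptideHit, a plain bool replaces the "bestForItsPeps" meta value, hits live in one flat vector, `ignore_charges` is modelled by collapsing all charges onto one key, and `ignore_mods` and `nr_best_spectrum` are left out because their exact semantics are not described here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for PeptideHit; the flag replaces the "bestForItsPeps" meta value.
struct AnnotHit { std::string sequence; int charge; double score; bool best_for_its_peps; };

// Same shape as ChargeToPepHitP / SequenceToChargeToPepHitP, using the stand-in type.
using ChargeToHit = std::map<int, AnnotHit*>;
using SequenceToChargeToHit = std::unordered_map<std::string, ChargeToHit>;

void annotateBest(std::vector<AnnotHit>& hits, bool higher_score_better, bool ignore_charges)
{
  SequenceToChargeToHit best;
  for (AnnotHit& hit : hits)
  {
    hit.best_for_its_peps = false;
    // Assumption: ignoring charges means all charge states share one map slot.
    const int charge_key = ignore_charges ? 0 : hit.charge;
    AnnotHit*& slot = best[hit.sequence][charge_key]; // nullptr when newly created
    const bool better = slot == nullptr ||
      (higher_score_better ? hit.score > slot->score : hit.score < slot->score);
    if (better)
    {
      if (slot != nullptr) slot->best_for_its_peps = false; // demote the previous best
      slot = &hit;
      hit.best_for_its_peps = true;
    }
  }
}
```

The pointers stay valid only because the vector is not resized while the map is in use, which is also why the OpenMS typedefs store raw `PeptideHit*` values.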
|
inlinestatic |
Returns the total number of peptide/protein hits in a vector of peptide/protein identifications.
|
static |
Extracts all unique peptide sequences from a list of peptide IDs.
peptides | Input |
sequences | Output |
ignore_mods | Extract sequences without modifications? |
|
inlinestatic |
Filters peptide or protein identifications according to the ranking of the hits.
The hits between min_rank and max_rank (both inclusive) in each ID are kept. Counting starts at 1, i.e. the best (highest/lowest scoring) hit has rank 1. The ranks are (re-)computed before filtering. max_rank is ignored if it is smaller than min_rank.
Note that there may be several hits with the same rank in a peptide or protein ID (if the scores are the same).
This method is useful if a range of higher hits is needed for decoy fairness analysis.
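The rank rules above (ranks start at 1, equal scores share a rank, and max_rank is ignored when it is smaller than min_rank) can be sketched as follows. `RankedHit`, `assignRanks` and `keepRankRange` are hypothetical names, and the dense numbering of ranks after a tie (1, 1, 2, ...) is an assumption of this sketch rather than something the text above states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a peptide or protein hit.
struct RankedHit { double score; std::size_t rank; };

// Sort best-first and assign ranks starting at 1; hits with equal scores share a rank.
void assignRanks(std::vector<RankedHit>& hits, bool higher_score_better)
{
  std::stable_sort(hits.begin(), hits.end(),
                   [higher_score_better](const RankedHit& a, const RankedHit& b) {
                     return higher_score_better ? a.score > b.score : a.score < b.score;
                   });
  std::size_t rank = 0;
  for (std::size_t i = 0; i < hits.size(); ++i)
  {
    if (i == 0 || hits[i].score != hits[i - 1].score) ++rank;
    hits[i].rank = rank;
  }
}

// Keep hits with min_rank <= rank <= max_rank; max_rank is ignored if it is below min_rank.
void keepRankRange(std::vector<RankedHit>& hits, bool higher_score_better,
                   std::size_t min_rank, std::size_t max_rank)
{
  assignRanks(hits, higher_score_better); // ranks are recomputed before filtering
  const bool use_max = (max_rank >= min_rank);
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [=](const RankedHit& h) {
                              return h.rank < min_rank || (use_max && h.rank > max_rank);
                            }),
             hits.end());
}
```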
|
inlinestatic |
Filters peptide or protein identifications according to the score of the hits.
Only peptide/protein hits with a score at least as good as threshold_score are kept. Score orientation (are higher scores better?) is taken into account.
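What "at least as good as" means depends on the score orientation. A minimal sketch, assuming a hypothetical `ScoredHit` type and a functor `AtLeastAsGood` that plays the role the HasGoodScore predicate has in this class:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a peptide or protein hit.
struct ScoredHit { double score; };

// Predicate: is the score of this hit at least as good as the threshold?
struct AtLeastAsGood
{
  double threshold;
  bool higher_score_better;

  bool operator()(const ScoredHit& hit) const
  {
    return higher_score_better ? hit.score >= threshold : hit.score <= threshold;
  }
};

// Keep only the hits that pass the predicate, modifying the vector in place.
void keepGoodHits(std::vector<ScoredHit>& hits, double threshold, bool higher_score_better)
{
  const AtLeastAsGood good{threshold, higher_score_better};
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [&good](const ScoredHit& h) { return !good(h); }),
             hits.end());
}
```

With a q-value or E-value (lower is better) a threshold of 0.05 keeps hits at or below 0.05; with a search-engine score where higher is better, the same call keeps hits at or above the threshold.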
|
inlinestatic |
Filters an MS/MS experiment according to score thresholds.
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
|
inlinestatic |
Filters peptide or protein identifications according to the score of the hits.
Only peptide/protein hits with a score at least as good as threshold_score are kept. Score orientation (are higher scores better?) is taken into account.
Referenced by UTILProteomicsLFQ::main_().
|
inlinestatic |
Removes peptide evidences based on a filter.
filter | Filter functor that overloads operator()(PeptideEvidence&) |
peptides | Collection of peptide identifications whose peptide evidences are filtered |
|
static |
Digests a collection of proteins and filters PeptideEvidences based on specificity. PeptideEvidences of peptides are removed if the digest of a protein did not produce the peptide sequence.
filter | Filter function on PeptideEvidence level |
peptides | PeptideIdentifications that will be scanned and filtered |
|
static |
Filters peptide identifications according to charge state.
Only peptide hits with a charge state between min_charge and max_charge (both inclusive) are kept. max_charge is ignored if it is smaller than min_charge.
|
static |
Filters peptide identifications according to peptide sequence length.
Only peptide hits with a sequence length between min_length and max_length (both inclusive) are kept. max_length is ignored if it is smaller than min_length.
|
static |
Filters peptide identifications by precursor m/z, keeping only IDs in the given range.
|
static |
Filter peptide identifications according to mass deviation.
Only peptide hits with a low mass deviation (between theoretical peptide mass and precursor mass) are kept.
peptides | Input/output |
mass_error | Threshold for the mass deviation |
unit_ppm | Is mass_error given in PPM? |
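The role of unit_ppm can be illustrated with the usual definition of a relative deviation in parts per million, (observed - theoretical) / theoretical * 1e6. The sketch below is an assumption about the comparison, not the OpenMS code: `withinMassError` is a hypothetical helper, it works on m/z values, and it compares the absolute deviation against the threshold.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: is the deviation between observed and theoretical m/z
// within mass_error, interpreted either in ppm or in absolute units?
bool withinMassError(double observed_mz, double theoretical_mz, double mass_error, bool unit_ppm)
{
  const double difference = observed_mz - theoretical_mz;
  // ppm: deviation relative to the theoretical value, scaled by one million.
  const double error = unit_ppm ? difference / theoretical_mz * 1e6 : difference;
  return std::fabs(error) <= mass_error;
}
```

For a theoretical m/z of 500, a deviation of 0.0025 corresponds to 5 ppm, so the same numeric threshold is far stricter in ppm than in absolute units.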
|
static |
Filters peptide identifications by precursor RT, keeping only IDs in the given range.
|
static |
Filters peptide identifications according to p-values from RTPredict.
Filters the peptide hits by the probability (p-value) of a correct peptide identification having a deviation between observed and predicted RT equal to or greater than allowed.
peptides | Input/output |
metavalue_key | Name of the meta value that holds the p-value: "predicted_RT_p_value" or "predicted_RT_p_value_first_dim" |
threshold | P-value threshold |
|
static |
Referenced by NucleicAcidSearchEngine::calculateAndFilterFDR_().
|
inlinestatic |
Finds the best-scoring hit in a vector of peptide or protein identifications.
If there are several hits with the best score, the first one is taken.
identifications | Vector of peptide or protein IDs, each containing one or more (peptide/protein) hits |
assume_sorted | Are hits sorted by score (best score first) already? This allows for faster query, since only the first hit needs to be looked at |
Exception::InvalidValue | Thrown if the IDs have different score types (i.e. scores cannot be compared) |
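The search described above (first hit wins on ties, only the first hit per ID is inspected when assume_sorted is set, mixed score types are an error) can be sketched like this. `BestHit`, `BestID` and `findBestHit` are hypothetical names, and `std::invalid_argument` stands in for Exception::InvalidValue.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for a hit and an identification (ID).
struct BestHit { double score; };
struct BestID { std::string score_type; bool higher_score_better; std::vector<BestHit> hits; };

// Returns false if there are no hits at all; otherwise stores the best hit in best_hit.
bool findBestHit(const std::vector<BestID>& ids, bool assume_sorted, BestHit& best_hit)
{
  const BestID* best_id = nullptr; // ID that holds the current best hit
  for (const BestID& id : ids)
  {
    if (id.hits.empty()) continue;
    if (best_id != nullptr && id.score_type != best_id->score_type)
    {
      throw std::invalid_argument("score types differ, scores cannot be compared");
    }
    // If hits are sorted best-first, only the first hit of each ID can be the best.
    const std::size_t n = assume_sorted ? 1 : id.hits.size();
    for (std::size_t i = 0; i < n; ++i)
    {
      const double s = id.hits[i].score;
      // Strict comparison, so the first hit with the best score is the one returned.
      const bool better = best_id == nullptr ||
        (id.higher_score_better ? s > best_hit.score : s < best_hit.score);
      if (better)
      {
        best_hit = id.hits[i];
        best_id = &id;
      }
    }
  }
  return best_id != nullptr;
}
```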
|
static |
|
static |
Filters peptide identifications keeping only the single best-scoring hit per ID.
peptides | Input/output |
strict | If set, keep the best hit only if its score is unique, i.e. ties are not allowed. (Otherwise all hits with the best score are kept.) |
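The effect of strict can be sketched as below. `TopHit` and `keepBestHits` are hypothetical names, and the behaviour on a tie in strict mode (the ID ends up with no hits) is this sketch's reading of "ties are not allowed", not a statement taken from the OpenMS source.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide hit.
struct TopHit { std::string sequence; double score; };

// Keep only the best-scoring hit(s) of one ID, modifying the vector in place.
void keepBestHits(std::vector<TopHit>& hits, bool higher_score_better, bool strict)
{
  if (hits.empty()) return;
  double best = hits.front().score;
  for (const TopHit& h : hits)
  {
    best = higher_score_better ? std::max(best, h.score) : std::min(best, h.score);
  }
  const auto ties = std::count_if(hits.begin(), hits.end(),
                                  [best](const TopHit& h) { return h.score == best; });
  if (strict && ties > 1)
  {
    hits.clear(); // assumption: a tied best score leaves no hit in strict mode
    return;
  }
  hits.erase(std::remove_if(hits.begin(), hits.end(),
                            [best](const TopHit& h) { return h.score != best; }),
             hits.end());
}
```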
|
inlinestatic |
Filters PeptideHits from PeptideIdentification by keeping only the best peptide hits for every peptide sequence.
|
inlinestatic |
|
inlinestatic |
Filters an MS/MS experiment according to the given proteins.
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
|
inlinestatic |
Filters peptide or protein identifications according to the given proteins (positive).
Hits with no matching protein accession in accessions are removed.
|
inlinestatic |
Keep items that satisfy a condition in a container (e.g. vector), removing all others.
|
inlinestatic |
Keep Hit items that satisfy a condition in one of our ID containers (e.g. vector of Peptide or ProteinIDs)
|
inlinestatic |
Filters an MS/MS experiment by keeping the N best peptide hits for every spectrum.
References MSExperiment::begin(), MSExperiment::end(), and ExperimentalSettings::getProteinIdentifications().
|
inlinestatic |
Filters peptide or protein identifications according to the score of the hits, keeping the n best hits per ID.
The score orientation (are higher scores better?) is taken into account.
Filters a Consensus/FeatureMap by keeping the N best peptide hits for every spectrum.
|
static |
Keeps only peptide hits that have at least one of the given modifications.
|
static |
Removes all peptide hits with a sequence that does not match one in good_peptides.
If ignore_mods is set, unmodified sequences are generated and compared to the given ones.
|
static |
Removes all peptides that are not annotated as unique for a protein (by PeptideIndexer)
Referenced by UTILProteomicsLFQ::main_().
|
inlinestatic |
Removes hits annotated as decoys from peptide or protein identifications.
Checks for meta values named "target_decoy" and "isDecoy", and removes protein/peptide hits if the values are "decoy" and "true", respectively.
Referenced by UTILProteomicsLFQ::quantifyFraction_().
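The decoy check described for removeDecoyHits can be sketched with a plain string map in place of the OpenMS meta value interface. `AnnotatedHit`, `hasDecoyAnnotation` and `removeDecoys` are hypothetical names in this sketch, and exact string comparison of the two meta values is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a hit carrying meta values as strings.
struct AnnotatedHit
{
  std::string accession;
  std::map<std::string, std::string> meta;
};

// A hit counts as a decoy if "target_decoy" is "decoy" or "isDecoy" is "true".
bool hasDecoyAnnotation(const AnnotatedHit& hit)
{
  const auto target_decoy = hit.meta.find("target_decoy");
  if (target_decoy != hit.meta.end() && target_decoy->second == "decoy") return true;
  const auto is_decoy = hit.meta.find("isDecoy");
  return is_decoy != hit.meta.end() && is_decoy->second == "true";
}

// Remove all hits annotated as decoys, modifying the vector in place.
void removeDecoys(std::vector<AnnotatedHit>& hits)
{
  hits.erase(std::remove_if(hits.begin(), hits.end(), hasDecoyAnnotation), hits.end());
}
```

Hits without either meta value are kept, so running this on data that was never annotated is a no-op.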
|
static |
Referenced by NucleicAcidSearchEngine::calculateAndFilterFDR_().
|
static |
Removes duplicate peptide hits from each peptide identification, keeping only unique hits (per ID).
By default, hits are considered duplicated if they compare as equal using PeptideHit::operator==. However, if seq_only is set, only the sequences (incl. modifications) are compared. In both cases, the first occurrence of each hit in a peptide ID is kept, later ones are removed.
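The "first occurrence wins" rule can be sketched as follows. `DupHit`, `sameHit` and `removeDuplicates` are hypothetical names, and comparing sequence, charge and score is only an approximation of what PeptideHit::operator== checks.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide hit.
struct DupHit { std::string sequence; int charge; double score; };

// Full comparison by default; sequence-only comparison if seq_only is set.
bool sameHit(const DupHit& a, const DupHit& b, bool seq_only)
{
  if (a.sequence != b.sequence) return false;
  return seq_only || (a.charge == b.charge && a.score == b.score);
}

// Keep the first occurrence of each hit and drop later duplicates, preserving order.
void removeDuplicates(std::vector<DupHit>& hits, bool seq_only)
{
  std::vector<DupHit> unique;
  for (const DupHit& hit : hits)
  {
    const bool seen = std::any_of(unique.begin(), unique.end(),
                                  [&hit, seq_only](const DupHit& kept) {
                                    return sameHit(kept, hit, seq_only);
                                  });
    if (!seen) unique.push_back(hit);
  }
  hits.swap(unique);
}
```

The quadratic scan is acceptable here because the number of hits per ID is typically small, and it avoids requiring a hash or ordering on the hit type.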
|
inlinestatic |
|
inlinestatic |
Removes peptide or protein identifications that have no hits in them.
Referenced by UTILProteomicsLFQ::quantifyFraction_().
|
inlinestatic |
Filters peptide or protein identifications according to the given proteins (negative).
Hits with a matching protein accession in accessions are removed.
|
inlinestatic |
Remove items that satisfy a condition from a container (e.g. vector)
|
inlinestatic |
Remove Hit items that satisfy a condition in one of our ID containers (e.g. vector of Peptide or ProteinIDs)
|
inlinestatic |
|
static |
Removes all peptide hits that have at least one of the given modifications.
|
static |
Removes all peptide hits with a sequence that matches one in bad_peptides.
If ignore_mods is set, unmodified sequences are generated and compared to the given ones.
|
static |
Removes protein hits from proteins that are not referenced by a peptide in peptides.
Referenced by Epifany::main_(), UTILProteomicsLFQ::main_(), and UTILProteomicsLFQ::quantifyFraction_().
|
inlinestatic |
Updates the hit ranks on all peptide or protein IDs.
|
static |
Update protein groups after protein hits were filtered.
groups | Input/output protein groups |
hits | Available protein hits (all others are removed from the groups) |
Referenced by Epifany::main_(), and UTILProteomicsLFQ::main_().
|
static |
Removes references to missing proteins.
Only PeptideEvidence entries that reference protein hits in proteins are kept in the peptide hits.
If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.
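The two steps of this clean-up can be sketched with accessions as plain strings. `RefHit` and `keepKnownReferences` are hypothetical names, and a vector of accession strings stands in for the PeptideEvidence entries of a peptide hit.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a peptide hit with its protein references.
struct RefHit
{
  std::string sequence;
  std::vector<std::string> protein_accessions;
};

void keepKnownReferences(std::vector<RefHit>& hits, const std::set<std::string>& known_proteins,
                         bool remove_hits_without_reference)
{
  // Step 1: drop references to proteins that are no longer present.
  for (RefHit& hit : hits)
  {
    std::vector<std::string>& acc = hit.protein_accessions;
    acc.erase(std::remove_if(acc.begin(), acc.end(),
                             [&known_proteins](const std::string& a) {
                               return known_proteins.count(a) == 0;
                             }),
              acc.end());
  }
  // Step 2 (optional): drop hits that have no protein reference left.
  if (remove_hits_without_reference)
  {
    hits.erase(std::remove_if(hits.begin(), hits.end(),
                              [](const RefHit& h) { return h.protein_accessions.empty(); }),
               hits.end());
  }
}
```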
|
static |
Removes references to missing proteins.
Only PeptideEvidence entries that reference protein hits in proteins are kept in the peptide hits.
If remove_peptides_without_reference is set, peptide hits without any remaining protein reference are removed.
Referenced by UTILProteomicsLFQ::main_().