OpenMS
PeptideProteinResolution Class Reference

Resolves shared peptides based on protein scores.

#include <OpenMS/ANALYSIS/ID/PeptideProteinResolution.h>


Public Member Functions

 PeptideProteinResolution (bool statistics=false)
 
void buildGraph (ProteinIdentification &protein, const std::vector< PeptideIdentification > &peptides, bool skip_sort=false)
 
void resolveGraph (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
 
ConnectedComponent findConnectedComponent (Size &root_prot_grp)
 
void resolveConnectedComponent (ConnectedComponent &conn_comp, ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
 

Static Public Member Functions

static void resolve (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides, bool resolve_ties, bool targets_first)
 
static void run (std::vector< ProteinIdentification > &inferred_protein_id, std::vector< PeptideIdentification > &inferred_peptide_ids)
 

Private Types

typedef std::map< Size, std::set< Size > > IndexMap_
 

Private Attributes

IndexMap_ indist_prot_grp_to_pep_
 mapping indist. protein group indices -> peptide identification indices
 
IndexMap_ pep_to_indist_prot_grp_
 mapping indist. protein group indices <- peptide identification indices
 
std::map< String, Size > prot_acc_to_indist_prot_grp_
 
bool statistics_
 log debug information?
 

Detailed Description

Resolves shared peptides based on protein scores.

Resolves connected components of the bipartite protein-peptide graph based on protein probabilities/scores and adds the results as additional protein_groups to the processed protein identification run. Shared peptides in each component are greedily assigned uniquely to the proteins of the current best indistinguishable protein group, until every peptide is uniquely assigned. This allows more peptides to be used in ProteinQuantifier, at the cost of potentially more noise in the peptide quantities. In accordance with most state-of-the-art protein inference tools, only the best hit (PSM) of a peptide ID is considered. Probability ties are currently resolved by taking the protein with the larger number of peptides.

Improvement:
The class could provide an iterator over ConnectedComponents in the future. The graph could be extended to include all PeptideHits (not only the best), which would turn it into a tripartite graph with larger connected components. It might also be extended to work with MS1 features. Resolution and adding groups to the output could be separated.

Member Typedef Documentation

◆ IndexMap_

typedef std::map<Size, std::set<Size> > IndexMap_
private

Used to build the bipartite graph as two maps (adjacency "lists"), ProtGroup indices <-> PepID indices, so that connectivity is bidirectional. We always take the first PeptideHit of each PeptideIdentification, because those are usually used for inference.
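To illustrate the two-map layout, the following standalone sketch stores every edge in both directions so the graph can be walked from either side. It uses only the standard library; `BipartiteGraph`, `addEdge` and `hasEdge` are hypothetical names, not part of OpenMS, and `std::size_t` stands in for OpenMS's `Size`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

// Stand-in for the private IndexMap_ typedef (std::size_t replaces OpenMS's Size).
using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical bipartite graph stored as two adjacency "lists", one per direction.
struct BipartiteGraph
{
  IndexMap grp_to_pep; // protein group index -> peptide ID indices
  IndexMap pep_to_grp; // peptide ID index -> protein group indices

  // Records the edge in both maps so it can be traversed from either side.
  void addEdge(std::size_t grp, std::size_t pep)
  {
    grp_to_pep[grp].insert(pep);
    pep_to_grp[pep].insert(grp);
  }

  // Looks the edge up in both directions; the two answers must agree.
  bool hasEdge(std::size_t grp, std::size_t pep) const
  {
    auto fwd = grp_to_pep.find(grp);
    bool forward = fwd != grp_to_pep.end() && fwd->second.count(pep) > 0;
    auto bwd = pep_to_grp.find(pep);
    bool backward = bwd != pep_to_grp.end() && bwd->second.count(grp) > 0;
    assert(forward == backward); // both maps must stay in sync
    return forward;
  }
};
```

Keeping both directions doubles the memory, but lets a traversal move from groups to peptides and back without scanning either map.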

Constructor & Destructor Documentation

◆ PeptideProteinResolution()

PeptideProteinResolution ( bool  statistics = false)

Constructor

Parameters
statistics: Specifies whether the class stores/outputs statistics

Member Function Documentation

◆ buildGraph()

void buildGraph ( ProteinIdentification &  protein,
const std::vector< PeptideIdentification > &  peptides,
bool  skip_sort = false 
)

Initializes and stores the graph (i.e., the two maps). Correct behavior requires sorted groups, so the indistinguishable protein groups are sorted unless sorting is skipped.

Parameters
protein: ProteinIdentification object storing IDs and groups
peptides: Vector of PeptideIdentifications with links to the proteins
skip_sort: Skips sorting of the groups; in that case, nothing is modified.
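A minimal sketch of what building the graph involves, under simplifying assumptions: groups are plain structs with a probability and a list of accessions, and each peptide ID is reduced to the accessions referenced by its best hit. The types `IndistGroup`, `BestHitAccessions`, `Graph` and the function `buildGraphSketch` are hypothetical and do not mirror the actual OpenMS implementation; the accession-to-group map plays the role of prot_acc_to_indist_prot_grp_.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical simplified inputs (not the OpenMS types).
struct IndistGroup
{
  double probability;
  std::vector<std::string> accessions;
};
// Accessions referenced by the best (first) hit of one peptide ID.
using BestHitAccessions = std::vector<std::string>;

struct Graph
{
  std::map<std::string, std::size_t> acc_to_grp; // middle layer: accession -> group
  IndexMap grp_to_pep;
  IndexMap pep_to_grp;
};

Graph buildGraphSketch(std::vector<IndistGroup>& groups,
                       const std::vector<BestHitAccessions>& peptides,
                       bool skip_sort = false)
{
  if (!skip_sort)
  {
    // Best group first; stable_sort keeps input order for equal probabilities.
    std::stable_sort(groups.begin(), groups.end(),
                     [](const IndistGroup& a, const IndistGroup& b)
                     { return a.probability > b.probability; });
  }
  Graph g;
  for (std::size_t i = 0; i < groups.size(); ++i)
    for (const auto& acc : groups[i].accessions)
      g.acc_to_grp[acc] = i;

  for (std::size_t p = 0; p < peptides.size(); ++p)
    for (const auto& acc : peptides[p])
    {
      auto it = g.acc_to_grp.find(acc);
      if (it == g.acc_to_grp.end()) continue; // accession not in any group
      g.grp_to_pep[it->second].insert(p);
      g.pep_to_grp[p].insert(it->second);
    }
  return g;
}
```

Sorting before indexing means group index 0 is always the best group, which is why the graph relies on sorted groups.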

◆ findConnectedComponent()

ConnectedComponent findConnectedComponent ( Size &  root_prot_grp)

Performs a breadth-first search (BFS) on the two maps (i.e., the two parts of the graph: indistinguishable protein groups and peptides), switching from one to the other at each step.

Parameters
root_prot_grp: Protein group index at which the BFS starts
Returns
A connected component as a set of group indices and a set of peptide indices.
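The alternating BFS can be sketched with standard containers as follows: each dequeued group expands to its peptides, and each newly seen peptide expands back to its groups. `Component` and `findComponentSketch` are hypothetical names; the real ConnectedComponent type and its members may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>

using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical mirror of ConnectedComponent: group and peptide indices.
struct Component
{
  std::set<std::size_t> prot_grp_indices;
  std::set<std::size_t> pep_indices;
};

// BFS that alternates between the group side and the peptide side.
Component findComponentSketch(std::size_t root_grp,
                              const IndexMap& grp_to_pep,
                              const IndexMap& pep_to_grp)
{
  Component comp;
  std::queue<std::size_t> grp_queue;
  grp_queue.push(root_grp);
  comp.prot_grp_indices.insert(root_grp);
  while (!grp_queue.empty())
  {
    std::size_t grp = grp_queue.front();
    grp_queue.pop();
    auto peps = grp_to_pep.find(grp);
    if (peps == grp_to_pep.end()) continue;
    for (std::size_t pep : peps->second)
    {
      if (!comp.pep_indices.insert(pep).second) continue; // peptide already seen
      auto grps = pep_to_grp.find(pep);
      if (grps == pep_to_grp.end()) continue;
      for (std::size_t next : grps->second)
        if (comp.prot_grp_indices.insert(next).second) grp_queue.push(next);
    }
  }
  return comp;
}
```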

◆ resolve()

static void resolve ( ProteinIdentification &  protein,
std::vector< PeptideIdentification > &  peptides,
bool  resolve_ties,
bool  targets_first 
)
static

A peptide-centric reimplementation of the resolution process. Can be used statically without building a bipartite graph first.

Parameters
protein: ProteinIdentification object storing IDs and groups
peptides: Vector of PeptideIdentifications with links to the proteins
resolve_ties: Whether ties should be resolved (otherwise, multiple best groups are reported)
targets_first: Whether target groups should be picked first, regardless of their posterior probability
Todo:
Warning: all peptides are used (they are not yet filtered for the matching protein ID run).
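One possible reading of the peptide-centric variant, sketched under assumptions: each peptide is resolved independently by keeping the best of the groups it maps to, where `targets_first` ranks target groups above decoys before probabilities are compared, and `resolve_ties` breaks remaining ties by the larger peptide count (the tie rule stated in the class description). `GroupInfo`, `bestGroupsForPeptide` and `resolveSketch` are hypothetical names; this is not the OpenMS code.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical simplified group record (not the OpenMS type).
struct GroupInfo
{
  double probability;
  bool is_target;
  std::size_t n_peptides; // used as the tie-breaker
};

// For one peptide: keep only the best of the groups it maps to.
std::set<std::size_t> bestGroupsForPeptide(const std::set<std::size_t>& candidates,
                                           const std::vector<GroupInfo>& groups,
                                           bool resolve_ties, bool targets_first)
{
  // Ranking before tie resolution: target flag (if requested), then probability.
  auto better = [&](std::size_t a, std::size_t b) {
    if (targets_first && groups[a].is_target != groups[b].is_target)
      return groups[a].is_target;
    return groups[a].probability > groups[b].probability;
  };
  std::set<std::size_t> best;
  for (std::size_t g : candidates)
  {
    if (best.empty() || better(g, *best.begin()))
      best = {g};
    else if (!better(*best.begin(), g))
      best.insert(g); // tie on (target flag, probability)
  }
  if (resolve_ties && best.size() > 1)
  {
    // Keep the tied group with the most peptides (lowest index on a further tie).
    std::size_t winner = *best.begin();
    for (std::size_t g : best)
      if (groups[g].n_peptides > groups[winner].n_peptides) winner = g;
    best = {winner};
  }
  return best;
}

// Peptide-centric pass: every peptide is resolved independently, no graph needed.
std::vector<std::set<std::size_t>> resolveSketch(
    const std::vector<std::set<std::size_t>>& pep_to_groups,
    const std::vector<GroupInfo>& groups, bool resolve_ties, bool targets_first)
{
  std::vector<std::set<std::size_t>> result;
  result.reserve(pep_to_groups.size());
  for (const auto& cands : pep_to_groups)
    result.push_back(bestGroupsForPeptide(cands, groups, resolve_ties, targets_first));
  return result;
}
```

With `resolve_ties` off, a peptide can stay attached to several equally good groups, which corresponds to reporting multiple best groups.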

◆ resolveConnectedComponent()

void resolveConnectedComponent ( ConnectedComponent &  conn_comp,
ProteinIdentification &  protein,
std::vector< PeptideIdentification > &  peptides 
)

Resolves a connected component based on posterior probabilities and adds the result as additional protein_groups to the output idXML. Shared peptides in this component are greedily assigned uniquely to the proteins of the current BEST INDISTINGUISHABLE protein group, so that they can then be used in ProteinQuantifier. This is achieved by removing all other evidence from the input PeptideIDs and iterating until each peptide is uniquely assigned. In accordance with Fido, only the best hit (PSM) of an ID is considered. Probability ties are resolved by taking the protein with the largest number of peptides.

Parameters
conn_comp: The component to be resolved
protein: ProteinIdentification object storing IDs and groups
peptides: Vector of PeptideIdentifications with links to the proteins
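The greedy assignment can be illustrated with a standalone sketch. Assumptions: a component is given as a vector of groups, each with a posterior probability and the set of peptide indices it explains, and the tie-breaker counts the peptides a group still holds in the current round. `CompGroup` and `resolveComponentSketch` are hypothetical names, not the OpenMS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical simplified group within one connected component.
struct CompGroup
{
  double probability;
  std::set<std::size_t> peptides; // peptide ID indices still supporting this group
};

// Greedy resolution: returns peptide index -> the single group it is kept for.
std::map<std::size_t, std::size_t> resolveComponentSketch(std::vector<CompGroup> groups)
{
  std::map<std::size_t, std::size_t> assignment;
  std::vector<bool> done(groups.size(), false);
  for (std::size_t round = 0; round < groups.size(); ++round)
  {
    // Current best unprocessed group: highest probability, then most remaining peptides.
    std::size_t best = groups.size();
    for (std::size_t g = 0; g < groups.size(); ++g)
    {
      if (done[g]) continue;
      if (best == groups.size() ||
          groups[g].probability > groups[best].probability ||
          (groups[g].probability == groups[best].probability &&
           groups[g].peptides.size() > groups[best].peptides.size()))
        best = g;
    }
    done[best] = true;
    // Claim the remaining peptides and remove them as evidence for all other groups.
    for (std::size_t pep : groups[best].peptides)
    {
      assignment[pep] = best;
      for (std::size_t other = 0; other < groups.size(); ++other)
        if (other != best) groups[other].peptides.erase(pep);
    }
  }
  return assignment;
}
```

Re-selecting the best group in every round, instead of sorting once, follows the "current best" wording: a group whose shared peptides were claimed by a better group has already lost them when its own turn comes.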

◆ resolveGraph()

void resolveGraph ( ProteinIdentification &  protein,
std::vector< PeptideIdentification > &  peptides 
)

Applies resolveConnectedComponent to every component of the graph and writes statistics if enabled. Both parameters are mutated by this method.

Parameters
protein: ProteinIdentification object storing IDs and groups
peptides: Vector of PeptideIdentifications with links to the proteins
Todo:
Warning: all peptides are used (they are not yet filtered for the matching protein ID run).
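A sketch of the outer loop, assuming that each component is found by a BFS starting from the first protein group not yet visited. `forEachComponent` and its `resolve` callback are hypothetical stand-ins for resolveGraph and resolveConnectedComponent; the returned component count illustrates the kind of statistic that could be logged.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <set>
#include <vector>

using IndexMap = std::map<std::size_t, std::set<std::size_t>>;
using GroupSet = std::set<std::size_t>;

// Visits every connected component once and passes its group indices to `resolve`.
// Assumes a consistent graph: every peptide in grp_to_pep also appears in pep_to_grp.
// Returns the number of components found.
std::size_t forEachComponent(const IndexMap& grp_to_pep, const IndexMap& pep_to_grp,
                             const std::function<void(const GroupSet&)>& resolve)
{
  std::set<std::size_t> visited;
  std::size_t n_components = 0;
  for (const auto& entry : grp_to_pep)
  {
    if (visited.count(entry.first)) continue; // part of an earlier component
    GroupSet comp{entry.first};
    std::queue<std::size_t> todo;
    todo.push(entry.first);
    while (!todo.empty())
    {
      std::size_t grp = todo.front();
      todo.pop();
      auto peps = grp_to_pep.find(grp);
      if (peps == grp_to_pep.end()) continue;
      for (std::size_t pep : peps->second)
        for (std::size_t next : pep_to_grp.at(pep))
          if (comp.insert(next).second) todo.push(next);
    }
    visited.insert(comp.begin(), comp.end());
    resolve(comp);
    ++n_components;
  }
  return n_components;
}
```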

◆ run()

static void run ( std::vector< ProteinIdentification > &  inferred_protein_id,
std::vector< PeptideIdentification > &  inferred_peptide_ids 
)
static

Convenience function that performs graph building and group resolution. After resolution, all unreferenced proteins are removed and the groups are updated accordingly.

Parameters
inferred_protein_id: Vector of ProteinIdentification objects storing IDs and groups
inferred_peptide_ids: Vector of PeptideIdentifications with links to the proteins
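The post-processing step of run() can be pictured with the following sketch, which assumes that proteins are identified by accession strings and groups are lists of accessions. `removeUnreferencedSketch` is a hypothetical helper, not an OpenMS function, and dropping groups that become empty is an assumption about how the groups are updated.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using Accession = std::string;
using Group = std::vector<Accession>;

// Keeps only proteins still referenced by some peptide after resolution and
// prunes the groups accordingly; groups left empty are dropped.
void removeUnreferencedSketch(std::vector<Accession>& proteins,
                              std::vector<Group>& groups,
                              const std::vector<std::vector<Accession>>& peptide_refs)
{
  std::set<Accession> referenced;
  for (const auto& refs : peptide_refs)
    referenced.insert(refs.begin(), refs.end());

  auto unreferenced = [&](const Accession& acc) { return referenced.count(acc) == 0; };
  proteins.erase(std::remove_if(proteins.begin(), proteins.end(), unreferenced),
                 proteins.end());
  for (auto& grp : groups)
    grp.erase(std::remove_if(grp.begin(), grp.end(), unreferenced), grp.end());
  groups.erase(std::remove_if(groups.begin(), groups.end(),
                              [](const Group& g) { return g.empty(); }),
               groups.end());
}
```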

Member Data Documentation

◆ indist_prot_grp_to_pep_

IndexMap_ indist_prot_grp_to_pep_
private

mapping indist. protein group indices -> peptide identification indices

◆ pep_to_indist_prot_grp_

IndexMap_ pep_to_indist_prot_grp_
private

mapping indist. protein group indices <- peptide identification indices

◆ prot_acc_to_indist_prot_grp_

std::map<String, Size> prot_acc_to_indist_prot_grp_
private

Represents the middle layer of an implicit tripartite graph: single protein accessions and their mapping to the indices of their (indistinguishable) groups.

◆ statistics_

bool statistics_
private

log debug information?