OpenMS
2.7.0
Resolves shared peptides based on protein scores.
#include <OpenMS/ANALYSIS/ID/PeptideProteinResolution.h>
Public Member Functions | |
PeptideProteinResolution (bool statistics=false) | |
void | buildGraph (ProteinIdentification &protein, const std::vector< PeptideIdentification > &peptides, bool skip_sort=false) |
void | resolveGraph (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides) |
ConnectedComponent | findConnectedComponent (Size &root_prot_grp) |
void | resolveConnectedComponent (ConnectedComponent &conn_comp, ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides) |
Static Public Member Functions | |
static void | resolve (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides, bool resolve_ties, bool targets_first) |
static void | run (std::vector< ProteinIdentification > &inferred_protein_id, std::vector< PeptideIdentification > &inferred_peptide_ids) |
Private Types | |
typedef std::map< Size, std::set< Size > > | IndexMap_ |
Private Attributes | |
IndexMap_ | indist_prot_grp_to_pep_ |
mapping indist. protein group indices -> peptide identification indices | |
IndexMap_ | pep_to_indist_prot_grp_ |
mapping indist. protein group indices <- peptide identification indices | |
std::map< String, Size > | prot_acc_to_indist_prot_grp_ |
bool | statistics_ |
log debug information? | |
Resolves connected components of the bipartite protein-peptide graph based on protein probabilities/scores and adds them as additional protein_groups to the processed protein identification run. In doing so, it greedily assigns the shared peptides of each component uniquely to the proteins of the current best indistinguishable protein group, until every peptide is uniquely assigned. This effectively allows more peptides to be used in ProteinQuantifier, at the cost of potentially additional noise in the peptide quantities. In accordance with most state-of-the-art protein inference tools, only the best hit (PSM) for a peptide ID is considered. Probability ties are currently resolved by taking the protein with the larger number of peptides.
The bipartite graph is built as two maps (adjacency "lists"), ProtGroups-Indices <-> PepID-Indices, which gives bidirectional connectivity. The first PepHit of each PepID is always taken, because those are usually the ones used for inference.
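The two-map layout described above could be sketched roughly as follows. This is a standalone illustration, not the OpenMS implementation: `BipartiteMaps`, `buildMaps`, and the plain string/vector inputs are hypothetical stand-ins for the OpenMS types, and only the accessions of each peptide ID's first hit are passed in.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the private IndexMap_ typedef.
using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical container for the two adjacency maps.
struct BipartiteMaps
{
  IndexMap group_to_peps; // protein group index -> peptide ID indices
  IndexMap pep_to_groups; // peptide ID index -> protein group indices
};

// groups[g] lists the accessions of indistinguishable group g.
// best_hit_accessions[p] lists the accessions referenced by the first hit of peptide ID p.
BipartiteMaps buildMaps(const std::vector<std::vector<std::string>>& groups,
                        const std::vector<std::vector<std::string>>& best_hit_accessions)
{
  // Middle layer: accession -> index of the group that contains it.
  std::map<std::string, std::size_t> acc_to_group;
  for (std::size_t g = 0; g < groups.size(); ++g)
  {
    for (const std::string& acc : groups[g]) acc_to_group[acc] = g;
  }

  BipartiteMaps maps;
  for (std::size_t p = 0; p < best_hit_accessions.size(); ++p)
  {
    for (const std::string& acc : best_hit_accessions[p])
    {
      auto it = acc_to_group.find(acc);
      if (it == acc_to_group.end()) continue; // accession belongs to no group
      // Record the edge in both directions for bidirectional lookup.
      maps.group_to_peps[it->second].insert(p);
      maps.pep_to_groups[p].insert(it->second);
    }
  }
  return maps;
}
```

Storing each edge twice costs memory but lets a traversal step from groups to peptides and back without scanning the whole structure.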
PeptideProteinResolution | ( | bool | statistics = false | ) |
Constructor
statistics | Specifies if the class stores/outputs info about statistics |
void buildGraph | ( | ProteinIdentification & | protein, |
const std::vector< PeptideIdentification > & | peptides, | ||
bool | skip_sort = false |
||
) |
Initializes and stores the graph (= maps). Correct functionality requires sorted groups, so the indist. protein groups are sorted unless sorting is skipped.
protein | ProteinIdentification object storing IDs and groups |
peptides | vector of PeptideIdentifications with links to the proteins |
skip_sort | Skips sorting of the groups; nothing is modified in that case. |
ConnectedComponent findConnectedComponent | ( | Size & | root_prot_grp | ) |
Does a BFS on the two maps (= two parts of the graph; indist. prot. groups and peptides), switching from one to the other in each step.
root_prot_grp | Starts the BFS at this protein group index |
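The alternating BFS described above can be illustrated with a small standalone sketch. It is not the OpenMS code: `Component` and `findComponent` are hypothetical names, and the two maps are passed in explicitly instead of being class members.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <utility>

using IndexMap = std::map<std::size_t, std::set<std::size_t>>;

// Hypothetical stand-in for ConnectedComponent.
struct Component
{
  std::set<std::size_t> groups;   // indist. protein group indices
  std::set<std::size_t> peptides; // peptide ID indices
};

Component findComponent(std::size_t root_group,
                        const IndexMap& group_to_peps,
                        const IndexMap& pep_to_groups)
{
  Component comp;
  // Each queue entry is (is_group, index), so we know which map to consult.
  std::queue<std::pair<bool, std::size_t>> todo;
  todo.push({true, root_group});
  comp.groups.insert(root_group);

  while (!todo.empty())
  {
    const bool is_group = todo.front().first;
    const std::size_t idx = todo.front().second;
    todo.pop();

    // Switch sides: groups lead to peptides, peptides lead to groups.
    const IndexMap& adjacency = is_group ? group_to_peps : pep_to_groups;
    auto it = adjacency.find(idx);
    if (it == adjacency.end()) continue;

    std::set<std::size_t>& seen = is_group ? comp.peptides : comp.groups;
    for (std::size_t next : it->second)
    {
      // insert().second is true only for nodes not visited before.
      if (seen.insert(next).second) todo.push({!is_group, next});
    }
  }
  return comp;
}
```

The component's own sets double as the visited markers, so no separate bookkeeping is needed.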
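static void resolve | ( | ProteinIdentification & | protein, |
std::vector< PeptideIdentification > & | peptides, | ||
bool | resolve_ties, | ||
bool | targets_first | ||
) |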
A peptide-centric reimplementation of the resolution process. Can be used statically without building a bipartite graph first.
protein | ProteinIdentification object storing IDs and groups |
peptides | vector of PeptideIdentifications with links to the proteins |
resolve_ties | If ties should be resolved or multiple best groups reported |
targets_first | If target groups should get picked first no matter the posterior |
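How the two flags might interact for a single peptide can be sketched as follows. This is only an illustration under assumptions: `CandidateGroup`, `ranksHigher`, and `pickGroups` are hypothetical names, and with `resolve_ties` set the sketch simply keeps the first of the tied groups, which is not necessarily the tie-breaking rule OpenMS applies.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of one protein group a peptide maps to.
struct CandidateGroup
{
  std::size_t index;
  double posterior;
  bool is_target;
};

// True if a should be preferred over b.
static bool ranksHigher(const CandidateGroup& a, const CandidateGroup& b, bool targets_first)
{
  // With targets_first, a target beats a decoy regardless of posterior.
  if (targets_first && a.is_target != b.is_target) return a.is_target;
  return a.posterior > b.posterior;
}

// Returns the indices of the group(s) the peptide stays assigned to.
std::vector<std::size_t> pickGroups(const std::vector<CandidateGroup>& candidates,
                                    bool resolve_ties, bool targets_first)
{
  std::vector<std::size_t> best;
  const CandidateGroup* top = nullptr;
  for (const CandidateGroup& c : candidates)
  {
    if (top == nullptr || ranksHigher(c, *top, targets_first))
    {
      top = &c;
      best.assign(1, c.index); // new best group replaces earlier picks
    }
    else if (!ranksHigher(*top, c, targets_first) && !resolve_ties)
    {
      best.push_back(c.index); // tie: report every tied group
    }
  }
  return best;
}
```

Because the decision is made per peptide, this variant needs no connected components, which matches the note that the static method works without building the graph first.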
void resolveConnectedComponent | ( | ConnectedComponent & | conn_comp, |
ProteinIdentification & | protein, | ||
std::vector< PeptideIdentification > & | peptides | ||
) |
Resolves the given connected component based on posterior probabilities and adds the result as additional protein_groups to the output idXML. In doing so, it greedily assigns the shared peptides in this component uniquely to the proteins of the current best indistinguishable protein group, so that they are ready to be used in ProteinQuantifier. This is achieved by removing all other evidence from the input PeptideIDs and iterating until each peptide is uniquely assigned. In accordance with Fido, only the best hit (PSM) for an ID is considered. Probability ties are resolved by taking the protein with the largest number of peptides.
conn_comp | The component to be resolved |
protein | ProteinIdentification object storing IDs and groups |
peptides | vector of PeptideIdentifications with links to the proteins |
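The greedy assignment within one component can be sketched in a few lines. This is a simplified standalone illustration, not the OpenMS implementation: `ScoredGroup` and `resolveGreedy` are hypothetical names, and ties are broken on each group's original peptide count, which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical view of one indistinguishable protein group in a component.
struct ScoredGroup
{
  std::size_t index;
  double probability;
  std::set<std::size_t> peptides; // peptide ID indices linked to this group
};

// Returns peptide index -> index of the single group that keeps it.
std::map<std::size_t, std::size_t> resolveGreedy(std::vector<ScoredGroup> groups)
{
  // Best group first: higher probability, then larger number of peptides.
  std::stable_sort(groups.begin(), groups.end(),
                   [](const ScoredGroup& a, const ScoredGroup& b)
                   {
                     if (a.probability != b.probability) return a.probability > b.probability;
                     return a.peptides.size() > b.peptides.size();
                   });

  std::map<std::size_t, std::size_t> owner;
  for (const ScoredGroup& g : groups)
  {
    for (std::size_t pep : g.peptides)
    {
      // emplace does nothing if a better group already claimed the peptide.
      owner.emplace(pep, g.index);
    }
  }
  return owner;
}
```

Every peptide ends up with exactly one owning group; a lower-ranked group keeps only the peptides that no better group claimed.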
void resolveGraph | ( | ProteinIdentification & | protein, |
std::vector< PeptideIdentification > & | peptides | ||
) |
Applies resolveConnectedComponent to every component of the graph and can write statistics when requested. Both parameters are mutated by this method.
protein | ProteinIdentification object storing IDs and groups |
peptides | vector of PeptideIdentifications with links to the proteins |
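static void run | ( | std::vector< ProteinIdentification > & | inferred_protein_id, |
std::vector< PeptideIdentification > & | inferred_peptide_ids | ||
) |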
Convenience function that performs graph building and group resolution. After resolution, all unreferenced proteins are removed and groups updated.
protein | ProteinIdentification object storing IDs and groups |
peptides | vector of PeptideIdentifications with links to the proteins |
IndexMap_ indist_prot_grp_to_pep_ | private
if the protein group at index i contains a target (first) and/or decoy (second)
mapping indist. protein group indices -> peptide identification indices
IndexMap_ pep_to_indist_prot_grp_ | private
mapping indist. protein group indices <- peptide identification indices
std::map< String, Size > prot_acc_to_indist_prot_grp_ | private
Represents the middle layer of an implicit tripartite graph: consists of single protein accessions and their mapping to the indices of the (indist.) groups.
bool statistics_ | private
log debug information?