OpenMS
IDBoostGraph Class Reference

Creates and maintains a Boost graph based on the OpenMS ID data structures.

#include <OpenMS/ANALYSIS/ID/IDBoostGraph.h>


Classes

class  dfs_ccsplit_visitor
 A Boost DFS visitor that copies connected components into a vector of graphs.

class  GetPosteriorVisitor
 Visits nodes in the Boost graph (either pointers to an ID object or some lightweight surrogates) and, depending on their type, gets the score (usually the posterior).

class  GetScoreTgTVisitor
 Visits nodes in the Boost graph (either pointers to an ID object or some lightweight surrogates) and, depending on their type, gets the score (usually the posterior) plus whether it is a decoy or a target. If this is not known or not defined, returns (-1.0, false).

class  LabelVisitor
 Visits nodes in the Boost graph (pointers to an ID object) and, depending on their type, creates a label, e.g. for printing to dot format.

class  PrintAddressVisitor
 Visits nodes in the Boost graph (pointers to an ID object) and, depending on their type, prints the address. For debugging purposes only.

struct  ProteinGroup
 Indistinguishable protein groups (size, number of targets, score).

class  SetPosteriorVisitor
 Visits nodes in the Boost graph (either pointers to an ID object or some lightweight surrogates) and, depending on their type, sets the posterior. Don't forget to set higherScoreBetter and the score names in the parent ID objects.
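To illustrate the visitor pattern these classes follow, here is a minimal sketch that uses std::variant and std::visit as stand-ins for boost::variant and Boost's static visitors. The node types (ProteinHitStub, PeptideHitStub, RunIndexStub) and the visitor names are hypothetical simplifications, not OpenMS classes. The getter mirrors the documented (-1.0, false) fallback for nodes without a score, and the setter only writes scores, never members that define the graph structure.

```cpp
#include <cassert>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-ins; these are NOT the OpenMS ProteinHit/PeptideHit classes.
struct ProteinHitStub { double score; bool is_target; };
struct PeptideHitStub { double score; bool is_target; };
struct RunIndexStub { unsigned long value; };  // lightweight surrogate node without a score

// std::variant plays the role of boost::variant in IDPointer.
using NodeStub = std::variant<ProteinHitStub*, PeptideHitStub*, RunIndexStub>;

// Getter in the spirit of GetScoreTgTVisitor: (score, is_target) for ID objects,
// (-1.0, false) for nodes where this is not defined.
struct GetScoreTgtSketch
{
  std::pair<double, bool> operator()(const ProteinHitStub* p) const { return {p->score, p->is_target}; }
  std::pair<double, bool> operator()(const PeptideHitStub* p) const { return {p->score, p->is_target}; }
  std::pair<double, bool> operator()(const RunIndexStub&) const { return {-1.0, false}; }
};

// Setter in the spirit of SetPosteriorVisitor: it writes only the score and never
// touches anything that defines the graph structure.
struct SetPosteriorSketch
{
  double posterior;
  void operator()(ProteinHitStub* p) const { p->score = posterior; }
  void operator()(PeptideHitStub* p) const { p->score = posterior; }
  void operator()(RunIndexStub&) const {}  // surrogates carry no score
};
```

Because each alternative is handled by its own overload, adding a new node type to the variant makes every visitor fail to compile until it handles that type, which is one reason this pattern suits a heterogeneous node graph.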
 

Public Types

typedef boost::variant< ProteinHit *, ProteinGroup, PeptideCluster, Peptide, RunIndex, Charge, PeptideHit * > IDPointer
 
typedef boost::variant< const ProteinHit *, const ProteinGroup *, const PeptideCluster *, const Peptide, const RunIndex, const Charge, const PeptideHit * > IDPointerConst
 
typedef boost::adjacency_list< boost::setS, boost::vecS, boost::undirectedS, IDPointer > Graph

typedef std::vector< Graph > Graphs

typedef boost::adjacency_list< boost::setS, boost::vecS, boost::undirectedS, IDPointer > GraphConst

typedef boost::graph_traits< Graph >::vertex_descriptor vertex_t

typedef boost::graph_traits< Graph >::edge_descriptor edge_t

typedef std::set< IDBoostGraph::vertex_t > ProteinNodeSet

typedef std::set< IDBoostGraph::vertex_t > PeptideNodeSet
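With boost::vecS for vertices and boost::setS for edges, vertices live in a vector-like container and each vertex's edges in a set, so inserting the same undirected edge twice leaves a single edge. The sketch below reproduces that behaviour with standard containers only; SimpleUndirectedGraph, ProteinNode and PeptideNode are hypothetical names, and the real Graph type is the Boost adjacency list listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical node payloads standing in for the IDPointer alternatives.
struct ProteinNode { std::string accession; };
struct PeptideNode { std::string sequence; };
using NodePayload = std::variant<ProteinNode, PeptideNode>;

// Hypothetical minimal undirected graph: payloads in a vector (vertex storage like
// boost::vecS) and one std::set of neighbours per vertex (edge storage like boost::setS),
// so a repeated edge is stored only once. Self-loops are not considered here.
class SimpleUndirectedGraph
{
public:
  using vertex_t = std::size_t;

  vertex_t addVertex(NodePayload payload)
  {
    payload_.push_back(std::move(payload));
    adj_.emplace_back();
    return payload_.size() - 1;
  }

  void addEdge(vertex_t u, vertex_t v)
  {
    adj_[u].insert(v);  // inserting an existing neighbour is a no-op
    adj_[v].insert(u);
  }

  std::size_t numEdges() const
  {
    std::size_t endpoints = 0;
    for (const auto& n : adj_) endpoints += n.size();
    return endpoints / 2;  // every undirected edge is stored at both endpoints
  }

  const std::set<vertex_t>& neighbours(vertex_t v) const { return adj_[v]; }
  const NodePayload& payload(vertex_t v) const { return payload_[v]; }

private:
  std::vector<NodePayload> payload_;
  std::vector<std::set<vertex_t>> adj_;
};
```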
 

Public Member Functions

 BOOST_STRONG_TYPEDEF (boost::blank, PeptideCluster)
 Placeholder for peptides with the same parent proteins or protein groups.

 BOOST_STRONG_TYPEDEF (String, Peptide)
 A (currently unmodified) peptide sequence.

 BOOST_STRONG_TYPEDEF (Size, RunIndex)
 The run in which a PSM was observed.

 BOOST_STRONG_TYPEDEF (int, Charge)
 The charge state in which a PSM was observed.
 
 IDBoostGraph (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, bool use_run_info, bool best_psms_annotated, const std::optional< const ExperimentalDesign > &ed=std::optional< const ExperimentalDesign >())
 Constructors.
 
 IDBoostGraph (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_run_info, bool use_unassigned_ids, bool best_psms_annotated, const std::optional< const ExperimentalDesign > &ed=std::optional< const ExperimentalDesign >())
 
void applyFunctorOnCCs (const std::function< unsigned long(Graph &, unsigned int)> &functor)
 Applies a functor to each connected component (your functor object has to inherit from std::function or be a lambda).

void applyFunctorOnCCsST (const std::function< void(Graph &)> &functor)
 Applies a functor to each connected component, single-threaded (your functor object has to inherit from std::function or be a lambda).
 
void clusterIndistProteinsAndPeptides ()
 
void clusterIndistProteinsAndPeptidesAndExtendGraph ()
 
void annotateIndistProteins (bool addSingletons=true)
 
void calculateAndAnnotateIndistProteins (bool addSingletons=true)
 
void computeConnectedComponents ()
 Splits the initialized graph into connected components and clears it.
 
void resolveGraphPeptideCentric (bool removeAssociationsInData=true)
 
Size getNrConnectedComponents ()
 Returns the number of connected components. Zero means the graph was not split yet.
 
const Graph & getComponent (Size cc)
 Returns a specific connected component of the graph as a graph itself.

const ProteinIdentification & getProteinIDs ()
 Returns the underlying protein identifications for viewing.
 
void getUpstreamNodesNonRecursive (std::queue< vertex_t > &q, const Graph &graph, int lvl, bool stop_at_first, std::vector< vertex_t > &result)
 Searches for all upstream nodes from a (set of) start nodes that are lower than or equal to a given level. The level ordering is the same as in the IDPointer variant typedef.

void getDownstreamNodesNonRecursive (std::queue< vertex_t > &q, const Graph &graph, int lvl, bool stop_at_first, std::vector< vertex_t > &result)
 Searches for all downstream nodes from a (set of) start nodes that are higher than or equal to a given level. The level ordering is the same as in the IDPointer variant typedef.
 
void getProteinScores_ (ScoreToTgtDecLabelPairs &scores_and_tgt)
 
void getProteinGroupScoresAndTgtFraction (ScoreToTgtDecLabelPairs &scores_and_tgt_fraction)
 
void getProteinGroupScoresAndHitchhikingTgtFraction (ScoreToTgtDecLabelPairs &scores_and_tgt_fraction)
 

Static Public Member Functions

static void printGraph (std::ostream &out, const Graph &fg)
 Prints a graph (a component or, if not split, the full graph) in Graphviz (i.e. dot) format.
 

Private Member Functions

vertex_t addVertexWithLookup_ (const IDPointer &ptr, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map)
 
void annotateIndistProteins_ (const Graph &fg, bool addSingletons)
 Internal function to annotate the underlying ID structures based on the given Graph.
 
void calculateAndAnnotateIndistProteins_ (const Graph &fg, bool addSingletons)
 
void buildGraph_ (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, bool best_psms_annotated=false)
 
void buildGraph_ (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_unassigned_ids, bool best_psms_annotated=false)
 
void addPeptideIDWithAssociatedProteins_ (PeptideIdentification &spectrum, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map, const std::unordered_map< std::string, ProteinHit * > &accession_map, Size use_top_psms, bool best_psms_annotated)
 Used during building.
 
void addPeptideAndAssociatedProteinsWithRunInfo_ (PeptideIdentification &spectrum, std::unordered_map< unsigned, unsigned > &indexToPrefractionationGroup, std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> &vertex_map, std::unordered_map< std::string, ProteinHit * > &accession_map, Size use_top_psms)
 
void buildGraphWithRunInfo_ (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_unassigned_ids, const ExperimentalDesign &ed)
 
void buildGraphWithRunInfo_ (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, const ExperimentalDesign &ed)
 
void resolveGraphPeptideCentric_ (Graph &fg, bool removeAssociationsInData)
 See the equivalent public method.
 
template<class NodeType >
void getDownstreamNodes (const vertex_t &start, const Graph &graph, std::vector< NodeType > &result)
 
template<class NodeType >
void getUpstreamNodes (const vertex_t &start, const Graph graph, std::vector< NodeType > &result)
 

Private Attributes

ProteinIdentification & protIDs_

Graph g
 The initial Boost graph (will be cleared when split into CCs).

Graphs ccs_
 The Graph split into connected components.

std::unordered_map< vertex_t, Size > pepHitVtx_to_run_
 
Size nrPrefractionationGroups_ = 0
 

Detailed Description

Creates and maintains a boost graph based on the OpenMS ID datastructures.

For finding connected components and applying functions to them. Currently assumes that all PeptideIdentifications are from the ProteinIdentification run that is given; please make sure this is the case.

VERY IMPORTANT NOTE: If you add visitors here, make sure they do not touch members of the underlying ID objects that are responsible for the graph structure, e.g. the (protein/peptide)_hits vectors or the lists in ProteinGroups. You can set information like scores or meta values, though.


Class Documentation

◆ OpenMS::Internal::IDBoostGraph::ProteinGroup

struct OpenMS::Internal::IDBoostGraph::ProteinGroup

indistinguishable protein groups (size, nr targets, score)

Class Members
double score
int size
int tgts

Member Typedef Documentation

◆ edge_t

typedef boost::graph_traits<Graph>::edge_descriptor edge_t

◆ Graph

typedef boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS, IDPointer> Graph

◆ GraphConst

typedef boost::adjacency_list<boost::setS, boost::vecS, boost::undirectedS, IDPointer> GraphConst

◆ Graphs

typedef std::vector<Graph> Graphs

◆ IDPointer

typedef boost::variant<ProteinHit*, ProteinGroup, PeptideCluster, Peptide, RunIndex, Charge, PeptideHit*> IDPointer

◆ IDPointerConst

typedef boost::variant<const ProteinHit*, const ProteinGroup*, const PeptideCluster*, const Peptide, const RunIndex, const Charge, const PeptideHit*> IDPointerConst

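◆ PeptideNodeSet

typedef std::set<IDBoostGraph::vertex_t> PeptideNodeSet

◆ ProteinNodeSet

typedef std::set<IDBoostGraph::vertex_t> ProteinNodeSet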

◆ vertex_t

typedef boost::graph_traits<Graph>::vertex_descriptor vertex_t

Constructor & Destructor Documentation

◆ IDBoostGraph() [1/2]

IDBoostGraph ( ProteinIdentification & proteins,
std::vector< PeptideIdentification > & idedSpectra,
Size use_top_psms,
bool use_run_info,
bool best_psms_annotated,
const std::optional< const ExperimentalDesign > & ed = std::optional< const ExperimentalDesign >()
)

Constructors.

◆ IDBoostGraph() [2/2]

IDBoostGraph ( ProteinIdentification & proteins,
ConsensusMap & cmap,
Size use_top_psms,
bool use_run_info,
bool use_unassigned_ids,
bool best_psms_annotated,
const std::optional< const ExperimentalDesign > & ed = std::optional< const ExperimentalDesign >()
)

Member Function Documentation

◆ addPeptideAndAssociatedProteinsWithRunInfo_()

void addPeptideAndAssociatedProteinsWithRunInfo_ ( PeptideIdentification & spectrum,
std::unordered_map< unsigned, unsigned > & indexToPrefractionationGroup,
std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> & vertex_map,
std::unordered_map< std::string, ProteinHit * > & accession_map,
Size use_top_psms
)
private

◆ addPeptideIDWithAssociatedProteins_()

void addPeptideIDWithAssociatedProteins_ ( PeptideIdentification & spectrum,
std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> & vertex_map,
const std::unordered_map< std::string, ProteinHit * > & accession_map,
Size use_top_psms,
bool best_psms_annotated
)
private

Used during building.

◆ addVertexWithLookup_()

vertex_t addVertexWithLookup_ ( const IDPointer & ptr,
std::unordered_map< IDPointer, vertex_t, boost::hash< IDPointer >> & vertex_map
)
private

Helper function that adds a vertex if it is not present yet and otherwise returns the existing one. It needs a temporary, filled vertex_map that is modifiable.
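The lookup-or-insert behaviour described above can be sketched with std::unordered_map::try_emplace, which inserts only when the key is absent and otherwise returns the existing entry. In this sketch the key is a plain std::string instead of an IDPointer, and addVertexWithLookupSketch and its vertex vector are hypothetical illustrations, not the private OpenMS helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

using vertex_t = std::size_t;

// Hypothetical helper illustrating the addVertexWithLookup_ idea: return the vertex that
// already represents `key`, or create a new one. `vertex_map` must be modifiable because
// newly created vertices are recorded in it.
vertex_t addVertexWithLookupSketch(const std::string& key,
                                   std::vector<std::string>& vertices,
                                   std::unordered_map<std::string, vertex_t>& vertex_map)
{
  // try_emplace inserts (key -> next free index) only if key is not yet in the map.
  auto [it, inserted] = vertex_map.try_emplace(key, vertices.size());
  if (inserted)
  {
    vertices.push_back(key);  // create the vertex only on first sight
  }
  return it->second;
}
```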

◆ annotateIndistProteins()

void annotateIndistProteins ( bool addSingletons = true)

Annotate indistinguishable proteins by adding the groups to the underlying ProteinIdentification::ProteinGroups object. This has no effect on the graph itself.

Precondition
Graph must contain ProteinGroup nodes (e.g. with clusterIndistProteinsAndPeptides). Otherwise it does nothing and you should use calculateAndAnnotateIndistProteins instead.
Parameters
addSingletons: whether to also annotate groups with just one protein entry

◆ annotateIndistProteins_()

void annotateIndistProteins_ ( const Graph & fg,
bool addSingletons
)
private

internal function to annotate the underlying ID structures based on the given Graph

◆ applyFunctorOnCCs()

void applyFunctorOnCCs ( const std::function< unsigned long(Graph &, unsigned int)> & functor)

Applies a functor to each connected component (your functor object has to inherit from std::function or be a lambda).

◆ applyFunctorOnCCsST()

void applyFunctorOnCCsST ( const std::function< void(Graph &)> & functor)

Applies a functor to each connected component, single-threaded (your functor object has to inherit from std::function or be a lambda).
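A minimal sketch of applying a callable to every connected component, assuming components are simplified to vectors of vertex ids (ComponentSketch and applyOnComponentsSketch are hypothetical). It runs sequentially, like the single-threaded variant; the ST suffix suggests the other variant may process components in parallel, but that is not stated here. What OpenMS does with the unsigned long return values is also not documented in this section, so the sketch simply collects them. A lambda converts implicitly to the std::function parameter, as the description suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one connected component: just its vertex ids.
using ComponentSketch = std::vector<int>;

// Hypothetical helper: calls `functor` on every component together with the component's
// index, one after another, and collects the returned values in component order.
std::vector<unsigned long> applyOnComponentsSketch(
    std::vector<ComponentSketch>& ccs,
    const std::function<unsigned long(ComponentSketch&, unsigned int)>& functor)
{
  std::vector<unsigned long> results;
  results.reserve(ccs.size());
  for (std::size_t i = 0; i < ccs.size(); ++i)
  {
    results.push_back(functor(ccs[i], static_cast<unsigned int>(i)));
  }
  return results;
}
```

Usage: passing `[](ComponentSketch& cc, unsigned int) { return static_cast<unsigned long>(cc.size()); }` returns the size of each component.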

◆ BOOST_STRONG_TYPEDEF() [1/4]

BOOST_STRONG_TYPEDEF ( boost::blank,
PeptideCluster
)

Placeholder for peptides with the same parent proteins or protein groups.

◆ BOOST_STRONG_TYPEDEF() [2/4]

BOOST_STRONG_TYPEDEF ( int,
Charge
)

The charge state in which a PSM was observed.

◆ BOOST_STRONG_TYPEDEF() [3/4]

BOOST_STRONG_TYPEDEF ( Size,
RunIndex
)

The run in which a PSM was observed.

◆ BOOST_STRONG_TYPEDEF() [4/4]

BOOST_STRONG_TYPEDEF ( String,
Peptide
)

A (currently unmodified) peptide sequence.
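Strong typedefs give each lightweight node kind its own type, so it occupies its own slot in the node variant. The sketch below is a tiny hand-written template, not the Boost macro: it provides equality but none of the conversions BOOST_STRONG_TYPEDEF generates, and all ...Sketch names and tag types are hypothetical. It shows that RunIndexSketch and ChargeSketch become separate std::variant alternatives, so a run index of 2 and a charge of 2 never compare equal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>

// Hypothetical minimal strong typedef: the Tag parameter makes every instantiation a
// distinct type, even if two of them wrapped the same underlying type T.
template <typename T, typename Tag>
struct StrongTypedef
{
  T value;
  bool operator==(const StrongTypedef& other) const { return value == other.value; }
};

using PeptideSketch = StrongTypedef<std::string, struct PeptideTag>;
using RunIndexSketch = StrongTypedef<std::size_t, struct RunIndexTag>;
using ChargeSketch = StrongTypedef<int, struct ChargeTag>;

// Analogue of the lightweight part of IDPointer: each wrapper is its own alternative.
using NodeSketch = std::variant<PeptideSketch, RunIndexSketch, ChargeSketch>;
```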

◆ buildGraph_() [1/2]

void buildGraph_ ( ProteinIdentification & proteins,
ConsensusMap & cmap,
Size use_top_psms,
bool use_unassigned_ids,
bool best_psms_annotated = false
)
private

◆ buildGraph_() [2/2]

void buildGraph_ ( ProteinIdentification & proteins,
std::vector< PeptideIdentification > & idedSpectra,
Size use_top_psms,
bool best_psms_annotated = false
)
private

Initializes and stores the graph. IMPORTANT: Once the graph is built, editing members like (protein/peptide)_hits_ will invalidate it!

Parameters
proteins: ProteinIdentification object storing IDs and groups
idedSpectra: vector of PeptideIdentifications with links to the proteins and PSMs in their PeptideHits
use_top_psms: number of top PSMs used per spectrum (<= 0 means all)
best_psms_annotated: whether the PSMs are annotated with the "best_per_peptide" meta value. Otherwise, all PSMs are taken into account.
Todo:
We could include building the graph in important "main" functions like inferPosteriors to make the methods safer, but it is also nice to be able to reuse the graph.
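The two selection parameters can be illustrated on a single spectrum's hit list. This sketch assumes the hits are already sorted best-first, applies the use_top_psms cutoff (0 meaning all, since Size is unsigned) before the "best_per_peptide" filter, and models that meta value as a plain bool. PsmSketch and selectPsmsSketch are hypothetical, and the order in which OpenMS applies the two filters is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified PSM record; in OpenMS the flag is a meta value on the PeptideHit.
struct PsmSketch
{
  std::string sequence;
  double score;           // assumed: hits are already sorted best-first
  bool best_per_peptide;  // assumed stand-in for the "best_per_peptide" meta value
};

// Hypothetical selection: keep the first use_top_psms hits (0 = all) and, if
// best_psms_annotated is set, only those flagged as best_per_peptide.
std::vector<PsmSketch> selectPsmsSketch(const std::vector<PsmSketch>& hits,
                                        std::size_t use_top_psms,
                                        bool best_psms_annotated)
{
  const std::size_t n = (use_top_psms == 0) ? hits.size() : std::min(use_top_psms, hits.size());
  std::vector<PsmSketch> selected;
  for (std::size_t i = 0; i < n; ++i)
  {
    if (!best_psms_annotated || hits[i].best_per_peptide)
    {
      selected.push_back(hits[i]);
    }
  }
  return selected;
}
```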

◆ buildGraphWithRunInfo_() [1/2]

void buildGraphWithRunInfo_ ( ProteinIdentification & proteins,
ConsensusMap & cmap,
Size use_top_psms,
bool use_unassigned_ids,
const ExperimentalDesign & ed
)
private

Initialize and store the graph. Also stores run information to later group peptides more efficiently. IMPORTANT: Once the graph is built, editing members like (protein/peptide)_hits_ will invalidate it!

use_top_psms is the number of top PSMs used per spectrum (<= 0 means all)

Todo:
we could include building the graph in important "main" functions like inferPosteriors to make the methods safer, but it is also nice to be able to reuse the graph

◆ buildGraphWithRunInfo_() [2/2]

void buildGraphWithRunInfo_ ( ProteinIdentification & proteins,
std::vector< PeptideIdentification > & idedSpectra,
Size use_top_psms,
const ExperimentalDesign & ed
)
private

◆ calculateAndAnnotateIndistProteins()

void calculateAndAnnotateIndistProteins ( bool addSingletons = true)

Annotate indistinguishable proteins by adding the groups to the underlying ProteinIdentification::ProteinGroups object. This has no effect on the graph itself.

Parameters
addSingletons: whether to also annotate groups with just one protein entry

◆ calculateAndAnnotateIndistProteins_()

void calculateAndAnnotateIndistProteins_ ( const Graph & fg,
bool addSingletons
)
private

◆ clusterIndistProteinsAndPeptides()

void clusterIndistProteinsAndPeptides ( )

Adds intermediate nodes to the graph that represent indistinguishable protein groups and peptides with the same parents. This saves computation time and avoids oscillations later on.
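Indistinguishable proteins are those connected to exactly the same set of peptides, so grouping them amounts to bucketing proteins by their peptide set. The sketch below does this with a std::map keyed by the set; applying the same idea to peptides keyed by their parent sets would give the peptide clusters. Accessions and sequences are plain strings here, and groupIndistinguishableSketch is a hypothetical helper, not the graph-based OpenMS implementation.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Accession = std::string;  // hypothetical: protein identified by accession only
using Sequence = std::string;   // hypothetical: peptide identified by sequence only

// Hypothetical helper: buckets proteins by the exact set of peptides they are connected to.
// Proteins in the same bucket are indistinguishable by the peptide evidence and could share
// one group node.
std::vector<std::set<Accession>> groupIndistinguishableSketch(
    const std::map<Accession, std::set<Sequence>>& protein_to_peptides)
{
  std::map<std::set<Sequence>, std::set<Accession>> by_peptide_set;
  for (const auto& [accession, peptides] : protein_to_peptides)
  {
    by_peptide_set[peptides].insert(accession);
  }
  std::vector<std::set<Accession>> groups;
  for (const auto& entry : by_peptide_set)
  {
    groups.push_back(entry.second);  // singletons are kept as groups of size one
  }
  return groups;
}
```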

◆ clusterIndistProteinsAndPeptidesAndExtendGraph()

void clusterIndistProteinsAndPeptidesAndExtendGraph ( )

(under development) As above, but adds charge, replicate, and sequence layers of nodes (untested).

Todo:
needs to be finished, updated with latest additions (i.e. check clusterIndistProteinsAndPeptides), and tested

◆ computeConnectedComponents()

void computeConnectedComponents ( )

Splits the initialized graph into connected components and clears it.
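Splitting into connected components can be sketched with a breadth-first search over an adjacency list: each component is re-indexed into its own small graph and the input is cleared afterwards, matching the documented behaviour. OpenMS uses a Boost DFS visitor (dfs_ccsplit_visitor) for this, so BFS here is a substitution, and AdjList and splitIntoComponentsSketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical undirected graph: adjacency lists with every edge stored in both directions.
using AdjList = std::vector<std::vector<std::size_t>>;

// Hypothetical helper: splits `g` into connected components via BFS, returns each one as
// its own re-indexed graph, and then clears `g`.
std::vector<AdjList> splitIntoComponentsSketch(AdjList& g)
{
  const std::size_t n = g.size();
  std::vector<std::size_t> comp(n, n);   // component id per vertex (n = not visited yet)
  std::vector<std::size_t> local(n, 0);  // new index of each vertex inside its component
  std::vector<AdjList> ccs;
  for (std::size_t s = 0; s < n; ++s)
  {
    if (comp[s] != n) continue;  // already assigned to a component
    const std::size_t id = ccs.size();
    std::vector<std::size_t> members;
    std::queue<std::size_t> q;
    q.push(s);
    comp[s] = id;
    while (!q.empty())
    {
      const std::size_t v = q.front();
      q.pop();
      local[v] = members.size();
      members.push_back(v);
      for (std::size_t w : g[v])
      {
        if (comp[w] == n) { comp[w] = id; q.push(w); }
      }
    }
    AdjList cc(members.size());
    for (std::size_t v : members)
      for (std::size_t w : g[v])
        cc[local[v]].push_back(local[w]);  // translate edges to component-local indices
    ccs.push_back(cc);
  }
  g.clear();  // the original graph is emptied after splitting
  return ccs;
}
```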

◆ getComponent()

const Graph& getComponent ( Size cc)

Returns a specific connected component of the graph as a graph itself.

Parameters
cc: the index of the component
Returns
the component as a graph

◆ getDownstreamNodes()

void getDownstreamNodes ( const vertex_t & start,
const Graph & graph,
std::vector< NodeType > & result
)
inline, private

◆ getDownstreamNodesNonRecursive()

void getDownstreamNodesNonRecursive ( std::queue< vertex_t > & q,
const Graph & graph,
int lvl,
bool stop_at_first,
std::vector< vertex_t > & result
)

Searches for all downstream nodes from a (set of) start nodes that are higher than or equal to a given level. The level ordering is the same as in the IDPointer variant typedef.

Parameters
q: a queue of start nodes
graph: the graph to look in (the nodes in q have to be part of it)
lvl: the level to start reporting from
stop_at_first: whether to stop at the first node >= lvl or to also report the nodes downstream of it
result: vector of reported nodes
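The level-bounded search can be sketched as a queue-driven traversal that only steps to neighbours with a higher level, using integer levels that follow the IDPointer ordering. The upstream variant is symmetric (step to lower levels, report nodes <= lvl). LayeredGraphSketch and downstreamSketch are hypothetical, and the visited set is an addition of this sketch to avoid reporting a node twice; whether OpenMS deduplicates the same way is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

// Hypothetical layered graph: level[v] follows the IDPointer ordering
// (0 ProteinHit, 1 ProteinGroup, 2 PeptideCluster, 3 Peptide, 4 RunIndex, 5 Charge, 6 PeptideHit).
struct LayeredGraphSketch
{
  std::vector<int> level;
  std::vector<std::vector<std::size_t>> adj;  // undirected adjacency lists
};

// Hypothetical helper: consumes the start queue, only moves to neighbours with a higher level
// ("downstream"), and reports every reached node with level >= lvl. With stop_at_first, the
// search does not continue past a reported node.
std::vector<std::size_t> downstreamSketch(std::queue<std::size_t>& q, const LayeredGraphSketch& g,
                                          int lvl, bool stop_at_first)
{
  std::vector<std::size_t> result;
  std::set<std::size_t> seen;
  while (!q.empty())
  {
    const std::size_t v = q.front();
    q.pop();
    for (std::size_t w : g.adj[v])
    {
      // skip upstream or same-level neighbours, and nodes that were already reached
      if (g.level[w] <= g.level[v] || !seen.insert(w).second) continue;
      if (g.level[w] >= lvl)
      {
        result.push_back(w);
        if (stop_at_first) continue;
      }
      q.push(w);
    }
  }
  return result;
}
```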

◆ getNrConnectedComponents()

Size getNrConnectedComponents ( )

Zero means the graph was not split yet.

◆ getProteinGroupScoresAndHitchhikingTgtFraction()

void getProteinGroupScoresAndHitchhikingTgtFraction ( ScoreToTgtDecLabelPairs & scores_and_tgt_fraction)

◆ getProteinGroupScoresAndTgtFraction()

void getProteinGroupScoresAndTgtFraction ( ScoreToTgtDecLabelPairs & scores_and_tgt_fraction)

Gets the scores and target-decoy fractions from groups, and the scores plus binary target/decoy labels for singleton proteins. This function is usually used to create input for FDR calculations.
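A sketch of assembling such (score, target label) pairs, assuming they are modelled as std::pair<double, double>: for a group the label is the target fraction tgts / size, using the ProteinGroup members documented above, and for a singleton it is 1.0 for a target and 0.0 for a decoy. The struct and function names are hypothetical, and the actual layout of ScoreToTgtDecLabelPairs is not shown in this section.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical mirror of the documented ProteinGroup members (size, nr targets, score).
struct ProteinGroupSketch { double score; int size; int tgts; };
// Hypothetical singleton protein with a binary target/decoy flag.
struct SingletonProteinSketch { double score; bool is_target; };

// Hypothetical helper: (score, label) pairs, where the label is the target fraction for
// groups and 1.0 (target) or 0.0 (decoy) for singleton proteins.
std::vector<std::pair<double, double>> scoresAndTgtFractionSketch(
    const std::vector<ProteinGroupSketch>& groups,
    const std::vector<SingletonProteinSketch>& singletons)
{
  std::vector<std::pair<double, double>> out;
  for (const auto& g : groups)
  {
    out.emplace_back(g.score, static_cast<double>(g.tgts) / g.size);
  }
  for (const auto& p : singletons)
  {
    out.emplace_back(p.score, p.is_target ? 1.0 : 0.0);
  }
  return out;
}
```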

◆ getProteinIDs()

const ProteinIdentification& getProteinIDs ( )

Returns the underlying protein identifications for viewing.

Returns
a const reference to the protein ID run in this graph (there can only be one)

◆ getProteinScores_()

void getProteinScores_ ( ScoreToTgtDecLabelPairs & scores_and_tgt)

Gets the scores of the proteins included in the graph. The difference from querying the underlying ProteinIdentification structure is that not all proteins might be included in the graph, because only the best PSM per peptide may have been used.

◆ getUpstreamNodes()

void getUpstreamNodes ( const vertex_t & start,
const Graph graph,
std::vector< NodeType > & result
)
inline, private

◆ getUpstreamNodesNonRecursive()

void getUpstreamNodesNonRecursive ( std::queue< vertex_t > & q,
const Graph & graph,
int lvl,
bool stop_at_first,
std::vector< vertex_t > & result
)

Searches for all upstream nodes from a (set of) start nodes that are lower than or equal to a given level. The level ordering is the same as in the IDPointer variant typedef.

Parameters
q: a queue of start nodes
graph: the graph to look in (the nodes in q have to be part of it)
lvl: the level to start reporting from
stop_at_first: whether to stop at the first node <= lvl or to also report its upstream predecessors
result: vector of reported nodes

◆ printGraph()

static void printGraph ( std::ostream & out,
const Graph & fg
)
static

Prints a graph (a component or, if not split, the full graph) in Graphviz (i.e. dot) format.

Parameters
out: an ostream to print to
fg: the graph to print
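Graphviz dot output for an undirected graph consists of one node statement per vertex and one `a -- b` statement per edge inside a `graph { ... }` block. The sketch below writes that format from plain string labels and an edge list; printDotSketch is hypothetical, labels are not escaped, and the exact text produced by printGraph (whose node labels come from the ID objects) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: writes an undirected graph in Graphviz dot syntax, one labelled node
// statement per vertex and one "a -- b" statement per edge. Labels are NOT escaped here.
void printDotSketch(std::ostream& out, const std::vector<std::string>& labels,
                    const std::vector<std::pair<int, int>>& edges)
{
  out << "graph G {\n";
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    out << "  " << i << " [label=\"" << labels[i] << "\"];\n";
  }
  for (const auto& [a, b] : edges)
  {
    out << "  " << a << " -- " << b << ";\n";
  }
  out << "}\n";
}
```

The resulting text can be rendered with the Graphviz `dot` tool, e.g. `dot -Tpng graph.dot -o graph.png`.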

◆ resolveGraphPeptideCentric()

void resolveGraphPeptideCentric ( bool removeAssociationsInData = true)

Removes all edges from a peptide (and its PSMs) to its parent protein groups (and their proteins) except for the edges to the best protein group.

Todo:
untested
Precondition
Graph must contain PeptideCluster nodes (e.g. with clusterIndistProteinsAndPeptides).
Parameters
removeAssociationsInData: also removes the corresponding PeptideEvidences in the underlying ID data structure. Only deactivate this if you know what you are doing.
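Peptide-centric resolution can be sketched on a simple map from each peptide to its parent groups. The description does not say how the best protein group is chosen, so this sketch assumes the highest group score wins and keeps the first group on ties. PeptideToGroups and resolvePeptideCentricSketch are hypothetical, and the removal of PeptideEvidences in the underlying data is not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical bipartite view: each peptide maps to the protein groups it is connected to.
using PeptideToGroups = std::map<std::string, std::vector<std::string>>;

// Hypothetical helper: keeps, for every peptide, only the connection to its best parent group.
// ASSUMPTION: "best" means the highest score in group_score; ties keep the first group seen.
void resolvePeptideCentricSketch(PeptideToGroups& pep_to_groups,
                                 const std::map<std::string, double>& group_score)
{
  for (auto& entry : pep_to_groups)
  {
    std::vector<std::string>& groups = entry.second;
    if (groups.size() < 2) continue;  // nothing to resolve
    std::string best = groups.front();
    for (const auto& g : groups)
    {
      if (group_score.at(g) > group_score.at(best)) best = g;
    }
    groups = {best};  // drop all other parent connections
  }
}
```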

◆ resolveGraphPeptideCentric_()

void resolveGraphPeptideCentric_ ( Graph & fg,
bool removeAssociationsInData
)
private

See the equivalent public method.

Member Data Documentation

◆ ccs_

Graphs ccs_
private

the Graph split into connected components

◆ g

Graph g
private

the initial boost Graph (will be cleared when split into CCs)

◆ nrPrefractionationGroups_

Size nrPrefractionationGroups_ = 0
private

Stores the number of distinct values in pepHitVtx_to_run_. A prefractionation group (previously called run) is a unique combination of all non-fractionation-related entries in the experimental design, i.e. one (sub-)experiment before fractionation.

◆ pepHitVtx_to_run_

std::unordered_map<vertex_t, Size> pepHitVtx_to_run_
private

If the graph is built with run information, this stores the run that each peptide hit vertex belongs to. This is important for extending the graph.

◆ protIDs_

ProteinIdentification& protIDs_
private