OpenMS
Todo List
Member BasicProteinInferenceAlgorithm::run (ConsensusMap &cmap, ProteinIdentification &prot_id, bool include_unassigned) const
allow checking matching IDs
Member BayesianProteinInferenceAlgorithm::BayesianProteinInferenceAlgorithm (unsigned int debug_lvl=0)
is there a better way to pass the debug level from TOPPBase?
Member BayesianProteinInferenceAlgorithm::checkConvertAndFilterPepHits_
extend to allow filtering only for the current run
Member BayesianProteinInferenceAlgorithm::inferPosteriorProbabilities (std::vector< ProteinIdentification > &proteinIDs, std::vector< PeptideIdentification > &peptideIDs, bool greedy_group_resolution, std::optional< const ExperimentalDesign > exp_des=std::optional< const ExperimentalDesign >())
loop over all runs
Member ConsensusIDAlgorithm::apply (std::vector< PeptideIdentification > &ids, const std::map< String, String > &se_info, Size number_of_runs=0)
we could pass the score_types that we want to carry over in the map as well (right now it always takes main)
Class ConsensusMapMergerAlgorithm

This could be merged in the future with the general IDMergerAlgorithm, since the two share a lot of logic. For that, IDMergerAlgorithm needs additional methods to produce multiple runs as output. It also needs to store an extended mapping internally to distribute the PeptideIDs to the right output run according to origin and label, and it should have non-copying/moving overloads for inserting PeptideIDs, since we probably do not want to distribute the PeptideIDs to the features again. In general, detaching IDs from features would be of great help here.

Untested for TMT/iTRAQ data, where you usually have one identification run per file, but one file may contain multiple multiplexed conditions that you might want to split for inference. Problem: there is only one PeptideIdentification object per Feature, and it is representative for all "sub maps" (in this case the labels/reporter ions). A lookup is therefore necessary to check whether the reporter ion had non-zero intensity; if so, the peptide ID needs to be duplicated for every new (condition-based) IdentificationRun it is supposed to be used in, according to the mapping.
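
The lookup-and-duplicate step described above could look roughly like the following. This is only a sketch under stated assumptions: PeptideIdSketch, FeatureSketch and distributeId are hypothetical stand-ins invented for illustration, not OpenMS classes or functions, and "non-zero intensity" is interpreted as strictly greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the OpenMS classes.
struct PeptideIdSketch
{
  std::string sequence;
  std::string run_identifier; // identification run this ID is assigned to
};

struct FeatureSketch
{
  std::vector<double> channel_intensity; // one entry per sub map (label / reporter ion)
  PeptideIdSketch id;                    // the single ID shared by all sub maps
};

// Copy the feature's single peptide ID into every new (condition-based) run
// that has at least one channel with non-zero intensity. Each new run
// receives the ID at most once, even if several of its channels have signal.
std::vector<PeptideIdSketch> distributeId(
    const FeatureSketch& feature,
    const std::map<std::size_t, std::string>& channel_to_new_run)
{
  std::vector<PeptideIdSketch> copies;
  std::set<std::string> runs_done;
  for (const auto& entry : channel_to_new_run)
  {
    const std::size_t channel = entry.first;
    const std::string& new_run = entry.second;
    assert(channel < feature.channel_intensity.size());
    if (feature.channel_intensity[channel] <= 0.0) continue; // no reporter signal
    if (!runs_done.insert(new_run).second) continue;         // run already has a copy
    PeptideIdSketch copy = feature.id;
    copy.run_identifier = new_run;
    copies.push_back(copy);
  }
  return copies;
}
```

Keying the de-duplication on the new run name rather than on the channel keeps a condition that spans several reporter channels from receiving the same ID more than once.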

Member ConsensusMapMergerAlgorithm::mergeProteinIDRuns (ConsensusMap &cmap, const std::map< unsigned, unsigned > &mapIdx_to_new_protIDRun) const
Do we need to consider the old IDRun identifier in addition to the sub map index?
Class ConsensusXMLFile
Take care that unique ids are assigned properly by TOPP tools before calling ConsensusXMLFile::store(). There will be a message on OPENMS_LOG_INFO but we will make no attempt to fix the problem in this class. (all developers)
Class ConsensusXMLHandler
Take care that unique ids are assigned properly by TOPP tools before calling ConsensusXMLFile::store(). There will be a message on OPENMS_LOG_INFO but we will make no attempt to fix the problem in this class. (all developers)
Member EnzymaticDigestion::digestUnmodified (const StringView &sequence, std::vector< std::pair< Size, Size >> &output, Size min_length=1, Size max_length=0) const
could be set of pairs.
Class FeatureFinderAlgorithmPicked

Fix output in parallel mode, change assignment of charges to threads, add parallel TOPP test (Marc)

Implement user-specified seed lists support (Marc)

Member FeatureFinderIdentificationAlgorithm::addPeptideToMap_ (PeptideIdentification &peptide, PeptideMap &peptide_map, bool external=false)
find better solution
Class FeatureXMLFile
Take care that unique ids are assigned properly by TOPP tools before calling FeatureXMLFile::store(). There will be a message on OPENMS_LOG_INFO but we will make no attempt to fix the problem in this class. (all developers)
Class FeatureXMLHandler
Take care that unique ids are assigned properly by TOPP tools before calling FeatureXMLFile::store(). There will be a message on OPENMS_LOG_INFO but we will make no attempt to fix the problem in this class. (all developers)
Module FileIO

Implement reading of pepXML and protXML (Andreas)

Allow reading of zipped XML files (David, Hiwi)

Class GaussTraceFitter
More docu
Member IDBoostGraph::buildGraph_ (ProteinIdentification &proteins, std::vector< PeptideIdentification > &idedSpectra, Size use_top_psms, bool best_psms_annotated=false)
we could include building the graph in important "main" functions like inferPosteriors to make the methods safer, but it is also nice to be able to reuse the graph
Member IDBoostGraph::buildGraphWithRunInfo_ (ProteinIdentification &proteins, ConsensusMap &cmap, Size use_top_psms, bool use_unassigned_ids, const ExperimentalDesign &ed)
we could include building the graph in important "main" functions like inferPosteriors to make the methods safer, but it is also nice to be able to reuse the graph
Member IDBoostGraph::clusterIndistProteinsAndPeptidesAndExtendGraph ()
needs to be finished, updated with latest additions (i.e. check clusterIndistProteinsAndPeptides), and tested
Member IDBoostGraph::resolveGraphPeptideCentric (bool removeAssociationsInData=true)
Untested. Removes all edges from a peptide (and its PSMs) to its parent protein groups (and their proteins) except for the best protein group.
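
As a rough illustration of the peptide-centric resolution described above, the sketch below keeps only the best-scoring protein group for each peptide. It does not use the IDBoostGraph data structures: GroupEdge, PeptideToGroups and keepBestGroupOnly are hypothetical names, a plain map replaces the graph, and it assumes that a higher group score is better and that ties go to the first group listed.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical edge from a peptide to one of its parent protein groups.
struct GroupEdge
{
  std::string group;  // protein group label
  double group_score; // assumption: higher is better
};

// Hypothetical adjacency list: peptide sequence -> edges to parent groups.
using PeptideToGroups = std::map<std::string, std::vector<GroupEdge>>;

// For every peptide, drop all edges except the one to the best-scoring group.
void keepBestGroupOnly(PeptideToGroups& graph)
{
  for (auto& entry : graph)
  {
    std::vector<GroupEdge>& edges = entry.second;
    if (edges.size() < 2) continue; // nothing to resolve
    auto best = std::max_element(
        edges.begin(), edges.end(),
        [](const GroupEdge& a, const GroupEdge& b) { return a.group_score < b.group_score; });
    assert(best != edges.end());
    const GroupEdge winner = *best; // copy before the vector is overwritten
    edges.assign(1, winner);
  }
}
```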
Class Identification
docu (Andreas)
Class IdentificationHit
docu (Andreas)
Class IDMergerAlgorithm
allow filtering for peptide sequence to supersede the IDMerger tool. Make it keep the best PSMs though.
Member IDScoreGetterSetter::getPeptideScoresFromMap_ (ScoreToTgtDecLabelPairs &scores_labels, const ConsensusMap &cmap, bool include_unassigned_peptides, Args &&... args)
allow FeatureMap?
Member IDScoreGetterSetter::getPickedProteinGroupScores_ (const std::unordered_map< String, ScoreToTgtDecLabelPair > &picked_scores, ScoreToTgtDecLabelPairs &scores_labels, const std::vector< ProteinIdentification::ProteinGroup > &grps, const String &decoy_string, bool decoy_prefix)
describe more
Class InclusionExclusionList
allow modifications (fixed?)
Class InspectOutfile
Handle Modifications (Andreas)
Class IsotopeMarker
implement a real isotope marking here with isotopedistributions and fitting (Andreas)
Class LabeledPairFinder

Implement support for labeled MRM experiments, Q1 m/z value and charges. (Andreas)

Implement support for more than one mass delta, e.g. from missed cleavages and so on (Andreas)

Page MascotAdapter
This adapter uses antiquated internal methods and needs to be updated, e.g. use MascotGenericFile.h instead of MascotInfile.h.
Member MessagePasserFactory< Label >::chgLLhoods
could be calculated from IDPEP (if we do per charge state fitting) or empirically estimated from the input PSMs
Member ModificationsDB::searchModifications (std::set< const ResidueModification * > &mods, const String &mod_name, const String &residue="", ResidueModification::TermSpecificity term_spec=ResidueModification::NUMBER_OF_TERM_SPECIFICITY) const
use set as return value. Would be more efficient in pyopenms
Class MzMLHandler
replace hardcoded cv stuff with more flexible handling via obo r/w.
Member MzMLSpectrumDecoder::decodeBinaryDataChrom_ (std::vector< BinaryData > &data) const
Duplicated code from MzMLHandler; needs cleanup (see MzMLHandler::fillData_()).
Member MzMLSpectrumDecoder::decodeBinaryDataSpectrum_ (std::vector< BinaryData > &data) const
Duplicated code from MzMLHandler; needs cleanup (see MzMLHandler::fillData_()).
Member PeptideProteinResolution::resolve (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides, bool resolve_ties, bool targets_first)
warning: all peptides are used (not filtered for matching protein ID run yet).
Member PeptideProteinResolution::resolveGraph (ProteinIdentification &protein, std::vector< PeptideIdentification > &peptides)
warning: all peptides are used (not filtered for matching protein ID run yet).
Class PlotWidget
Add support to store the displayed data as SVG image (HiWi)
Class PosteriorErrorProbabilityModel

test performance and make fitGumbelGauss available via parameters.

allow charge state based fitting

allow semi-supervised by using decoy annotations

allow non-parametric via kernel density estimation

Class ProductModel< D >
This class provides new member functions, which makes Factory<BaseModel<2> >::create("ProductModel2D") pretty much useless! (Clemens)
Class ProteinIdentification
Add MetaInfoInterface to modifications => update IdXMLFile and ProteinIdentificationVisualizer (Andreas)
Page ProteinInference
Possibly integrate the parsimony approach from the OpenMS::PSProteinInference class.
The command line parameters of this tool are:
ProteinInference -- Protein inference based on an aggregation of the scores of the identified peptides.
Full documentation: http://www.openms.de/doxygen/release/3.0.0/html/TOPP_ProteinInference.html
Version: 3.0.0 Jul 14 2023, 11:57:33, Revision: be787e9
To cite OpenMS:
 + Rost HL, Sachsenberg T, Aiche S, Bielow C et al.. OpenMS: a flexible open-source software platform for 
   mass spectrometry data analysis. Nat Meth. 2016; 13, 9: 741-748. doi:10.1038/nmeth.3959.

Usage:
  ProteinInference <options>

Options (mandatory options marked with '*'):
  -in <file>*                                                 Input file(s) (valid formats: 'idXML',
                                                              'consensusXML')
  -out <file>*                                                Output file (valid formats: 'idXML',
                                                              'consensusXML')
  -out_type <file>                                            Output file type (valid: 'idXML',
                                                              'consensusXML')
  -merge_runs <choice>                                        If your idXML contains multiple runs, merge
                                                              them beforehand? Otherwise performs inference
                                                              separately per run. (default: 'all') (valid:
                                                              'no', 'all')
  -protein_fdr <option>                                       Additionally calculate the target-decoy FDR on
                                                              protein-level after inference (default:
                                                              'false') (valid: 'true', 'false')
                                                              

Merging:
  -Merging:annotate_origin <choice>                           If true, adds a map_index MetaValue to the
                                                              PeptideIDs to annotate the IDRun they came from.
                                                              (default: 'true') (valid: 'true', 'false')
  -Merging:allow_disagreeing_settings                         Force merging of disagreeing runs. Use at your 
                                                              own risk.

Algorithm:
  -Algorithm:min_peptides_per_protein <number>                Minimal number of peptides needed for a protein
                                                              identification. If set to zero, unmatched
                                                              proteins get a score of -Infinity. If bigger than
                                                              zero, proteins with less peptides are filtered
                                                              and evidences removed from the PSMs. PSMs that
                                                              do not reference any proteins anymore are
                                                              removed but the spectrum info is kept. (default:
                                                              '1') (min: '0')
  -Algorithm:score_aggregation_method <choice>                How to aggregate scores of peptides matching
                                                              to the same protein? (default: 'best') (valid:
                                                              'best', 'product', 'sum', 'maximum')
  -Algorithm:treat_charge_variants_separately <choice>        If this is true, different charge variants of
                                                              the same peptide sequence count as individual
                                                              evidences. (default: 'true') (valid: 'true',
                                                              'false')
  -Algorithm:treat_modification_variants_separately <choice>  If this is true, different modification
                                                              variants of the same peptide sequence count as
                                                              individual evidences. (default: 'true') (valid:
                                                              'true', 'false')
  -Algorithm:use_shared_peptides <choice>                     If this is true, shared peptides are used as
                                                              evidences. Note: shared_peptides are not
                                                              deleted and potentially resolved in
                                                              postprocessing as well. (default: 'true')
                                                              (valid: 'true', 'false')
  -Algorithm:skip_count_annotation                            If this is set, peptide counts won't be
                                                              annotated at the proteins.
  -Algorithm:annotate_indistinguishable_groups <choice>       If this is true, calculates and annotates
                                                              indistinguishable protein groups. (default:
                                                              'true') (valid: 'true', 'false')
  -Algorithm:greedy_group_resolution                          If this is true, shared peptides will be
                                                              associated to best proteins only (i.e. become
                                                              potentially quantifiable razor peptides).

                                                              
Common TOPP options:
  -ini <file>                                                 Use the given TOPP INI file
  -threads <n>                                                Sets the number of threads allowed to be used 
                                                              by the TOPP tool (default: '1')
  -write_ini <file>                                           Writes the default configuration file
  --help                                                      Shows options
  --helphelp                                                  Shows all options (including advanced)
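
To make the score_aggregation_method choices listed in the help text more concrete, here is a minimal sketch of how per-peptide scores might be combined into one protein score. It is not the OpenMS implementation: aggregateScores and its higher_is_better flag are hypothetical, and only 'sum', 'product' and 'best' are covered, because the help text does not say how 'maximum' differs from 'best'. The assumption made here is that 'best' depends on the score orientation.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: combine the scores of all peptides matching one protein.
// Assumption: 'best' picks the maximum if higher scores are better, else the minimum.
double aggregateScores(const std::vector<double>& peptide_scores,
                       const std::string& method,
                       bool higher_is_better = true)
{
  assert(!peptide_scores.empty());
  if (method == "sum")
  {
    return std::accumulate(peptide_scores.begin(), peptide_scores.end(), 0.0);
  }
  if (method == "product")
  {
    return std::accumulate(peptide_scores.begin(), peptide_scores.end(), 1.0,
                           std::multiplies<double>());
  }
  assert(method == "best"); // 'maximum' is intentionally not modelled here
  return higher_is_better
             ? *std::max_element(peptide_scores.begin(), peptide_scores.end())
             : *std::min_element(peptide_scores.begin(), peptide_scores.end());
}
```

For example, the scores 0.5, 0.25 and 0.5 aggregate to 1.25 with 'sum' and to 0.0625 with 'product'; 'best' yields 0.5 when higher scores are better and 0.25 otherwise.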

INI file documentation of this tool:
+ ProteinInference: Protein inference based on an aggregation of the scores of the identified peptides.
  version (default: '3.0.0'): Version of the tool that generated this parameters file.
++ 1: Instance '1' section for 'ProteinInference'
  in (default: '[]'): input file(s) (tag: input file) (valid: '*.idXML', '*.consensusXML')
  out: output file (tag: output file) (valid: '*.idXML', '*.consensusXML')
  out_type: output file type (valid: 'idXML', 'consensusXML')
  merge_runs (default: 'all'): If your idXML contains multiple runs, merge them beforehand? Otherwise performs inference separately per run. (valid: 'no', 'all')
  protein_fdr (default: 'false'): Additionally calculate the target-decoy FDR on protein-level after inference (valid: 'true', 'false')
  conservative_fdr (default: 'true'): Use (D+1)/(T) instead of (D+1)/(T+D) for reporting protein FDRs. (valid: 'true', 'false')
  picked_fdr (default: 'true'): Use picked protein FDRs. (valid: 'true', 'false')
  picked_decoy_string: If using picked protein FDRs, which decoy string was used? Leave blank for auto-detection.
  picked_decoy_prefix (default: 'prefix'): If using picked protein FDRs, was the decoy string a prefix or suffix? Ignored during auto-detection. (valid: 'prefix', 'suffix')
  log: Name of log file (created only when specified)
  debug (default: '0'): Sets the debug level
  threads (default: '1'): Sets the number of threads allowed to be used by the TOPP tool
  no_progress (default: 'false'): Disables progress logging to command line (valid: 'true', 'false')
  force (default: 'false'): Overrides tool-specific checks (valid: 'true', 'false')
  test (default: 'false'): Enables the test mode (needed for internal use only) (valid: 'true', 'false')
+++ Merging
  annotate_origin (default: 'true'): If true, adds a map_index MetaValue to the PeptideIDs to annotate the IDRun they came from. (valid: 'true', 'false')
  allow_disagreeing_settings (default: 'false'): Force merging of disagreeing runs. Use at your own risk. (valid: 'true', 'false')
+++ Algorithm
  min_peptides_per_protein (default: '1'): Minimal number of peptides needed for a protein identification. If set to zero, unmatched proteins get a score of -Infinity. If bigger than zero, proteins with less peptides are filtered and evidences removed from the PSMs. PSMs that do not reference any proteins anymore are removed but the spectrum info is kept. (valid range: 0:∞)
  score_aggregation_method (default: 'best'): How to aggregate scores of peptides matching to the same protein? (valid: 'best', 'product', 'sum', 'maximum')
  treat_charge_variants_separately (default: 'true'): If this is true, different charge variants of the same peptide sequence count as individual evidences. (valid: 'true', 'false')
  treat_modification_variants_separately (default: 'true'): If this is true, different modification variants of the same peptide sequence count as individual evidences. (valid: 'true', 'false')
  use_shared_peptides (default: 'true'): If this is true, shared peptides are used as evidences. Note: shared_peptides are not deleted and potentially resolved in postprocessing as well. (valid: 'true', 'false')
  skip_count_annotation (default: 'false'): If this is set, peptide counts won't be annotated at the proteins. (valid: 'true', 'false')
  annotate_indistinguishable_groups (default: 'true'): If this is true, calculates and annotates indistinguishable protein groups. (valid: 'true', 'false')
  greedy_group_resolution (default: 'false'): If this is true, shared peptides will be associated to best proteins only (i.e. become potentially quantifiable razor peptides). (valid: 'true', 'false')
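
The conservative_fdr parameter above distinguishes two ways of reporting the protein FDR, (D+1)/(T) and (D+1)/(T+D), where D and T are taken here to mean the numbers of decoy and target proteins. The short sketch below only evaluates the two formulas as written; proteinFdr is a hypothetical helper, not an OpenMS function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper evaluating the two formulas from the parameter description.
// conservative == true  -> (D + 1) / T
// conservative == false -> (D + 1) / (T + D)
double proteinFdr(std::size_t decoys, std::size_t targets, bool conservative)
{
  const double numerator = static_cast<double>(decoys) + 1.0;
  const double denominator = conservative
                                 ? static_cast<double>(targets)
                                 : static_cast<double>(targets + decoys);
  assert(denominator > 0.0); // undefined without any accepted proteins
  return numerator / denominator;
}
```

With 4 decoys and 100 targets, the conservative variant gives 5/100 = 0.05, while the other gives 5/104, roughly 0.048. The conservative estimate is never smaller, since its denominator omits the decoys.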
Class ProtXMLFile

Document which metavalues of Protein/PeptideHit are filled when reading ProtXML (Chris)

Writing of protXML is currently not supported

Class QTCluster
This implementation may benefit from two separate implementations (one considering IDs/annotations one without). The current implementation most likely hinders speed/memory of both by trying to do both in one. The ID-based implementation could additionally benefit from ID scores and make use of ConsensusID functions.
Page RTPredict
This needs serious clean up! Combining certain input and output options will result in strange behaviour, especially when using text output/input.
Class SequestOutfile

Handle Modifications (Andreas)

Complete rewrite of the parser (and those of InsPecT and PepNovo), the code is bullshit... (Andreas)

Class SpectrumIdentification
docu (Andreas)
Class TOPPBase
Replace writeLog_ and writeDebug_ with a logger concept. We would need something like -VLevels [LOGGERS] to specify which loggers should print something. The '-log' flag should clone all output to the log file (maybe with custom [LOGGERS]), which can either be specified directly or is equal to '-out' (if present) with a ".log" suffix. Maybe add a new LOGGER type (TOPP) that is only usable at TOPP level?
Member TOPPViewBase::lastActiveSubwindow_
).
Class TraceFitter
docu needs update
Class TwoDOptimization
Works only with defined types due to pointers to the data in the optimization namespace! Change that or remove templates (Alexandra)