ExperimentalDesign Class Reference

Representation of an experimental design in OpenMS. Instances can be loaded with the ExperimentalDesignFile class.

#include <OpenMS/METADATA/ExperimentalDesign.h>

Classes

class  MSFileSectionEntry
 
class  SampleSection
 

Public Types

using MSFileSection = std::vector< MSFileSectionEntry >
 

Public Member Functions

 ExperimentalDesign ()=default
 
 ExperimentalDesign (const MSFileSection &msfile_section, const SampleSection &sample_section)
 
const MSFileSection & getMSFileSection () const
 
void setMSFileSection (const MSFileSection &msfile_section)
 
const ExperimentalDesign::SampleSection & getSampleSection () const
 
void setSampleSection (const SampleSection &sample_section)
 
std::map< std::vector< String >, std::set< String > > getUniqueSampleRowToSampleMapping () const
 
std::map< String, unsigned > getSampleToPrefractionationMapping () const
 
std::map< unsigned int, std::vector< String > > getFractionToMSFilesMapping () const
 return fraction index to file paths (ordered by fraction_group)
 
std::vector< std::vector< std::pair< String, unsigned > > > getConditionToPathLabelVector () const
 
std::map< std::vector< String >, std::set< unsigned > > getConditionToSampleMapping () const
 return a condition (unique combination of sample section values except replicate) to Sample index mapping
 
std::map< std::pair< String, unsigned >, unsigned > getPathLabelToPrefractionationMapping (bool use_basename_only) const
 
std::map< std::pair< String, unsigned >, unsigned > getPathLabelToConditionMapping (bool use_basename_only) const
 
std::map< String, unsigned > getSampleToConditionMapping () const
 
std::map< std::pair< String, unsigned >, unsigned > getPathLabelToSampleMapping (bool use_basename_only) const
 return <file_path, label> to sample index mapping
 
std::map< std::pair< String, unsigned >, unsigned > getPathLabelToFractionMapping (bool use_basename_only) const
 return <file_path, label> to fraction mapping
 
std::map< std::pair< String, unsigned >, unsigned > getPathLabelToFractionGroupMapping (bool use_basename_only) const
 return <file_path, label> to fraction_group mapping
 
unsigned getNumberOfSamples () const
 
unsigned getNumberOfFractions () const
 
unsigned getNumberOfLabels () const
 
unsigned getNumberOfMSFiles () const
 
unsigned getNumberOfFractionGroups () const
 
unsigned getSample (unsigned fraction_group, unsigned label=1)
 
bool isFractionated () const
 
Size filterByBasenames (const std::set< String > &bns)
 
bool sameNrOfMSFilesPerFraction () const
 

Static Public Member Functions

static ExperimentalDesign fromConsensusMap (const ConsensusMap &c)
 Extract experimental design from consensus map.
 
static ExperimentalDesign fromFeatureMap (const FeatureMap &f)
 Extract experimental design from feature map.
 
static ExperimentalDesign fromIdentifications (const std::vector< ProteinIdentification > &proteins)
 Extract experimental design from identifications.
 

Private Member Functions

std::vector< String > getFileNames_ (bool basename) const
 
std::vector< unsigned > getLabels_ () const
 
std::vector< unsigned > getFractions_ () const
 
std::map< std::pair< String, unsigned >, unsigned > pathLabelMapper_ (bool, unsigned(*f)(const ExperimentalDesign::MSFileSectionEntry &)) const
 Generic Mapper (Path, Label) -> f(row)
 
void sort_ ()
 
void isValid_ ()
 

Static Private Member Functions

template<typename T >
static void errorIfAlreadyExists (std::set< T > &container, T &item, const String &message)
 

Private Attributes

MSFileSection msfile_section_
 
SampleSection sample_section_
 

Detailed Description

Representation of an experimental design in OpenMS. Instances can be loaded with the ExperimentalDesignFile class.

Experimental designs can be provided in two formats: the one-table format and the two-table format.

The one-table format is simpler but slightly more redundant.

The one-table format consists of mandatory (file columns) and optional sample metadata (sample columns).

The mandatory file columns are Fraction_Group, Fraction, Spectra_Filepath and Label. These columns capture the mapping of quantitative values to files for label-free and multiplexed experiments and enable fraction-aware data processing.

  • Fraction_Group: a numeric identifier that indicates which fractions are grouped together. Do NOT reuse the same identifier across samples, and assign identifiers consecutively without gaps.
  • Fraction: a numeric identifier that indicates which fraction was measured in this file. In the case of unfractionated data, the fraction identifier is 1 for all samples. Make sure the same identifiers are used across different Fraction_Groups, as this determines which fractions correspond to each other.
  • Label: a numeric identifier for the label, e.g., 1 for label-free, 1 and 2 for SILAC light/heavy, or 1-10 for TMT10Plex.
  • Spectra_Filepath: a filename or path as string representation (e.g., SILAC_file.mzML)
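The constraints on the file columns can be checked mechanically. The following is a minimal sketch, not the OpenMS implementation: `FileRow`, `fractionGroupsAreContinuous` and `hasMultipleFractions` are hypothetical names, `std::string` stands in for the OpenMS `String` type, and the sketch assumes that Fraction_Group numbering starts at 1, as in the example tables on this page.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the mandatory file columns
// (this is NOT the OpenMS MSFileSectionEntry class).
struct FileRow
{
  unsigned fraction_group;
  unsigned fraction;
  std::string spectra_filepath;
  unsigned label;
};

// True if the Fraction_Group identifiers form the gap-free range 1..N.
// Assumption: numbering starts at 1, as in the example tables.
bool fractionGroupsAreContinuous(const std::vector<FileRow>& rows)
{
  std::set<unsigned> groups;
  for (const FileRow& r : rows) groups.insert(r.fraction_group);
  if (groups.empty()) return true;
  // N distinct positive integers are exactly 1..N iff min == 1 and max == N.
  return *groups.begin() == 1 && *groups.rbegin() == groups.size();
}

// True if more than one distinct Fraction identifier occurs
// (unfractionated data uses fraction 1 for all samples).
bool hasMultipleFractions(const std::vector<FileRow>& rows)
{
  std::set<unsigned> fractions;
  for (const FileRow& r : rows) fractions.insert(r.fraction);
  return fractions.size() > 1;
}
```

For the label-free example design, where every file has its own fraction group and fraction 1, the first check would pass and the second would report an unfractionated design.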

For processing with MSstats, the optional sample columns are typically MSstats_Condition and MSstats_BioReplicate with an additional MSstats_Mixture column in the case of TMT labeling. They capture the experimental factors and conditions associated with a sample.

  • MSstats_Condition: a string that indicates the condition (e.g., control or 1000 mMol). Will be forwarded to MSstats and can then be used to specify test contrasts.
  • MSstats_BioReplicate: a string identifier to indicate biological replication of a sample. Entries with the same Sample/Condition/BioReplicate but different Filepath (and therefore FractionGroup number) will be treated as technical replicates.
  • MSstats_Mixture (for TMT labeling only): a string identifier to indicate the mixture of samples labeled with different TMT reagents, which can be analyzed in a single mass spectrometry experiment. E.g., same samples labeled with different TMT reagents have a different mixture identifier. Technical replicates need to have the same mixture identifier.

For details on the MSstats columns, please refer to the MSstats manual (https://www.bioconductor.org/packages/release/bioc/vignettes/MSstats/inst/doc/MSstats.html).

Fraction_Group  Fraction  Spectra_Filepath        Label  MSstats_Condition  MSstats_BioReplicate
1               1         UPS1_12500amol_R1.mzML  1      12500 amol         1
2               1         UPS1_12500amol_R2.mzML  1      12500 amol         2
3               1         UPS1_12500amol_R3.mzML  1      12500 amol         3
...             ...       ...                     ...    ...                ...
22              1         UPS1_500amol_R1.mzML    1      500 amol           1
23              1         UPS1_500amol_R2.mzML    1      500 amol           2
24              1         UPS1_500amol_R3.mzML    1      500 amol           3

Alternatively, the experimental design can be specified with a file consisting of two tables whose headers are separated by a blank line: the file section table and the sample section table.

  • The file section consists of the columns Fraction_Group, Fraction, Spectra_Filepath, Label and Sample.
  • The sample section consists of the columns Sample, MSstats_Condition and MSstats_BioReplicate.

The content is the same as described for the one-table format, except that the additional numeric Sample column links each row of the file section to a row of the sample section.

Fraction_Group  Fraction  Spectra_Filepath        Label  Sample
1               1         UPS1_12500amol_R1.mzML  1      1
2               1         UPS1_12500amol_R2.mzML  1      2
...             ...       ...                     ...    ...
22              1         UPS1_500amol_R1.mzML    1      22

Sample  MSstats_Condition  MSstats_BioReplicate
1       12500 amol         1
2       12500 amol         2
...     ...                ...
22      500 amol           3
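The cross-reference between the two tables amounts to a key lookup on the Sample column. The sketch below illustrates that idea with plain standard-library types; `FileSectionRow`, `SampleSectionRow`, `SampleTable` and `conditionOf` are hypothetical names chosen for illustration and are not part of the OpenMS API.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical row types mirroring the two tables (not the OpenMS classes).
struct FileSectionRow
{
  unsigned fraction_group;
  unsigned fraction;
  std::string spectra_filepath;
  unsigned label;
  unsigned sample; // key into the sample section
};

struct SampleSectionRow
{
  std::string condition;     // MSstats_Condition
  std::string bio_replicate; // MSstats_BioReplicate
};

// Sample section keyed by the numeric Sample column.
using SampleTable = std::map<unsigned, SampleSectionRow>;

// Returns the condition of the sample referenced by a file row, or an empty
// string if the Sample identifier has no matching row in the sample section.
std::string conditionOf(const FileSectionRow& file, const SampleTable& samples)
{
  const auto it = samples.find(file.sample);
  return it == samples.end() ? std::string() : it->second.condition;
}
```

Returning an empty string for a dangling reference is a choice made for this sketch only; a stricter design could treat it as an error.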

Member Typedef Documentation

◆ MSFileSection

using MSFileSection = std::vector<MSFileSectionEntry>

Constructor & Destructor Documentation

◆ ExperimentalDesign() [1/2]

ExperimentalDesign ( )
default

◆ ExperimentalDesign() [2/2]

ExperimentalDesign ( const MSFileSection &  msfile_section,
const SampleSection &  sample_section 
)

Member Function Documentation

◆ errorIfAlreadyExists()

static void errorIfAlreadyExists ( std::set< T > &  container,
T &  item,
const String &  message 
)
static private

◆ filterByBasenames()

Size filterByBasenames ( const std::set< String > &  bns)

Filters the MSFileSection so that it only includes the subset of files whose basenames are given in bns.

Returns
number of files that have been filtered
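Basename filtering of this kind can be sketched with the erase-remove idiom. This is an illustration under stated assumptions, not the OpenMS code: `FileRow`, `basenameOf` and `filterRowsByBasenames` are hypothetical names, the basename is taken as everything after the last `/` or `\`, and the sketch returns the number of rows it removed, which is only one possible reading of the return value documented above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a file section entry (not the OpenMS class).
struct FileRow
{
  std::string path;
  unsigned label;
};

// Everything after the last '/' or '\\'; the whole string if neither occurs.
std::string basenameOf(const std::string& path)
{
  const std::size_t pos = path.find_last_of("/\\");
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Keeps only rows whose basename is contained in bns.
// Assumption for this sketch: the return value counts the removed rows.
std::size_t filterRowsByBasenames(std::vector<FileRow>& rows,
                                  const std::set<std::string>& bns)
{
  const std::size_t before = rows.size();
  rows.erase(std::remove_if(rows.begin(), rows.end(),
                            [&bns](const FileRow& r)
                            { return bns.count(basenameOf(r.path)) == 0; }),
             rows.end());
  return before - rows.size();
}
```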

◆ fromConsensusMap()

static ExperimentalDesign fromConsensusMap ( const ConsensusMap &  c)
static

Extract experimental design from consensus map.

◆ fromFeatureMap()

static ExperimentalDesign fromFeatureMap ( const FeatureMap &  f)
static

Extract experimental design from feature map.

◆ fromIdentifications()

static ExperimentalDesign fromIdentifications ( const std::vector< ProteinIdentification > &  proteins)
static

Extract experimental design from identifications.

◆ getConditionToPathLabelVector()

std::vector<std::vector<std::pair<String, unsigned> > > getConditionToPathLabelVector ( ) const

return vector of filepath/label combinations that share the same conditions after removing replicate columns in the sample section (e.g. for merging across replicates)

◆ getConditionToSampleMapping()

std::map<std::vector<String>, std::set<unsigned> > getConditionToSampleMapping ( ) const

return a condition (unique combination of sample section values except replicate) to Sample index mapping
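Grouping samples by condition can be pictured as dropping the replicate column from each sample row and using the remaining values as a map key. The sketch below shows that idea only; `SampleRow` and `conditionToSamples` are hypothetical names, and passing the position of the replicate column as a parameter is an assumption of this sketch, not how OpenMS identifies replicate columns.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sample row: the numeric sample index plus the values of all
// remaining sample columns, in column order (not the OpenMS SampleSection).
struct SampleRow
{
  unsigned sample;
  std::vector<std::string> values;
};

// Maps each condition (all column values except the replicate column)
// to the set of sample indices that share it.
std::map<std::vector<std::string>, std::set<unsigned>>
conditionToSamples(const std::vector<SampleRow>& rows, std::size_t replicate_column)
{
  std::map<std::vector<std::string>, std::set<unsigned>> result;
  for (const SampleRow& row : rows)
  {
    std::vector<std::string> condition;
    for (std::size_t i = 0; i < row.values.size(); ++i)
    {
      if (i != replicate_column) condition.push_back(row.values[i]);
    }
    result[condition].insert(row.sample);
  }
  return result;
}
```

Because the key is a `std::map` key, conditions come out in lexicographic order of their column values.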

◆ getFileNames_()

std::vector< String > getFileNames_ ( bool  basename) const
private

◆ getFractions_()

std::vector<unsigned> getFractions_ ( ) const
private

◆ getFractionToMSFilesMapping()

std::map<unsigned int, std::vector<String> > getFractionToMSFilesMapping ( ) const

return fraction index to file paths (ordered by fraction_group)
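A mapping of this shape can be built by sorting the rows by fraction group and then appending each path to the list of its fraction. The following is a minimal sketch with hypothetical names (`FileRow`, `fractionToFiles`) and `std::string` in place of the OpenMS `String` type; it is not taken from the OpenMS sources.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a file section entry (not the OpenMS class).
struct FileRow
{
  unsigned fraction_group;
  unsigned fraction;
  std::string path;
};

// Maps each fraction index to the paths measured for that fraction,
// with every path list ordered by ascending fraction group.
std::map<unsigned, std::vector<std::string>> fractionToFiles(std::vector<FileRow> rows)
{
  // Sort a copy by fraction group so the per-fraction lists inherit that order.
  std::stable_sort(rows.begin(), rows.end(),
                   [](const FileRow& a, const FileRow& b)
                   { return a.fraction_group < b.fraction_group; });

  std::map<unsigned, std::vector<std::string>> result;
  for (const FileRow& r : rows) result[r.fraction].push_back(r.path);
  return result;
}
```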

◆ getLabels_()

std::vector<unsigned> getLabels_ ( ) const
private

◆ getMSFileSection()

const MSFileSection& getMSFileSection ( ) const

◆ getNumberOfFractionGroups()

unsigned getNumberOfFractionGroups ( ) const

◆ getNumberOfFractions()

unsigned getNumberOfFractions ( ) const

◆ getNumberOfLabels()

unsigned getNumberOfLabels ( ) const

◆ getNumberOfMSFiles()

unsigned getNumberOfMSFiles ( ) const

◆ getNumberOfSamples()

unsigned getNumberOfSamples ( ) const

◆ getPathLabelToConditionMapping()

std::map< std::pair< String, unsigned >, unsigned> getPathLabelToConditionMapping ( bool  use_basename_only) const

return <file_path, label> to condition mapping (a condition is a unique combination of all columns in the sample section, except for replicates).

◆ getPathLabelToFractionGroupMapping()

std::map< std::pair< String, unsigned >, unsigned> getPathLabelToFractionGroupMapping ( bool  use_basename_only) const

return <file_path, label> to fraction_group mapping

◆ getPathLabelToFractionMapping()

std::map< std::pair< String, unsigned >, unsigned> getPathLabelToFractionMapping ( bool  use_basename_only) const

return <file_path, label> to fraction mapping

◆ getPathLabelToPrefractionationMapping()

std::map< std::pair< String, unsigned >, unsigned> getPathLabelToPrefractionationMapping ( bool  use_basename_only) const

return <file_path, label> to prefractionation mapping (a prefractionation group is a unique combination of all columns in the sample section, except for replicates).

◆ getPathLabelToSampleMapping()

std::map< std::pair< String, unsigned >, unsigned> getPathLabelToSampleMapping ( bool  use_basename_only) const

return <file_path, label> to sample index mapping

◆ getSample()

unsigned getSample ( unsigned  fraction_group,
unsigned  label = 1 
)

◆ getSampleSection()

const ExperimentalDesign::SampleSection& getSampleSection ( ) const

◆ getSampleToConditionMapping()

std::map<String, unsigned> getSampleToConditionMapping ( ) const

return Sample name to condition mapping (a condition is a unique combination of all columns in the sample section, except for replicates). Conditions are numbered in alphabetical order because they are stored in a map.

◆ getSampleToPrefractionationMapping()

std::map<String, unsigned> getSampleToPrefractionationMapping ( ) const

uses getUniqueSampleRowToSampleMapping to build the reversed map, which maps each sample ID to a real unique sample

◆ getUniqueSampleRowToSampleMapping()

std::map<std::vector<String>, std::set<String> > getUniqueSampleRowToSampleMapping ( ) const

returns a map from a sample section row to sample id for clustering duplicate sample rows (e.g. to find all fractions of the same "sample")

◆ isFractionated()

bool isFractionated ( ) const
Returns
whether we have a fractionated design

◆ isValid_()

void isValid_ ( )
private

◆ pathLabelMapper_()

std::map< std::pair< String, unsigned >, unsigned> pathLabelMapper_ ( bool  ,
unsigned(*)(const ExperimentalDesign::MSFileSectionEntry &)  f 
) const
private

Generic Mapper (Path, Label) -> f(row)
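The generic mapper pattern, one loop parameterized by a function pointer that extracts the mapped value from a row, can be sketched as follows. All names here (`FileRow`, `PathLabel`, `basenameOf`, `mapPathLabel`, `fractionOf`, `sampleOf`) are hypothetical and only mirror the shape of the signature above; the behavior for duplicate (path, label) keys, where the last row wins, is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a file section entry (not the OpenMS class).
struct FileRow
{
  unsigned fraction_group;
  unsigned fraction;
  std::string path;
  unsigned label;
  unsigned sample;
};

using PathLabel = std::pair<std::string, unsigned>;

// Everything after the last '/' or '\\'; the whole string if neither occurs.
std::string basenameOf(const std::string& path)
{
  const std::size_t pos = path.find_last_of("/\\");
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Builds (path, label) -> f(row) for every row.
// If two rows share a key, the later row overwrites the earlier one.
std::map<PathLabel, unsigned> mapPathLabel(const std::vector<FileRow>& rows,
                                           bool use_basename_only,
                                           unsigned (*f)(const FileRow&))
{
  std::map<PathLabel, unsigned> result;
  for (const FileRow& r : rows)
  {
    const std::string key_path = use_basename_only ? basenameOf(r.path) : r.path;
    result[PathLabel(key_path, r.label)] = f(r);
  }
  return result;
}

// Example extractors that can be passed as f.
unsigned fractionOf(const FileRow& r) { return r.fraction; }
unsigned sampleOf(const FileRow& r) { return r.sample; }
```

With this shape, each of the path/label mappings differs only in the extractor it passes, for example `mapPathLabel(rows, true, &sampleOf)` versus `mapPathLabel(rows, true, &fractionOf)`.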

◆ sameNrOfMSFilesPerFraction()

bool sameNrOfMSFilesPerFraction ( ) const
Returns
whether all fraction groups have the same number of fractions
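One way to read this check, following the function name, is to count the files per fraction and compare the counts. The sketch below implements that reading only; `FileRow` and `sameFileCountPerFraction` are hypothetical names, and treating an empty design as consistent is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a file section entry (not the OpenMS class).
struct FileRow
{
  unsigned fraction_group;
  unsigned fraction;
  std::string path;
};

// True if every fraction index occurs in the same number of rows (files).
// An empty design is treated as consistent.
bool sameFileCountPerFraction(const std::vector<FileRow>& rows)
{
  std::map<unsigned, std::size_t> files_per_fraction;
  for (const FileRow& r : rows) ++files_per_fraction[r.fraction];
  if (files_per_fraction.empty()) return true;

  const std::size_t expected = files_per_fraction.begin()->second;
  for (const auto& entry : files_per_fraction)
  {
    if (entry.second != expected) return false;
  }
  return true;
}
```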

◆ setMSFileSection()

void setMSFileSection ( const MSFileSection &  msfile_section)

◆ setSampleSection()

void setSampleSection ( const SampleSection &  sample_section)

◆ sort_()

void sort_ ( )
private

Member Data Documentation

◆ msfile_section_

MSFileSection msfile_section_
private

◆ sample_section_

SampleSection sample_section_
private