OpenMS
Various statistical functions.
Classes
| struct | KernelDensityEstimation | Kernel Density Estimation utilities using FFT-based methods. |
| struct | MultipleTesting | Statistical functions for multiple testing correction. |
Functions
| template<typename IteratorType > | |
| static void | checkIteratorsNotNULL (IteratorType begin, IteratorType end) |
| Helper function checking if two iterators are not equal. | |
| template<typename IteratorType > | |
| static void | checkIteratorsEqual (IteratorType begin, IteratorType end) |
| Helper function checking if two iterators are equal. | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static void | checkIteratorsAreValid (IteratorType1 begin_b, IteratorType1 end_b, IteratorType2 begin_a, IteratorType2 end_a) |
| Helper function checking if an iterator and a co-iterator both have a next element. | |
| template<typename IteratorType > | |
| static double | sum (IteratorType begin, IteratorType end) |
| Calculates the sum of a range of values. | |
| template<typename IteratorType > | |
| static double | mean (IteratorType begin, IteratorType end) |
| Calculates the mean of a range of values. | |
| template<typename IteratorType > | |
| static double | median (IteratorType begin, IteratorType end, bool sorted=false) |
| Calculates the median of a range of values. | |
| template<typename IteratorType > | |
| double | MAD (IteratorType begin, IteratorType end, double median_of_numbers) |
| Median absolute deviation (MAD). | |
| template<typename IteratorType > | |
| double | MeanAbsoluteDeviation (IteratorType begin, IteratorType end, double mean_of_numbers) |
| Mean absolute deviation (MeanAbsoluteDeviation). | |
| template<typename IteratorType > | |
| static double | quantile1st (IteratorType begin, IteratorType end, bool sorted=false) |
| Calculates the first quartile of a range of values. | |
| template<typename IteratorType > | |
| static double | quantile3rd (IteratorType begin, IteratorType end, bool sorted=false) |
| Calculates the third quartile of a range of values. | |
| template<typename IteratorType > | |
| static double | quantile (IteratorType begin, IteratorType end, double q) |
| Calculates the q-quantile (0 <= q <= 1) of a sorted range of values. | |
| template<typename IteratorType > | |
| static double | sd (IteratorType begin, IteratorType end, double mean=std::numeric_limits< double >::max()) |
| Calculates the standard deviation of a range of values. | |
| template<typename IteratorType > | |
| static double | absdev (IteratorType begin, IteratorType end, double mean=std::numeric_limits< double >::max()) |
| Calculates the absolute deviation of a range of values. | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | covariance (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the covariance of two ranges of values. | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | meanSquareError (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the mean square error for the values in [begin_a, end_a) and [begin_b, end_b) | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | rootMeanSquareError (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the root mean square error (RMSE) for the values in [begin_a, end_a) and [begin_b, end_b) | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | classificationRate (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the classification rate for the values in [begin_a, end_a) and [begin_b, end_b) | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | matthewsCorrelationCoefficient (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the Matthews correlation coefficient for the values in [begin_a, end_a) and [begin_b, end_b) | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | pearsonCorrelationCoefficient (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the Pearson correlation coefficient for the values in [begin_a, end_a) and [begin_b, end_b) | |
| template<typename IteratorType1 , typename IteratorType2 > | |
| static double | rankCorrelationCoefficient (IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b) |
| Calculates the rank correlation coefficient for the values in [begin_a, end_a) and [begin_b, end_b). | |
| static double | bwNrd0 (const std::vector< double > &x) |
| Bandwidth selector using the "nrd0" rule-of-thumb for kernel density estimation. | |
| static std::vector< double > | linBin (const std::vector< double > &x, double xmin, double xmax, std::size_t nbins, const std::vector< double > *weights) |
| Linear binning of data onto an equally-spaced grid. | |
| static std::vector< double > | forRt (const std::vector< double > &X, std::size_t M=0) |
| Forward FFT of real-valued data using Munro-packed format. | |
| static std::vector< double > | revRt (const std::vector< double > &Xp, std::size_t M=0) |
| Inverse FFT of Munro-packed data to real-valued output. | |
| static std::vector< double > | silvermanKernelFFT (double bw, std::size_t M, double RANGE) |
| Compute the FFT of a Gaussian kernel in Munro-packed format. | |
| static std::pair< std::vector< double >, std::vector< double > > | gridKdeFFT (const std::vector< double > &x, double bw, std::size_t gridsize=512, double cut=3.0) |
| Fast kernel density estimation on a regular grid using FFT convolution. | |
| static std::vector< double > | kdeFFTEval (const std::vector< double > &x, double bw, std::size_t gridsize=512, double cut=3.0) |
| Evaluate kernel density estimates at the data points themselves. | |
Various statistical functions.
These functions are defined in OpenMS/MATH/StatisticFunctions.h.
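template<typename IteratorType >
static double absdev(IteratorType begin, IteratorType end, double mean = std::numeric_limits<double>::max())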
Calculates the absolute deviation of a range of values.
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsNotNULL(), and OpenMS::Math::mean().
static double bwNrd0(const std::vector<double>& x)
Bandwidth selector using the "nrd0" rule-of-thumb for kernel density estimation.
Computes an appropriate bandwidth for Gaussian kernel density estimation using Silverman's rule-of-thumb (also known as "normal reference distribution" or "nrd0"). This is a quick, robust bandwidth estimator suitable for unimodal distributions.
The bandwidth is calculated as: bw = 0.9 * min(sd, IQR/1.34) * n^(-1/5)
where sd is standard deviation, IQR is interquartile range, and n is sample size.
Reference: Silverman BW. (1986) "Density Estimation for Statistics and Data Analysis." Chapman & Hall/CRC. ISBN 978-0412246203
| x | Vector of data values for which bandwidth is computed |
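A minimal sketch of the nrd0 formula above, for checking results by hand. It assumes the sample standard deviation (n - 1 denominator) and "Type 7" quartiles for the IQR; `bwNrd0Sketch` and `sortedQuantileType7` are hypothetical names, not part of OpenMS. Unlike R's `bw.nrd0`, it has no fallback for a zero spread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: Type 7 quantile of an already sorted, non-empty vector.
inline double sortedQuantileType7(const std::vector<double>& s, double q)
{
  const double pos = q * static_cast<double>(s.size() - 1);
  const std::size_t idx = static_cast<std::size_t>(std::floor(pos));
  if (idx + 1 >= s.size()) return s.back();
  const double frac = pos - static_cast<double>(idx);
  return (1.0 - frac) * s[idx] + frac * s[idx + 1];
}

// Sketch of bw = 0.9 * min(sd, IQR / 1.34) * n^(-1/5).
// Assumption: sd is the sample standard deviation (divides by n - 1).
inline double bwNrd0Sketch(std::vector<double> x)
{
  const std::size_t n = x.size();
  if (n < 2) throw std::invalid_argument("need at least two values");
  std::sort(x.begin(), x.end());

  double mean = 0.0;
  for (double v : x) mean += v;
  mean /= static_cast<double>(n);
  double ss = 0.0;
  for (double v : x) ss += (v - mean) * (v - mean);
  const double sd = std::sqrt(ss / static_cast<double>(n - 1));

  const double iqr = sortedQuantileType7(x, 0.75) - sortedQuantileType7(x, 0.25);
  const double spread = std::min(sd, iqr / 1.34);
  return 0.9 * spread * std::pow(static_cast<double>(n), -0.2);
}
```

For {1, 2, 3, 4, 5} the IQR term (2 / 1.34) is smaller than the standard deviation (about 1.58), so it determines the bandwidth.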
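template<typename IteratorType1 , typename IteratorType2 >
static void checkIteratorsAreValid(IteratorType1 begin_b, IteratorType1 end_b, IteratorType2 begin_a, IteratorType2 end_a)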
Helper function checking if an iterator and a co-iterator both have a next element.
| Exception::InvalidRange | is thrown if the iterators do not end simultaneously |
Referenced by OpenMS::Math::classificationRate(), OpenMS::Math::covariance(), OpenMS::Math::matthewsCorrelationCoefficient(), OpenMS::Math::meanSquareError(), OpenMS::Math::pearsonCorrelationCoefficient(), and OpenMS::Math::rankCorrelationCoefficient().
template<typename IteratorType >
static void checkIteratorsEqual(IteratorType begin, IteratorType end)
Helper function checking if two iterators are equal.
| Exception::InvalidRange | is thrown if the iterators are not equal |
Referenced by OpenMS::Math::classificationRate(), OpenMS::Math::covariance(), OpenMS::Math::matthewsCorrelationCoefficient(), OpenMS::Math::meanSquareError(), OpenMS::Math::pearsonCorrelationCoefficient(), and OpenMS::Math::rankCorrelationCoefficient().
template<typename IteratorType >
static void checkIteratorsNotNULL(IteratorType begin, IteratorType end)
Helper function checking if two iterators are not equal.
| Exception::InvalidRange | is thrown if the range is empty (begin == end) |
Referenced by OpenMS::Math::absdev(), OpenMS::Math::classificationRate(), OpenMS::Math::covariance(), OpenMS::Math::matthewsCorrelationCoefficient(), OpenMS::Math::mean(), OpenMS::Math::meanSquareError(), OpenMS::Math::median(), OpenMS::Math::pearsonCorrelationCoefficient(), OpenMS::Math::quantile(), OpenMS::Math::quantile1st(), OpenMS::Math::quantile3rd(), OpenMS::Math::rankCorrelationCoefficient(), OpenMS::Math::sd(), and OpenMS::Math::variance().
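The three helpers are typically combined in two-range functions: reject an empty first range, check inside the loop that range a still has an element whenever range b does, and check after the loop that range a is exhausted. The sketch below illustrates that pattern with hypothetical stand-ins (`requireNonEmpty`, `requireCoElement`, `requireExhausted`, `dotProduct`) and uses `std::invalid_argument` in place of `Exception::InvalidRange`; it is not OpenMS code.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

template <typename It>
void requireNonEmpty(It begin, It end)           // mirrors checkIteratorsNotNULL
{
  if (begin == end) throw std::invalid_argument("empty range");
}

template <typename It>
void requireExhausted(It begin, It end)          // mirrors checkIteratorsEqual
{
  if (begin != end) throw std::invalid_argument("ranges differ in length");
}

template <typename ItB, typename ItA>
void requireCoElement(ItB begin_b, ItB end_b, ItA begin_a, ItA end_a)  // mirrors checkIteratorsAreValid
{
  if (begin_b != end_b && begin_a == end_a) throw std::invalid_argument("ranges differ in length");
}

// Hypothetical two-range function: sum of a_i * b_i, guarded like the functions above.
template <typename ItA, typename ItB>
double dotProduct(ItA begin_a, ItA end_a, ItB begin_b, ItB end_b)
{
  requireNonEmpty(begin_a, end_a);
  double acc = 0.0;
  for (; begin_b != end_b; ++begin_b, ++begin_a)
  {
    requireCoElement(begin_b, end_b, begin_a, end_a);  // a must have an element wherever b has one
    acc += *begin_a * *begin_b;
  }
  requireExhausted(begin_a, end_a);                    // a must not have leftover elements
  return acc;
}
```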
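template<typename IteratorType1 , typename IteratorType2 >
static double classificationRate(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)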
Calculates the classification rate for the values in [begin_a, end_a) and [begin_b, end_b)
Calculates the classification rate for the data given by the two iterator ranges.
| Exception::InvalidRange | is thrown if the iterator ranges are not of the same length or empty. |
References OpenMS::Math::checkIteratorsAreValid(), OpenMS::Math::checkIteratorsEqual(), and OpenMS::Math::checkIteratorsNotNULL().
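template<typename IteratorType1 , typename IteratorType2 >
static double covariance(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)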
Calculates the covariance of two ranges of values.
Note that the two ranges must be of equal size.
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsAreValid(), OpenMS::Math::checkIteratorsEqual(), OpenMS::Math::checkIteratorsNotNULL(), and OpenMS::Math::mean().
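A minimal sketch of a covariance over two iterator ranges. The (n - 1) sample denominator is an assumption of this sketch; the text above does not state which denominator OpenMS uses, and a population covariance would differ by a factor n / (n - 1). `covarianceSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <numeric>
#include <stdexcept>
#include <vector>

// Sample covariance: sum((a_i - mean_a) * (b_i - mean_b)) / (n - 1).
template <typename ItA, typename ItB>
double covarianceSketch(ItA begin_a, ItA end_a, ItB begin_b, ItB end_b)
{
  const auto n = std::distance(begin_a, end_a);
  if (n < 2 || n != std::distance(begin_b, end_b))
    throw std::invalid_argument("need two ranges of equal length >= 2");
  const double mean_a = std::accumulate(begin_a, end_a, 0.0) / static_cast<double>(n);
  const double mean_b = std::accumulate(begin_b, end_b, 0.0) / static_cast<double>(n);
  double acc = 0.0;
  for (; begin_a != end_a; ++begin_a, ++begin_b)
    acc += (*begin_a - mean_a) * (*begin_b - mean_b);
  return acc / static_cast<double>(n - 1);
}
```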
static std::vector<double> forRt(const std::vector<double>& X, std::size_t M = 0)
Forward FFT of real-valued data using Munro-packed format.
Performs a forward Fast Fourier Transform on real-valued input data, returning the result in a compact Munro-packed format. This format takes advantage of the Hermitian symmetry property of real FFTs to store all unique frequency information in a real-valued vector of the same length as the input.
The output format is: [Re(Y_0), Re(Y_1), ..., Re(Y_{M/2}), Im(Y_1), ..., Im(Y_{M/2-1})] where Y_k are the complex FFT coefficients. The result is UNSCALED (no normalization factor applied); the caller must handle any necessary scaling.
Reference: Munro WD. (1976) "Efficient computation of the discrete Fourier transform." Also described in: Press WH et al. (2007) "Numerical Recipes" 3rd Ed. Section 12.3
| X | Real-valued input vector |
| M | Length of FFT (if 0, uses X.size() and rounds up to next power of 2) |
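To make the Munro-packed layout concrete, here is a naive O(M^2) forward DFT that writes its output in that layout without scaling. It assumes the convention Y_k = sum_j x_j * exp(-2*pi*i*j*k/M), requires an even M equal to X.size(), and omits the zero padding to a power of 2; `forRtNaive` is a hypothetical reference for testing, not the FFT that forRt() uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Layout: [Re(Y_0), ..., Re(Y_{M/2}), Im(Y_1), ..., Im(Y_{M/2-1})], unscaled.
inline std::vector<double> forRtNaive(const std::vector<double>& X)
{
  const std::size_t M = X.size();
  if (M < 2 || M % 2 != 0) throw std::invalid_argument("M must be even and >= 2");
  const double pi = std::acos(-1.0);
  std::vector<double> packed(M, 0.0);
  for (std::size_t k = 0; k <= M / 2; ++k)
  {
    double re = 0.0, im = 0.0;
    for (std::size_t j = 0; j < M; ++j)
    {
      const double angle = 2.0 * pi * static_cast<double>(j * k) / static_cast<double>(M);
      re += X[j] * std::cos(angle);
      im -= X[j] * std::sin(angle);
    }
    packed[k] = re;                                   // Re(Y_0) .. Re(Y_{M/2})
    if (k >= 1 && k < M / 2) packed[M / 2 + k] = im;  // Im(Y_1) .. Im(Y_{M/2-1})
  }
  return packed;
}
```

Im(Y_0) and Im(Y_{M/2}) are always zero for real input, which is why M real numbers suffice to store the spectrum.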
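static std::pair<std::vector<double>, std::vector<double>> gridKdeFFT(const std::vector<double>& x, double bw, std::size_t gridsize = 512, double cut = 3.0)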
Fast kernel density estimation on a regular grid using FFT convolution.
Computes Gaussian kernel density estimates at equally-spaced grid points using the FFT-based convolution algorithm. This approach scales as O(n + M*log(M)) compared to O(n*M) for direct evaluation, making it highly efficient for large datasets or fine grids.
The Silverman Algorithm: Uses the convolution theorem to compute KDE efficiently in the frequency domain: density(x) = IFFT(FFT(binned_data) * FFT(kernel))
Silverman's analytical kernel representation: Instead of FFT-ing a spatial Gaussian kernel, the algorithm computes the kernel analytically in the frequency domain using: K(f) = exp(-2*pi^2*sigma^2*f^2) / (1 - f^2*pi^2/3). This is both more numerically stable and more efficient than FFT-ing the spatial kernel, and it avoids an additional FFT operation.
Implementation steps:
The grid extends from min(x) - cut*bw to max(x) + cut*bw.
Reference: Silverman BW. (1982) "Algorithm AS 176: Kernel density estimation using the Fast Fourier Transform." J. R. Statist. Soc. C 31(1):93-99. DOI: 10.2307/2347084
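When validating an FFT-based density, a direct O(n*M) evaluation of the same Gaussian estimate on the same grid (min(x) - cut*bw to max(x) + cut*bw) is a convenient reference. The sketch below is that direct method, not the FFT algorithm; `gridKdeDirect` is a hypothetical name, and it assumes a non-empty x, bw > 0 and gridsize >= 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Direct Gaussian KDE: density(g) = 1/(n*bw*sqrt(2*pi)) * sum_i exp(-0.5 * ((g - x_i)/bw)^2).
// Returns (grid positions, density values), each of length gridsize.
inline std::pair<std::vector<double>, std::vector<double>>
gridKdeDirect(const std::vector<double>& x, double bw, std::size_t gridsize = 512, double cut = 3.0)
{
  const auto [lo_it, hi_it] = std::minmax_element(x.begin(), x.end());
  const double lo = *lo_it - cut * bw;
  const double hi = *hi_it + cut * bw;
  const double step = (hi - lo) / static_cast<double>(gridsize - 1);
  const double norm = 1.0 / (static_cast<double>(x.size()) * bw * std::sqrt(2.0 * std::acos(-1.0)));

  std::vector<double> grid(gridsize), density(gridsize, 0.0);
  for (std::size_t g = 0; g < gridsize; ++g)
  {
    grid[g] = lo + step * static_cast<double>(g);
    for (double xi : x)
    {
      const double z = (grid[g] - xi) / bw;
      density[g] += std::exp(-0.5 * z * z);
    }
    density[g] *= norm;
  }
  return {grid, density};
}
```

Because the grid stops cut*bw beyond the data, a little mass in the outer tails is lost, so the density integrates to slightly less than 1 (about 0.999 for cut = 3).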
| x | Data values for which to estimate density |
| bw | Bandwidth of the Gaussian kernel |
| gridsize | Number of equally-spaced grid points (M). Should be a power of 2 for FFT efficiency |
| cut | Extension factor: grid extends cut*bw beyond data range on each side |
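Returns: a pair of vectors of length gridsize holding the grid positions and the density estimates at those positions.
static std::vector<double> kdeFFTEval(const std::vector<double>& x, double bw, std::size_t gridsize = 512, double cut = 3.0)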
Evaluate kernel density estimates at the data points themselves.
Computes KDE values at each input data point using the FFT-grid method followed by cubic spline interpolation. This is more efficient than direct evaluation for large datasets, as the FFT-grid computation scales as O(n + M*log(M)) followed by O(n*log(M)) for spline interpolation, compared to O(n^2) for direct methods.
The function first computes KDE on a regular grid using gridKdeFFT(), then interpolates these grid values to the query points using cubic spline interpolation.
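For small inputs, the O(n^2) direct method mentioned above can serve as a reference when checking the grid-plus-spline result. This is a minimal sketch of the direct Gaussian estimate evaluated at the data points; `kdeDirectAtPoints` is a hypothetical name, and it assumes a non-empty x and bw > 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// density(x_i) = 1/(n*bw*sqrt(2*pi)) * sum_j exp(-0.5 * ((x_i - x_j)/bw)^2)
inline std::vector<double> kdeDirectAtPoints(const std::vector<double>& x, double bw)
{
  const double norm = 1.0 / (static_cast<double>(x.size()) * bw * std::sqrt(2.0 * std::acos(-1.0)));
  std::vector<double> out(x.size(), 0.0);
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    for (double xj : x)
    {
      const double z = (x[i] - xj) / bw;
      out[i] += std::exp(-0.5 * z * z);
    }
    out[i] *= norm;
  }
  return out;
}
```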
| x | Data values at which to evaluate the density |
| bw | Bandwidth of the Gaussian kernel |
| gridsize | Number of grid points for FFT computation (default 512) |
| cut | Extension factor for grid range (default 3.0) |
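Returns: a vector of density estimates, one for each element of x.
static std::vector<double> linBin(const std::vector<double>& x, double xmin, double xmax, std::size_t nbins, const std::vector<double>* weights)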
Linear binning of data onto an equally-spaced grid.
Distributes data points onto a regular grid using linear interpolation, a key preprocessing step for FFT-based kernel density estimation. Each data point's weight is split between its two nearest grid points in proportion to proximity, so the nearer grid point receives the larger share.
If weights are provided, weighted counts are computed; otherwise uniform weights of 1.0 are used.
Reference: Scott DW. (1992) "Multivariate Density Estimation: Theory, Practice, and Visualization." Wiley. ISBN 978-0471547709 (Section on binned kernel estimation)
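A minimal sketch of linear binning, assuming the nbins grid points are xmin + k*delta with delta = (xmax - xmin) / (nbins - 1), consistent with both ends being inclusive. Points outside [xmin, xmax] are simply dropped here; how OpenMS treats them is not stated above, so that is an assumption. `linBinSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

inline std::vector<double> linBinSketch(const std::vector<double>& x, double xmin, double xmax,
                                        std::size_t nbins, const std::vector<double>* weights)
{
  if (nbins < 2 || !(xmax > xmin)) throw std::invalid_argument("need nbins >= 2 and xmax > xmin");
  std::vector<double> counts(nbins, 0.0);
  const double delta = (xmax - xmin) / static_cast<double>(nbins - 1);
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    const double w = weights ? (*weights)[i] : 1.0;  // uniform weight 1.0 if none given
    const double pos = (x[i] - xmin) / delta;        // fractional grid index
    if (pos < 0.0 || pos > static_cast<double>(nbins - 1)) continue;  // outside the grid (assumption)
    const std::size_t left = static_cast<std::size_t>(std::floor(pos));
    const double frac = pos - static_cast<double>(left);
    counts[left] += w * (1.0 - frac);                 // nearer point gets the larger share
    if (left + 1 < nbins) counts[left + 1] += w * frac;
  }
  return counts;
}
```

Each in-range point contributes exactly its weight in total, so the binned counts sum to the total weight of the points inside the grid.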
| x | Vector of data values to be binned |
| xmin | Minimum value of grid (inclusive) |
| xmax | Maximum value of grid (inclusive) |
| nbins | Number of bins in the grid |
| weights | Optional vector of weights for each data point. If nullptr, uniform weights are used. |
Returns: a vector of length nbins containing the (weighted) counts at each grid point.
template<typename IteratorType >
double MAD(IteratorType begin, IteratorType end, double median_of_numbers)
Median absolute deviation (MAD).
Computes the MAD, defined as
MAD = median(|x_i - median(x)|) for a vector x with indices i in [1,n].
The input need not be sorted, and sorting it does not speed up the computation. The median must be supplied by the caller, which avoids computing it twice (it is usually already available).
| [in] | begin | Start of range |
| [in] | end | End of range (past-the-end iterator) |
| [in] | median_of_numbers | The precomputed median of range begin - end. |
References OpenMS::Math::median().
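A minimal sketch of the definition above, with the median passed in by the caller as in MAD(). No consistency factor (such as 1.4826) is applied, matching the stated definition. `madSketch` and `medianOfCopy` are hypothetical names, and the even-length median is taken as the average of the two middle values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Median of a copy of the values (average of the two middle values for even n).
inline double medianOfCopy(std::vector<double> v)
{
  if (v.empty()) throw std::invalid_argument("empty range");
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return n % 2 == 1 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// MAD = median(|x_i - median(x)|), given a precomputed median.
template <typename It>
double madSketch(It begin, It end, double median_of_numbers)
{
  std::vector<double> deviations;
  for (It it = begin; it != end; ++it) deviations.push_back(std::fabs(*it - median_of_numbers));
  return medianOfCopy(deviations);
}
```

For {1, 2, 3, 4, 100} the median is 3 and the absolute deviations are {2, 1, 0, 1, 97}, so the MAD is 1; the outlier barely affects it.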
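template<typename IteratorType1 , typename IteratorType2 >
static double matthewsCorrelationCoefficient(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)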
Calculates the Matthews correlation coefficient for the values in [begin_a, end_a) and [begin_b, end_b)
Calculates the Matthews correlation coefficient for the data given by the two iterator ranges. The values in [begin_a, end_a) have to be the predicted labels and the values in [begin_b, end_b) have to be the real labels.
| Exception::InvalidRange | is thrown if the iterator ranges are not of the same length or empty. |
References OpenMS::Math::checkIteratorsAreValid(), OpenMS::Math::checkIteratorsEqual(), and OpenMS::Math::checkIteratorsNotNULL().
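For reference, the standard Matthews correlation coefficient from a 2x2 confusion matrix is (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below takes the labels as `bool` (true = positive); how OpenMS maps numeric labels to the two classes is not stated above and is left as an assumption. `mccSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

inline double mccSketch(const std::vector<bool>& predicted, const std::vector<bool>& actual)
{
  if (predicted.empty() || predicted.size() != actual.size())
    throw std::invalid_argument("ranges must be non-empty and of equal length");
  double tp = 0, tn = 0, fp = 0, fn = 0;
  for (std::size_t i = 0; i < predicted.size(); ++i)
  {
    if (predicted[i] && actual[i]) ++tp;
    else if (!predicted[i] && !actual[i]) ++tn;
    else if (predicted[i]) ++fp;   // predicted positive, actually negative
    else ++fn;                     // predicted negative, actually positive
  }
  const double denom = std::sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
  // If any row or column of the confusion matrix is empty, numerator and
  // denominator are both 0 and the result is NaN.
  return (tp * tn - fp * fn) / denom;
}
```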
template<typename IteratorType >
static double mean(IteratorType begin, IteratorType end)
Calculates the mean of a range of values.
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsNotNULL(), and OpenMS::Math::sum().
Referenced by OpenMS::Math::absdev(), OpenMS::Math::covariance(), OpenMS::makePeakPositionUnique(), BasicStatistics< RealT >::normalDensity_sqrt2pi(), OpenMS::Math::sd(), BasicStatistics< RealT >::setMean(), SummaryStatistics< T >::SummaryStatistics(), and OpenMS::Math::variance().
template<typename IteratorType >
double MeanAbsoluteDeviation(IteratorType begin, IteratorType end, double mean_of_numbers)
Mean absolute deviation (MeanAbsoluteDeviation).
Computes the MeanAbsoluteDeviation, defined as
MeanAbsoluteDeviation = mean(|x_i - mean(x)|) for a vector x with indices i in [1,n].
The mean must be supplied by the caller, which avoids computing it twice (it is usually already available).
| [in] | begin | Start of range |
| [in] | end | End of range (past-the-end iterator) |
| [in] | mean_of_numbers | The precomputed mean of range begin - end. |
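A minimal sketch of the definition above, taking the precomputed mean as a parameter like MeanAbsoluteDeviation() does. `meanAbsDevSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// mean(|x_i - mean(x)|), given a precomputed mean.
template <typename It>
double meanAbsDevSketch(It begin, It end, double mean_of_numbers)
{
  double sum = 0.0;
  std::size_t n = 0;
  for (It it = begin; it != end; ++it, ++n) sum += std::fabs(*it - mean_of_numbers);
  return sum / static_cast<double>(n);  // 0/0 = NaN for an empty range
}
```

For {1, 2, 3, 4, 10} the mean is 4 and the absolute deviations sum to 12, giving 2.4. The median absolute deviation of the same data is 1, which shows how much more the mean-based measure reacts to the outlier.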
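template<typename IteratorType1 , typename IteratorType2 >
static double meanSquareError(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)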
Calculates the mean square error for the values in [begin_a, end_a) and [begin_b, end_b)
Calculates the mean square error for the data given by the two iterator ranges.
| Exception::InvalidRange | is thrown if the iterator ranges are not of the same length or empty. |
References OpenMS::Math::checkIteratorsAreValid(), OpenMS::Math::checkIteratorsEqual(), and OpenMS::Math::checkIteratorsNotNULL().
Referenced by OpenMS::Math::rootMeanSquareError().
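template<typename IteratorType >
static double median(IteratorType begin, IteratorType end, bool sorted = false)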
Calculates the median of a range of values.
| [in] | begin | Start of range |
| [in] | end | End of range (past-the-end iterator) |
| [in] | sorted | Is the range already sorted? If not, it will be sorted. |
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsNotNULL().
Referenced by NucleicAcidSearchEngine::generateLFQInput_(), OpenMS::Math::MAD(), OpenMS::makePeakPositionUnique(), OpenMS::Math::quantile1st(), OpenMS::Math::quantile3rd(), and SummaryStatistics< T >::SummaryStatistics().
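A minimal sketch of a median with a `sorted` flag. As the parameter description says, an unsorted range is sorted in place, so the caller's data is reordered. For an even number of values it returns the average of the two middle values; that convention is an assumption here. `medianSketch` is a hypothetical name and requires random-access iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <stdexcept>
#include <vector>

template <typename It>
double medianSketch(It begin, It end, bool sorted = false)
{
  const auto n = std::distance(begin, end);
  if (n == 0) throw std::invalid_argument("empty range");
  if (!sorted) std::sort(begin, end);  // reorders the caller's data
  It mid = std::next(begin, n / 2);
  if (n % 2 == 1) return *mid;
  return 0.5 * (*std::prev(mid) + *mid);
}
```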
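template<typename IteratorType1 , typename IteratorType2 >
static double pearsonCorrelationCoefficient(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)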
Calculates the Pearson correlation coefficient for the values in [begin_a, end_a) and [begin_b, end_b)
Calculates the linear correlation coefficient for the data given by the two iterator ranges.
If either range consists of identical values (zero variance), NaN is returned.
| Exception::InvalidRange | is thrown if the iterator ranges are not of the same length or empty. |
References OpenMS::Math::checkIteratorsAreValid(), OpenMS::Math::checkIteratorsEqual(), and OpenMS::Math::checkIteratorsNotNULL().
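A minimal sketch of the Pearson coefficient r = sum(da*db) / sqrt(sum(da^2) * sum(db^2)), where da and db are deviations from the respective means. It also shows where the NaN for a constant range comes from. `pearsonSketch` is a hypothetical name, and `std::invalid_argument` stands in for `Exception::InvalidRange`.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <numeric>
#include <stdexcept>
#include <vector>

template <typename ItA, typename ItB>
double pearsonSketch(ItA begin_a, ItA end_a, ItB begin_b, ItB end_b)
{
  const auto n = std::distance(begin_a, end_a);
  if (n == 0 || n != std::distance(begin_b, end_b))
    throw std::invalid_argument("ranges must be non-empty and of equal length");
  const double mean_a = std::accumulate(begin_a, end_a, 0.0) / static_cast<double>(n);
  const double mean_b = std::accumulate(begin_b, end_b, 0.0) / static_cast<double>(n);
  double sab = 0.0, saa = 0.0, sbb = 0.0;
  for (; begin_a != end_a; ++begin_a, ++begin_b)
  {
    const double da = *begin_a - mean_a;
    const double db = *begin_b - mean_b;
    sab += da * db;
    saa += da * da;
    sbb += db * db;
  }
  // If every value in a range equals its computed mean, saa (or sbb) and sab
  // are both 0, so the result is 0/0 = NaN.
  return sab / std::sqrt(saa * sbb);
}
```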
template<typename IteratorType >
static double quantile(IteratorType begin, IteratorType end, double q)
Calculates the q-quantile (0 <= q <= 1) of a sorted range of values.
Assumes the range [begin, end) is already sorted in non-decreasing order. Uses the common "Type 7" definition (linear interpolation):
pos = q * (n - 1)
idx = floor(pos), frac = pos - idx
quantile = (1 - frac) * x[idx] + frac * x[idx + 1]
Exact endpoints: q = 0 returns the first element and q = 1 returns the last element.
| [in] | begin | Start of range |
| [in] | end | End of range (past-the-end iterator) |
| [in] | q | Quantile in [0, 1] |
| Exception::InvalidRange | is thrown if the range is empty. |
| Exception::InvalidValue | is thrown if q is outside [0, 1]. |
References OpenMS::Math::checkIteratorsNotNULL(), and OPENMS_PRECONDITION.
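A minimal sketch of the Type 7 formula above on a sorted range, with the q = 1 endpoint handled explicitly so that x[idx + 1] is never read past the end. `quantileType7` is a hypothetical name, and `std::invalid_argument` stands in for the OpenMS exceptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

template <typename It>
double quantileType7(It begin, It end, double q)
{
  const auto n = std::distance(begin, end);
  if (n == 0) throw std::invalid_argument("empty range");
  if (q < 0.0 || q > 1.0) throw std::invalid_argument("q must lie in [0, 1]");
  const double pos = q * static_cast<double>(n - 1);
  const auto idx = static_cast<std::ptrdiff_t>(std::floor(pos));
  if (idx >= n - 1) return *std::next(begin, n - 1);  // q == 1 (or n == 1): last element
  const double frac = pos - static_cast<double>(idx);
  const double lo = *std::next(begin, idx);
  const double hi = *std::next(begin, idx + 1);
  return (1.0 - frac) * lo + frac * hi;
}
```

For the sorted values {1, 2, 3, 4}, q = 0.25 gives pos = 0.75, so the result is 0.25 * 1 + 0.75 * 2 = 1.75.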
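template<typename IteratorType >
static double quantile1st(IteratorType begin, IteratorType end, bool sorted = false)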
Calculates the first quartile of a range of values.
The range is split into two halves and the median of the lower half is returned.
| [in] | begin | Start of range |
| [in] | end | End of range (past-the-end iterator) |
| [in] | sorted | Is the range already sorted? If not, it will be sorted. |
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsNotNULL(), and OpenMS::Math::median().
Referenced by SummaryStatistics< T >::SummaryStatistics().
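template<typename IteratorType >
static double quantile3rd(IteratorType begin, IteratorType end, bool sorted = false)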
Calculates the third quartile of a range of values.
The range is split into two halves and the median of the upper half is returned.
| [in] | begin | Start of range |
| [in] | end | End of range (past-the-end iterator) |
| [in] | sorted | Is the range already sorted? If not, it will be sorted. |
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsNotNULL(), and OpenMS::Math::median().
Referenced by SummaryStatistics< T >::SummaryStatistics().
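The "median of each half" approach used by quantile1st() and quantile3rd() can be sketched as follows. The split convention is an assumption: each half here holds floor(n/2) values, so the middle value is excluded when n is odd. The text above does not state how OpenMS splits an odd-length range, and other conventions give slightly different quartiles. `quartilesByHalves` and `medianSorted` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Median of the sorted sub-range s[first, last).
inline double medianSorted(const std::vector<double>& s, std::size_t first, std::size_t last)
{
  const std::size_t n = last - first;
  const std::size_t mid = first + n / 2;
  return n % 2 == 1 ? s[mid] : 0.5 * (s[mid - 1] + s[mid]);
}

// Returns (Q1, Q3) as medians of the lower and upper halves.
inline std::pair<double, double> quartilesByHalves(std::vector<double> v)
{
  if (v.size() < 2) throw std::invalid_argument("need at least two values");
  std::sort(v.begin(), v.end());
  const std::size_t half = v.size() / 2;
  return {medianSorted(v, 0, half), medianSorted(v, v.size() - half, v.size())};
}
```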
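template<typename IteratorType1 , typename IteratorType2 >
static double rankCorrelationCoefficient(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)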
Calculates the rank correlation coefficient for the values in [begin_a, end_a) and [begin_b, end_b).
Calculates the rank correlation coefficient for the data given by the two iterator ranges.
If either range consists of identical values (zero variance), NaN is returned.
| Exception::InvalidRange | is thrown if the iterator ranges are not of the same length or empty. |
References OpenMS::Math::checkIteratorsAreValid(), OpenMS::Math::checkIteratorsEqual(), OpenMS::Math::checkIteratorsNotNULL(), and OpenMS::Math::computeRank().
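A minimal sketch of a rank correlation, assuming Spearman's rho computed as the Pearson coefficient of the ranks, with tied values sharing the average of their positions. How OpenMS's computeRank() treats ties is not stated above, so the tie rule is an assumption. `averageRanks` and `spearmanSketch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// 1-based ranks; tied values receive the mean of the positions they occupy.
inline std::vector<double> averageRanks(const std::vector<double>& v)
{
  std::vector<std::size_t> order(v.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&v](std::size_t i, std::size_t j) { return v[i] < v[j]; });
  std::vector<double> rank(v.size());
  for (std::size_t i = 0; i < order.size();)
  {
    std::size_t j = i;
    while (j + 1 < order.size() && v[order[j + 1]] == v[order[i]]) ++j;
    const double avg = 0.5 * static_cast<double>(i + j) + 1.0;  // mean of positions i+1 .. j+1
    for (std::size_t k = i; k <= j; ++k) rank[order[k]] = avg;
    i = j + 1;
  }
  return rank;
}

// Pearson correlation of the two rank vectors.
inline double spearmanSketch(const std::vector<double>& a, const std::vector<double>& b)
{
  if (a.empty() || a.size() != b.size())
    throw std::invalid_argument("ranges must be non-empty and of equal length");
  const std::vector<double> ra = averageRanks(a), rb = averageRanks(b);
  const double n = static_cast<double>(ra.size());
  const double mean_a = std::accumulate(ra.begin(), ra.end(), 0.0) / n;
  const double mean_b = std::accumulate(rb.begin(), rb.end(), 0.0) / n;
  double sab = 0.0, saa = 0.0, sbb = 0.0;
  for (std::size_t i = 0; i < ra.size(); ++i)
  {
    const double da = ra[i] - mean_a, db = rb[i] - mean_b;
    sab += da * db;
    saa += da * da;
    sbb += db * db;
  }
  return sab / std::sqrt(saa * sbb);  // NaN when all ranks in one range coincide
}
```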
static std::vector<double> revRt(const std::vector<double>& Xp, std::size_t M = 0)
Inverse FFT of Munro-packed data to real-valued output.
Performs the inverse operation of forRt(), reconstructing a real-valued signal from its Munro-packed frequency domain representation. The output is scaled by multiplying by M (applied by the evergreen::real_ifft function).
The input must be in Munro-packed format: [Re(Y_0), Re(Y_1), ..., Re(Y_{M/2}), Im(Y_1), ..., Im(Y_{M/2-1})]
Reference: Munro WD. (1976) "Efficient computation of the discrete Fourier transform." Also described in: Press WH et al. (2007) "Numerical Recipes" 3rd Ed. Section 12.3
| Xp | Munro-packed real-valued frequency coefficients |
| M | Length of output (if 0, uses Xp.size()) |
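A naive O(M^2) inverse of the Munro-packed layout, useful for checking round trips on small inputs. Each frequency k between 1 and M/2 - 1 stands for itself and its complex conjugate at M - k, hence the factor 2; the DC and Nyquist terms appear once. This sketch divides by M so that it exactly inverts an unscaled forward transform; the scaling convention of revRt() described above may differ. `revRtNaive` is a hypothetical name and requires an even M >= 2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Input layout: [Re(Y_0), ..., Re(Y_{M/2}), Im(Y_1), ..., Im(Y_{M/2-1})].
inline std::vector<double> revRtNaive(const std::vector<double>& Xp)
{
  const std::size_t M = Xp.size();
  if (M < 2 || M % 2 != 0) throw std::invalid_argument("M must be even and >= 2");
  const double pi = std::acos(-1.0);
  std::vector<double> out(M, 0.0);
  for (std::size_t j = 0; j < M; ++j)
  {
    // DC term plus Nyquist term, which contributes (-1)^j * Re(Y_{M/2}).
    double acc = Xp[0] + ((j % 2 == 0) ? Xp[M / 2] : -Xp[M / 2]);
    for (std::size_t k = 1; k < M / 2; ++k)
    {
      const double angle = 2.0 * pi * static_cast<double>(j * k) / static_cast<double>(M);
      acc += 2.0 * (Xp[k] * std::cos(angle) - Xp[M / 2 + k] * std::sin(angle));
    }
    out[j] = acc / static_cast<double>(M);
  }
  return out;
}
```

The packed spectrum [1, 0, -1, -1] (that of the signal {0, 1, 0, 0} under the exp(-2*pi*i*j*k/M) forward convention) is mapped back to {0, 1, 0, 0}.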
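template<typename IteratorType1 , typename IteratorType2 >
static double rootMeanSquareError(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)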
Calculates the root mean square error (RMSE) for the values in [begin_a, end_a) and [begin_b, end_b)
Computes the square root of the mean of the squared differences between the two iterator ranges (i.e., RMSE = sqrt(MSE)).
| Exception::InvalidRange | is thrown if the iterator ranges are not of the same length or are empty. |
References OpenMS::Math::meanSquareError().
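A minimal sketch of MSE = (1/n) * sum((a_i - b_i)^2) and RMSE = sqrt(MSE), mirroring how rootMeanSquareError() builds on meanSquareError(). `mseSketch` and `rmseSketch` are hypothetical names, and `std::invalid_argument` stands in for `Exception::InvalidRange`.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <stdexcept>
#include <vector>

template <typename ItA, typename ItB>
double mseSketch(ItA begin_a, ItA end_a, ItB begin_b, ItB end_b)
{
  const auto n = std::distance(begin_a, end_a);
  if (n == 0 || n != std::distance(begin_b, end_b))
    throw std::invalid_argument("ranges must be non-empty and of equal length");
  double sse = 0.0;
  for (; begin_a != end_a; ++begin_a, ++begin_b)
  {
    const double d = *begin_a - *begin_b;
    sse += d * d;
  }
  return sse / static_cast<double>(n);
}

template <typename ItA, typename ItB>
double rmseSketch(ItA begin_a, ItA end_a, ItB begin_b, ItB end_b)
{
  return std::sqrt(mseSketch(begin_a, end_a, begin_b, end_b));
}
```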
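template<typename IteratorType >
static double sd(IteratorType begin, IteratorType end, double mean = std::numeric_limits<double>::max())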
Calculates the standard deviation of a range of values.
The mean can be provided explicitly to save computation time. If left at default, it will be computed internally.
| Exception::InvalidRange | is thrown if the range is empty |
References OpenMS::Math::checkIteratorsNotNULL(), OpenMS::Math::mean(), and OpenMS::Math::variance().
static std::vector<double> silvermanKernelFFT(double bw, std::size_t M, double RANGE)
Compute the FFT of a Gaussian kernel in Munro-packed format.
Generates the frequency-domain representation of a Gaussian kernel with specified bandwidth, suitable for convolution-based kernel density estimation via FFT. The result is in Munro-packed format compatible with forRt() and revRt().
This implementation uses the Silverman transform, which efficiently computes the Gaussian kernel in frequency space as: K(f) = exp(-2*pi^2*sigma^2*f^2) where sigma = bw/4.
Reference: Silverman BW. (1982) "Algorithm AS 176: Kernel density estimation using the Fast Fourier Transform." Journal of the Royal Statistical Society. Series C (Applied Statistics) 31(1):93-99. DOI: 10.2307/2347084
Also implemented in: Python statsmodels.nonparametric.kdetools.silverman_transform
| bw | Bandwidth of the Gaussian kernel (standard deviation) |
| M | Length of the FFT grid (number of frequency bins) |
| RANGE | Range of the spatial domain over which the kernel will be applied |
template<typename IteratorType >
static double sum(IteratorType begin, IteratorType end)
Calculates the sum of a range of values.
Referenced by OpenMS::makePeakPositionUnique(), OpenMS::Math::mean(), BasicStatistics< RealT >::normalApproximationHelper_(), BasicStatistics< RealT >::normalApproximationHelper_(), and BasicStatistics< RealT >::setSum().