dstats.summary
Summary statistics such as mean, median, sum, variance, skewness, kurtosis.
Except for median and median absolute deviation, which cannot be calculated
online, all summary statistics have both an input range interface and an
output range interface.
Notes:
The put method on the structs defined in this module returns this by
ref. The use case for returning this is to enable these structs
to be used with std.algorithm.reduce. The rationale for returning
by ref is that the return value usually won't be used, and the
overhead of returning a large struct by value should be avoided.
BUGS:
This whole module assumes that input will be doubles or types implicitly
convertible to double. No allowances are made for user-defined numeric
types such as BigInts. This is necessary for simplicity. However,
if you have a function that converts your data to doubles, most of
these functions work with any input range, so you can simply map
this function onto your range.
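A minimal sketch of this workaround, assuming a hypothetical user-defined type `Fixed` and a hypothetical converter `toDouble` (neither is part of dstats). The conversion is mapped lazily onto the range with `std.algorithm.map`, and the result can be handed to anything that accepts an input range of doubles. The `plainMean` helper is only a stand-in so the example is self-contained.

```d
import std.algorithm : map;

// Hypothetical user-defined numeric type: a fixed-point value in hundredths.
struct Fixed {
    long hundredths;
}

// Hypothetical conversion function from the user type to double.
double toDouble(Fixed f) {
    return f.hundredths / 100.0;
}

// Stand-in for a statistics function that accepts any input range of doubles.
double plainMean(R)(R data) {
    double total = 0;
    size_t n = 0;
    foreach (x; data) {
        total += x;
        ++n;
    }
    return n == 0 ? double.nan : total / n;
}

void main() {
    auto values = [Fixed(150), Fixed(250), Fixed(350)];
    // map is lazy: each element is converted as it is consumed.
    auto asDoubles = values.map!toDouble;
    assert(plainMean(asDoubles) == 2.5);
}
```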
Author:
David Simcha
- double median(T)(T data);
- Finds the median of an input range in O(N) time on average. In the case of an
even number of elements, the mean of the two middle elements is returned.
This is a convenience function designed specifically for numeric types,
where the averaging of the two middle elements is desired. A more general
selection algorithm that can handle any type with a total ordering, as well
as selecting any position in the ordering, can be found at
dstats.sort.quickSelect() and dstats.sort.partitionK().
Allocates memory, does not reorder input data.
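The behavior described above can be sketched with Phobos alone. This is an illustration under stated assumptions, not the dstats implementation: `medianSketch` is a hypothetical name, and it uses `std.algorithm.topN` as the selection step on a private copy, which is where the allocation comes from.

```d
import std.algorithm : maxElement, topN;

// Hypothetical helper, not the dstats implementation: median via selection
// on a private copy, so the caller's data keeps its original order.
double medianSketch(const(double)[] data) {
    assert(data.length > 0, "median of an empty range is undefined");
    auto copy = data.dup; // the allocation; the input is not reordered
    immutable mid = copy.length / 2;
    // topN partially orders the copy so that copy[mid] is the element a full
    // sort would place there, with no larger element before it.
    topN(copy, mid);
    if (copy.length % 2 == 1) {
        return copy[mid];
    }
    // Even length: the lower middle element is the largest value before mid.
    immutable lower = copy[0 .. mid].maxElement;
    return (lower + copy[mid]) / 2;
}

void main() {
    auto odd = [5.0, 1.0, 3.0];
    auto even = [4.0, 1.0, 3.0, 2.0];
    assert(medianSketch(odd) == 3.0);
    assert(medianSketch(even) == 2.5);
    assert(odd == [5.0, 1.0, 3.0]); // input order is untouched
}
```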
- double medianPartition(T)(T data);
- Median finding as in median(), but will partition input data such that
elements less than the median will have smaller indices than that of the
median, and elements larger than the median will have larger indices than
that of the median. Useful both for its partitioning and to avoid
memory allocations. Requires a random access range with swappable
elements.
- struct MedianAbsDev;
- Plain old data holder struct for median, median absolute deviation.
Alias this'd to the median absolute deviation member.
- MedianAbsDev medianAbsDev(T)(T data);
- Calculates the median absolute deviation of a dataset. This is the median
of all absolute differences from the median of the dataset.
Returns:
A MedianAbsDev struct that contains the median (since it is
computed anyhow) and the median absolute deviation.
Notes:
No bias correction is used in this implementation, since using
one would require assumptions about the underlying distribution of the data.
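The definition above can be sketched in two steps: find the median, then take the median of the absolute differences from it. The names `sortedMedian`, `MadResult` and `madSketch` are hypothetical, and a full sort is used for brevity rather than a selection algorithm, so this is not the dstats implementation.

```d
import std.algorithm : map, sort;
import std.array : array;
import std.math : fabs;

// Hypothetical helper: median of a copy via a full sort (O(N log N)),
// chosen for brevity rather than speed.
double sortedMedian(const(double)[] data) {
    assert(data.length > 0);
    auto copy = data.dup;
    sort(copy);
    immutable mid = copy.length / 2;
    return copy.length % 2 == 1 ? copy[mid] : (copy[mid - 1] + copy[mid]) / 2;
}

// Hypothetical result holder for the two values described above.
struct MadResult {
    double median;
    double medianAbsDev;
}

MadResult madSketch(const(double)[] data) {
    immutable med = sortedMedian(data);
    // Absolute differences from the median, then the median of those.
    // No bias correction factor is applied.
    auto deviations = data.map!(x => fabs(x - med)).array;
    return MadResult(med, sortedMedian(deviations));
}

void main() {
    // Median is 2; absolute deviations are [1, 1, 0, 0, 2, 4, 7], median 1.
    auto r = madSketch([1.0, 1, 2, 2, 4, 6, 9]);
    assert(r.median == 2.0);
    assert(r.medianAbsDev == 1.0);
}
```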
- double interquantileRange(R)(R data, double quantile = 0.25);
- Computes the interquantile range of data at the given quantile value in O(N)
time complexity. For example, using a quantile value of either 0.25 or 0.75
will give the interquartile range. (This is the default since it is
apparently the interquantile range in most common use.)
Using a quantile value of 0.2 or 0.8 will give the interquintile range.
If the quantile point falls between two indices, linear interpolation is
used.
This function is somewhat more efficient than simply finding the upper and
lower quantile and subtracting them.
Tip:
A quantile of 0 or 1 is handled as a special case and will compute the
plain old range of the data in a single pass.
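A rough sketch of the interpolation idea follows. It makes two assumptions that are not taken from dstats: the fractional index for quantile q is q * (N - 1), and the data are fully sorted first, which costs O(N log N) instead of the O(N) described above. The 0 and 1 cases fall out of the same formula here instead of using a dedicated single pass. `quantileSorted` and `interquantileSketch` are hypothetical names.

```d
import std.algorithm : sort;
import std.math : floor;

// Hypothetical helper: value at quantile q of sorted data, using the
// fractional index q * (N - 1) and linear interpolation between neighbors.
// This indexing convention is an assumption, not necessarily what dstats uses.
double quantileSorted(const(double)[] sorted, double q) {
    immutable pos = q * (sorted.length - 1);
    immutable lo = cast(size_t) floor(pos);
    immutable frac = pos - lo;
    if (lo + 1 >= sorted.length) {
        return sorted[$ - 1];
    }
    return sorted[lo] + frac * (sorted[lo + 1] - sorted[lo]);
}

double interquantileSketch(const(double)[] data, double quantile = 0.25) {
    assert(data.length > 0 && quantile >= 0 && quantile <= 1);
    auto copy = data.dup;
    sort(copy); // simpler than selection, but O(N log N)
    // Fold the upper half onto the lower half so 0.25 and 0.75 agree.
    immutable q = quantile > 0.5 ? 1 - quantile : quantile;
    return quantileSorted(copy, 1 - q) - quantileSorted(copy, q);
}

void main() {
    auto data = [9.0, 1, 8, 2, 7, 3, 6, 4, 5];
    assert(interquantileSketch(data) == 4.0);           // 7 - 3
    assert(interquantileSketch(data, 0.75) == 4.0);     // same as 0.25
    assert(interquantileSketch(data, 0.0) == 8.0);      // plain range: 9 - 1
    assert(interquantileSketch([1.0, 2, 3, 4]) == 1.5); // interpolated: 3.25 - 1.75
}
```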
- struct Mean;
- Output range to calculate the mean online. Getter for mean costs a branch to
check for N == 0. This struct uses O(1) space and does *NOT* store the
individual elements.
Note:
This struct can implicitly convert to the value of the mean.
Examples:
Mean summ;
summ.put(1);
summ.put(2);
summ.put(3);
summ.put(4);
summ.put(5);
assert(summ.mean == 3);
- pure nothrow @safe void put(double element);
- pure nothrow @safe void put(typeof(this) rhs);
- Adds the contents of rhs to this instance.
Examples:
Mean mean1, mean2, combined;
foreach(i; 0..5) {
    mean1.put(i);
}
foreach(i; 5..10) {
    mean2.put(i);
}
mean1.put(mean2);
foreach(i; 0..10) {
    combined.put(i);
}
assert(approxEqual(combined.mean, mean1.mean));
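One way such an accumulator can work in O(1) space, including the merge step, is sketched below. `RunningMean` is a hypothetical struct written for illustration; the internal representation that dstats actually uses may differ.

```d
import std.math : isClose;

// Hypothetical accumulator, not the dstats Mean struct: it keeps only the
// running mean and the element count, so it uses O(1) space.
struct RunningMean {
    double mean = 0;
    double n = 0;

    // Incremental update: shift the mean toward the new element by 1/n.
    void put(double element) {
        n += 1;
        mean += (element - mean) / n;
    }

    // Merge another accumulator using a count-weighted average of the means.
    void put(RunningMean rhs) {
        immutable total = n + rhs.n;
        if (total == 0) return;
        mean = mean * (n / total) + rhs.mean * (rhs.n / total);
        n = total;
    }
}

void main() {
    RunningMean a, b, all;
    foreach (i; 0 .. 5) a.put(i);
    foreach (i; 5 .. 10) b.put(i);
    foreach (i; 0 .. 10) all.put(i);
    a.put(b);
    assert(isClose(a.mean, all.mean));
    assert(a.n == 10);
}
```

Merging this way lets independent chunks of data be summarized separately and combined afterwards.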
- const double sum();
- const double mean();
- const double N();
- const Mean toMean();
- Simply returns this. Useful in generic programming contexts.
- const string toString();
- Mean mean(T)(T data);
- Finds the arithmetic mean of any input range whose elements are implicitly
convertible to double.
- struct GeometricMean;
- pure nothrow @safe void put(double element);
- pure nothrow @safe void put(typeof(this) rhs);
- Combines two GeometricMean structs.
- const double geoMean();
- const double N();
- const string toString();
- double geometricMean(T)(T data);
- U sum(T, U = Unqual!(IterType!(T)))(T data);
- Finds the sum of an input range whose elements implicitly convert to double.
The user has the option of making U a different type than the element type
of T to prevent overflows on large array summing operations. However, by
default, the return type is the unqualified element type of T.
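The effect of choosing a wider accumulator type can be sketched as follows. `sumAs` is a hypothetical helper written for this illustration, with the accumulator type as an explicit template parameter; it is not the dstats `sum` function.

```d
// Hypothetical helper with a caller-chosen accumulator type U.
U sumAs(U, R)(R data) {
    U total = 0;
    foreach (x; data) total += x;
    return total;
}

void main() {
    int[] big = [int.max, int.max];
    // Accumulating in long: the total fits in 64 bits.
    assert(sumAs!long(big) == 2L * int.max);
    // Accumulating in int: the total wraps around.
    assert(sumAs!int(big) == -2);
}
```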
- struct MeanSD;
- Output range to compute mean, stdev, variance online. Getter methods
for stdev, var cost a few floating point ops. Getter for mean costs
a single branch to check for N == 0. This struct uses relatively expensive
floating point ops, so if you only need the mean, try Mean. It uses O(1)
space and does *NOT* store the individual elements.
Note:
This struct can implicitly convert to a Mean struct.
References:
Computing Higher-Order Moments Online.
http://people.xiph.org/~tterribe/notes/homs.html
Examples:
MeanSD summ;
summ.put(1);
summ.put(2);
summ.put(3);
summ.put(4);
summ.put(5);
assert(summ.mean == 3);
assert(summ.stdev == sqrt(2.5));
assert(summ.var == 2.5);
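For readers who want to see how mean and variance can be maintained online without storing elements, here is a sketch of Welford's update. `RunningMeanSD` is a hypothetical struct for illustration only; it is not claimed to match the dstats implementation line for line.

```d
import std.math : isClose, sqrt;

// Hypothetical accumulator illustrating Welford's online update.
struct RunningMeanSD {
    double n = 0;
    double mean = 0;
    double m2 = 0; // running sum of squared deviations from the current mean

    void put(double element) {
        n += 1;
        immutable delta = element - mean;
        mean += delta / n;
        m2 += delta * (element - mean); // uses the already-updated mean
    }

    // Sample variance (divides by N - 1); NaN with fewer than two elements.
    double var() const {
        return n < 2 ? double.nan : m2 / (n - 1);
    }

    double stdev() const {
        return sqrt(var());
    }
}

void main() {
    RunningMeanSD acc;
    foreach (x; [1.0, 2, 3, 4, 5]) acc.put(x);
    assert(isClose(acc.mean, 3.0));
    assert(isClose(acc.var, 2.5));
    assert(isClose(acc.stdev, sqrt(2.5)));
}
```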
- pure nothrow @safe void put(double element);
- pure nothrow @safe void put(typeof(this) rhs);
- Combines two MeanSD structs.
- const double sum();
- const double mean();
- const double stdev();
- const double var();
- const double mse();
- Mean squared error. In other words, a biased estimate of variance.
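The difference between the two estimates can be sketched with a two-pass computation, assuming the usual convention that the biased estimate divides the sum of squared deviations by N and the sample variance divides by N - 1. The helper names below are hypothetical.

```d
// Hypothetical two-pass helper: sum of squared deviations from the mean.
double sumSquaredDev(const(double)[] data) {
    double mean = 0;
    foreach (x; data) mean += x;
    mean /= data.length;
    double ss = 0;
    foreach (x; data) ss += (x - mean) * (x - mean);
    return ss;
}

// Biased estimate: divide by N.
double mseSketch(const(double)[] data) {
    return sumSquaredDev(data) / data.length;
}

// Sample variance: divide by N - 1.
double varSketch(const(double)[] data) {
    return sumSquaredDev(data) / (data.length - 1);
}

void main() {
    auto data = [1.0, 2, 3, 4, 5];
    assert(varSketch(data) == 2.5); // 10 / 4
    assert(mseSketch(data) == 2.0); // 10 / 5
}
```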
- const double N();
- const Mean toMean();
- Converts this struct to a Mean struct. Also called when an
implicit conversion via alias this takes place.
- const MeanSD toMeanSD();
- Simply returns this. Useful in generic programming contexts.
- const string toString();
- MeanSD meanStdev(T)(T data);
- Puts all elements of data into a MeanSD struct,
then returns this struct. This can be faster than doing this manually
due to instruction-level parallelism (ILP) optimizations.
- double variance(T)(T data);
- Finds the variance of an input range with members implicitly convertible
to doubles.
- double stdev(T)(T data);
- Calculates the standard deviation of an input range with members
implicitly convertible to double.
- struct Summary;
- Output range to compute mean, stdev, variance, skewness, kurtosis, min, and
max online. Using this struct is relatively expensive, so if you just need
mean and/or stdev, try MeanSD or Mean. Getter methods for stdev,
var cost a few floating point ops. Getter for mean costs a single branch to
check for N == 0. Getters for skewness and kurtosis cost a whole bunch of
floating point ops. This struct uses O(1) space and does *NOT* store the
individual elements.
Note:
This struct can implicitly convert to a MeanSD.
References:
Computing Higher-Order Moments Online.
http://people.xiph.org/~tterribe/notes/homs.html
Examples:
Summary summ;
summ.put(1);
summ.put(2);
summ.put(3);
summ.put(4);
summ.put(5);
assert(summ.N == 5);
assert(summ.mean == 3);
assert(summ.stdev == sqrt(2.5));
assert(summ.var == 2.5);
assert(approxEqual(summ.kurtosis, -1.9120));
assert(summ.min == 1);
assert(summ.max == 5);
assert(summ.sum == 15);
- pure nothrow @safe void put(double element);
- pure nothrow @safe void put(typeof(this) rhs);
- Combines two Summary structs.
- const double sum();
- const double mean();
- const double stdev();
- const double var();
- const double mse();
- Mean squared error. In other words, a biased estimate of variance.
- const double skewness();
- const double kurtosis();
- const double N();
- const double min();
- const double max();
- const MeanSD toMeanSD();
- Converts this struct to a MeanSD. Called via alias this when an
implicit conversion is attempted.
- const string toString();
- double kurtosis(T)(T data);
- Excess kurtosis relative to normal distribution. High kurtosis means that
the variance is due to infrequent, large deviations from the mean. Low
kurtosis means that the variance is due to frequent, small deviations from
the mean. The normal distribution is defined as having kurtosis of 0.
Input must be an input range with elements implicitly convertible to double.
- double skewness(T)(T data);
- Skewness is a measure of the asymmetry of a distribution. Positive skewness
means that the right tail is longer/fatter than the left tail. Negative
skewness means the left tail is longer/fatter than the right tail. A
symmetrical distribution has zero skewness. Input must be an input
range with elements implicitly convertible to double.
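The sign behavior described above can be checked with a small sketch based on population moments, m3 / m2^1.5. Libraries differ in the normalization they apply, so the magnitude returned by the hypothetical `skewnessSketch` below should not be assumed to match the dstats result; only the sign is the point here.

```d
import std.math : sqrt;

// Hypothetical helper using population moments: m3 / m2^1.5.
double skewnessSketch(const(double)[] data) {
    immutable n = data.length;
    double mean = 0;
    foreach (x; data) mean += x;
    mean /= n;
    double m2 = 0, m3 = 0;
    foreach (x; data) {
        immutable d = x - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= n;
    m3 /= n;
    return m3 / (m2 * sqrt(m2));
}

void main() {
    assert(skewnessSketch([1.0, 2, 3, 4, 5]) == 0);     // symmetric
    assert(skewnessSketch([1.0, 1, 1, 1, 10]) > 0);     // long right tail
    assert(skewnessSketch([-10.0, -1, -1, -1, -1]) < 0); // long left tail
}
```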
- Summary summary(T)(T data);
- Convenience function. Puts all elements of data into a Summary struct,
and returns this struct.
- struct ZScore(T) if (isForwardRange!(T) && is(ElementType!(T) : double));
- double front();
- void popFront();
- bool empty();
- typeof(this) save();
- double opIndex(size_t index);
- double back();
- void popBack();
- size_t length();
- ZScore!(T) zScore(T)(T range);
- Returns a range with whatever properties T has (forward range, random
access range, bidirectional range, hasLength, etc.),
of the z-scores of the underlying
range. A z-score of an element in a range is defined as
(element - mean(range)) / stdev(range).
Notes:
If the data contained in the range is a sample of a larger population,
rather than an entire population, then technically, the results output
from the ZScore range are T statistics, not Z statistics. This is because
the sample mean and standard deviation are only estimates of the population
parameters. This does not affect the mechanics of using this range,
but it does affect the interpretation of its output.
Accessing elements of this range is fairly expensive, as a
floating point multiply is involved. Also, constructing this range is
costly, as the entire input range has to be iterated over to find the
mean and standard deviation.
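The two costs mentioned above, one full pass at construction and a small amount of arithmetic per element access, can be seen in the following sketch. It is built on `std.algorithm.map` over an array and is not the dstats ZScore struct. `zScoreSketch` is a hypothetical name, and the use of the sample standard deviation (dividing by N - 1) is an assumption.

```d
import std.algorithm : equal, map;
import std.math : sqrt;

// Hypothetical sketch: mean and sample standard deviation are computed once
// up front, then each element is standardized lazily as it is accessed.
auto zScoreSketch(const(double)[] data) {
    immutable n = data.length;
    double mean = 0;
    foreach (x; data) mean += x;
    mean /= n;
    double ss = 0;
    foreach (x; data) ss += (x - mean) * (x - mean);
    immutable sd = sqrt(ss / (n - 1));
    immutable m = mean;
    // The lazy range keeps the length and indexing of the underlying array.
    return data.map!(x => (x - m) / sd);
}

void main() {
    auto z = zScoreSketch([2.0, 4.0, 6.0]); // mean 4, stdev 2
    assert(equal(z, [-1.0, 0.0, 1.0]));
    assert(z.length == 3);
    assert(z[2] == 1.0);
}
```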
- ZScore!(T) zScore(T)(T range, double mean, double sd);
- Allows the construction of a ZScore range with precomputed mean and
stdev.