dstats.summary

Summary statistics such as the mean, median, sum, variance, skewness and kurtosis. Except for the median and median absolute deviation, which cannot be calculated online, every summary statistic has both an input range interface (a function that takes the data as a range) and an output range interface (a struct that accepts elements one at a time via put).

Notes:
The put method on the structs defined in this module returns this by ref. The use case for returning this is to enable these structs to be used with std.algorithm.reduce. The rationale for returning by ref is that the return value usually won't be used, and the overhead of returning a large struct by value should be avoided.

BUGS:
This whole module assumes that input will be doubles or types implicitly convertible to double. No allowances are made for user-defined numeric types such as BigInts; this restriction keeps the implementation simple. However, most of these functions work with any input range, so if you have a function that converts your data to doubles, you can simply map it onto your range (for example with std.algorithm.map).
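
A minimal sketch of the mapping workaround. Cents, toDouble and meanOfCents are hypothetical names used only for illustration, and Phobos's sum stands in for a dstats function so that the example is self-contained.

```d
import std.algorithm.iteration : map, sum;

// Hypothetical user-defined numeric type that does not implicitly convert
// to double (a stand-in for a fixed-point or BigInt-like type).
struct Cents
{
    long value;
}

// Hypothetical user-supplied conversion function.
double toDouble(Cents c)
{
    return c.value / 100.0;
}

// Map the conversion over the data, then hand the resulting range of
// doubles to any function that accepts one. Phobos's sum stands in here
// for a dstats function such as mean().
double meanOfCents(Cents[] data)
{
    auto asDoubles = data.map!toDouble;  // lazy: no copy of the data is made
    return asDoubles.sum / data.length;
}
```

Because map is lazy, the conversion happens element by element as the consuming function iterates, so no converted copy of the data is allocated.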

Author:
David Simcha

double median (T)(T data);
Finds the median of an input range in O(N) time on average. For an even number of elements, the mean of the two middle elements is returned. This is a convenience function designed specifically for numeric types, where averaging the two middle elements is desired. A more general selection algorithm, which can handle any type with a total ordering and select any position in the ordering, can be found at dstats.sort.quickSelect() and dstats.sort.partitionK(). Allocates memory; does not reorder the input data.

double medianPartition (T)(T data);
Median finding as in median(), but partitions the input data so that elements less than the median have smaller indices than that of the median, and elements greater than the median have larger indices than that of the median. Useful both for the partitioning itself and for avoiding memory allocation. Requires a random access range with swappable elements.
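
The relationship between median() and medianPartition() can be sketched with Phobos's topN, a selection algorithm with linear average time. This is an illustration under that substitution, not the dstats implementation (which uses its own selection code); the names medianPartitionSketch and medianSketch are hypothetical.

```d
import std.algorithm.searching : maxElement;
import std.algorithm.sorting : topN;

// In-place variant: reorders `data` so that smaller elements precede the
// median and larger elements follow it. No allocation.
double medianPartitionSketch(double[] data)
{
    assert(data.length > 0, "median of an empty range is undefined");
    immutable mid = data.length / 2;
    topN(data, mid);   // data[mid] now holds the element of rank `mid`
    if (data.length % 2 == 1)
        return data[mid];
    // Even length: the lower middle element is the largest one left of mid,
    // and the median is the mean of the two middle elements.
    return (data[0 .. mid].maxElement + data[mid]) / 2.0;
}

// Non-mutating variant: works on a copy, so it allocates but leaves the
// caller's data untouched.
double medianSketch(const(double)[] data)
{
    return medianPartitionSketch(data.dup);
}
```

The only difference between the two is the .dup, which mirrors the documented trade-off: median() allocates and preserves the input order, while medianPartition() avoids the allocation at the cost of reordering the input.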

struct MedianAbsDev ;
Plain old data struct holding the median and the median absolute deviation. It is alias this'd to the median absolute deviation member, so it implicitly converts to that value.

MedianAbsDev medianAbsDev (T)(T data);
Calculates the median absolute deviation of a dataset. This is the median of all absolute differences from the median of the dataset.

Returns:
A MedianAbsDev struct that contains the median (since it is computed anyhow) and the median absolute deviation.

Notes:
No bias correction is used in this implementation, since using one would require assumptions about the underlying distribution of the data.
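
A sketch of the definition above, assuming a result struct shaped like MedianAbsDev (the names MadResult, sortedMedian and medianAbsDevSketch are hypothetical). For brevity it sorts copies of the data (O(N log N)) rather than using an O(N) selection algorithm, and it applies no bias correction.

```d
import std.algorithm.iteration : map;
import std.algorithm.sorting : sort;
import std.array : array;
import std.math : abs;

// Plain data holder whose alias this forwards to the MAD, mirroring the
// description of MedianAbsDev.
struct MadResult
{
    double median;
    double medianAbsDev;
    alias medianAbsDev this;
}

// Median of a sorted copy of `data`.
double sortedMedian(const(double)[] data)
{
    auto buf = data.dup;
    buf.sort();
    immutable mid = buf.length / 2;
    return buf.length % 2 == 1 ? buf[mid] : (buf[mid - 1] + buf[mid]) / 2.0;
}

// Median of the absolute deviations from the median (no bias correction).
MadResult medianAbsDevSketch(const(double)[] data)
{
    assert(data.length > 0, "MAD of an empty range is undefined");
    immutable med = sortedMedian(data);
    auto deviations = data.map!(x => abs(x - med)).array;
    return MadResult(med, sortedMedian(deviations));
}
```

For the data 1, 1, 2, 2, 4, 6, 9 the median is 2, the absolute deviations are 1, 1, 0, 0, 2, 4, 7, and their median (the MAD) is 1.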

double interquantileRange (R)(R data, double quantile = 0.25);
Computes the interquantile range of data at the given quantile value in O(N) time. For example, a quantile value of either 0.25 or 0.75 gives the interquartile range. (This is the default, since the interquartile range is the most commonly used interquantile range.) A quantile value of 0.2 or 0.8 gives the interquintile range.

If the quantile point falls between two indices, linear interpolation is used.

This function is somewhat more efficient than simply finding the upper and lower quantile and subtracting them.

Tip:
A quantile of 0 or 1 is handled as a special case and will compute the plain old range of the data in a single pass.
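
A sketch of how the symmetric quantile pair and linear interpolation fit together. The interpolation rule used here (position p * (N - 1) on a 0-based sorted index) is an assumption, since several conventions exist and the text does not say which one dstats uses. The sketch also sorts a copy (O(N log N)) for clarity instead of achieving the documented O(N); the function names are hypothetical.

```d
import std.algorithm.sorting : sort;
import std.math : floor;

// Quantile p of already-sorted data, linearly interpolating between the
// two neighbouring indices when p * (n - 1) is not a whole number.
double interpolatedQuantile(const(double)[] sorted, double p)
{
    immutable pos = p * (sorted.length - 1);
    immutable lo = cast(size_t) floor(pos);
    immutable hi = lo + 1 < sorted.length ? lo + 1 : lo;
    return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
}

double interquantileRangeSketch(const(double)[] data, double quantile = 0.25)
{
    assert(data.length > 0 && quantile >= 0 && quantile <= 1);
    auto buf = data.dup;
    buf.sort();
    // 0.25 and 0.75 (or 0 and 1) describe the same pair of cut points.
    immutable p = quantile <= 0.5 ? quantile : 1 - quantile;
    return interpolatedQuantile(buf, 1 - p) - interpolatedQuantile(buf, p);
}
```

With quantile = 0 the two cut points are the minimum and maximum, so the result is the plain range of the data, matching the special case in the tip.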

struct Mean ;
Output range to calculate the mean online. Getter for mean costs a branch to check for N == 0. This struct uses O(1) space and does *NOT* store the individual elements.

Note:
This struct can implicitly convert to the value of the mean.

Examples:
 Mean summ;
 summ.put(1);
 summ.put(2);
 summ.put(3);
 summ.put(4);
 summ.put(5);
 assert(summ.mean == 3);


pure nothrow @safe void put (double element);


pure nothrow @safe void put (typeof(this) rhs);
Adds the contents of rhs to this instance.

Examples:
 Mean mean1, mean2, combined;
 foreach(i; 0..5) {
     mean1.put(i);
 }

 foreach(i; 5..10) {
     mean2.put(i);
 }

 mean1.put(mean2);

 foreach(i; 0..10) {
     combined.put(i);
 }

 assert(approxEqual(combined.mean, mean1.mean));
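
To show why an O(1)-space accumulator can be combined with put(typeof(this) rhs), here is a minimal sketch. Storing a running sum and count, and returning NaN for an empty accumulator, are assumptions of this sketch (MeanSketch is a hypothetical name); dstats may use a different internal update formula.

```d
// Online mean accumulator: O(1) space, no stored elements.
struct MeanSketch
{
    double sum = 0;  // must be initialised: doubles default to NaN in D
    double N = 0;

    void put(double element) pure nothrow @safe
    {
        sum += element;
        N += 1;
    }

    // Combining two accumulators only needs their sums and counts.
    void put(MeanSketch rhs) pure nothrow @safe
    {
        sum += rhs.sum;
        N += rhs.N;
    }

    double mean() const pure nothrow @safe
    {
        // One branch to avoid 0 / 0 when nothing has been added yet.
        return N == 0 ? double.nan : sum / N;
    }
}
```

Because the state is just two numbers, merging partial results (for example from different threads or chunks of a file) gives the same mean as feeding every element into a single accumulator.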


const double sum ();


const double mean ();


const double N ();


const Mean toMean ();
Simply returns this. Useful in generic programming contexts.

const string toString ();


Mean mean (T)(T data);
Finds the arithmetic mean of any input range whose elements are implicitly convertible to double.

struct GeometricMean ;


pure nothrow @safe void put (double element);


pure nothrow @safe void put (typeof(this) rhs);
Combine two GeometricMean structs.

const double geoMean ();


const double N ();


const string toString ();


double geometricMean (T)(T data);
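
GeometricMean is not described in detail here. One common way to compute a geometric mean online in O(1) space is to accumulate the sum of logarithms and exponentiate their average; the sketch below assumes that approach (it is not confirmed as the dstats implementation), and the names GeometricMeanSketch and geometricMeanSketch are hypothetical.

```d
import std.math : exp, log;

struct GeometricMeanSketch
{
    double logSum = 0;
    double N = 0;

    // Elements must be positive for the logarithm to be defined.
    void put(double element) pure nothrow @safe
    {
        logSum += log(element);
        N += 1;
    }

    // Combining two accumulators adds their log sums and counts.
    void put(GeometricMeanSketch rhs) pure nothrow @safe
    {
        logSum += rhs.logSum;
        N += rhs.N;
    }

    double geoMean() const pure nothrow @safe
    {
        return exp(logSum / N);
    }
}

// Convenience wrapper in the style of geometricMean(data).
double geometricMeanSketch(R)(R data)
{
    GeometricMeanSketch acc;
    foreach (x; data)
        acc.put(x);
    return acc.geoMean;
}
```

Working in log space also avoids the overflow that multiplying many large values directly would cause.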


U sum (T, U = Unqual!(IterType!(T)))(T data);
Finds the sum of an input range whose elements implicitly convert to double. The caller may choose U to be a wider type than the element type of T to prevent overflow when summing large arrays. By default, however, the return type is the unqualified element type of T.
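
A minimal sketch of why the accumulator type matters: the running total is held in U, not in the element type. The name sumAs is hypothetical and only mirrors the idea of the U template parameter.

```d
import std.range.primitives : isInputRange;

// Sum `data` into an accumulator of the caller-chosen type U.
U sumAs(U, R)(R data)
    if (isInputRange!R)
{
    U total = 0;
    foreach (x; data)
        total += x;   // with a narrow unsigned U, this wraps around on overflow
    return total;
}
```

Summing the ubyte values 200 and 100 with U = ubyte wraps around to 44 (300 modulo 256), while U = uint gives the expected 300.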

struct MeanSD ;
Output range to compute the mean, stdev, and variance online. The getters for stdev and var cost a few floating point operations; the getter for mean costs a single branch to check for N == 0. Since updating a MeanSD involves relatively expensive floating point operations, use Mean instead if you only need the mean. This struct uses O(1) space and does *NOT* store the individual elements.

Note:
This struct can implicitly convert to a Mean struct.

References:
Computing Higher-Order Moments Online.
http://people.xiph.org/~tterribe/notes/homs.html

Examples:
 MeanSD summ;
 summ.put(1);
 summ.put(2);
 summ.put(3);
 summ.put(4);
 summ.put(5);
 assert(summ.mean == 3);
 assert(summ.stdev == sqrt(2.5));
 assert(summ.var == 2.5);


pure nothrow @safe void put (double element);


pure nothrow @safe void put (typeof(this) rhs);
Combine two MeanSD structs.

const double sum ();


const double mean ();


const double stdev ();


const double var ();


const double mse ();
Mean squared error, i.e. the biased estimate of the variance (the sum of squared deviations divided by N rather than N - 1).
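
The difference between var and mse, and how two accumulators can be combined, can be sketched with Welford's online update and the pairwise merge formula of Chan et al. Both are standard techniques; whether MeanSD uses exactly these formulas is an assumption, and MeanSDSketch is a hypothetical name.

```d
import std.math : sqrt;

struct MeanSDSketch
{
    double N = 0;
    double mean = 0;
    double m2 = 0;   // sum of squared deviations from the running mean

    // Welford's update: numerically stable, O(1) space.
    void put(double element) pure nothrow @safe
    {
        N += 1;
        immutable delta = element - mean;
        mean += delta / N;
        m2 += delta * (element - mean);
    }

    // Chan et al. merge of two partial results.
    void put(MeanSDSketch rhs) pure nothrow @safe
    {
        if (rhs.N == 0) return;
        immutable total = N + rhs.N;
        immutable delta = rhs.mean - mean;
        m2 += rhs.m2 + delta * delta * N * rhs.N / total;
        mean += delta * rhs.N / total;
        N = total;
    }

    // Unbiased variance: divides by N - 1.
    double var() const pure nothrow @safe { return m2 / (N - 1); }

    // Biased variance (mean squared error): divides by N.
    double mse() const pure nothrow @safe { return m2 / N; }

    double stdev() const pure nothrow @safe { return sqrt(var()); }
}
```

For the values 1 through 5 the sum of squared deviations is 10, so var is 10 / 4 = 2.5 (matching the MeanSD example above) while mse is 10 / 5 = 2.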

const double N ();


const Mean toMean ();
Converts this struct to a Mean struct. Also called when an implicit conversion via alias this takes place.

const MeanSD toMeanSD ();
Simply returns this. Useful in generic programming contexts.

const string toString ();


MeanSD meanStdev (T)(T data);
Puts all elements of data into a MeanSD struct, then returns the struct. This can be faster than calling put() in a loop yourself, because of instruction-level parallelism (ILP) optimizations.

double variance (T)(T data);
Finds the variance of an input range with elements implicitly convertible to double.

double stdev (T)(T data);
Calculates the standard deviation of an input range with elements implicitly convertible to double.

struct Summary ;
Output range to compute the mean, stdev, variance, skewness, kurtosis, min, and max online. Using this struct is relatively expensive, so if you only need the mean and/or stdev, use MeanSD or Mean instead. The getters for stdev and var cost a few floating point operations, the getter for mean costs a single branch to check for N == 0, and the getters for skewness and kurtosis cost many floating point operations. This struct uses O(1) space and does *NOT* store the individual elements.

Note:
This struct can implicitly convert to a MeanSD.

References:
Computing Higher-Order Moments Online.
http://people.xiph.org/~tterribe/notes/homs.html

Examples:
 Summary summ;
 summ.put(1);
 summ.put(2);
 summ.put(3);
 summ.put(4);
 summ.put(5);
 assert(summ.N == 5);
 assert(summ.mean == 3);
 assert(summ.stdev == sqrt(2.5));
 assert(summ.var == 2.5);
 assert(approxEqual(summ.kurtosis, -1.9120));
 assert(summ.min == 1);
 assert(summ.max == 5);
 assert(summ.sum == 15);


pure nothrow @safe void put (double element);


pure nothrow @safe void put (typeof(this) rhs);
Combine two Summary structs.

const double sum ();


const double mean ();


const double stdev ();


const double var ();


const double mse ();
Mean squared error, i.e. the biased estimate of the variance (the sum of squared deviations divided by N rather than N - 1).

const double skewness ();


const double kurtosis ();


const double N ();


const double min ();


const double max ();


const MeanSD toMeanSD ();
Converts this struct to a MeanSD. Called via alias this when an implicit conversion is attempted.

const string toString ();


double kurtosis (T)(T data);
Excess kurtosis relative to the normal distribution. High kurtosis means that the variance is due to infrequent, large deviations from the mean. Low kurtosis means that the variance is due to frequent, small deviations from the mean. The normal distribution has an excess kurtosis of 0 by definition. Input must be an input range with elements implicitly convertible to double.

double skewness (T)(T data);
Skewness is a measure of the asymmetry of a distribution. Positive skewness means that the right tail is longer or fatter than the left tail; negative skewness means that the left tail is longer or fatter than the right tail. A symmetric distribution has zero skewness. Input must be an input range with elements implicitly convertible to double.
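
A two-pass sketch of both statistics (Summary computes them online instead). The kurtosis convention used here, the fourth central moment divided by the square of the unbiased (N - 1) variance, minus 3, reproduces the -1.9120 value in the Summary example above. Applying the analogous convention to skewness (third central moment over the unbiased standard deviation cubed) is an assumption. All names are hypothetical.

```d
import std.math : sqrt;

private struct Moments
{
    double var, m3, m4;
}

// Unbiased variance (N - 1) plus third and fourth central moments (over N).
private Moments moments(const(double)[] data)
{
    double n = data.length;
    double mean = 0;
    foreach (x; data) mean += x;
    mean /= n;

    double ss = 0, s3 = 0, s4 = 0;
    foreach (x; data)
    {
        immutable d = x - mean;
        ss += d * d;
        s3 += d * d * d;
        s4 += d * d * d * d;
    }
    return Moments(ss / (n - 1), s3 / n, s4 / n);
}

// Excess kurtosis: subtracting 3 makes the normal distribution score 0.
double kurtosisSketch(const(double)[] data)
{
    auto m = moments(data);
    return m.m4 / (m.var * m.var) - 3;
}

// Skewness: positive for a longer right tail, 0 for symmetric data.
double skewnessSketch(const(double)[] data)
{
    auto m = moments(data);
    return m.m3 / (m.var * sqrt(m.var));
}
```

For 1, 2, 3, 4, 5 the deviations are -2 through 2, so the fourth moment is 34 / 5 = 6.8, the unbiased variance is 2.5, and the excess kurtosis is 6.8 / 6.25 - 3 = -1.912.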

Summary summary (T)(T data);
Convenience function. Puts all elements of data into a Summary struct, and returns this struct.

struct ZScore (T) if (isForwardRange!(T) && is(ElementType!(T) : double));


double front ();


void popFront ();


bool empty ();


typeof(this) save ();


double opIndex (size_t index);


double back ();


void popBack ();


size_t length ();


ZScore!(T) zScore (T)(T range);
Returns a range of the z-scores of the underlying range. The returned range has whatever properties T has (forward range, bidirectional range, random access range, hasLength, etc.). The z-score of an element in a range is defined as (element - mean(range)) / stdev(range).

Notes:
If the data contained in the range is a sample of a larger population, rather than an entire population, then strictly speaking the values produced by the ZScore range are t statistics, not z statistics. This is because the sample mean and standard deviation are only estimates of the population parameters. This does not affect the mechanics of using this range, but it does affect the interpretation of its output.

Accessing elements of this range is fairly expensive, since each access involves floating point arithmetic, including a multiply. Constructing this range is also costly, because the entire input range must be iterated over to find the mean and standard deviation before any z-scores can be produced.
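
A sketch of the same idea for arrays, using a lazy std.algorithm map rather than a dedicated struct (zScoreSketch is a hypothetical name, and this is not the ZScore implementation). It shows both costs mentioned above: a full pass to compute the mean and the (N - 1) standard deviation, and a subtraction plus a multiply by a precomputed reciprocal on every access. The three-argument overload corresponds to supplying a precomputed mean and stdev.

```d
import std.algorithm.iteration : map;
import std.math : sqrt;

// Lazy z-scores with a caller-supplied mean and standard deviation.
auto zScoreSketch(const(double)[] data, double mean, double sd)
{
    immutable invSd = 1 / sd;   // one division up front, one multiply per element
    return data.map!(x => (x - mean) * invSd);
}

// Computes the mean and (N - 1) standard deviation first, which requires
// passes over the whole input before any z-score is produced.
auto zScoreSketch(const(double)[] data)
{
    double mean = 0;
    foreach (x; data) mean += x;
    mean /= data.length;

    double ss = 0;
    foreach (x; data) ss += (x - mean) * (x - mean);
    return zScoreSketch(data, mean, sqrt(ss / (data.length - 1)));
}
```

Because map over an array is itself a random access range with length, the result keeps those properties of its input, which is the behavior the text describes for ZScore.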

ZScore!(T) zScore (T)(T range, double mean, double sd);
Allows the construction of a ZScore range with precomputed mean and stdev.

Page was generated on Wed May 25 22:15:55 2011