16.2 Statistics Methods

The CÆSAR Statistics Class is used to keep track of the statistics associated with a sample population or distribution. Currently, the class stores only aggregate information about the population, not the entire distribution. Therefore, statistics that require the full set of values, such as the median and the mode, are not available. The following values are calculated by the class.

Given a set of $N$ variables denoted by $\left\{ x_i \right\}$, the arithmetic mean, or average, of the distribution is given by

$\displaystyle \left\langle x \right\rangle = \frac{1}{N} \sum_{i=1}^{N} x_i$  . (16.1)

The geometric mean of the distribution (calculated only if all $x_i$ are positive) is given by

$\displaystyle \left\langle x \right\rangle_{G} = \sqrt[N]{\prod\nolimits_{i=1}^{N} x_i}$  , (16.2)
or equivalently by

$\displaystyle \left\langle x \right\rangle_{G} = \exp \left( \frac{1}{N} \sum_{i=1}^{N} \ln x_i \right)$  . (16.3)
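The two forms (16.2) and (16.3) agree mathematically, but they behave differently in floating-point arithmetic. The following Python sketch is an illustration only and does not reflect the CÆSAR implementation; the function names `geometric_mean_product` and `geometric_mean_log` are hypothetical.

```python
import math


def geometric_mean_product(values):
    """Equation (16.2): N-th root of the product of the values."""
    n = len(values)
    return math.prod(values) ** (1.0 / n)


def geometric_mean_log(values):
    """Equation (16.3): exponential of the mean of the logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires all values to be positive")
    n = len(values)
    return math.exp(math.fsum(math.log(v) for v in values) / n)


small = [1.0, 2.0, 4.0, 8.0]
print(geometric_mean_product(small), geometric_mean_log(small))

# For large values the running product overflows to infinity,
# while the sum of logarithms stays well within range.
large = [1.0e200, 1.0e200, 1.0e200]
print(geometric_mean_product(large), geometric_mean_log(large))
```

The logarithmic form also needs only a running sum of $\ln x_i$, so it can be accumulated without storing the individual values.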

The harmonic mean of the distribution is given by

$\displaystyle \left\langle x \right\rangle_{H} = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{x_i} \right]^{-1}$  . (16.4)

The standard deviation of the distribution is given by

$\displaystyle s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \left\langle x \right\rangle \right)^2}$  , (16.5)
or equivalently by

$\displaystyle s = \sqrt{\frac{1}{N-1} \left[ \sum_{i=1}^{N} x_i^2 - N \left\langle x \right\rangle^2 \right]}$  . (16.6)
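The equivalence of the two forms follows by expanding the square and using $\sum_{i=1}^{N} x_i = N \left\langle x \right\rangle$ from the definition of the arithmetic mean:

$\displaystyle \sum_{i=1}^{N} \left( x_i - \left\langle x \right\rangle \right)^2 = \sum_{i=1}^{N} x_i^2 - 2 \left\langle x \right\rangle \sum_{i=1}^{N} x_i + N \left\langle x \right\rangle^2 = \sum_{i=1}^{N} x_i^2 - N \left\langle x \right\rangle^2$  .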
Note that the factor $N - 1$, rather than $N$, is required in the denominator to account for the fact that the parameter $\left\langle x \right\rangle$ has been determined from the distribution and not independently. This formula for the standard deviation is sometimes called the sample standard deviation. In the limit of large $N$, the sample mean and the sample standard deviation approach the true mean $\mu$ and the true standard deviation $\sigma$ of the distribution:

$\displaystyle \mu = \lim_{N\to\infty} \left\langle x \right\rangle$  , (16.7)

$\displaystyle \sigma = \lim_{N\to\infty} s$  . (16.8)
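Because the means and the second form of the standard deviation depend only on running sums, they can be accumulated without storing the individual values. The Python sketch below illustrates that idea under stated assumptions: the class `RunningStatistics` and all of its member names are hypothetical and are not the interface of the CÆSAR Statistics Class, and restricting the harmonic mean to positive values is a simplification made here, not something the text above specifies.

```python
import math


class RunningStatistics:
    """Hypothetical accumulator; not the CÆSAR Statistics Class interface."""

    def __init__(self):
        self.count = 0
        self.total = 0.0              # sum of x_i
        self.total_squares = 0.0      # sum of x_i^2
        self.total_logs = 0.0         # sum of ln x_i (kept while all x_i > 0)
        self.total_reciprocals = 0.0  # sum of 1/x_i (kept while all x_i > 0)
        self.all_positive = True
        self.minimum = math.inf
        self.maximum = -math.inf

    def add(self, x):
        """Fold one value into the running sums and the extrema."""
        self.count += 1
        self.total += x
        self.total_squares += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        if x > 0 and self.all_positive:
            self.total_logs += math.log(x)
            self.total_reciprocals += 1.0 / x
        else:
            self.all_positive = False

    def mean(self):
        """Arithmetic mean, equation (16.1)."""
        return self.total / self.count

    def geometric_mean(self):
        """Equation (16.3); None unless every value was positive."""
        if not self.all_positive:
            return None
        return math.exp(self.total_logs / self.count)

    def harmonic_mean(self):
        """Equation (16.4); restricted to positive values in this sketch."""
        if not self.all_positive:
            return None
        return self.count / self.total_reciprocals

    def standard_deviation(self):
        """Sample standard deviation, equation (16.6); needs N >= 2."""
        if self.count < 2:
            return None
        mean = self.mean()
        variance = (self.total_squares - self.count * mean * mean) / (self.count - 1)
        # Roundoff can push the difference slightly below zero.
        return math.sqrt(max(variance, 0.0))


stats = RunningStatistics()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.add(value)
print(stats.mean(), stats.standard_deviation())
```

One caveat of the sum-of-squares form is that it subtracts two nearly equal numbers when the mean is large compared with the spread, which can lose precision in floating-point arithmetic.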

The Statistics Class also calculates minimum and maximum values for the distribution. The Valid_State procedure verifies that all of the means lie within the extremum bounds, and that the following mathematical relationship holds:

$\displaystyle \left\langle x \right\rangle_{H} \leq \left\langle x \right\rangle_{G} \leq \left\langle x \right\rangle$  . (16.9)
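A check of this kind might look like the following Python sketch. It is not the Valid_State procedure itself: the function name `check_mean_ordering`, its argument order, and the relative tolerance are assumptions made for illustration. The tolerance allows for roundoff when the means are nearly equal, as happens when all values are identical.

```python
import math


def check_mean_ordering(minimum, harmonic, geometric, arithmetic, maximum,
                        rel_tol=1e-12):
    """Return True if min <= harmonic <= geometric <= arithmetic <= max.

    Each adjacent pair may violate the ordering by no more than roundoff,
    as measured by the relative tolerance rel_tol.
    """
    chain = [minimum, harmonic, geometric, arithmetic, maximum]
    return all(lo <= hi or math.isclose(lo, hi, rel_tol=rel_tol)
               for lo, hi in zip(chain, chain[1:]))


data = [1.0, 2.0, 4.0]
n = len(data)
arithmetic = math.fsum(data) / n
geometric = math.exp(math.fsum(math.log(x) for x in data) / n)
harmonic = n / math.fsum(1.0 / x for x in data)

# The means of positive data satisfy the ordering, so this prints True.
print(check_mean_ordering(min(data), harmonic, geometric, arithmetic, max(data)))
```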

Michael L. Hall