Mathematically Analytic Distributions Can Explain Radioanalytic Results
Allen
Brodsky*
Science
Applications International Corporation
1410
Spring Hill Road, Mail Stop SH2-1
McLean,
VA 22102
In
a poster at last year’s BAER meeting (Brodsky and Barss 1999) and in other
presentations (Brodsky et al. 1999;
Brodsky 2000) we reported on the characterization of a distribution of
radioanalytic results by the use of a mathematical convolution of probability
distribution functions fit to (observed) lognormal distributions of fission
tracks from plutonium in population urine samples, and in simulated (reagent)
urine blanks. The same type of
mathematical derivation can be used to characterize and explain the shapes of
other distributions of data from bioassay or environmental samples. (The
difference between two variates, each lognormally distributed, is not
lognormally distributed; on the other hand, the difference between two normally
distributed variates is normally distributed and the resulting distribution can
be immediately written down in functional form.) The rigorous mathematical derivation of the distribution function
of net results explained, in the fission track analysis case, the reasons that
a plot of the net urine concentrations on probability vs. log scale coordinates
does seem to fit the lognormal straight line above median concentrations, but
curves to the left on the logarithmic scale below the median as urine
concentrations of plutonium approach those of the blanks.
Further
examination of the literature shows other distributions, particularly of
concentrations of radionuclides in human tissue samples at very low levels,
that can be explained by a similar mathematical analysis. Actual distributions
of radionuclides in samples that might be truly lognormal will appear at low levels to deviate from
their actual distribution because the subtraction of blank measurements having
random fluctuations produces an increased population of small and negative
values at low analyte levels. The net
results thus force the line on a probability vs. log concentration plot to
curve to the left, usually below the median concentration.
Since
the logic of the mathematical derivations for the resulting “interpreted”
distributions can be applied to the
subtraction of sample and blank distributions represented by any two continuous
functions (an algorithm applied to subtraction of two discrete distributions
has been presented in Brodsky (1992), the derivation of the interpreted
distribution for subtraction of lognormally-distributed blanks from lognormally
distributed population urine samples, as applied and verified in fission track
analysis (FTA) is presented in detail in the Appendix to this paper.
*Also,
Adjunct Professor, Health Physics Graduate Program, Georgetown University. The views expressed in this paper are those
of the author alone, and do not necessarily reflect those of any employer. Also, the data presented are not necessarily
representative of any appropriate sample of the population, and do not represent
an actual data sample used in any official project.
For
the well-known (constant) blank subtraction:
The cumulative distribution function
("well-known" blank) for the interpreted net plutonium activity in
urines from a control population unexposed to plutonium other than from fallout
is:
w=X+b/m
Fx(X) = òw®0 (1/sg,fyÖ2p)(1/w)exp[- (ln mw - ln mg,fy)2/2sg,fy2] dw,
where the parameters for the combined controls are
m = 1.073; mg,fy = 32; sg,fy = 0.72732; sg,fy2 = 0.529
Here,
m is the slope of the calibration plot of blanks and spikes (in tracks/aCi);
the remaining parameters are the median, standard deviation of the natural
logarithms, and variance of the natural logarithms, of the corrected track (fy)
distributions before blank subtraction.
For
the case where a random, paired, blank is subtracted from each sample in a
stable analytic process, the cumulative distribution of the urine activities
becomes:
B X+(b/m)
F(X) = ò p(b) {ò pw(w) dw} db,
e e
where (as for the first FxX) the value
of e for the lower limit of
integration of each integral must be close enough to zero for convergence of
the integration (but not = 0), and the upper limit B of the integration over db
must be high enough to include the range over which the probability is very
close to 1 of obtaining a value of b.
Here, the probability that a
plutonium activity, before subtraction of the paired blank activity, will be in
the interval dw is:
pw(w) dw = (1/sg,fyÖ2p)(1/w)exp[- (ln mw - ln mg,fy)2/2sg,fy2 dw
The probability that a blank will have the number of
tracks b in the interval db is, using the maximum likelihood parameters to fit
a lognormal function to the blank distribution,
p(b)db = (1/sbÖ2p)(1/b)exp[- (ln b - ln mgb)2/2sb2 db
The
parameter mgb is the median of the blank
activity distribution, and sb is the standard deviation
of the distribution of the natural logarithms of the blank activities. The analytically
fit cumulative distributions F(X) were found in both cases to closely agree
with the interpreted results that were reported as net activity of plutonium in
the urine samples (See Appendix, Figures 4 and 5, and Exhibits A and B).
1.
The
derivation of the probability
distributions of “interpreted” or “reported” results, when blanks are
subtracted from variable measurements that range down near the blank values,
can explain the shapes of underlying distributions of radioactivity in human
samples.
2.
Lognormal
distributions of activity in human samples are more “normal” than normal
distributions, but the true distribution shapes must be inferred from
measurements well above the levels of the appropriate blanks.
3.
Comparisons
of the data in Figures 1-5 with other data in the literature indicate that the
extrapolation of the straight line portions in Figures 4 and 5 would, in the
absence of further data, yield a reasonable estimate of the distribution (about
1998) of urinary excretion of plutonium (239) in the general U. S. population
exposed only to fallout plutonium. (Errors of measurement average out to some
degree in the fitting of this overall distribution function.) This distribution
could then be a reasonable assumed prior distribution for Bayesian analyses
regarding whether a population is exposed to a given alternative increment of
plutonium above that of the general population.
APPENDIX – DETAILED DERIVATION OF “INTERPRETED” DISTRIBUTIONS
Calibration Data
The
calibration data in fission track analysis of plutonium (FTA) (Moorthy et al.
1988) consist of a series of measurements of fission tracks on quartz
slides. These slides contain residues
from simulated synthetic urine blanks, and residues from “spikes” of these
blanks to which known quantities of Pu-239 have been added. Checks are also
made on the cleaned slide themselves, which usually produce only a few tracks.
The slides are then irradiated with neutrons in a nuclear reactor, to produce
tracks of fission fragments from the fission of the Pu-239 in each of the
residues (Moorthy et al. 1988). The
chemical processes include detailed procedures for removing, as much as
possible, any uranium or other fissionable material that would produce fission
tracks interfering with the specific identification of tracks originating from
plutonium fission. The blanks are composed of a mixture of chemicals that
simulate the constituents of urine; they are also subjected to the same
physico-chemical processes and deposited on the quartz slides in the same
manner as the chemical residue from human urine samples. The blanks thus yield residues chemically
similar to those deposited from the human urine samples. The blanks to which known quantities of
plutonium have been added will be denoted by the shorter term “spikes”. A known
mass quantity of plutonium is also deposited on one or two areas of the slides
of a batch irradiation as a neutron flux monitor, to normalize to a total
neutron fluence. The complex physico-chemical
processes leading to deposition of each blank and spike are standardized within
a homogeneous batch of data; the dates for a given procedure are documented.
A
simple (non-weighted) regresson line of tracks vs. amount of Pu in attocuries
(aCi) was used as the calibration data for interpreting plutonium in human
urine samples, since the absolute magnitude of the dispersion of y (track)
values about the regression line was about the same vs. amount of Pu added, and
were approximately normal (Gaussian) (see Figure 1).
The
linear regression equation for the calibration line will thus be of the form:
Y
= mX + b
Eq. 1
For
comparison, the slopes, intercepts and their 95% confidence limits obtained
from several sets of data are tabulated below:
SAMPLE |
SLOPE (with 95% CI in
parentheses) |
INTERCEPT (with 95% CI in
parentheses) |
191 blanks and spikes |
m = 1.073 (1.033, 1.114) |
b = 19.18 (14.95, 23.42) |
172 blanks and spikes |
m = 1.079 (1.036, 1.121) |
b = 20.04 (15.56, 24.51) |
115 spikes alone |
m = 1.071 (1.003, 1.139) |
b = 19.58 (10.47, 28.70) |
This
table shows that the calibration data provided a relatively stable line over
the period of analyses used in this illustration.
. Thus, for interpretation of human samples,
the interpreted one-day excretion of plutonium in a given urine sample is
obtained as:
xj
= (yj fj - b)/m
Eq.
2
where xj = the amount of plutonium in a
one-day excretion of urine, interpreted from
a sample of volume Vj (actually measured by the proportional mass
of the residue used for neutron irradiation),
yj = the number of
fission tracks observed on the standardized area of the slide
where the reduced sample
was deposited,
fj = a scaling factor to normalize the tracks from
a human urine sample to a 24-hour volume, and to normalize to a standard
neutron fluence,
b = the intercept of Equation 1 obtained from regression analysis,
m = the slope of Equation 1 obtained from regression analysis.
Derivation of the analytical cumulative distribution function of net plutonium activity (per 24 hour sample) calculated from Equation 2, when best estimates of b and m from prior calibration data are taken to be constants (CONSTANT BLANK CASE)
The
distribution of x values as interpreted for the control urine samples is, from
Equation 2, the same as the distribution of
yf/m - b/m. First, the density
function must be obtained for the distribution of yf. The cumulative distribution of data values is shown in Figure 2
for the products yf, corrected tracks per 24-hour urines, are presented for a
“control population” set of urines taken from persons living in several
geographic regions. The distribution of tracks from simulated 24-hour urine
blanks is seen in Figure 3 to also of lognormal shape. (Tests were made against
other fits.) Although regional
differences are evident, the pooled sample is seen to be fit well by a
lognormal cumulative line, which will be used in this example derivation.
(Other analytical functions fitting the data can be used in other cases.) (Note:
Although the counting of tracks would seem to yield discrete density functions,
the functions are converted to continuous variates by the continuously varying
correction factors and yields. Algorithms for the subtraction of two random
variates, both Poisson distributed, were illustrated in Brodsky (1992); here,
it is appropriate to consider the density functions of interest to be
continuous functions.)
Estimates
of the parameters of the lognormal fit can be made from graphs such as Figure
2.
Graphic estimate of mg = value on x axis where
ordinate is 0.50 or 50%
Eq. 3
Graphic estimate of sg
= x value for 0.8413/x value for 0.50 =
x0.8413/x0.50
or = x0.50/x0.1587 , or to obtain a better
estimate by averaging,
=
0.5{(x0.8413/x0.50) + (x0.50/x0.1587)
Eq. 4
Then, the variance in the natural
logarithm of x over the lognormal distribution is
(Brodsky 1982):
s2 = (ln sg)2
Eq. 5
The parameter sg has been given different
names in the literature (Brodsky 1982); it will be called the “standard
geometric deviation” here. It is a very
useful parameter, since it can be used to obtain a particular set of confidence
intervals about the median (geometric mean) read on the logarithmic scale of
the graphic plot); e.g., (mg/sg, mg
x sg) provides a 95% “confidence interval”. Note: The symbol s is not here used for the sample estimator of a true s,
but is defined above; to avoid a complexity of symbols, all parameters in this
derivation should be understood to be derived from a sample of data.
The
underlying probability density function for the distribution in corrected track
readings for all controls, denoted by pfy(u) with u = fy, is now
given by (using the formulation in Brodsky 1982):
pfy(u)
= (1/sg,fyÖ2p)(1/u)exp[-
(ln u - ln mg,fy)2/2sg,fy2]
Eq. 6
where the parameters mg,fy = 32, and sg,fy
= (0.529)0.5 = 0.727
are obtained from the data fit in Figure 2.
These parameters were obtained as maximum likelhood estimates (geometric
mean and variance of ln fy) and were in good agreement with parameters obtained
from the graph.
The
cumulative distribution function giving the probability that a value of fy will
be less than or equal to u is then:
z=u
Ffy(u) = òz®0 pfy(z)
dz
Eq. 7
Next,
the distribution functions for the terms fy/m are obtained as follows:
Suppose that variate z has a pdf of pz(z)
over the range z1 to z2.
Then, the probability that a value of z will occur in the differential
interval dz in this range is pz(z) dz. Now, consider a change in variate is made by multiplying z by a
constant to obtain variates w = cz. The
new pdf for w will then be such that the probability of obtaining a value of cz
in the interval d(cz) will be the same as that of obtaining z in the interval
dz from which cz was derived. That is,
pw(w) dw = pz(z)dz.
Therefore, pw(w) dw = pw(w)
d(cz) = c pw(w) dz, which equals pz(z)dz. Thus, pw(w) = (1/c) pz(z).
Therefore,
the pdf for (fy/m), letting w = fy/m, is
pfy/m(w) = m pfy(u),
and
the probability of obtaining a value of w in dw is m pfy(mw) dw.
The
cumulative distribution function for w = fy/m is obtained as follows:
The
running variable V in Equation 7 becomes = mw.
Then,
when V = 0, w=0, and when V
= u, w = u/m.
Replacing
V with
mw in Equation 7, the probability that a value less than fy/m will be obtained
is:
w=fy/m
Ffy/m (fv/m)
= òw®0
(1/sg,fyÖ2p)(1/mw)exp[-
(ln mw - ln mg,fy)2/2sg,fy2]
d(mw)
and
canceling the m in numerator and denominator,
w=fy/m
Ffy/m (fy/m) = òw®0
(1/sg,fyÖ2p)(1/w)exp[-
(ln mw - ln mg,fy)2/2sg,fy2]
dw Eq.
8
Now,
the probability that an interpreted value of x will be less than or equal to
some value X can be denoted by
Prob[ x = fy/m - b/m ≤ X], (CAUTION: Some printers apparently do not print the less than or
equals sign
that belongs to the immediate left of the X in these inequalities.)
which
is the same as
Prob[ fy/m ≤ X + (b/m)],
for
a procedure involving a constant subtraction of the same interpreted (“best estimate”)
blank b/m for each urine interpretation.
This
probability is obtained using Equation 8 as:
Prob[fy/m ≤ X + (b/m)] = Ffy/m (X + (b/m)),
Eq. 9
which
is now the desired cumulative probability Fx(X) that an interpreted
control population urine concentration will be less than or equal to X.
The
cumulative distribution function that best fits the combined urine concentrations
for the control populations is thus:
w=X+(b/m)
Fx(X) = òw®0
(1/sg,fyÖ2p)(1/w)exp[-
(ln mw - ln mg,fy)2/2sg,fy2]
dw,
Eq. 10
where
the parameters for the combined controls are again
m = 1.073; mg,fy
= 32; sg,fy = 0.72732; sg,fy2
= 0.529
Since
the lognormal functions of Equations 8-10 are not defined at their lower limits
of integration (as a result, e.g., of w in the denominator and non-convergence
of the integrand), the value of the
integrand must avoid zero. The
calculations of the cumulative probabilities using Equation 10 to obtain the cumulative distribution for interpreted
control population urine concentrations were carried out using MATHCAD (see
Exhibit A) for the case where b and m are maintained constant for each
interpretation of urine concentration.
The lower limit of integration was reduced until the integral converged.
The resulting cumulative distribution F(x) is seen in Figure 4 to agree very
closely with the plot of the cumulative urinary concentration distribution for
all controls that was provided as results from the analytical laboratory (BNL).
Figure 5 shows the original and clearer hand plot of the results in Figure 4,
using log-probability paper (which is no longer available for sale,
unfortunately).
Derivation of the analytical
cumulative distribution function of net plutonium activity (per 24 hour sample)
in combined controls, when m is the constant slope of the calibration line, but
the intercept b is taken as the number of tracks on a paired blank that varies
randomly as the lognormal distribution
in Figure 3 (PAIRED BLANK CASE)
A
convolution integral in which b is a random variable is evaluated in MATHCAD
(Exhibit B) to obtain the desired
analytical expression for the cumulative distribution function (CDF) for
combined control urines. As shown in
Figures 4and 5, the resulting cumulative distribution function represents the
interpreted activities even better than the CDF obtained in the previous
section with constant values of b and m.
However, the treatment of b as a random variable, rather than as a
constant, does not appreciably increase the spread in the resulting
distribution of interpreted values, probably because the distribution of b is
narrower than the distribution in control urines, and also because the positive
and negative variations of b about the central value would tend to cancel in
the data or in the theory describing the resulting urine activity
distribution.
In
order to illustrate the concepts in deriving the cumulative distribution
functions, the convolution integral in Exhibit B is derived below as an
example. The convolution is performed assuming the lognormal distribution in b
as a paired blank will be that observed in Figure 3, but with m as a constant.
The slope m obtained from cumulative calibration data in a well-controlled
analytical system varies much less than the number of tracks in a paired blank
In
order to derive the cumulative distribution function, which is the probability
that a control urine value x will be less than some chosen fixed value X, we
need to integrate (add up) for each value of b in the (infinitesimal) interval
db the probability that the associated (paired) value of fy/m will be such that
fy/m ≤ X + (b/m); for such an
inequality, by subtracting (b/m) from both sides, we see that the interpreted
value will be fy/m - b/m ≤ X. Thus, the integral over w for a given value of b must be from
infinitesimally close to zero up to X + b/m.
Then, after obtaining the sum for each value of b, we need to add over
all possible values of b, from near zero to effectively a high-enough value of
b so that the integral of probability over b is essentially the same as if the
computer covered a range of b up to infinity.
Convergence can be tested by determining whether further increasing
width of the integration intervals produces any further increases in the
result.
The
probability density function (pdf) for the distribution of fy/m is obtained
from the integrand of Equation 8, using a notation of pw(w) for convenience of
notation in MATHCAD, where w is the dummy variable representing fy/m, and pw is
the specific functional form for the pdf of fy/m. Thus, the probability that a plutonium activity, before
subtraction of the paired blank activity, will be in the interval dw is:
pw(w) dw = (1/sg,fyÖ2p)(1/w)exp[-
(ln mw - ln mg,fy)2/2sg,fy2
dw Eq. 11
The
probability that a blank will have the number of tracks b in the interval db
is, using the maximum likelihood parameters of Figure 3 to fit a lognormal function to the blank
distribution,
p(b)db =
(1/sbÖ2p)(1/b)exp[-
(ln b - ln mgb)2/2sb2
db
Eq. 12
(Since
no further pdfs will be used in this calculation, simplified symbols are used
here for p(b), and for the geometric mean and standard deviation of ln b in the
fitted distribution.)
The
cumulative distribution function for interpreted urine values then becomes the
integral over dw for each b value, followed by the integral over all b values,
giving the probability of pairs of tracks from the urine sample analysis and
the paired control such that the interpreted value is less than X:
B X+(b/m)
F(X) = ò
p(b) {ò pw(w) dw} db, Eq.
13
e e
where
the value of e for the lower limit of integration of each
integral must be close enough to zero for convergence of the integration, and
the upper limit B of the integration over db must be high enough to include the
range over which the probability is very close to 1 of obtaining a value of b.
The values of the parameters for each distribution are given in Exhibit B. The resulting values of F(X), for
interpreted values of X from 5, 10, 15....100 aCi plutonium per 24-hour sample,
are printed in Exhibit B to the lower right of the formula for Fk,
which is the representation of F(X) in Equation 13 in the MATHCAD notation required
for computation for a range of values of X.
These values are seen to almost exactly overlap the interpreted control
urine cumulative distribution as reported and as plotted in Figures 4 and 5.
These
results now explain why some determinations of what might be natural lognormal
distributions in human urine or tissue samples (Schubert et al. 1967; McInroy
et al. 1979), curve to the left away from the straight-line representation of a
lognormal distribution on a probability vs. log plot. The abundant tissue samples of McInroy et al. (1979) all present
this type of plot, probably because for very low level measurements, the
subtraction of a random or fixed blank from varying and low sample measurements
tends to provide small or negative results that tend toward negative infinity
on a log x-axis.
When
the systemic burdens were calculated from the data of McInroy by Jackson (2000)
using the straight line portions of the lognormal plots, and then scaled back
and forward in time using intake retention functions for the system and for
urinary excretion of Lessard et al. (1987) for periods of intake and excretion
of plutonium from fallout, respectively, the urinary excretion predicted for
the U. S. population was in reasonable agreement with the mean excretion of the
control population represented in Figures 4 and 5. Although the limited population utilized to illustrate methods in
this paper can not be assumed to be representative of the U. S. population for
any particular purpose, these results and the consistent behavior of the FTA
data during the analytical period correspond to this data, indicate that FTA
analysis is capable under good quality assurance conditions of measuring
plutonium in human urine samples.
The data of Figures 1-3 indicate, using the
lognormal shape, that the decision level (0.05 probability of Type I error)
would be about 37 aCi (for this
pre-October 1998 data) to decide that any sample measurement is positive with
respect to a blank. An a priori “minimum detectable
amount” (MDA) obtained by the simple
formula based on normal statistics as derived by Currie (Brodsky 1986), for a
Type II error probability of 0.05 would be about 74 aCi, close to that
estimated in Boecker et al. (1991). However, any further definition of MDAs must
take into account the actual shape of error distributions and the shapes of
human sample distributions, and carefully consider what question is asked: 1) Is a single sample measurement greater than a blank?; 2) Is a single sample
measurement greater than that from an unexposed control population of
interest?; or 3) Is a distribution of measurements different (greater in
distribution) than a comparison distribution of interest?….
1.
B. Boecker, R. Hall, K. Inn, J. Lawrence, P. Ziemer, G.
Eisele, B. Wachholz and W. Burr, Jr., “Current Status of Bioassay Procedures to
Detect and Quantify Previous Exposures to Radioactive Materials,” Health Phys.
60, Supplement 1, 45-100, 1991.
2.
A. Brodsky, “Statistical Methods of Data Analysis,” in
A. Brodsky, editor, Handbook of Radiation Measurement and Protection, Vol.
II, Biological and Mathematical Information, CRC Press, Boca Raton,
Florida, 1982, pp. 261-329.
3.
A. Brodsky, “Accuracy and Detection Limits for Bioassay
Measurements in Radiation Protection – Statistical Considerations,” NUREG-1156,
U. S. Nuclear Regulatory Commission, Washington, DC, 1986.
4.
A. Brodsky and N. M. Barss, “Mathematical Analytic
Distributions to Fit Radioanalytic Results,” Poster presented at the 45th
Conference on Bioassay, Analytical and Environmental Radiochemistry, National
Institute of Standards and Technology, Gaithersburg, Maryland, October 18-22,
1999.
5.
A. Brodsky, D. M. Schaefer, S. O’Toole, E. Kaplan, N. M.
Barss, J. Dancz, W. J. Klemm, D. A. Raine, III and J. Stiver, “Statistical
Model for Fission Track Analysis of Plutonium in Human Samples,” Health Phys.
76(6), S175, 1999.
6.
A. Brodsky, “Cumulative Lognormal Distributions of
Dose-Response vs. Dose,” in the proceedings of the 10th
International Congress of the International Radiation Protection Association,
Hiroshima, Japan, May 14-19, 2000.
7.
A.
Brodsky, “Exact Calculation of Probabilities of False Positives and False
Negatives for Low Background Counting,” Health Phys. 63(2), 198-204,1992.
8.
E.
Jackson, “Assessing the U. S. Population Dose from Fallout Plutonium Using
Physiological Models of Uptake,
Retention and Excretion,” Master of Science Thesis submitted to the Faculty of
the Graduate School of Arts and Sciences, Georgetown University, Washington,
DC, July 20, 2000.
9.
E.
T. Lessard, X. Yihua, K. W. Skrable, G. E. Chabot, C. S. French, T. R. Labone,
J. R. Johnson, D. R. Fisher, R. Belanger and J. Landmann-Lipzstein,
“Interpretation of Bioassay Measurements,” NUREG/CR-4884, U. S. Nuclear
Regulatory Commission, Washington, DC, 1987.
10.
A.
R. Moorthy, C. J. Shopfer and S. Banerjee, “Plutonium from Atmospheric Weapons
Testing: Fission Track Analysis of Urine Samples,” Anal. Chem. 60, 857A, 1988.
11.
J.
F. McInroy, E. E. Campbell, W. D. Moss, G. L. Tietgen, B. C. Eutsler and H. A.
Boyd, “Plutonium in Autopsy Tissue,” Health Phys. 37, 1-136, 1979.
12.
J.
Schubert, A. Brodsky and S. Tyler, “The Log-Normal Function as a Stochastic
Model of the Distribution of Sr-90 and Other Fission Products in Humans,”
Health Phys. 13, 1187-1204, 1967.