Calculating the probability of random distribution in Python

Python Calculating the Probability of Random Distribution, with the incomplete gamma function gamma and the complete gamma function Gamma_Half, we can calculate the CDF value of x^2. This value indicates whether a given value is random or has some correlation.

The function itself is quite short:

def cdf(x: Union[Fraction, float], k: int) -> Fraction:
"""X2 cumulative distribution function.
:param x: X2 value, sum (obs[i]-exp[i])**2/exp[i]
for parallel sequences of observed and expected values.
:param k: degrees of freedom >= 1; often len(data)-1
"""
return (
1 -
gamma(Fraction(k, 2), Fraction(x/2)) /
Gamma_Half(Fraction(k, 2))
)

The function includes some docstring comments explaining the parameters. We create the appropriate Fraction object using the degrees of freedom and chi-square value x. The x argument can be either a float value or a Fraction object; this flexibility is useful for matching examples that use a full floating-point approximation.

The x/2Fraction method can be limited to a relatively small number of digits using Fraction(x/2).limit_denominator(1000) . This yields a correct CDF value rather than a large fraction with dozens of digits.

The following is some sample data taken from the x^2 table. For more information, see the Wikipedia entry on chi-squared distribution.

Execute the following command to calculate the correct CDF value:

>>> round(float(cdf(0.004, 1)), 2)
0.95
>>> cdf(0.004, 1).limit_denominator(100)
Fraction(94, 99)
>>> round(float(cdf(10.83, 1)), 3)
0.001
>>> cdf(10.83, 1).limit_denominator(1000)
Fraction(1, 1000)
>>> round(float(cdf(3.94, 10)), 2)
0.95
>>> cdf(3.94, 10).limit_denominator(100)
Fraction(19, 20)
>>> round(float(cdf(29.59, 10)), 3)
0.001
>>> cdf(29.59, 10).limit_denominator(10000)
Fraction(8, 8005)

Given x^2 and multiple degrees of freedom, the CDF function produces values consistent with those in widely used tables. The first example shows the probability that x^2 is 0.004 with 1 degree of freedom. The second example shows the probability that x^2 is 10.38 with 1 degree of freedom. Small values of x^2 mean that there is little difference between the expected and observed results.

Here is a whole row from the x^2 table, computed by a simple generator expression:

>>> chi2 = [0.004, 0.02, 0.06, 0.15, 0.46, 1.07, 1.64, ...2.71, 3.84, 6.64, 10.83]
>>> act = [round(float(x), 3)
... for x in map(cdf, chi2, [1]*len(chi2))]
>>> act
[0.95, 0.888, 0.806, 0.699, 0.498, 0.301, 0.2, 0.1, 0.05, 0.01, 0.001]

These values show the relative likelihood of a given x^2 value and the resulting value, given 1 degree of freedom. There are slight differences in the calculated results up to three decimal places compared to the published results. This means that the calculated CDF value can be used instead of the x^2 value found in standard statistical references.

The function CDF() gives the probability of obtaining x^2 by chance.

From the published table, the x^2 value for a probability of 0.05 with 6 degrees of freedom is 12.5916. The output of the CDF() function is as follows, showing good agreement with published results:

>>> round(float(cdf(12.5916, 6)), 2)
0.05

Recalling the previous example, when the actual value of x^2 was calculated to be 19.18, the probability that this value is random is:

>>> round(float(cdf(19.18, 6)), 5)
0.00387

When the denominator is restricted to 1000, the probability is 3/775, indicating that these data are unlikely to be random. This means we can reject the null hypothesis and conduct further analysis to identify possible causes of the discrepancy.

Leave a Reply

Your email address will not be published. Required fields are marked *