The Z-Test measures whether the difference in averages between two groups is statistically significant. It is appropriate when sample sizes are large. In a fairness analysis, a Z-Test can be used to compare average loan outcomes for a protected group against the overall population, or to assess whether average credit scores differ significantly between groups.
One output of a Z-Test is a p-value: the probability of observing a difference in averages at least as large as the one measured if there were no true difference between the groups. Common thresholds for significance are p-values of 0.05, 0.01, or 0.001.
A p-value below 0.05 (often used as a standard threshold) means that, if the groups did not truly differ, a difference this large would arise by random chance less than 5% of the time. The result is then considered statistically significant, indicating a potential fairness issue between the two groups. Stricter thresholds (such as 0.01 or 0.001) demand stronger evidence before a difference in outcomes between groups is treated as non-random.
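To make the calculation concrete, the following is a minimal sketch of a two-sample Z-Test on group averages using only the Python standard library. The function name `two_sample_z_test` and the credit-score values are hypothetical and chosen purely for illustration. The sketch assumes large, independent samples and estimates each group's variance from its sample.

```python
import math
from statistics import mean, variance


def two_sample_z_test(group_a, group_b):
    """Return (z, two-sided p-value) for the difference in means of two samples.

    Assumes both samples are large and independent, so the sample variances
    can stand in for the population variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(variance(group_a) / n_a + variance(group_b) / n_b)
    z = (mean(group_a) - mean(group_b)) / standard_error
    # Two-sided p-value under the standard normal distribution:
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up credit scores for two groups of 100 applicants each.
scores_group_a = [700, 710, 690, 705, 695] * 20
scores_group_b = [score - 20 for score in scores_group_a]

z, p_value = two_sample_z_test(scores_group_a, scores_group_b)
for threshold in (0.05, 0.01, 0.001):
    print(f"significant at {threshold}: {p_value < threshold}")
```

The resulting p-value can be compared against whichever threshold the analysis has adopted. A statistically significant result flags a difference worth investigating; it does not by itself establish the cause of that difference.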
Although there are no concrete fairness thresholds, regulators may find: