Sample Page

In statistical hypothesis testing, a two-sample test is a test performed on the data of two random samples, each independently obtained from a different given population. The purpose of the test is to determine whether the difference between these two populations is statistically significant. When the only knowledge of the two populations is given through the samples, two-sample hypothesis testing becomes an instance of universal hypothesis testing.

A large number of statistical tests can be used in a two-sample test. Which ones are appropriate depends on a variety of factors, such as:

  • Which assumptions (if any) may be made a priori about the distributions from which the data have been sampled? For example, in many situations it may be assumed that the underlying distributions are normal distributions. In other cases the data are categorical, coming from a discrete distribution over a nominal scale, such as which entry was selected from a menu.
  • Does the hypothesis being tested apply to the distributions as a whole, or just some population parameter, for example the mean or the variance?
  • Is the hypothesis being tested merely that there is a difference in the relevant population characteristics (in which case a two-sided test may be indicated), or does it involve a specific bias (“A is better than B”), so that a one-sided test can be used?
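The last two questions can be made concrete with a small sketch. The code below is an illustrative permutation test, not a method taken from the cited sources: it compares one population parameter (the mean) and lets the caller choose between a two-sided alternative and the one-sided alternative "A is larger than B". The function name `permutation_test_means`, its parameters, and the sample values are assumptions made for this example.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test_means(a, b, alternative="two-sided", n_perm=2000, seed=0):
    """Approximate permutation p-value for a difference in means.

    alternative="two-sided": any difference between the means counts.
    alternative="greater":   only mean(a) > mean(b) counts (one-sided).
    """
    if alternative not in ("two-sided", "greater"):
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    rng = random.Random(seed)
    observed = _mean(a) - _mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffling them gives another equally likely data set.
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_a]) - _mean(pooled[n_a:])
        if alternative == "two-sided":
            hits += abs(diff) >= abs(observed)
        else:
            hits += diff >= observed
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical measurements from two populations A and B.
a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
b = [4.8, 4.7, 5.0, 5.2, 4.6, 4.9]
print(permutation_test_means(a, b, alternative="two-sided"))
print(permutation_test_means(a, b, alternative="greater"))
```

When the observed difference points in the hypothesized direction, the one-sided p-value is typically smaller than the two-sided one, which is why the direction has to be fixed before looking at the data.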

Two-sample testing does not mean that only two random variables can be used for inference (as is sometimes incorrectly assumed). Instead, the name reflects that the problem involves data from two populations, in contrast to one-sample hypothesis testing, in which samples from a single population are used to make a decision.[1]

Relevant tests

Statistical tests that may apply for two-sample testing include:

Mathematical definition

Given samples $X_1, \dots, X_n$ taken independently and identically distributed (iid) from a population $P$, and $Y_1, \dots, Y_m$ taken iid from a population $Q$, the goal of a two-sample test is to determine which of the two hypotheses is correct:

$$H_0 : P = Q \qquad \text{versus} \qquad H_1 : P \neq Q.$$

In many instances of two-sample hypothesis testing, the only information that we have about the populations comes from the samples, meaning that any test needs to be non-parametric.
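As one possible illustration of a non-parametric test of this kind, the sketch below computes the two-sample Kolmogorov–Smirnov statistic (the largest gap between the two empirical distribution functions) and calibrates it by permutation. It tests the null hypothesis that both samples come from the same population without assuming any distributional form. The function names, parameters, and data are assumptions made for this example, and many other statistics could be substituted.

```python
import random
from bisect import bisect_right


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    # The supremum of the gap is attained at one of the observed points.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def ks_permutation_test(x, y, n_perm=1000, seed=0):
    """Return (statistic, approximate p-value) using label permutations."""
    rng = random.Random(seed)
    observed = ks_statistic(x, y)
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: two samples drawn with different spreads.
gen = random.Random(1)
x = [gen.gauss(0.0, 1.0) for _ in range(40)]
y = [gen.gauss(0.0, 3.0) for _ in range(40)]
print(ks_permutation_test(x, y))
```

Because the permutation step only relies on the samples being exchangeable under the null hypothesis, the same calibration works for other statistics as well.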


Optimal error exponent

When considering tests with a fixed (or vanishing) false positive rate, the best achievable exponent for the false negative rate is the log of the Bhattacharyya distance[4] or, equivalently, can be expressed with the Rényi divergence.[3] The first test known to achieve this optimal error exponent was based on maximum mean discrepancy.[5]
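To give a sense of what a maximum mean discrepancy (MMD) statistic looks like, here is a minimal sketch of the standard biased estimate of squared MMD for one-dimensional samples with a Gaussian kernel. It is not the test from the cited paper: the kernel choice, the `bandwidth` parameter, and the function names are assumptions made for this example, and the threshold needed to turn the statistic into a test with a given false positive rate is not shown.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two real numbers."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""

    def avg_kernel(u, v):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in u for b in v)
        return total / (len(u) * len(v))

    # Within-sample similarity minus twice the cross-sample similarity.
    return avg_kernel(x, x) + avg_kernel(y, y) - 2.0 * avg_kernel(x, y)


# Hypothetical samples: y is shifted relative to x.
x = [0.1, -0.4, 0.3, 0.0, 0.7, -0.2]
y = [1.9, 2.4, 2.1, 1.6, 2.2, 2.0]
print(mmd_squared(x, x))  # essentially zero: identical samples
print(mmd_squared(x, y))  # clearly positive: samples differ
```

A test would reject the hypothesis of equal populations when this statistic exceeds a threshold; the threshold could, for instance, be calibrated by permuting the pooled samples.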

References

  1. ^ Berberyan, Toros; Nguyen, Tracy; Swan, Alfie (Jan 8, 2024). “4.2: Two-Sample t-Test”. Statistics LibreTexts. LibreTexts Statistics. Retrieved 14 June 2026.
  2. ^ Lopez-Paz, David; Oquab, Maxime (2016). “Revisiting Classifier Two-Sample Tests”. arXiv:1610.06545 [stat.ML].
  3. ^ a b Grootveld, Arick; Chen, Biao; Gandikota, Venkata (2026-06-10). “Asymptotically Optimal Tests for One- and Two-Sample Problems”. arXiv:2601.11727 [cs.IT].
  4. ^ Harsha, K V; Ravi, Jithin; Koch, Tobias (2026-01-14). “Second-Order Asymptotics of Two-Sample Tests”. arXiv:2601.09196 [cs.IT].
  5. ^ Zhu, Shengyu; Chen, Biao; Chen, Zhitang; Yang, Pengfei (April 2021). “Asymptotically Optimal One- and Two-Sample Testing With Kernels”. IEEE Transactions on Information Theory. 67 (4): 2074–2092. arXiv:1908.10037. Bibcode:2021ITIT…67.2074Z. doi:10.1109/TIT.2021.3059267. ISSN 1557-9654.