ElyxAI

HYPGEOM.DIST: The Complete Excel Guide to Hypergeometric Distribution Analysis

Advanced
=HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)

The HYPGEOM.DIST function is a powerful statistical tool in Excel that calculates probabilities based on the hypergeometric distribution. This distribution models scenarios where you're sampling without replacement from a finite population containing two types of items—successes and failures. Unlike the binomial distribution, which assumes replacement, the hypergeometric distribution accounts for the changing probability as items are removed from the population. This advanced statistical function is essential for quality control analysts, researchers, and data scientists who need to determine the probability of obtaining a specific number of successes in a sample drawn from a finite population. Common applications include lottery probability calculations, defect detection in manufacturing batches, and survey sampling analysis. Understanding HYPGEOM.DIST enables you to make data-driven decisions in scenarios where population size is limited and sampling without replacement is the reality. Whether you're conducting statistical analysis for academic research, performing quality assurance testing, or analyzing survey results, HYPGEOM.DIST provides the mathematical foundation for accurate probability calculations in real-world sampling situations.

Syntax & Parameters

The HYPGEOM.DIST function requires five parameters that work together to calculate hypergeometric probabilities. The first parameter, sample_s, represents the number of successes you observe in your sample—this is the specific outcome you're analyzing. The second parameter, number_sample, defines the total size of your sample, which must be less than or equal to the population size. The third parameter, population_s, indicates how many successes exist in the entire population before sampling begins. The fourth parameter, number_pop, specifies the total population size from which you're sampling. These four parameters define your sampling scenario completely. The fifth parameter, cumulative, is a logical value that determines whether you want the probability mass function (FALSE) or cumulative distribution function (TRUE). When cumulative is FALSE, the function returns the probability of getting exactly sample_s successes. When cumulative is TRUE, it returns the probability of getting at most sample_s successes. All parameters must contain non-negative integers, and sample_s cannot exceed either number_sample or population_s. The population_s cannot exceed number_pop. Understanding these constraints prevents errors and ensures accurate calculations for your statistical analysis.

sample_s
Successes in sample
number_sample
Sample size
population_s
Successes in population
number_pop
Population size
cumulative
TRUE for cumulative, FALSE for mass

Practical Examples

Quality Control: Defective Items in Manufacturing Batch

=HYPGEOM.DIST(2, 20, 10, 100, FALSE)

This formula calculates the exact probability of finding precisely 2 defective components in your sample of 20, given that 10 out of 100 total components are defective. The FALSE parameter ensures we get the probability mass function rather than cumulative probability.

Lottery Verification: Matching Numbers

=HYPGEOM.DIST(4, 6, 6, 49, FALSE)

This formula determines the probability of matching exactly 4 numbers when 6 are drawn from a pool of 49, given that the player selected 6 numbers. This demonstrates how hypergeometric distribution applies to lottery and gambling probability scenarios.

Survey Sampling: Customer Satisfaction Analysis

=HYPGEOM.DIST(30, 50, 350, 500, TRUE)

This formula calculates the cumulative probability of getting 30 or fewer satisfied customers in a random sample of 50 from a population of 500 where 350 are satisfied. The TRUE parameter gives cumulative probability, useful for determining confidence in satisfaction levels.

Key Takeaways

  • HYPGEOM.DIST calculates probabilities for sampling without replacement from finite populations, making it ideal for quality control, lottery analysis, and survey sampling
  • Five parameters must satisfy strict mathematical constraints: sample_s ≤ number_sample, population_s ≤ number_pop, and all values must be non-negative integers
  • The cumulative parameter fundamentally changes results: FALSE gives exact probability while TRUE gives cumulative probability, requiring careful selection based on your analytical question
  • HYPGEOM.DIST differs from BINOM.DIST primarily in assuming finite populations and sampling without replacement, making it more accurate for real-world scenarios with limited populations
  • Combining HYPGEOM.DIST with other functions like IF, SUMPRODUCT, or data tables enables sophisticated statistical analysis, sensitivity testing, and decision-support applications

Pro Tips

Always validate your parameters before using HYPGEOM.DIST by creating a small verification table. List sample_s, number_sample, population_s, and number_pop in separate cells with formulas checking: sample_s ≤ number_sample, population_s ≤ number_pop, and all values ≥ 0.

Impact : Prevents #NUM! errors and saves debugging time. Catches data entry mistakes immediately rather than producing incorrect results.

Use named ranges for your parameters to make formulas more readable and maintainable. Instead of =HYPGEOM.DIST(A1, B1, C1, D1, FALSE), use =HYPGEOM.DIST(Successes_Sample, Sample_Size, Successes_Pop, Population_Size, FALSE).

Impact : Improves formula clarity, reduces errors when sharing workbooks, and makes it easier to update parameters across multiple formulas.

For cumulative probability ranges, subtract two HYPGEOM.DIST results with TRUE: =HYPGEOM.DIST(upper, n_sample, pop_s, n_pop, TRUE) - HYPGEOM.DIST(lower-1, n_sample, pop_s, n_pop, TRUE) to get P(lower ≤ X ≤ upper).

Impact : Enables sophisticated probability analysis without needing to calculate individual probabilities. Essential for confidence intervals and hypothesis testing.

Create a sensitivity table varying sample_s from 0 to number_sample to visualize the entire probability distribution. This reveals the most likely outcomes and helps identify unusual results.

Impact : Provides visual understanding of probability distribution shape, helps identify outliers, and supports presentation of statistical findings to non-technical stakeholders.

Useful Combinations

Confidence Level Analysis with IF and HYPGEOM.DIST

=IF(HYPGEOM.DIST(2, 20, 10, 100, TRUE)>0.95, "High Confidence", "Low Confidence")

Combines HYPGEOM.DIST with IF to create decision logic based on probability thresholds. This formula evaluates whether the cumulative probability exceeds 95%, returning a confidence assessment. Useful for quality control decisions.

Sensitivity Analysis with HYPGEOM.DIST and Data Table

=HYPGEOM.DIST($A$1, $B$1, C2, $D$1, FALSE) across multiple population_s values

Uses absolute references ($) for fixed parameters while varying population_s to see how probability changes. Create a data table with different population values to understand sensitivity. Essential for scenario planning and risk assessment.

Probability Range Calculation with HYPGEOM.DIST

=HYPGEOM.DIST(5, 20, 10, 100, TRUE) - HYPGEOM.DIST(2, 20, 10, 100, TRUE)

Calculates the probability of getting between 3 and 5 successes by subtracting two cumulative probabilities. Useful for determining probability ranges rather than exact values or cumulative totals.

Common Errors

#NUM!

Cause: Parameters violate mathematical constraints: sample_s > number_sample, population_s > number_pop, or sample_s > population_s. For example: =HYPGEOM.DIST(25, 20, 10, 100, FALSE) where sample_s exceeds number_sample.

Solution: Verify that sample_s ≤ number_sample, population_s ≤ number_pop, and all values are non-negative integers. Check your data range references for incorrect cell selections.

#VALUE!

Cause: Non-numeric values passed to the function or cumulative parameter is not TRUE/FALSE. Example: =HYPGEOM.DIST(2, 20, 10, 100, "yes") or =HYPGEOM.DIST("two", 20, 10, 100, FALSE).

Solution: Ensure all parameters are numeric values. For the cumulative parameter, use only TRUE or FALSE (or 1 and 0). Convert text numbers to actual numbers using VALUE() function if needed.

#REF!

Cause: Cell references in the formula point to deleted columns or rows. Example: =HYPGEOM.DIST(A2, B2, C2, D2, FALSE) where column C was deleted, breaking the reference.

Solution: Verify all cell references are valid and point to existing data. Use absolute references ($A$2) for fixed parameters to prevent reference breaking when copying formulas.

Troubleshooting Checklist

  • 1.Verify all parameters are integers: use INT() function to convert decimals if necessary
  • 2.Confirm sample_s ≤ number_sample and sample_s ≤ population_s (no value can exceed its parent set)
  • 3.Check that population_s ≤ number_pop (successes cannot exceed total population)
  • 4.Ensure cumulative parameter is exactly TRUE or FALSE (not text strings like "true" or 1/0 unless using numeric equivalents)
  • 5.Validate cell references are not broken: use Ctrl+` to show formulas and verify ranges exist
  • 6.Test with simple known values first: =HYPGEOM.DIST(1, 2, 1, 2, FALSE) should return 0.5 to confirm function works correctly

Edge Cases

sample_s equals 0 (looking for probability of zero successes)

Behavior: HYPGEOM.DIST correctly calculates the probability of getting no successes in the sample. For example: =HYPGEOM.DIST(0, 20, 10, 100, FALSE) returns valid probability.

This is a valid use case. The function handles zero correctly without errors.

population_s equals number_pop (entire population is successes, no failures exist)

Behavior: HYPGEOM.DIST returns 1 if sample_s equals number_sample (you must get all successes), and 0 for any sample_s < number_sample (impossible to get fewer successes).

Solution: Verify your data; this represents a degenerate distribution where outcome is certain.

Mathematically correct but represents a scenario where sampling adds no uncertainty.

number_sample equals number_pop (sampling entire population)

Behavior: HYPGEOM.DIST returns 1 only if sample_s equals population_s (must get exactly the successes in population), otherwise returns 0.

Represents sampling without replacement where you examine every item; outcome is deterministic, not probabilistic.

Limitations

  • HYPGEOM.DIST requires all parameters to be integers; decimal values cause #NUM! errors. This limits its use with continuous data or averaged values that haven't been rounded.
  • The function cannot handle populations larger than 2^31-1 (approximately 2.1 billion) due to Excel's integer limitations, though practical applications rarely exceed this threshold.
  • HYPGEOM.DIST assumes a fixed, known population size and composition before sampling begins. It cannot adapt to changing population parameters or sequential sampling scenarios where composition changes during the process.
  • The function provides point probabilities or cumulative probabilities but doesn't directly calculate confidence intervals, p-values, or other inferential statistics commonly needed in hypothesis testing without additional formulas.

Alternatives

Simpler to use for large populations where sampling with replacement is a reasonable assumption. Requires fewer parameters and faster computation.

When: Use when population is very large (>1000) or when replacement occurs naturally in your process. Provides similar results to HYPGEOM.DIST for large populations.

Provides complete control over calculations and helps understand the mathematical foundation. Allows custom probability adjustments.

When: Use for educational purposes or when you need to customize the probability calculation. Formula: =COMBIN(population_s, sample_s)*COMBIN(number_pop-population_s, number_sample-sample_s)/COMBIN(number_pop, number_sample)

Models the number of failures before achieving a fixed number of successes. Different perspective on the same sampling scenario.

When: Use when you want to know how many trials are needed to achieve a target number of successes rather than the probability of successes in a fixed sample.

Compatibility

Excel

Since Excel 2010

=HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)

Google Sheets

=HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)

Fully compatible with identical syntax and parameters. Results are consistent with Excel versions.

LibreOffice

=HYPGEOM.DIST(sample_s, number_sample, population_s, number_pop, cumulative)

Frequently Asked Questions

Need help mastering hypergeometric distributions and other advanced Excel formulas? Discover how ElyxAI can accelerate your statistical analysis with intelligent formula suggestions and real-time error detection. Start optimizing your spreadsheet workflows today with ElyxAI's AI-powered Excel assistant.

Explore Statistical

Related Formulas