Chi-Square Test — Test Associations Between Categorical Variables

You have two categorical variables and a simple question: are they related, or independent? Does marketing channel affect whether customers convert? Does department predict employee satisfaction level? Does treatment type relate to patient outcome? The chi-square test of independence gives you a clear answer — a p-value, an effect size, and a visual map of exactly where the association lives. Upload a CSV and find out in under 60 seconds.

What Is the Chi-Square Test of Independence?

The chi-square test of independence answers one question: are two categorical variables associated, or are they statistically independent? "Independent" means that knowing the value of one variable tells you nothing about the other. If marketing channel and conversion status are independent, then the conversion rate is the same regardless of whether a customer arrived through paid search, social media, or email. If they are not independent, at least one channel converts at a meaningfully different rate.

The test works by building a contingency table — a grid that counts how many observations fall into each combination of categories. Suppose you survey 400 employees and ask about their department (Sales, Engineering, Support) and their satisfaction level (Satisfied, Neutral, Dissatisfied). The contingency table would have 3 rows and 3 columns, with 9 cells. Each cell holds a count — for example, 45 salespeople who said "Satisfied."
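Cross-tabulation is simple enough to sketch in a few lines. Here is a minimal, stdlib-only Python illustration (the helper name is hypothetical; the product itself runs R):

```python
from collections import Counter

def contingency_table(rows, cols):
    """Cross-tabulate two equal-length lists of category labels
    into a dict-of-dicts of counts (hypothetical helper)."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts.get((r, c), 0) for c in col_levels}
            for r in row_levels}

departments = ["Sales", "Sales", "Engineering", "Support", "Engineering"]
satisfaction = ["Satisfied", "Neutral", "Satisfied", "Satisfied", "Neutral"]
table = contingency_table(departments, satisfaction)
# table["Sales"]["Satisfied"] counts Sales employees who answered "Satisfied"
```

Each cell of the resulting grid is one combination of categories, exactly as in the 3x3 employee example above.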

The chi-square test then compares these observed counts to the counts you would expect if the two variables were completely independent. Expected counts are calculated from the row and column totals. If 60% of all employees are Satisfied and 30% work in Sales, then under independence you would expect 18% of the total (0.60 x 0.30) to be Satisfied salespeople. The test measures how far the observed counts deviate from these expected counts across every cell. Large deviations produce a large chi-square statistic and a small p-value, meaning the association is unlikely to be random noise.
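The expected-count logic above (0.60 x 0.30 x 400 = 72 expected Satisfied salespeople) and the statistic itself can be written out directly. A stdlib-only sketch of the math, not the product's actual R code:

```python
def expected_counts(observed):
    """Expected cell counts under independence, from row/column totals.
    `observed` is a list of rows, each a list of cell counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Toy 2x2 table: 40% of people are in row 1 and 50% of answers in column 1,
# so under independence cell (1,1) expects 0.40 * 0.50 * 100 = 20 people.
observed = [[30, 10], [20, 40]]
stat = chi_square_statistic(observed)  # ~16.67: a large deviation
```

The larger the total deviation, the less plausible it is that the pattern arose by chance.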

When to Use the Chi-Square Test

Use the chi-square test whenever both of your variables are categorical and you want to know if they are related. The most common business scenarios include:

Survey analysis. Does satisfaction level differ by department? Does product preference vary by age group? Does the distribution of responses to one question depend on responses to another? Surveys produce categorical data by nature, making them a natural fit for chi-square testing.

Marketing and A/B testing. Does marketing channel affect whether someone converts (yes/no)? Does ad creative relate to click outcome? When your outcome is categorical — converted or did not, clicked or did not, churned or stayed — chi-square is the right test. It handles any number of categories on either side: three channels versus two outcomes, five ad variants versus three engagement levels.

Healthcare and clinical research. Is treatment type associated with patient outcome (recovered, improved, no change)? Does a risk factor relate to disease status? Chi-square tests are standard in epidemiology and clinical trials for testing associations between categorical exposures and categorical outcomes.

Retail and operations. Does product category relate to return rate (returned vs. kept)? Does store location affect which payment method customers choose? Does day of week influence whether an order ships on time? Any time you cross-tabulate two categorical dimensions and wonder if the pattern is real, chi-square gives you the answer.

A key advantage over running separate proportion tests: chi-square handles any number of categories on both variables in a single test, keeping your false positive rate controlled. If you have five departments and four satisfaction levels, that is 20 cells — running individual tests for each would inflate your error rate. Chi-square tests the whole table at once.

What Data Do You Need?

You need a CSV with at least two categorical columns. When you upload, you will map your columns to the analysis: one variable becomes the row variable and the other becomes the column variable. You also select a primary target — the variable whose distribution you are most interested in explaining. For example, if you want to know whether department predicts satisfaction, satisfaction is the target and department is the grouping variable.

Both variables must be categorical (text labels or discrete categories, not continuous numbers). Common examples: department names, satisfaction ratings (High/Medium/Low), yes/no flags, product categories, regions, survey response options. If you have a numeric variable like revenue, you will need to bin it into categories first (e.g., Low/Medium/High revenue bands).
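Binning a numeric column into bands is a one-step transformation. A sketch with purely illustrative cut points (choose thresholds that make sense for your data):

```python
def revenue_band(value, low=1_000, high=10_000):
    """Map a numeric revenue figure to a Low/Medium/High label.
    The cut points here are illustrative, not recommendations."""
    if value < low:
        return "Low"
    if value < high:
        return "Medium"
    return "High"

bands = [revenue_band(v) for v in [250, 4_000, 50_000]]
```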

For reliable results, the chi-square approximation requires that most expected cell frequencies are at least 5. This is not about observed counts — it is about what the counts would be under independence. With very small samples or rare categories, some expected counts may fall below 5, which makes the chi-square approximation unreliable. The report checks this automatically and warns you if the assumption is violated. As a rough guideline, aim for at least 100 total observations spread reasonably across categories. The module also accepts a min_group_size parameter to filter out categories with too few observations before running the test.
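The validity check described above is mechanical: count how many expected cells clear the threshold. A minimal sketch of the "80% of cells at least 5" rule of thumb (function name is hypothetical, not the module's API):

```python
def expected_frequencies_ok(expected, threshold=5, min_fraction=0.8):
    """Rule of thumb behind the chi-square approximation: at least
    `min_fraction` of expected cell counts should reach `threshold`."""
    cells = [e for row in expected for e in row]
    return sum(e >= threshold for e in cells) / len(cells) >= min_fraction
```

If this check fails, the report's warning is your cue to collect more data, merge sparse categories, or switch to an exact test.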

The module runs every categorical variable against the primary target by default. If your dataset has five categorical columns and you set one as the target, you get chi-square tests for all four remaining variables against the target — no need to run the analysis multiple times.

How to Read the Report

The report contains eight sections, each building on the last to give you a complete picture of the association between your variables.

Analysis Overview. A summary of the dataset: how many rows survived preprocessing, which variables were tested, and the significance level used (default 0.05). This is your orientation — confirm that the right columns were analyzed and that the sample size is adequate.

Executive Summary (TL;DR). A plain-language summary of the key findings generated by the AI. Which variable pairs showed significant associations? How strong were they? This section is designed to be shared directly with stakeholders who do not need the statistical details.

Chi-Square Test Results. The core statistical output. For each variable pair tested, you see the chi-square statistic, degrees of freedom, and p-value. The chi-square statistic measures the total squared deviation between observed and expected counts, scaled by expected counts. Degrees of freedom depend on the table dimensions: (rows - 1) x (columns - 1). The p-value is the probability of seeing a chi-square statistic this large if the variables were truly independent. A p-value below 0.05 means you can reject independence — the variables are associated.
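The p-value is the upper tail of the chi-square distribution evaluated at the observed statistic. For the curious, here is a stdlib-only Python sketch using the series expansion of the regularized incomplete gamma function (the actual report computes this in R via chisq.test()):

```python
import math

def chi2_sf(x, df, terms=200):
    """Survival function P(X >= x) for a chi-square distribution with
    df degrees of freedom (stdlib-only numerical sketch)."""
    if x <= 0:
        return 1.0
    s, half_x = df / 2.0, x / 2.0
    term = 1.0 / s          # k = 0 term of sum x^k / (s (s+1) ... (s+k))
    total = term
    for k in range(1, terms):
        term *= half_x / (s + k)
        total += term
    lower = total * math.exp(s * math.log(half_x) - half_x - math.lgamma(s))
    return max(0.0, 1.0 - lower)

def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1) * (columns - 1), as in the report."""
    return (n_rows - 1) * (n_cols - 1)

# A 3x3 department-by-satisfaction table has (3 - 1) * (3 - 1) = 4 df.
```

For example, a statistic of 3.84 with 1 degree of freedom sits right at the conventional p = 0.05 boundary.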

Effect Sizes. Statistical significance alone does not tell you how strong the association is. A large sample can produce a tiny p-value for a trivially small association. Cramer's V solves this — it ranges from 0 (no association) to 1 (perfect association). As a rough guide: V below 0.1 is negligible, 0.1-0.3 is small to medium, 0.3-0.5 is medium to large, and above 0.5 is very strong. The report also shows the phi coefficient for 2x2 tables, which equals Cramer's V in absolute value when both variables have exactly two levels.
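The formula behind Cramer's V is compact enough to show directly. A stdlib-only Python sketch (the report itself computes this in R via the lsr package):

```python
import math

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

# For a 2x2 table min(r, c) - 1 = 1, so V reduces to phi = sqrt(chi2 / n).
v = cramers_v(16.667, 100, 2, 2)  # ~0.41, a medium-to-large association
```

Note how the sample size n sits in the denominator: that is what makes V comparable across datasets of different sizes, unlike the raw chi-square statistic.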

Contingency Table Heatmap. A color-coded grid of observed counts. Darker cells mean higher counts. This visualization immediately shows you where the data concentrates — which combinations of categories are most and least common. It is the visual version of the raw contingency table.

Standardized Residuals. This is where the real insight lives. Standardized residuals tell you which specific cells drive the association. A residual above +2 means that cell has significantly more observations than expected under independence. A residual below -2 means significantly fewer. For example, if the residual for "Engineering + Dissatisfied" is -2.8, then engineers are dissatisfied at a significantly lower rate than you would expect if department and satisfaction were unrelated. The heatmap makes these stand out visually — look for the darkest red and blue cells.
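The +/-2 rule of thumb usually refers to adjusted standardized residuals (what R's chisq.test() returns as $stdres); whether the report uses exactly this variant is an assumption here. A stdlib-only sketch of the formula:

```python
import math

def adjusted_residuals(observed, expected):
    """Adjusted standardized residuals:
    (O - E) / sqrt(E * (1 - row proportion) * (1 - column proportion)).
    Cells beyond +/-2 deviate notably from independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [(observed[i][j] - expected[i][j])
         / math.sqrt(expected[i][j]
                     * (1 - row_totals[i] / n)
                     * (1 - col_totals[j] / n))
         for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]

# Toy 2x2 table where one cell sits far above its expected count.
observed = [[30, 10], [20, 40]]
expected = [[20.0, 20.0], [30.0, 30.0]]
residuals = adjusted_residuals(observed, expected)  # ~ +/-4.08 in every cell
```

In a 2x2 table all four residuals share the same magnitude; in larger tables they pinpoint exactly which cells carry the association.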

Group Distribution (Grouped Bar Chart). A bar chart showing the distribution of the target variable within each level of the grouping variable. This is the most intuitive visualization — you can see at a glance whether, for example, the proportion of satisfied employees differs across departments. Side-by-side bars make the comparison easy without needing to interpret numbers.

Observed vs. Expected (Contingency Details). The full contingency table with both observed counts and expected counts side by side. This is the data behind the chi-square calculation. Cells where observed counts diverge sharply from expected counts are the ones contributing most to the chi-square statistic. This table lets you verify the test by hand if needed.

When to Use Something Else

If your sample is very small — particularly if any expected cell frequency is below 5 — consider Fisher's exact test. It computes the exact probability rather than relying on the chi-square approximation, making it valid for small samples. Fisher's test is most commonly used for 2x2 tables but can handle larger tables with specialized algorithms. The chi-square report flags when expected frequencies are too low, so you will know when to switch.
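For a 2x2 table, Fisher's exact test is just hypergeometric arithmetic: fix the margins, enumerate every possible table, and sum the probabilities of those no more likely than the one observed. A stdlib-only sketch of the two-sided version (in practice you would use R's fisher.test() or an equivalent):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Sum over all tables at most as likely as the observed one
    # (small epsilon guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs + 1e-12)

p = fisher_exact_2x2(3, 1, 1, 3)  # Fisher's classic tea-tasting table
```

Because it enumerates tables exactly, no expected-frequency assumption is needed — which is precisely why it is the fallback when chi-square's approximation breaks down.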

If your outcome variable is numeric rather than categorical — for example, comparing revenue across departments rather than satisfaction categories — use ANOVA (three or more groups) or a t-test (two groups). These tests compare means of continuous variables, which chi-square cannot do.

If you want to go beyond testing association and actually predict which category a new observation will fall into, consider logistic regression or Naive Bayes classification. Chi-square tells you whether an association exists; classification models tell you the direction and let you make predictions on new data.

If your categorical variables are ordinal — that is, they have a natural order like "Low < Medium < High" — the standard chi-square test ignores that ordering. A Cochran-Armitage trend test or ordinal regression would be more appropriate because they incorporate the ordering and are more powerful at detecting linear trends. However, for a quick check of whether any association exists, chi-square still works — it is just not optimized for ordered categories.

If you have matched or paired data — for example, the same patients assessed before and after treatment — use McNemar's test instead. Chi-square assumes independent observations, and paired data violates that assumption.

The R Code Behind the Analysis

Every report includes the exact R code used to produce the results — reproducible, auditable, and citable. This is not AI-generated code that changes every run. The same data produces the same analysis every time.

The analysis uses chisq.test() from base R for the Pearson chi-square statistic, p-value, expected frequencies, and standardized residuals. Effect sizes are computed with cramersV() from the lsr package and cross-checked via the vcd package. The contingency table is built with table() and xtabs() — standard R functions used in every statistics textbook. Expected frequency validation (the "80% of cells above 5" rule) is checked programmatically before reporting results, and warnings are included in the output if the assumption fails. Every step is visible in the code tab of your report, so you or a statistician can verify exactly what was done.