Fundamentals · Beginner · 25 min

How to Interpret Regression Output: Coefficients, R-Squared, p-Values, and What They Mean

A practical guide to reading regression output from any statistical software — covering what each number means, how to interpret coefficients (slope and intercept), R² and adjusted R², the F-test, coefficient p-values, confidence intervals, and the mistakes students make when interpreting results on exams.

What You'll Learn

  • Read and interpret a standard regression output table from any statistical software
  • Explain what each coefficient means in context (units, direction, magnitude)
  • Distinguish between R², adjusted R², and their appropriate uses
  • Determine whether the overall model and individual predictors are statistically significant

1. The Direct Answer: The Coefficient Tells You the Effect, R² Tells You the Fit, p Tells You the Evidence

Regression output has three essential components:

  • Coefficients — the slope values that tell you how much Y changes for a one-unit increase in X, holding other variables constant.
  • R-squared — the proportion of variance in Y explained by the model (0 to 1; higher = better fit).
  • p-values — the probability of seeing results this extreme if the true effect were zero (lower = stronger evidence).

Example output from a simple regression of exam score on hours studied:

Variable | Coefficient | Std Error | t-stat | p-value
Intercept | 52.3 | 4.1 | 12.76 | <0.001
Hours Studied | 4.8 | 0.9 | 5.33 | <0.001

R² = 0.58 | Adjusted R² = 0.56 | F(1, 20) = 28.4, p < 0.001

Interpretation: each additional hour of studying is associated with a 4.8-point increase in exam score. A student who studies zero hours is predicted to score 52.3 (the intercept). The model explains 58% of the variance in exam scores. Both the overall model (F-test, p < 0.001) and the hours variable (t-test, p < 0.001) are statistically significant. Snap a photo of any regression output and StatsIQ walks through each number — coefficient interpretation, significance assessment, and the practical meaning of the results.
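Every number in a table like this can be reproduced from first principles. The sketch below fits a simple regression on simulated hours/score data — the data is made up for illustration (n = 22, so the residual df matches the table's F(1, 20)), so it will not reproduce the exact table values, but it shows where each number comes from:

```python
import numpy as np

# Simulated data (an assumption, not the article's actual sample):
# 22 students, true relationship score = 52.3 + 4.8 * hours + noise
hours = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
                  7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12], dtype=float)
rng = np.random.default_rng(0)
score = 52.3 + 4.8 * hours + rng.normal(0, 8, size=hours.size)

n = hours.size
x_bar, y_bar = hours.mean(), score.mean()
Sxx = np.sum((hours - x_bar) ** 2)
slope = np.sum((hours - x_bar) * (score - y_bar)) / Sxx  # points per hour studied
intercept = y_bar - slope * x_bar                        # predicted score at 0 hours

resid = score - (intercept + slope * hours)
sse = np.sum(resid ** 2)                                 # unexplained variation
sst = np.sum((score - y_bar) ** 2)                       # total variation in Y
r_squared = 1 - sse / sst                                # proportion explained
se_slope = np.sqrt(sse / (n - 2)) / np.sqrt(Sxx)         # standard error of slope
t_stat = slope / se_slope                                # tests H0: true slope = 0

print(f"intercept={intercept:.1f} slope={slope:.2f} "
      f"R2={r_squared:.2f} t={t_stat:.2f}")
```

With simulated noise the estimates land near, not exactly on, the true values — which is exactly what the standard error quantifies.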

Key Points

  • Coefficient = the slope: how much Y changes per one-unit increase in X, holding other variables constant
  • R² = proportion of variance explained. 0.58 means the model accounts for 58% of the variation in Y.
  • p-value on a coefficient: probability of seeing this result if the true coefficient were zero. Low = significant.
  • F-test: tests whether the OVERALL model is significant (at least one predictor matters). Coefficient p-values test INDIVIDUAL predictors.

2. Interpreting Coefficients: What the Numbers Actually Mean

The coefficient (also called the slope, estimate, or beta) tells you the predicted change in Y for a one-unit change in X. The units matter — they come from the units of X and Y. In our example, the coefficient of Hours Studied is 4.8: for each additional hour of studying, the model predicts exam score increases by 4.8 points. The unit is "exam points per hour of studying." If a student studies 3 hours instead of 2, the predicted score increases by 4.8 points. If they study 5 hours instead of 2, the predicted increase is 3 × 4.8 = 14.4 points.

The intercept (52.3) is the predicted Y when ALL X variables equal zero. In our example: a student who studies zero hours is predicted to score 52.3. This may or may not be meaningful — if no one in the data studied zero hours, the intercept is an extrapolation beyond the data range. Do not over-interpret the intercept unless zero is a realistic value for the predictor.

Sign matters: a positive coefficient means Y increases as X increases (positive relationship); a negative coefficient means Y decreases as X increases (negative relationship). In a multiple regression, "salary = 35,000 + 2,100×(years experience) - 1,500×(commute minutes)" means each year of experience adds $2,100 to predicted salary, while each additional minute of commute is associated with $1,500 less salary (probably not causal — longer commutes may proxy for lower cost-of-living areas).

Magnitude comparison trap: in multiple regression, you CANNOT compare raw coefficient magnitudes across variables to determine which predictor is "more important." A coefficient of 2,100 on years experience and -1,500 on commute minutes does not mean experience matters more — the variables are in different units. To compare relative importance, use standardized coefficients (beta weights), which are in standard deviation units and are comparable across variables.
StatsIQ identifies the units, sign, and practical meaning of each coefficient — and flags when students incorrectly compare unstandardized coefficients across variables.

Key Points

  • Coefficient = predicted change in Y per one-unit change in X. Always state the units when interpreting.
  • Intercept = predicted Y when all X = 0. Only meaningful if X = 0 is within the data range.
  • Positive coefficient = positive relationship. Negative = inverse relationship.
  • NEVER compare raw coefficient sizes across variables in different units. Use standardized betas for relative importance.

3. R-Squared, Adjusted R-Squared, and the Overfitting Trap

R² (coefficient of determination) tells you what proportion of the variance in Y is explained by the model. R² = 0.58 means the model explains 58% of the variation in exam scores. The remaining 42% is unexplained — due to other factors not in the model (sleep, prior knowledge, test anxiety) or random variation.

R² never decreases when you add more predictors — and in practice it almost always increases, even for useless ones. Adding "shoe size" to the exam score model will increase R² slightly because of random correlation in the sample, even though shoe size has no real relationship with exam performance. This is the overfitting trap: you can push R² toward 1.0 by adding enough predictors to any dataset, but the model will be useless for prediction because it has memorized the noise in your specific sample.

Adjusted R² solves this by penalizing for additional predictors. The formula adjusts R² downward based on the number of predictors relative to the sample size. If a new predictor does not improve the model enough to justify its inclusion, adjusted R² decreases even as R² increases. Rule: use adjusted R² when comparing models with different numbers of predictors; use R² when you just want to describe how well the model fits.

What counts as a "good" R²? It depends entirely on the field. In physics or engineering, R² > 0.95 is expected. In social sciences and education, R² of 0.30-0.60 is good. In economics and finance, R² of 0.10-0.30 is common for cross-sectional data. An R² of 0.20 in a stock return model might be excellent; an R² of 0.20 in a bridge load model would be terrifying. Context determines what is good.

The mistake students make: treating R² as a measure of model correctness. A model can have R² = 0.90 and still be wrong — the relationship might be nonlinear and your linear model is missing the curve, or the model might have omitted variable bias (a lurking variable explains both X and Y). R² tells you how well the model fits the data, not whether the model is right.
StatsIQ reports both R² and adjusted R² and explains what they mean for your specific model — including whether the R² is strong or weak for the field and whether adding predictors actually improved the model.
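A quick simulation illustrates the overfitting trap: add a pure-noise predictor ("shoe size") to a simulated exam-score model. R² can only go up, while adjusted R² — computed as 1 − (1 − R²)(n − 1)/(n − k − 1) — will often go down. The data is made up for illustration:

```python
import numpy as np

def fit_r2(X, y):
    """Return (R2, adjusted R2) for an OLS fit with intercept.

    X: (n, k) matrix of predictors; k = number of predictors (no intercept column).
    """
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    b = np.linalg.lstsq(Xd, y, rcond=None)[0]
    resid = y - Xd @ b
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - np.sum(resid ** 2) / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # penalty grows with k
    return r2, adj

# Simulated exam-score data (assumption for illustration only)
rng = np.random.default_rng(2)
n = 30
hours = rng.uniform(0, 10, n)
score = 52.3 + 4.8 * hours + rng.normal(0, 8, n)
shoe_size = rng.normal(size=n)               # unrelated "predictor"

r2_a, adj_a = fit_r2(hours[:, None], score)
r2_b, adj_b = fit_r2(np.column_stack([hours, shoe_size]), score)

# R² never decreases when a predictor is added; adjusted R² may drop
print(f"hours only:        R2={r2_a:.3f}  adj={adj_a:.3f}")
print(f"hours + shoe size: R2={r2_b:.3f}  adj={adj_b:.3f}")
```

Whether adjusted R² actually drops depends on the random draw — a noise predictor soaks up a little sample variance by chance — but the guaranteed part is that R² itself cannot go down.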

Key Points

  • R² = proportion of variance explained. 0.58 = model explains 58% of variation in Y.
  • R² NEVER decreases with more predictors — even useless ones nudge it up. This is why adjusted R² exists.
  • Adjusted R² penalizes for extra predictors. Use it to compare models with different numbers of variables.
  • A "good" R² depends on the field: 0.95 in physics, 0.30 in social science, 0.10 in finance. Context matters.

4. The F-Test, Coefficient p-Values, and Making Decisions

The regression output gives you two types of significance tests, and students constantly confuse them.

The F-test (bottom of the output) tests the OVERALL model: "Is at least one predictor significantly related to Y?" The null hypothesis is that ALL coefficients equal zero (the model has no explanatory power). A significant F-test (p < 0.05) means the model as a whole explains a statistically significant amount of variance. If the F-test is not significant, the model is useless — none of the predictors contribute meaningful explanatory power.

Coefficient p-values (in the table, one per predictor) test INDIVIDUAL predictors: "Is this specific variable significantly related to Y, after controlling for all other predictors in the model?" A significant coefficient p-value (p < 0.05) means that particular predictor adds explanatory power beyond the other variables.

Crucial distinction: the F-test can be significant while individual coefficient p-values are not. This happens in multiple regression when predictors are correlated with each other (multicollinearity). Together, the predictors explain Y significantly, but individually, no single predictor is significant because their effects overlap. This is one of the most confusing results for students — the model works but no single variable is "the" important one.

Confidence intervals complement p-values. A 95% confidence interval for the hours studied coefficient (4.8) might be [2.9, 6.7]. Interpretation: we are 95% confident that the true effect of hours studied on exam score is between 2.9 and 6.7 points per hour. If the interval does not contain zero, the coefficient is significant at α = 0.05 (equivalent to p < 0.05). Confidence intervals are more informative than p-values because they tell you the RANGE of plausible effect sizes, not just whether the effect is non-zero.

The practical workflow:

1. Check the F-test — is the overall model significant? If not, stop.
2. Check individual coefficient p-values — which predictors are significant?
3. Look at the confidence intervals — how precise are the estimates?
4. Look at R² — how much variance does the model explain?
5. Interpret the significant coefficients in context — what do they mean practically?

StatsIQ walks through this entire workflow for any regression output — snap a photo of the output table and it identifies the F-test result, flags significant and non-significant predictors, calculates confidence intervals, and writes the interpretation in plain language.
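A [2.9, 6.7]-style interval can be recomputed directly from the table values (coefficient 4.8, standard error 0.9, residual df = 20 from F(1, 20)) using the standard formula coefficient ± t(0.025, df) × SE:

```python
from scipy.stats import t

# Values read off the output table in section 1
coef, se, df = 4.8, 0.9, 20

t_crit = t.ppf(0.975, df)              # two-sided 95% critical value, ~2.086
lo, hi = coef - t_crit * se, coef + t_crit * se

print(f"95% CI: [{lo:.1f}, {hi:.1f}]")  # -> [2.9, 6.7]; excludes 0, so p < 0.05
```

Because the interval excludes zero, it agrees with the table's p < 0.001 — but it additionally tells you the effect is plausibly anywhere from about 3 to about 7 points per hour.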

Key Points

  • F-test: tests the OVERALL model (all predictors jointly). Significant F = model has explanatory power.
  • Coefficient p-values: test INDIVIDUAL predictors. Significant p = this variable matters after controlling for others.
  • F-test significant but individual p-values not? → Multicollinearity. Predictors are correlated with each other.
  • Confidence intervals > p-values for interpretation. They show the range of plausible effect sizes, not just yes/no significance.

Key Takeaways

  • Coefficient = change in Y per one-unit change in X. Always state units. Never compare raw coefficients across different variables.
  • R² never decreases with more predictors. Adjusted R² penalizes for extra predictors. Use adjusted R² to compare models.
  • F-test = overall model significance. Coefficient p-value = individual predictor significance. These are different tests.
  • A "good" R² depends on the field: 0.95 in physics, 0.30 in social science, 0.10 in finance.
  • Significant F-test + non-significant individual predictors = multicollinearity (predictors correlated with each other).

Practice Questions

1. A regression of salary ($) on years of experience gives: Intercept = 35,000 (p < 0.001), Years = 3,200 (p < 0.001), R² = 0.45, F(1,98) = 80.2 (p < 0.001). Interpret the coefficient, R², and overall model significance.
Each additional year of experience is associated with a $3,200 increase in predicted salary. A person with zero years of experience has a predicted salary of $35,000. The model explains 45% of the variance in salary. Both the individual coefficient (p < 0.001) and the overall model (F = 80.2, p < 0.001) are statistically significant. The remaining 55% of salary variance is explained by factors not in the model (education, industry, location, etc.).
2. Two regression models predicting GPA: Model A has R² = 0.52, Adjusted R² = 0.50, with 2 predictors. Model B has R² = 0.55, Adjusted R² = 0.48, with 6 predictors. Which model is better and why?
Model A is better. Although Model B has higher R² (0.55 vs 0.52), its adjusted R² (0.48) is LOWER than Model A's (0.50). The 4 additional predictors in Model B do not contribute enough explanatory power to justify their inclusion — they are overfitting the data. Adjusted R² correctly penalizes Model B for its extra predictors. Always prefer the simpler model when adjusted R² is equal or higher.
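This answer can be checked with the adjusted-R² formula, 1 − (1 − R²)(n − 1)/(n − k − 1). The question does not state the sample size, so n = 51 below is an assumption; the exact adjusted values depend on n, but the ranking of the two models does not change over reasonable sample sizes:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 51                               # assumed sample size (not given in the question)
adj_a = adjusted_r2(0.52, n, 2)      # Model A: 2 predictors
adj_b = adjusted_r2(0.55, n, 6)      # Model B: 6 predictors

print(f"Model A adjusted R2: {adj_a:.2f}")
print(f"Model B adjusted R2: {adj_b:.2f}")
assert adj_a > adj_b                 # the penalty reverses the raw-R2 ranking
```

Despite Model B's higher raw R², the heavier penalty for its six predictors leaves it with the lower adjusted R² — the formula makes the "prefer Model A" conclusion concrete.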


FAQs

Common questions about this topic

Does a significant coefficient mean X causes Y?

No. Regression shows association (correlation), not causation. A significant coefficient means X and Y are systematically related in the data after controlling for other variables. But the relationship could be driven by a confounding variable not in the model, or the direction of causation could be reversed. Causal claims require experimental design (randomized controlled trials), not regression analysis.

Can StatsIQ interpret a photo of my regression output?

Yes. Snap a photo of any regression output table and StatsIQ identifies each component — coefficients, R², adjusted R², F-test, coefficient p-values, and confidence intervals. It writes the interpretation in context, flags non-significant predictors, checks for multicollinearity indicators, and explains what the numbers mean for your specific research question.
