Simple Linear Regression
Fit a simple linear regression model to predict a response variable from a single predictor and interpret the slope, intercept, and R-squared.
Problem Scenario
A marketing analyst wants to understand the relationship between monthly advertising spend (in thousands of dollars) and monthly sales revenue (in thousands of dollars). She collects data from 6 months. Fit a simple linear regression line and interpret the results.
Given Data
- Ad Spend (x, $ thousands): 2, 4, 6, 8, 10, 12
- Sales (y, $ thousands): 15, 25, 32, 40, 48, 55
Requirements
- Calculate the slope (b_1) and intercept (b_0) of the regression line
- Calculate R-squared and interpret the model fit
- Use the model to predict sales when ad spend is $7,000
Solution
Step 1:
Calculate the necessary sums, with n = 6. Sum(x) = 42, Sum(y) = 215, Sum(xy) = 2(15) + 4(25) + 6(32) + 8(40) + 10(48) + 12(55) = 30 + 100 + 192 + 320 + 480 + 660 = 1782. Sum(x^2) = 4 + 16 + 36 + 64 + 100 + 144 = 364.
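As a quick check on the arithmetic, the sums can be reproduced in a few lines of Python. This is a minimal sketch; the variable names (ad_spend, sales, sum_xy, and so on) are illustrative and not part of the original problem.

```python
# Data from the problem scenario (both in thousands of dollars).
ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

n = len(ad_spend)
sum_x = sum(ad_spend)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(ad_spend, sales))  # sum of products x*y
sum_x2 = sum(x * x for x in ad_spend)                 # sum of squared x values

print(n, sum_x, sum_y, sum_xy, sum_x2)  # 6 42 215 1782 364
```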
Step 2:
Calculate the slope: b_1 = [n*Sum(xy) - Sum(x)*Sum(y)] / [n*Sum(x^2) - (Sum(x))^2] = [6(1782) - 42(215)] / [6(364) - 42^2] = [10692 - 9030] / [2184 - 1764] = 1662 / 420 = 3.957.
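The slope formula can be sketched in Python as follows. The function names slope_from_sums and slope_from_deviations are hypothetical helpers introduced here for illustration. The second one uses the algebraically equivalent deviation form, Sum((x - x-bar)(y - y-bar)) / Sum((x - x-bar)^2), as a cross-check.

```python
ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

def slope_from_sums(xs, ys):
    """Slope via the raw-sums formula used in Step 2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

def slope_from_deviations(xs, ys):
    """Same slope, written in terms of deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

b1 = slope_from_sums(ad_spend, sales)
print(round(b1, 3))  # 3.957
```

Both forms should agree up to floating-point error; the deviation form tends to be less sensitive to rounding when the raw sums are large.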
Step 3:
Calculate the intercept, using x-bar = 42/6 = 7 and y-bar = 215/6 = 35.833: b_0 = y-bar - b_1 * x-bar = 35.833 - 3.957(7) = 35.833 - 27.700 = 8.133.
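A short sketch of the intercept calculation, assuming the unrounded slope 1662/420 from Step 2. The final check illustrates a general property of least-squares lines: the fitted line passes through the point (x-bar, y-bar).

```python
ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

n = len(ad_spend)
x_bar = sum(ad_spend) / n  # mean ad spend: 7.0
y_bar = sum(sales) / n     # mean sales: about 35.833

b1 = 1662 / 420            # slope from Step 2, kept unrounded
b0 = y_bar - b1 * x_bar    # intercept

print(round(b0, 3))  # 8.133

# The fitted value at x-bar equals y-bar (up to floating-point error).
fitted_at_mean = b0 + b1 * x_bar
```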
Step 4:
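The regression equation is: y-hat = 8.133 + 3.957x. Interpretation: For each additional $1,000 in advertising spend, predicted sales revenue increases by approximately $3,957 on average. When ad spend is $0, the model estimates baseline sales of about $8,133, although x = 0 lies outside the observed range (x = 2 to x = 12), so this intercept should be read with caution.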
Step 5:
Calculate R-squared. SS_total = Sum(y - y-bar)^2 = (15 - 35.833)^2 + (25 - 35.833)^2 + (32 - 35.833)^2 + (40 - 35.833)^2 + (48 - 35.833)^2 + (55 - 35.833)^2 = 434.03 + 117.36 + 14.69 + 17.36 + 148.03 + 367.36 = 1098.83. Sum(x - x-bar)^2 = 25 + 9 + 1 + 1 + 9 + 25 = 70. SS_regression = b_1^2 * Sum(x - x-bar)^2 = (3.95714)^2 * 70 = 15.659 * 70 = 1096.13, using the unrounded slope 1662/420 = 3.95714 to limit rounding error. R^2 = SS_regression / SS_total = 1096.13 / 1098.83 = 0.998.
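The sums of squares are easy to get slightly wrong by hand, so it can help to recompute them directly from the data. The sketch below is one way to do that; sums_of_squares is a hypothetical helper name. It also computes the residual sum of squares, which is not used in the step above, as a cross-check: SS_total should equal SS_regression + SS_residual.

```python
ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

def sums_of_squares(xs, ys):
    """Return (SS_total, SS_regression, SS_residual) for a least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_regression = b1 ** 2 * s_xx
    ss_residual = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return ss_total, ss_regression, ss_residual

ss_total, ss_regression, ss_residual = sums_of_squares(ad_spend, sales)
r_squared = ss_regression / ss_total

print(round(r_squared, 3))  # 0.998
```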
Step 6:
Prediction: When ad spend = 7 ($7,000), y-hat = 8.133 + 3.957(7) = 8.133 + 27.700 = 35.833 (approximately $35,833 in sales).
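The prediction step amounts to evaluating the fitted line at a given x. A minimal sketch, assuming the unrounded coefficients from the earlier steps; predict_sales is a hypothetical function name, and its input is in thousands of dollars to match the data.

```python
# Coefficients from the fitted line, kept unrounded.
B1 = 1662 / 420         # slope, about 3.957
B0 = 215 / 6 - B1 * 7   # intercept, about 8.133

def predict_sales(ad_spend_thousands):
    """Predicted monthly sales (in $ thousands) for a given ad spend (in $ thousands)."""
    return B0 + B1 * ad_spend_thousands

print(round(predict_sales(7), 3))  # 35.833
```

Because x = 7 happens to equal x-bar, the prediction equals y-bar; this is expected for any least-squares line.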
Final Answer
The regression equation is y-hat = 8.133 + 3.957x. The slope indicates that each additional $1,000 in ad spend is associated with approximately $3,957 more in monthly sales. R-squared = 0.998, meaning 99.8% of the variation in sales is explained by advertising spend. Predicted sales at $7,000 ad spend = $35,833.
Key Takeaways
- R-squared measures the proportion of variability in the response variable explained by the predictor. An R-squared near 1 suggests a very strong linear fit.
- The slope represents the average change in y for a one-unit increase in x. Always state slope interpretations in the context of the variables.
- Be cautious about extrapolating beyond the range of the observed data. Predictions outside the range x = 2 to x = 12 may be unreliable.
Common Errors to Avoid
- Reversing the roles of x and y when computing the slope. Regressing y on x gives a different line than regressing x on y.
- Interpreting R-squared as a measure of causation. A high R-squared only indicates a strong linear association, not that x causes changes in y.
- Extrapolating the model far beyond the observed data range, which can produce misleading predictions.
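To illustrate the first error in the list above, the sketch below fits both regressions to the problem data and compares the slope of x on y with the reciprocal of the slope of y on x. The helper fit_line is a hypothetical name. If the two regressions described the same line, these two numbers would be equal; they are close here only because the fit is very strong.

```python
ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

def fit_line(xs, ys):
    """Least-squares fit of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

b0_yx, b1_yx = fit_line(ad_spend, sales)  # sales regressed on ad spend
b0_xy, b1_xy = fit_line(sales, ad_spend)  # ad spend regressed on sales

print(round(b1_xy, 4))      # 0.2521
print(round(1 / b1_yx, 4))  # 0.2527
```

In simple linear regression the product of the two slopes equals R-squared, so the two lines coincide only when R-squared is exactly 1.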
FAQs
Common questions about this problem type
What is the difference between r and R-squared?
In simple linear regression, R-squared is the square of the correlation coefficient r. While r tells you the direction and strength of the linear relationship (ranging from -1 to 1), R-squared tells you the proportion of variance in y explained by x (ranging from 0 to 1). For example, r = 0.999 and R-squared = 0.998 both indicate a very strong positive linear relationship.
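The relationship between r and R-squared can be checked on the problem data. This sketch computes the Pearson correlation from its definition; the variable names are illustrative.

```python
import math

ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

n = len(ad_spend)
x_bar = sum(ad_spend) / n
y_bar = sum(sales) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales))
s_xx = sum((x - x_bar) ** 2 for x in ad_spend)
s_yy = sum((y - y_bar) ** 2 for y in sales)

r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson correlation coefficient

print(round(r, 3))       # 0.999
print(round(r ** 2, 3))  # 0.998
```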
How do I know whether a linear model is appropriate?
Check the residual plot (residuals vs. fitted values). If the residuals scatter randomly around zero with no pattern, a linear model is reasonable. If you see curvature, fanning, or clusters, a different model (polynomial, log transformation, etc.) may be needed.
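As a rough sketch of the residual check, the code below computes fitted values and residuals for the problem data, which are the quantities a residual plot would display. It prints them rather than plotting, to stay dependency-free; the variable names are illustrative.

```python
ad_spend = [2, 4, 6, 8, 10, 12]
sales = [15, 25, 32, 40, 48, 55]

n = len(ad_spend)
x_bar = sum(ad_spend) / n
y_bar = sum(sales) / n
s_xx = sum((x - x_bar) ** 2 for x in ad_spend)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ad_spend, sales))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in ad_spend]
residuals = [y - f for y, f in zip(sales, fitted)]  # observed minus fitted

print([round(e, 2) for e in residuals])
# [-1.05, 1.04, 0.12, 0.21, 0.3, -0.62]
```

Least-squares residuals always sum to zero when the model includes an intercept. With only six observations, any apparent pattern in the residuals should be interpreted cautiously.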