Finding Sx and Sy: Statistical Methods Explained
What the Heck Are Sx and Sy?
In statistics, Sx and Sy are just the standard deviations of your x-variable and y-variable respectively. That's it. Nothing fancy.
You might see them called sx (sample standard deviation of x) and sy (sample standard deviation of y) in your textbook or calculator. They're the same thing.
These values show how spread out your data points are around the mean for each variable. A larger Sx means the x-values are more scattered. A smaller Sy means the y-values cluster more tightly around their average.
The Formula Nobody Remembers
Here's the actual formula for calculating Sx:
Sx = √[Σ(xi - x̄)² / (n-1)]
And for Sy:
Sy = √[Σ(yi - ȳ)² / (n-1)]
Where:
- xi = each individual x value
- x̄ = the mean of all x values
- yi = each individual y value
- ȳ = the mean of all y values
- n = number of data points
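- Σ = sum over all data points
The n-1 in the denominator is there because you're working with a sample, not an entire population. Deviations measured from the sample mean run slightly small on average, so dividing by n-1 instead of n (called Bessel's correction) offsets that and makes the sample variance an unbiased estimate of the population variance.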
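As a rough illustration, the formula translates almost line for line into Python. This is a minimal sketch rather than a library routine: `sample_std` is a hypothetical helper name, and the four values are made-up data chosen to keep the arithmetic easy to check.

```python
import math

def sample_std(values):
    """Sample standard deviation: sqrt(sum((v - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n
    sum_sq_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_sq_dev / (n - 1))

data = [1, 2, 3, 4]                # made-up values; the mean is 2.5
print(round(sample_std(data), 4))  # 1.291
```

The same function gives Sx when you pass the x-values and Sy when you pass the y-values.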
Step-by-Step: How to Find Sx and Sy
Example Data
Let's say you have 5 data points:
| Point | x | y |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 4 | 5 |
| 3 | 6 | 7 |
| 4 | 8 | 9 |
| 5 | 10 | 11 |
Step 1: Calculate the Means
x̄ = (2+4+6+8+10) / 5 = 6
ȳ = (3+5+7+9+11) / 5 = 7
Step 2: Find Each Deviation from the Mean
For x-values: subtract 6 from each
- 2 - 6 = -4
- 4 - 6 = -2
- 6 - 6 = 0
- 8 - 6 = 2
- 10 - 6 = 4
For y-values: subtract 7 from each
- 3 - 7 = -4
- 5 - 7 = -2
- 7 - 7 = 0
- 9 - 7 = 2
- 11 - 7 = 4
Step 3: Square the Deviations
For x: 16, 4, 0, 4, 16 → sum = 40
For y: 16, 4, 0, 4, 16 → sum = 40
Step 4: Divide by (n-1)
n = 5, so n-1 = 4
40 / 4 = 10 (this is the sample variance, and it's the same for x and y here)
Step 5: Take the Square Root
Sx = √10 ≈ 3.16
Sy = √10 ≈ 3.16
Your standard deviations are both about 3.16. That makes sense here: every y-value is just the matching x-value plus 1, and shifting data by a constant doesn't change its spread.
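If you want to check the hand calculation, the five steps can be sketched in Python using the example data from the table. The variable names are just illustrative choices.

```python
import math

xs = [2, 4, 6, 8, 10]
ys = [3, 5, 7, 9, 11]
n = len(xs)

# Step 1: means
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Step 2: deviations from the mean
x_devs = [x - x_bar for x in xs]
y_devs = [y - y_bar for y in ys]

# Step 3: square the deviations and add them up
x_ss = sum(d ** 2 for d in x_devs)
y_ss = sum(d ** 2 for d in y_devs)

# Step 4: divide by n - 1 (this is the sample variance)
x_var = x_ss / (n - 1)
y_var = y_ss / (n - 1)

# Step 5: take the square root
sx = math.sqrt(x_var)
sy = math.sqrt(y_var)

print(x_bar, y_bar)                # 6.0 7.0
print(x_ss, y_ss)                  # 40.0 40.0
print(x_var, y_var)                # 10.0 10.0
print(round(sx, 2), round(sy, 2))  # 3.16 3.16
```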
How to Get Sx and Sy on a Calculator
Doing this by hand is tedious. Here's how to get these values fast.
TI-84 Calculator
- Press STAT
- Select 1: Edit
- Enter your x-values in L1 and y-values in L2
- Press STAT again
- Go to CALC
- Select 1-Var Stats
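- Enter L1 for your x-values or L2 for your y-values and press Enter
The output shows Sx (sample) and σx (population, if you need that instead). 1-Var Stats labels its result Sx no matter which list you give it, so the Sx reported for L2 is your Sy. To see Sx and Sy on one screen, choose 2-Var Stats with L1 and L2 instead.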
Casio fx-9750GIII
- Go to STAT mode
- Enter data in columns
- Press CALC
- Select 1-Variable
- Choose your column
Using Sx and Sy to Find Correlation
Sx and Sy become useful when calculating the Pearson correlation coefficient (r). The definition of r is:
r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² × Σ(yi - ȳ)²]
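Because Σ(xi - x̄)² = (n-1) × Sx² and Σ(yi - ȳ)² = (n-1) × Sy², the denominator equals (n-1) × Sx × Sy, so this is the same as: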
r = Σ(xi - x̄)(yi - ȳ) / [(n-1) × Sx × Sy]
Once you have r, square it to get r², the coefficient of determination, which tells you what fraction of y's variance is explained by its linear relationship with x.
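Here is a small Python sketch that computes r both ways on the example data from earlier, to show the two forms agree. It uses `statistics.stdev` from the standard library for Sx and Sy; the other names are illustrative. Since every y in that data is exactly x + 1, r should come out as 1.

```python
import math
import statistics

xs = [2, 4, 6, 8, 10]
ys = [3, 5, 7, 9, 11]
n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)

# Numerator shared by both forms: sum of the products of paired deviations
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Form 1: divide by the square root of the product of the sums of squares
ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)
r_direct = cross / math.sqrt(ss_x * ss_y)

# Form 2: divide by (n - 1) * Sx * Sy
sx = statistics.stdev(xs)
sy = statistics.stdev(ys)
r_from_sd = cross / ((n - 1) * sx * sy)

print(round(r_direct, 6), round(r_from_sd, 6))  # 1.0 1.0
```

The two results can differ in the last few decimal places because of floating-point rounding, which is why the sketch rounds before printing.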
Sx and Sy in Linear Regression
In simple linear regression, Sx and Sy both show up in the slope formula:
slope (b) = r × (Sy / Sx)
This shortcut saves work. If you already calculated Sx, Sy, and r, you can find your regression line without recomputing any of the sums.
The y-intercept is:
a = ȳ - b(x̄)
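As a quick sketch of those two formulas in Python, here is the slope and intercept built from r, Sx, and Sy. The dataset below is hypothetical (it is not the example from earlier) and was picked so the line isn't a perfect fit.

```python
import statistics

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
sx = statistics.stdev(xs)
sy = statistics.stdev(ys)

# Pearson's r from the (n - 1) * Sx * Sy form
cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r = cross / ((n - 1) * sx * sy)

b = r * (sy / sx)        # slope
a = y_bar - b * x_bar    # y-intercept

print(round(r, 4))               # 0.7746
print(round(b, 4), round(a, 4))  # 0.6 2.2
```

So for this made-up data the fitted line is roughly y = 2.2 + 0.6x.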
Quick Comparison: Manual vs Calculator vs Software
| Method | Speed | Error Risk | Best For |
|---|---|---|---|
| By Hand | Slow | High | Learning the concept |
| TI-84/Casio | Fast | Low | Exams, quick homework |
| Excel/Sheets | Very Fast | Very Low | Large datasets |
| Python/R | Instant | Very Low | Research, automation |
How to Get Sx and Sy in Excel
Enter your data in two columns. Then use:
=STDEV.S(A2:A101) → gives you Sx for column A
=STDEV.S(B2:B101) → gives you Sy for column B
Use STDEV.P if you have the entire population, not a sample.
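If you go the Python route instead, the standard library's `statistics` module has the same sample/population split. A minimal sketch using the x-values from the earlier example:

```python
import statistics

xs = [2, 4, 6, 8, 10]

s_sample = statistics.stdev(xs)       # divides by n - 1, like STDEV.S
s_population = statistics.pstdev(xs)  # divides by n, like STDEV.P

print(round(s_sample, 4))      # 3.1623
print(round(s_population, 4))  # 2.8284
```

The population version is always the smaller of the two for the same data, since it divides by n rather than n-1.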
Common Mistakes That Mess Up Your Answer
- Using n instead of n-1 — you'll underestimate the spread
- Forgetting to square the deviations — negative and positive values cancel out, giving you zero
- Confusing sample vs population standard deviation — check what your assignment actually asks for
- Rounding too early — keep full precision until the final answer
- Using the wrong column — double-check you're pulling Sx from x-data and Sy from y-data
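Two of these mistakes are easy to see in a few lines of Python. This sketch reuses the x-values from the earlier example; it shows that unsquared deviations add up to zero, and that rounding Sx to two decimals before using it again throws off later results.

```python
import math

xs = [2, 4, 6, 8, 10]
x_bar = sum(xs) / len(xs)
devs = [x - x_bar for x in xs]

# Forgetting to square: the deviations cancel out
print(sum(devs))  # 0.0

# Rounding too early: squaring a rounded Sx doesn't give back the variance
sx_full = math.sqrt(sum(d ** 2 for d in devs) / (len(xs) - 1))
sx_rounded = round(sx_full, 2)    # 3.16
print(round(sx_full ** 2, 4))     # 10.0
print(round(sx_rounded ** 2, 4))  # 9.9856
```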
What Sx and Sy Actually Tell You
These values don't mean much on their own. They're useful when you compare them, as long as both variables are measured in the same units.
If Sx > Sy, your x-variable is more spread out than your y-variable.
If Sy > Sx, your y-variable has more variability.
In regression, a larger Sx in the denominator makes your slope smaller (for the same r and Sy). This is why standardizing your variables matters when comparing effects across different scales.
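To see the scale effect, here is a sketch that fits the same hypothetical measurements twice, once with x in meters and once in centimeters. The `slope` helper and the data are assumptions made up for illustration. Changing units leaves r and Sy alone but multiplies Sx by 100, so the slope shrinks by the same factor.

```python
import statistics

def slope(xs, ys):
    """Regression slope computed as r * (Sy / Sx)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = cross / ((n - 1) * sx * sy)
    return r * (sy / sx)

x_m = [1, 2, 3, 4, 5]          # hypothetical x-values in meters
x_cm = [x * 100 for x in x_m]  # the same measurements in centimeters
ys = [2, 4, 5, 4, 5]

print(round(slope(x_m, ys), 4))   # 0.6
print(round(slope(x_cm, ys), 4))  # 0.006
```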
When You'll Actually Use This
Sx and Sy show up in:
- Calculating Pearson's r
- Finding regression slope coefficients
- Standardizing variables (a z-score is the deviation from the mean divided by the standard deviation)
- Comparing variability across different datasets
- ANOVA and hypothesis testing
If you're taking stats, you'll see these in almost every chapter from correlation onward.
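To make the standardizing item from the list concrete, here is a minimal Python sketch that turns the x-values from the earlier example into z-scores using x̄ and Sx. It assumes the sample standard deviation is the one you want to divide by.

```python
import statistics

xs = [2, 4, 6, 8, 10]
x_bar = statistics.mean(xs)
sx = statistics.stdev(xs)

# z-score: how many standard deviations each value sits from the mean
z_scores = [(x - x_bar) / sx for x in xs]
print([round(z, 4) for z in z_scores])
# [-1.2649, -0.6325, 0.0, 0.6325, 1.2649]

# Standardized values have a sample standard deviation of 1
print(round(statistics.stdev(z_scores), 4))  # 1.0
```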
The Bottom Line
Sx and Sy are just standard deviations for your x and y variables. The calculation is tedious by hand, but your calculator or spreadsheet does it instantly. The real skill is knowing why these values matter — they show up in correlation formulas, regression slopes, and help you understand your data's spread. Get those concepts down and the calculations become secondary.