Why do we calculate standard deviations like this?

The square root will give you the standard deviation. Without it, you would get the variance. One advantage of standard deviation over variance is that the unit of the standard deviation is the same as the unit of your dataset.
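
As a rough illustration of the units point, here is a minimal sketch using Python's standard `statistics` module. The height values are made up for the example, and the population versions (`pvariance`, `pstdev`) are used only to keep the arithmetic simple.

```python
import math
import statistics

# Hypothetical measurements in centimetres (not from the original post).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

# Variance averages *squared* differences, so its unit is cm squared.
variance_cm2 = statistics.pvariance(heights_cm)

# The square root brings the result back to plain centimetres.
std_dev_cm = statistics.pstdev(heights_cm)

print("variance (cm^2):", variance_cm2)
print("std dev  (cm)  :", std_dev_cm)
print("sqrt(variance) :", math.sqrt(variance_cm2))
```

The last two printed values should agree, which is all the square root is doing here.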

The main difference between the two formulas is that the first assumes we know the true mean mu, while the second does not know mu a priori. It therefore estimates mu with the arithmetic mean (x1+...+xn)/n. But since we use the data points xi to compute both the mean AND the standard deviation, the squared distance to the arithmetic mean will on average be smaller than the squared distance to the true mean. To counteract this bias, we reduce the denominator by 1, which makes the fraction larger.
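
The bias described above can be checked with a small simulation. This is only a sketch under assumed settings: normally distributed data with true mean 0 and true variance 1, samples of size 5, and a fixed random seed. None of these values come from the original question.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers

TRUE_MEAN = 0.0  # assumed; in real data this is usually unknown
TRUE_SD = 1.0    # assumed; so the true variance is 1.0
N = 5            # sample size
TRIALS = 20000   # number of simulated samples


def mean_sq_dist(xs, centre, denom):
    """Sum of squared distances from `centre`, divided by `denom`."""
    return sum((x - centre) ** 2 for x in xs) / denom


sum_true = sum_n = sum_n1 = 0.0
for _ in range(TRIALS):
    xs = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    xbar = sum(xs) / N  # arithmetic mean estimated from the same data
    sum_true += mean_sq_dist(xs, TRUE_MEAN, N)  # first formula: mu known
    sum_n += mean_sq_dist(xs, xbar, N)          # xbar with denominator N
    sum_n1 += mean_sq_dist(xs, xbar, N - 1)     # xbar with denominator N-1

avg_true = sum_true / TRIALS
avg_n = sum_n / TRIALS
avg_n1 = sum_n1 / TRIALS

print("around true mean, divide by N   :", round(avg_true, 3))
print("around sample mean, divide by N  :", round(avg_n, 3))
print("around sample mean, divide by N-1:", round(avg_n1, 3))
```

Under these assumptions the middle figure should come out noticeably below 1, while the other two should land close to 1. Note that the correction makes the *variance* estimate unbiased on average; the simulation compares variances rather than their square roots for that reason.
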
We square the differences from the mean in order to make them all positive, but this also makes the differences bigger (or smaller, if their magnitude is less than 1). We therefore put the whole thing under a square root, which undoes that rescaling so that bigger numbers aren't magnified quite so much.

You may wonder why we do this whole square-and-then-square-root thing, instead of just taking the absolute values of the differences from the mean. The reason is that the absolute-value function is not smooth, but the root-mean-square-difference function is. If a single point *p* begins above the mean but decreases over time, the mean absolute difference will fall, and then suddenly begin to rise when *p* goes below the mean. The graph will look like a "V". The standard deviation, however, will fall, slowing down as *p* approaches the mean, and the curve will look more like a "U". This has useful properties, at the cost of perhaps being farther from your intuition about what the standard deviation "should" be.
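
The "V" versus "U" behaviour can be seen numerically. The following is a sketch with made-up data: four fixed points placed symmetrically around 0 plus one moving point `p`. The helper names `mean_abs_dev` and `spread_as_p_moves` are invented for this example.

```python
import statistics

# Hypothetical fixed points, symmetric around 0; only p moves.
FIXED = [-2.0, -1.0, 1.0, 2.0]


def mean_abs_dev(xs):
    """Mean absolute difference from the arithmetic mean."""
    m = statistics.fmean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def spread_as_p_moves(p):
    """Return (mean absolute deviation, standard deviation) with p added."""
    xs = FIXED + [p]
    return mean_abs_dev(xs), statistics.pstdev(xs)


# Let p fall from above the mean to below it.
for p in [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5, -2.0]:
    mad, sd = spread_as_p_moves(p)
    print(f"p={p:+.1f}  mean abs dev={mad:.4f}  std dev={sd:.4f}")
```

In this setup the mean absolute deviation changes by the same amount at every step and turns around sharply at p = 0, while the standard deviation changes less and less as p nears 0 and then picks up again gradually.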

The N-1 is called the *degrees of freedom*. It's hard to give the full details of where it comes from, but I prefer to think of it this way: if you have some samples x₁, x₂, ..., xN, then you can add a constant to all of them without changing the standard deviation: (x₁+c), (x₂+c), ..., (xN+c) will have a different mean but the same standard deviation. In fact, if you set c=-x₁, you can always set the first sample to 0 (and set the others to x₂-x₁,...,xN-x₁) without changing the standard deviation. So there is a sense in which the standard deviation only cares about x₁ as something to compare x₂,...,xN to, and not the actual value of x₁ itself. This leaves only N-1 samples whose values the standard deviation really cares about.
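
The shift argument is easy to check directly. A minimal sketch, with sample values invented for the example and `statistics.stdev` (the N-1 version) from Python's standard library:

```python
import statistics

# Hypothetical samples x1..xN (not from the original post).
samples = [3.0, 7.0, 8.0, 12.0]

# Choose c = -x1 so that the first sample becomes 0.
c = -samples[0]
shifted = [x + c for x in samples]

print("original:", samples, "mean =", statistics.mean(samples))
print("shifted :", shifted, "mean =", statistics.mean(shifted))

# The mean moves by c, but the sample standard deviation does not change.
print("stdev original:", statistics.stdev(samples))
print("stdev shifted :", statistics.stdev(shifted))
```

The two standard deviations should match (up to floating-point rounding), even though the means differ.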

Confusingly, degrees of freedom are only relevant to the *sample* standard deviation. The *population* standard deviation just has an N, not an N-1. Off the top of my head, the best reason I can give is that with the whole population the mean is known exactly rather than estimated from the same data, so there is nothing to correct for.

Sometimes when we're looking at things other than standard deviation, like in chi-squared tests, the degrees of freedom can be less than N-1. This indicates that we can set *more* than 1 sample to zero, with appropriate modifications to the other samples, without changing the quantity (for example chi-squared) that we're looking at.
