The standard deviation is a measure of statistical dispersion.
In plain English it's a way of describing how spread out a set of values is around the mean of that set. For example, if you have a set of height measurements, you can easily work out the arithmetic mean (just sum up all the individual height measurements and then divide by the number of those measurements). However, knowing the mean (or average, as it's more commonly called) doesn't tell you about the spread of those heights. Were all the people in your group the same height, or did you have some tall and some short, or was there one really tall person who towered over everybody else? It's possible to get exactly the same average height from wildly different groups. Knowing how spread out those heights are compared to the mean gives you extra information over and above the mean value.
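To make that concrete, here's a small Python sketch. The two groups of heights are made up purely for illustration; the point is only that they share the same mean while looking nothing alike.

```python
from statistics import mean

# Hypothetical heights in centimetres, invented for illustration only.
group_a = [170, 170, 170, 170, 170]   # everybody the same height
group_b = [150, 160, 170, 180, 190]   # a mix of short and tall

mean_a = mean(group_a)
mean_b = mean(group_b)

# Both groups have the same mean...
print(mean_a, mean_b)

# ...but the shortest and tallest people tell a very different story.
print(min(group_a), max(group_a))
print(min(group_b), max(group_b))
```

The mean alone can't tell these two groups apart, which is exactly the gap that a measure of spread fills.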
The basic idea of the standard deviation is that you're measuring variations around the mean value. Some of those values will be below the mean, some above and sometimes you'll have some that are equal to the mean. In other words some of the differences between the individual measurements and the mean will be positive (more than the mean), some will be negative (below the mean) and some will be zero (exactly equal to the mean). Now just adding these differences up doesn't work, because the positive and negative values will cancel each other out. For example, to take an incredibly simplistic case, if you've got a sample of two values, one of 9 and one of 11, the mean is equal to 10. The differences are -1 and +1, and adding these together gives us a total variation of 0. But we know that there's not zero variation around that mean value!
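The same 9 and 11 example can be checked in a few lines of Python. This is just a sketch of the arithmetic; the variable names are made up for the illustration.

```python
values = [9, 11]

# The mean of 9 and 11 is 10.
mean_value = sum(values) / len(values)

# Differences from the mean: one negative, one positive.
differences = [v - mean_value for v in values]
print(differences)   # [-1.0, 1.0]

# Adding them up makes them cancel out completely.
total = sum(differences)
print(total)         # 0.0
```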
So, to get round this problem each of the variations around the mean is squared. When you square a negative value you get a positive value. So, to work out the standard deviation we square all of the differences from the mean, add them all up and divide by one less than the number of values in our set. This new number is called the variance. Now we take the square root of the variance (which is reversing the squaring we did earlier so that our number is closer to the original differences), and that's the standard deviation.
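Here's the recipe again:
1. Work out the mean of your values.
2. Subtract the mean from each value to get the differences.
3. Square each of the differences.
4. Add up all of the squared differences.
5. Divide that total by one less than the number of values. This is the variance.
6. Take the square root of the variance. This is the standard deviation.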
That's it for the most part. A set of values that are closely clustered near the mean will have a low standard deviation, a set of numbers that are widely spread out will have a higher standard deviation, and a set of numbers that are all the same will have a standard deviation of zero (because they're all equal to the mean anyway).
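The recipe translates almost line for line into code. Here's a minimal Python sketch of it; the function name sample_standard_deviation is made up for this example, and in practice you'd probably reach for a library function instead.

```python
import math

def sample_standard_deviation(values):
    """Standard deviation of a sample, dividing by one less than the count."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Work out the mean.
    mean_value = sum(values) / n
    # Square each difference from the mean so negatives can't cancel positives.
    squared_differences = [(v - mean_value) ** 2 for v in values]
    # Add them up and divide by one less than the number of values: the variance.
    variance = sum(squared_differences) / (n - 1)
    # The square root of the variance is the standard deviation.
    return math.sqrt(variance)

print(sample_standard_deviation([9, 11]))        # the two-value example
print(sample_standard_deviation([49, 50, 51]))   # closely clustered
print(sample_standard_deviation([1, 50, 99]))    # widely spread out
print(sample_standard_deviation([10, 10, 10]))   # all the same
```

The clustered set gives a small number, the spread out set a much bigger one, and the set where every value is the same gives zero.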
If your data is normally distributed, in other words you've got most data near the mean and the further away you get from the mean the fewer measurements you have, then the standard deviation gives you extra information. A lot of real world measurements are at least roughly normally distributed - height, weight, test scores… - and if you graph the data it has a bell-shaped curve. For data that is normally distributed the standard deviation gives us the following information:
About 68% of values are within one standard deviation of the mean.
About 95% of values are within two standard deviations of the mean.
About 99.7% of values are within three standard deviations of the mean.
So, if the mean of your data set is 100 and the standard deviation is 10, you can expect to see about 68% of values in the range 90-110, and about 95% in the range 80-120. And anything below 70 or above 130 is going to be very rare…
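You can check those percentages for yourself. The sketch below uses NormalDist from Python's standard statistics module (available from Python 3.8) with the same mean of 100 and standard deviation of 10; the helper name share_within is made up for the example, and the whole thing assumes the data really is normally distributed.

```python
from statistics import NormalDist

mean_value = 100
sd = 10
dist = NormalDist(mu=mean_value, sigma=sd)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    low = mean_value - k * sd
    high = mean_value + k * sd
    # cdf(x) is the fraction of values at or below x.
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low = mean_value - k * sd
    high = mean_value + k * sd
    print(f"{low}-{high}: {share_within(k):.1%}")
```

The three ranges come out at roughly 68%, 95% and 99.7%, which is why values below 70 or above 130 are so rare.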
Does it matter whether your data is a sample or a complete population?
Yes. Most of the data you ever have to deal with is a sample - after all you can't measure the height of everybody in the world. However, in those cases where you do have data for a complete population then a slightly different formula applies: you divide by the number of values rather than by one less than that number. If you're using Excel or another tool to do your calculations then make sure you pick the correct standard deviation formula.
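As an illustration of picking the right formula, Python's standard statistics module has both versions: stdev treats the data as a sample (it divides by one less than the number of values) and pstdev treats it as a complete population (it divides by the number of values). Here's a short sketch using the 9 and 11 example.

```python
from statistics import pstdev, stdev

values = [9, 11]

# Sample formula: divide by one less than the number of values.
sample_sd = stdev(values)

# Population formula: divide by the number of values.
population_sd = pstdev(values)

print(sample_sd, population_sd)
```

The sample version always comes out a little larger, and the gap shrinks as the number of values grows.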
If you're not a maths wiz but need to get a grip on stats, then we at TechBookReport have done the hard work of looking at a range of statistics books that are pitched at the beginning student or non-expert. The recommended titles are: Statistics For Dummies, Statistics For People (Who Think They) Hate Statistics and Intro Stats. Take a look at the reviews and then do the decent thing and buy the book. If you need to do well in stats it'll be worth the investment.