Thursday, July 22, 2010

Thoughts on sampling

My academic discipline is mathematics. It is natural for someone to say, "Oh you know a lot about statistics," but the truth is more complicated.

I am a topologist, and I have had only one course in statistics, a theoretical one that I took as an undergraduate. However, at a school like ours, we are not allowed to be too picky about what we teach, so I started teaching our elementary statistics course about fifteen years ago and discovered that I liked the subject. Now, after having taught it for that length of time, I cannot claim expertise, but I can say that I do understand some of the basic ideas.

One of the basic ideas is that of a sample. I would like to explain that to you.

Statistics is about the search for answers in a particular spirit. We begin with the need to know, but with the realization that we can't know exactly. Given that, we proceed to find the approximate answer with the realization that the approximate answer will, in a certain number of cases, not be good enough. It is very practical.

We might start out desiring to know what the average IQ of the students on our campus is. The obvious way to approach this would be to administer an IQ test to every student on campus. There are problems with this, however: Students don't like taking tests, testing is expensive, and so forth. What we would need to do, in this case, would be to take a sample.

The idea is that a sample serves as a representative of the whole. One of the great mathematical miracles that make statistics work is that most randomly chosen samples do serve as representatives of the whole. One of the dangers is that, even so, bias can creep in through the way the sample is collected.
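To see that "miracle" in action, here is a small simulation sketch. Everything in it is made up for illustration: a campus of 5,000 students, IQ scores that follow a bell curve with mean 100 and standard deviation 15, and a tolerance of 3 points as my stand-in for "close enough." It draws many random samples of 100 and counts how often the sample average lands near the true campus average.

```python
import random
import statistics

def fraction_of_close_samples(population, sample_size, trials, tolerance, seed=0):
    """Return the share of random samples whose mean is within
    `tolerance` of the population mean."""
    rng = random.Random(seed)
    true_mean = statistics.fmean(population)
    close = 0
    for _ in range(trials):
        # Draw without replacement, as we would with real students.
        sample = rng.sample(population, sample_size)
        if abs(statistics.fmean(sample) - true_mean) <= tolerance:
            close += 1
    return close / trials

# Hypothetical campus: 5,000 students, scores roughly normal (assumed).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(5000)]

fraction = fraction_of_close_samples(
    population, sample_size=100, trials=2000, tolerance=3.0
)
print(f"Share of samples within 3 points of the true mean: {fraction:.1%}")
```

Under these assumptions, the great majority of samples land close to the true average and a small share do not, which is the "certain number of cases" where the approximate answer is not good enough.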

The gold standard for taking a sample of size 100 would be to choose 100 student identification numbers at random and then to administer the test to those 100 individuals. One problem with this is that some of those 100 will not show up, so we would have to oversample by a certain amount. But a more basic problem is getting any to show up at all. It is possible to offer incentives, but then the risk is biasing the sample by the type of reward you offer. What if you offer an iPod as a prize? Will you bias the sample by attracting students who like to gamble? Will you bias it toward groups that don't already have iPods?
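The random draw with oversampling might look something like the sketch below. The ID numbers, the enrollment of 3,000, and the guess that 80% of invited students will show up are all invented for the example; the function name `draw_invitations` is likewise just my own.

```python
import math
import random

def draw_invitations(student_ids, target_size, expected_show_rate, seed=None):
    """Pick enough random IDs that, if the guessed show-up rate holds,
    about `target_size` students actually take the test."""
    if not 0 < expected_show_rate <= 1:
        raise ValueError("expected_show_rate must be in (0, 1]")
    # Oversample: invite more than we need to cover the no-shows.
    n_invite = math.ceil(target_size / expected_show_rate)
    if n_invite > len(student_ids):
        raise ValueError("not enough students to invite")
    rng = random.Random(seed)
    # sample() draws without replacement, so no ID is picked twice.
    return rng.sample(student_ids, n_invite)

# Hypothetical ID numbers for a campus of 3,000 students.
student_ids = list(range(10001, 13001))

invited = draw_invitations(
    student_ids, target_size=100, expected_show_rate=0.8, seed=1
)
print(f"Invite {len(invited)} students to end up with about 100 tests.")
```

Note that this only fixes the head count. If the students who show up differ from those who stay home, inviting more of them does nothing about that bias.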

In a similar project, where we were measuring things other than IQ, we've dealt with this by choosing particular classes and measuring the students in them. In doing this, there is a risk of bias toward the kind of students who choose those classes, but the low cost of this method makes it very attractive and, being aware of the possibility of bias, we can be on guard against it.
