Test if sampled data are randomly sampled

Is there a way to test if data are (or at least seem) randomly sampled? In other words, is there a way to measure if my data are randomly sampled -- instead of coming from a complex survey sampling for example -- to a statistically significant level? I imagine something like comparing means over repeated sub-sampling. Or is this impossible? If so, why?

asked Feb 6, 2015 at 21:36

What is "random"? Let's say the numbers in my sample are "almost random": $x_i = i + \xi_i$, where $\xi_i$ is a standard normal random variable. Would this be sufficiently random for you?

Commented Feb 6, 2015 at 21:45

@Aksakal I've tried to clarify the question I'm trying to ask.

Commented Feb 6, 2015 at 21:51

@Tim thanks! Your answer (you should post it as an answer!) is along the lines of what I'm looking for. I did not gather the data myself. It was gathered by a government institution and they don't have any documentation available on the method they used for gathering it, so I thought I should look for something like a test instead of just assuming the data are randomly sampled. Also, it most probably is not, as you said, so I also wanted a way to quantify how much it looks like it's randomly sampled.

Commented Feb 6, 2015 at 22:23

@ivanmp check my answer.

Commented Feb 6, 2015 at 22:49

Since every possible sample is equally likely under simple random sampling, you can't check, from a single sample, all the possible ways in which it could be non-random. Instead, you need to have some particular kind (or kinds) of deviation from equiprobable sampling in mind, that is, particular sampling characteristics (how would the sample look different if it came from some complex design?). Once you know in what way it would differ, you can choose a statistic that is sensitive to that difference.

Commented Feb 7, 2015 at 3:46

2 Answers


Taking a simple random sample means that every possible sample of the given size has an equal probability of being the sample taken. This means that any sample that could have come from a more complex sampling scheme (stratified, cluster, etc.) could also have come from a simple random sample. So there is no definitive way to prove, from the sample alone, which scheme produced it.
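To make the point concrete, here is a minimal sketch on a toy population. The six units, the two strata, and the one-unit-per-stratum design are all made up for illustration. It enumerates every sample each design can produce and checks that the stratified samples are a subset of the simple random samples.

```python
from itertools import combinations

# Hypothetical population of 6 units in two strata (assumed for illustration).
stratum_a = ["a1", "a2", "a3"]
stratum_b = ["b1", "b2", "b3"]
population = stratum_a + stratum_b
n = 2  # sample size

# Simple random sampling: every size-n subset is possible (and equally likely).
srs_samples = {frozenset(s) for s in combinations(population, n)}

# Stratified sampling, one unit per stratum: only "mixed" subsets are possible.
stratified_samples = {frozenset((a, b)) for a in stratum_a for b in stratum_b}

print(len(srs_samples))                   # 15
print(len(stratified_samples))            # 9
print(stratified_samples <= srs_samples)  # True: each one is also a valid SRS draw
```

Any sample the stratified design can produce is therefore also a possible simple random sample, so observing it cannot rule simple random sampling out.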

However, you could come up with a prior on how likely different types of sampling are, then do a Bayesian analysis to find the posterior probability of a simple random sample vs. the other types.
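As a rough sketch of what such an analysis might look like, suppose (purely as an assumption) that there are only two candidate designs: a simple random sample, and a stratified sample with a fixed allocation across two strata. The function name, the stratum sizes, and the 50/50 prior below are hypothetical. The likelihood of a specific observed sample is one over the number of samples the design can produce, or zero if the design cannot produce it.

```python
from math import comb

def posterior_srs(n_a, n_b, N_a, N_b, alloc_a, alloc_b, prior_srs=0.5):
    """Posterior probability of SRS vs. one fixed-allocation stratified design.

    n_a, n_b:         observed sample counts in strata A and B
    N_a, N_b:         population sizes of strata A and B
    alloc_a, alloc_b: allocation the hypothetical stratified design would use
    """
    n, N = n_a + n_b, N_a + N_b
    # Under SRS, each of the C(N, n) possible samples is equally likely.
    like_srs = 1 / comb(N, n)
    # Under the stratified design, only samples matching the allocation can occur.
    if (n_a, n_b) == (alloc_a, alloc_b):
        like_strat = 1 / (comb(N_a, alloc_a) * comb(N_b, alloc_b))
    else:
        like_strat = 0.0
    num = prior_srs * like_srs
    return num / (num + (1 - prior_srs) * like_strat)

# Sample matches the stratified allocation: SRS becomes less likely, not impossible.
print(posterior_srs(6, 4, N_a=60, N_b=40, alloc_a=6, alloc_b=4))
# Sample does not match the allocation: that stratified design is ruled out.
print(posterior_srs(5, 5, N_a=60, N_b=40, alloc_a=6, alloc_b=4))
```

The asymmetry is the useful part: a sample can rule out a specific complex design, but it can never rule out simple random sampling. The result also depends entirely on which alternative designs and priors you are willing to write down.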

answered Feb 6, 2015 at 22:47

If the data were gathered in a methodologically sound way and the samples are big enough, then both samples could reflect the population. Generally, it appears that you want to see whether the data reflect the population well enough (or, put differently, whether one of the samples is biased). The best way to do this is to compare the samples to the population: are the properties of the sample similar to those of the population?
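One simple way to make that comparison, assuming you have known population shares for some categorical variable (for example from a census), is a chi-square goodness-of-fit statistic. The age groups, shares, and counts below are invented for illustration, and 5.991 is the standard 5% critical value of the chi-square distribution with 2 degrees of freedom.

```python
# Hypothetical population shares (e.g. from a census) and observed sample counts.
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}
sample_counts = {"18-34": 45, "35-54": 80, "55+": 75}

n = sum(sample_counts.values())

# Chi-square statistic: sum over groups of (observed - expected)^2 / expected.
chi2 = sum(
    (sample_counts[g] - n * population_share[g]) ** 2 / (n * population_share[g])
    for g in population_share
)

# 5% critical value for df = (number of groups - 1) = 2.
CRITICAL_5PCT_DF2 = 5.991

print(round(chi2, 2))            # 7.5
print(chi2 > CRITICAL_5PCT_DF2)  # True: the sample's age mix differs from the population's
```

A large statistic says the sample differs from the population on this variable. It does not say which sampling design was used, and a small statistic does not show that the sample is unbiased on variables you did not check.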

Another thing is the "randomness" of a sample. First of all, there is no such thing as a truly "random" survey sample: it is never possible to sample literally every person in the population with the same probability. You always make decisions about how you will sample individuals for your research. If you use telephone interviews, then you don't sample individuals without phones; if you go door to door, then you sample mostly the unemployed or housewives, etc. (see the book How to Lie with Statistics for more examples). So what you should do is consider how and why your sample is biased, and whether this influences your possible results, and describe it in the methodological part of your report. The general question here should be "how much might the sample differ from the population?" rather than "how much does it look like it's randomly sampled?". There is nothing wrong with non-random samples as long as you remember that they are not random and do not treat them as random.

Generally, there are tests for randomness, but I don't think that is what you are looking for. Note that if these are survey data, people are not like "white noise" in any way, so there is no point in checking whether the data are purely random.
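For reference, one such test is the Wald-Wolfowitz runs test. The sketch below is my own minimal implementation, not a library routine, and it uses the normal approximation. It is applied to the "almost random" sequence $x_i = i + \xi_i$ from the comments and to a shuffled copy of it. Note that it tests the ordering of a sequence, not how units were selected from a population, which is why it probably does not answer the original question.

```python
import random
from math import sqrt
from statistics import median

def runs_test_z(xs):
    """z-statistic of the Wald-Wolfowitz runs test (above/below the median).

    Assumes values fall on both sides of the median; no handling of the
    degenerate case where all values are on one side.
    """
    m = median(xs)
    signs = [x > m for x in xs if x != m]  # drop values tied with the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    # A run is a maximal block of consecutive equal signs.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    # Mean and variance of the number of runs under random ordering.
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    return (runs - mean) / sqrt(var)

random.seed(0)
trended = [i + random.gauss(0, 1) for i in range(100)]  # x_i = i + xi_i
shuffled = random.sample(trended, len(trended))         # same values, random order

print(runs_test_z(trended))   # strongly negative: far too few runs
print(runs_test_z(shuffled))  # typically close to 0
```

The same set of values passes or fails depending only on the order in which they are listed, so a test like this is only meaningful when the order of the records carries information about how they were collected.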