Stratified sampling is a random sampling technique in which the entire population is divided into a fixed number of distinct groups, or strata, and sampling units are then randomly selected from each stratum. The units within each stratum should be the same or similar, there should be no overlap between the strata, and together the strata should cover the entire population. In technical terms, the strata should be homogeneous, mutually exclusive and collectively exhaustive. Further, each sampling unit that is picked must belong to only one stratum. To avoid bias, simple random sampling can be used to pick the sampling units from each stratum.
As a researcher, you need to ensure that the final selection from the strata is representative of the population distribution. Strata may differ in size, depending on their share of the total population. Once you have divided your population into strata, you could pick a fixed percentage, say 5%, of the units from each stratum through simple random sampling, so that the final sample distribution is proportionate to the population distribution. This ensures that the entire population is represented and also reduces sampling error.
Example of Stratified Sampling
Say you are conducting a survey based on the socio-economic class of the population. For this survey, you could divide the entire population into categories such as lower class, middle class and upper class. Say that, from a population of 10,000 people, 3,000 belong to the lower class, 5,000 to the middle class and 2,000 to the upper class. If you take a 5% random sample from each stratum, your final sample would have a total of 500 people, of which 150 would belong to the lower class, 250 to the middle class and 100 to the upper class.
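The example above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name stratified_sample, the integer IDs standing in for people, and the seed are all assumptions made for the sketch.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction from each stratum.

    `strata` maps a stratum name to the list of units in that stratum.
    (Hypothetical helper, written only for this illustration.)
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        k = round(len(units) * fraction)      # number of units to draw here
        sample[name] = rng.sample(units, k)   # sampling without replacement
    return sample

# Hypothetical population of 10,000: integer IDs stand in for people.
strata = {
    "lower": list(range(0, 3000)),
    "middle": list(range(3000, 8000)),
    "upper": list(range(8000, 10000)),
}

sample = stratified_sample(strata, fraction=0.05, seed=42)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)                 # {'lower': 150, 'middle': 250, 'upper': 100}
print(sum(sizes.values()))   # 500
```

Because the same fraction is drawn from every stratum, each class keeps the share of the sample that it has in the population (30%, 50% and 20%).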
Note that the 5% figure is arbitrary and is used only to explain the concept. The size of your total sample would depend on your desired confidence level, the standard deviation and the margin of error (for an infinite population).
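One common way to combine these three inputs is the textbook formula for estimating a mean from a very large population, n = (z * sigma / e)^2, where z is the z-score for the chosen confidence level, sigma is the standard deviation and e is the margin of error. The text above does not say which formula it has in mind, so treat the sketch below as an assumption; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(confidence_level, std_dev, margin_of_error):
    """Sample size for estimating a mean, assuming an effectively infinite
    population: n = (z * sigma / e)^2, rounded up.
    (Hypothetical helper; the formula choice is an assumption.)
    """
    # Two-sided z-score, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return math.ceil((z * std_dev / margin_of_error) ** 2)

# Hypothetical inputs: 95% confidence, standard deviation 15, margin of error 2.
n = required_sample_size(confidence_level=0.95, std_dev=15, margin_of_error=2)
print(n)  # 217
```

A tighter margin of error or a higher confidence level increases the required sample size, which is why a flat percentage such as 5% is only a teaching device.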
How is Stratified Sampling Different from Clustering?
In clustering, the entire population is divided into multiple groups, or clusters (say, communities or schools), of sampling units (households or children). Then, instead of choosing individual sampling units, an entire cluster is picked using a random sampling technique. This is done because it is assumed that the clusters are broadly similar to one another and that the variation is found within each cluster. Thus, clusters need to be formed in such a way that each cluster has as much diversity as possible and therefore has the potential to represent the entire population.
Clusters, taken together, should add up to the population (like the strata in stratified sampling). However, unlike the strata in stratified sampling, which are homogeneous with respect to a particular characteristic (socio-economic class in our example above), clusters should be heterogeneous or diverse, so that each cluster is representative of the population.
For example, if you would like to know the average marks of all 10th graders in Mathematics, you could first get a list of all schools (clusters) and randomly pick a few schools to see how well the 10th graders in those schools performed on the Mathematics exam.
Clustering can be single-step (when entire clusters are chosen randomly and every unit in them is included) or two-step (when a further random sample is picked from each randomly chosen cluster). In our example above, if you further randomly select 10th graders from each of the chosen schools to check their scores in Mathematics, it would be a two-step clustering process.
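The difference between the two variants can be sketched in Python. The school names, the randomly generated marks and the helper names below are all hypothetical, and every school is given the same number of students to keep the illustration simple.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly pick whole clusters, then keep all their units (single-step)
    or a random subsample of `per_cluster` units from each (two-step).
    (Hypothetical helper, written only for this illustration.)
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # step 1: pick clusters
    result = {}
    for name in chosen:
        units = clusters[name]
        if per_cluster is None:
            result[name] = list(units)                 # single-step: keep all
        else:
            k = min(per_cluster, len(units))
            result[name] = rng.sample(units, k)        # step 2: subsample
    return result

def mean_marks(sample):
    """Plain average over all sampled marks."""
    marks = [m for units in sample.values() for m in units]
    return sum(marks) / len(marks)

# Made-up data: 20 schools, each with 40 Mathematics marks between 30 and 100.
data_rng = random.Random(0)
schools = {
    f"school_{i}": [data_rng.randint(30, 100) for _ in range(40)]
    for i in range(20)
}

one_step = cluster_sample(schools, n_clusters=5, seed=1)
two_step = cluster_sample(schools, n_clusters=5, per_cluster=10, seed=1)
print(mean_marks(one_step), mean_marks(two_step))
```

With schools of unequal size, a plain average like this would no longer weight every student equally, so a real design would need to account for cluster sizes.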
Note that, at times, naturally occurring clusters such as a school or a community might be more homogeneous than the population and could give a skewed picture. Researchers should be mindful of such anomalies and should correct for them while designing the sampling technique.