What is Stratified Sampling?
Magpi has now been used for nearly 20 years by groups working in global health and international development, as well as many other fields. Although Magpi, just like a paper form, doesn’t dictate your sampling strategy for any data collection, it’s definitely true that lots of public health data collection involves stratified sampling. Consider this blog post a basic introduction to the subject.
When Should You Use Stratified Sampling?
Imagine you are interested in the effect of government payments to the poor on childhood nutrition. Depending on the country or state or location of interest, there might be subgroups of the group of interest (“the poor”) that are also of interest. In India, for example, you might be interested in how such state payouts affected kids in low-income Dalit (formerly called “untouchable”) households versus kids in other low-income households.
Stratified sampling is a sampling method used in statistics to address this kind of situation, where the main group has significant subgroups of interest. Common variables used to define subgroups include age, gender, race, income, education level, or geographic location. The sampling technique involves dividing a population into subgroups, or strata, and then selecting a sample from each subgroup in proportion to the subgroup's size. This ensures that every subgroup is represented in your sample.
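To make "in proportion to the subgroup's size" concrete, here is a small Python sketch of the arithmetic. The function name `proportional_allocation` and the household counts are invented for illustration; real numbers would come from a census or other sampling frame.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to each stratum's size."""
    population = sum(stratum_sizes.values())
    # Exact (possibly fractional) share of the sample for each stratum
    exact = {name: total_sample * size / population
             for name, size in stratum_sizes.items()}
    # Round down first, then give any leftover units to the largest remainders
    allocation = {name: int(share) for name, share in exact.items()}
    leftover = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical population counts, for illustration only
households = {"Dalit low-income": 2000, "Other low-income": 8000}
print(proportional_allocation(households, 500))
```

With these made-up counts, a sample of 500 households splits into 100 and 400. Whether 100 is enough for the analysis you have in mind is a separate question, and one for your statistician.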
Of course, stratified sampling isn’t just for health-related topics. In another example, you might want to investigate the effectiveness of a new teaching method in improving test scores among high school students, but the population of high school students is diverse in terms of grade level, academic ability, and socioeconomic status. In this case, stratified sampling could be used to make sure that the sample is representative of students in each subgroup of interest (e.g. in each grade, or in distinct socioeconomic groups, or by race or ethnicity).
When Stratified Sampling Doesn’t Make Sense
Stratified sampling is most useful when the population is not too homogeneous: the population needs to have subgroups for you to be able to theorize about them, and to design a sample that includes them. So if there aren’t any interesting subgroups in the population, stratified sampling won’t make sense.
Likewise, the population has to be big enough to allow you to do your subgroup sampling and still have an adequate representative sample. So even if there are interesting subgroups, you’ll have to check with your statisticians to see if there are enough people in each subgroup to stratify.
How Stratified Sampling is Done
The approach begins by dividing the population of interest into mutually exclusive strata that are homogeneous in some way that is relevant to the research question. The strata can be based on any characteristic that is important to the research question, such as age, gender, race, income, education level, or geographic location. The “mutually exclusive” part is key: if ethnicity defines your strata, for example, this won’t work if any respondent is counted in two ethnic groups. The sample size for each stratum is then set according to the proportion of the population that belongs to that stratum, and a random sample of that size is selected from each stratum. Within a stratum, this can be done using simple random sampling, where each member of the stratum has an equal chance of being selected, or by another method such as systematic sampling. (What is described here is proportional allocation; an alternative is equal allocation, where the same number of respondents is drawn from every stratum regardless of its size.)
So to list out the steps involved in stratified sampling:
Choose your strata: first you’ll need to look at the population you’re interested in, and decide which subgroups in it are important to your research or activity.
Figure out the size of each stratum: the relative size of each stratum’s sample is proportional to the share of the main population that belongs to that subgroup, and the absolute size must be sufficient to allow the desired statistical analyses (hint: hire a statistician right after reading this article).
Sample each stratum: as noted above, you can choose from a number of approaches to sampling each stratum, as long as you use the same method for each subgroup.
Those three steps — easier to write down than to do, of course! — will give you a dataset that allows you to answer the questions you’re hoping to answer.
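As a rough illustration, the three steps above might look like this in Python. This is a minimal sketch, not a Magpi feature: the function name `stratified_sample`, the student records, and the grade counts are all made up for the example, and it assumes you already have a complete list of the population with each member's stratum recorded.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, total_sample, seed=None):
    """Draw a proportionally allocated simple random sample from each stratum."""
    rng = random.Random(seed)

    # Step 1: choose your strata. Each unit lands in exactly one group,
    # so the strata are mutually exclusive.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for name, members in strata.items():
        # Step 2: size each stratum's sample by its share of the population.
        # Rounding means the total can be off by a unit or two in general.
        n = round(total_sample * len(members) / len(population))
        # Step 3: simple random sampling, without replacement, inside the stratum.
        sample[name] = rng.sample(members, min(n, len(members)))
    return sample


# Hypothetical school: 1,000 students spread unevenly over four grades
counts = {9: 400, 10: 300, 11: 200, 12: 100}
students = [{"id": f"{grade}-{i}", "grade": grade}
            for grade, n in counts.items() for i in range(n)]

chosen = stratified_sample(students, lambda s: s["grade"], total_sample=100, seed=42)
for grade, group in chosen.items():
    print(grade, len(group))
```

With these invented counts, a sample of 100 students breaks down as 40, 30, 20, and 10 across grades 9 to 12, mirroring each grade's share of the school.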
Pros and Cons of Stratified Sampling
Pros
One advantage of stratified sampling is that it can increase the precision and accuracy of sample estimates by reducing the variability of the sample data. This is because stratified sampling ensures that each subgroup is represented in the sample, which can reduce the effect of sampling error and increase the representativeness of the sample. If you’re not a statistician, suffice it to say that stratified sampling helps to make sure that your sample size is big enough to let you make conclusions about the various subgroups you’re interested in.
Another advantage of stratified sampling is that it can allow for comparisons between subgroups. By selecting a proportional sample from each stratum, researchers can compare the characteristics or outcomes of different subgroups within the population. This can provide insights into differences or similarities between the subgroups that can inform policy or interventions — in fact, this is often the main reason for the research.
Cons
Nothing is free, of course: stratified sampling is more complex, and can be more time-consuming, than other sampling methods (like simple random sampling). It requires you to know the population characteristics in advance (so you can have an idea which subgroups might be relevant to your research), and to be able to stratify the population accurately (if you’re stratifying based on ethnicity, for example, how will you determine the ethnicity of each respondent?). This can make stratified sampling more expensive and difficult to implement than other methods, particularly when the population is large and diverse.
Where Magpi Comes In
Magpi doesn’t dictate what kind of approach you take to sampling. Whether you’re planning on using convenience sampling, simple random sampling, stratified sampling, cluster sampling, or some other scheme, you’ll want to use a robust and well-tested electronic platform to collect the data (we’re hoping we don’t have to convince you of the benefits of mobile electronic data collection). Magpi’s been around almost twenty years, used by universities, corporations, UN agencies, the US and many other governments, and lots of small organizations looking to make their data collection dollar go farther. No other mobile data system has our decades-long record of reliability, ease of use, or support, and we hope you’ll consider it.
FAQ
Where is stratified sampling used?
Stratified sampling is used all over the world, for a variety of different topic areas, whenever the population of interest has important subgroups. So if you’re a biologist looking at the adaptation of Arctic populations to global warming, your subgroups might be different kinds of animals: polar bears, arctic foxes, etc. If you’re a public health doctor and interested in vaccination rates in Chile, you might be interested in looking at socioeconomic status subgroups: those with no education, primary education, etc.
What is the difference between cluster and stratified sampling?
Stratified sampling divides the sampled population into subgroups based on characteristics of interest that already exist within the entire population: socioeconomic status, age, ethnicity, political affiliation, etc. So for a political survey in the US, you might have three subgroups: Democrats, Republicans, and Independents, and you are interested in differences in responses among those three subgroups.
Cluster sampling divides the sampled population into clusters in order to reduce the cost of sampling. So if you are interested in what consumer products are popular in California towns and cities, you might make a list of the towns/cities and randomly pick some number of towns/cities from the list, and then just choose respondents from the randomly-chosen ones. This is in place of sampling from all the towns, and reduces your time and expense.
In general, stratified sampling increases costs but provides more information about existing subgroups, while cluster sampling decreases costs.
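To see the contrast in code, here is a small Python sketch of the cluster approach from the California example. Everything in it is hypothetical: the function name `cluster_sample`, the town names, and the resident lists are placeholders. Notice that whole towns are left out of the sample entirely, whereas in stratified sampling every stratum is sampled.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Two-stage cluster sample: pick clusters at random, then respondents within them."""
    rng = random.Random(seed)
    # Stage 1: randomly choose which clusters (towns) to visit at all
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: randomly choose respondents only inside the chosen clusters
    return {
        name: rng.sample(clusters[name], min(per_cluster, len(clusters[name])))
        for name in chosen
    }


# Hypothetical frame: 20 towns with 50 residents each
towns = {f"town_{t}": [f"town_{t}_resident_{r}" for r in range(50)]
         for t in range(20)}

visited = cluster_sample(towns, n_clusters=4, per_cluster=10, seed=1)
print(sorted(visited))  # only 4 of the 20 towns appear
```

The cost saving comes from stage 1: your data collectors travel to 4 towns instead of 20.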
Is stratified sampling biased?
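In general, stratified sampling reduces bias when properly implemented. For that to be true, the strata or subgroups in the sample need to be well-defined, and mutually exclusive. In addition, the strata must be sampled proportionally, as discussed above, or, if some strata are deliberately over- or under-sampled, the results must be weighted in the analysis to reflect each stratum’s true share of the population.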
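When strata are not sampled proportionally (say a small subgroup is oversampled so there are enough respondents to analyze), a common remedy is to weight each stratum's result by its share of the population. The Python sketch below shows the idea for a simple average; the function name `stratified_mean` and all the numbers are invented for illustration, and a real analysis should be set up with a statistician.

```python
def stratified_mean(stratum_sizes, stratum_samples):
    """Population-weighted mean: each stratum's sample mean counts in
    proportion to that stratum's share of the whole population."""
    population = sum(stratum_sizes.values())
    return sum(
        (stratum_sizes[name] / population) * (sum(values) / len(values))
        for name, values in stratum_samples.items()
    )


# Hypothetical data: group B is 10% of the population but was oversampled
sizes = {"A": 900, "B": 100}
samples = {"A": [10, 12, 14], "B": [20, 22, 24, 26, 28, 30]}

pooled = [v for values in samples.values() for v in values]
print("unweighted:", round(sum(pooled) / len(pooled), 2))
print("weighted:  ", round(stratified_mean(sizes, samples), 2))
```

In this made-up example the unweighted average is pulled toward the oversampled group (about 20.7), while the weighted estimate (about 13.3) reflects the fact that group A makes up 90% of the population.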
Conclusion and Further Reading
Hopefully, this brief intro has given you a basic understanding of what stratified sampling is, and of its strengths and weaknesses. Want to learn more? Here are a few good resources: