Random samples—what to do when yours aren't: data analysis, part three

Philanthropy Daily


June 15, 2020

Random sampling is hard to achieve in human studies. That doesn’t make such studies useless, but it does limit what they can tell you.

The third in a three-part series on how nonprofits can use data to aid their fundraising.

This is part three of a series of short columns on problems in data analysis. I will use the lens of COVID-19-related data and interpretation (and misinterpretation) to shed some light on how nonprofit fundraising operations can avoid similar data-driven issues. See parts one and two.

Imagine you take an anonymous poll of your friends. You ask them two simple questions: Am I funny? Am I a good person?

You are relieved when you get the results: your friends rate you highly on a 10-point scale on both questions! You decide to put away some of that crippling self-doubt you’ve always had.

But one friend who is a statistician (and an odd guy generally) calls you up to lodge a complaint. He says that you can’t really figure out whether you are funny or good from the poll, since you only polled your friends.

Your friends, of course, are much more likely to say nice things about you—even anonymously—because they are your friends. Otherwise they might not be your friends! And even amongst your friends, those who secretly dislike you or think you are not funny are probably much less likely to answer the poll than those with genuine affection for you.

If you were to poll a truly random sample of people you have met in person, not just your friends, you might reach the opposite conclusion about your likability. But capturing such a random set of acquaintances is extremely difficult, so, other than distancing yourself from your awkward statistician friend, it’s unclear what you should do.

RANDOM SAMPLES IN HUMAN STUDIES

The same difficulty of obtaining a random sample turns up in nearly every use of data about people.

The lack of random sampling was at the core of the confusion at the beginning of the COVID outbreak. It was impossible to say how many people had the virus, because only people with symptoms were permitted to get tests. As a result, the percentage of tests that came back positive was probably dramatically higher than the prevalence of the virus in the general population.
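For readers who want to see the arithmetic, here is a minimal Python sketch of why testing only symptomatic people pushes the positive rate above the true prevalence. It assumes a perfectly accurate test, and every number in it (a 2% true prevalence, the chance of having symptoms with and without the virus) is hypothetical, chosen only for illustration.

```python
def expected_test_positivity(prevalence, p_symptoms_if_infected, p_symptoms_if_not):
    """Expected share of positive tests when only symptomatic people get tested.

    Hypothetical simplification: the test is assumed to be perfectly accurate.
    """
    # Fraction of the whole population that is infected AND symptomatic (tests positive).
    infected_and_tested = prevalence * p_symptoms_if_infected
    # Fraction that is NOT infected but has look-alike symptoms (tests negative).
    uninfected_and_tested = (1 - prevalence) * p_symptoms_if_not
    return infected_and_tested / (infected_and_tested + uninfected_and_tested)


# Hypothetical inputs: 2% of people infected, 60% of them symptomatic,
# 5% of uninfected people with similar symptoms (a cold, allergies).
positivity = expected_test_positivity(0.02, 0.60, 0.05)
print(f"True prevalence: 2.0%  Test positivity: {positivity:.1%}")
```

With these made-up inputs, about 19.7% of tests come back positive even though only 2% of people are infected. If symptoms were equally likely with or without the virus, positivity would equal prevalence; the gap comes entirely from who gets tested.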

One effort to get around that sampling problem was to try to establish just how far the virus had spread by running seroprevalence studies, which use tests that show whether a person had COVID in the past (so-called “antibody tests”).

This was a great idea in theory, but in practice, getting a random sample still proved difficult. One now infamous study (more on that in a future column) attempted to estimate seroprevalence in California by running Facebook ads to recruit people to take the test. But as critics pointed out, if you’ve been perfectly healthy for the past few months, wouldn’t you be less likely to answer the ad than someone who suffered mild COVID-like symptoms during the same period? If so, the sample wouldn’t be truly random, and it would produce an inflated estimate of the number of people who had already contracted COVID-19.
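A small simulation makes the critics’ argument concrete. This is a hypothetical Python sketch, not a model of the actual California study: it assumes a 4% true rate of past infection and assumes that people who were infected answer the ad 10% of the time versus 3% for everyone else (all invented numbers). Because answering the ad depends on having been sick, the share of volunteers who test positive overstates the true rate.

```python
import random


def simulate_self_selected_study(population_size, true_prevalence,
                                 p_respond_if_infected, p_respond_if_not, seed=0):
    """Estimate prevalence from the people who chose to answer an ad.

    All parameters are hypothetical; response depends on past infection,
    which is exactly what makes the sample non-random.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    respondents = 0
    positives = 0
    for _ in range(population_size):
        infected = rng.random() < true_prevalence
        p_respond = p_respond_if_infected if infected else p_respond_if_not
        if rng.random() < p_respond:
            respondents += 1
            if infected:
                positives += 1
    return positives / respondents


estimate = simulate_self_selected_study(
    population_size=100_000,
    true_prevalence=0.04,        # hypothetical true rate of past infection
    p_respond_if_infected=0.10,  # hypothetical: the once-sick are keener to get tested
    p_respond_if_not=0.03,       # hypothetical: the healthy mostly ignore the ad
)
print(f"True prevalence: 4.0%  Estimate from volunteers: {estimate:.1%}")
```

With these inputs the estimate should land near 12% (the expected value is 0.004 / 0.0328, about 12.2%), roughly three times the true rate. Setting both response probabilities equal removes the bias, which is what a genuinely random sample would achieve.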

RANDOM SAMPLES IN NONPROFIT DATA

Sampling problems come up frequently in nonprofits, too. If you are a grantmaking foundation and you survey your grantees on how difficult they found the application process, you’ll likely get much rosier results than if you also surveyed the applicants who didn’t make the cut.

If your nonprofit administers a donor survey, you will hopefully receive a lot of positive feedback, but that feedback may partly reflect selection bias. Those with extremely positive opinions of you are the most likely to answer, and those with extremely negative opinions of you are unlikely to be on your donor list to begin with!
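The nonresponse half of this problem can be sketched with simple expected-value arithmetic. In this hypothetical Python example, a donor list of 1,000 people has the satisfaction scores shown, and each donor’s chance of answering is assumed to rise with satisfaction (2 percentage points per point on the scale). Both the scores and the response rule are invented for illustration, and the sketch does not model the second effect described above: unhappy people dropping off the list before the survey is ever sent.

```python
def mean_score(counts):
    """Average score from a {score: number_of_people} mapping (counts may be fractional)."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total


def expected_respondent_mean(counts, response_rate):
    """Average score among respondents, in expectation, given a response-rate rule."""
    expected_responses = {score: n * response_rate(score) for score, n in counts.items()}
    return mean_score(expected_responses)


# Hypothetical donor list: satisfaction score (1-10) -> number of donors.
donors = {2: 100, 4: 200, 6: 300, 8: 250, 10: 150}


def response_rate(score):
    # Hypothetical assumption: happier donors are more likely to answer (2% per point).
    return 0.02 * score


print(f"All donors:  {mean_score(donors):.2f}")
print(f"Respondents: {expected_respondent_mean(donors, response_rate):.2f}")
```

The full list averages 6.3, while the expected respondents average about 7.2, even though nobody in this sketch answers dishonestly. The inflation comes purely from who chooses to reply.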

To take a less obvious example, many surveys offer an incentive to encourage participation: “take this survey and be entered to win a new iPad,” for instance. Incentives like this do boost participation, but they can also introduce bias. Someone who really wants an iPad, and for whom buying one would be a major purchase, is much more likely to answer than wealthier people who might already own an iPad or have no interest in acquiring one.

This is not to say that such surveys and studies are worthless. It’s just that when conducting surveys or studies of individuals, it pays to ask basic questions about your sample before you start drawing conclusions from your results.

Who is most likely to answer this survey, and why? What are their motives? Who won’t answer it? If someone holds strongly negative views about the topics you are interested in, will they come forward and say so? Could I offer this survey in a different format, or through a different method, to get more representative responses?
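One practical way to start answering “Who won’t answer it?” is to compare response rates across groups you already know about from your own records, such as giving level or donor status. Here is a minimal Python sketch, assuming a hypothetical list of (segment, responded) pairs exported from your donor database; the segment names and data are made up.

```python
from collections import Counter


def response_rates_by_group(records):
    """Share of invited people in each group who answered the survey.

    records: iterable of (group, responded) pairs, where responded is True/False.
    """
    invited = Counter()
    answered = Counter()
    for group, responded in records:
        invited[group] += 1
        if responded:
            answered[group] += 1
    return {group: answered[group] / invited[group] for group in invited}


# Hypothetical export: each tuple is one invited donor.
survey_log = [
    ("monthly donor", True), ("monthly donor", True), ("monthly donor", False),
    ("lapsed donor", False), ("lapsed donor", False),
    ("lapsed donor", True), ("lapsed donor", False),
]
for group, rate in response_rates_by_group(survey_log).items():
    print(f"{group}: {rate:.0%} responded")
```

Here about two-thirds of monthly donors answered versus a quarter of lapsed donors. When one group replies far more often than another, the overall results will lean toward that group’s views, which is a signal to read the numbers with care or to reach the quieter group another way.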

As in all aspects of data gathering, a critical eye will help you interpret the data better and spend your limited time and resources for surveys or focus groups on finding true answers to the questions about your organization that keep you up at night.


MORE FROM Matthew Gerken: