In a study that demonstrated a commendable commitment to scientific progress, a researcher hid in a bathroom stall for hours and counted the number of times people washed their hands. The reason? People may exaggerate if you ask them, and they are more likely to wash their hands if they see you watching, which was, incidentally, the precise hypothesis the researcher wanted to test.
Collecting good field data is challenging, not least for hygiene behaviours. It is even more challenging during a global pandemic – when there is great urgency, but travel restrictions make in-person quality control more difficult. How can you conduct this vital research when it is so hard to collect the necessary data?
Our first piece of advice is to partner with a fantastic implementing organisation. We worked with BRAC in Bangladesh to test ways to increase handwashing. Surveys can be slow and unreliable, and hidden observers are expensive, so we attached tally counters (‘clickers’) to the soap pedals of foot-operated public handwashing stations. Weekly checks by BRAC staff allowed us to estimate that one of our interventions led to an additional 100,000 handwashes with soap – read more about it here. It was a novel solution, but how could we be sure it really worked? In this blog post, we outline what we did to make sure that our soap pedal tally counts reflected real behaviour change.
Lesson 1: Pilot your data collection method in advance
We piloted a number of key parts of our experiment in Bangladesh, including our data collection method (you can read more about the piloting in this blog post). Findings from the pilot allowed us to address some teething issues before we launched the experiment at scale. For example, we zip-tied plastic bags over the clickers to prevent them rusting from rainwater. We also clarified instructions for the station caretakers after realising some were diligently resetting the count to zero every day, which had given us some very odd results!
Figure 1: After the pilot, BRAC’s engineers covered the tally counters (‘clickers’) used to measure station usage in plastic bags to prevent rusting from rainwater
Lesson 2: Do independent data checks early in the process to identify if you have a problem with data quality
There’s only so much you can do by looking at pilot data to judge whether it looks accurate. We therefore worked with the BRAC Institute of Governance and Development (BIGD) to get a second independent read on the clickers once the trial was underway. Out of the 1000 handwashing stations built by BRAC, BIGD enumerators visited 100 randomly selected stations and recorded the clicker numbers for two days.
These ‘spot checks’ revealed a problem: they matched the clicker data we were receiving from caretakers and field staff less than 15% of the time. Working with BRAC and BIGD, we investigated several possible explanations for this discrepancy, from transcription errors to estimated data to mixed-up station labels, but could not pin down its source. Thanks to the spot checks, however, we knew we had to cross-check the results against other data sources.
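For readers planning a similar check, here is a minimal Python sketch of the two steps described above: drawing a random subset of stations to visit, then computing how often the independent readings agree with the reported ones. The function names, the `tolerance` parameter and the example readings are all hypothetical and purely illustrative; they are not our actual data or code.

```python
import random


def sample_stations(station_ids, k, seed=0):
    """Draw k stations uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return sorted(rng.sample(list(station_ids), k))


def match_rate(reported, spot_checked, tolerance=0):
    """Share of spot-checked stations whose reported reading is within
    `tolerance` clicks of the independent reading."""
    common = [s for s in spot_checked if s in reported]
    if not common:
        raise ValueError("no stations appear in both data sources")
    matches = sum(
        abs(reported[s] - spot_checked[s]) <= tolerance for s in common
    )
    return matches / len(common)


# Hypothetical readings, keyed by station ID (illustrative numbers only).
reported = {1: 120, 2: 340, 3: 95, 4: 410}
spot_checked = {1: 120, 2: 355, 3: 99, 4: 500}

to_visit = sample_stations(range(1, 1001), k=100)
print(len(to_visit))                              # 100
print(match_rate(reported, spot_checked))         # 0.25 (exact matches only)
print(match_rate(reported, spot_checked, 5))      # 0.5 (within 5 clicks)
```

Allowing a small tolerance is a design choice: a few handwashes may happen between the two readings, so an exact-match rule can overstate the problem.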
Figure 2: A worksheet completed by a station caretaker to record station usage
Lesson 3: Collect data on proxy measures for the primary outcome which aren’t vulnerable to the same data quality risks as your main measure
Although we believed that the clickers were the single best way of measuring handwashes at the stations, we knew the data collection process was labour intensive and could be susceptible to human error. We had therefore identified other data sources that, though less ideal in many ways, were not susceptible to the same vulnerabilities as the clicker data.
One such data source was the records kept by district managers of the amount of soap powder sent to each handwashing station each week. This soap distribution data was a less direct measure of handwashing, but it served as a useful proxy since the people involved in recording it, and the incentives around doing so accurately, were different from those for the clicker data. Any noise in this data was therefore likely to be uncorrelated with noise in the clicker data.
Another set of proxy data was collected by the BIGD enumerators who did the spot checks. To make the most of their time, we also asked them to conduct surveys and observations near the handwashing stations. This included asking passersby how often they had used the station in the past week. While self-reports aren’t ideal, this data served as a good proxy for the same reasons as the soap data.
These proxy measures showed similar results to the clicker data: that a high-intensity intervention including active promotion and free soap and mask giveaways increased handwashing, but only while it was delivered. Soap distribution was about 10% higher at the high-intensity stations, compared to the control and low-intensity stations, in the first three weeks of the trial, but not afterwards. Similarly, self-reported use of the stations was significantly higher at the high-intensity stations than at the other stations towards the end of the first three-week period. This convergence of findings across three distinct data sources gave us more confidence in the findings from the clicker data.
Figure 3: The primary analysis suggested the high-intensity intervention generated a 16% increase in station usage, but only for the first 3 weeks (while the intervention was actively delivered)
Figure 4: Soap distribution records kept by district managers showed a short-term spike in amount of soap distributed to the high-intensity stations which subsequently faded out
Figure 5: When people passing by the stations were asked whether they had used the stations at the end of week 3, a significantly higher proportion reported doing so at the high-intensity intervention sites
Lesson 4: Identify signs of unreliable data, and check the robustness of results when you exclude these observations
While the soap distribution data and survey data reinforced our main finding, we still weren’t certain about the size of the effect we observed in the clicker data. Since we couldn’t verify each clicker reading individually, we identified several signs that might indicate a problem at a specific station. These included:
- The last digit on the readings having an unusual distribution – e.g. too many 5s
- The change in the readings having an unusual distribution – e.g. too many 10s
- The readings across the three basins of the station moving unusually closely together
We identified 8 such signs in total, and then re-ran our analysis after excluding stations with data that appeared problematic according to each sign. This exercise suggested that the boost in handwashing caused by the high-intensity intervention was somewhere between 11% and 24%, as compared to the 16% estimate obtained from the full dataset. This range was narrow enough to give us confidence that the high-intensity intervention did in fact have an effect of roughly the size suggested by the full dataset.
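To make this kind of robustness check concrete, the Python sketch below flags stations using one of the signs (an unusual last-digit distribution) and then re-estimates the effect with those stations excluded. It rests on several assumptions: the 0.4 threshold is arbitrary, the effect is a simple difference in mean usage rather than our actual analysis, and all names and numbers are hypothetical.

```python
from collections import Counter


def last_digit_share(readings):
    """Largest share taken by any single final digit.
    Roughly 0.1 would be expected if final digits were uniform."""
    digits = Counter(abs(int(r)) % 10 for r in readings)
    return max(digits.values()) / len(readings)


def flag_stations(readings_by_station, max_share=0.4):
    """Stations where one final digit is suspiciously common.
    The 0.4 threshold is an illustrative assumption."""
    return {
        station
        for station, readings in readings_by_station.items()
        if last_digit_share(readings) > max_share
    }


def percent_effect(usage, arm, exclude=frozenset()):
    """Percent difference in mean usage, high-intensity vs control,
    ignoring any station listed in `exclude`."""
    high = [u for s, u in usage.items() if arm[s] == "high" and s not in exclude]
    ctrl = [u for s, u in usage.items() if arm[s] == "control" and s not in exclude]
    mean_high = sum(high) / len(high)
    mean_ctrl = sum(ctrl) / len(ctrl)
    return 100 * (mean_high - mean_ctrl) / mean_ctrl


# Hypothetical cumulative clicker readings for four stations.
readings = {
    "A": [105, 215, 325, 435, 545],  # every reading ends in 5
    "B": [101, 213, 327, 438, 542],
    "C": [100, 192, 281, 377, 463],
    "D": [110, 204, 299, 386, 471],
}
arm = {"A": "high", "B": "high", "C": "control", "D": "control"}
usage = {"A": 130, "B": 115, "C": 100, "D": 100}  # hypothetical weekly usage

flagged = flag_stations(readings)
print(sorted(flagged))                                   # ['A']
print(round(percent_effect(usage, arm), 1))              # 22.5
print(round(percent_effect(usage, arm, flagged), 1))     # 15.0
```

Repeating the last two lines once per sign gives a range of estimates, which can then be compared with the full-sample estimate.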
Collecting good data can be hard, particularly during a pandemic, but good data is essential for robust, causal evaluations. We hope that the lessons above are useful when planning your next evaluation. If you’d like to work with us to develop and test innovative approaches to thorny problems, please get in touch.