Embracing insights from many sources

Randomised Controlled Trials (RCTs) often represent the best, fastest and most statistically straightforward way of determining whether a new intervention works. This is why RCTs are central to what we do at the Behavioural Insights Team. We can’t be sure that an intervention is going to translate from a lab study to the field, or from one context to another, so we need to establish, as quickly as possible, both what works and what doesn’t.

Over the last six years, we have conducted more than 300 randomised trials. We are very proud of the work that we have done to normalise this practice, particularly the trials carried out in areas of policy, like social work, where this would often not be considered possible. However, the use of RCTs is frequently criticised. While many of these criticisms are overblown, less applicable to a policy environment than to an academic one, or just plain invalid, some do hold water.

Angus Deaton and others have criticised RCTs for failing to test theories or to explain the underlying mechanisms that drive their findings, and this criticism is fair. We also find ourselves in a methodologically tight spot when it comes to average effects. On one hand, the replication crisis in psychology is rightly leading us to be more cautious about running large numbers of statistical tests, since this can produce false positives (results that appear significant but are in fact due to chance). On the other hand, Deaton and others criticise the lack of nuance in many RCT findings, which report average effects and do not show how those effects vary with participant characteristics. Looking only at average effects might lead us to ignore the real side effects experienced by some people, or to wrongly conclude that an intervention designed, for example, to help only the worst off in society has no effect at all.

Average effects might also miss the variation caused by an intervention being delivered well or poorly, which can dramatically alter its effectiveness. As evidence produced by the Education Endowment Foundation shows, teaching assistants in schools can be deployed in a way that is effective at increasing student performance, or in an ineffective way that actively causes harm.
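
To put a rough number on the concern above about running large numbers of statistical tests, the short Python sketch below computes the chance of seeing at least one false positive when several tests are run and no true effect exists. It assumes the tests are independent and each uses the same significance level; both are simplifying assumptions made for illustration, and the function name is our own.

```python
# Chance of at least one false positive across k tests, each run at
# significance level alpha, when no true effect exists.
# Assumption (illustrative only): the k tests are independent.
def familywise_error_rate(alpha: float, k: int) -> float:
    # Probability that all k tests correctly show nothing is (1 - alpha) ** k,
    # so the chance that at least one misfires is the complement.
    return 1.0 - (1.0 - alpha) ** k


for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

Under these assumptions, testing 20 subgroups at the conventional 5 per cent level gives roughly a two-in-three chance of at least one spurious "finding", which is why unplanned subgroup analysis deserves caution.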

Proponents of evidence-based policy, as well as scientists, cannot simply brush these criticisms aside, but nor should we panic. Instead, we should continue to run RCTs, while acknowledging these criticisms where they are valid, and looking to other fields and methodologies to help us to overcome them. In the last nine months, we have embraced two such methodologies as part of this pursuit, located at opposite ends of a spectrum of research methods.

Firstly, we have begun to systematise the use of qualitative research methods as part of our randomised trials – conducting interviews with both service users and professionals. In doing so, we can try to uncover whether an intervention is being implemented as we intended, unpick how participants react to an intervention and uncover mechanisms that we might otherwise have missed.

Our findings from this qualitative research not only provide us with a more detailed picture of each trial but also enable us to think about which interventions might be effective in different contexts. For example, interviews with participants in our recent ‘Study Supporter’ trial showed that they would have been more likely to sign up if they’d been given clearer information, and that they often found it difficult to initiate conversations with their supporters. Both of these lessons have been incorporated into the design of a new trial that is currently in the field, in which we have seen signups increase from 35 to 50 per cent.

The second methodology we have begun to use much more commonly is what is widely described as data science. Particularly promising is the use of machine learning techniques in conjunction with our randomised trials. These techniques allow us to robustly identify subgroups for which an intervention is effective, ineffective or harmful. Using machine learning in this way helps to overcome the criticism that RCTs do not look at how an intervention affects different types of participants. Cross-validation, in which findings are checked against data that was not used to produce them, goes some way to avoiding false positives and also improves how applicable our findings are to other situations.
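
The idea of identifying a subgroup and then checking it against unseen data can be sketched in a few lines of Python. This is a minimal illustration rather than a description of the models we use: the data are simulated, the subgroup names and function names are hypothetical, a plain comparison of means stands in for a machine learning model, and a single hold-out split stands in for full cross-validation.

```python
import random
from statistics import mean


def simulate_trial(n, seed):
    """Simulated (not real) trial: treatment adds 1.0 to the outcome,
    but only for the hypothetical 'low_income' subgroup."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.choice(["low_income", "middle_income", "high_income"])
        treated = rng.random() < 0.5  # random assignment, as in an RCT
        effect = 1.0 if (treated and group == "low_income") else 0.0
        rows.append((group, treated, rng.gauss(0.0, 1.0) + effect))
    return rows


def subgroup_effects(rows):
    """Treated-minus-control difference in mean outcome, per subgroup."""
    effects = {}
    for g in sorted({grp for grp, _, _ in rows}):
        treated = [y for grp, t, y in rows if grp == g and t]
        control = [y for grp, t, y in rows if grp == g and not t]
        effects[g] = mean(treated) - mean(control)
    return effects


rows = simulate_trial(6000, seed=1)
random.Random(2).shuffle(rows)
discovery, holdout = rows[:3000], rows[3000:]

# Step 1: search for the most promising subgroup using the discovery half only.
found = subgroup_effects(discovery)
best = max(found, key=found.get)

# Step 2: estimate that subgroup's effect on data that played no part in
# choosing it, so a chance pattern in step 1 is unlikely to be repeated.
confirmed = subgroup_effects(holdout)[best]
print(best, round(found[best], 2), round(confirmed, 2))
```

Because the subgroup is chosen on one half of the data and measured on the other, an effect that appeared only by chance in the discovery half should shrink towards zero in the hold-out half, whereas a real effect should persist.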

While we continue to develop more and more sophisticated RCTs, we have invested in developing new expertise and building new research functions within our team. This methodological pluralism – accepting not just ideas but also ways of testing those ideas from different fields and disciplines – is an essential part of our research philosophy here at BIT. We believe it represents our best chance of achieving the maximum social impact from our work.

Authors

Michael Sanders

Jessica Heal