
Data science at BIT – first year report

14th Dec 2017

This morning sees the publication of BIT’s first data science report. It marks the culmination of twelve months of work by our data science team, which was inaugurated in January 2017. The team have worked across policy areas from education to health to children’s social care and road safety, and made use of techniques including gradient boosted trees, natural language processing and causal machine learning.

There is no doubt that data science can be hugely beneficial as a tool for government, and that machine learning can be put to a more practical use than beating its own previous iterations at chess. The challenge, then, is to find ways to quickly identify problems that can benefit from data science, bring data together, and deliver practical insights without delay.

As behavioural scientists, we know that it’s very easy to fall foul of behavioural biases in planning, and we could easily have spent this entire year just getting started. With this in mind, we’ve tried to apply behavioural insights to ourselves (including a few tricks from our colleagues Owain and Rory’s book). As with the Behavioural Insights Team itself, the data science team was set up with a sunset clause – we have had one year to demonstrate the value of data science in policy. We also set a concrete goal – to produce three exemplar projects in that time, showing practically what could be done with data science.

Operating under this model, we’ve focused our energies on projects that are shovel-ready, and continued with BIT’s goal of radical incrementalism – taking on projects where a small change, for example changing the inspection regime for schools, could make a disproportionate difference. As John Manzoni, the Chief Executive of the Civil Service, says in the foreword to the report, the full potential of data science will not be realised overnight – but we’ve tried to demonstrate what can be done with a small team operating at a fast pace for a year.

The results in the report lead us to think that, broadly, we have succeeded. In the last eleven months we’ve completed eight exemplars, ranging from predicting which schools or GP practices are most likely to fail an Ofsted or CQC inspection, to using causal machine learning to try to identify which groups benefit most from interventions in an RCT with King’s College London.

Our most successful projects so far are those that focus on increasing the effectiveness and efficiency of inspections. We found that 65 per cent of ‘requires improvement’ and ‘inadequate’ schools were within the 10 per cent of schools identified as highest risk by our model. When we widened the group to the riskiest 20 per cent, the model captured 87 per cent of these schools.
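For readers who want to see how a figure like this is calculated, here is a minimal sketch in Python. It is an illustration only, not the code used for the report: the function name `capture_rate` and the toy scores and outcomes are hypothetical. It ranks units by model risk score, flags the top fraction, and reports what share of all the units that actually failed fall inside the flagged group.

```python
from typing import Sequence


def capture_rate(
    risk_scores: Sequence[float],
    failed: Sequence[bool],
    top_fraction: float,
) -> float:
    """Share of all failing units found among the highest-risk `top_fraction`.

    Hypothetical helper for illustration; not taken from the BIT report.
    """
    if len(risk_scores) != len(failed):
        raise ValueError("risk_scores and failed must have the same length")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    total_failed = sum(failed)
    if total_failed == 0:
        raise ValueError("no failing units, so the capture rate is undefined")

    # Number of units to flag, with at least one always flagged.
    n_flagged = max(1, round(len(risk_scores) * top_fraction))

    # Rank units from highest to lowest predicted risk.
    ranked = sorted(zip(risk_scores, failed), key=lambda pair: pair[0], reverse=True)

    # Count how many of the flagged units actually failed.
    captured = sum(1 for _, did_fail in ranked[:n_flagged] if did_fail)
    return captured / total_failed


# Toy data (made up): ten schools, three of which failed inspection.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_failed = [True, False, True, False, False, True, False, False, False, False]

for fraction in (0.1, 0.2, 0.3):
    rate = capture_rate(toy_scores, toy_failed, fraction)
    print(f"Top {fraction:.0%} of schools by risk captures {rate:.0%} of failures")
```

Read this way, the 65 per cent figure above is the value of this measure at a top fraction of 0.1, and the 87 per cent figure is its value at 0.2.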

Other projects have been more ambitious. A project that uses natural language processing to predict social work case escalation in one local authority puts a huge range of information – the text that social workers put into their assessments – to uses that would not have been possible only a few years ago. We’re now working with that local authority on how to put the insights from data science at the fingertips of social workers as they go about their work. Information on how we’re attempting this – and the other exemplar projects we’ve completed this year – can be found in the report.

In the next twelve months we’re hoping to build on this promising work, tackling new predictive problems in new policy areas, and to begin to see some of the applications of data science really put into practice. If you have a problem that you think might benefit from data science, please get in touch!