Developing machine learning algorithms and data science platforms to understand and improve air quality in London

United Kingdom
Published 13/04/2023 · Last updated 20/04/2023

By utilising city-wide air quality sensors, this project is developing machine learning algorithms and data science platforms to understand and improve air quality in London.


Problem or opportunity

Air quality in London has improved in recent years due to policies to reduce emissions, primarily from road transport. However, significant areas still exceed the EU Limit Values for NO2. Poor air quality is a recognised threat to health, with an estimated 9,000+ Londoners dying early every year. Similar issues affect most cities across the UK and Europe.



This work will bring data from a wide range of networks together in a single place for analysis, feed measurements from sensors of varying quality into air quality models, and monitor the effectiveness of the different interventions planned across London. It will present the best estimates and forecasts in a form that app and web developers can use to inform Londoners, and will identify low-pollution routes for Londoners to follow when walking, cycling or running through the city.

The project researchers are developing machine learning algorithms, statistical methodology and data science platforms to understand and improve air quality over the city of London. By integrating heterogeneous, varying-fidelity sensors into an overall real-time monitoring network, the project will develop state-of-the-art machine learning models for high-resolution air quality forecasting and change-point detection. This will help establish the most effective places to site future sensors and inform policy, enabling targeted interventions that reduce pollution levels in critical areas and at crucial times. These goals will be complemented by the parallel development of APIs and mobile apps providing reliable, frequently updated and highly localised air quality data and forecasts for Londoners. Graph optimisation algorithms will be developed that use the air quality forecasts to find less polluted routes for people walking, running and cycling around London's streets; the algorithms will be analysed for their complexity, efficiency and practicality.
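The routing idea can be sketched with a standard shortest-path search over a street graph whose edge weights are forecast pollution exposure rather than distance. This is an illustrative sketch only: the toy graph, node names and exposure values are invented here, and the project's actual algorithms may differ.

```python
import heapq

def least_polluted_route(graph, source, target):
    """Dijkstra's search where each edge weight is the forecast pollution
    exposure along that street segment (e.g. NO2 concentration x length),
    so the returned path minimises total exposure, not total distance."""
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry
        visited.add(node)
        if node == target:
            break
        for neighbour, exposure in graph.get(node, []):
            nd = d + exposure
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(queue, (nd, neighbour))
    # Walk back from target to source to reconstruct the route.
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[target]

# Hypothetical street graph: node -> [(neighbour, forecast exposure)].
streets = {
    "A": [("B", 5.0), ("C", 2.0)],
    "B": [("D", 1.0)],
    "C": [("B", 1.0), ("D", 6.0)],
    "D": [],
}
```

Here `least_polluted_route(streets, "A", "D")` prefers the detour A→C→B→D (total exposure 4.0) over the direct A→B→D (6.0), which is exactly the trade-off a low-pollution router makes.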



A revolution is happening in air quality monitoring. Traditionally, a relatively small number of reference-quality sensors are used, followed by a period of modelling to create a London-wide snapshot. With the proliferation of increasingly affordable air quality sensors, it is now possible to monitor air pollution at thousands of locations in a city, greatly enhancing our ability to target and prioritise planned interventions. Increasingly, companies, non-profit organisations, community groups and individuals also want to monitor the air and invest in sensors.

This project develops machine learning algorithms, data science platforms and statistical methodology to integrate data and air pollution measurements from these heterogeneous sources, producing better estimates and more accurate forecasts of air pollution across London. Given these hyper-local estimates and their associated uncertainty, the group develops algorithms and optimisation techniques to inform citizens and to help design and evaluate government policy.

A complete cloud-based compute system has been developed, with our big data stored in Azure. APIs have been developed and are currently being refined. Multiple papers have been published at top AI venues.

Our data sources include satellite observations from the European Centre for Medium-Range Weather Forecasts (ECMWF), which measure air pollution globally. We combine this satellite information with ground-sensor data from the London Air Quality Network, which measures multiple pollutants every few minutes. Beyond sensing the pollution components themselves, we also want to capture the urban factors that contribute to pollution: we collect traffic data by combining 900 traffic cameras with 11,000 SCOOT loop detectors operated by Transport for London, which count passing vehicles. We retrain our models daily on dynamic data pulled from these APIs to forecast air pollution hourly.
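The daily retraining step can be sketched as a join of the hourly feeds on timestamp followed by a model fit. The feed names, timestamps and the one-feature least-squares model below are placeholders standing in for the project's real schema and ML models, purely to show the shape of the pipeline.

```python
def assemble_training_rows(satellite, ground, traffic):
    """Join three hourly feeds (dicts mapping timestamp -> value) into
    training rows; keeps only hours present in all three feeds."""
    rows = []
    for ts, no2 in ground.items():
        if ts in satellite and ts in traffic:
            rows.append({"ts": ts, "satellite": satellite[ts],
                         "traffic": traffic[ts], "no2": no2})
    return rows

def retrain_daily(rows):
    """Refit a toy linear model no2 ~ a * traffic + b by least squares.
    A stand-in for the project's models, refit on each day's fresh data."""
    n = len(rows)
    mean_x = sum(r["traffic"] for r in rows) / n
    mean_y = sum(r["no2"] for r in rows) / n
    cov = sum((r["traffic"] - mean_x) * (r["no2"] - mean_y) for r in rows)
    var = sum((r["traffic"] - mean_x) ** 2 for r in rows)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Toy hourly feeds keyed by hour; all values are illustrative.
satellite = {"2023-04-13T10": 0.8, "2023-04-13T11": 0.9}
ground = {"2023-04-13T10": 60.0, "2023-04-13T11": 70.0}
traffic = {"2023-04-13T10": 100.0, "2023-04-13T11": 120.0}
```

In production the dicts would be replaced by fresh API pulls each day, and the fitted model would serve the next day's hourly forecasts.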

Kubernetes executes the scheduled tasks and hosts the API access points for direct data collection. Deployment is controlled by Terraform, which provisions each component of the processing pipeline.