REGIONAL COVID-HUB: COVID-19 data and e-services platform for Wielkopolska region
The Institute of Bioorganic Chemistry of the Polish Academy of Sciences in Poznan (IBCH PAS), together with the affiliated Poznan Supercomputing and Networking Center (PSNC), serves as a national hub for the European COVID-19 Data Platform, which supports the scientific community by accelerating SARS-CoV-2 and COVID-19 research through rapid, open-access data sharing. To facilitate access to expert-verified information and data describing the state of the COVID-19 pandemic in the Wielkopolska region, IBCH PAS and PSNC transferred the best practices of the European initiative and launched REGIONAL COVID-HUB, a digital platform focused on the Wielkopolska region that offers various services and AI-based tools together with open and restricted access to COVID-19-related data. The platform can be used by end users with different backgrounds: from research and medical institutions and experts in biology, chemistry, and computer science, through public administration, to the general public.
Currently REGIONAL COVID-HUB offers access to the following services:
- an interactive map of the COVID-19 pandemic in Wielkopolska built on genomic SARS-CoV-2 data (Nextstrain). The software reconstructs the phylogenetic tree and allows it to be analysed to track the spatial spread of different viral variants. Based on the evolutionary relationships between samples, one can map how the outbreak has unfolded and where the virus has been. The viral spread can be followed backward and forward on the animated map and analysed in the context of historical events, e.g. to monitor the efficiency of policies or the effectiveness of protective measures;
- COVID-19 dashboard for visualization of the state and dynamics of the COVID-19 pandemic in individual administrative divisions of Wielkopolska, providing epidemiological statistics and information about virus evolution (based on genetic screening of samples and on wastewater monitoring);
- e-service for interactive processing, analysis and visualization of data related to the COVID-19 pandemic in Wielkopolska, e.g. automatic analysis of medical imaging data using AI/ML and automatic analysis of genomic data;
- e-service for remote communication and tele-consultation – provides support for the remote work of scientific and crisis teams established to fight the COVID-19 pandemic;
- dedicated storage space for collecting and sharing COVID-19 biomedical and epidemiological data (according to the FAIR principles) together with open/public documents for institutions and citizens in the Wielkopolska region. Platform users can manage data by creating a logical structure for them and by entering, downloading and sharing them efficiently and intuitively. They can also assign labels and metadata to data, run advanced searches, preview files quickly, and add descriptions and comments. Access to this tool is provided via a web browser as well as a dedicated programming interface (API) compliant with the applicable standards of interoperability, data openness and good engineering practice;
- knowledge base (FAQ, expert-verified information on SARS-CoV-2, useful links), lesson plans and educational materials (immunology, virology, genomics) for knowledge dissemination, raising public awareness and counteracting fake news.
Thanks to centralized identity and access management, users can access all the services with a single identity by logging in via the single sign-on page. Access can be granted with secret registration codes, without collecting users' personal data in advance, which strengthens the GDPR compliance of the platform.
Highlights of the selected e-service
An essential element of combating the COVID-19 pandemic was, and still is, the rapid diagnosis of SARS-CoV-2 infections. Thus, the most specialised e-service offered by REGIONAL COVID-HUB involves AI-based tools supporting COVID-19 diagnostics. They were developed to train artificial intelligence models on data from various sources, in particular image data, and then to classify patients for SARS-CoV-2 infection. This solution was created as an answer to the ever-growing demand for fast and automated diagnostic support. In addition to symptoms such as high fever, loss of smell and taste, or dry cough, SARS-CoV-2 infection can cause pneumonia, which leads to tissue changes that are easy to identify on X-rays. Experts can distinguish bacterial pneumonia from that caused by SARS-CoV-2; however, the process is time-consuming and tedious given the number of patients who needed diagnosis during the pandemic.
The automatic analysis of medical imaging data using AI started with collecting appropriate public datasets. Labeled, 3-class image data were needed to perform ternary classification. Convolutional deep learning models were then designed, trained, and tested iteratively: after each round the test results were analysed, changes were made to the system, and a further iteration was started.
The best model, based on a convolutional neural network, was selected and integrated with a web application to allow end users to classify chest X-ray images in a web browser.
The end result is a web application to which users can upload chest X-ray images. The application uses the aforementioned model to classify each image, telling the user whether the patient is healthy, infected with COVID-19, or affected by another pulmonary disease such as pneumonia.
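The final classification step can be illustrated with a minimal sketch. It assumes that the model returns one probability per class and that the predicted class is the one with the highest probability; the label order in CLASS_NAMES and the function names are hypothetical and are not taken from the deployed application.

```python
from typing import List, Sequence

# Hypothetical label order: the real model's class indices are not given in the text.
CLASS_NAMES = ("healthy", "COVID-19", "other pulmonary disease")


def predict_label(probabilities: Sequence[float]) -> str:
    """Return the name of the class with the highest predicted probability."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected exactly one probability per class")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return CLASS_NAMES[best_index]


def predict_batch(batch: Sequence[Sequence[float]]) -> List[str]:
    """Label many images at once, given one probability vector per image."""
    return [predict_label(probabilities) for probabilities in batch]
```

The same helper covers single-image and bulk use: predict_batch simply applies predict_label to every probability vector produced for the uploaded images.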
While this system was being created, the main research branched out into work on a new oversampling algorithm called Adversarial OverSampling. Long-tailed data distributions are very common in medical data. Standard, well-performing techniques for handling data imbalance are often deemed inappropriate for image data, so a novel technique was developed, which also performs well on other image datasets. The main research was the foundation for the development of this algorithm, which resulted in a publication at the ECML PKDD 2022 workshops [https://proceedings.mlr.press/v183/wojciechowski22a/wojciechowski22a.pdf].
From the machine learning standpoint, there is room to extend the presented solution. The system can be updated with new image data, which would increase the robustness of the classifier because it would learn from images of newer COVID-19 variants. Intelligent training methods, such as active learning, can also be tested. Other extensions may include support for the management of large datasets and their batch analysis.
To measure the performance of the ML part of the application, standard deep learning metrics were employed. Because of the imbalanced nature of the data, the F1-score and the geometric mean were computed to describe the effectiveness of the neural network. The metrics were computed on a test set separate from the training set used to train the model.
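The two metrics can be sketched as follows. The project itself used scikit-learn and imblearn for evaluation; the dependency-free version below is only an illustration and rests on two assumptions that the text does not confirm: the F1-score is macro-averaged over classes, and the geometric mean is taken over the per-class recalls. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def per_class_counts(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true))  # classes present in the ground truth
    scores = []
    for label in labels:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


def geometric_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of per-class recalls; zero if any class is never recognised."""
    labels = sorted(set(y_true))  # every such class has at least one true sample
    recalls = []
    for label in labels:
        tp, _, fn = per_class_counts(y_true, y_pred, label)
        recalls.append(tp / (tp + fn))
    return math.prod(recalls) ** (1 / len(recalls))
```

Both measures treat every class equally regardless of its size, which is why they are more informative than plain accuracy when one class dominates the test set.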
Problem or opportunity
An effective response to COVID-19 challenges requires coordinated cooperation between public administration, scientists, health-care providers and the general public. REGIONAL COVID-HUB provides e-services that help research and public institutions (Sanitary-Epidemiological Stations, hospitals, the Marshal of the Wielkopolska Region), as well as the general public, to counteract the COVID-19 pandemic in the Wielkopolska region. REGIONAL COVID-HUB brings together relevant epidemiological and genomic datasets for monitoring and modeling the dynamics of the COVID-19 pandemic in Wielkopolska, in an effort to support evidence-informed policy-making, improve public service delivery and enhance internal management.
Additionally, the e-service for the automatic analysis of medical imaging data using AI tackles the problem of automated classification of chest X-ray images to identify those showing COVID-19 symptoms. Thanks to the service, medical doctors can make a diagnosis based on X-ray images in a shorter time and spot the changes more easily than they could with the naked eye.
Thanks to the online platform, end users get easy access to aggregated, official, expert-verified and up-to-date data and information. The platform makes it possible to track, comprehensively and intuitively, the dynamics and current state of the COVID-19 pandemic in the Wielkopolska region and its districts. This dedicated service helps to raise general social awareness and supports decision-makers (representatives of medical centers, government administration, local government units, entrepreneurs and citizens) in making rational and informed decisions related to the COVID-19 pandemic based on the most comprehensive data.
The REGIONAL COVID-HUB platform provides videoconferencing and multimedia support for the remote work of scientific and crisis teams established to fight the COVID-19 pandemic. This tool provides an easy-to-use, secure, easily scalable and accessible videoconferencing solution that is an open and independent alternative to commercial solutions and, in particular, is not limited to several users.
The system for automated classification of chest X-ray images can perform thousands of times more diagnoses than a human medical doctor. Users can also classify many images in bulk.
The following technologies are adopted on the REGIONAL COVID-HUB Platform:
- Interactive map of the pandemic – Nextstrain
- COVID-19 dashboard for visualization – a combination of a variety of data sources, summarized in custom-built graphs and tables
- e-service for interactive processing – Project Jupyter
- e-service for remote communication – eduMEET
- storage space – Nextcloud
- Knowledge base, educational materials – WordPress-based website with current information on the pandemic
- Identity management – Keycloak
- Access management – Ladon
All the web applications are deployed on the OKD-based container platform at the Institute of Bioorganic Chemistry PAS - Poznan Supercomputing and Networking Center.
For the interactive COVID-19 map, SARS-CoV-2 genomic data were generated on site at the Institute of Bioorganic Chemistry PAS in Poznan and collected from public databases (European Nucleotide Archive, GISAID). The dashboard is built on epidemiological data retrieved from the Ministry of Health.
For the AI-based tools supporting COVID-19 diagnostics, datasets from many sources were carefully examined, downloaded and combined into one large dataset, which was then checked to ensure that it contained no duplicates. Subsequently, a standard preprocessing pipeline was applied: the image data were resized to a common size, standardized, and further scanned for errors such as all-black images. To develop the model, the OpenCV and NumPy libraries were used to store and preprocess the data, Matplotlib and seaborn to visualize the data, TensorFlow 2.0/Keras to train the ML models, and scikit-learn and imblearn to evaluate model performance. MLflow was used to track and version the ML pipeline. The main programming language was Python 3.
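The preprocessing steps described above can be sketched in a few lines. The project relied on OpenCV and NumPy; the version below uses only the Python standard library and plain nested lists of grayscale pixel values, so it is an illustration under stated assumptions rather than the actual pipeline. The function names, the hash-based exact-duplicate check, the nearest-neighbour resizing and the per-image standardization are all assumptions, and the all-black check is run before standardization for simplicity.

```python
import hashlib
from statistics import fmean, pstdev
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def image_digest(image: Image) -> str:
    """Hash the pixel content so that exact duplicates share one digest."""
    hasher = hashlib.sha256()
    for row in image:
        hasher.update(repr(tuple(row)).encode("utf-8"))
    return hasher.hexdigest()


def drop_duplicates(images: List[Image]) -> List[Image]:
    """Keep the first occurrence of every distinct image."""
    seen = set()
    unique = []
    for image in images:
        digest = image_digest(image)
        if digest not in seen:
            seen.add(digest)
            unique.append(image)
    return unique


def is_all_black(image: Image) -> bool:
    """Detect corrupted images in which every pixel is zero."""
    return all(pixel == 0 for row in image for pixel in row)


def resize_nearest(image: Image, height: int, width: int) -> Image:
    """Resize to (height, width) by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def standardize(image: Image) -> Image:
    """Shift and scale pixels to zero mean and unit variance (per image)."""
    pixels = [pixel for row in image for pixel in row]
    mean, std = fmean(pixels), pstdev(pixels)
    if std == 0:
        return [[0.0 for _ in row] for row in image]
    return [[(pixel - mean) / std for pixel in row] for row in image]


def preprocess(images: List[Image], height: int, width: int) -> List[Image]:
    """Deduplicate, drop all-black images, then resize and standardize the rest."""
    cleaned = [img for img in drop_duplicates(images) if not is_all_black(img)]
    return [standardize(resize_nearest(img, height, width)) for img in cleaned]
```

A hash of the raw pixels only catches exact copies; near-duplicates (for example the same X-ray saved at a different resolution) would need a perceptual comparison, which is outside the scope of this sketch.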