Jul 1, 2021

Aggregating Global Data from the Coronavirus Pandemic

By Trudo

From buying groceries, playing with kids, walking the dog and inviting people over for a barbecue, to deciding the next steps in handling the crisis, properly prepared data has proven to be one of the most important aspects of attempting to understand, mitigate and contain the spread of the virus. If anything, COVID-19 has highlighted the importance of relying on trustworthy sources of information to stay informed and to promote responsible decision-making by families, households, businesses, organisations and many others around the world. In the midst of the pandemic, data analysis was one of the main resources consulted when making quick decisions: legislative action around international travel, support for small businesses, bolstering of financial markets, and medical treatment for those who were infected. On a large scale, the impact of information sharing during the pandemic was truly transformative. Defining, retrieving, parsing and aggregating global data before distributing it around the world is certainly something we were not prepared to do in such a short period of time.

The rapid spread of COVID-19 has made access to global data sources essential for informing the public, governments and the private sector. Better data, and better access to it, could help organizations and other stakeholders address and manage the crisis more effectively. In this blog, we discuss data access, data processing and simple data architectures that enable insights you can unlock and use yourself.

Challenges involved with data related to the Coronavirus

The tremendous growth in the number of articles published around the Coronavirus has made it challenging to search for and find relevant information sources. Publicly accessible article collections have certainly helped: the World Health Organization (WHO), for example, has set up a collection of articles about COVID-19 and made the database publicly available. However, gaining access to reliable sources of information and data from global events is not easy, especially when the events are happening in real time. The biggest public-health data repositories around the world are accessed through an API, but these data often need to reach the public fast, so they are first provided through more accessible formats (i.e. PDFs). Parsing and scraping information from these formats can definitely be time-consuming. A team of researchers at Johns Hopkins University has retrieved and parsed information from such PDF files into CSV files and stored them in the public COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE). On top of these data they built dashboards displaying global trends in the number of new cases, recoveries and deaths, along with state-specific COVID-19 data. Similarly, other organisations have contributed to retrieving and parsing data on GitHub, such as the Pandemic Data Room, which was created to improve understanding of the impact of current policies and to generate the critical information needed to adjust them. Some data sources include HTML tables, Excel spreadsheets or location pins on Google Maps, and some are even in different languages. Although these teams have done a very good job of handling the bulk of information, they still face challenges in analyzing the data and conforming to its size, format and other aspects of the analysis.
Another challenge lies in establishing global norms for sharing data during health emergencies: gathering, analyzing and producing data within the timelines a pandemic demands. It is equally important to rely on the principles of open data, open science and data sharing to enable analysis of the data gathered during COVID-19.
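As a concrete illustration, the country-level aggregation described above can be sketched in a few lines of Python. The column names below mirror the layout of the CSSE daily-report CSVs, but the sample rows and figures are invented for illustration only:

```python
import csv
import io
from collections import defaultdict

# Sample data mimicking the CSSE daily-report layout; the rows below
# are made up for illustration, not real case counts.
raw_csv = """Province_State,Country_Region,Confirmed,Deaths
Hubei,China,68135,4512
,Italy,101739,11591
New York,US,75795,1550
California,US,8583,183
"""

def aggregate_by_country(csv_text):
    """Sum confirmed cases and deaths per country from a daily-report CSV."""
    totals = defaultdict(lambda: {"Confirmed": 0, "Deaths": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        country = row["Country_Region"]
        totals[country]["Confirmed"] += int(row["Confirmed"])
        totals[country]["Deaths"] += int(row["Deaths"])
    return dict(totals)

print(aggregate_by_country(raw_csv)["US"])
# {'Confirmed': 84378, 'Deaths': 1733}
```

The real work, of course, happens before this step: extracting those rows from PDFs, spreadsheets and HTML tables in the first place.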


Use Cases Of Open Data In The Global COVID-19 Response

Mobile applications and other platform innovations during COVID-19 have so far proven quite successful and useful for making data-informed decisions to adjust retail behavior or reduce risk. A good example is South Korea, which developed mobile applications with geocoded locations of shops and other facilities, disclosing places that infected people had visited. Another great example is an app released by the Taiwanese government to show peak shopping times and the places where masks were in supply. Other countries followed suit by developing “Corona apps” to share and disseminate information on COVID-19.

Interconnectivity across data systems

It’s true: the challenges involved with data are clear and hard to ignore. Still, we need to look for ways to improve and standardize systems so that we can speed up data analysis and encourage better interconnectivity. Naturally, this would require collaboration between stakeholders and compromise in deciding on best practices and realistic data-collection standards. Another alternative is to process the data already available from public online organizations using artificial intelligence: a combination of natural language processing and machine learning could potentially recognize and process COVID-19-related data faster than any existing repository.
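To make that idea less abstract, here is a toy sketch of the simplest filtering step such a pipeline might start with: a keyword-based relevance score for incoming articles. The vocabulary is invented for illustration; a production system would use a trained classifier rather than a fixed word list:

```python
import re

# Tiny, hand-picked vocabulary -- an assumption for illustration only.
COVID_TERMS = {"covid-19", "covid", "coronavirus", "sars-cov-2",
               "pandemic", "quarantine"}

def covid_relevance(text):
    """Return the fraction of tokens that match the COVID-19 vocabulary.

    A toy stand-in for the NLP filtering step described above; a real
    pipeline would use a trained classifier instead of keyword counts.
    """
    tokens = re.findall(r"[a-z0-9\-]+", text.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in COVID_TERMS) / len(tokens)
```

Documents scoring above some threshold would then flow into the parsing and aggregation stages, while the rest are discarded.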

COVID-19 has definitely demonstrated the importance of understanding and investing in data systems, cross-collaboration and policy innovation built on steady flows of information. If anything, it has taught us how costly being passive about technological innovation can be, and that real-world data is of the utmost importance when it comes to monitoring, preparing for and understanding future challenges.