From buying groceries, playing with the kids and walking the dog to deciding the next steps for handling the crisis, properly prepared data has proven to be one of the most important tools for understanding, mitigating and containing the spread of the virus. If anything, COVID-19 has highlighted the importance of relying on trustworthy sources of information to stay informed and to promote responsible decision-making for families, households, businesses, organisations and many others around the world. In the midst of the pandemic, data analysis informed quick decisions: legislative action on international travel, support for small businesses, measures to bolster financial markets, and medical treatment for those who were infected. On a large scale, the impact of information sharing during the pandemic was truly transformative. Defining, retrieving, parsing and aggregating global data before distributing it around the world is certainly something no one was prepared to do on such short notice.
The rapid spread of COVID-19 has made access to global data sources essential for informing the public, governments and the private sector. That access could help organisations and other stakeholders address and manage the crisis better. In this blog, we discuss data access, data processing and simple data architectures, and the insights you can unlock and use yourself.
Challenges involved with data related to the Coronavirus

The tremendous growth in the number of articles published about the Coronavirus has made it challenging to search for and find relevant information sources. Publicly accessible articles and collections of studies have helped here; for example, the World Health Organization (WHO) has set up a collection of articles about COVID-19 and made the database publicly available.

However, getting access to reliable sources of information and data on global events is not easy, especially when the events are unfolding in real time. The largest public-health data repositories around the world are accessed through an API, but data that must reach the public quickly is often published first in more accessible formats such as PDFs. Parsing and scraping information from these formats can be time-consuming. A team of researchers at Johns Hopkins University retrieved and parsed information from these PDF files into CSV files and stored them in public repositories in the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE); a minimal example of working with those CSVs follows below. They also built dashboards displaying global trends in new cases, recoveries and deaths, along with state-specific COVID-19 data.

Similarly, other organisations, such as the Pandemic Data Room, have contributed to retrieving and parsing data on GitHub. The Data Room was created to improve understanding of the impact of current policies and to generate the critical information needed to adjust them. Some data sources are HTML tables, Excel spreadsheets or location pins on Google Maps, and some are even in different languages (see the scraping sketch further below). Although these teams have done a very good job handling the bulk of the information, they still face challenges in analysing the data and coping with its size, format and other aspects of the analysis.

Another challenge lies in establishing global norms for sharing data during health emergencies: developing, gathering, producing and analysing data within the timelines a pandemic demands. It is equally important to rely on the principles of open data, open science and data sharing to enable analysis of the data gathered during COVID-19.
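To make the CSV route concrete, here is a minimal sketch of pulling one of the CSSE time-series files and aggregating it per country with pandas. The repository path below matches the public JHU CSSE GitHub layout at the time of writing, but verify it before relying on it; the column names (Province/State, Country/Region, Lat, Long) are the ones used in those files.

import pandas as pd

# Path into the public JHU CSSE repository; check it still exists before use.
CSV_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_time_series/"
    "time_series_covid19_confirmed_global.csv"
)

def load_confirmed_by_country(url: str = CSV_URL) -> pd.DataFrame:
    """Download the cumulative confirmed-cases time series and sum it per country."""
    df = pd.read_csv(url)
    # Drop the geographic columns and collapse provinces into country totals;
    # every remaining column is a date holding cumulative counts.
    return (
        df.drop(columns=["Province/State", "Lat", "Long"])
          .groupby("Country/Region")
          .sum()
    )

confirmed = load_confirmed_by_country()
# Daily new cases are the day-over-day difference of the cumulative counts.
new_cases = confirmed.diff(axis=1)
print(new_cases.iloc[:, -7:].head())

Re-running this once a day and diffing the cumulative counts is enough to reproduce the "new cases per day" view that the dashboards display.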
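For the HTML-table sources, a hedged sketch using pandas.read_html is shown below. The page URL is a placeholder, not a real endpoint, and read_html needs an HTML parser such as lxml installed.

import pandas as pd

# Placeholder URL for illustration only; substitute the report page you scrape.
PAGE_URL = "https://example.org/covid-19/daily-report.html"

# read_html returns one DataFrame per <table> element found on the page.
tables = pd.read_html(PAGE_URL)
report = tables[0]  # pick the table of interest by position (or inspect them all)

# Normalise headers and persist as CSV so the rest of the pipeline sees one format.
report.columns = [str(c).strip() for c in report.columns]
report.to_csv("daily_report.csv", index=False)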