Data-driven view on COVID-19 part #1

Italy has become the country where COVID-19 is spreading fastest. The number of confirmed cases grows by about 15% every day. At this rate, the total number of confirmed cases in Italy is predicted to surpass 100,000 by Thursday, 26 March 2020.

In this short study, I investigated whether the coronavirus spreads equally fast throughout the week or more quickly over weekends, when people (used to) meet more intensively. I found that the day of the week has no significant impact on the infection speed. Here is the reasoning behind my findings.

Unlike China, where the spread of coronavirus is slowing down, Italy is experiencing exponential growth of confirmed cases:

Source: Johns Hopkins coronavirus data, updated daily here.

This allows us to model the spread of coronavirus in Italy with a simple autoregressive AR(1) model:

N_{d+1} = c + a*N_d + e_{d+1}

where

  • N_d is the number of confirmed cases on day d;
  • a is the daily growth factor of confirmed cases (so a - 1 is the relative increase between two consecutive days);
  • c is a constant;
  • e is an error term (assumed to be normally distributed).
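
To make the model concrete, here is a minimal sketch of how c and a could be estimated by least squares, which coincides with maximum likelihood when the errors are normal (conditional on the first observation). The function name fit_ar1 and the synthetic series are assumptions made for illustration only; they are not the Johns Hopkins data and not necessarily the code used for this study.

```python
import numpy as np

def fit_ar1(cases):
    """Least-squares fit of N_{d+1} = c + a*N_d; returns (c, a)."""
    y = np.asarray(cases, dtype=float)
    prev, nxt = y[:-1], y[1:]                  # N_d and N_{d+1}
    X = np.column_stack([np.ones_like(prev), prev])
    coef, *_ = np.linalg.lstsq(X, nxt, rcond=None)
    c, a = coef
    return c, a

# Synthetic series (NOT real data): true c = 0, true a = 1.15, normal noise
rng = np.random.default_rng(0)
synthetic = [100.0]
for _ in range(29):
    synthetic.append(1.15 * synthetic[-1] + rng.normal(0, 5))

c_hat, a_hat = fit_ar1(synthetic)
print(f"c = {c_hat:.2f}, a = {a_hat:.3f}")
```

On a series generated this way, the estimate of a should land close to 1.15 and the estimate of c close to zero.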

It turns out that the estimated constant is not statistically different from zero, which is plausible because there are many days at the beginning of the observed period with no confirmed cases at all. Hence, we can drop the constant from our model.[1]

The parameter a is then estimated by the maximum likelihood (ML) method as 1.148 (highly significant, with a p-value of 2.36e-84), meaning that the number of confirmed cases in Italy grows by around 15% per day. This allows us to predict the number of confirmed cases for the next 5 days (including today), i.e. until 26 March. As we can see, the predicted number exceeds 100,000 cases:
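
The estimation and forecasting step can be sketched as follows. Without the constant, the least-squares estimate of a has a closed form, and the k-day-ahead point forecast is simply a^k times the last observed value. The helper names fit_growth_factor and forecast, as well as the noise-free series, are hypothetical and only illustrate the mechanics; they do not reproduce the numbers reported above.

```python
import numpy as np

def fit_growth_factor(cases):
    """Least-squares estimate of a in N_{d+1} = a*N_d (no constant)."""
    y = np.asarray(cases, dtype=float)
    prev, nxt = y[:-1], y[1:]
    return float(prev @ nxt / (prev @ prev))

def forecast(last_value, a, horizon):
    """Point forecasts N_{d+k} = a**k * N_d for k = 1..horizon."""
    return [last_value * a ** k for k in range(1, horizon + 1)]

# Hypothetical noise-free series growing exactly 15% per day (NOT real data)
series = [1000 * 1.15 ** d for d in range(10)]
a_hat = fit_growth_factor(series)
print(round(a_hat, 3))  # 1.15
print([round(x) for x in forecast(series[-1], a_hat, 5)])
```

Note that 1.148^5 is roughly 1.99, so a daily growth factor of 1.148 implies that the number of confirmed cases approximately doubles over five days.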

Now let us verify whether this infection speed is somehow affected by weekends. We could, for example, assume that the coronavirus spreads faster during weekends because people (used to) go out and meet each other more intensively than during working days. Let us, therefore, extend our model by an (exogenous) weekend indicator (i.e. whether the day is a Saturday or a Sunday):

N_{d+1} = a*N_d + b*I{d=weekend} + e_{d+1}

The estimated coefficient b is, however, statistically insignificant [2]; hence we can conclude that the data show no difference in spreading speed between weekends and working days.
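
The weekend test above can be sketched like this. The function fit_with_weekend is a hypothetical helper; it assumes the indicator refers to day d (the same day as the regressor N_d), and it uses a normal approximation for the p-value of b rather than an exact t-distribution. The series is synthetic and contains no weekend effect by construction, so it only illustrates the procedure, not the result of this study.

```python
import math
import numpy as np

def fit_with_weekend(cases, is_weekend):
    """OLS fit of N_{d+1} = a*N_d + b*I{d is weekend}; returns (a, b, p_b)."""
    y = np.asarray(cases, dtype=float)
    w = np.asarray(is_weekend, dtype=float)
    X = np.column_stack([y[:-1], w[:-1]])      # regressors observed on day d
    target = y[1:]                             # N_{d+1}
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    dof = len(target) - X.shape[1]
    sigma2 = resid @ resid / dof               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance
    t_b = coef[1] / math.sqrt(cov[1, 1])
    p_b = math.erfc(abs(t_b) / math.sqrt(2))   # two-sided, normal approximation
    return float(coef[0]), float(coef[1]), p_b

# Synthetic series (NOT real data): 15% daily growth, no weekend effect
rng = np.random.default_rng(1)
cases = [100.0]
for _ in range(39):
    cases.append(1.15 * cases[-1] + rng.normal(0, 5))
# Assumption: day 0 is a Monday, so days 5 and 6 of each week are the weekend
weekend = [d % 7 in (5, 6) for d in range(40)]

a_hat, b_hat, p_b = fit_with_weekend(cases, weekend)
print(f"a = {a_hat:.3f}, b = {b_hat:.2f}, p-value of b = {p_b:.3f}")
```

A large p-value for b would be read the same way as in footnote [2]: the weekend indicator adds no detectable effect on top of the daily growth factor.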

In the next blog post, I will investigate how the COVID-19 infection speed is influenced by the demographic and economic characteristics of affected countries.

[1] The ML estimate of parameter c is -780.51 with a p-value of 0.267, hence statistically insignificant.

[2] The ML estimate of parameter b is 24.27 with a p-value of 0.849, hence statistically insignificant.


Data-driven view on COVID-19 part #1 was originally published in ableneo Technology on Medium.