The use of mixture-density networks in the emulation of complex epidemiological individual-based models
Davis C., Hollingsworth D., Caudron Q., Irvine M.
Abstract Complex, highly computational, individual-based models are abundant in epidemiology. For epidemics such as macro-parasitic diseases, detailed modelling of human behaviour and pathogen life-cycle are required in order to produce accurate results. This can often lead to models that are computationally-expensive to analyse and perform model fitting, and often require many simulation runs in order to build up sufficient statistics. Emulation can provide a more computationally-efficient output of the individual-based model, by approximating it using a statistical model. Previous work has used Gaussian processes in order to achieve this, but these can not deal with multi-modal, heavy-tailed, or discrete distributions. Here, we introduce the concept of a mixture density network (MDN) in its application in the emulation of epidemiological models. MDNs incorporate both a mixture model and a neural network to provide a flexible tool for emulating a variety of models and outputs. We develop an MDN emulation methodology and demonstrate its use on a number of simple models incorporating both normal, gamma and beta distribution outputs. We then explore its use on the stochastic SIR model to predict the final size distribution and infection dynamics. MDNs have the potential to faithfully reproduce multiple outputs of an individual-based model and allow for rapid analysis from a range of users. As such, an open-access library of the method has been released alongside this manuscript. Author summary Infectious disease modellers have a growing need to expose their models to a variety of stakeholders in interactive, engaging ways that allow them to explore different scenarios. This approach can come with a considerable computational cost that motivates providing a simpler representation of the complex model. We propose the use of mixture density networks as a solution to this problem. These are highly flexible, deep neural network-based models that can emulate a variety of data, including counts and over-dispersion. We explore their use firstly through emulating a negative-binomial distribution, which arises in many places in ecology and parasite epidemiology. We then explore the approach using a stochastic SIR model. We also provide an accompanying Python library with code for all examples given in the manuscript. We believe that the use of emulation will provide a method to package an infectious disease model such that it can be disseminated to the widest audience possible.