Pinning the tail on the distribution: A multivariate extension to the generalised Pareto distribution
Clifton DA., Hugueny S., Tarassenko L.
Novelty detection is often used for analysis where there are insufficient examples of "abnormal" data to take a multi-class approach to classification. Models of normality are constructed from commonly available examples of "normal" behaviour, and we then reason about the presence of abnormalities with respect to this normal model. Extreme value theory (EVT) is a branch of statistics that is concerned with modelling extremal events, and is therefore appealing for use with novelty detection. However, conventional EVT approaches are limited to the analysis of univariate or low-dimensional data. This paper considers the peaks-over-threshold method of EVT, in which exceedances over a (typically univariate) threshold can be shown to tend towards the generalised Pareto distribution (GPD). We extend this method for use with high-dimensional data, allowing us to reason about the "extreme" data lying in the tails of the distributions of complex, real-world datasets, which are typically multivariate and multimodal. Illustrations are provided from the analysis of large clinical studies of hospital patient vital-sign data. © 2011 IEEE.
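
For readers unfamiliar with the peaks-over-threshold method mentioned above, the following is a minimal sketch of the conventional univariate case only; it is not the multivariate extension proposed in the paper. It assumes synthetic exponentially distributed data, a threshold set at the empirical 95th percentile, and a method-of-moments estimator for the GPD shape and scale (chosen here for brevity; it is valid only when the shape is below 1/2). All function and variable names are hypothetical.

```python
import math
import random
import statistics


def exceedances(samples, threshold):
    """Return the amounts by which samples exceed the threshold."""
    return [x - threshold for x in samples if x > threshold]


def fit_gpd_moments(excess):
    """Method-of-moments estimates (xi, sigma) of the GPD shape and scale.

    Uses mean = sigma / (1 - xi) and
    variance = sigma^2 / ((1 - xi)^2 * (1 - 2 * xi)),
    so the estimator assumes a finite variance (xi < 1/2).
    """
    m = statistics.fmean(excess)
    v = statistics.variance(excess)
    ratio = m * m / v  # equals 1 - 2 * xi under the GPD
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return xi, sigma


def gpd_tail_probability(y, xi, sigma):
    """P(Y > y) for an excess y >= 0 under a GPD with shape xi, scale sigma."""
    if abs(xi) < 1e-9:
        # The xi -> 0 limit of the GPD is the exponential distribution.
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:
        # Beyond the finite upper end point that exists when xi < 0.
        return 0.0
    return base ** (-1.0 / xi)


# Synthetic "normal" data (an assumption for illustration): unit-rate exponential.
rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(200_000)]

# Assumed threshold choice: the empirical 95th percentile.
threshold = sorted(samples)[int(0.95 * len(samples))]
excess = exceedances(samples, threshold)

xi_hat, sigma_hat = fit_gpd_moments(excess)
```

Because the exponential distribution is memoryless, its exceedances are again exponential, so the fitted shape should lie close to 0 and the fitted scale close to 1. `gpd_tail_probability` can then be used to score how extreme a new observation is relative to the fitted tail.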