BACKGROUND: Testing for blood-borne viruses (BBVs) such as the human immunodeficiency virus (HIV), hepatitis C virus (HCV) and hepatitis B virus (HBV) is generally focused on specialist settings. However, people with undiagnosed infections are also present within the general population. We explore whether machine-learning algorithms (MLAs) can identify people at heightened risk of HIV, HBV, HCV, or a composite 'any BBV' outcome (defined as positivity for one or more of the three infections) in primary care settings.

METHODS: From de-identified electronic health records data from 165 general practices in North East London, we extracted risk factors for HIV, HCV and HBV and used them to train (75% of the data) and test (25% of the data) three MLAs: Logistic Regression (LR), AdaBoost with random undersampling (RUSBoost) and Balanced Random Forest classifier (BRFC). Model performance was quantified using receiver operating characteristic (ROC) curves, the area under the ROC curve (AUC), sensitivity and specificity. The key features for each outcome were identified across the models.

RESULTS: A total of 1,987,954 patients were included in the study, with no inclusion or exclusion criteria applied, from whom 75 predictive features were selected for HIV, 24 for HCV, 37 for HBV and 88 for the any BBV outcome. Different models were optimal for classifying positivity for individual BBVs, depending on the accuracy metric. As a single infection, HCV was predicted most accurately across models and accuracy metrics. When targeting the any BBV outcome, LR was the model with the highest AUC value, BRFC was the most sensitive model and RUSBoost was the most specific model. The key identified features were similar across models, with age the strongest predictor for both individual positivity and the composite outcome. Several features were important for two of the BBV-positive groups: Black African ethnicity (HIV and HBV), liver disease (HBV and HCV) and opiate and cocaine use (HBV and HCV). Other features were important for positivity for a single BBV only.

CONCLUSION: Our findings illustrate that combining digital technology with routinely available general practice data shows promise for improving case-finding for targeted BBV testing. There are, however, challenges in identifying the optimal MLAs and accuracy metrics across HIV, HCV and HBV positivity outcomes. This underscores the importance of evaluating different models and applying a broad set of accuracy criteria when using digital technology for precision medicine.

CLINICAL TRIAL NUMBER: Not applicable.
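
The evaluation outlined in the METHODS (a 75%/25% train/test split, then ROC AUC, sensitivity and specificity) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: the function names `train_test_split`, `sensitivity_specificity` and `roc_auc`, the 0.5 decision threshold and the toy labels and scores are all assumptions made for illustration. In practice, the named models would more likely come from existing libraries (for example, imbalanced-learn provides `RUSBoostClassifier` and `BalancedRandomForestClassifier`), but the abstract does not state which implementations were used.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and hold out a fraction for testing (25% here, as in the abstract)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = BBV positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is quadratic in sample size,
    so it suits small illustrations only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (made up): true labels and hypothetical model risk scores.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.3, 0.1, 0.5]

# Assumed decision threshold of 0.5; the abstract does not report one.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold while AUC does not, which is one reason a model can rank first on one metric and not on another, as the RESULTS report for the any BBV outcome.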

More information

DOI

10.1186/s12879-026-13247-0

Type

Journal article

Publication Date

2026-05-11

Keywords

HBV, HCV, HIV, Machine-learning algorithms, Prediction of blood-borne viruses diagnosis.