Andrew Simpkin

I am a Lecturer (Above the Bar) in Statistics at the School of Mathematics, Statistics and Applied Mathematics at NUI, Galway. My research focusses on longitudinal data analysis, functional data analysis, genomics and data science. I work on many interdisciplinary projects across medicine, engineering, biology, sociology and sports science. I’m interested in applied statistics and data science, developing methods and tools to work with people in a wide variety of disciplines. These collaborations often lead to questions which require novel theoretical statistical approaches.

I came to statistics from a mathematics background, having studied at Trinity College Dublin. After receiving my BA I came to Galway and completed a PhD with Prof. John Newell in 2011. The PhD focused on derivative estimation. In many cases, changes in a response over time are of primary concern, such that modelling the derivative of the response should take precedence. We developed methods in the area of nonparametric regression and smoothing which improved on established approaches to derivative estimation. While in Galway, I worked across many disciplines including biomedical engineering, medicine, computer science, sports science and economics. This exposure showed me the breadth of application of statistical methods.

I left Galway in 2011 and spent five and a half years at the School of Social and Community Medicine at the University of Bristol. There I worked on many biomedical applications, developing models for prostate cancer among others. I used linear mixed models to analyse repeated measures of prostate specific antigen (PSA) in cohorts of men with and without prostate cancer. We developed individualized PSA reference ranges which served as an alert for men questioning whether to remain on active surveillance. We packaged our model into an application, which was used successfully in patient-consultant discussions of ongoing active surveillance.

At Bristol, I worked as part of the MRC Integrative Epidemiology Unit, where I developed novel machine learning models for the analysis of longitudinal DNA methylation data collected as part of the Accessible Resource for Integrated Epigenomics Studies (ARIES) program. DNA methylation data encompass up to 850,000 distinct signatures for each person at each time point in this birth cohort study. Using these models, we investigated patterns of DNA methylation in early life and the association of changes with several important clinical factors. We found that maternal smoking is associated with differences in offspring DNA methylation and that these differences persist through childhood and adolescence. On the other hand, we found that while maternal BMI and birth weight were associated with offspring DNA methylation, these differences resolved during childhood. I have also worked on the epigenetic clock in ARIES, identifying associations between sex, birth weight and epigenetic age acceleration in ARIES children. We also found links between maternal smoking and alcohol consumption and the age acceleration of their offspring. Modelling longitudinal big data (such as DNA methylation) is an important research focus. Big data which are also dynamic require care, since data within individuals are correlated and call for approaches which distinguish between- and within-subject variation. These approaches are often ignored in the big data setting.

In 2014 I was awarded a Career Development Award in Biostatistics from the Medical Research Council (MRC) to investigate flexible methods for analysing longitudinal data. This fellowship brought together my PhD and postdoctoral work. The goal was to gain insights from longitudinal trajectories of biomarkers and other measures, and to use these as risk factors for future health outcomes. I developed models for estimating features of trajectories in the big data setting as they related to other longitudinal processes such as child growth. The work centred on models for repeated measures data where features of the longitudinal trajectory are of primary interest. Features such as maximum velocity of height in adolescents are useful biomarkers, and there is a knowledge gap in how to estimate these alongside other dynamic processes. In this fellowship, I developed methods which bridged this knowledge gap.

I spent a year at the Insight Centre for Data Analytics in 2017 as part of the Orreco-Insight project. I developed statistical-machine learning algorithms for longitudinal data arising in sports science in conjunction with a commercial partner, Orreco. These models led to web applications and two patents pending in the elite sports domain. The goal of these models is to optimise performance and athlete readiness through recommendations based on GPS, biomarker and wearable data collected from professional athletes repeatedly during a competitive season. In August 2018 I was appointed a Lecturer (Above the Bar) at the School of Mathematics, Statistics and Applied Mathematics at NUI, Galway.