Chris Holmes, University of Oxford, Oxford, United Kingdom
Bayesian learning at scale with approximate models
Bayesian inference is predicated on the likelihood function being a precise reflection of the world for some setting of the function parameters. In reality all models are false. If the data is simple and small, and the models are sufficiently rich, then the consequences of model misspecification may not be severe. Increasingly however data is being captured at scale, both in terms of the number of observations as well as the diversity of data modalities. This is particularly true of modern biomedical applications, where analysts are faced with integration of medical images, genetics, genomics, and biomarker measurements. If Bayesian inference is to remain at the forefront of data-science then we will need new theory and computational methods that accommodate the approximate nature of scalable models.
Chris Holmes is a Professor of Biostatistics and UK Medical Research Council (MRC) Programme Leader in Statistical Genomics. He holds a joint appointment between the Department of Statistics and the Nuffield Department of Medicine, University of Oxford. He is an Affiliate Member of the Big Data Institute, Li Ka Shing Centre for Health Informatics and Discovery, Oxford, and a faculty fellow of The Alan Turing Institute, London. He serves on the MRC’s Expert Panel in Stratified Medicine. His research interests surround the theory, methods, and applications of statistics to medical research. Particular interests are in Bayesian decision analysis, statistical machine learning, and model misspecification within stratified medicine.
Louise Ryan, University of Technology, Sydney, Australia
Simple statistical strategies for the analysis of very large datasets
The biostatistics profession has seen a lot of disruptive change in the past decade as a result of the “big data” revolution. New specialties such as machine learning, AI, data science and analytics have emerged, leaving us feeling sometimes like the poor second cousins from the country. In this presentation, I will offer some perspectives on the changing landscape for biostatistical science and what we can do to strengthen our role in the data science arena. Drawing on some recent collaborations, I’ll describe some strategies for the analysis of very large datasets that are simple, yet grounded in sound statistical practice. I’ll finish up with some thoughts about how we should think about training the next generation as well as up-skilling the current generation of statisticians.
After completing her undergraduate degree in statistics and mathematics at Macquarie University, Louise Ryan left Australia in 1979 to pursue her PhD in statistics at Harvard University in the United States. In 1983, Louise took up a postdoctoral fellowship in Biostatistics, jointly between Dana-Farber Cancer Institute and the Harvard School of Public Health. She was promoted to Assistant Professor in 1985, eventually becoming the Henry Pickering Walcott Professor and Chair of the Department of Biostatistics at Harvard. Louise returned to Australia in early 2009 to take up the role as Chief of CSIRO’s Division of Mathematics, Informatics and Statistics. In 2012, she joined UTS as a distinguished professor of statistics in the School of Mathematical Sciences. Louise is well known for her contributions to the development of statistical methods for cancer and environmental health research. She loves the challenge and satisfaction of multi-disciplinary collaboration and is passionate about training the next generation of statistical scientists.
Natalie Shlomo, University of Manchester, Manchester, United Kingdom
Statistical Disclosure Control: Where Do We Go From Here?
This talk will start with an overview of the traditional statistical disclosure control (SDC) framework implemented at statistical agencies for standard outputs, including types of disclosure risks, how disclosure risk and information loss are quantified, and some common SDC methods. In recent years, we have seen the digitisation of all aspects of our society leading to new and linked data sources offering unprecedented opportunities for research and evidence-based policies. These developments have put pressure on statistical agencies to provide broader access to their data. On the other hand, with detailed personal information easily accessible from the internet, traditional SDC methods for protecting individuals from reidentification may no longer be sufficient and agencies are relying more on restricting and licensing data. One disclosure risk that has largely been ignored by statistical agencies up till now is known as inferential disclosure where confidential information may be revealed exactly or to a close approximation. This type of disclosure risk may be present whether the individual is included in the database or not. With strict control of the data and release of outputs, statistical agencies traditionally have not focused on this type of disclosure. However, with increasing demands for more open and accessible data, statistical agencies now need to consider new strategies of dissemination and are revisiting their intruder scenarios, types of disclosure risks and more rigorous data protection mechanisms. One such mechanism is Differential Privacy (Dwork, et al. 2006), a mathematically principled method of measuring how secure a protection algorithm is with respect to personal data disclosures. It incorporates all traditional disclosure risks and inferential disclosure in a ‘worst-case’ scenario. Statisticians have now been investigating the possibilities of incorporating Differential Privacy in their SDC framework, especially for new dissemination strategies which include web-based applications where outputs are generated and protected on-the-fly without the need for human intervention to check for disclosure risks. We discuss other dissemination strategies and the potential for Differential Privacy to provide privacy guarantees.
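To make the idea of outputs protected on-the-fly more concrete, the following is a minimal sketch of the Laplace mechanism, one standard way of releasing a count with an epsilon-differential-privacy guarantee. It is an illustration only and is not taken from the talk: the function names, the toy records and the value of epsilon are all assumptions, and a real deployment would also need to handle floating-point and privacy-budget issues that are ignored here.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) variate.

    The difference of two independent exponential variates with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one individual changes a count by at most 1,
    so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical toy data: one record per person, with an age field.
    rng = random.Random(2024)
    people = [{"age": age} for age in range(100)]
    noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=1.0, rng=rng)
    print(f"noisy count of people aged 65 or over: {noisy:.2f}")
```

Smaller values of epsilon give stronger protection but noisier outputs, which is the same risk-utility trade-off that traditional SDC quantifies through disclosure risk and information loss.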
Natalie Shlomo is Professor of Social Statistics at the School of Social Sciences, University of Manchester. Prior to that she was on faculty at the University of Southampton and a methodologist at the Israel Central Bureau of Statistics. She is a survey statistician with interests in survey design and estimation, record linkage, statistical disclosure control, statistical data editing and imputation and small area estimation. Natalie is an elected member of the International Statistical Institute and currently serving as Vice President. She is also a fellow of the Royal Statistical Society and the International Association of Survey Statisticians. She is the methodology editor of the Journal of the International Association of Official Statistics and an associate editor of several journals including the International Statistical Review and the Journal of the Royal Statistical Society, Series A. She is a member of several national and international methodology advisory boards.
Susan Murphy, Harvard University, Boston, USA
Stratified Micro-Randomized Trials with Applications in Mobile Health
Technological advancements in the field of mobile devices and wearable sensors make it possible to deliver treatments anytime and anywhere to users like you and me. Increasingly the delivery of these treatments is triggered by detections/predictions of vulnerability and receptivity. These observations are likely to have been impacted by prior treatments. Furthermore the treatments are often designed to have an impact on users over a span of time during which subsequent treatments may be provided. Here we discuss our work on the design of a mobile health smoking cessation study in which the above two challenges arose. This work involves the use of multiple online data analysis algorithms. Online algorithms are used in the detection, for example, of physiological stress. Other algorithms are used to forecast at each vulnerable time, the remaining number of vulnerable times in the day. These algorithms are then inputs into a randomization algorithm that ensures that each user is randomized to each treatment an appropriate number of times per day. We develop the stratified micro-randomized trial which involves not only the randomization algorithm but a precise statement of the meaning of the treatment effects and the primary scientific hypotheses along with primary analyses and sample size calculations. Considerations of causal inference and potential causal bias incurred by inappropriate data analyses play a large role throughout.
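As a rough illustration of how a forecast of remaining vulnerable times could feed into a randomization probability, here is a simple budget-pacing sketch. It is not the algorithm used in the study: the function name, the daily budget, and the probability bounds are hypothetical, and the rule shown (remaining budget divided by forecast remaining times, clipped away from 0 and 1) is only one plausible way to spread treatments across a day while keeping every decision randomized.

```python
import random


def randomization_probability(budget, delivered, forecast_remaining,
                              p_min=0.05, p_max=0.95):
    """Probability of delivering treatment at the current vulnerable time.

    budget: target number of treatments per day (hypothetical value).
    delivered: number of treatments already delivered today.
    forecast_remaining: forecast number of vulnerable times left today,
        counting the current one.
    The result is clipped to [p_min, p_max] so that every decision point
    stays genuinely randomized.
    """
    if forecast_remaining <= 0:
        raise ValueError("forecast_remaining must be positive")
    raw = (budget - delivered) / forecast_remaining
    return min(p_max, max(p_min, raw))


def simulate_day(n_vulnerable_times, budget, rng):
    """Simulate one day, assuming the forecast is exactly right."""
    delivered = 0
    for t in range(n_vulnerable_times):
        remaining = n_vulnerable_times - t
        p = randomization_probability(budget, delivered, remaining)
        if rng.random() < p:
            delivered += 1
    return delivered


if __name__ == "__main__":
    rng = random.Random(1)
    counts = [simulate_day(10, budget=2, rng=rng) for _ in range(1000)]
    print("mean treatments per day:", sum(counts) / len(counts))
```

In practice the forecast is itself uncertain, which is one reason the design needs the precise statement of treatment effects and the sample size calculations described above.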
Susan A. Murphy is Professor of Statistics, Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences and Radcliffe Alumnae Professor at the Radcliffe Institute at Harvard University. Her lab focuses on improving sequential, individualized, decision making in health, in particular on clinical trial design and data analysis to inform the development of personalized just-in-time adaptive interventions in mobile health. Her work is funded by the National Institutes of Health, USA.
Susan is a Fellow of the Institute of Mathematical Statistics, a Fellow of the College on Problems in Drug Dependence, a former editor of the Annals of Statistics, a member of the US National Academy of Sciences, a member of the US National Academy of Medicine and a 2013 MacArthur Fellow.
Thomas Lumley, University of Auckland, Auckland, New Zealand
Validation sampling for large health databases
Health databases will typically have some important variables that are measured inaccurately, are not quite the right variable for the analysis, or require substantial effort to code into their ideal forms. It may be possible to take a validation sample of records and recode or re-measure the variables of interest more accurately, even when it is infeasible to do this for the whole database. There have been two broad classes of approach to analysing a validation sample: the measurement-error literature uses the sample to estimate the bias in a naive analysis and correct it; the sample survey literature fits a model to the validation sample and uses the rest of the database to increase precision of estimation. I will talk about ways to unify these approaches and the efficiency/robustness tradeoffs that complicate comparisons of different methods.
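A toy example may help fix ideas. The sketch below estimates a population mean when an error-prone value is available for every record and an accurate value only for a simple random validation sample. It follows the bias-correction framing: compute the naive mean from the whole database, estimate its bias in the validation sample, and subtract. The function name and the simulated data are assumptions for illustration, and this is not presented as the method of the talk.

```python
import random
from statistics import fmean


def bias_corrected_mean(proxy_all, validated):
    """Estimate the mean of the accurate variable.

    proxy_all: list of error-prone values, one per database record.
    validated: dict mapping record index -> accurately re-measured value,
        assumed to come from a simple random sample of the records.
    """
    naive = fmean(proxy_all)
    # Average (proxy - accurate) difference in the validation sample.
    bias = fmean([proxy_all[i] - y for i, y in validated.items()])
    return naive - bias


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical database: the proxy overstates the truth by about 2 units.
    truth = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
    proxy = [t + 2.0 + rng.gauss(0.0, 1.0) for t in truth]
    sample = rng.sample(range(len(truth)), 200)
    validated = {i: truth[i] for i in sample}
    print("naive mean:      ", round(fmean(proxy), 2))
    print("corrected mean:  ", round(bias_corrected_mean(proxy, validated), 2))
    print("true mean:       ", round(fmean(truth), 2))
```

For a mean, this estimator is algebraically the same as the survey-sampling difference estimator (the validation-sample mean of the accurate values plus an adjustment using the proxy on the full database), so in this simplest case the two framings coincide. More complex models are where they diverge.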
Thomas Lumley is Professor of Biostatistics at the University of Auckland, and Affiliate Professor of Biostatistics at the University of Washington. His research covers a wide range of topics in biostatistics, including genomics, the design and analysis of complex epidemiological studies, meta-analysis, and statistical computing and graphics. He writes about statistics in the media at statschat.org.nz.
Titles, abstracts and speaker details for all invited sessions are available here.