Invited Sessions

Analysis for highly complex data structures

Organiser: Aurore Delaigle – University of Melbourne

Stephan Huckemann – University of Göttingen, Germany
Non-Euclidean Statistics and Applications
B. Eltzner1
1 Georg-August-Universität Göttingen
We consider some generalizations of basic statistical data descriptors, like means and principal components, for data that come with an inherent non-Euclidean topological/geometric structure. For these non-Euclidean data descriptors we explore estimation, nesting, as well as their asymptotics, which may exhibit phenomena unknown in the Euclidean setting. Careful choice of data descriptors allows for new insights into RNA structure analysis and adult stem cell differentiation.
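For readers unfamiliar with non-Euclidean means, the following is a minimal sketch (not the speakers' methodology; function names and the simulated directions are illustrative) of computing a Fréchet/Karcher mean of directions on the unit sphere by repeatedly averaging in the tangent space:

```python
import numpy as np

def log_map(p, x):
    # Riemannian log map on the unit sphere: tangent vector at p pointing towards x
    cos_t = np.clip(np.dot(p, x), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    v = x - cos_t * p
    return theta * v / np.linalg.norm(v)

def exp_map(p, v):
    # Riemannian exponential map on the unit sphere
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return p
    return np.cos(norm_v) * p + np.sin(norm_v) * v / norm_v

def frechet_mean(points, n_iter=100, tol=1e-10):
    # Iterate: average the log-mapped points, then map the average back to the sphere
    mu = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        tangent_mean = np.mean([log_map(mu, x) for x in points], axis=0)
        mu_new = exp_map(mu, tangent_mean)
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# Toy data: noisy directions around the north pole, normalised to unit length
rng = np.random.default_rng(0)
raw = rng.normal([0, 0, 1], 0.1, size=(50, 3))
data = raw / np.linalg.norm(raw, axis=1, keepdims=True)
print(frechet_mean(data))
```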


Janice Scealy – Australian National University, Canberra
Scaled von Mises-Fisher distributions and regression models for palaeomagnetic directional data
A. Wood2
1 ANU, 2 University of Nottingham
We propose a new distribution for analysing palaeomagnetic directional data that is a novel transformation of the von Mises-Fisher distribution. The new distribution has ellipse-like symmetry, as does the Kent distribution; however, unlike the Kent distribution the normalising constant in the new density is easy to compute and estimation of the shape parameters is straightforward. To accommodate outliers, the model also incorporates an additional shape parameter which controls the tail-weight of the distribution. We also develop a general regression model framework that allows both the mean direction and the shape parameters of the error distribution to depend on covariates. To illustrate, we analyse palaeomagnetic directional data from the GEOMAGIA50.v3 database. We predict the mean direction at various geological time points and show that there is significant heteroscedasticity present.
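For background, the classical von Mises-Fisher density on the sphere that the proposed distribution transforms; this is the standard form only, not the new scaled density:

```latex
% von Mises-Fisher density on the unit sphere S^2 (standard background form)
f(\mathbf{x} \mid \boldsymbol{\mu}, \kappa)
  = \frac{\kappa}{4\pi \sinh\kappa}\,
    \exp\!\bigl(\kappa\, \boldsymbol{\mu}^{\top}\mathbf{x}\bigr),
\qquad \mathbf{x} \in S^{2},\ \lVert\boldsymbol{\mu}\rVert = 1,\ \kappa \ge 0 .
```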


Adrian Baddeley – Curtin University, Western Australia
Analysing spatial patterns of events on a network
1 Curtin University
Events that occur at random locations along a network, such as road traffic accidents, pose some very challenging problems for statistical methodology. For example, it is unclear how to define a “hot spot” or “black spot” for road accidents, because different parts of the road network have different geometry as well as different accident records. Standard methods for spatial data analysis often assume a stationary random process; this concept is not applicable in a network, where the geometry itself varies from place to place. Nothing less than a complete overhaul of spatial statistics will make it possible to analyse such data in a satisfactory and defensible way. This talk will give an overview of techniques recently developed by statisticians, geographers and others for dealing with point patterns of events along a network. They include kernel smoothing along a network, corrections for the network geometry, and point process modelling.
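To make the starting point concrete, here is a naive toy sketch (hypothetical graph, events snapped to nodes) of kernel smoothing of event locations using shortest-path distances; it deliberately omits the corrections for network geometry that the talk covers:

```python
import networkx as nx
import numpy as np

# Toy network: nodes are intersections, edges carry lengths (e.g. road segments)
G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 1.0), ("b", "c", 2.0), ("b", "d", 1.5), ("d", "e", 1.0)],
    weight="length",
)

# Hypothetical accident locations, snapped to the nearest node for simplicity
events = ["b", "b", "c", "d"]

def naive_network_kernel(G, events, query, bandwidth=1.0):
    """Gaussian kernel estimate at a query node using shortest-path distances.
    The geometry corrections discussed in the talk are NOT applied here."""
    dist = nx.single_source_dijkstra_path_length(G, query, weight="length")
    d = np.array([dist[e] for e in events])
    return np.sum(np.exp(-0.5 * (d / bandwidth) ** 2)) / (bandwidth * np.sqrt(2 * np.pi))

for node in G.nodes:
    print(node, round(naive_network_kernel(G, events, node), 3))
```

Replacing Euclidean distance by shortest-path distance is only the first step; without corrections for the network geometry the estimate behaves poorly near junctions, which is precisely the problem the methods surveyed in the talk address.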

 

Deep time-to-event analysis

Organiser: Nadine Binder – University of Freiburg, Germany

Per Kragh Andersen – Section of Biostatistics, University of Copenhagen, Denmark
(Deep) survival analysis: prediction, understanding and causal inference
1 Biostatistics, University of Copenhagen
A statistical analysis may serve a number of different purposes, e.g. to be able to predict the relevant outcome in future subjects or to understand the way in which certain variables are possibly related. The topic of causal inference falls under the second heading. One given modelling approach may not be well suited to meet all such different purposes and should, obviously, be chosen with the purpose in mind. We will discuss these topics in the frameworks of ‘traditional’ survival analysis and ‘deep’ survival analysis, and while deep survival analysis, obviously, seems well suited for prediction, we will show that it may also have a role to play for causal inference in survival analysis. We will also discuss how so-called pseudo observations may be useful for applying ‘standard’ statistical techniques to right-censored survival data.
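As background on the pseudo-observation idea mentioned in the final sentence, a minimal self-contained sketch (hand-rolled Kaplan-Meier with simplified tie handling; variable names and the toy data are illustrative): the pseudo-observation for subject i is n·θ̂ − (n−1)·θ̂(−i), and the resulting values can be fed to standard regression software despite censoring:

```python
import numpy as np

def km_survival(time, event, t):
    # Kaplan-Meier estimate of S(t) from right-censored data (ties handled naively)
    order = np.argsort(time)
    time, event = time[order], event[order]
    s = 1.0
    for i, ti in enumerate(time):
        if ti > t:
            break
        if event[i]:
            at_risk = np.sum(time >= ti)
            s *= 1.0 - 1.0 / at_risk
    return s

def pseudo_observations(time, event, t):
    # Pseudo-observation for subject i: n*theta_hat - (n-1)*theta_hat^(-i)
    n = len(time)
    full = km_survival(time, event, t)
    pseudo = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        loo = km_survival(time[mask], event[mask], t)
        pseudo[i] = n * full - (n - 1) * loo
    return pseudo

# Toy right-censored data: event=1 is an observed failure, event=0 is censored
time = np.array([2.0, 3.0, 3.5, 5.0, 7.0, 8.0])
event = np.array([1, 0, 1, 1, 0, 1])
po = pseudo_observations(time, event, t=4.0)
print(po)  # these can now be regressed on covariates with standard GLM software
```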


Harald Binder – Institute for Medical Biometry and Statistics, University of Freiburg, Germany
Combining deep generative models with statistical testing for data with time structure
1 Institute of Medical Biometry and Statistics, Medical Center – University of Freiburg
Deep learning has been successful in applications with image data, and also for data with sequential structure, such as in language processing. Yet, there are still few biomedical applications of deep learning with potentially high-dimensional molecular data and time structure. I will specifically consider two applications from oncology, the first with high-dimensional baseline measurements and a survival endpoint, the second with repeated gene expression measurements. In both scenarios, the primary aim is not prediction, where deep learning is known to excel, but identification of novel patterns. To obtain the latter, I will demonstrate how deep learning can be combined with statistical testing. Specifically, deep Boltzmann machines, as an unsupervised, generative model approach, are used to learn the joint distribution of measurements. A statistical testing approach then links the patterns represented by the Boltzmann machine to the time-to-event endpoint of interest. I will discuss how type 1 error control can be maintained in such a setting, using variable selection to pre-filter patterns to be tested, in combination with a permutation approach.


Truyen Tran – Center for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia
Deep learning for episodic interventional data
1 Deakin University
Modern healthcare is ripe for disruption by AI. A game changer would be automatically understanding the health trajectories hidden in electronic medical records, which are being collected for billions of people worldwide. The data are episodic by nature: we observe a burst of activity when a patient encounters clinical services, and the time gap between two consecutive visits is irregular. These healthcare processes are complicated by the interaction between at least three dynamic components: the illness, which involves multiple diseases; the care, which involves multiple treatments; and the recording practice, which is biased and erroneous. We propose end-to-end recurrent models that read medical records and predict future risk. The model adopts the algebraic view in that discrete medical objects are embedded into continuous vectors lying in the same space. We formulate the problem as modeling sequences of sets, a novel setting that has rarely, if ever, been addressed. The bag of diseases recorded at each clinic visit is modeled as a function of sets, and the same holds for the bag of treatments. The interaction between the disease bag and the treatment bag at a visit is modeled in several ways. Finally, the health trajectory, which is a sequence of visits, is modeled using a recurrent neural network, upon which attention mechanisms are imposed to better focus on high-risk visits. We report results on over a hundred thousand hospital visits by patients suffering from two costly chronic conditions – diabetes and mental health disorders. The results show promise in multiple predictive tasks such as readmission prediction, treatment recommendation and disease progression.
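A schematic sketch of the core idea described above (embed each visit's bag of codes, pool over the set, read the visit sequence with a recurrent network plus attention over visits), written in PyTorch with made-up layer sizes and class names; it is not the authors' architecture:

```python
import torch
import torch.nn as nn

class VisitRiskModel(nn.Module):
    """Toy sketch: each visit is a *set* (bag) of diagnosis/treatment codes,
    embedded and pooled, then a GRU reads the visit sequence and predicts risk."""
    def __init__(self, n_codes, dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_codes, dim, padding_idx=0)
        self.rnn = nn.GRU(dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)          # attention weights over visits
        self.out = nn.Linear(hidden, 1)

    def forward(self, codes):                     # codes: (batch, n_visits, codes_per_visit)
        visit_vec = self.embed(codes).sum(dim=2)  # pool the set of codes within each visit
        h, _ = self.rnn(visit_vec)                # (batch, n_visits, hidden)
        w = torch.softmax(self.attn(h), dim=1)    # focus on high-risk visits
        context = (w * h).sum(dim=1)
        return torch.sigmoid(self.out(context)).squeeze(-1)

model = VisitRiskModel(n_codes=500)
fake_codes = torch.randint(0, 500, (4, 6, 10))   # 4 patients, 6 visits, up to 10 codes each
print(model(fake_codes).shape)                   # one predicted risk per patient
```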

Recent developments and current challenges in statistical genomics

Organisers: Stephen Leslie – University of Melbourne, Damjan Vukcevic – University of Melbourne, David Balding – University of Melbourne 

Augustine Kong – Oxford University Big Data Institute
Selection against gene variants associated with educational attainment
1 Oxford University Big Data Institute
Given that, in many populations, individuals with higher educational attainment tend to have fewer children, it should not be a surprise that gene variants associated with educational attainment are under negative selection. Using population-scale data from Iceland, we show that not only is this true, the selection force is substantially stronger than if it were manifested entirely through educational attainment: e.g., among individuals who have the same amount of education, those with a higher genetic propensity score tend to have fewer children. This applies to both men and women, but the selection force is stronger for women. In particular, women with a higher genetic propensity score tend to have children later and, as a result, have fewer children overall. Indeed, women with a higher genetic propensity actually have more children later in life, but that is not enough to compensate for the deficit accrued in the early part of their reproductive life. While the actual decline in genetic propensity in the population might appear modest, if this selection continues for a few centuries, which is a blink of the eye in evolutionary time, the effect is far from negligible. It is noted that results in this area of research can often be contentious, and thus there is a high bar for proper data and rigorous statistical analyses.


Melanie Bahlo – Walter and Eliza Hall Institute for Medical Research, Melbourne
Detecting relatedness with two different genetic markers using whole genome sequencing data from malaria-causing Plasmodium species
1 The Walter and Eliza Hall Institute of Medical Research
Plasmodium vivax and Plasmodium falciparum are the two malaria-causing species in humans that cause the highest rates of mortality and morbidity. These two species’ genomes are <1/10th of the human genome in size, both ~20 million base-pairs (A, G, C, Ts) in length. Both species recombine but, unlike humans, have predominantly haploid lifecycles. In human genomic studies, identity by descent (IBD) methods are frequently used to localize genomic regions containing trait-influencing mutations. IBD methods make use of recombination events, which are often modelled with first-order hidden Markov models. IBD methods can also be used to determine genetic relatedness between samples; this is performed for pairs of samples. Applications of these methods in malaria are more complex than in human studies because plasmodium DNA is often isolated from patients’ blood, which, in regions where malaria is common, can contain more than one clone (sample of plasmodium), of unequal and unknown proportions. Whole genome sequencing is now commonly applied to plasmodium. We have recently developed a new IBD method to detect relatedness in Plasmodium (Henden et al, PLOS Genetics, 2018) that allows for mixed-clone samples, known as isolates. This IBD method makes use of single nucleotide polymorphisms (SNPs), which have binary states and occur many thousands of times throughout both genomes. Short tandem repeat (STR) markers are a different type of genetic marker: they have many more states than SNPs and are often more informative, but are also less frequent. We have recently adapted a human STR calling method, implemented as an Expectation-Maximization algorithm, to plasmodium. We compare the ability of STRs and SNPs, determined from whole genome sequencing, to detect relatedness.

Alexei Drummond – Professor of Computational Biology, University of Auckland
Inferring Species Trees Using Integrative Models of Species Evolution
H. Ogilvie1, T. Vaughan2, N. Matzke3, G. Slater4, T. Stadler2, D. Welch3
1 Rice University, 2 ETH Zurich, 3 University of Auckland, 4 University of Chicago
Bayesian methods can be used to accurately estimate species tree topologies, ancestral divergence times and other parameters, but only when the models of evolution sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now dating analyses using the MSC have been based on a fixed clock or informally derived calibration priors instead of the FBD. On the other hand, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescence processes. To address these mirror-image deficiencies in evolutionary models, we have developed an integrative model of evolution which combines both the FBD and MSC models. By applying concatenation and the MSC (without employing the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We show that these biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC in a new package developed for the BEAST2 phylogenetic software platform.


This session is supported by the School of Mathematics and Statistics of The University of Melbourne.

Recent developments in statistical precision medicine

Organiser: Bibhas Chakraborty – Duke-NUS Medical School, Singapore

Eric Laber – North Carolina State University, Raleigh, NC, USA
Sample size considerations for precision medicine
1 North Carolina State University
Sequential Multiple Assignment Randomized Trials (SMARTs) are considered the gold standard for estimation and evaluation of treatment regimes. SMARTs are typically sized to ensure sufficient power for a simple comparison, e.g., the comparison of two fixed treatment sequences. Estimation of an optimal treatment regime is conducted as part of a secondary and hypothesis-generating analysis with formal evaluation of the estimated optimal regime deferred to a follow-up trial. However, running a follow-up trial to evaluate an estimated optimal treatment regime is costly and time-consuming; furthermore, the estimated optimal regime that is to be evaluated in such a follow-up trial may be far from optimal if the original trial was underpowered for estimation of an optimal regime. We derive sample size procedures for a SMART that ensure: (i) sufficient power for comparing the optimal treatment regime with standard of care; and (ii) the estimated optimal regime is within a given tolerance of the true optimal regime with high-probability. We establish asymptotic validity of the proposed procedures and demonstrate their finite sample performance in a series of simulation experiments.


Susan Shortreed – Kaiser Permanente Washington Health Research Institute, Seattle, USA
Using electronic health records to target suicide prevention care
G. Simon1, E. Johnson1, R. Ziebell1, B. Ahmedani2, A. Beck3, J. Lawrence4, F. Lynch5,
R. Penfold1 and R. Rossom6
1 Kaiser Permanente Washington Health Research Institute, 2 Henry Ford Health
System, 3 Kaiser Permanente Colorado Institute for Health Research, 4 Kaiser
Permanente Southern California, 5 Kaiser Permanente Northwest Center for Health
Research, 6 HealthPartners Institute
Worldwide, nearly 800,000 people die by suicide each year. In the US, about 45,000 die by suicide and non-fatal suicide attempts result in approximately 500,000 emergency department visits annually. Effective suicide prevention interventions exist, but are often resource intensive. Successful identification of those at increased risk of suicide attempt and death makes implementing suicide prevention interventions on a large scale feasible. Electronic health records (EHRs) combined with large administrative databases, including diagnoses and billing codes, contain vast amounts of information on the health care patients have sought and received in real medical settings. We will present work that uses EHR data to identify individuals at risk of suicide. Using the least absolute shrinkage and selection operator (lasso), we selected important predictors of suicide attempt (fatal and non-fatal) from 350 potential predictors that were curated using scientific knowledge. This analysis included 19.6 million visits made by 2.9 million people. We will highlight the statistical and computational challenges that we faced conducting scientific research using EHR data on millions of patients. This project illustrates the potential for using EHR data to advance medicine and identify individuals at increased risk of suicide in order to better target care.
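A schematic sketch of the lasso selection step on synthetic data standing in for the curated EHR predictors; the sample size, penalty value and variable layout are illustrative, not taken from the study:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for EHR-derived predictors (the real study curated ~350)
rng = np.random.default_rng(42)
n, p = 20000, 350
X = rng.normal(size=(n, p))
logit = -4.0 + X[:, :5] @ np.array([0.8, -0.6, 0.5, 0.4, -0.3])
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))     # rare-ish binary outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# L1 (lasso) penalised logistic regression selects a sparse set of predictors
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso.fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_[0])
print("selected predictors:", selected)
print("AUC:", roc_auc_score(y_te, lasso.predict_proba(X_te)[:, 1]))
```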


Ashkan Ertefaie – University of Rochester, New York, USA
Selective inference for dynamic treatment regimes using the LASSO
1 University of Rochester
Constructing an optimal dynamic treatment regime becomes complex when there is a large number of prognostic factors, such as patients’ genetic information, demographic characteristics and medical history over time. Existing methods focus only on selecting the important variables for the decision-making process and fall short in providing inference for the selected model. We fill this gap by leveraging the conditional selective inference methodology. We show that the proposed method is asymptotically valid given certain rate assumptions in semiparametric regression.

Robust Bayesian inference

Organisers: Chris Drovandi – Queensland University of Technology (& SSA Bayesian Statistics), Chris Holmes – University of Oxford

Natalia Bochkina – University of Edinburgh, Scotland
Robustness of Bayesian inference for nonregular constrained ill-posed models
P. Green2
1 University of Edinburgh, 2 University of Technology Sydney
We consider a broad class of statistical models that can be misspecified and ill-posed, from a Bayesian perspective. This provides a flexible and interpretable framework for their analysis, but it is important to understand the robustness of the chosen Bayesian model and its effect on the resulting solution, especially in the ill-posed case where, in the absence of prior information, the solution is not unique. Compared to earlier work on the Bernstein-von Mises theorem for nonregular well-posed Bayesian models, we show that the non-identifiable part of the likelihood, together with the constraints on the parameter space, introduces a more complex geometric structure of the posterior distribution around the best reconstruction point in the limit, and we provide a local approximation of the posterior distribution in this neighbourhood. The results apply to misspecified models, which allows us, for instance, to evaluate the effect of model approximation on statistical inference. Emission tomography is taken as a canonical example for study, but our results hold for a wider class of generalised linear inverse problems with constraints.

Jeffrey Miller – Harvard University, Boston, USA
Robust inference using power posteriors: Calibration and inference
D. Dunson2
1 Harvard University, 2 Duke University
Small departures from model assumptions can lead to misleading inferences, especially as data sets grow large. Recent work has shown that robustness to small perturbations can be obtained by using a power posterior, which is proportional to the likelihood raised to a certain fractional power, times the prior. In many models, inference under a power posterior can be implemented via minor modifications of standard algorithms; mixture models, however, present a particular challenge requiring new algorithms. We have found a simple and scalable algorithm that yields results very similar to the power posterior for mixture models, by modifying the standard Gibbs sampling algorithm to use power likelihoods for only the mixture parameter updates. Another challenge in the practical implementation of power posteriors is how to choose the power appropriately. We present a data-driven technique for choosing the power in an objective way to obtain robustness to small perturbations. We illustrate with real and simulated data, including an application to flow cytometry clustering.
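A minimal sketch of sampling from a power posterior for a toy location model with contaminated data; the powers below are fixed by hand, whereas the talk's contribution is a data-driven choice of the power:

```python
import numpy as np

def log_power_posterior(theta, data, alpha, prior_sd=10.0):
    # Power posterior: prior(theta) * likelihood(theta)^alpha, here for a N(theta, 1) model
    log_lik = -0.5 * np.sum((data - theta) ** 2)      # up to a constant
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return alpha * log_lik + log_prior

def metropolis(data, alpha, n_iter=20000, step=0.2):
    rng = np.random.default_rng(1)
    theta, samples = 0.0, []
    lp = log_power_posterior(theta, data, alpha)
    for _ in range(n_iter):
        prop = theta + step * rng.normal()
        lp_prop = log_power_posterior(prop, data, alpha)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return np.array(samples[n_iter // 2:])            # discard burn-in

# Contaminated data: the assumed N(theta, 1) model is slightly wrong
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 95), rng.normal(8, 1, 5)])
for alpha in (1.0, 0.8):
    s = metropolis(data, alpha)
    print(f"alpha={alpha}: posterior mean {s.mean():.2f}, sd {s.std():.2f}")
```

Raising the likelihood to a power below one downweights the influence of each observation, which is what buys robustness to the small contamination.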

David Frazier – Monash University, Melbourne
Model Misspecification in Approximate Bayesian Computation: Consequences and Diagnostics
C. Robert2 and J. Rousseau3
1 Monash University, 2 Universite Paris Dauphine PSL, 3 Oxford University
We analyse the behaviour of approximate Bayesian computation (hereafter, ABC) when the model generating the simulated data differs from the actual data generating process; i.e., when the data simulator in ABC is misspecified. We demonstrate both theoretically and in simple, but practically relevant, examples that if the model is misspecified different versions of ABC will lead to substantially different results. We derive theoretical results which demonstrate that, under regularity conditions, a version of the accept/reject ABC approach concentrates posterior mass on an appropriately defined pseudo-true parameter value. However, it turns out that under model misspecification the accept/reject ABC posterior has non-standard asymptotic shape, i.e., it is not asymptotically Gaussian, and thus does not yield meaningful expressions of parameter uncertainty. In addition to these results, we also examine the theoretical behaviour of the popular linear regression adjustment to ABC under model misspecification, and demonstrate that this approach concentrates posterior mass on a completely different pseudo-true value than that obtained by the accept/reject approach to ABC. Using our theoretical results, we suggest two approaches to diagnose model misspecification in ABC. All theoretical results and diagnostics are illustrated in a simple running example.
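For concreteness, a minimal accept/reject ABC sketch in the misspecified setting analysed in the talk: the assumed simulator is Gaussian while the "observed" data are heavier tailed; the summary statistic, tolerance and models are toy choices:

```python
import numpy as np

def abc_reject(obs_summary, simulate, prior_draw, n_sims=50000, quantile=0.01):
    """Basic accept/reject ABC: keep the parameter draws whose simulated
    summaries are closest to the observed summary."""
    rng = np.random.default_rng(0)
    thetas = np.array([prior_draw(rng) for _ in range(n_sims)])
    sims = np.array([simulate(t, rng) for t in thetas])
    dist = np.abs(sims - obs_summary)
    keep = dist <= np.quantile(dist, quantile)
    return thetas[keep]

# Assumed (misspecified) simulator: N(theta, 1); the "real" data are heavier tailed
rng = np.random.default_rng(1)
observed = rng.standard_t(df=2, size=200) + 1.0
obs_summary = observed.mean()

prior_draw = lambda rng: rng.normal(0, 5)
simulate = lambda theta, rng: rng.normal(theta, 1, size=200).mean()

posterior = abc_reject(obs_summary, simulate, prior_draw)
print("ABC posterior mean:", posterior.mean(), "sd:", posterior.std())
```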

Spatio-temporal statistics in the environmental sciences

Organiser: David Warton – UNSW & SSA Environmental Statistics

Hsin-Cheng Huang – Academia Sinica, Taiwan
Spatio-Temporal Analysis of Particulate Matter in Taiwan
G. Huang2, W. Hwang3 and L. Chen4
1 Institute of Statistical Science, Academia Sinica, 2 Institute of Statistics, National Tsing Hua University, 3 Institute of Statistics, National Chung Hsing University, 4 Institute of Information Science, Academia Sinica
Fine particulate matter (PM2.5) has gained increasing attention due to its adverse health effects on humans. In Taiwan, it has conventionally been monitored by large environmental monitoring stations of the Environmental Protection Administration; however, only 77 such stations are currently established. Recently, a project using a large number of small sensing devices, called AirBoxes, was launched in March 2016 to monitor PM2.5 concentrations. Although thousands of AirBoxes have been deployed across Taiwan to give broader coverage, they are mostly located in big cities, and their measurements are less accurate. In this research, we propose a spatial prediction method to combine these two types of data. We also introduce a spatio-temporal model for PM2.5 forecasting at any location in Taiwan. In addition, we develop a spatio-temporal control chart that monitors anomalous measurements.

Yan Wang – RMIT University, Melbourne
Understanding the connections between some species distribution models
L. Stone1
1 RMIT University
Models for accurately predicting species distributions have become essential tools for many ecological and conservation problems. For many species, presence-background (presence-only) data are the most commonly available type of spatial data. A number of important methods have been proposed to model presence-background (PB) data, and there have been debates on the connection between these seemingly disparate methods. The paper begins by studying the close relationship between the LI (Lancaster and Imbens, 1996) and LK (Lele and Keim, 2006; Royle et al., 2012) models, which were among the first methods developed for analysing PB data. The second part of the paper identifies close connections between the LK and point process models, as well as the equivalence between the Scaled Binomial (SB), Expectation-Maximization (EM), partial-likelihood-based Lele (2009) and LI methods, many of which have not been noted in the literature. We clarify that all these methods are the same in their ability to estimate the relative probability (or intensity) of presence from PB data, and the absolute probability of presence when extra information on the species’ prevalence is known. A new unified constrained LK (CLK) method is also proposed as a generalisation of the better-known existing approaches, with less theory involved and greater ease of implementation.

Francis Hui – Australian National University
Spatio-temporal Latent Variable Models: A Potential Waste of Space and Time?
N. Hill2 and A. Welsh1
1 The Australian National University, 2 Institute for Marine and Antarctic Studies
In recent years, generalized linear latent variable models (GLLVMs) have gained popularity in community ecology, where they are used to model the environmental factors driving changes in species assemblages, while accounting for potential spatial and/or temporal as well as between-species correlations. This paper is motivated by the Southern Ocean Continuous Plankton Recorder (SO-CPR) survey, an international longitudinal survey focused on studying marine assemblages in the Indian sector of the Southern Ocean. When modeling spatio-temporal community ecology data using GLLVMs, it is becoming common to explicitly include a spatio-temporal correlation function (or some variation thereof) in the covariance structure of the latent variables, as opposed to making the standard assumption of independence. While logical, moving away from independence produces a substantial increase in computation, irrespective of the estimation method used. Motivated by the SO-CPR survey, we set out to study whether, given the computational benefits, there are aspects of inference for GLLVMs that are robust to deliberately misspecifying the latent variable covariance structure and assuming independence. We focus mainly on estimation and inference for the environmental covariates and prediction of the latent variables, as we explore the impact of misspecification (assuming independence) in the presence of spatio-temporal correlations.

Statistical education - engaging future statisticians

Organisers: Peter Howley – SSA Statistical Education Section, Louise Ryan – University of Technology Sydney

Deborah Nolan – University of California, Berkeley
How can data science improve statistics education?
1 University of California, Berkeley
Students are flocking to the field of data science, yet many of them still say statistics is boring. Of course, we could simply add “data science” to our course titles, go about business as usual, and hope that solves the problem. But, the students will figure it out. It’s time to move our teaching methods away from canned data, code recipes, and the normal curve. By embracing data science, students can work more closely with real-world data, engage in authentic problem solving, and learn how to use statistics to make a difference. The advent of data science brings a fantastic opportunity to improve statistics education and attract more students to the field. At UC Berkeley, we have long been innovating in our statistics curriculum, but only in the past three years have computer science and statistics faculty collaborated to design courses. This year nearly 3000 students enrolled in our two new co-developed and co-taught introductory data science courses. The official major launches in the fall, and one in three undergraduates have indicated they want to major or minor in data science. In this talk, I hope to convey some of the lessons learned from developing this new major and reflect on how data science can help statistics education.


Chris Wild – University of Auckland, New Zealand
On gaining iNZights, having your cake and eating it too
1 University of Auckland
This is a session on “Statistical education – engaging future statisticians.” A customary precursor to engagement is a period of courtship or wooing. To appropriate a famous book title, it is all about “Getting to Yes”. Ways of wooing students include creating as many “Aha!” moments as possible as seamlessly as possible in the least time possible, and populating their imaginations with possibilities – possibilities for “what I can do with data and what data can do for me”. My big interest is in visualisation and analysis software as an enabler of these things. Coding solutions (like R) slow down the rate at which students can experience what you can do with data, but an ability to use coding solutions is where we ultimately want to end up. So this talk will show how the iNZight system offers free and rapid exploration even for beginners – to facilitate speed-dating and the early phases of courtship. But by virtue of its writing R code and R Markdown documents, it also provides a vehicle for transitioning them into both coding and responsible practices like reproducible workflows.

Peter Howley – University of Newcastle, Australia
Inspiring future statisticians!…or at least making statistics-haters outliers
1 The University of Newcastle
As Tukey famously observed, statisticians “get to play in everyone’s backyard”. It is this diversity and endless potential to contribute and collaborate in and across all fields, coupled with the associated creativity and empowerment the profession engenders, which provides many a statistician a sense of infinite opportunity and fulfilment. One of the more striking aspects, however, is the gap between this reality and the perception of statistics held by school students, who are exposed to the more descriptive nature, and mathematical undertones, of statistics. The more broadly appealing creative, investigative and collaborative aspects of statistics are often lost. This presentation describes national and international initiatives attempting to arrest this situation; in particular, the National Schools Poster Competition delivered in Australia (winner of the 2017 International Statistical Institute’s Best Cooperative Project Award and listed in the Chief Scientist’s STEM Programme Index 2016), and the International Statistical Literacy Project. These initiatives facilitate interdisciplinary interest, interaction and investigation, engaging students from varied backgrounds and education levels. The national initiative develops key future workplace skills aligned with national school curriculum outcomes and motivates students by enabling them to take the lead, determine the context and self-assess. Significantly, students get to experience, albeit on a smaller scale, what statisticians practise. The presentation will outline the underlying model, which connects industry, primary, secondary and tertiary educators, the many supporting resources available, and how you too can participate in supporting the generation of future statisticians.

Statistical learning methods for causal inference

Organiser: Margarita Moreno Betancur – Murdoch Children’s Research Institute & University of Melbourne

Stijn Vansteelandt – University of Ghent, Belgium
How to obtain valid tests and confidence intervals after confounder selection?
O. Dukes1 and V. Avagyan1
1 Ghent University, 2 London School of Hygiene and Tropical Medicine
The problem of how best to select variables for confounding adjustment forms one of the key challenges in the evaluation of exposure or treatment effects in observational studies. Routine practice is often based on stepwise selection procedures that use hypothesis testing, change-in-estimate assessments or the lasso, which have all been criticised for – amongst other things – not giving sufficient priority to the selection of confounders. This has prompted vigorous recent activity in developing procedures that prioritise the selection of confounders, while preventing the selection of so-called instrumental variables that are associated with exposure, but not outcome (after adjustment for the exposure). A major drawback of all these procedures is that there is no finite sample size at which they are guaranteed to deliver treatment effect estimators and associated confidence intervals with adequate performance. This is the result of the estimator jumping back and forth between different selected models, and of standard confidence intervals ignoring the resulting model selection uncertainty. In this talk, I will develop insight into this by evaluating the finite-sample distribution of the exposure effect estimator in linear regression under a number of the aforementioned confounder selection procedures. I will then make a simple but generic proposal for generalised linear models, which overcomes this concern (under weaker conditions than competing proposals).
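To fix ideas, a toy sketch of one widely used confounder selection strategy, double selection with the lasso (keep covariates predictive of either the outcome or the exposure, then adjust for the union); this is background illustration only, not the proposal developed in the talk:

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

# Toy data: exposure A, outcome Y, candidate covariates X (some affect only A)
rng = np.random.default_rng(0)
n, p = 500, 30
X = rng.normal(size=(n, p))
A = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)           # X0, X1 drive exposure
Y = 1.0 * A + X[:, 0] + 2 * X[:, 2] + rng.normal(size=n)   # true exposure effect is 1.0

# "Double selection": covariates predictive of Y or of A, then adjust for the union
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, Y).coef_)
sel_a = np.flatnonzero(LassoCV(cv=5).fit(X, A).coef_)
union = np.union1d(sel_y, sel_a)

design = np.column_stack([A, X[:, union]])
fit = LinearRegression().fit(design, Y)
print("estimated exposure effect:", fit.coef_[0])
```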


Marco Carone – University of Washington, Seattle, USA
Nonparametric doubly-robust inference on the average treatment effect
D. Benkeser2, M. van der Laan3 and P. Gilbert4,1
1 University of Washington, 2 Emory University, 3 University of California Berkeley, 4 Fred Hutchinson Cancer Research Center
In the past two decades, there has been considerable interest in so-called doubly-robust (DR) estimators of the average treatment effect. To construct such estimators, estimation of two nuisance parameters — the outcome regression and the propensity score — is generally required as an intermediate step. DR estimators derive their name from the fact that they are consistent if either of these two nuisance parameters is consistently estimated. In this talk, we will discuss the recent development of DR estimators that not only enjoy doubly-robust consistency but also allow the construction of confidence intervals and tests that are valid even when one of the nuisance parameters is inconsistently estimated. This innovation is particularly important when flexible estimation strategies (e.g., machine learning) are used, since valid robust inference can then be especially difficult to achieve. These new techniques provide an additional tool to support investigators in their efforts to derive robust scientific conclusions. The use and performance of these procedures will be illustrated numerically, and ongoing challenges will also be discussed.
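For background, a minimal sketch of the classical AIPW (doubly robust) estimator of the average treatment effect with simple parametric nuisance fits standing in for machine learning; the talk's refinement, which keeps confidence intervals valid when one nuisance estimator is inconsistent, is not implemented here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(Y, A, X):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimate."""
    ps = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]   # propensity score
    mu1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)             # outcome regression, treated
    mu0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)             # outcome regression, control
    psi = mu1 - mu0 + A * (Y - mu1) / ps - (1 - A) * (Y - mu0) / (1 - ps)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(Y))

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * A + X[:, 0] - X[:, 1] + rng.normal(size=n)        # true ATE = 2.0
est, se = aipw_ate(Y, A, X)
print(f"ATE estimate {est:.2f} (95% CI {est - 1.96 * se:.2f}, {est + 1.96 * se:.2f})")
```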


Romain Neugebauer – Kaiser Permanente, California
Practical impact of data-adaptive estimation on the comparison of dynamic treatment regimens in diabetes care using EHR data
O. Sofrygin1,2, J. Schmittdiel1, P. O’Connor3 and M. van der Laan2
1 Kaiser Permanente Northern California, Division of Research, 2 University of California at Berkeley, Division of Biostatistics, 3 HealthPartners Institute
Background – Consistent estimation of causal effects relies on consistent estimation of nuisance parameters such as propensity scores. In practice, nuisance parameters are commonly evaluated based on arbitrarily specified parametric models. To alleviate the bias expected from this precarious analytic strategy, data-adaptive estimation approaches have been proposed but practitioners might view motivations for their applications as academic considerations that are inconsequential in practice. In this case study, we evaluate the extent to which data-adaptive estimation can improve causal inferences in comparative effectiveness research with large healthcare databases.
Methods – We present analyses of electronic health records (EHR) from a type 2 diabetes study of the effect of four adaptive treatment strategies on a time-to-event outcome. Inverse probability weighting and targeted minimum loss based estimation (TMLE) are implemented using both a model-based and an ensemble learning approach known as Super Learning (SL) to evaluate their nuisance parameters.
Results – We demonstrate that SL estimation of nuisance parameters can result in substantial bias reduction and efficiency gains compared to model-based estimation. We also illustrate the scalability of targeted learning (i.e., TMLE combined with SL) to evaluate the effect of multiple time-point interventions using granular EHR data.
Conclusion – Targeted learning can routinely be applied to improve causal evidence from large healthcare databases.



This session is supported by the Victorian Centre for Biostatistics (ViCBiostat). 

Survey statistics

Organisers: Paul Schubert – Australian Bureau of Statistics & Stephen Horn – SSA Official Statistics Section

Natalie Shlomo – University of Manchester, UK, to deliver the E.K. Foreman Lecture
Statistical Disclosure Control: Where Do We Go From Here?
1 University of Manchester
This talk will start with an overview of the traditional statistical disclosure control (SDC) framework implemented at statistical agencies for standard outputs, including types of disclosure risks, how disclosure risk and information loss are quantified, and some common SDC methods. In recent years, we have seen the digitisation of all aspects of our society leading to new and linked data sources offering unprecedented opportunities for research and evidence-based policies. These developments have put pressure on statistical agencies to provide broader access to their data. On the other hand, with detailed personal information easily accessible from the internet, traditional SDC methods for protecting individuals from re-identification may no longer be sufficient and agencies are relying more on restricting and licensing data. One disclosure risk that has largely been ignored by statistical agencies up till now is known as inferential disclosure where confidential information may be revealed exactly or to a close approximation. This type of disclosure risk may be present whether the individual is included in the database or not. With strict control of the data and release of outputs, statistical agencies traditionally have not focused on this type of disclosure. However, with increasing demands for more open and accessible data, statistical agencies now need to consider new strategies of dissemination and are revisiting their intruder scenarios, types of disclosure risks and more rigorous data protection mechanisms. One such mechanism is Differential Privacy (Dwork, et al. 2006), a mathematically principled method of measuring how secure a protection algorithm is with respect to personal data disclosures. It incorporates all traditional disclosure risks and inferential disclosure in a ‘worst-case’ scenario. Statisticians have now been investigating the possibilities of incorporating Differential Privacy in their SDC framework, especially for new dissemination strategies which include web-based applications where outputs are generated and protected on-the-fly without the need for human intervention to check for disclosure risks. We discuss other dissemination strategies and the potential for Differential Privacy to provide privacy guarantees. Dwork, C., McSherry, F., Nissim, K. and Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. In 3rd IACR Theory of Cryptography Conference 265-284.
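As a concrete instance of such a formal protection mechanism, a minimal sketch of the Laplace mechanism of Dwork et al. (2006) for releasing a single count; the count, sensitivity and epsilon values are illustrative:

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release a count/total with epsilon-differential privacy by adding
    Laplace(sensitivity / epsilon) noise (Dwork et al. 2006)."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(0.0, scale)

rng = np.random.default_rng(0)
true_count = 1342                       # e.g. a cell count in a frequency table
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1, epsilon=eps, rng=rng)
    print(f"epsilon={eps}: released {noisy:.1f}")
```

Smaller epsilon gives a stronger privacy guarantee at the cost of noisier released values.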

Second speaker to be confirmed. 

STRengthening Analytical Thinking for Observational Studies: Contributions of the STRATOS initiative to analyses of studies with measurement error and time-to-event data

Organisers: Michal Abrahamowicz – McGill University, Katherine Lee – Murdoch Children’s Research Institute, Willi Sauerbrei – University of Freiburg, Germany

Michal Abrahamowicz – McGill University, Montreal, Canada
Assessing non-linear and time-dependent effects of a sparsely measured time-varying covariate
Y. Wang1
1 McGill University
We illustrate the need for integrating the work of different STRATOS Topic Groups (TG) to develop novel comprehensive methodology, using the example of modeling the effects on the hazard of continuous time-varying covariates (TVC) measured only infrequently during the follow-up. Accurate modeling of the TVC effect requires accounting for (i) a possibly non-linear (NL) functional form of its association with the log hazard (TG2: Functional Forms & Variable Selection), (ii) a potential time-dependent (TD) effect, i.e. changes over time in the strength of this association (TG8: Survival Analysis), and (iii) the specific measurement errors induced when the previously observed TVC value is used as a ‘proxy’ for its un-observed current value (TG4: Measurement Errors). NL and TD effects are frequently reported for time-fixed covariates [Sauerbrei et al, Biom J 2007]. However, assessing TVC effects is more complicated, especially if measurements are sparse [Andersen & Liestøl, SIM 2003]. We propose a flexible model in which the hazard at time u, conditional on the most recently observed TVC value X(u*), is modeled as λ(u | X(u*)) = λ0(u) exp{β(u) g(X(u*)) γ(u − u*)}, where g(.) and β(.) represent, respectively, the NL function (non-linear dose-response) and the TD function (change over time in the effect’s strength) [Wynant & Abrahamowicz SIM 2014], and γ(.) represents the effect of the time elapsed since the last observation (TEL = u − u*), acting as an effect modifier. All three functions are estimated with regression splines, using a 3-step Alternating Conditional Estimation algorithm. To enhance the clinical plausibility/relevance of the simulations, as suggested by the STRATOS Simulation Panel [Boulesteix et al, Biom J 2018], we simulated TVC histories based on the real-life repeated measurements of systolic blood pressure (SBP) in the Framingham Heart Study (FHS). In simulations, the TD and NL estimates were accurate if the TVC was measured with high frequency, but biased if the measurements were sparse. In the latter case, the TEL estimate helped reduce the under-estimation bias. We re-analyzed the hazard of cardiovascular mortality/morbidity among women in the FHS, with biennial TVC measurements of SBP and serum cholesterol, over >40 years of follow-up. We found NL and TD effects of both TVCs, with TEL estimates suggesting an immediate effect for cholesterol but a lagged effect for SBP.


Terry Therneau – Mayo Clinic, Rochester, Minnesota, USA
Survival models for observational studies: issues and recommendations.
1 Mayo Clinic
Statisticians and their customers often like to condense data into a single number summary (perhaps too much so); for survival data the most prevalent of these is the hazard ratio from a proportional hazards (Cox) model. This convenient summary depends on several aspects of the data and model, however, many of which are commonly violated in observational data: a single dominant endpoint, non-informative censoring, appropriate patient selection and time scale, time-dependent covariates, proportional hazards, and model goodness of fit. This talk will discuss issues and recommendations from the current guidance documents under preparation by the STRATOS survival group (http://www.stratos-initiative.org/group_8), and where future work will be focused.
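For reference, the proportional hazards model behind that single-number summary, in standard notation (background only, not material from the guidance documents):

```latex
% Cox proportional hazards model: the hazard ratio for a one-unit increase in
% covariate Z_k is exp(beta_k), assumed constant over time
\lambda(t \mid Z) \;=\; \lambda_0(t)\,\exp\!\bigl(\beta^{\top} Z\bigr),
\qquad
\mathrm{HR}_k \;=\;
  \frac{\lambda(t \mid Z_k = z + 1,\, Z_{-k})}{\lambda(t \mid Z_k = z,\, Z_{-k})}
  \;=\; e^{\beta_k}.
```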


Victor Kipnis – National Cancer Institute, Bethesda, Maryland, USA
A new longitudinal measurement error model with application to physical activity assessment instruments in a large biomarker validation study
1 National Cancer Institute, 2 STRATOS TG-4
Systematic investigations into the structure of measurement error of different physical activity instruments are lacking. Whether existing instruments consist of objective measurements made by accelerometers or involve self-report on questionnaires or recalls, their measurement errors may contain bias as well as random variation. In lieu of observed true physical activity levels, to estimate those different error components, it is necessary to have some unbiased biomarker measurements such as those made by doubly labelled water (DLW). Existing measurement error models treat an individual’s level of physical activity as a fixed quantity over a long period of time. However, physical activity involves both short term (e.g., month-to-month) and long-term (over years) variation over time. We describe a longitudinal measurement error model that accounts for such variation and apply it to the analysis of data on physical activity energy intake from a large validation study of different physical activity instruments using DLW as reference measurements. We show that this time-varying measurement error model fits the data better than the one based on the long-term average physical activity assumption. Accounting for the time element in physical activity assessment is crucial to avoid biases in evaluation of the effects of measurement error.
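A generic sketch, in assumed notation rather than the authors' specification, of the kind of decomposition described: true activity for person i at time j splits into a long-term level and a short-term deviation, and the instrument adds bias and noise:

```latex
% Assumed notation: X_i long-term level, u_ij short-term (e.g. month-to-month)
% deviation, (beta_0, beta_1) instrument bias, r_i person-specific bias,
% e_ij random measurement error
X_{ij} \;=\; X_i + u_{ij},
\qquad
W_{ij} \;=\; \beta_0 + \beta_1 X_{ij} + r_i + e_{ij}.
```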

Topical issues in cluster-randomised trials

Organiser: Andrew Forbes – Monash University, Melbourne

Karla Hemming – University of Birmingham, UK
Extending the I-squared statistic to describe treatment effect heterogeneity in cluster randomised trials
A. Forbes2
1 University of Birmingham, 2 Monash University
Treatment effect heterogeneity is commonly investigated and allowed for in meta-analysis of treatment effects across different studies. The effect of the treatment might also vary across clusters in a cluster randomised trial, and it can be of interest to explore any treatment effect heterogeneity at the analysis stage. In stepped-wedge designs or other cluster randomised designs in which clusters are exposed to both treatment and control, this treatment effect heterogeneity can be identified. When conducting a meta-analysis it is common to describe the magnitude of any treatment effect heterogeneity using the I-squared statistic, which is an intuitive and easily understood concept. Here we derive a comparable measure describing the degree of treatment effect heterogeneity across clusters.
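For reference, the standard meta-analytic I-squared statistic that the proposed cluster-level measure parallels (standard definitions only; the new measure itself is not reproduced here):

```latex
% Q is Cochran's heterogeneity statistic with df = k - 1 for k studies;
% tau^2 is the between-study variance and sigma^2 a typical within-study variance.
% I^2 is the proportion of total variability attributable to heterogeneity
% rather than sampling error.
I^2 \;=\; \frac{Q - \mathrm{df}}{Q} \times 100\%
\;\;\approx\;\; \frac{\tau^{2}}{\tau^{2} + \sigma^{2}} \times 100\%.
```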


Bruno Giraudeau – Université François Rabelais, Tours, France
Biases in cluster randomized trials
A. Caille1,2, C. Leyrat3, S. Kerry4 and S. Eldridge4
1 INSERM U1246, Tours, France, 2 Universite de Tours, France, 3 London School of Hygiene and Tropical Medicine, UK, 4 Queen Mary University of London, UK
Cluster randomized trials (CRTs) are trials in which clusters of individuals such as practices, hospitals or schools are randomized rather than individuals themselves. In many CRTs, clusters are randomized and only then are individuals identified and recruited. Because blinding is rarely possible in CRTs, such a situation favors baseline imbalance and is at risk of bias. A detailed description of the different steps (cluster recruitment, cluster randomization, individual recruitment, information delivered, etc.) is mandatory for an accurate assessment of this risk of bias. However, many reports lack details regarding the timing of trial processes and blinding. We developed a graphical tool, the Timeline cluster, depicting the time sequence of steps and blinding status in CRTs. This tool can help in both the planning stage and reporting the results of the trial. In parallel, we were involved in an international workgroup to develop an extension of the new Cochrane Risk of Bias Tool (RoB Tool) for CRTs, to help researchers involved in a systematic review assess the risk of bias of the CRTs selected for the review. A remaining issue is whether this sensitivity of CRTs to bias leads to a systematic difference in intervention effect estimates as compared with individually randomized trials assessing the same question. We conducted a meta-epidemiological study including 76 Cochrane meta-analyses with a binary outcome and 45 with a continuous outcome. For analyses of binary outcomes, we did not find any systematic difference in effect estimates between cluster and individually randomized trials. For continuous outcomes, the results were less clear, although accounting for trial sample sizes led to a non-significant difference. In the end, more research is needed, although to date we have no evidence of a systematic difference in effect estimates from cluster and individually randomised trials.


Richard Hooper – Queen Mary University of London, UK
Optimal incomplete stepped wedge designs in continuous time
1 Queen Mary University of London
In a cluster randomised trial there may be a virtue in finding ways to reduce the total number of individual participants without sacrificing statistical power, for example by reducing the cluster size and increasing the number of clusters. In a stepped wedge design the most efficient way to do this is to concentrate recruitment within particular periods in particular clusters, leading to an ‘incomplete’ design. In designs with continuous recruitment there is a continuum of choices for switching recruitment on and off, and for scheduling the cross-over in a cluster. I consider designs with an upper limit on the rate of recruitment in any one cluster, and an upper limit on the total number of clusters. I assume a time effect modelled as a polynomial, and an intracluster correlation that is either constant or decays smoothly with time. By approximating continuous time with a model in which each cluster produces a potential recruit at regular (small) intervals, and by randomly sampling from the space of possible designs, I build up a picture of the relationship between sample size and precision, and identify designs along the optimal edge of this envelope. As recruitment approaches saturation the optimum converges, as expected, on a ‘hybrid’ between a classic stepped wedge and a parallel groups design. More incomplete designs have a staircase pattern as the optimum. Monte Carlo sampling from the design space may be a feasible approach to designing trials, but requires a sampling strategy weighted towards ‘non-random’ looking designs.
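A much-simplified, discrete-period sketch of this kind of design comparison (cluster-period means, categorical time, an exchangeable within-cluster correlation that does not decay); the continuous-time recruitment and decaying correlation of the talk are not implemented, and all parameter values and candidate designs are illustrative:

```python
import numpy as np

def trt_variance(recruit, treat, icc=0.05, m=10, sigma2=1.0):
    """GLS variance of the treatment effect for a clusters x periods design,
    using cluster-period means (m subjects per recruited cell), categorical
    period effects and an exchangeable (non-decaying) within-cluster correlation."""
    C, T = recruit.shape
    s2c = icc * sigma2                     # between-cluster variance component
    s2e = (1.0 - icc) * sigma2             # within-cluster (subject-level) variance
    info = np.zeros((T + 1, T + 1))        # columns: intercept, period dummies, treatment
    for c in range(C):
        periods = np.flatnonzero(recruit[c])
        if periods.size == 0:
            continue
        Xc = np.zeros((periods.size, T + 1))
        Xc[:, 0] = 1.0
        for j, t in enumerate(periods):
            if t > 0:
                Xc[j, t] = 1.0             # dummy for period t (first period is the reference)
        Xc[:, T] = treat[c, periods]
        Vc = np.full((periods.size, periods.size), s2c) + np.eye(periods.size) * s2e / m
        info += Xc.T @ np.linalg.solve(Vc, Xc)
    return np.linalg.inv(info)[T, T]

T, C = 5, 4
switch = np.array([1, 2, 3, 4])            # first treated period for each cluster
treat = (np.arange(T)[None, :] >= switch[:, None]).astype(float)

complete = np.ones((C, T), dtype=int)      # classic stepped wedge: recruit in every period
staircase = ((np.arange(T)[None, :] >= switch[:, None] - 1) &
             (np.arange(T)[None, :] <= switch[:, None])).astype(int)  # recruit only around the switch

for name, design in [("complete", complete), ("staircase", staircase)]:
    v = trt_variance(design, treat)
    print(f"{name}: cluster-periods = {design.sum()}, treatment-effect variance = {v:.4f}")
```

Looping trt_variance over randomly generated recruit matrices would mimic, in a crude discrete-time way, the Monte Carlo sampling of the design space described above.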

 


Hosted By

 

ISCB