Statistics seminar
Title: Orthogonal Common-source and Distinctive-source Decomposition between High-dimensional Data Views
Speaker: Hai Shu, Biostatistics, NYU
Date and time: 08 Apr 2022, 1:00pm - 2:00pm
Location: via Zoom
Abstract: Modern biomedical studies often collect multi-view data, that is, multiple types of data measured on the same set of objects. A typical approach to the joint analysis of two high-dimensional data views/sets is to decompose each data matrix into three parts: a low-rank common-source matrix that captures the information shared across data views, a low-rank distinctive-source matrix that characterizes the information specific to each individual data view, and an additive noise matrix. Existing decomposition methods often focus on the orthogonality between the common-source and distinctive-source matrices but give inadequate attention to the more essential orthogonality between the two distinctive-source matrices. The latter guarantees that no further shared information can be extracted from the distinctive-source matrices. We propose a novel decomposition method that defines the common-source and distinctive-source matrices in the L2 space of random variables rather than the conventionally used Euclidean space, with a careful construction of the orthogonal relationship between the distinctive-source matrices. The proposed estimators of the common-source and distinctive-source matrices are shown to be asymptotically consistent and to perform reasonably better than some state-of-the-art methods on both simulated data and a real-data analysis.
Title: COVID-19 Modelling: A new disease analytics framework
Speaker: Mathew Parker, Simon Fraser University
Date and time: 10 Mar 2022, 1:00pm - 2:00pm
Location: via Zoom
Abstract: Asymptomatic and paucisymptomatic presentations of COVID-19, together with restrictive testing protocols, result in undetected COVID-19 cases. Estimating undetected cases is crucial to understanding the true severity of the outbreak. We introduce a new hierarchical disease dynamics model based on the N-mixtures hidden population framework. The new models use three sets of disease count data per region: reported cases, recoveries, and deaths. Treating the first two as under-counted through binomial thinning, we model the true population state at each time point by partitioning the diseased population into the categories active, recovered, and died. Both domestic spread and imported cases are considered. These models are applied to estimate the level of under-reporting of COVID-19 in the Northern Health Authority region of British Columbia, Canada, during thirty weeks of the provincial recovery plan. Covariates on the model parameters are easily implemented and are used to improve the model estimates. When accounting for changes in weekly testing volumes, we found under-reporting rates ranging from 60.2% to 84.2%.
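The binomial-thinning idea can be illustrated with the standard single-site N-mixture likelihood. This is a minimal sketch under stated assumptions, not the speaker's hierarchical model: a latent true count N ~ Poisson(lam), each reported count y ~ Binomial(N, p) given N, so 1 - p plays the role of an under-reporting rate. The function names, the truncation bound `K`, and the repeated-count setup are hypothetical choices for illustration.

```python
import math
from typing import List


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binom_logpmf(k: int, n: int, p: float) -> float:
    if k > n:
        return float("-inf")
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def nmixture_loglik(counts: List[int], lam: float, p: float, K: int = 500) -> float:
    """Log-likelihood of repeated counts of one latent true count N.

    Assumed model (illustrative): N ~ Poisson(lam); each observed count
    y ~ Binomial(N, p) independently given N (binomial thinning).
    N is summed out over max(counts)..K, with K a hypothetical truncation bound.
    """
    terms = [poisson_logpmf(n, lam) + sum(binom_logpmf(y, n, p) for y in counts)
             for n in range(max(counts), K + 1)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp


if __name__ == "__main__":
    # Detection probability p implies an under-reporting rate of 1 - p.
    print(nmixture_loglik([12, 15, 10], lam=40.0, p=0.3))
```

A useful sanity check on the sketch: with a single observed count, summing out N gives exactly a Poisson(lam * p) likelihood, which is the thinning property of the Poisson distribution.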
Title: Clustering multivariate counts from modern biological datasets
Speaker: Sanjeena Dang, Carleton University
Date and time: 23 Feb 2022, 12:30pm - 1:30pm
Location: via Zoom
Abstract: Multivariate count data are commonly encountered through high-throughput sequencing technologies in bioinformatics. Although the Poisson and negative binomial distributions are routinely used to model these count data, their multivariate extensions are computationally expensive, which restricts their use to low-dimensional datasets. Hence, independence between genes is assumed in most cases, which fails to account for the correlation between genes. Fitting such univariate models in a multivariate analysis is not only biologically inappropriate; misspecifying the correlation (covariance) structure can also result in a poor fit to the data. Recently, we developed mixtures of multivariate Poisson lognormal (MPLN) models to analyze these multivariate count measurements. In the MPLN model, the observed counts, conditional on the latent variable, are modeled using a Poisson distribution, and the latent variable comes from a multivariate Gaussian distribution. Due to this hierarchical structure, the MPLN model can account for over-dispersion and allows for correlation between the variables. We show that the univariate version of MPLN provides a fit similar to that of the widely used negative binomial distribution in terms of capturing the mean-variance trends of RNA-seq data. Moreover, we developed a computationally efficient framework for parameter estimation in MPLN models that uses a variational Gaussian approximation, opening the possibility of extending these models to large datasets. Some recent and ongoing extensions of MPLN will be briefly discussed.
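The hierarchical structure described above (Gaussian latent variable, Poisson counts given the latent variable) can be simulated in a few lines to see where over-dispersion and correlation come from. This is a minimal two-dimensional sketch that only draws from the generative model; it does not implement the speaker's mixture model or variational estimation. The function names and the parameters `mu`, `sd`, `rho` are illustrative assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_mpln(n: int, mu: Tuple[float, float], sd: Tuple[float, float],
                rho: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n bivariate counts from a 2-dimensional Poisson lognormal model.

    Latent Y ~ N(mu, Sigma), with Sigma built from sd and correlation rho;
    given Y, the counts are independent Poisson(exp(Y_j)).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of the 2x2 covariance applied to (z1, z2).
        y1 = mu[0] + sd[0] * z1
        y2 = mu[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        out.append((poisson_sample(math.exp(y1), rng),
                    poisson_sample(math.exp(y2), rng)))
    return out


def sample_corr(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    data = sample_mpln(20000, mu=(1.0, 1.0), sd=(0.5, 0.5), rho=0.8)
    x1 = [a for a, _ in data]
    x2 = [b for _, b in data]
    print(statistics.mean(x1), statistics.variance(x1), sample_corr(x1, x2))
```

With these arbitrary settings, the sample variance of each coordinate should noticeably exceed its mean (over-dispersion relative to a plain Poisson), and the two count coordinates should be positively correlated because they share correlated latent Gaussian components.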
Title: Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization
Speaker: Yafei Wang, University of Alberta
Date and time: 16 Feb 2022, 1:00pm - 2:00pm
Location: via Zoom
Abstract: Deep reinforcement learning (RL) has been widely used in a variety of challenging tasks, from game playing to robot navigation. However, sample inefficiency and slow convergence, i.e., an impractically high number of interactions with the environment and long training times, remain challenging problems in RL. To address these issues, we propose a general acceleration method for deep RL algorithms built on Anderson mixing, an effective approach to accelerating the iterates of fixed-point problems. Specifically, we provide deeper insight into the acceleration schemes in policy iteration by establishing a connection between Anderson mixing and quasi-Newton methods and proving that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. We further propose a stabilization strategy that introduces a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator, allowing both faster convergence and more stable behavior. The effectiveness of the proposed method is evaluated on a variety of Atari games. Experimental results show that the proposed method enhances the convergence, stability, and performance of state-of-the-art deep RL algorithms.
Join Zoom Meeting
https://uvic.zoom.us/j/87676075396
Title: Investigating the Relationship Between the Bayes Factor and the Separation of Credible Intervals
Speaker: Zhengxiao Wei, University of Victoria
Date and time: 19 Jan 2022, 1:00pm - 2:00pm
Location: via Zoom
Abstract: We examined the relationship between the Bayes factor and the separation of credible intervals in between- and within-subjects designs under a range of effect and sample sizes. For the within-subject case, we considered five intervals: (a) the within-subject confidence interval of Loftus and Masson (1994), (b) the within-subject Bayesian interval developed by Nathoo, Kilshaw, and Masson (2018), whose derivation conditions on estimated random effects, (c and d) two modifications of (b) based on a proposal by Heck (2019) to allow for shrinkage and account for uncertainty in the estimation of random effects, and (e) the standard Bayesian highest-density interval. We observed a clear and consistent relationship between the Bayes factor and the separation of credible intervals. Remarkably, this relationship is well described by a simple exponential curve and is most precise in case (d). We provide a benchmark to evaluate the Bayes factor when the credible intervals for two means touch at their boundaries. In contrast, interval (e) is relatively wide due to between-subjects variability and tends to obscure effects when used in within-subject designs, rendering its relationship with the Bayes factor less clear. We also provide an R package ‘rmBayes’ to enable computation of each of the within-subject credible intervals investigated here using a number of possible prior distributions. Joint work with Farouk Nathoo and Michael Masson.
Keywords: Bayes factor, credible intervals, within-subject designs, within-subject inference
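Interval (a) in the list above has a simple closed form that shows how a within-subject interval removes between-subject variability. The sketch below assumes a one-way repeated-measures design: the error term is the subject-by-condition interaction mean square MS with (n - 1)(J - 1) degrees of freedom, and the half-width is t_crit * sqrt(MS / n). Because the Python standard library has no t quantile function, `t_crit` is supplied by the caller; the function name is hypothetical, and this is not the `rmBayes` package or any of the Bayesian intervals (b) to (e).

```python
import math
from typing import Sequence


def loftus_masson_halfwidth(data: Sequence[Sequence[float]], t_crit: float) -> float:
    """Half-width of a Loftus-Masson within-subject interval (one-way design).

    data[i][j] is subject i's score in condition j. The error term is the
    subject-by-condition interaction mean square with (n - 1)(J - 1) df;
    t_crit is the matching two-sided t critical value, supplied by the caller.
    """
    n, J = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * J)
    subj = [sum(row) / J for row in data]                        # subject means
    cond = [sum(row[j] for row in data) / n for j in range(J)]   # condition means
    ss = sum((data[i][j] - subj[i] - cond[j] + grand) ** 2
             for i in range(n) for j in range(J))
    ms_error = ss / ((n - 1) * (J - 1))
    return t_crit * math.sqrt(ms_error / n)


if __name__ == "__main__":
    scores = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]]
    # t_crit for df = 2 at 95% is roughly 4.303 (taken from a t table).
    print(loftus_masson_halfwidth(scores, t_crit=4.303))
```

Adding a constant to every score of one subject leaves the half-width unchanged, which is the sense in which the interval ignores between-subject differences; interval (e), by contrast, does not remove them.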
Title: Logistic spatio-temporal factor model with non-linear interactions for cluster analysis
Speaker: Vinicius Mayrink, Universidade Federal de Minas Gerais (UFMG), Brazil.
Date and time: 01 Dec 2021, 1:00pm - 2:00pm
Location: via Zoom
Abstract: In this study, we develop a factor model to explore areal data collected in space and time. The main goal is to incorporate the factor model with non-linear interactions (proposed in 2013) to handle a spatio-temporal random effect in the structure of a logistic regression. The spatial dependence between regions is established through a conditional autoregressive (CAR) model specified for each column of the loadings matrix. Temporal dependence is used to associate the columns of the factor scores matrix. The non-linear interactions are intended to improve cluster detection, since new types of groups can emerge as a combination of the main factor effects and the interaction effect. In terms of application, this study is motivated by the analysis of an electrocardiogram data set obtained between 2013 and 2016 from the Telessaude System of the Hospital das Clinicas at the Federal University of Minas Gerais in Brazil.
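To make the spatial ingredient concrete, one common specification of a proper CAR prior has precision matrix tau * (D - alpha * W), where W is the 0/1 adjacency matrix of the regions and D is the diagonal matrix of neighbour counts. This parameterisation is an assumption for illustration; the talk may use a different one. The sketch builds that matrix and checks positive definiteness with a Cholesky attempt; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def car_precision(n: int, edges: List[Tuple[int, int]], alpha: float,
                  tau: float = 1.0) -> Matrix:
    """Precision tau * (D - alpha * W) of a proper CAR prior (assumed form)."""
    W = [[0.0] * n for _ in range(n)]
    for i, j in edges:  # undirected neighbour pairs
        W[i][j] = W[j][i] = 1.0
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d_i = sum(W[i])  # number of neighbours of region i
        for j in range(n):
            Q[i][j] = tau * ((d_i if i == j else 0.0) - alpha * W[i][j])
    return Q


def is_positive_definite(Q: Matrix, eps: float = 1e-12) -> bool:
    """Attempt a Cholesky factorisation; fail if a pivot is not positive."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= eps:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]  # four regions in a row (toy map)
    print(is_positive_definite(car_precision(4, path, alpha=0.9)))
    print(is_positive_definite(car_precision(4, path, alpha=1.0)))
```

With |alpha| < 1 and every region having at least one neighbour, the precision matrix is positive definite, so the prior is proper. At alpha = 1 the rows sum to zero and the matrix is singular, which corresponds to the intrinsic CAR model.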
Join Zoom Meeting
https://uvic.zoom.us/j/87235877567
Title: Statistical Network Models for Integrating Functional Connectivity with sMRI and PET Brain Imaging Data
Speaker: James Wilson, University of Pittsburgh
Date and time: 24 Nov 2021, 1:00pm - 2:00pm
Location: via Zoom
Abstract: Network analysis is one of the prominent multivariate techniques used to study the structural and functional connectivity of the brain. In a network model of the brain, vertices represent voxels or regions of the brain, and an edge between two vertices represents a physical or functional relationship between the two incident regions. Network investigations of connectivity have produced many important advances in our understanding of brain structure and function, including in the domains of aging, learning and memory, cognitive control, emotion, and disease. Despite their widespread use, network methodologies still face several important challenges. In this talk, I will focus on a particularly important challenge in the analysis of structural and functional connectivity: how does one jointly model the generative mechanisms of structural and functional connectivity with other modalities? I propose and describe a statistical network model, called the generalized exponential random graph model (GERGM), that flexibly characterizes the network topology of structural and functional connectivity and can readily integrate other modalities of data. The GERGM also directly enables statistical testing of individual differences through the comparison of their fitted models. In applying the GERGM to the connectivity of healthy individuals from the Human Connectome Project, we find that the GERGM reveals remarkably consistent organizational properties guiding subnetwork architecture in the typically developing brain. We will discuss ongoing work on how to adapt these models to neuroimaging cohorts associated with the ADRC at the University of Pittsburgh, where the goal is to relate the dynamics of structural and functional connectivity to tau and amyloid-beta deposition in individuals across the Alzheimer’s continuum.
Join Zoom Meeting
https://uvic.zoom.us/j/87235877567