Statistics seminar
Title: A Bayesian Approach to Response Optimization on Data with Multistratum Structure
Speaker: Professor Po Yang, University of Manitoba
Date and time: 17 Apr 2024, 2:00pm - 3:00pm
Location: DSB C114
Abstract: Response optimization is the process of identifying the input variable settings that optimize the response. Multistratum designs arise naturally in industrial experiments because complete randomization is often inconvenient or impractical. To account for model uncertainty, we apply Bayesian model averaging and a predictive approach to the optimization problem for data with multistratum structure. With the posterior probabilities of the models as weights, we consider the weighted average of the predictive densities of the response over all potential models. The goal of the optimization is to identify the factor values that maximize the probability that the response falls in a given range. The method is illustrated with two examples.
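As a rough illustration of the model-averaged objective described in the abstract, the sketch below weights each candidate model's probability of landing in the target range by its posterior model probability, then picks the best setting on a grid. Everything concrete here is an assumption made for illustration: the two candidate models, their weights, the normal predictive distributions, and the single factor x are hypothetical, and the sketch ignores the multistratum error structure that the talk addresses.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_in_range(x, models, lower, upper):
    """Model-averaged probability that the response lies in [lower, upper] at x.

    Each model is a tuple (weight, mean_fn, sd), where weight plays the role of
    the posterior model probability and the predictive distribution is assumed
    (for illustration only) to be normal with mean mean_fn(x) and std. dev. sd.
    """
    total = 0.0
    for weight, mean_fn, sd in models:
        mu = mean_fn(x)
        total += weight * (
            normal_cdf((upper - mu) / sd) - normal_cdf((lower - mu) / sd)
        )
    return total


def best_setting(grid, models, lower, upper):
    """Return the grid point that maximizes the model-averaged probability."""
    return max(grid, key=lambda x: prob_in_range(x, models, lower, upper))


# Hypothetical candidate models; the weights sum to one.
models = [
    (0.7, lambda x: 2.0 + 3.0 * x, 1.0),
    (0.3, lambda x: 2.0 + 3.0 * x - 1.5 * x * x, 1.2),
]
grid = [i / 100 for i in range(-100, 101)]  # coded factor levels in [-1, 1]
x_star = best_setting(grid, models, lower=4.0, upper=6.0)
p_star = prob_in_range(x_star, models, lower=4.0, upper=6.0)
```

A grid search is used only because it is easy to read; with several factors, a numerical optimizer over the same objective would be the more natural choice.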
Title: PIMS Data Science Seminar: Data thinning to avoid double dipping
Speaker: Lucy Gao, University of British Columbia
Date and time: 10 Apr 2024, 2:00pm - 3:00pm
Location: DTB A102 and Zoom
This is our 5th talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract
"Double dipping" is the practice of using the same data to fit and validate a model. Problems typically arise when standard statistical procedures are applied in settings involving double dipping. To avoid the challenges surrounding double dipping, a natural approach is to fit a model on one dataset, and then validate the model on another independent dataset. When we only have access to one dataset, we typically accomplish this via sample splitting. Unfortunately, in some problems, sample splitting is unattractive or impossible. In this talk, we are motivated by unsupervised problems that arise in the analysis of single cell RNA sequencing data, where sample splitting does not allow us to avoid double dipping. We first propose Poisson thinning, which splits a single observation drawn from a Poisson distribution into two independent pseudo-observations. We show that Poisson count splitting allows us to avoid double dipping in unsupervised settings. We next generalize the Poisson thinning framework to a variety of distributions, and refer to this general framework as "data thinning". Data thinning is applicable far beyond the context of single-cell RNA sequencing data, and is particularly useful for problems where sample splitting is unattractive or impossible.
Speaker bio
Website Link: https://www.lucylgao.com/
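The Poisson thinning step described in the abstract above can be sketched as follows. If a count X is drawn from a Poisson distribution with mean lambda, and X1 given X is drawn from Binomial(X, eps), then X1 and X2 = X - X1 are independent Poisson variables with means eps * lambda and (1 - eps) * lambda. The function names and the default eps = 0.5 are choices made for this illustration, not the speaker's code.

```python
import random


def poisson_thin(count, eps=0.5, rng=None):
    """Split one non-negative integer count into two pseudo-observations.

    Each of the `count` units is assigned to the first part with probability
    eps, which is a Binomial(count, eps) draw. The two parts sum to `count`.
    """
    if count < 0 or not 0.0 < eps < 1.0:
        raise ValueError("count must be >= 0 and eps must lie in (0, 1)")
    if rng is None:
        rng = random.Random()
    first = sum(1 for _ in range(count) if rng.random() < eps)
    return first, count - first


def thin_counts(counts, eps=0.5, seed=None):
    """Thin every count in a list; returns (fit_part, validate_part)."""
    rng = random.Random(seed)
    pairs = [poisson_thin(c, eps, rng) for c in counts]
    fit_part = [first for first, _ in pairs]
    validate_part = [second for _, second in pairs]
    return fit_part, validate_part


# Hypothetical counts, e.g. one gene measured across four cells.
fit_part, validate_part = thin_counts([0, 3, 10, 250], eps=0.5, seed=1)
```

A model could then be fitted on fit_part and validated on validate_part. The independence of the two parts relies on the original counts actually being Poisson.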
Title: PIMS Data Science Seminar: Functional Nonlinear Learning
Speaker: Jiguo Cao, Simon Fraser University
Date and time: 05 Apr 2024, 2:00pm - 3:00pm
Location: DTB A102 and Zoom
This is our 6th talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract: Representations of functional data can be more convenient and beneficial in subsequent statistical models than direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. Existing representation learning approaches in functional data analysis usually use linear mappings that parallel those from multivariate analysis, e.g., functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mappings, and linear methods are strained even further by multivariate functional data. To that end, this paper proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we incorporate a classification model to improve the ability of the representations to predict curve labels. Hence, representations from FunNoL can be used for both curve reconstruction and classification. Additionally, we have endowed the proposed model with the ability to address the missing observation problem as well as to further denoise observations. The resulting representations are robust to observations that are locally disturbed by uncontrollable random noise. We apply the proposed FunNoL method to several real data sets and show that FunNoL can achieve better classifications than FPCA, especially in the multivariate functional data setting. Simulation studies have shown that FunNoL provides satisfactory curve classification and reconstruction regardless of data sparsity.
Title: PIMS Data Science Seminar - Representation Learning in Large-scale, Heterogeneous Single-cell Genomics
Speaker: Dr. Lin Zhang, Simon Fraser University
Date and time: 15 Mar 2024, 2:00pm - 3:00pm
Location: Cornett B107 and Zoom
Zoom link: https://uvic.zoom.us/j/84724365947?pwd=OTRzYml3a29oeTJkNW5ucjhUWmxpdz09
This is our 4th talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract: Single-cell omics data play a pivotal role in identifying cell-to-cell heterogeneity, understanding cell differentiation, unveiling cell population structures, and ultimately deciphering disease pathogenesis. Due to the inherent high dimensionality, sparsity, noise, and high correlation of single-cell data, machine learning (ML) models, known for their assumption-free flexibility, scalability, and predictive power, have surged in the analysis of single-cell data to address these challenges. In this talk, I will present some of our recent work on ML-based approaches to accurately and efficiently encode single-cell gene expressions and chromatin accessibility. Our proposed method OCAT, One Cell At A Time, is an ML-based method that sparsely encodes single-cell gene expressions to integrate data from heterogeneous sources without highly variable gene selection or explicit batch effect correction (Wang et al., Genome Biology, 2022). We have demonstrated that OCAT efficiently integrates multiple heterogeneous scRNA-seq datasets and achieves state-of-the-art performance in cell type clustering, especially in challenging scenarios of non-overlapping cell types. In addition, OCAT can efficaciously facilitate a variety of downstream analyses, such as differential gene analysis, trajectory inference, pseudotime inference and cell type inference. OCAT has proven its efficiency and accuracy in characterizing the transcriptomic difference between healthy and diseased kidney samples (McEvoy et al., Nature Communications, 2022). We have further developed OCAT2, which maps multiple complementary single-cell omics to the same domain through multi-modal diffusion mapping. We have demonstrated its accuracy and high computational efficiency in integrating real multi-omics datasets.
Title: PIMS Data Science Seminar: Functional spherical autocorrelation: robust autocorrelation estimation of a functional time series
Speaker: Chi-Kuang Yeh, University of Waterloo
Date and time: 23 Feb 2024, 2:00pm - 3:00pm
Location: Cornett B107 and Zoom
PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract:
Measuring the serial dependence across time is critical in model identification and diagnosis in time series (TS) analysis. In classic TS analysis, the autocorrelation function is perhaps the most widely used method to examine the temporal relationship of scalar or vector-valued observations. In functional TS (FTS), which refers to TS of functional data, the dependence is best summarised by an autocovariance operator. Evaluating the size and information contained in such an object can be difficult. Existing methods are relatively constrained and unable to capture certain characteristics contained in the FTS objects, such as the "direction" of dependence. We develop a new method to address this problem by projecting lagged pairs onto the unit sphere and computing the angle between them, which we refer to as spherical autocorrelation. We establish the asymptotic properties of the empirical spherical autocorrelation, and we study its use in an application to European electricity data.
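The idea of projecting lagged pairs onto the unit sphere and measuring the angle between them can be sketched as below. This is a minimal illustration under stated assumptions, not the speaker's estimator: curves are assumed to be observed on a common, equally spaced grid, the centre is taken to be the pointwise sample mean (the actual method may use a different, more robust centre), and the function name is hypothetical. With equal grid spacing, the spacing cancels in the normalization, so plain dot products stand in for L2 inner products.

```python
import math


def spherical_autocorrelation(curves, lag):
    """Average cosine of the angle between centred, normalized lagged curves.

    `curves` is a list of equal-length lists, one discretized curve per time
    point. Centring by the pointwise mean is an assumption of this sketch.
    """
    n = len(curves)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < number of curves")
    m = len(curves[0])
    centre = [sum(c[j] for c in curves) / n for j in range(m)]
    centred = [[c[j] - centre[j] for j in range(m)] for c in curves]

    def to_unit(v):
        # Project a centred curve onto the unit sphere; None if it is zero.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v] if norm > 0 else None

    units = [to_unit(v) for v in centred]
    cosines = [
        sum(a * b for a, b in zip(units[t], units[t + lag]))
        for t in range(n - lag)
        if units[t] is not None and units[t + lag] is not None
    ]
    if not cosines:
        raise ValueError("no non-degenerate lagged pairs available")
    return sum(cosines) / len(cosines)


# Hypothetical series that flips sign at every time step.
series = [[1.0, 2.0], [-1.0, -2.0], [1.0, 2.0], [-1.0, -2.0]]
rho_1 = spherical_autocorrelation(series, lag=1)
rho_2 = spherical_autocorrelation(series, lag=2)
```

Because every curve is scaled to unit norm before the inner product is taken, a single curve with a very large norm cannot dominate the average, which is the source of the robustness that the title refers to.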