Statistics seminar
Title: A Bayesian Approach to Response Optimization on Data with Multistratum Structure
Speaker: Professor Po Yang, University of Manitoba
Date and time: 17 Apr 2024, 2:00pm - 3:00pm
Location: DSB C114
Abstract: Response optimization is the process of identifying the input variable settings that optimize the response. Multistratum designs arise naturally in industrial experiments because complete randomization is often inconvenient or impractical. To account for model uncertainty, we apply the Bayesian model averaging method and a predictive approach to investigate the optimization problem for data with multistratum structure. With the posterior probabilities of the models as weights, we consider the weighted average of the predictive densities of the response over all potential models. The goal of the optimization is to identify the values of the factors that result in the maximum probability of the response falling in a given range. The method is illustrated with two examples.
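As a reading aid, the objective described in this abstract can be written out as follows. The notation is an assumption made for this note, not taken from the talk: M_1, ..., M_K are the candidate models, D is the observed data, x is the vector of factor settings, and [a, b] is the target range for the response y.

```latex
\[
p(y \mid x, D) = \sum_{k=1}^{K} P(M_k \mid D)\, p(y \mid x, M_k, D),
\qquad
x^{*} = \arg\max_{x} \int_{a}^{b} p(y \mid x, D)\, dy .
\]
```

The first expression is the model-averaged predictive density, weighted by posterior model probabilities; the second selects the factor settings that maximize the predictive probability that the response lies in [a, b].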
Title: PIMS Data Science Seminar: Data thinning to avoid double dipping
Speaker: Lucy Gao, University of British Columbia
Date and time: 10 Apr 2024, 2:00pm - 3:00pm
Location: DTB A102 and Zoom
This is our 5th talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract
"Double dipping" is the practice of using the same data to fit and validate a model. Problems typically arise when standard statistical procedures are applied in settings involving double dipping. To avoid the challenges surrounding double dipping, a natural approach is to fit a model on one dataset, and then validate the model on another independent dataset. When we only have access to one dataset, we typically accomplish this via sample splitting. Unfortunately, in some problems, sample splitting is unattractive or impossible. In this talk, we are motivated by unsupervised problems that arise in the analysis of single cell RNA sequencing data, where sample splitting does not allow us to avoid double dipping. We first propose Poisson thinning, which splits a single observation drawn from a Poisson distribution into two independent pseudo-observations. We show that Poisson count splitting allows us to avoid double dipping in unsupervised settings. We next generalize the Poisson thinning framework to a variety of distributions, and refer to this general framework as "data thinning". Data thinning is applicable far beyond the context of single-cell RNA sequencing data, and is particularly useful for problems where sample splitting is unattractive or impossible.
Speaker bio
Website Link: https://www.lucylgao.com/
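The Poisson thinning step mentioned in the abstract above can be illustrated with a minimal sketch. It relies on the standard fact that if X is Poisson with mean lambda and X1 given X is Binomial(X, eps), then X1 and X - X1 are independent Poisson variables with means eps * lambda and (1 - eps) * lambda. The function name `poisson_thin`, the parameter `eps`, and the simulated data are assumptions for illustration and are not taken from the speaker's software.

```python
import numpy as np


def poisson_thin(counts, eps=0.5, seed=None):
    """Split Poisson counts into two independent pseudo-observations.

    Hypothetical helper for illustration: each count X is split by drawing
    X1 ~ Binomial(X, eps) and setting X2 = X - X1.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    first = rng.binomial(counts, eps)  # one part for fitting
    second = counts - first            # the remainder for validation
    return first, second


# Simulated example: 100,000 draws from Poisson(5), split evenly.
rng = np.random.default_rng(0)
x = rng.poisson(lam=5.0, size=100_000)
x1, x2 = poisson_thin(x, eps=0.5, seed=1)

# The two parts always add back up to the original counts, and their
# sample correlation should be close to zero.
corr = np.corrcoef(x1, x2)[0, 1]
print(x1.mean(), x2.mean(), corr)
```

One part can then be used to fit a model (for example, to estimate clusters) and the other to validate it, without splitting the observations themselves.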
Title: PIMS Data Science Seminar: Functional Nonlinear Learning
Speaker: Jiguo Cao, Simon Fraser University
Date and time: 05 Apr 2024, 2:00pm - 3:00pm
Location: DTB A102 and Zoom
This is our 6th talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract: Using representations of functional data can be more convenient and beneficial in subsequent statistical models than using direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. Existing representation learning approaches in functional data analysis usually use linear mappings that parallel those from multivariate analysis, e.g., functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mappings. Linear methods are even more limited when applied to multivariate functional data. To address this, this paper proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we incorporate a classification model to improve the ability of the representations to predict curve labels. Hence, representations from FunNoL can be used for both curve reconstruction and classification. Additionally, we have endowed the proposed model with the ability to address the missing observation problem as well as to further denoise observations. The resulting representations are robust to observations that are locally disturbed by uncontrollable random noise. We apply the proposed FunNoL method to several real data sets and show that FunNoL can achieve better classifications than FPCA, especially in the multivariate functional data setting. Simulation studies have shown that FunNoL provides satisfactory curve classification and reconstruction regardless of data sparsity.
Title: PIMS Data Science Seminar - Representation Learning in Large-scale, Heterogeneous Single-cell Genomics
Speaker: Dr. Lin Zhang, Simon Fraser University
Date and time: 15 Mar 2024, 2:00pm - 3:00pm
Location: Cornett B107 and Zoom
Zoom link: https://uvic.zoom.us/j/84724365947?pwd=OTRzYml3a29oeTJkNW5ucjhUWmxpdz09
This is our 4th talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract: Single-cell omics data play a pivotal role in identifying cell-to-cell heterogeneity, understanding cell differentiation, unveiling cell population structures, and ultimately deciphering disease pathogenesis. Due to the inherent high dimensionality, sparsity, noise, and high correlation of single-cell data, machine learning (ML) models, known for their assumption-free flexibility, scalability, and predictive power, have surged in the analysis of single-cell data to address these challenges. In this talk, I will present some of our recent work on ML-based approaches to accurately and efficiently encode single-cell gene expressions and chromatin accessibility. Our proposed method OCAT, One Cell At A Time, is an ML-based method that sparsely encodes single-cell gene expressions to integrate data from heterogeneous sources without highly variable gene selection or explicit batch effect correction (Wang et al., Genome Biology, 2022). We have demonstrated that OCAT efficiently integrates multiple heterogeneous scRNA-seq datasets and achieves state-of-the-art performance in cell type clustering, especially in challenging scenarios of non-overlapping cell types. In addition, OCAT can effectively facilitate a variety of downstream analyses, such as differential gene analysis, trajectory inference, pseudotime inference and cell type inference. OCAT has proven its efficiency and accuracy in characterizing the transcriptomic difference between healthy and diseased kidney samples (McEvoy et al., Nature Communications, 2022). We have further developed OCAT2, which maps multiple complementary single-cell omics to the same domain through multi-modal diffusion mapping. We have demonstrated its accuracy and high computational efficiency on integrating real multi-omics datasets.
Title: PIMS Data Science Seminar: Functional spherical autocorrelation: robust autocorrelation estimation of a functional time series
Speaker: Chi-Kuang Yeh, University of Waterloo
Date and time: 23 Feb 2024, 2:00pm - 3:00pm
Location: Cornett B107 and Zoom
PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract:
Measuring the serial dependence across time is critical in model identification and diagnosis in time series (TS) analysis. In classic TS analysis, the autocorrelation function is perhaps the most widely used method to examine the temporal relationship of scalar or vector-valued observations. In functional TS (FTS), which refers to TS of functional data, the dependence is best summarised by an autocovariance operator. Evaluating the size of and the information contained in such an object can be difficult. Existing methods are relatively constrained and unable to capture certain characteristics contained in the FTS objects, such as the "direction" of dependence. We develop a new method to address this problem by projecting lagged pairs onto the unit sphere and computing the angle between them, which we refer to as spherical autocorrelation. We establish the asymptotic properties of the empirical spherical autocorrelation, and we study its use in an application to European electricity data.
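The idea of projecting lagged pairs onto the unit sphere and comparing their directions can be sketched on discretized curves. This is only a rough illustration under stated assumptions: the curves are observed on a common grid, they are centered at the pointwise sample mean, and the average cosine of the angle between lagged pairs is used as the summary. The speaker's estimator may center and aggregate differently; the function name `spherical_autocorr` and the simulated data are hypothetical.

```python
import numpy as np


def spherical_autocorr(curves, lag):
    """Average cosine of the angle between centered curves `lag` steps apart.

    `curves` has shape (n_curves, n_grid); row t is curve t on a common grid.
    Assumes no centered curve is identically zero.
    """
    X = np.asarray(curves, dtype=float)
    if not 1 <= lag < len(X):
        raise ValueError("lag must be between 1 and n_curves - 1")
    centered = X - X.mean(axis=0)             # assumption: center at the sample mean
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / norms                   # project each curve onto the unit sphere
    cosines = np.sum(unit[:-lag] * unit[lag:], axis=1)
    return cosines.mean()


# Toy data: a functional AR(1)-type series versus independent noise curves.
rng = np.random.default_rng(0)
n_curves, n_grid = 500, 50
noise = rng.standard_normal((n_curves, n_grid))
curves = np.zeros((n_curves, n_grid))
curves[0] = noise[0]
for t in range(1, n_curves):
    curves[t] = 0.7 * curves[t - 1] + noise[t]

rho_dependent = spherical_autocorr(curves, lag=1)
rho_independent = spherical_autocorr(noise, lag=1)
print(rho_dependent, rho_independent)
```

Because each curve is rescaled to unit norm, a few curves with very large magnitude cannot dominate the summary, which is the sense in which this kind of estimate is robust.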
Title: PIMS Data Science Seminar: Artificial intelligence for data integration in biology and medicine
Speaker: Youlian Pan, National Research Council of Canada
Date and time: 14 Dec 2023, 2:30pm - 3:30pm
Location: via Zoom
This is our 3rd talk of the PIMS Data Science Seminar Series. PIMS requests all seminar participants to complete the demographics form online.
Abstract:
In this data explosion era, machine learning has accelerated research in biology and medicine at an unprecedented speed and in multiple dimensions. The increased data volume and capacity for data aggregation and analytics power, along with decreasing costs of genome sequencing, have spurred growth in bioinformatics and the need for novel tools to integrate highly heterogeneous data from multiple sources and of varying types, and to extract meaningful patterns. Big data analytics and AI tools have already created significant impact in many fields of life sciences, including medicine. However, data complexity and multi-dimensionality have led to technical challenges in developing and validating AI solutions that generalize to diverse populations, and they impede progress in clinical implementation due to imbalance in data distribution across population demography and data sparsity. This leads to unconscious biases in the generated models and algorithms. In this talk, the speaker will discuss applications of AI in biology and medical research, advances and major challenges.
Short bio of the speaker:
Dr. Youlian Pan is an international expert in integrative pattern recognition from big data in the life sciences. He has authored and co-authored over 80 refereed articles and created significant applications of data mining, machine learning, AI and bioinformatics in genomics, transcriptomics and systems biology with various medical applications, such as cancers, infectious diseases and neurodegenerative diseases. He also has extensive research interests in plant pathogenesis and embryogenesis, plant interactions with the environment, and biological oceanography, specifically marine pollution. Dr. Pan is a Senior Research Scientist at the National Research Council Canada and an Adjunct Professor at the University of Victoria and Brock University. He received his PhD in Biology and Master of Computer Science from Dalhousie University. He has served in various capacities on the editorial boards of six international journals, such as Journal of Computations & Modeling, Open Medical Informatics, and Frontiers in Genetics, Microbiology and Plant Sciences, and on various national and international grant evaluation panels, such as those of the Natural Sciences and Engineering Research Council (NSERC) of Canada and the National Science Foundation (NSF) of the US.
Title: PIMS Data Science Seminar: A novel evolutionary, ensemble method for intrusion detection
Speaker: Belaid Moa, University of Victoria and Digital Research Alliance of Canada
Date and time: 24 Nov 2023, 2:00pm - 3:00pm
Location: Cornett A128 and Zoom
PIMS requests all seminar participants to complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract: In this talk, we will share a new evolutionary, ensemble method that enables us to track different regimes of behavior and identify when the changes occurred. As opposed to traditional methods that rely on statistical change tracking to detect intrusions, we use the performance and the predictive power of evolving models to detect when and which models can or cannot describe the observations anymore. By doing so, we obtain a much more fine-grained, more adaptive outlier detection algorithm that can reliably model data while being robust to its variations. The algorithm can be viewed as an evolutionary algorithm with growth and new generation capabilities, but it is special in the sense that it includes an ensemble of models with performance measures and age decay corrections to evolve and compare models. For some special cases, the algorithm can be related to Bayesian Change Point techniques.
Bio: Belaid Moa received the B.Sc. degree in electrical engineering from École Hassania des Travaux Publics, Casablanca, Morocco, the M.Eng. degree in electronics and signal processing from École Nationale Supérieure d'électronique, d'électrotechnique, d'informatique, d'hydraulique et des Télécommunications, Toulouse, France, the DEA Diploma degree in Telecommunications and Networks from the Institut National Polytechnique de Toulouse, Toulouse, and the Ph.D. degree in computer science from the University of Victoria. He is currently an adjunct faculty member with the ECE Dept. and an Advanced Research Computing Specialist with the Digital Research Alliance of Canada/BCDRI/University Systems at the University of Victoria. He has authored or co-authored many research articles and conference proceedings in various journals and multi-disciplinary research areas.
Title: Clustering for Climate Science Insights
Speaker: Dr. John R.J. Thompson, UBC Okanagan
Date and time: 22 Nov 2023, 3:00pm - 4:00pm
Location: via Zoom (registration required)
PCIC is pleased to announce an upcoming talk on Wednesday, November 22nd, titled "Clustering for Climate Science Insights," as part of our Pacific Climate Seminar Series.
This talk will be delivered by Dr. John R.J. Thompson, an Assistant Professor at the University of British Columbia (Okanagan campus) whose areas of expertise are nonparametric and applied statistics and machine learning. The talk will be held between 3 p.m. and 4 p.m. Pacific Time, via Zoom. For more on this talk, including registration information and an abstract, see the talk’s page on our site.