Statistics seminar
Title: Single-Cell RNA Sequencing and Cell-Type Annotation
Speaker: Yushan Hu, University of Victoria
Date and time: 28 Mar 2023, 11:00am - 12:00pm
Location: CLE C115
Abstract: In this talk, I will share our work on single-cell RNA sequencing (scRNA-seq) and cell-type annotation, based on two recent projects. I'll start by introducing our bronchoalveolar lavage (BAL) scRNA-seq data. BAL is a procedure that is sometimes performed during a bronchoscopy to collect a sample from the lungs for testing; such samples differ from traditional lung tissue samples. Since the COVID-19 pandemic, BAL has received more attention. However, there is little consensus on how to annotate BAL scRNA-seq samples, as previous research has focused mainly on annotating lung tissue samples. We therefore plan to construct an auto-annotation method for BAL scRNA-seq samples, which can aid general cell-type annotation. After that, I will talk about exploring the substructure of macrophages and alveolar macrophages: we want to identify their subtypes and how they affect disease, specifically chronic obstructive pulmonary disease (COPD).
Title: Daily Mortality and Air Quality with Case-Crossover Models
Speaker: Dr. Patrick Brown, University of Toronto and Centre for Global Health Research, St. Michael’s Hospital
Date and time: 15 Mar 2023, 2:30pm - 3:30pm
Location: Elliot 162
Abstract: Quantifying the relationship between daily changes in air pollution and short-term effects on mortality and hospitalisations is a complex task with many steps. Air pollution data contain missing values and outliers. Uncertainty in exposures should be reflected in uncertainty in effect sizes. Inference must account for the many factors unrelated to air pollution that can influence mortality. Exposure-response effects are non-linear. Analyses across multiple cities and regions must account for possible city-level variation in effects.
This talk will describe ongoing work undertaken under contract from Health Canada to estimate the health effects of pollution in 50 Canadian cities. A key feature of the project is the case-crossover model, a form of partial likelihood that uses control days to adjust for most changes over time.
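The abstract does not say how control days are chosen. A common choice in air-pollution studies is the time-stratified design, in which the control (referent) days are the other days in the same calendar month that fall on the same weekday as the event day; each event then contributes exp(βx_case) / Σ_j exp(βx_j) to the partial likelihood, summing over the case day and its controls. The Python sketch below assumes that design and a single linear exposure term; the function names are hypothetical, and this is not the project's model.

```python
import calendar
import math
from datetime import date


def time_stratified_controls(case_day: date) -> list[date]:
    """Control days for a time-stratified case-crossover design.

    Assumption: controls are the other days in the same calendar month that
    fall on the same day of the week as the case day.
    """
    _, n_days = calendar.monthrange(case_day.year, case_day.month)
    return [
        date(case_day.year, case_day.month, d)
        for d in range(1, n_days + 1)
        if d != case_day.day
        and date(case_day.year, case_day.month, d).weekday() == case_day.weekday()
    ]


def conditional_loglik(beta: float, x_case: float, x_controls: list[float]) -> float:
    """Log partial-likelihood contribution of one event (conditional logistic form).

    beta is a hypothetical log-linear coefficient for a single pollutant;
    x_case and x_controls are the exposures on the case and control days.
    """
    scores = [beta * x for x in [x_case, *x_controls]]
    m = max(scores)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return beta * x_case - log_denom


if __name__ == "__main__":
    controls = time_stratified_controls(date(2023, 3, 15))
    print([d.isoformat() for d in controls])
    # ['2023-03-01', '2023-03-08', '2023-03-22', '2023-03-29']
```

Because the controls share the case day's month and weekday, slowly varying confounders such as season and day-of-week effects cancel out of the ratio by design.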
Title: Deep Learning With Functional Inputs
Speaker: Dr. Jiguo Cao, Statistics, Simon Fraser University
Date and time: 14 Mar 2023, 11:00am - 12:00pm
Location: CLE C115
Abstract: I will present our recent methodology for integrating functional data into deep neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization makes the relationship between the covariates and the response more interpretable than in conventional neural networks. The model is shown to perform well in a number of contexts, including prediction of new data and recovery of the true underlying relationship between the functional covariate and the scalar response; these results were confirmed through real data applications and simulation studies. An R package (FuncNN) has also been developed on top of Keras, a popular deep learning library; this allows for general use of the approach.
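To illustrate how a functional covariate can enter a neuron, here is a small NumPy sketch in which the weight on a curve x(t) is itself a function β(t) = Σ_k c_k φ_k(t), expanded in a basis and integrated against the curve. The returned β(t) is the kind of functional weight that could be plotted during training. The Fourier basis, the trapezoidal rule and the ReLU activation are assumptions made for this example; it is not the FuncNN package or its API.

```python
import numpy as np


def fourier_basis(t: np.ndarray, n_basis: int) -> np.ndarray:
    """Evaluate 1, sin(2*pi*k*t), cos(2*pi*k*t), ... on a grid in [0, 1]."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (len(t), n_basis)


def trapezoid(y: np.ndarray, t: np.ndarray) -> float:
    """Trapezoidal-rule approximation of the integral of y over the grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)


def functional_neuron(t, x_curve, coef, bias=0.0, scalar_inputs=None, scalar_weights=None):
    """One hidden unit with a functional input (hypothetical sketch).

    Computes relu(bias + integral of beta(t) * x(t) dt + scalar_weights . scalar_inputs),
    where beta(t) = sum_k coef[k] * phi_k(t). Returns the activation and beta(t).
    """
    beta_t = fourier_basis(t, len(coef)) @ np.asarray(coef, dtype=float)
    z = bias + trapezoid(beta_t * x_curve, t)
    if scalar_inputs is not None:
        z += float(np.dot(scalar_weights, scalar_inputs))
    return max(z, 0.0), beta_t
```

In a full network the coefficients c_k would be trained by backpropagation alongside the ordinary weights, so the functional weight β(t) changes as optimization proceeds.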
Title: Spatial Modal Regression
Speaker: Dr. Tao Wang, Economics, UVic
Date and time: 07 Mar 2023, 11:00am - 12:00pm
Location: CLE C115
Abstract: We propose to estimate modal regression with spatial data observed over a rectangular domain by assuming that the conditional mode of the response variable given the covariates follows a nonparametric regression structure. We study the proposed spatial modal regression using a local linear approximation with shrinking bandwidths. The asymptotic normal distributions of the resulting modal estimators are established, and explicit formulas for their asymptotic biases and variances are derived under mild regularity assumptions. The selection of optimal bandwidths in theory and in practice is discussed. We also show that spatial modal regression can serve as an alternative to nonparametric spatial robust regression when the data are symmetrically distributed. The asymptotic distributions of these modal-based robust estimators are derived under appropriate choices of bandwidths, and they show that the estimators can achieve the full asymptotic efficiency of the mean estimators when there are no outliers and the error distribution is normal. Monte Carlo simulations and a real data analysis of soil data illustrate the good finite-sample performance of the estimators. Finally, we generalize spatial modal regression to an additive structure to avoid the curse of dimensionality, and we develop a kernel-based backfitting algorithm for estimation. We show that the proposed modal estimator of each additive component is asymptotically normal and converges at the optimal rate for univariate nonparametric modal regression.
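The core idea of modal regression, estimating the conditional mode rather than the conditional mean, can be sketched with a kernel objective that is maximized by a mean-shift style update. The abstract's estimator is local linear with two shrinking bandwidths; the NumPy sketch below swaps in the simpler local-constant version and Gaussian kernels, and the function name and bandwidth parameters are hypothetical.

```python
import numpy as np


def gaussian(u: np.ndarray) -> np.ndarray:
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def local_mode(s0, S, y, h_space, h_y, n_iter=200, tol=1e-10):
    """Local-constant kernel estimate of the conditional mode of y at location s0.

    Maximizes sum_i K(|S_i - s0| / h_space) * phi((y_i - m) / h_y) over m with the
    modal-EM (mean-shift) update m <- sum(w_i * y_i) / sum(w_i).
    S has shape (n, d), e.g. d = 2 for spatial coordinates.
    """
    S = np.atleast_2d(np.asarray(S, dtype=float))
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(S - np.asarray(s0, dtype=float), axis=1)
    k_space = gaussian(dist / h_space)  # spatial localization weights

    def objective(m):
        return np.sum(k_space * gaussian((y - m) / h_y))

    best_m, best_val = None, -np.inf
    for m in np.unique(y):  # multi-start: the objective can have several local maxima
        for _ in range(n_iter):
            w = k_space * gaussian((y - m) / h_y)
            m_new = np.sum(w * y) / np.sum(w)
            converged = abs(m_new - m) < tol
            m = m_new
            if converged:
                break
        val = objective(m)
        if val > best_val:
            best_m, best_val = m, val
    return float(best_m)
```

With one gross outlier among otherwise similar responses, the estimate stays with the bulk of the data rather than being pulled toward the mean, which is the robustness property the abstract connects to spatial robust regression.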
Title: Statistics for COVID-19: Four Applications
Speaker: Dr. Lloyd T. Elliott, Statistics, Simon Fraser University
Date and time: 01 Mar 2023, 11:00am - 12:00pm
Location: DSB C126
Abstract: I will discuss four applications of statistics to COVID-19 research. 1) COVID-19 is caused by infection with the SARS-CoV-2 virus. However, variation in human genetics can modulate the severity of the disease or susceptibility to infection. HostSeq is a resource of about 10,000 DNA sequences from COVID-19-positive Canadians. I will discuss statistical challenges and polygenic risk scores for HostSeq data. 2) Because of asymptomatic infection and underreporting, the burden of COVID-19 is greater than confirmed case counts indicate. I will discuss methods to estimate the true burden of COVID-19 and to combine confirmed case counts with serology surveys to obtain more accurate estimates of prevalence. 3) I will discuss a statistical operationalization of SIR (susceptible-infectious-recovered) models to determine the lag between changes in non-pharmaceutical interventions and statistically significant changes in confirmed case count trajectories. 4) I will discuss evidence for neurological manifestations of COVID-19 in brain imaging. These applications are joint work with MAGPIE, UVic, SickKids and Oxford.
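For readers unfamiliar with SIR models, the sketch below integrates the standard equations dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI with forward Euler steps, and lets the transmission rate β drop on a fixed day as a crude stand-in for a change in non-pharmaceutical interventions. It only shows the underlying dynamics; it does not reproduce the talk's statistical method for estimating the lag, and all parameter values and names are hypothetical.

```python
def simulate_sir(n, i0, beta_before, beta_after, change_day, gamma, days, dt=0.1):
    """Forward-Euler SIR simulation with one change in the transmission rate.

    Returns daily lists of S, I, R and new infections per day.
    dt should divide one day evenly (e.g. 0.1, 0.25, 0.5).
    """
    s, i, r = float(n - i0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = {"S": [s], "I": [i], "R": [r], "new_infections": [0.0]}
    for day in range(days):
        # transmission rate drops once the (hypothetical) intervention starts
        beta = beta_before if day < change_day else beta_after
        new_today = 0.0
        for _ in range(steps_per_day):
            infections = beta * s * i / n * dt
            recoveries = gamma * i * dt
            s -= infections
            i += infections - recoveries
            r += recoveries
            new_today += infections
        history["S"].append(s)
        history["I"].append(i)
        history["R"].append(r)
        history["new_infections"].append(new_today)
    return history


if __name__ == "__main__":
    h = simulate_sir(1_000_000, 10, beta_before=0.3, beta_after=0.05,
                     change_day=30, gamma=0.1, days=120)
    print(round(h["new_infections"][30]), round(h["new_infections"][31]))
```

In observed data, confirmed cases respond to such a change only after incubation and reporting delays, which is why estimating the lag is a statistical problem rather than a matter of reading off the change day.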
Title: Two kinds of over-dispersion affect regional DNA methylation patterns
Speaker: Celia Greenwood, McGill
Date and time: 14 Feb 2023, 11:00am - 12:00pm
Location: CLE C115
Abstract: DNA methylation is an epigenetic mark intrinsically involved in regulating the activity of DNA, and methylation levels are known to change with age, exposures, and disease status. DNA methylation can be measured with a sequencing technique that gives methylated and unmethylated counts at each targeted position in the genome, but the data are very noisy. I will describe an over-dispersed quasi-binomial model with functional smoothing to model DNA methylation patterns in small genomic regions, and how these patterns depend on covariates. Results will be illustrated with an analysis of DNA methylation and a biomarker strongly associated with rheumatoid arthritis.
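In a quasi-binomial model, the methylated count y_i out of n_i reads keeps the binomial mean n_i p_i but has variance φ n_i p_i (1 - p_i); a dispersion φ above 1 signals over-dispersion relative to the binomial. The sketch below shows only the standard Pearson estimate of a single φ, optionally from an intercept-only fit; the talk's model, with two kinds of over-dispersion and functional smoothing, is more elaborate, and the function name is hypothetical.

```python
import numpy as np


def quasi_binomial_dispersion(methylated, total, fitted_p=None, n_params=1):
    """Pearson estimate of the quasi-binomial dispersion parameter phi.

    Model: E[y_i] = n_i p_i and Var(y_i) = phi * n_i p_i (1 - p_i), with 0 < p_i < 1.
    Without fitted values, an intercept-only fit p_i = sum(y) / sum(n) is used;
    otherwise n_params should equal the number of fitted regression parameters.
    """
    y = np.asarray(methylated, dtype=float)
    n = np.asarray(total, dtype=float)
    if fitted_p is None:
        p = np.full_like(y, y.sum() / n.sum())
    else:
        p = np.asarray(fitted_p, dtype=float)
    pearson = (y - n * p) ** 2 / (n * p * (1.0 - p))  # squared Pearson residuals
    return float(pearson.sum() / (len(y) - n_params))
```

For example, methylated counts of 2 and 8 out of 10 reads at two positions give p = 0.5 and φ = 7.2, far more spread than a binomial model allows.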
Title: Valid inference after clustering with application to single-cell RNA-sequencing data
Speaker: Lucy Gao, UBC
Date and time: 07 Feb 2023, 11:00am - 12:00pm
Location: CLE C115
Abstract: In single-cell RNA-sequencing studies, researchers often model the variation between cells with a latent variable, such as cell type or pseudotime, and investigate associations between the genes and the latent variable. As the latent variable is unobserved, a two-step procedure seems natural: first estimate the latent variable, then test the genes for association with the estimated latent variable. However, if the same data are used for both of these steps, then standard methods for computing p-values in the second step will fail to control the type I error rate.
In my talk, I will introduce two different approaches to this problem. First, I will apply ideas from selective inference to develop a valid test for a difference in means between clusters obtained from the hierarchical clustering algorithm. Then, I will introduce count splitting: a flexible framework that enables valid inference after latent variable estimation in count-valued data, for virtually any latent variable estimation technique and inference approach.
This talk is based on joint work with Jacob Bien (University of Southern California), Daniela Witten and Anna Neufeld (University of Washington), as well as Alexis Battle and Joshua Popp (Johns Hopkins University).
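The key step of count splitting for Poisson data is binomial thinning: if X ~ Poisson(λ), drawing X_train ~ Binomial(X, ε) and setting X_test = X - X_train gives two independent Poisson counts with means ελ and (1 - ε)λ. The latent variable can then be estimated on X_train and the genes tested on X_test. This NumPy sketch assumes Poisson counts; the function name is hypothetical, and this is not the authors' software.

```python
import numpy as np


def poisson_count_split(counts, epsilon=0.5, seed=None):
    """Split a count matrix into independent train/test parts by binomial thinning.

    Under X_ij ~ Poisson(lambda_ij), X_train_ij ~ Binomial(X_ij, epsilon) and
    X_test_ij = X_ij - X_train_ij are independent Poisson counts with means
    epsilon * lambda_ij and (1 - epsilon) * lambda_ij.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(counts)
    x_train = rng.binomial(x, epsilon)  # elementwise thinning of each count
    x_test = x - x_train
    return x_train, x_test
```

Because the two halves are independent, clustering cells on x_train and then testing genes on x_test no longer reuses the same data, which is the source of the inflated type I error described above.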
Title: Multivariate One-sided Tests for Nonlinear Mixed Effects Models with Censored Responses
Speaker: Lang Wu, UBC
Date and time: 31 Jan 2023, 11:00am - 12:00pm
Location: via Zoom
Abstract: Nonlinear mixed effects (NLME) models are commonly used to model longitudinal data such as pharmacokinetic and HIV viral dynamic data. These models are often derived from the underlying data-generating mechanisms, so their parameters often have meaningful physical interpretations and natural restrictions, such as some parameters being positive. Hypothesis tests for these parameters should incorporate these restrictions, leading to one-sided or constrained tests. Motivated by HIV viral dynamic models, we propose multi-parameter one-sided or constrained tests for NLME models with censored responses, e.g., viral dynamic models with viral loads below detection limits. We propose approximate likelihood-based tests that are computationally efficient. We evaluate the tests via simulations and show that the proposed tests are more powerful than the corresponding two-sided or unrestricted tests. We apply the proposed tests to an AIDS dataset and obtain new findings.
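To show why restricted tests can gain power, consider the simplest case: k parameter estimates that are independent with known unit variances, testing θ = 0 against θ ≥ 0 componentwise. The likelihood-ratio statistic is then T = Σ max(z_j, 0)², and under the null it follows a chi-bar-square distribution, a mixture of χ²_j distributions with Binomial(k, 1/2) weights. This stdlib sketch covers only that textbook case, not NLME models, censoring or the approximate likelihoods in the talk; the function names are hypothetical.

```python
import math


def chi2_sf_int(t: float, df: int) -> float:
    """P(chi2_df >= t) for integer df >= 1, using only the standard library."""
    if t <= 0:
        return 1.0
    x = t / 2.0
    # start at df = 1 or df = 2, then use Q(a + 1, x) = Q(a, x) + x^a e^{-x} / Gamma(a + 1)
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(x)), 0.5
    else:
        q, a = math.exp(-x), 1.0
    while 2 * a < df:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return q


def one_sided_chibar_pvalue(z_scores) -> float:
    """p-value of the LR test of theta = 0 vs theta >= 0 (independent unit-variance z's)."""
    k = len(z_scores)
    t = sum(max(z, 0.0) ** 2 for z in z_scores)
    if t <= 0:
        return 1.0
    # the j = 0 component is a point mass at 0 and contributes nothing when t > 0
    return sum(math.comb(k, j) / 2**k * chi2_sf_int(t, j) for j in range(1, k + 1))
```

For z = (2, 2), the unrestricted test uses P(χ²₂ ≥ 8) ≈ 0.018, while the one-sided p-value is about 0.007, illustrating the power gain from using the restriction.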
Title: A novel machine learning approach for gene module identification and prediction via a co-expression network of single-cell sequencing data
Speaker: Li Xing, University of Saskatchewan
Date and time: 24 Jan 2023, 11:00am - 12:00pm
Location: CLE C115
Abstract: Gene co-expression network analysis is widely used in microarray and RNA sequencing data analysis. It groups genes based on their co-expression network, and genes within a group are inferred to have similar functions or to be coregulated in the same pathway.
In the literature, approaches to grouping genes are mainly unsupervised, which may introduce instability and variation across datasets. Inspired by ensemble learning, we propose a novel approach that combines supervised and unsupervised learning techniques and works simultaneously on two tasks, gene module identification and phenotype prediction, during the data analysis process. The gene modules identified by this approach could suggest additional candidate genes for the original pathway, and those genes are potential biomarkers for pathway-related diseases. In addition, the approach improves prediction accuracy for phenotypes.
The algorithm can be used as a general prediction algorithm. Because it is specially designed to handle large samples, it is suitable for single-cell data with many cells. We showcase its use in single-cell cell-type auto-annotation.
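For context, the conventional unsupervised step the abstract contrasts against can be sketched as follows: compute gene-gene correlations across cells, link genes whose absolute correlation exceeds a threshold, and take connected components of that graph as modules. The hard threshold and connected components are assumptions for illustration (WGCNA-style analyses typically use soft thresholding with hierarchical clustering instead), the function name is hypothetical, and this is not the proposed ensemble method.

```python
import numpy as np


def coexpression_modules(expr, threshold=0.8):
    """Group genes into modules from a cells-by-genes expression matrix.

    Genes i and j are linked when |corr(gene_i, gene_j)| >= threshold; modules are
    the connected components of the resulting co-expression graph. Constant genes
    give NaN correlations and therefore end up in their own single-gene modules.
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float), rowvar=False)
    adj = np.abs(corr) >= threshold
    n_genes = adj.shape[0]
    labels = [-1] * n_genes
    current = 0
    for start in range(n_genes):
        if labels[start] != -1:
            continue
        stack = [start]  # depth-first search over the co-expression graph
        labels[start] = current
        while stack:
            g = stack.pop()
            for nb in np.flatnonzero(adj[g]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return [[g for g in range(n_genes) if labels[g] == m] for m in range(current)]
```

Small changes in the data can push correlations across the threshold and merge or split modules, which is one way to see the instability across datasets that motivates adding supervision.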
Title: Evaluation of Logrank, MaxCombo and Difference in Restricted Mean Survival Time in Immuno-Oncology (IO) trials - A retrospective analysis in patients treated with anti-PD1/PDL1 agents across solid tumors
Speaker: JiaBu Ye, MSD
Date and time: 29 Nov 2022, 1:30pm - 2:30pm
Location: via Zoom
Abstract: The log-rank test is considered the criterion standard for comparing two survival curves in pivotal registrational oncology trials. However, novel immunotherapies often violate the proportional hazards assumption, in which case the log-rank test can lose power and may fail to detect a treatment benefit. We performed a systematic review and meta-analysis of 63 studies comparing the log-rank test, the MaxCombo test and the difference in restricted mean survival time (dRMST). The findings show that MaxCombo may provide a pragmatic alternative to log-rank when departure from proportional hazards is anticipated. The log-rank and MaxCombo tests reached the same statistical decision in most comparisons. Discordant studies had modest to meaningful improvements in treatment effect. The dRMST test provided no added sensitivity over log-rank for detecting treatment differences.
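The restricted mean survival time up to a truncation time τ is the area under the survival curve on [0, τ], and dRMST is the difference of these areas between two arms. A minimal NumPy sketch, assuming right-censored data and a Kaplan-Meier estimate of each curve; the function names are hypothetical, and the log-rank and MaxCombo tests are not implemented here.

```python
import numpy as np


def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve on [0, tau].

    times: follow-up times; events: 1 = event observed, 0 = right-censored.
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(events, dtype=int)
    event_times = np.unique(t[(d == 1) & (t <= tau)])
    surv, area, prev_t = 1.0, 0.0, 0.0
    for u in event_times:
        area += surv * (u - prev_t)  # the KM curve is flat between event times
        at_risk = np.sum(t >= u)
        n_events = np.sum((t == u) & (d == 1))
        surv *= 1.0 - n_events / at_risk
        prev_t = u
    area += surv * (tau - prev_t)  # final flat piece up to tau
    return float(area)


def drmst(times_a, events_a, times_b, events_b, tau):
    """Difference in RMST (arm A minus arm B) up to the truncation time tau."""
    return km_rmst(times_a, events_a, tau) - km_rmst(times_b, events_b, tau)
```

With no censoring, RMST reduces to the average of min(T, τ); for event times 1, 2, 3, 4 and τ = 4 it equals 2.5. In practice τ should not exceed the follow-up available in both arms.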
Bio:
Jiabu Ye is a principal scientist in Biostatistics at MSD. As lead statistician for the endometrial indication, he oversees several late-phase endometrial clinical trials. Before joining MSD, Jiabu worked as a late-development trial statistician at AstraZeneca and led the development of multiple late-phase trials. He is also a member of the NPH (non-proportional hazards) cross-pharma working group. He received his PhD in biostatistics from the University of Texas Health Science Center at Houston.