Statistics seminar
Title: New numerical methods for computing optimal regression designs
Speaker: Julie Zhou, University of Victoria
Date and time: 18 Sep 2024, 2:00pm - 3:00pm
Location: DTB A203
Optimal regression design problems on discrete design spaces can be written as convex optimization problems. When the number of points in the discrete design space is not very large, several numerical algorithms can find optimal designs effectively. However, when the number of points is huge, say 10,000 or more, finding optimal designs is challenging, and discretizing irregularly shaped design spaces raises further issues. We develop an effective iterative procedure to compute approximate optimal designs on discrete design spaces with a huge number of points. The procedure includes several new ideas for computing optimal designs: (1) a new strategy for discretizing design spaces, (2) a new rule for updating design spaces in the iteration, and (3) a new step for clustering support points in optimal designs. It is easy to use, very fast, and can be applied to any regression model and many convex optimality criteria.
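As background for the convex-optimization view in the abstract, a classical baseline on a discrete design space is the multiplicative algorithm for a D-optimal design (this is not the speaker's new procedure, whose design-space updates and clustering go beyond it). A minimal sketch, assuming a quadratic regression model on [-1, 1]:

```python
import numpy as np

# Multiplicative algorithm for a D-optimal design on a discrete grid.
# Model (assumption for illustration): y = b0 + b1*x + b2*x^2.
grid = np.linspace(-1.0, 1.0, 201)                        # discrete design space
F = np.column_stack([np.ones_like(grid), grid, grid**2])  # regression vectors f(x)
p = F.shape[1]                                            # number of parameters
w = np.full(len(grid), 1.0 / len(grid))                   # uniform starting weights

for _ in range(5000):
    M = F.T @ (w[:, None] * F)                            # information matrix M(w)
    Minv = np.linalg.inv(M)
    d = np.einsum('ij,jk,ik->i', F, Minv, F)              # variance function d(x, w)
    w *= d / p                                            # multiplicative update

# By the general equivalence theorem, max_i d(x_i, w) -> p at the D-optimum.
# Note how weight spreads over grid points adjacent to x = 0: this is the
# support-point clustering issue the abstract mentions.
support = grid[w > 1e-3]
```

For this model the known D-optimal design puts weight 1/3 on each of -1, 0, and 1, which the iteration recovers up to clustering of nearby grid points.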
Title: A Bayesian Approach to Response Optimization on Data with Multistratum Structure
Speaker: Professor Po Yang, University of Manitoba
Date and time: 17 Apr 2024, 2:00pm - 3:00pm
Location: DSB C114
Abstract: Response optimization is the process of identifying the input variable settings that optimize the response. Multistratum designs arise naturally in industrial experiments because complete randomization is often inconvenient or impractical. To account for model uncertainty, we apply Bayesian model averaging and a predictive approach to the optimization problem for data with a multistratum structure. With the posterior probabilities of the models as weights, we consider the weighted average of the predictive densities of the response over all potential models. The goal of the optimization is to identify the values of the factors that maximize the probability of the response falling in a given range. The method is illustrated with two examples.
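The weighted-average step can be sketched numerically: with posterior model probabilities as weights, average each model's predictive probability that the response lands in the target range, then maximize over candidate settings. A minimal sketch with entirely hypothetical models, weights, and numbers (not the speaker's examples):

```python
import math
import numpy as np

def norm_cdf(x, mean, sd):
    """Normal CDF via the error function (avoids external dependencies)."""
    return 0.5 * (1.0 + np.vectorize(math.erf)((x - mean) / (sd * math.sqrt(2.0))))

a, b = 8.0, 10.0                        # desired response range [a, b]
settings = np.linspace(0.0, 2.0, 101)   # candidate levels of one factor

# Two hypothetical competing models with posterior probabilities as weights;
# each yields a normal predictive distribution with mean(x) and sd.
post_prob = [0.7, 0.3]
means = [4.0 + 3.0 * settings,                  # model 1: linear in x
         2.0 + 5.0 * settings - settings**2]    # model 2: quadratic in x
sds = [1.0, 1.5]

# Model-averaged P(a <= Y <= b | x), weighted by posterior model probabilities.
prob_in_range = sum(pr * (norm_cdf(b, m, s) - norm_cdf(a, m, s))
                    for pr, m, s in zip(post_prob, means, sds))
best_x = settings[np.argmax(prob_in_range)]     # setting maximizing the probability
```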
Title: PIMS Data Science Seminar: Data thinning to avoid double dipping
Speaker: Lucy Gao, University of British Columbia
Date and time: 10 Apr 2024, 2:00pm - 3:00pm
Location: DTB A102 and Zoom
This is our 5th talk of the PIMS Data Science Seminar Series. PIMS requests that all seminar participants complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Abstract
"Double dipping" is the practice of using the same data to fit and validate a model. Problems typically arise when standard statistical procedures are applied in settings involving double dipping. To avoid the challenges surrounding double dipping, a natural approach is to fit a model on one dataset, and then validate the model on another independent dataset. When we only have access to one dataset, we typically accomplish this via sample splitting. Unfortunately, in some problems, sample splitting is unattractive or impossible. In this talk, we are motivated by unsupervised problems that arise in the analysis of single cell RNA sequencing data, where sample splitting does not allow us to avoid double dipping. We first propose Poisson thinning, which splits a single observation drawn from a Poisson distribution into two independent pseudo-observations. We show that Poisson count splitting allows us to avoid double dipping in unsupervised settings. We next generalize the Poisson thinning framework to a variety of distributions, and refer to this general framework as "data thinning". Data thinning is applicable far beyond the context of single-cell RNA sequencing data, and is particularly useful for problems where sample splitting is unattractive or impossible.
Speaker bio
Website Link: https://www.lucylgao.com/
Title: PIMS Data Science Seminar: Functional Nonlinear Learning
Speaker: Jiguo Cao, Simon Fraser University
Date and time: 05 Apr 2024, 2:00pm - 3:00pm
Location: DTB A102 and Zoom
This is our 6th talk of the PIMS Data Science Seminar Series. PIMS requests that all seminar participants complete the demographics form online at https://ubc.ca1.qualtrics.com/jfe/form/SV_6QcNr2rQcIlQGyy
Using representations of functional data in subsequent statistical models can be more convenient and beneficial than using the direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. Existing representation learning approaches in functional data analysis usually use linear mappings parallel to those in multivariate analysis, e.g., functional principal component analysis (FPCA). However, functions, as infinite-dimensional objects, sometimes have nonlinear structures that cannot be uncovered by linear mappings, and linear methods are even more limited for multivariate functional data. To address this, this paper proposes a functional nonlinear learning (FunNoL) method to sufficiently represent multivariate functional data in a lower-dimensional feature space. Furthermore, we incorporate a classification model to enrich the representations' ability to predict curve labels, so representations from FunNoL can be used for both curve reconstruction and classification. Additionally, the proposed model can handle missing observations and further denoise the observed curves: the resulting representations are robust to observations that are locally disturbed by uncontrollable random noise. We apply the proposed FunNoL method to several real data sets and show that FunNoL achieves better classification than FPCA, especially in the multivariate functional data setting. Simulation studies show that FunNoL provides satisfactory curve classification and reconstruction regardless of data sparsity.
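For reference, the linear baseline the abstract contrasts with, FPCA, reduces for densely observed curves on a common grid to an SVD of the centered data matrix (curves by grid points). A minimal sketch on synthetic two-component curves (all data here are hypothetical, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)                   # common observation grid
n = 300

# Toy curves: two smooth basis functions plus small observation noise.
scores_true = rng.normal(size=(n, 2)) * [2.0, 0.5]
basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X = scores_true @ basis + 0.1 * rng.normal(size=(n, len(t)))

# FPCA via SVD of the centered data matrix.
mean_curve = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean_curve, full_matrices=False)
scores = U[:, :2] * s[:2]                        # 2-D linear representation
X_hat = mean_curve + scores @ Vt[:2]             # reconstruction from 2 FPCs

# Fraction of variance captured by the first two components.
var_explained = (s[:2] ** 2).sum() / (s ** 2).sum()
```

These low-dimensional scores are the kind of representation FunNoL replaces with a nonlinear mapping when the curves' structure is not captured by such a linear decomposition.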