
Quan Nguyen

  • BEng (FPT University, 2014)

  • MSc (University of Hamburg, 2018)

Notice of the Final Oral Examination for the Degree of Doctor of Philosophy

Topic

Adaptive Algorithms for Online Learning in Non-Stationary Environments

Department of Computer Science

Date & location

  • Thursday, January 15, 2026

  • 10:00 A.M.

  • Clearihue Building, Room B021

  • Virtual Defence

Reviewers

Supervisory Committee

  • Dr. Nishant Mehta, Department of Computer Science, University of Victoria (Supervisor)

  • Dr. Venkatesh Srinivasan, Department of Computer Science, University of Victoria (Member)

  • Dr. Cristóbal Guzmán, Department of Electrical and Computer Engineering, University of Victoria (Outside Member)

External Examiner

  • Dr. Ashok Cutkosky, Department of Electrical and Computer Engineering, Boston University 

Chair of Oral Examination

  • Dr. Hong-Chuan Yang, Department of Electrical and Computer Engineering, University of Victoria

Abstract

Traditional online learning literature often assumes static environments, where fundamental properties such as the data distribution or the action space do not change over time and the learner competes against a single best action. This framework, however, fails to capture the complexity of many practical scenarios, such as automated diagnostic systems or inventory management, where the optimal course of action is non-stationary and changes over time. In such settings, adaptivity is crucial: algorithms must maintain and leverage past information to respond effectively to unforeseen changes. This thesis advances the theory of online learning in non-stationary environments by developing adaptive algorithms with provably strong theoretical guarantees.
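
For background, the contrast drawn here can be made precise with the standard regret definitions from the online learning literature; the notation below is generic and not necessarily the thesis's own. In the static setting the learner is compared with the single best fixed action in hindsight, whereas a non-stationary (dynamic) regret compares it with a sequence of comparators that may change across rounds:

\[
R_T = \sum_{t=1}^{T} \ell_t(a_t) - \min_{a \in \mathcal{A}} \sum_{t=1}^{T} \ell_t(a),
\qquad
R_T^{\mathrm{dyn}} = \sum_{t=1}^{T} \ell_t(a_t) - \sum_{t=1}^{T} \ell_t(a_t^{\ast}),
\]

where \(\ell_t\) is the loss function at round \(t\), \(a_t\) is the learner's action, \(\mathcal{A}\) is the action set, and \((a_t^{\ast})_{t=1}^{T}\) is a comparator sequence that may change over time.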

Two key non-stationary learning problems are online multi-task reinforcement learning (OMTRL) and multi-armed bandits with sleeping arms. In OMTRL, a learner interacts with a sequence of Markov decision processes (MDPs), each chosen adversarially from a small collection, requiring the learner to transfer knowledge efficiently between tasks. In multi-armed bandits with sleeping arms, the set of available arms varies adversarially across rounds, confronting the learner with a distinctive exploration-exploitation tradeoff. A key contribution of this thesis is a collection of novel lower bounds and algorithms with near-optimal worst-case regret upper bounds for these two problems. In addition, the thesis applies the techniques underlying these new algorithms to derive improved sample complexity bounds for group distributionally robust optimization (GDRO) and novel data-dependent best-of-both-worlds regret upper bounds for multi-armed bandits.
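
As background on the sleeping-arms setting, one common regret notion from the sleeping bandits literature (again stated generically, not as the thesis's exact formulation) compares the learner with the best policy mapping each round's availability set to an arm:

\[
R_T = \max_{\pi} \, \mathbb{E}\left[ \sum_{t=1}^{T} \Big( \ell_t(A_t) - \ell_t\big(\pi(S_t)\big) \Big) \right],
\]

where \(S_t \subseteq \{1,\dots,K\}\) is the set of arms available (awake) at round \(t\), \(A_t \in S_t\) is the learner's chosen arm, and the maximum is over policies \(\pi\) with \(\pi(S) \in S\) for every availability set \(S\). Because \(S_t\) changes adversarially, the learner must maintain informative estimates even for arms that are temporarily asleep.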

In summary, this thesis provides mathematically grounded adaptive algorithms that achieve state-of-the-art performance guarantees for learning in non-stationary and adversarially changing environments in reinforcement learning and multi-armed bandits, and it establishes new, fundamental connections between multi-armed bandits with sleeping arms and robust optimization.