This website stores cookies on your computer. These cookies are used to collect information about how you interact with our website and allow us to remember your browser. We use this information to improve and customize your browsing experience, for analytics and metrics about our visitors both on this website and other media, and for marketing purposes. By using this website, you accept and agree to be bound by UVic’s Terms of Use and Protection of Privacy Policy. If you do not agree to the above, you must not use this website.

Skip to main content

Zhiming Huang

  • BEng (Northwestern Polytechnical University, 2018)
  • MSc (University of Victoria, 2020)
Notice of the Final Oral Examination for the Degree of Doctor of Philosophy

Topic

Swap-Regret-Minimizing Bandits for Distributed Network Optimization

Department of Computer Science

Date & location

  • Monday, August 18, 2025
  • 9:00 A.M.
  • Virtual Defence

Examining Committee

Supervisory Committee

  • Dr. Jianping Pan, Department of Computer Science, University of Victoria (Supervisor)
  • Dr. Nishant Mehta, Department of Computer Science, UVic (Member)
  • Dr. Hongchuan Yang, Department of Electrical and Computer Engineering, UVic (Outside Member)

External Examiner

  • Dr. Jia (Kevin) Liu, Department of Electrical and Computer Engineering, Ohio State University

Chair of Oral Examination

  • Dr. Karena Shaw, School of Environmental Studies, UVic

Abstract

Modern networked systems—ranging from real-time communication platforms to distributed computing infrastructures—operate in increasingly dynamic and strategic environments, where traditional optimization methods often fall short. This dissertation develops a new algorithmic framework for distributed network optimization grounded in game-theoretic bandit learning. We model fundamental problems, such as congestion control and resource allocation, as repeated games involving strategic agents who receive only partial (bandit) feedback. Motivated by practical challenges in computer networks, we design and analyze algorithms that not only minimize regret but also steer collective behavior toward equilibrium.

The contributions of this dissertation are threefold. First, we propose a new framework based on swap-regret minimization and online mirror descent, and establish high-probability regret bounds in multi-player bandit settings. These results guarantee convergence to correlated equilibria under decentralized, partial information feedback. Second, we introduce optimistic learning techniques to accelerate convergence by leveraging predictability in the environment. Third, we apply our algorithms to real-world networking tasks, including TCP congestion control, and demonstrate improved stability, throughput, and fairness through ex tensive trace-driven emulations.

Together, these contributions bridge the theoretical foundations of online learning and game theory with practical considerations in network protocol design, offering robust tools for decentralized decision-making in uncertain and adversarial environments.