
Jing Zhang

  • BSc (Zhongnan University of Economics and Law, 2023)
Notice of the Final Oral Examination for the Degree of Master of Science

Topic

Beyond Conventional P-Values: Addressing Statistical Challenges in Big Data

Department of Mathematics and Statistics

Date & location

  • Wednesday, January 14, 2026
  • 7:00 P.M.
  • Virtual Defence

Examining Committee

Supervisory Committee

  • Dr. Xuekui Zhang, Department of Mathematics and Statistics, UVic (Co-Supervisor)
  • Dr. Min Tsao, Department of Mathematics and Statistics, UVic (Co-Supervisor)

External Examiner

  • Dr. Alex Thomo, Department of Computer Science, UVic

Chair of Oral Examination

  • Dr. Ke Xu, Department of Economics, UVic

Abstract

Do larger sample sizes lead to higher false positive rates in statistical analysis? The answer provided by ChatGPT 4o is "no", an opinion shared by many statisticians. However, empirical evidence from analyses of large datasets, such as those from biobanks and single-cell genomics, challenges this conclusion. Common practice assesses both p-values and effect sizes to mitigate the risk of identifying spurious effects in large samples. Nonetheless, the need to adjust p-values in these contexts remains unaddressed, which motivated this investigation. We found that these common beliefs and practices break down in real-world data analysis, where theoretical assumptions are almost always violated. Growing sample sizes can amplify the impact of these violations, inflating false positive rates. Using a simulation study, we provide examples to support our claims and illustrate a permutation-based remedy. This work's intended contribution is to heighten awareness within our community about the pressing need to reevaluate standard statistical methods when analyzing datasets with huge sample sizes, thereby inspiring further substantial efforts to tackle this emerging challenge of the big data era.
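The abstract's argument can be illustrated with a small simulation. The sketch below is not the thesis's actual study; it is one hypothetical instance of the phenomenon, with an assumed violation (an unmeasured batch effect correlated with group membership, controlled by the `batch_shift` and `imbalance` parameters chosen here for illustration). A naive two-sample t-test rejects the true null of "no group effect" more often as the sample size grows, while a permutation test that shuffles group labels within each batch keeps the false positive rate near the nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate(n_per_group, batch_shift=0.5, imbalance=0.7):
    """Generate data with NO true group effect, but a batch effect
    that is correlated with group membership (assumption violation)."""
    n = 2 * n_per_group
    group = np.repeat([0, 1], n_per_group)
    # each sample lands in its group's "home" batch with prob `imbalance`
    batch = np.where(rng.random(n) < imbalance, group, 1 - group)
    y = rng.normal(0.0, 1.0, n) + batch_shift * batch
    return y, group, batch

def perm_pvalue(y, group, batch, n_perm=200):
    """Permutation p-value that respects the batch structure:
    labels are shuffled within each batch, so the batch-induced
    bias is preserved under the null distribution."""
    obs = abs(y[group == 1].mean() - y[group == 0].mean())
    hits = 0
    for _ in range(n_perm):
        g = group.copy()
        for b in (0, 1):
            idx = np.where(batch == b)[0]
            g[idx] = rng.permutation(g[idx])
        diff = abs(y[g == 1].mean() - y[g == 0].mean())
        hits += diff >= obs
    return (hits + 1) / (n_perm + 1)

def rejection_rates(n_per_group, n_sims=100, alpha=0.05):
    """False positive rates of the naive t-test vs the
    batch-stratified permutation test."""
    naive = perm = 0
    for _ in range(n_sims):
        y, group, batch = simulate(n_per_group)
        naive += stats.ttest_ind(y[group == 1], y[group == 0]).pvalue < alpha
        perm += perm_pvalue(y, group, batch) < alpha
    return naive / n_sims, perm / n_sims

# false positive rate of the naive test grows with n;
# the stratified permutation test stays near alpha = 0.05
results = {n: rejection_rates(n) for n in (50, 500)}
for n, (r_naive, r_perm) in results.items():
    print(f"n={n}: naive t-test FPR={r_naive:.2f}, permutation FPR={r_perm:.2f}")
```

The key design point is that the permutation scheme must mirror the structure that the naive test ignores: shuffling labels only within batches makes the batch effect part of the null, which is the sense in which a permutation-based procedure can serve as a remedy.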