Yihan Hu
MPhil student · Health Data Science / Epidemiology · University of Cambridge · yh623@cam.ac.uk
MRC Epidemiology Unit
MRC Biostatistics Unit
Institute of Metabolic Science
University of Cambridge
I’m an MPhil student in Health Data Science / Epidemiology at the University of Cambridge, based across the MRC Epidemiology Unit, the MRC Biostatistics Unit, and the Institute of Metabolic Science. My current research sits at two main frontiers: nutritional epidemiology, with a particular focus on ultra-processed foods (UPFs) and how exposure to them is shaped by the digital food environment; and spatial data science, with an emphasis on US-based applications — most recently the post-pandemic shift in US housing market seasonality and its mobility-driven mechanisms.
The four threads below trace the papers that have come out of my work so far:
Clinical AI evaluation — methodological rigour for clinical large language model applications. Recent papers argue that conversational clinical LLM apps should be evaluated under a retrieval-grounded protocol, not just AUC (JMIR AI, 2026), and that end-to-end LLM clinical triage skips the intermediate reasoning attributes that the rest of medicine uses (Clinical Imaging, 2026).
Digital food environment — moving food policy and exposure measurement from physical retail to algorithm-driven platforms. Recent work appears in The Lancet (2026), on why ultra-processed food policy needs an explicit digital track, and in Public Health Nutrition (2026), on the gap between content prevalence and adolescent exposure on TikTok.
US housing market dynamics — seasonality, mobility, and search-and-matching frameworks. With Yifei Huang (Northwestern), I documented the post-pandemic shift in US housing seasonality from May/June to March/April using 33 years of FHFA and Census data (Real Estate, 2025). With Cemil Selcuk (Cardiff), I extended the Ngai-Tenreyro framework to monthly frequency and showed that the spring shift can be explained by a corresponding change in the timing of household mobility alone (SSRN, 2026).
Applied machine learning — clustering, segmentation, and interval-valued data representations for retail analytics.
The fastest way to reach me about a specific paper is by email. The full publication list is on the publications page.