
An accidental international team — and our first epidemic-forecasting paper

2026-05-29T09:00:00+00:00

I don’t usually write blog posts about how a paper happened. This one is an exception, because this one didn’t happen the way papers usually do.

How it started — a formal swap

Early in the MPhil, I met Minghong through a formal swap between our two colleges. I’m at Wolfson; he’s at Peterhouse. Formal swaps are a Cambridge ritual in which colleges trade a handful of students for a black-tie dinner in each other’s hall. It is a low-friction way to meet people outside your own department and your own college.

We ended up at the same table. The conversation kept going after dinner, and over the next few weeks Minghong mentioned, in passing, that his older brother Mingxin Liu was a researcher in the Department of Health Communication at the University of Tokyo’s Graduate School of Medicine. I filed it away.

A Dalian coincidence

Around the same time, I got to know Jingyuan Han — a researcher in the Department of Cancer Prevention and Control at the National Cancer Center, Chinese Academy of Medical Sciences and Peking Union Medical College in Beijing. We weren’t in any of the same labs either. We met the way two Chinese students in different cities do: through somebody who knew somebody.

Somewhere in our first real conversation Jingyuan mentioned his girlfriend was a student at Dalian Medical University.

I’m from Dalian.

That moment landed in the place such moments land. The conversation slowed down for a second, then sped up. We didn’t immediately start talking about science — we talked about Dalian. The seafood. The harbour. Where she studied versus where I grew up. Whether the campus had changed since I was last back.

I think that’s actually when the project started, even though the project didn’t exist yet.

The accidental geometry

After a few more conversations, the three of us — Cambridge → Tokyo → Beijing — realised we were each chasing the same problem from different angles.

Public-health surveillance systems do not speak the same language across borders. Different case definitions. Different rules for what counts as a lab-confirmed case. Different ICD coding habits. Reporting delays, revisions, and backfill. A forecasting model fitted on one system breaks the moment you cross a border, change a coding standard, or pull from a different reporting tier.

Mingxin saw it from the health-communication side at Tokyo — how surveillance protocols shape what the public ends up hearing and trusting. Jingyuan saw it from the cancer prevention and control side at the National Cancer Center, where national surveillance data flows through specific institutional pipes and feeds intervention design. I saw it from the statistical / methodological side at Cambridge — if the underlying protocols differ, your forecasting model needs to know about the protocols.

None of us would have written this paper alone.

EpiMap-LLM

The proposal is straightforward in retrospect: bake the protocol structure into the forecasting model itself. Instead of treating each surveillance stream as a black-box source of case counts, condition the model on the protocol that generated those counts — case definition, reporting cadence, revision pattern, backfill — so it learns what each stream is actually measuring, not just the headline numbers.

We built this as EpiMap-LLM — a parameter-efficient alignment framework that connects numerical time-series representations with a frozen language-model backbone, with only lightweight trainable components on top. Protocol-aware tokens are injected as context, so the model can distinguish protocol-induced fluctuations from genuine epidemiological shifts and transfer across datasets and temporal granularities.
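As a rough illustration of what "protocol-aware context" could look like, here is a minimal sketch that serialises a reporting protocol into a text prefix placed in front of the numeric history. It is not the paper's implementation: `ProtocolSpec`, `protocol_context`, `build_input`, the bracketed tag format, and all field values are hypothetical placeholders, and they do not describe the real datasets.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ProtocolSpec:
    """Hypothetical description of how one surveillance stream is reported."""
    source: str
    case_definition: str
    cadence: str            # e.g. "daily" or "weekly"
    revises_history: bool   # True if past values are later revised or backfilled


def protocol_context(spec: ProtocolSpec) -> str:
    """Render the protocol as a short tagged prefix (placeholder format)."""
    history = "revised" if spec.revises_history else "final"
    return (
        f"[SOURCE={spec.source}] [CASE_DEF={spec.case_definition}] "
        f"[CADENCE={spec.cadence}] [HISTORY={history}]"
    )


def build_input(spec: ProtocolSpec, history: Sequence[float]) -> str:
    """Prefix the numeric history with its protocol context."""
    values = " ".join(f"{v:g}" for v in history)
    return f"{protocol_context(spec)} [SERIES] {values}"


# Placeholder values for illustration only.
daily_stream = ProtocolSpec("STREAM_A", "confirmed_case", "daily", True)
print(build_input(daily_stream, [10, 12.5]))
```

The point of the sketch is only that two streams with identical numbers but different protocols produce different model inputs, so the model has a chance to learn what each stream is measuring.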

We tested it on two heterogeneous surveillance settings — JHU CSSE COVID-19 (daily) and CDC influenza hospitalization (weekly). EpiMap-LLM consistently improves MAE and RMSE over strong forecasting baselines. The gain is largest where reporting irregularities are worst — exactly the operational regime where forecasts have to be trusted.
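For readers less familiar with the two error metrics, a minimal sketch of how MAE and RMSE are computed follows. The numbers are made up for illustration and are not results from the paper.

```python
import math
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average size of the miss."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: squares each miss, so large misses weigh more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative values only.
actual = [100, 120, 90, 110]
predicted = [98, 125, 90, 100]

print(mae(actual, predicted))   # 4.25
print(rmse(actual, predicted))  # larger than MAE because of the one 10-unit miss
```

RMSE is never smaller than MAE on the same errors, and the gap between them widens when a few forecasts miss badly, which is why reporting both is informative under irregular reporting.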

The full paper is open access in Frontiers in Public Health, published 29 May 2026. Jingyuan and I are co-first authors.

Read the paper · Frontiers (open access)

What I want to remember

The geometry. Cambridge, Tokyo, Beijing. A college friend’s older brother. A girlfriend at a Dalian medical school. Late-night WeChat threads that turned into shared spreadsheets that turned into a draft. International collaboration sometimes gets described as if it were arranged at the institutional level — MOUs, summer schools, formal exchanges. This wasn’t that. Three sides showed up because three friendships did. The paper followed.

What’s next

Jingyuan and I are starting to look at a follow-up — social media in China and cancer prevention — pulling on his cancer-prevention work at the National Cancer Center and on the communication side that Mingxin works on in Tokyo. It’s a natural extension: surveillance data and public-facing communication are two halves of the same loop. It also connects to my dissertation thread on the digital food environment, where the same question recurs in a different language. When we have something to show, this is where I’ll write about it.

Mingxin, Jingyuan — thank you. This one is special.

BFR with sprinters — a collaborative RCT

2026-05-12T09:00:00+00:00

Coming into the MPhil at Cambridge, I wanted to get hands-on experience across the breadth of health research — not just nutritional epi, but also things like genomics, physiology, sport science — before settling deeper into one. The publication itself is secondary to the experience. So when an opportunity came to join a sport-physiology RCT led by Ji Zhu and colleagues at Fujian Normal University (with Jiale Wang, Huangkun Chen, Ming Li, and Yanlin Wang), I took it. I contributed the statistical analysis and the writing, and joined as corresponding author.

Most blood-flow restriction (BFR) evidence is in older adults and rehabilitation populations. Data in trained athletes is thin. The team ran an 8-week study in male college sprinters (n = 28), comparing functional strength training (FST) alone against FST + BFR. Cuff pressures were set at 50% of individualised arterial occlusion pressure measured by Doppler — a more rigorous protocol than ratio-based BFR prescriptions.

The finding was narrower than I expected, and that’s the interesting part:

Both groups improved on isokinetic knee strength, CMJ, squat jump, FMS, Y-Balance
The BFR-specific advantage was a significant group × time interaction on anaerobic power only (+92.99 W vs FST alone, F = 80.51, η²p = 0.756)
For overall strength and movement quality, FST alone got you most of the way
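The reported effect size can be sanity-checked from the F statistic. This is a minimal sketch that assumes degrees of freedom of 1 (effect) and 26 (error), which is what a two-group design with 28 participants and no dropout would give; the post does not state the degrees of freedom, so treat them as an assumption.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Assumed degrees of freedom: 1 for the interaction, 26 for error (28 - 2 groups).
eta = partial_eta_squared(80.51, df_effect=1, df_error=26)
print(round(eta, 3))  # 0.756
```

Under those assumed degrees of freedom the result matches the reported η²p.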

Practical framing for coaches: BFR is a power tool, not a generic strength multiplier. If the priority is sprint-relevant anaerobic power, the cuffs do real work. If the priority is general strength, FST alone is probably enough.

Open access in Frontiers in Physiology, 2026: link.

Food policy is still regulating streets — we wrote to The Lancet about screens

2026-04-30T09:00:00+00:00

The Lancet ran a comment on ultra-processed food policy by Scrinis and colleagues — a strong piece that pushed UPF policy beyond reformulation toward fiscal, labelling, marketing, and retail measures. But the digital food environment barely came up.

This sat alongside the MPhil dissertation work on UPFs I was doing with my supervisors Mike Essman and Jean Adams at Cambridge. Reading the UPF literature in depth, then talking through what the policy frame still missed — with Mike, Jean, and other colleagues at the MRC Epidemiology Unit and CEDAR — made the gap feel obvious. Online food delivery platforms, social media food marketing, and algorithm-driven content curation are now major exposure pathways. Current food-environment frameworks don’t really see them.

I wrote a follow-up correspondence in The Lancet. The argument is simple: UPF policy needs an explicit digital track. Three concrete asks:

Algorithmic exposure audits for adolescent food marketing
Platform-level marketing regulation parallel to broadcast rules
Measurement frameworks that include view-weighted exposure, not just retail proximity

Correspondence in The Lancet, 2026: link. Pairs naturally with the TikTok exposure letter in Public Health Nutrition.

Clinical LLM apps need retrieval-grounded evaluation, not just AUC

2026-04-15T09:00:00+00:00

This came out of the same MPhil exploration period that gave me the Clinical Imaging letter: reading widely across clinical AI, evaluation methodology, and infectious-disease epidemiology. JMIR AI published a study using fine-tuned LLaMA2 / Flan-T5 for pediatric COVID-19 severity risk assessment, deployed as a conversational app. The evaluation reported AUC.

AUC tells you how well the classifier ranks higher-risk cases above lower-risk ones on held-out data. It is a measure of discrimination, not of calibration. It also does not tell you whether the natural-language explanations the deployed app produces are grounded in actual clinical guidance; it only tells you whether the underlying yes/no token probabilities discriminate on the dataset.

That gap matters. The deployed system is a chat interface giving caregivers fluent risk explanations. Fluent + calibrated is not the same as grounded. A model can confidently invent a CDC recommendation that doesn’t exist.

I wrote a methodological letter for JMIR AI arguing for a parallel evaluation track. Same pipeline, evaluated twice:

LLM-only, current setup. Score on AUC, accuracy, calibration.
Retrieval-grounded against a fixed clinical corpus (CDC pediatric COVID guidance + WHO + IDSA). Score on citation faithfulness, evidence-grounded correctness, and subgroup robustness.

This is an architectural shift, not a tuning refinement. RAG is not just a deployment pattern; it is an evaluation substrate.
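To make "citation faithfulness" concrete, here is a minimal sketch of one way it could be scored. It assumes each generated claim carries the id of the passage it cites and a short evidence string, and it uses a case-insensitive substring match as a crude stand-in for a proper entailment check. The function name, the claim format, and the corpus text are all hypothetical placeholders, not real guidance.

```python
from typing import Dict, List


def citation_faithfulness(claims: List[dict], corpus: Dict[str, str]) -> float:
    """Share of claims whose cited passage exists and contains the stated evidence."""
    if not claims:
        return 0.0
    supported = 0
    for claim in claims:
        passage = corpus.get(claim["cites"], "")   # missing citation -> empty passage
        evidence = claim["evidence"]
        if evidence and evidence.lower() in passage.lower():
            supported += 1
    return supported / len(claims)


# Placeholder corpus and claims, invented for illustration.
corpus = {"doc-1": "Placeholder passage: seek urgent care if breathing is laboured."}
claims = [
    {"cites": "doc-1", "evidence": "seek urgent care"},  # supported by doc-1
    {"cites": "doc-9", "evidence": "made-up advice"},    # cites a passage that does not exist
]
print(citation_faithfulness(claims, corpus))  # 0.5
```

A model could score well on AUC and still score poorly here, which is the gap the second evaluation track is meant to expose.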

Letter in JMIR AI, 2026: link.

End-to-end LLM clinical triage misses the steps that matter

2026-03-15T09:00:00+00:00

Starting the MPhil, I deliberately let myself wander a bit before settling — reading across nutritional epi, genomics, sport-physiology, and clinical imaging — partly to know what the next year of work should look like, partly because I just enjoyed the surveying. This letter came out of that wandering period. A Clinical Imaging paper crossed my reading list: GPT-4 mapping free-text breast-pain descriptions to a binary triage recommendation. Sensitivity was reasonable, and the demonstration was useful. But the end-to-end design bothered me.

Clinical reasoning for breast pain does not go directly from “free text” to “refer / don’t refer.” It goes via intermediate attributes — focality, cyclicity, associated red-flag features (mass, skin changes, lymphadenopathy) — and only then to a risk-stratified decision. An end-to-end LLM that skips attribute extraction is asking the model to implicitly infer features that are neither extracted nor verifiable.

The original paper’s misclassifications cluster in cases with ambiguous or absent attribute information. That is exactly what you would expect if the model is failing on implicit feature inference.

I wrote a methodological commentary arguing for a two-step pipeline: attribute extraction first, risk stratification second. Failures become localisable. Clinicians can override at the attribute level rather than at the decision level. That is what auditable clinical AI looks like.
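The shape of a two-step pipeline can be sketched in a few lines. This is a toy under stated assumptions: keyword rules stand in for an LLM extractor, and the stratification rules are illustrative only and are not clinical guidance. `Attributes`, `extract_attributes`, `stratify`, and the keyword lists are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attributes:
    """Intermediate, inspectable features; None means 'not stated in the text'."""
    focal: Optional[bool] = None
    cyclical: Optional[bool] = None
    red_flags: List[str] = field(default_factory=list)


RED_FLAG_TERMS = ("lump", "skin change", "swollen node")  # illustrative keywords


def extract_attributes(text: str) -> Attributes:
    """Step 1: attribute extraction (keyword stand-in for an LLM extractor)."""
    t = text.lower()
    attrs = Attributes()
    if "one spot" in t:
        attrs.focal = True
    elif "all over" in t:
        attrs.focal = False
    if "before my period" in t:
        attrs.cyclical = True
    elif "not related to my period" in t:
        attrs.cyclical = False
    attrs.red_flags = [term for term in RED_FLAG_TERMS if term in t]
    return attrs


def stratify(attrs: Attributes) -> str:
    """Step 2: risk stratification from attributes (illustrative rules only)."""
    if attrs.red_flags:
        return "refer"
    if attrs.focal is None or attrs.cyclical is None:
        return "needs_review"   # missing attributes are surfaced, not guessed
    if attrs.focal and not attrs.cyclical:
        return "refer"
    return "routine"


attrs = extract_attributes("It hurts")
print(attrs, stratify(attrs))  # both attributes are None -> "needs_review"
```

Because the `Attributes` object sits between the two steps, a reviewer can see that the vague description produced no usable attributes, correct a field by hand, and rerun only `stratify`.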

Letter in Clinical Imaging, 2026: link.

From a Cambridge nutritional epi class to a letter in PHN

2026-02-18T09:00:00+00:00

This one came out of my MPhil dissertation work on ultra-processed foods, supervised by Mike Essman and Jean Adams at Cambridge. Reading widely around UPFs and the food environment for the dissertation, I kept running into TikTok food-marketing surveillance papers — careful methodology, but the implicit step from “content prevalence” to “adolescent exposure” bothered me.

On algorithm-driven platforms, what creators post is not what teenagers see. The platform decides. A handful of high-view posts can drive most of the actual exposure; a much larger number of low-view posts can drive almost none. Sampling content does not give you exposure unless you weight by views.

Conversations with Mike, Jean, and other colleagues across the MRC Epidemiology Unit and CEDAR sharpened the argument. I wrote it up as a letter. Public Health Nutrition published it.

The proposal is methodological: surveillance studies should report view-weighted prevalence, run bounded sensitivity analyses for missing nutrient data, and audit multi-product appearances. None of these requires platform-side data access; all of them make the inference more defensible.
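A minimal sketch of the first two ideas, using made-up numbers: prevalence is computed once per post and once weighted by views, and posts with missing nutrient classification are counted first as "not unhealthy" and then as "unhealthy" to give lower and upper bounds. The field names `views` and `unhealthy` and the function names are hypothetical.

```python
from typing import List, Optional


def _flag(value: Optional[bool], missing_as: bool) -> bool:
    """Treat a missing classification as the chosen bound."""
    return missing_as if value is None else value


def unweighted_prevalence(posts: List[dict], missing_as: bool) -> float:
    """Share of posts classified as unhealthy (content prevalence)."""
    return sum(_flag(p["unhealthy"], missing_as) for p in posts) / len(posts)


def view_weighted_prevalence(posts: List[dict], missing_as: bool) -> float:
    """Share of views that landed on unhealthy posts (an exposure proxy)."""
    total_views = sum(p["views"] for p in posts)
    hit_views = sum(p["views"] for p in posts if _flag(p["unhealthy"], missing_as))
    return hit_views / total_views


# Illustrative sample: one viral post, several small ones, one unclassifiable.
posts = [
    {"views": 900_000, "unhealthy": True},
    {"views": 50_000, "unhealthy": False},
    {"views": 30_000, "unhealthy": False},
    {"views": 20_000, "unhealthy": None},   # nutrient data missing
]

for name, fn in [("unweighted", unweighted_prevalence), ("view-weighted", view_weighted_prevalence)]:
    low, high = fn(posts, missing_as=False), fn(posts, missing_as=True)
    print(f"{name}: between {low:.2f} and {high:.2f}")
```

In this toy sample the unweighted bounds are 0.25 to 0.50 while the view-weighted bounds are 0.90 to 0.92: the same content sample implies very different exposure once views are taken into account, and the missing post moves the unweighted estimate far more than the weighted one.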

Letter in Public Health Nutrition, 2026: link.

From data to mechanism — continuing with Cemil, now under review at JEBO

2026-02-10T09:00:00+00:00

The empirical paper documented that US housing seasonality shifted. This one asks why.

After the MDPI paper came out, Cemil Selcuk — my original undergrad supervisor at UCL, now at Cardiff — and I picked the project back up over the holidays. We turned the empirical pattern into a structural story. He brought the modelling discipline; I brought the data and the calibration. The whole thing was built across vacations and weekends, on top of the MPhil schedule, but the dynamic that started in his STAT0035 office is still what made it work.

We extended Ngai-Tenreyro (2014) search-and-matching to monthly frequency, proved existence and uniqueness of the equilibrium, and calibrated to observed US data.

The headline: a post-2021 shift in the timing of residential moves (SIPP household-mobility data, with Google Trends as a corroborating signal) is sufficient on its own to reproduce the spring shift in both prices and transaction volumes. We don’t need to invoke remote work, credit conditions, or housing supply shocks. The mobility channel alone gets you there. Those other channels may be operating too — they just aren’t required.

The paper is currently under review at the Journal of Economic Behavior and Organization (JEBO). Working paper version on SSRN: link.

My first journal paper — started as a UCL undergrad project

2025-12-15T09:00:00+00:00

This one is special to me. It started as my STAT0035 third-year project at UCL Data Science, supervised by Cemil Selcuk. The question was simple: had US housing market seasonality changed since the pandemic? The textbook said summer peak, June-July, every year. The FHFA data after 2020 looked different to me, and I wanted to know whether the eye was right.

It was. X-13ARIMA-SEATS seasonal adjustment on 33 years of FHFA HPI and Census transaction data showed that the peak moved from May/June to March/April after 2020, the amplitude grew, and the pattern was consistent across most US regions.
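The question "which month is the seasonal peak, and did it move?" can be illustrated without the full seasonal-adjustment machinery. The sketch below is a deliberately simpler stand-in, not X-13 and not the paper's method: it averages month-over-month growth by calendar month and reports the month with the highest average. The data are synthetic, and defining the "peak" as the strongest-growth month is an assumption made for the example.

```python
from collections import defaultdict
from typing import List, Tuple

Observation = Tuple[int, int, float]  # (year, month, index level)


def peak_month(series: List[Observation]) -> int:
    """Calendar month with the highest average month-over-month growth."""
    growth = defaultdict(list)
    for (_, _, prev), (_, month, cur) in zip(series, series[1:]):
        growth[month].append(cur / prev - 1.0)
    averages = {m: sum(v) / len(v) for m, v in growth.items()}
    return max(averages, key=averages.get)


def synthetic(years: range, bump_month: int) -> List[Observation]:
    """Synthetic index that grows fastest in one chosen month each year."""
    level, out = 100.0, []
    for year in years:
        for month in range(1, 13):
            level *= 1.01 if month == bump_month else 1.002
            out.append((year, month, level))
    return out


before = synthetic(range(2015, 2020), bump_month=6)
after = synthetic(range(2021, 2025), bump_month=4)
print(peak_month(before), peak_month(after))  # 6 4
```

Real price indices also carry trend, outliers, and calendar effects, which is what a dedicated seasonal-adjustment procedure handles and this toy does not.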

What I’m proud of is that I didn’t stop when the course did. After graduating, I teamed up with Yifei Huang — a close classmate and friend from the UCL stats department, who has since moved on to Northwestern University in the US — to extend the project into a full journal article. Together we worked through additional data, robustness checks, and the regional breakdown that ended up in the published version.

It is rare for an undergrad project to make it all the way through to a published journal paper, and rarer still to keep building on it afterwards. The follow-up — a structural model with Cemil — is now under review at JEBO.

Open access in Real Estate (MDPI), 2025: link.

Where I started — a UCL undergrad project on K-means initialisation

2023-11-15T09:00:00+00:00

This is where my publication record actually starts. UCL Data Science, STAT0041 (Computational Statistics). The course was taught by Professor Yvo Pokern, and the experience reshaped how I think about three things at once — statistics, algorithms, and programming. His teaching had a particular German rigour to it: an algorithm is not just a recipe to memorise but an object you derive, dissect, and rebuild from first principles. After that module I no longer experienced “stats” and “code” as two separate subjects. They were the same craft. This paper was the first thing I wrote with that perspective in mind.

The annoyance that drove it: random and k-means++ initialisation can place initial centres in low-density regions, then converge to clusters that don’t match anything a marketer would recognise. The fix was almost embarrassingly simple — pick initial centres from the centroids of the densest grid cells in feature space. The interesting bit was the data representation. Treating customer features as interval-valued data (each feature an interval [min, max] capturing per-customer variability) rather than as point estimates preserved a layer of information that simpler approaches squashed.
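The initialisation idea can be sketched as follows: bin the feature space into a regular grid, count points per cell, and use the centroids of the most populated cells as starting centres. This is a minimal sketch rather than the published algorithm; the grid resolution `bins`, the function name, and the sample points are assumptions. An interval-valued feature could be fed in by flattening each [min, max] into two coordinates, which is also an assumption and may differ from the paper's representation.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def dense_grid_centres(points: Sequence[Tuple[float, ...]], k: int, bins: int = 4) -> List[Tuple[float, ...]]:
    """Return centroids of the k most populated cells of a regular grid."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    cells = defaultdict(list)
    for p in points:
        key = tuple(
            # Scale each coordinate to [0, bins) and clamp the maximum into the last cell.
            min(int((p[d] - lows[d]) / ((highs[d] - lows[d]) or 1.0) * bins), bins - 1)
            for d in range(dims)
        )
        cells[key].append(p)

    densest = sorted(cells.values(), key=len, reverse=True)[:k]
    return [
        tuple(sum(p[d] for p in cell) / len(cell) for d in range(dims))
        for cell in densest
    ]


# Two tight groups plus one isolated point in a low-density region.
points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.1), (10.0, 10.0), (9.9, 9.8), (5.0, 5.0)]
centres = dense_grid_centres(points, k=2)
print(centres)  # one centre near each tight group; the isolated point is ignored
```

The resulting centres would then be passed to an ordinary K-means loop as its starting positions, so the isolated point at (5, 5) can no longer be drawn as an initial centre.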

In simulation experiments, the silhouette score improved by 0.1249 over k-means++ and by 0.4903 over vanilla K-means.

Looking back, the methodological reflex that runs through everything I’ve done since started here. What is the data actually telling you, beyond the point estimate? That question carries from clustering all the way to TikTok exposure measurement and clinical LLM evaluation.

Conference paper at ICIICS 2023 (IEEE): link.