Eight papers across clinical AI evaluation, the digital food environment, US housing market dynamics, and applied machine learning. Open access where available.
2026
Front. Public Health
Protocol-aware epidemic forecasting across heterogeneous public health surveillance systems
Purpose: Public-health forecasting is central to epidemic intelligence and operational decision support. In practice, surveillance data are affected by reporting delays, revisions, and backfill, as well as abrupt regime shifts, which often reduce model reliability across regions and systems. Methods: We developed EpiMap-LLM, a protocol-aware forecasting approach that links epidemic dynamics with surveillance context using a frozen language-model backbone and lightweight trainable components. Results: Across daily and weekly surveillance settings (JHU CSSE COVID-19 and CDC influenza hospitalization surveillance), EpiMap-LLM consistently improves MAE and RMSE over strong forecasting baselines. Conclusion: Protocol-aware forecasting improves robustness and practical usefulness for surveillance dashboards, early warning, and public-health decision support in heterogeneous reporting systems.
@article{hu2026protocol,author={Hu, Yihan and Han, Jingyuan and Liu, Mingxin},title={Protocol-aware epidemic forecasting across heterogeneous public health surveillance systems},journal={Frontiers in Public Health},volume={14},pages={1829302},year={2026},doi={10.3389/fpubh.2026.1829302},publisher={Frontiers Media},url={https://doi.org/10.3389/fpubh.2026.1829302},}
JMIR AI
Toward Retrieval-Grounded Evaluation for Conversational Large Language Model-Based Risk Assessment
A conversational LLM clinical app that scores well on AUC is not the same as a clinical app whose explanations are grounded in real clinical guidance. We argue that LLM-based pediatric risk-assessment systems should be evaluated under two conditions — the current LLM-only setup and a retrieval-grounded setup using a fixed clinical corpus (CDC + WHO + IDSA guidance) — and scored on citation faithfulness, evidence-grounded correctness, and subgroup robustness alongside AUC. This is an architectural shift in how clinical LLM apps should be validated before deployment.
@article{hu2026retrieval,author={Hu, Yihan},title={Toward Retrieval-Grounded Evaluation for Conversational Large Language Model-Based Risk Assessment},journal={JMIR AI},volume={5},pages={e90759},year={2026},doi={10.2196/90759},publisher={JMIR Publications},url={https://doi.org/10.2196/90759},}
Clin. Imaging
Comment on “Classifying the clinical significance of common breast pain symptoms using a large language model, ChatGPT (GPT-4)”
Recent work showed that GPT-4 can map free-text breast pain descriptions to a binary triage recommendation with reasonable sensitivity. The end-to-end design has a structural problem: clinical reasoning for breast pain usually proceeds through intermediate attributes (focality, cyclicity, red-flag features) before risk stratification. An end-to-end LLM bypasses these auditable intermediate steps, which means failures cannot be localised. We argue that clinical-LLM triage should be evaluated as a two-step pipeline (attribute extraction, then risk stratification) with attribute-level metrics.
@article{hu2026comment,author={Hu, Yihan},title={Comment on ``Classifying the clinical significance of common breast pain symptoms using a large language model, ChatGPT (GPT-4)''},journal={Clinical Imaging},volume={132},pages={110741},year={2026},doi={10.1016/j.clinimag.2026.110741},publisher={Elsevier},url={https://doi.org/10.1016/j.clinimag.2026.110741},}
Public Health Nutr.
Content prevalence is not adolescent exposure in TikTok influencer food marketing surveillance
Content prevalence on algorithm-driven platforms is not a clean proxy for adolescent exposure. A TikTok feed is shaped by reach, engagement dynamics, and audience composition, not by aggregate posting patterns. We propose three cheap fixes for ongoing surveillance: view-weighted prevalence, bounded sensitivity analysis for missing nutrient data, and audits of multi-product appearances to estimate undercounting from one-product-per-video coding rules.
@article{hu2026tiktok,author={Hu, Yihan},title={Content prevalence is not adolescent exposure in TikTok influencer food marketing surveillance},journal={Public Health Nutrition},volume={29},pages={e53},year={2026},doi={10.1017/S1368980026102213},publisher={Cambridge University Press},url={https://doi.org/10.1017/S1368980026102213},}
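The gap between content prevalence and view-weighted prevalence described in the entry above can be illustrated with a minimal sketch. The `Video` fields, function names, and sample numbers are hypothetical and chosen only for illustration; the paper's own definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Video:
    """One coded video; field names are hypothetical."""
    views: int
    features_unhealthy_food: bool


def content_prevalence(videos):
    """Share of videos that feature the coded food category."""
    if not videos:
        raise ValueError("no videos")
    return sum(v.features_unhealthy_food for v in videos) / len(videos)


def view_weighted_prevalence(videos):
    """Share of total views that land on videos featuring the category."""
    total_views = sum(v.views for v in videos)
    if total_views == 0:
        raise ValueError("no views recorded")
    flagged_views = sum(v.views for v in videos if v.features_unhealthy_food)
    return flagged_views / total_views


# Made-up sample: one flagged video with high reach, three unflagged ones.
sample = [
    Video(views=900_000, features_unhealthy_food=True),
    Video(views=50_000, features_unhealthy_food=False),
    Video(views=30_000, features_unhealthy_food=False),
    Video(views=20_000, features_unhealthy_food=False),
]

print(content_prevalence(sample))        # 0.25
print(view_weighted_prevalence(sample))  # 0.9
```

In this made-up sample only a quarter of the videos feature the category, yet they account for 90% of views, which is the kind of divergence that view weighting is meant to surface.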
Lancet
Ultra-processed food policy must regulate the screen as well as the street
Most food-environment research and policy still assumes that exposure is shaped by neighbourhood retail — outlet density, school zones, buffers around homes. That model was right in 2010 but is incomplete in 2026: online food delivery platforms, social media food marketing, and algorithm-driven content curation are now major exposure pathways. We argue ultra-processed food policy needs an explicit digital track: platform-level marketing regulation parallel to broadcast rules, algorithmic exposure audits, and measurement frameworks that include view-weighted exposure rather than only retail proximity.
@article{hu2026upf,author={Hu, Yihan},title={Ultra-processed food policy must regulate the screen as well as the street},journal={The Lancet},year={2026},doi={10.1016/S0140-6736(26)00686-0},publisher={Elsevier},url={https://doi.org/10.1016/S0140-6736(26)00686-0},}
Front. Physiol.
The effects of 8 weeks of functional strength training and blood flow restriction training on lower limb muscle strength, maximal power, and movement quality in male sprinter college athletes
Ji Zhu, Jiale Wang, Huangkun Chen, Ming Li, Yanlin Wang, and Yihan Hu
Most blood-flow restriction (BFR) evidence comes from older adults and rehabilitation populations. In a randomised study of 28 male college sprinters over 8 weeks, we compared functional strength training (FST) alone against FST + BFR (cuffs at 50% of individualised arterial occlusion pressure). Both groups improved on isokinetic strength, jumps, FMS, and Y-Balance. The BFR-specific advantage was narrow but significant: a group × time interaction on anaerobic power (+92.99 W, F = 80.51, partial eta-squared = 0.756). For overall strength and movement quality, FST alone produced equivalent gains. Practical implication: BFR is a power tool, not a generic strength multiplier.
@article{zhu2026effects,author={Zhu, Ji and Wang, Jiale and Chen, Huangkun and Li, Ming and Wang, Yanlin and Hu, Yihan},title={The effects of 8 weeks of functional strength training and blood flow restriction training on lower limb muscle strength, maximal power, and movement quality in male sprinter college athletes},journal={Frontiers in Physiology},volume={17},pages={1798606},year={2026},doi={10.3389/fphys.2026.1798606},publisher={Frontiers Media},url={https://doi.org/10.3389/fphys.2026.1798606},}
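As a quick consistency check on the effect size quoted in the entry above, partial eta-squared can be recovered from an F statistic with the standard identity F·df_effect / (F·df_effect + df_error). The degrees of freedom used here are an assumption (two groups, two time points, 28 athletes, giving 1 and 26); the abstract does not report them.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta-squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F is taken from the abstract; the degrees of freedom are ASSUMED
# (2 groups, 2 time points, 28 athletes -> df = 1 and 26), not reported there.
eta_p2 = partial_eta_squared(80.51, df_effect=1, df_error=26)
print(round(eta_p2, 3))  # 0.756
```

Under that assumption the reported F value and effect size agree.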
SSRN
From Summer to Spring: A Shift in US Housing Market Seasonality
A theoretical companion to our Real Estate paper. We extend the Ngai-Tenreyro (2014) search-and-matching framework to monthly frequency, prove existence and uniqueness of the equilibrium, and calibrate the model to observed US data. Using SIPP household-mobility data, with Google Trends as a corroborating indicator, we document a corresponding post-2021 shift in mobility timing. The calibrated model reproduces the spring shift in both prices and transaction volumes using mobility timing alone, without invoking changes in housing supply, credit conditions, or remote-work preferences.
@unpublished{hu2026summer,author={Hu, Yihan and Selcuk, Cemil},title={From Summer to Spring: A Shift in US Housing Market Seasonality},note={SSRN Working Paper},year={2026},doi={10.2139/ssrn.6762518},url={https://doi.org/10.2139/ssrn.6762518},}
2025
Real Estate
Seasonality in the US Housing Market: Post-Pandemic Shifts and Regional Dynamics
Seasonality has traditionally shaped the US housing market, with activity peaking in spring-summer and declining in autumn-winter. This study applies X-13-ARIMA decomposition to FHFA House Price Index data and Census transaction data covering 1991 to 2024 and documents a clear structural break after 2020. Post-pandemic, the seasonal peak has moved to March/April, the amplitude of the cycle has grown, and the pattern is consistent across US regions.
@article{hu2025seasonality,author={Hu, Yihan and Huang, Yifei},title={Seasonality in the US Housing Market: Post-Pandemic Shifts and Regional Dynamics},journal={Real Estate},volume={2},number={4},pages={22},year={2025},doi={10.3390/realestate2040022},publisher={MDPI},url={https://doi.org/10.3390/realestate2040022},}
2023
ICIICS
Customer Market Analysis Based on Interval Value Data Dynamic Clustering Algorithm
Yihan Hu
In 2023 International Conference on Integrated Intelligence and Communication Systems (ICIICS), 2023
Standard K-means clustering for customer segmentation can converge to clusters that do not match commercial reality because random and k-means++ initialisation can place initial centres in low-density regions. We propose a dense-grid initialisation that takes the centroids of the densest grid cells as initial centres, and we reformulate customer features as interval-valued data (each feature is an interval [min, max] capturing per-customer variability) rather than point estimates. In simulation experiments, this approach improves the silhouette score by 0.1249 over k-means++ and by 0.4903 over vanilla K-means, with consistent improvements in CR index and accuracy.
@inproceedings{hu2023customer,author={Hu, Yihan},title={Customer Market Analysis Based on Interval Value Data Dynamic Clustering Algorithm},booktitle={2023 International Conference on Integrated Intelligence and Communication Systems (ICIICS)},year={2023},publisher={IEEE},doi={10.1109/ICIICS59993.2023.10421290},url={https://doi.org/10.1109/ICIICS59993.2023.10421290},}
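The dense-grid initialisation idea from the entry above can be sketched in a few lines. This is an illustrative reading of "centroids of the densest grid cells", not the paper's implementation: the function name, the `bins` parameter, and the sample points are hypothetical, and interval-valued features are not modelled beyond the option of passing each [min, max] pair as two coordinates.

```python
from collections import defaultdict


def dense_grid_init(points, k, bins=4):
    """Pick k initial centres as centroids of the k most populated grid cells.

    points: list of equal-length numeric tuples; bins: cells per dimension.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]

    def cell_of(p):
        idx = []
        for d in range(dims):
            span = highs[d] - lows[d]
            # Constant feature: everything falls in cell 0 on that axis.
            i = 0 if span == 0 else int((p[d] - lows[d]) / span * bins)
            idx.append(min(i, bins - 1))  # keep the maximum inside the last cell
        return tuple(idx)

    # Group points by the grid cell they fall into.
    cells = defaultdict(list)
    for p in points:
        cells[cell_of(p)].append(p)

    if len(cells) < k:
        raise ValueError("fewer occupied cells than requested centres")

    # Most populated cells first; each centre is the mean of its cell's points.
    densest = sorted(cells.values(), key=len, reverse=True)[:k]
    return [
        tuple(sum(p[d] for p in members) / len(members) for d in range(dims))
        for members in densest
    ]


# Made-up data: two tight groups plus one isolated point in between.
points = [
    (0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
    (9.8, 10.0), (10.0, 9.9), (9.9, 9.8), (10.0, 10.0),
    (5.0, 5.0),
]
centres = dense_grid_init(points, k=2)
print(centres)
```

With this sample the two centres land inside the two dense groups and the isolated point is ignored. The sketch does not prevent two adjacent dense cells from the same cluster both being selected, which a fuller implementation would need to handle.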