A curated collection of research papers I'm reading, have read, or plan to read, along with related tools, articles, and courses.
Tools
Papers
Knowledge Base
Courses
- ComfyAI - Collection of LLM techniques and workflows
- verl - Volcano Engine Reinforcement Learning for LLMs (RLHF framework supporting FSDP, vLLM, SGLang)
- FeatureDB - Pattern recognition methods for ECG feature extraction (expert features including HRV, morphologic variability, frequency domain, QRS axis)
- HeartPy - Python Heart Rate Analysis Toolkit for PPG and ECG signals (time-domain & frequency-domain measures)
- Braindecode - Deep learning toolbox for decoding raw electrophysiological brain signals such as EEG, ECoG, and MEG (PyTorch-based, includes datasets, preprocessing, models)
- Proximal Policy Optimization Algorithms - John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov (2017)
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models - Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan (2023)
- Understanding the Math Behind GRPO — DeepSeek-R1-Zero - Soumanta Das, Yugen.ai (2025)
- DeepSeek-V3 Explained 1: Multi-head Latent Attention - Shirley Li (2025)
- Mixture-of-Experts (MoE) LLMs - Cameron R. Wolfe (2025)
- DeepSeek-V3 — Advances in MoE Load Balancing and Multi-Token Prediction Training - Soumanta Das, Yugen.ai (2025)
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models - Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, et al. (2024)
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - DeepSeek-AI (2025)
- QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training - Wei Dai, Peilin Chen, Chanakya Ekbote, Paul Pu Liang (2025)
- MedCritical: Enhancing Medical Reasoning in Small Language Models via Self-Collaborative Correction - (2025)
- OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data - Patrick Langer, Thomas Kaar, Max Rosenblattl, Maxwell A. Xu, Winnie Chow, et al. (2025)
- The Anatomy of a Personal Health Agent - A. Ali Heydari, Ken Gu, Vidya Srinivas, Hong Yu, et al. (2025)
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer - Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean (2017)
- Deep contextualized word representations (ELMo) - Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer (2018)
- LoRA: Low-Rank Adaptation of Large Language Models - Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, et al. (2021)
- The Llama 3 Herd of Models - Meta AI (2024)
- Qwen2.5 Technical Report - Qwen Team, Alibaba (2024)
- Gemma 3 Technical Report - Gemma Team, Google DeepMind (2025)
- ENCASE: an ENsemble ClASsifiEr for ECG Classification Using Expert Features and Deep Neural Networks - Shenda Hong, Meng Wu, Yuxi Zhou, Qingyun Wang, Junyuan Shang, Hongyan Li, Junqing Xie (2017)
- ECG-QA: A Comprehensive Question Answering Dataset Combined With Electrocardiogram - Jungwoo Oh, Gyubok Lee, Seongsu Bae, Joon-myoung Kwon, Edward Choi (2023)
- Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data - Yubin Kim, Xuhai Xu, Daniel McDuff, Cynthia Breazeal, Hae Won Park (2024)
- A lightweight deep neural network for personalized detecting ventricular arrhythmias from a single-lead ECG device - Zhejun Sun, Wenrui Zhang, Yuxi Zhou, Shijia Geng, et al. (2025)
- ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis - Yubao Zhao, Jiaju Kang, Tian Zhang, Puyu Han, Tong Chen (2024)
- ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling - William Han, Chaojing Duan, Michael A. Rosenberg, Emerson Liu, Ding Zhao (2024)
- GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images - Xiang Lan, Feng Wu, Kai He, Qinghao Zhao, Shenda Hong, Mengling Feng (2025)
- Signal, Image, or Symbolic: Exploring the Best Input Representation for Electrocardiogram-Language Models Through a Unified Framework - William Han, Chaojing Duan, Zhepeng Cen, Yihang Yao, Xiaoyu Song, Atharva Mhaskar, Dylan Leong, Michael A. Rosenberg, Emerson Liu, Ding Zhao (2025)
- Retrieval-Augmented Generation for Electrocardiogram-Language Models - Xiaoyu Song, William Han, Tony Chen, Chaojing Duan, Michael A. Rosenberg, Emerson Liu, Ding Zhao (2025)
- SensorLM: Learning the Language of Wearable Sensors - Yuwei Zhang, Kumar Ayush, Siyuan Qiao, A. Ali Heydari, et al. (2025)
- LSM-2: Learning from Incomplete Wearable Sensor Data - Maxwell A. Xu, Girish Narayanswamy, Kumar Ayush, Dimitris Spathis, et al. (2025)
- PPGFlowECG: Latent Rectified Flow with Cross-Modal Encoding for PPG-Guided ECG Generation and Cardiovascular Disease Detection - Xiaocheng Fang, Jiarui Jin, Haoyu Wang, Che Liu, Jieyi Cai, Guangkun Nie, Jun Li, Hongyan Li, Shenda Hong (2025)
- MEETI: A Multimodal ECG Dataset from MIMIC-IV-ECG with Signals, Images, Features and Interpretations - Deyun Zhang, Xiang Lan, Shijia Geng, Qinghao Zhao, Sumei Fan, Mengling Feng, Shenda Hong (2025)
- Communication-Efficient Learning of Deep Networks from Decentralized Data (Federated Learning) - H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, Blaise Agüera y Arcas (2016)
- Privacy and Security Challenges in Large Language Models - Vishal Rathod, Seyedsina Nabavirazavi, Samira Zad, Sundararaja Sitharama Iyengar (2025)
- SoK: Security and Privacy Risks of Healthcare AI - Yuanhaur Chang, Han Liu, Chenyang Lu, Ning Zhang (2024)
- Zero-Shot Text-to-Image Generation (DALL-E) - Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever (2021)
- Learning Transferable Visual Models From Natural Language Supervision (CLIP) - Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, et al. (2021)
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation - Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi (2022)
- Sigmoid Loss for Language Image Pre-Training (SigLIP) - Xiaohua Zhai, Basil Mustafa, Alexander Kolesnikov, Lucas Beyer (2023)
- Visual Instruction Tuning (LLaVA) - Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee (2023)
- Med-Flamingo: a Multimodal Medical Few-shot Learner - Michael Moor, Qian Huang, Shirley Wu, Michihiro Yasunaga, Cyril Zakka, Yash Dalmia, Eduardo Pontes Reis, Pranav Rajpurkar, Jure Leskovec (2023)
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond - Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou (2023)
- CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models - Wei Dai, Peilin Chen, Malinda Lu, Daniel Li, Haowen Wei, Hejie Cui, Paul Pu Liang (2025)
- Qwen2.5-VL Technical Report - Qwen Team, Alibaba (2025)
- Introduction to Deep Learning - CMU 11-785, Fall 2025
- Large Language Models: Methods, Analysis, and Applications - CMU 11-667/11-867
- Advanced Natural Language Processing - CMU 11-711, Spring 2025
- Multimodal Machine Learning (YouTube) - CMU 11-777
- How To AI (Almost) Anything - MIT MAS.S60, Spring 2025
- Affective Computing and Multimodal Interaction - MIT MAS.S63, Fall 2025
This project is licensed under the MIT License - see the LICENSE file for details.