I am Huajie Tan (谭桦杰), a third-year M.S. student at the School of Computer Science, Peking University, advised by Prof. Shanghang Zhang. Previously, I received my dual-degree B.Eng. from Tianjin University (College of Intelligence and Computing & School of Microelectronics) and was honored with the Outstanding Graduate Award.
My research focuses on embodied AI and multi-modal foundation models. I am currently an intern at the Beijing Academy of Artificial Intelligence (BAAI), exploring pathways toward general-purpose robotic intelligence and real-world deployment of embodied systems. I am also open to collaborations and research partnerships; feel free to email me at tanhuajie@stu.pku.edu.cn.
I am also currently seeking entrepreneurship opportunities. I have received multiple top-tier offers from leading industry labs, e.g., Huawei Top Minds (华为天才少年), Tencent QingYun (腾讯青云), JD Tech Genius Team (京东TGT), BAAI Star (智源智星), and Xiaomi Top Talent (小米顶尖). If you see a good fit or would simply like to connect, please feel free to reach out.
🔥 News
- 2026.02: 🎉 Both Robo-Dopamine and Action-Sketcher are accepted to CVPR 2026. See you in Denver, Colorado, USA!
- 2026.01: 🔥 Released RoboBrain 2.5, our next-generation embodied foundation model (first author, project lead).
- 2026.01: 🎯 Released our new research paper Action-Sketcher (first author, project lead).
- 2025.12: 🎯 Released our new research paper Robo-Dopamine (first author, project lead).
- 2025.10: 🔥 Released LLaVA-OneVision-1.5, a fully open framework for democratized MLLM training.
- 2025.10: 🔥 Released RoboOS-NeXT, a more advanced successor to RoboOS (first author, project lead).
- 2025.09: 🎉 Reason-RFT accepted to NeurIPS 2025. See you in San Diego, USA!
- 2025.06: 🎉 Released RoboBrain 2.0 and RoboOS at BAAI Conference 2025 (first author, project lead).
- 2025.04: 🌍 RoboBrain 1.0 selected for CVPR 2025’s official Embodied AI Trends Commentary.
- 2025.02: 🎉 RoboBrain 1.0 accepted to CVPR 2025.
📝 Publications

RoboBrain 2.5: Depth in Sight, Time in Mind.
BAAI RoboBrain Team
Co-First Author, Project Lead, Technical Report 2026
Project | Paper | Code | Checkpoints
TL;DR: RoboBrain 2.5 is a next-generation embodied AI foundation model designed for complex robotic manipulation, driven by two major upgrades: (1) precise 3D spatial reasoning that generates depth-aware, physically-constrained manipulation traces, and (2) dense temporal value estimation that provides step-by-step execution feedback. Together, these advancements enable a more physically grounded and execution-aware intelligence for downstream learning tasks.

RoboBrain 2.0: See Better. Think Harder. Do Smarter.
BAAI RoboBrain Team
Co-First Author, Core Contributor, Technical Report 2025
Project | Paper | Code | Checkpoints
TL;DR: RoboBrain 2.0 is an embodied brain model designed to unify perception, reasoning, and planning for complex physical tasks, and was our most powerful model prior to RoboBrain 2.5. Achieving top performance across diverse spatial and temporal benchmarks, it excels in critical real-world capabilities including spatial understanding, trajectory forecasting, and multi-agent long-horizon planning, marking a significant step toward developing generalist embodied agents.

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
LLaVA-OneVision Community Contributors
Core Contributor, Technical Report 2025
Paper | Code | Datasets | Checkpoints
TL;DR: LLaVA-OneVision-1.5 is an open, highly efficient family of Large Multimodal Models that achieves state-of-the-art performance with significantly reduced training costs. Powered by massive curated datasets and an optimized training framework developed under a $16,000 budget, the models consistently outperform competitors such as Qwen2.5-VL across numerous benchmarks.

Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation
Huajie Tan*, Sixiang Chen*, Yijie Xu*, Zixiao Wang, Yuheng Ji, Cheng Chi, Yaoxu Lyu, Zhongxia Zhao, Xiansheng Chen, Peterson Co, Shaoxuan Xie, Guocai Yao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
Co-First Author, Project Lead, CVPR 2026
Project | Paper | Code | Checkpoints
TL;DR: Joy is dopamine’s handiwork—whether in humans or in robotics.

Action-Sketcher: From Reasoning to Action via Visual Sketches for Long-Horizon Robotic Manipulation
Huajie Tan*, Peterson Co*, Yijie Xu*, Shanyu Rong, Yuheng Ji, Cheng Chi, Xiansheng Chen, Qiongyu Zhang, Zhongxia Zhao, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
Co-First Author, Project Lead, CVPR 2026
TL;DR: Make intent visible. Make action reliable.

Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan*, Yuheng Ji*, Xiaoshuai Hao*, Xiansheng Chen, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
Co-First Author, NeurIPS 2025
Project | Paper | Code | Datasets | Checkpoints
TL;DR: Reason-RFT is a pioneering two-stage RFT framework that enhances visual reasoning in VLMs, delivering superior performance, data efficiency, and robust generalization under domain shifts across various complex tasks.

RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji*, Huajie Tan*, Jiayu Shi*, Xiaoshuai Hao*, Yuan Zhang, Hengyuan Zhang, Pengwei Wang, Mengdi Zhao, Yao Mu, Pengju An, Xinda Xue, Qinghang Su, Huaihai Lyu, Xiaolong Zheng, Jiaming Liu, Zhongyuan Wang, Shanghang Zhang
Co-First Author, CVPR 2025
Project | Paper | Code | Datasets | Checkpoints
TL;DR: RoboBrain is the first unified embodied brain model, equipped with planning capabilities, affordance perception, and trajectory prediction.

RoboOS-NeXT: A Unified Memory-Based Framework for Lifelong, Scalable, and Robust Multi-Robot Collaboration
Huajie Tan*, Cheng Chi*, Xiansheng Chen*, Yuheng Ji*, Zhongxia Zhao, Xiaoshuai Hao, Yaoxu Lyu, Mingyu Cao, Junkai Zhao, Huaihai Lyu, Enshen Zhou, Ning Chen, Yankai Fu, Cheng Peng, Wei Guo, Dong Liang, Zhuo Chen, Mengsi Lyu, Chenrui He, Yulong Ao, Yonghua Lin, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang
Co-First Author, Project Lead, ArXiv 2025
TL;DR: RoboOS-NeXT is a unified memory-based framework for lifelong, scalable, and robust multi-robot collaboration.
📖 Education
- 2023.09 - 2026.06, M.S., School of Computer Science, Peking University, Beijing.
- 2019.09 - 2023.06, B.Eng. (dual degree), College of Intelligence and Computing & School of Microelectronics, Tianjin University, Tianjin.
💻 Internships
- 2024.12 - present, Beijing Academy of Artificial Intelligence (BAAI), Beijing.
- 2024.05 - 2024.11, DeepGlint, Beijing.
👥 Services
- Served as a reviewer for CVPR, ICCV, NeurIPS, and ICRA
- Active open-source contributor to huggingface/transformers and huggingface/smolagents
- Teaching Assistant, Fundamentals of Artificial Intelligence (Instructor: Prof. Shanghang Zhang)