I am a researcher at NVIDIA GEAR, working under the guidance of Dr. Jim Fan and Prof. Yuke Zhu. My research sits at the intersection of Multimodal Large Language Models and Robot Learning, with a specific focus on building foundation models for robotic perception and manipulation.
Previously, I was a researcher at SenseTime, where I worked on Vision-Language-Action models across pre-training, post-training, and their applications in multi-view generalization and human-robot interaction. I also contributed to video understanding, particularly agent-based reinforcement learning for long-horizon video reasoning.
I received my Ph.D. from the Institute of Automation, Chinese Academy of Sciences in 2024. My doctoral research focused on robotic dexterous grasping and human-robot interaction.
News
- 2025.11: One paper was accepted to AAAI 2026.
- 2025.06: One paper was accepted to ICCV 2025.
- 2024.11: One paper was accepted to IEEE Transactions on Robotics (T-RO).
- 2024.06: One paper was accepted to IEEE Transactions on Robotics (T-RO).
- 2024.05: Finished my Ph.D. final defense! What an unforgettable journey!
- 2024.01: One paper was accepted to ICRA 2024.
- 2023.01: Started an internship at NIO Autonomous Driving.
Publications
Please visit my Google Scholar page for the full list of publications.
Education
- 2021 - 2024, Ph.D., Institute of Automation, Chinese Academy of Sciences, Beijing, China.
- 2019 - 2021, M.Sc., University of Pittsburgh, Pittsburgh, Pennsylvania, U.S.
Experience
- 2026.01 - Now, Researcher, NVIDIA GEAR.
- 2024.07 - 2025.09, Senior Researcher, SenseTime Research.
- 2023.01 - 2023.06, Algorithm Engineer Intern, NIO Autonomous Driving.