Homepage - Yiran Qin

Yiran Qin

CUHK-Shenzhen

Oxford University

Shanghai AI Laboratory

Hi, I'm Yiran, a fourth-year Ph.D. student at The Chinese University of Hong Kong, Shenzhen, advised by Prof. Ruimao Zhang. I am currently a visiting Ph.D. student at TVG at Oxford University, advised by Prof. Philip Torr. I also work as a research intern at Shanghai AI Laboratory, advised by Prof. Lei Bai. I am honored to collaborate with Prof. Xihui Liu, Dr. Xintao Wang, and my friend Jiwen Yu.

Conference Reviewer for ICLR (2025), CVPR (2024, 2025), ICCV (2025), NeurIPS (2025), ICML (2025), CoRL (2025), ICRA (2024, 2025), IROS (2025), WACV (2025). Workshop Challenge Organizer for MFM-EAI at ICML 2024.

My goal is to address real-world problems by translating cutting-edge research into practical solutions:

Robot Manipulation, Navigation and Collaborative Simulation (Imitation Learning, Reinforcement Learning)
Building Real-World Embodied Society with Agents (Spatio-temporal Intelligence, Robotic Planning)
Video Generation Models as World Simulators (Physics Compliance, Memory Consistency)

I am always open to academic and industrial collaborations. If you share this vision, please do not hesitate to contact me!

Research Framework

yiranqin(at)link.cuhk.edu.cn GitHub WeChat LinkedIn

Education

Oxford University

Visiting Ph.D. Student, advised by Prof. Philip Torr

Jun. 2025 - present

The Chinese University of Hong Kong, Shenzhen

Ph.D. Student, advised by Prof. Ruimao Zhang

Sep. 2021 - present

The University of Hong Kong

Visiting Ph.D. Student, advised by Prof. Xihui Liu

Mar. 2024 - Jul. 2025

Shandong University

B.S. in Computer Science

Sep. 2017 - Jul. 2021

Experience

Shanghai AI Laboratory

Research Intern, advised by Dr. Lei Bai

Apr. 2025 - present

Kuaishou Kling

Research Intern, advised by Dr. Xintao Wang

Oct. 2024 - Apr. 2025

Shanghai AI Laboratory

Research Intern, advised by Dr. Jing Shao

Jun. 2023 - Oct. 2024

NIO

Research Intern, advised by Dr. Ningning Ma

Dec. 2021 - Jun. 2023

News

2025

VIKI-R and GauDP are accepted by NeurIPS 2025, see you in San Diego, USA!

Sep 19

CDP is accepted by CoRL 2025, see you in Seoul, Korea!

Jul 20

RoboFactory and GameFactory are accepted by ICCV 2025, see you in Honolulu, Hawaii!

Jun 19

Started as a visiting Ph.D. student at TVG at Oxford University.

Jun 05

WorldSimBench is accepted by ICML 2025.

May 01

Selected Publications (view all)

CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion

Jiahua Ma*, Yiran Qin*^†, Yixiong Li, Xuanqi Liao, Yulan Guo, Ruimao Zhang^# (* equal contribution, ^# corresponding author, ^† project lead)

Conference on Robot Learning (CoRL) 2025

[Paper] [Project Page]

VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

Li Kang*, Xiufeng Song*, Heng Zhou*, Yiran Qin#, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai^#, Zhenfei Yin^# (* equal contribution, ^# corresponding author)

Annual Conference on Neural Information Processing Systems (NeurIPS) 2025

[Paper] [Project Page] [Code] [Dataset]

Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval

Jiwen Yu, Jianhong Bai, Yiran Qin, Quande Liu^#, Xintao Wang, Pengfei Wan, Di Zhang, Xihui Liu^# (^# corresponding author)

SIGGRAPH Asia 2025

[Paper]

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Yiran Qin*, Li Kang*, Xiufeng Song*, Zhenfei Yin^#, Xiaohong Liu, Xihui Liu, Ruimao Zhang^#, Lei Bai^# (* equal contribution, ^# corresponding author)

International Conference on Computer Vision (ICCV) 2025; Best Paper Award at CVPR 2025 MEIS Workshop

[Paper] [Project Page] [Code] [Dataset]

GameFactory: Creating New Games with Generative Interactive Videos

Jiwen Yu*, Yiran Qin*, Xintao Wang^#, Pengfei Wan, Di Zhang, Xihui Liu^# (* equal contribution, ^# corresponding author)

International Conference on Computer Vision (ICCV) 2025 Highlight

[Paper] [Project Page] [Code] [Dataset]

Interactive Generative Video as Next-Generation Game Engine

Jiwen Yu*, Yiran Qin*, Haoxuan Che, Quande Liu, Xintao Wang^#, Pengfei Wan, Di Zhang, Xihui Liu^# (* equal contribution, ^# corresponding author)

ArXiv Preprint

[Paper]

WorldSimBench: Towards Video Generation Models as World Simulators

Yiran Qin*, Zhelun Shi*, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao^#, Lei Bai^#, Ruimao Zhang^# (* equal contribution, ^# corresponding author)

International Conference on Machine Learning (ICML) 2025; Oral at CVPR 2025 WorldModelBench Workshop

[Paper] [Project Page]

NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants

Yiran Qin*, Ao Sun*, Yuze Hong, Benyou Wang, Ruimao Zhang^# (* equal contribution, ^# corresponding author)

International Conference on Robotics and Automation (ICRA) 2025

[Paper] [Project Page]

MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Enshen Zhou*, Yiran Qin*, Zhenfei Yin^†, Yuzhou Huang, Ruimao Zhang^#, Lu Sheng^#, Yu Qiao, Jing Shao (* equal contribution, ^# corresponding author, ^† project lead)

International Conference on Intelligent Robots and Systems (IROS) 2025

[Paper] [Project Page] [Code]

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Yiran Qin*, Enshen Zhou*, Qichang Liu*, Zhenfei Yin, Lu Sheng^#, Ruimao Zhang^#, Yu Qiao, Jing Shao^† (* equal contribution, ^# corresponding author, ^† project lead)

Conference on Computer Vision and Pattern Recognition (CVPR) 2024

[Paper] [Project Page] [Code] [Dataset] [Video]

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection

Yiran Qin*, Chaoqun Wang*, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang^# (* equal contribution, ^# corresponding author)

International Conference on Computer Vision (ICCV) 2023

[Paper] [Code]
