Mengyi Shan 单梦伊


I am a PhD student at the Paul G. Allen School of Computer Science at the University of Washington, where I am affiliated with the Graphics and Imaging Lab (GRAIL) and the Reality Lab. I am co-advised by Steve Seitz, Brian Curless, and Ira Kemelmacher-Shlizerman.

I studied at Harvey Mudd College in Claremont, California, with a double major in Computer Science and Mathematics and a concentration in Linguistics. I was advised by TJ Tsai in the Music Information Retrieval Lab.

Before college, I grew up in Beijing, China. I spent seven years at the High School Affiliated to Renmin University of China (RDFZ), in the first class of the Early Development Program (EDP).

Email / CV / Google Scholar / Twitter / GitHub

News

March 2025: I will be joining Google AR/VR as a Student Researcher starting this summer.
July 2024: Our paper Multi-Person Motion Synthesis was accepted to ECCV 2024!
February 2024: I will be joining Meta GenAI as a Research Scientist Intern, working on Movie Gen.
February 2024: Our paper on animal motion generation OmniMotionGPT was accepted to CVPR 2024!
August 2023: Our paper Animating Street View was accepted to SIGGRAPH Asia 2023!
March 2023: I will be joining OPPO US Research Center as a Research Scientist Intern, working on human-centric AIGC.

Research

I work on generating creative content with modern technologies in vision and graphics.

GenEscape: Hierarchical Multi-Agent Generation of Escape Room Puzzles
Mengyi Shan, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz
Preprint, 2025
arXiv
We challenge text-to-image models with generating escape room puzzle images that are visually appealing, logically solid, and intellectually stimulating.

Populate-A-Scene: Affordance-Aware Human Video Generation
Mengyi Shan, Zecheng He, Haoyu Ma, Felix Juefei-Xu, Peizhao Zhang, Tingbo Hou, Ching-Yao Chuang
Preprint, 2025. ICCV Review 4/5/6 out of 6
project page / paper
We repurpose a text-to-video generation model as a human-world interaction simulator.

AMG: Avatar Motion Guided Video Generation
Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang
Preprint, 2024
arXiv / code / data
We combine 2D photorealism and 3D controllability in human video generation by conditioning video diffusion models on controlled renderings of 3D avatars.

Towards Open Domain Text-Driven Synthesis of Multi-Person Motions
Mengyi Shan, Lu Dong, Yutao Han, Yuan Yao, Tao Liu, Ifeoma Nwogu, Guo-Jun Qi, Mitch Hill
ECCV, 2024
project page / arXiv / data / supp / code coming soon
We build datasets of multi-person poses and motions, and train on them jointly to generate natural and diverse group motions of multiple humans from textual descriptions.

OmniMotionGPT: Animal Motion Generation with Limited Data
Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan, Bingbing Wen, Ziwei Xuan, Mitch Hill, Junjie Bai, Guo-Jun Qi, Yalin Wang
CVPR, 2024
project page / arXiv / code
We generate diverse and realistic animal motion sequences from textual descriptions, without a large-scale animal text-motion dataset.

Animating Street View
Mengyi Shan, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz
SIGGRAPH Asia, 2023
project page / arXiv / video
We present a system that automatically brings street view imagery to life by populating it with naturally behaving, animated pedestrians and vehicles.

StyleSDF: High-Resolution 3D-Consistent Image and Geometry Generation
Roy Or-El, Xuan Luo, Mengyi Shan, Eli Shechtman, Jeong Joon Park, Ira Kemelmacher-Shlizerman
CVPR, 2022
project page / arXiv / code
We introduce a high-resolution, 3D-consistent image and shape generation technique that is trained on single-view RGB data only and stands on the shoulders of StyleGAN2 for image generation.

Automatic Generation of Piano Score Following Videos
Mengyi Shan, TJ Tsai
TISMIR, 2021
project page / arXiv / code
We build a system that generates piano score following videos from an audio recording in a fully automated manner.

Improved Handling of Repeats and Jumps in Audio-Sheet Image Synchronization
Mengyi Shan, TJ Tsai
ISMIR, 2020
project page / arXiv / code
We study the problem of automatically generating piano score following videos given an audio recording and raw sheet music images.