I studied at Harvey Mudd College in Claremont, California, with a double major in Computer Science and Mathematics and a concentration in linguistics.
I was advised by TJ Tsai in the Music Information Retrieval Lab.
Before college, I grew up in Beijing, China. I spent seven years at the High School Affiliated to Renmin University of China (RDFZ), in the first class of the Early Development Program (EDP).
We combine 2D photorealism with 3D controllability in human video generation by conditioning video diffusion models on controlled renderings of 3D avatars.
We build datasets of multi-person poses and motions, and jointly train on them to generate natural and diverse group motions of multiple humans from textual descriptions.
We introduce a high-resolution, 3D-consistent image and shape generation technique that is trained on single-view RGB data only and stands on the shoulders of StyleGAN2 for image generation.