Worked over the summer as a research intern on the multimodal (video generation) team at Character.ai, a Silicon Valley startup founded by Noam Shazeer. Developed an end-to-end data preprocessing pipeline using Apache Spark, hundreds of H100 GPUs, and open-source models to curate a dataset of millions of videos and hundreds of thousands of hours of footage, reducing runtime by more than 4x. Worked with former ByteDance researcher Weimin Wang, using distributed training techniques and advanced PyTorch code, to train a 13B-parameter, Stable Diffusion-style model that generates high-quality, audio-synced videos. This work led to both a full-time offer and a part-time offer, the latter accepted during university. Co-authored and developed Ovi, an open-source, state-of-the-art (SOTA) video and audio generation model rivaling Veo 3, currently under consideration for ICLR 2026. Ovi has garnered significant traction, and our code is open-sourced at https://github.com/character-ai/Ovi.