Research Scientist, Foundation Model at Pika

Pika · Research · Palo Alto HQ · Full-time

posted 2026-05-16

About the Role

At Pika, we are pioneering the next generation of creative infrastructure, built around real-time multimodal generation and intelligent agentic platforms. We are seeking accomplished Research Scientists in Foundation Models with expertise in pre-training and mid-training large-scale multimodal foundation models to advance our mission of making agentic, real-time generative technology accessible and transformative for millions of creators. This is a staff- or lead-level opportunity.

As a key member of our research team, you will design and implement core technologies, develop new methodologies for large-scale multimodal pre-training and mid-training (text, image, audio, and video), and drive innovative approaches to foundation model architecture. You will collaborate closely with engineering and product teams to shape the future of real-time creative and agentic platforms at scale.

What You’ll Do

- Lead research and development on pre-training and mid-training of multimodal foundation models at scale.
- Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interaction across modalities.
- Focus on scalable data pipeline curation and model training strategies for broad, diverse, and sensory-rich datasets.
- Advance state-of-the-art techniques in diffusion, autoregressive, and other generative models for large-scale pre-training and fine-tuning.
- Identify, create, and leverage large, high-quality cross-modal datasets.
- Bring research advancements into production-ready systems in collaboration with engineering and product teams.
- Publish work in top-tier conferences and journals, and communicate research clearly both internally and externally.
- Stay at the forefront of foundation model and real-time multimodal AI research.

What We’re Looking For

- 5+ years of research experience in large-scale pre-training/mid-training of multimodal foundation models (LLMs, VLMs, audio LMs, or similar), ideally at the staff or lead scientist level.
- A track record as first author on major publications in top conferences or journals (e.g., NeurIPS, ICML, ICLR).
- Extensive hands-on experience with large-scale multimodal model design, training, and deployment.
- Deep understanding of, and implementation experience with, generative architectures (diffusion, autoregressive, cross-modal, etc.).
- Expertise in high-throughput, scalable dataset curation and model pipeline optimization for multimodal applications.
- Strong programming and prototyping skills (Python, PyTorch, TensorFlow, etc.) and experience deploying research into production systems.
- Excellent communication and collaboration skills, and a passion for building technology that enables creativity.

What We Offer

- Competitive salary and substantial equity in a high-growth startup
- Full health benefits, 401(k) matching, and more
- Collaborative, mission-driven team environment with major growth opportunities
- Flexible hybrid on-site/remote arrangement (HQ in Palo Alto, CA)

About Pika

Pika empowers creators by building state-of-the-art agentic and multimedia platforms. Our vision is to break down technical barriers to creativity, making real-time generation and intelligent orchestration accessible to all. Join us and help shape the next evolution of creative technology!

If you are a leading researcher excited to build and scale real-time multimodal foundation models, we want to hear from you.