About the Role
At Pika, we are pioneering the next generation of creative infrastructure built around real-time, multimodal generation and intelligent agentic platforms. We are seeking accomplished Research Scientists in Foundation Models with expertise in pre-training and mid-training large-scale multimodal foundation models to advance our mission of making agentic, real-time generative technology accessible and transformative for millions of creators. This is a staff- and lead-level opportunity.
As a key member of our research team, you will design and implement core technologies, develop new methodologies for large-scale multimodal pre-training/mid-training (text, image, audio, and video), and drive innovative approaches to foundation model architecture. You will collaborate closely with engineering and product teams, shaping the future of real-time creative and agentic platforms at scale.
What You’ll Do
- Lead research and development on pre-training and mid-training of multimodal foundation models at scale.
- Design and prototype novel algorithms and architectures for high-fidelity, real-time multimodal synthesis and interaction across modalities.
- Build scalable data curation pipelines and model training strategies for broad, diverse, and sensory-rich datasets.
- Advance state-of-the-art techniques in diffusion, autoregressive, and other generative models for large-scale pre-training and fine-tuning.
- Identify, create, and leverage large, high-quality cross-modal datasets.
- Bring research advancements into production-ready systems in collaboration with engineering and product teams.
- Publish work in top-tier conferences and journals, and clearly communicate research both internally and externally.
- Stay at the forefront of foundation model and real-time multimodal AI research.
What We’re Looking For
- 5+ years of research experience in large-scale pre-training/mid-training of multimodal foundation models (LLMs, VLMs, Audio LMs, or similar), ideally at the staff or lead scientist level.
- Track record as a first author on major publications in top conferences or journals (e.g., NeurIPS, ICML, ICLR).
- Extensive hands-on experience with large-scale multimodal model design, training, and deployment.
- Deep understanding of, and implementation experience with, generative architectures (diffusion, autoregressive, cross-modal, etc.).
- Expertise in high-throughput, scalable dataset curation and model pipeline optimization for multimodal applications.
- Strong programming and prototyping skills (Python, PyTorch, TensorFlow, etc.) and experience deploying research into production systems.
- Excellent communication and collaboration skills, and a passion for building technology that enables creativity.
What We Offer
- Competitive salary and substantial equity in a high-growth startup
- Full health benefits, 401(k) matching, and more
- Collaborative, mission-driven team environment with major growth opportunities
- Flexible hybrid of on-site and remote work (HQ in Palo Alto, CA)
About Pika
Pika empowers creators by building state-of-the-art agentic and multimedia platforms. Our vision is to break down technical barriers to creativity, making real-time generation and intelligent orchestration accessible to all. Join us and help shape the next evolution of creative technology!
If you are a leading researcher excited to build and scale real-time multimodal foundation models, we want to hear from you.