Ambient.ai

Applied Research Scientist - Foundation Models

Department
Engineering
Job Type / Location
Redwood City
Experience Required
3+ years

About the role:

Ambient.ai is hiring an Applied Research Scientist to build the next generation of foundation models for computer vision. You will join a team responsible for building multimodal models with state-of-the-art performance on real-world vision benchmarks. In this role, you’ll own full-cycle model development: from pre-training and fine-tuning on image-language data to applying distillation and compression techniques for deployment. This is a hands-on, cross-functional role where your work will directly impact our mission of preventing every security incident possible.

What you'll do:

  • Develop & Optimize VLMs: Design transformer-based vision-language models that understand images, video, and text, and optimize them for real-time inference.
  • Pre-training & Fine-tuning: Own the full training pipeline—from pre-training on image-text data to fine-tuning for Ambient.ai’s physical security domain and use cases.
  • Model Compression & Optimization: Apply techniques like distillation, quantization, and pruning to reduce model size and latency, enabling efficient edge deployment.
  • Leverage Open-Source & Innovate: Use and extend state-of-the-art open-source models. Prototype new architectures and training methods to advance Ambient.ai’s multimodal AI research.
  • Cross-Team Collaboration: Work with engineering and product teams to integrate models into the platform. Iterate based on real-world feedback and deployment data to improve performance.
  • Research & Experimentation: Stay current with vision, NLP, and multimodal AI research. Design experiments to test new algorithms and continually enhance our core AI systems.

What you'll bring:

  • Ph.D. or Master’s in CS, EE, or a related field with a strong foundation in AI/ML (Ph.D. preferred; Master’s candidates should bring strong relevant experience)
  • Proficient in Python/C++ and deep learning frameworks like PyTorch or TensorFlow. Comfortable with large-scale training pipelines
  • Hands-on experience with CNNs, Transformers, and Vision Transformers (ViT). Strong understanding of vision-language models and how to fine-tune or adapt them
  • Proven skills in model training and optimization, including fine-tuning on large datasets and applying distillation, quantization, or similar techniques. Experience with foundation or multimodal models is a plus.
  • Strong problem-solving ability: quick prototyping, diagnosing failure cases, and iterating on solutions
  • Startup experience preferred: Comfortable with ambiguity, fast iteration, and owning projects end-to-end
