About the role:
Ambient.ai is hiring an Applied Research Scientist to build the next generation of foundation models for computer vision. You will join a team responsible for building multimodal models with state-of-the-art performance on real-world vision benchmarks. In this role, you’ll own full-cycle model development: from pre-training and fine-tuning on image-language data to applying distillation and compression techniques for deployment. This is a hands-on, cross-functional role where your work will directly impact our mission of preventing every security incident possible.
What you'll do:
- Develop & Optimize VLMs: Design transformer-based vision-language models that understand images, video, and text, and optimize them for real-time inference.
- Pre-training & Fine-tuning: Own the full training pipeline, from pre-training on image-text data to fine-tuning for Ambient.ai’s physical security domain and use cases.
- Model Compression & Optimization: Apply techniques like distillation, quantization, and pruning to reduce model size and latency, enabling efficient edge deployment.
- Leverage Open-Source & Innovate: Use and extend state-of-the-art open-source models. Prototype new architectures and training methods to advance Ambient.ai’s multimodal AI research.
- Cross-Team Collaboration: Work with engineering and product teams to integrate models into the platform. Iterate based on real-world feedback and deployment data to improve performance.
- Research and Experimentation: Stay current with vision, NLP, and multimodal AI research. Design experiments to test new algorithms and continually enhance our core AI systems.
What you'll bring:
- Ph.D. or Master’s in CS, EE, or a related field, with a strong foundation in AI/ML (Ph.D. preferred, or a Master’s with strong experience)
- Proficient in Python/C++ and deep learning frameworks like PyTorch or TensorFlow. Comfortable with large-scale training pipelines.
- Hands-on experience with CNNs, Transformers, and Vision Transformers (ViT). Strong understanding of vision-language models and how to fine-tune or adapt them.
- Proven skills in model training and optimization, including fine-tuning on large datasets and applying distillation, quantization, or similar techniques. Experience with foundation or multimodal models is a plus.
- Strong problem-solving ability: quick prototyping, diagnosing failure cases, and iterating on solutions
- Startup experience preferred: Comfortable with ambiguity, fast iteration, and owning projects end-to-end