About Zscaler
Zscaler accelerates digital transformation, ensuring customers are more agile, efficient, resilient, and secure. As an AI-forward enterprise, Zscaler leverages the world’s largest security data lake to power its cloud-native Zero Trust Exchange platform, protecting customers from cyberattacks and data loss by securely connecting users, devices, and applications.
Role
Zscaler is seeking a Sr. Staff Software Engineer to join the Zscaler Digital Experience (Core Intelligence and Data) team. This is a hybrid role based in San Jose, CA, requiring 3 days a week in the office. The engineer will report to the Sr. Manager, Software Engineering and contribute to building and enhancing the world’s largest cloud security platform, increasing its global footprint, and enabling organizations to leverage speed and agility through a cloud-first strategy and multi-tenant architecture.
What You’ll Do (Role Expectations)
- Own the agentic troubleshooting framework, including framing high-impact use cases, designing workflows and playbooks, and building processes for all products.
- Evaluate and integrate state-of-the-art GenAI advances to deliver reliable and cost-efficient production features, utilizing LLMs, various machine learning models, data processing, fine-tuning, and inference optimization.
- Work with Zscaler’s cloud platform and data lakes for feature exploration and generation.
- Handle high-volume data with real-time pipelines for data processing and aggregation.
- Design, implement, and operate scalable production systems, with a specific focus on microservices, data pipelines, orchestration, and caching.
What We’re Looking For (Minimum Qualifications)
- BS in Computer Science with 8+ years, or MS/PhD with 5+ years, of experience solving real-world problems using AI/ML and distributed systems.
- Proficiency in programming, data structures, algorithms, and machine learning, with exceptional problem-solving skills driven by first-principles thinking.
- Hands-on experience with AI modeling, including feature generation, prompt engineering, evaluations, and productionization.
- Experience in the full lifecycle of ML models, including building, deployment, monitoring, and optimization.
- Expertise in designing and operating distributed microservices using tools like Kubernetes and Docker, and writing production-grade code in Python, Go, or Java.
What Will Make You Stand Out (Preferred Qualifications)
- Experience fine-tuning and deploying proprietary SLMs/LLMs at scale, with a focus on optimizing latency, cost, safety, and evaluations.
- Experience delivering production-ready AI systems, including expertise in anomaly detection, event correlation, and incident investigation.
- Proven ability to design and implement high-performance, resilient systems with well-defined service-level objectives.