Key Responsibilities

Design and evaluate autonomous AI agents across multiple large language models (LLMs) for real-world domains such as health, education, and daily life
Develop evaluation rubrics with objective pass/fail criteria for AI agent performance
Debug agent traces to identify failure patterns, and stress-test agents against edge cases
Assess production-grade, modular software architecture for multi-turn system interactions
Provide high-density technical feedback for training LLMs
Handle multi-turn system interactions and ensure robust tool integration

Requirements

Experience in backend engineering, AI automation, or complex systems integration
Strong command of at least two major programming languages (e.g., Python, JavaScript, Go, or Java) and of SQL databases
Proven ability to build and maintain production-grade software with modular separation
Practical experience with live, non-mocked environments and multi-turn system interactions
Familiarity with persistent state, session-tracking patterns, and security vulnerabilities such as prompt injection

AI Automation Engineer - Hire Feed