logo

ACR Technology Inc

LLM Evaluation Engineer / Generative AI Quality Engineer - W2 Only

Department
Engineering
Job Type / Location
Sunnyvale, California
Experience Required
7+ years
Posted On

Key Responsibilities

  • Design and develop automated evaluation pipelines for LLM and agentic AI systems
  • Create evaluation scenarios and adversarial test datasets to identify model edge cases and bias
  • Assess AI outputs using metrics such as task success rate, semantic similarity, and sentiment analysis
  • Analyze and debug agent reasoning, tool usage, and action sequences to identify failure points
  • Develop advanced prompt engineering strategies to test reasoning, planning, and instruction adherence
  • Build and maintain custom evaluation frameworks using Python

Requirements

  • 6+ years of hands-on experience in AI model evaluation and debugging
  • Strong proficiency in Python, automation scripts, and evaluation frameworks
  • Deep understanding of large language model behaviors, failure modes, and quality validation techniques
  • Experience with AI failure mode analysis (hallucination, incoherence, jailbreaking)
  • Familiarity with automated testing frameworks (Pytest) and evaluation metrics

View Assessment Process

Think you'll be a good fit?