About the Role

As a Research Engineer on Alignment Science, you'll contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy), often in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team. You will build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems. You bring a strong interest in making AI helpful, honest, and harmless, especially in the context of human-level capabilities. This role is ideal for people who see themselves as both scientists and engineers.

Representative Projects

Testing the robustness of safety techniques by training language models to subvert them and evaluating intervention effectiveness.
Running multi-agent reinforcement learning experiments to test techniques like AI Debate.
Building tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
Writing scripts and prompts to efficiently produce evaluation questions for testing models’ reasoning abilities in safety-relevant contexts.
Contributing ideas, figures, and writing to research papers, blog posts, and talks.
Running experiments that feed into key AI safety efforts at Anthropic, such as the design and implementation of the Responsible Scaling Policy.

Requirements

You may be a good fit if you:

Have significant software, ML, or research engineering experience.
Have some experience contributing to empirical AI research projects.
Have some familiarity with technical AI safety research.
Prefer fast-moving collaborative projects to extensive solo efforts.
Pick up slack, even when the work falls outside your job description.
Care about the impacts of AI.

Strong candidates may also:

Have experience authoring research papers in machine learning, NLP, or AI safety.
Have experience with LLMs.
Have experience with reinforcement learning.
Have experience with Kubernetes clusters and complex shared codebases.

Candidates need not have:

100% of the skills needed to perform the job.
Formal certifications or education credentials.

Research Engineer, Alignment Science