About the Role
As a Research Engineer on Alignment Science, you'll contribute to exploratory experimental research on AI safety, with a focus on risks from powerful future systems (like those we would designate as ASL-3 or ASL-4 under our Responsible Scaling Policy), often in collaboration with other teams including Interpretability, Fine-Tuning, and the Frontier Red Team. You will build and run elegant and thorough machine learning experiments to help us understand and steer the behavior of powerful AI systems, with a strong interest in making AI helpful, honest, and harmless, especially in the context of human-level capabilities. This role is ideal for individuals who identify as both scientists and engineers.
Representative Projects
- Testing the robustness of safety techniques by training language models to subvert them and evaluating intervention effectiveness.
- Running multi-agent reinforcement learning experiments to test techniques like AI Debate.
- Building tooling to efficiently evaluate the effectiveness of novel LLM-generated jailbreaks.
- Writing scripts and prompts to efficiently produce evaluation questions for testing models’ reasoning abilities in safety-relevant contexts.
- Contributing ideas, figures, and writing to research papers, blog posts, and talks.
- Running experiments that feed into key AI safety efforts at Anthropic, such as the design and implementation of the Responsible Scaling Policy.
Requirements
You may be a good fit if you:
- Have significant software, ML, or research engineering experience.
- Have some experience contributing to empirical AI research projects.
- Have some familiarity with technical AI safety research.
- Prefer fast-moving collaborative projects to extensive solo efforts.
- Pick up the slack, even if it goes outside your job description.
- Care about the impacts of AI.
Strong candidates may also:
- Have experience authoring research papers in machine learning, NLP, or AI safety.
- Have experience with LLMs.
- Have experience with reinforcement learning.
- Have experience with Kubernetes clusters and complex shared codebases.
Candidates need not have:
- 100% of the skills needed to perform the job.
- Formal certifications or education credentials.