Role Summary

We're looking for a Developer Experience Engineer to improve the development experience at Clara by building a self-service platform that automates and simplifies infrastructure, deployments, and operations. You'll be responsible for removing friction from the development lifecycle so that engineering teams can focus on delivering business value.

Key Responsibilities

Developer Experience & Self-Service Platform

Design and maintain a developer portal (Backstage.io or similar) as the central hub for resource management
Build abstractions and APIs that enable developers to provision resources without manual intervention
Implement self-service workflows for environment creation, configuration, and permissions
Create reusable templates and blueprints for services, repositories, and pipelines

CI/CD & Automation

Design, implement, and optimize highly automated CI/CD pipelines
Reduce build and deployment times through caching, parallelization, and other pipeline optimizations
Implement GitOps and continuous deployment with automated rollback capabilities
Automate testing (unit, integration, e2e) in pipelines with clear reporting
Implement advanced deployment strategies (blue-green, canary, feature flags)

Ephemeral Environments

Design and implement ephemeral/preview environment solutions for each PR/branch
Automate the complete lifecycle: creation, configuration, and cleanup
Optimize costs through auto-scaling, scheduling, and garbage collection of unused resources
Integrate ephemeral environments with code review and testing workflows
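
To give a feel for the cleanup side of this work, here is a minimal sketch of the selection step in a garbage-collection job for preview environments. It is illustrative only: the `PreviewEnv` record, the `select_for_cleanup` function, and the 24-hour idle threshold are all assumptions made for this example, not a description of Clara's existing tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PreviewEnv:
    """Hypothetical record describing one ephemeral environment."""
    name: str
    pr_open: bool            # whether the originating PR is still open
    last_activity: datetime  # last deploy or request seen for this environment


def select_for_cleanup(envs, now, idle_ttl=timedelta(hours=24)):
    """Return the names of environments that are safe to tear down.

    An environment qualifies if its PR is closed, or if it has been idle
    for longer than idle_ttl (an assumed default of 24 hours).
    """
    return [
        env.name
        for env in envs
        if not env.pr_open or (now - env.last_activity) > idle_ttl
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    envs = [
        PreviewEnv("pr-101", pr_open=True, last_activity=now - timedelta(hours=2)),
        PreviewEnv("pr-102", pr_open=False, last_activity=now - timedelta(hours=2)),
        PreviewEnv("pr-103", pr_open=True, last_activity=now - timedelta(days=3)),
    ]
    print(select_for_cleanup(envs, now))
```

In a real platform the teardown itself would go through whatever provisions the environments (for example, the same IaC or GitOps tooling); the sketch covers only the decision of what to remove.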

Observability & Alerting

Implement intelligent alerting systems with noise reduction and event correlation
Configure dashboards and SLI/SLO metrics for critical services
Establish automated runbooks and auto-remediation for common incidents
Integrate observability (logs, metrics, traces) into the developer portal

Infrastructure as Code & Security

Maintain and evolve infrastructure as code (Terraform, CloudFormation, etc.)
Implement automated security controls (policy as code, security scanning)
Manage secrets, configurations, and access securely and with full auditability
Apply least privilege and zero-trust principles across all systems

AI/ML Ops Integration

Explore and implement AI tools for resource optimization and failure prediction
Automate operational tasks using ML (anomaly detection, capacity planning, incident classification)
Evaluate and adopt emerging AIOps tools

Technical Requirements

Must Have

5+ years of experience in DevOps/SRE/Platform Engineering
Deep hands-on experience with at least one major cloud provider (preferably AWS)
Solid experience with Kubernetes and microservices architectures
Expertise in CI/CD tools (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
Proficiency in Infrastructure as Code (Terraform, Pulumi, CloudFormation)
Experience with containers (Docker, Kubernetes, ECS/EKS)
Advanced scripting skills (Python, Bash, Go)
Knowledge of observability tools (Prometheus, Grafana, ELK, Datadog, New Relic)

Nice to Have

Experience with Backstage.io or similar developer portal platforms
FinOps knowledge and cloud cost optimization
Experience in FinTech organizations or highly regulated environments
Familiarity with AIOps platforms and ML-based monitoring
Cloud certifications (AWS Solutions Architect, CKA, etc.)
Experience with service mesh (Istio, Linkerd)
Compliance and security knowledge (PCI DSS, SOC 2)

Key Skills

Automation obsession: If something is done twice, it should be automated
Product mindset: Treat the internal platform as a product with "customers" (developers)
Ability to abstract complexity: Make the complex simple for end users
Effective communication: Document clearly, create runbooks, and educate teams
Problem solving: Systems thinking to solve problems at their root
Continuous improvement mindset: Constantly seek ways to optimize and simplify

What You'll Build

Self-Service Portal: Unified interface where developers can:
- Create new services from templates
- Provision ephemeral environments in minutes
- Configure alerts and dashboards with clicks
- Request access and permissions with automated approvals
Intelligent Pipelines: CI/CD that:
- Detects changes and runs only relevant tests
- Auto-deploys to production with quality gates
- Rolls back automatically on failures
- Provides instant feedback to developers
On-Demand Environments: System that:
- Creates complete environments per PR in <5 minutes
- Seeds each environment with a sanitized copy of production data
- Exposes a unique URL per environment for testing and demos
- Cleans up automatically when the PR is closed
Proactive Observability: Platform that:
- Alerts only when action is required
- Automatically suggests root causes
- Remediates known issues automatically
- Provides customized dashboards per team and service
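
As an illustration of the "detects changes and runs only relevant tests" idea under Intelligent Pipelines, here is a minimal sketch of path-based test selection. Everything in it is an assumption made for the example: the `TEST_MAP` layout, the service and suite names, and the `select_tests` function are hypothetical, and a production pipeline would more likely derive the mapping from a build graph than from hand-written patterns.

```python
from fnmatch import fnmatch

# Hypothetical mapping from changed-path patterns to the test suites they affect.
# Shared code maps to every suite that depends on it.
TEST_MAP = {
    "services/payments/*": ["tests/payments"],
    "services/cards/*": ["tests/cards"],
    "libs/common/*": ["tests/payments", "tests/cards"],
}


def select_tests(changed_files, test_map=TEST_MAP):
    """Return the sorted, de-duplicated test suites affected by changed_files.

    Files that match no pattern select nothing here; a real pipeline may
    prefer to fall back to the full suite for unmapped paths.
    """
    selected = set()
    for path in changed_files:
        for pattern, suites in test_map.items():
            # fnmatch's "*" also matches "/" so nested paths are covered.
            if fnmatch(path, pattern):
                selected.update(suites)
    return sorted(selected)


if __name__ == "__main__":
    # In CI the list would typically come from `git diff --name-only`.
    print(select_tests(["services/payments/api/handler.py", "README.md"]))
```

The trade-off in a design like this is between speed and safety: a narrow mapping gives faster feedback, while quality gates before production still need a path that runs everything.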

Developer Experience Engineer (AI Developer Experience Engineer)