Job Experience
3+ years of experience in Site Reliability Engineering or Application Operations.
Required Skills
2–5 years of experience in Site Reliability Engineering or Application Operations.
Solid understanding of Java, Spring Boot, and microservices architecture.
Proficiency in monitoring and observability tools (Prometheus, Grafana, Loki, New Relic, or equivalent).
Familiarity with Kubernetes, containers, and CI/CD pipelines.
Familiarity with incident management, root cause analysis (RCA), and performance debugging.
Experience with cloud platforms (AWS, Azure, or GCP).
Strong scripting skills (Bash, Python, or Go) for automation and diagnostics.
Good communication and stakeholder collaboration skills.
Experience with modern observability tools.
Familiarity with ITSM or ticketing tools (Jira, ServiceNow) for issue tracking.
Knowledge of security and compliance in production environments.
Hands-on experience with JVM performance tuning and runtime diagnostics to improve Java service performance.
Exposure to using AI or LLM-based tools for alert correlation, log analysis, root-cause detection, or automated diagnostics.