Xsolla

Principal AI/ML Engineer, Platform Engineering

Department
Engineering
Job Type / Location
Kuala Lumpur
Experience Required
8+ years

About the Role

As a Principal AI/ML Engineer in Platform Engineering, you will be a key contributor to Xsolla's infrastructure innovation. This role involves designing and implementing advanced AI/ML solutions to optimize infrastructure operations, enhance security, and improve developer productivity across multi-cloud environments, with a strong focus on GCP.

Responsibilities

  • Design and implement AI/ML-powered solutions for infrastructure use cases, including predictive autoscaling, anomaly detection, intelligent cost optimization, and automated remediation across GCP and multi-cloud environments.
  • Build and maintain AI-driven monitoring and observability systems that correlate logs, metrics, and traces to surface root causes, predict bottlenecks, and reduce mean time to resolution (MTTR).
  • Develop and operate automated incident response workflows using AI-powered playbooks that diagnose, contain, and resolve infrastructure issues with minimal manual intervention.
  • Integrate AI tooling into CI/CD pipelines to improve deployment reliability, automate test prediction, score release health, and support rollback automation.
  • Contribute to the development of internal AI agents and virtual assistants integrated into developer workflows (Slack, IDEs, Confluence), enabling self-service for provisioning, troubleshooting, and infrastructure guidance.
  • Implement AI/ML-based anomaly detection and automated vulnerability management workflows to enhance the security posture of Xsolla's infrastructure.
  • Prototype and productionize Generative AI solutions for infrastructure automation, including auto-generation of Terraform/Puppet modules, IaC configurations, runbooks, and change documentation.
  • Collaborate with senior engineers and leadership to evolve and execute the infrastructure AI strategy across its implementation phases.
  • Maintain clear documentation of AI tools, integrations, and automated workflows; share knowledge and best practices across the team.

Requirements

  • 8+ years of experience in AI/ML engineering, with a strong focus on infrastructure and platform-related applications.
  • Proven track record of designing, building, and deploying production-grade AI/ML systems.
  • Extensive experience with cloud platforms, particularly GCP, and familiarity with multi-cloud environments.
  • Proficiency in developing solutions for monitoring, observability, and incident response.
  • Strong understanding of CI/CD principles and practices.
  • Experience with Generative AI technologies and their application in infrastructure automation (e.g., auto-generation of Terraform/Puppet modules, IaC configurations).
  • Excellent collaboration and communication skills.
