Faire

Staff Machine Learning Platform Engineer

Department
Engineering
Job Type / Location
San Francisco
Experience Required
8+ years

About this role

As a Staff Machine Learning Platform Engineer, you will help design, improve, and operate a scalable ML platform to accelerate model training, deployment, and governance. You are the technical bridge between data science and production engineering. You’ll be joining a small but deeply critical team that scales Faire’s ability to support tens of thousands of local businesses in a constantly narrowing retail landscape.

What You Will Do

  • Design and operate ML infrastructure, including workspaces, clusters, jobs, and workflows
  • Productionize ML workloads using Spark, Delta Lake, MLflow, and Databricks Workflows
  • Teach data scientists how to use our ML platform to take our most critical models from notebook to production
  • Implement Unity Catalog for data governance, lineage, access control, and secure multi-tenant usage
  • Build CI/CD pipelines for ML using Terraform and Git-based workflows (e.g., GitHub Actions)
  • Optimize performance, reliability, and cost across training and inference workloads
  • Configure Identity and Access Management (IAM) and Role-Based Access Control (RBAC) for sensitive data sets
  • Establish observability for data quality, model performance, and platform health
  • Build and maintain ML Platform technical documentation

What it takes

  • 8+ years of experience building production ML or data platforms
  • A degree (preferably graduate level) in Computer Science, Engineering, Statistics, or a related technical field
  • Strong hands-on expertise with Databricks, Spark, Delta Lake, and MLflow
  • Proficiency in Python, SQL, and distributed systems concepts
  • Experience with cloud platforms and infrastructure-as-code
  • Solid understanding of MLOps best practices: CI/CD, monitoring, reproducibility, and security
  • Experience supporting multiple ML teams in a shared platform environment
  • Active ownership of orphaned problems and a willingness to learn whatever you’re missing to get the job done

Tech Stack

Faire uses a modern cloud-based tech stack. For this role, you’ll want to be proficient with the following:

Languages

  • Python
  • SQL
  • Kotlin

ML Frameworks

  • PyTorch
  • MLflow

Big Data & Processing

  • Spark
  • Kafka
  • Databricks
  • Snowflake
  • Fivetran
  • Iceberg
  • Unity Catalog
  • Datadog
  • Airflow
  • CockroachDB
  • MySQL

Cloud & Infrastructure

  • AWS
  • S3
  • SageMaker
  • Kubernetes
  • Docker
  • GitHub Actions
  • Terraform

Generative AI

  • Claude Sonnet 4.5
  • ChatGPT 5.2
