mistral

Field Hardware Engineer, HPC

Department: Engineering
Job Type / Location: remote
Experience Required: 5+ years

About Mistral

At Mistral AI, we believe in the power of AI to simplify tasks, save time, and enhance learning and creativity. Our technology is designed to integrate seamlessly into daily working life.

We democratize AI through high-performance, optimized, open-source and cutting-edge models, products and solutions. Our comprehensive AI platform is designed to meet enterprise needs, whether on-premises or in cloud environments. Our offerings include le Chat, the AI assistant for life and work.

We are a dynamic, collaborative team passionate about AI and its potential to transform society.

Our diverse workforce thrives in competitive environments and is committed to driving innovation. Our teams are distributed across France, the USA, the UK, Germany and Singapore. We are creative, low-ego and team-spirited.

Join us to be part of a pioneering company shaping the future of AI. Together, we can make a meaningful impact. Learn more about our culture at https://mistral.ai/careers.

Role summary

Our compute footprint is growing fast to support our science and engineering teams. We’re hiring a Field HW Engineer to understand end-to-end systems, execute complex, vendor-level interventions, and guide L1 engineers on site, without direct line management. You’ll work hands-on across compute, storage, interconnect and cooling to keep one of France’s largest GPU/CPU clusters healthy and scalable.

Location: Bruyères-le-Châtel, on-site field role (multi-site mobility: Paris area and nearby)

Reporting line: Hardware Ops

Impact

• Compute is a key lever for Mistral’s success and our largest spend item.

• Direct impact on scale: you’ll restore service on complex incidents and raise the bar on reliability as we grow.

• Enable breakthrough AI: your work enables our science & engineering teams to deliver state-of-the-art AI.

What you will do

• Lead complex interventions: plan and execute vendor-level or multi-node operations (e.g., full rack work, intricate recabling, post-restart diagnosis), own risk assessment/rollback, and coordinate with vendors (RMA/escalations).

• Advanced diagnostics: correlate symptoms across compute, storage, interconnect and cooling; read system indicators (LED/POST/beep), BMC/IPMI consoles, and logs to identify root causes.

• Guide and uplift L1s: coach on safe practices (ESD/LOTO), first-line triage, rack craftsmanship, documentation quality; pair on tricky procedures. (No people management.)

• Process & automation: improve SOPs/checklists; propose/build small automation (Python/Bash) for photo/serial capture, inventory sync, dashboards/alerts; shorten MTTR.

• Safety & compliance: enforce lockout/tagout, ESD, PPE; ensure audit-ready tickets, evidence and change traces.

• Parts & logistics (advanced): plan spares strategy, track failure trends, and drive proactive vendor actions.

About you

• 5+ years in data center/server h
