The SBGrid Consortium at Harvard Medical School supports a large international research community by curating and distributing a scientific software platform used across structural biology, cryo-EM, and related fields. The platform includes approximately 650 software titles and 6,000 versions across macOS and Linux and is deployed across laptops, workstations, HPC clusters, and cloud environments. We are hiring a Scientific Platform Engineer to help lead the modernization, security, reliability, and engineering evolution of this platform. This is a platform engineering role with substantial independent responsibility for CI pipelines, reproducible packaging, deterministic installation, release engineering, runtime hardening, observability, and software supply-chain integrity. The role is designed to be primarily engineering and platform-development work, not routine support, and it directly impacts software delivery and platform reliability across a globally distributed scientific infrastructure.
What You Will Work On:
This is an engineering-heavy role: expect 90%+ of your time to go to project and build work rather than break-fix.
Build & Test Automation
- Design and implement CI pipelines for scientific software across macOS and Linux.
- Develop regression and smoke test harnesses for packaged software.
- Catch failures before distribution rather than after client installation.
- Support fast-moving development branches (e.g., nightly builds) safely.
Reproducible Packaging
- Help define and enforce a canonical build contract.
- Improve dependency tracking and version control.
- Enable deterministic rebuilds across environments.
- Contribute to artifact integrity and metadata tracking (e.g., SBOM readiness).
Runtime Platform Hardening
- Add tests and versioning discipline to SBGrid’s runtime wrapper system (“capsules”).
- Introduce feature flags and safer rollout mechanisms.
- Improve logging, observability, and error classification.
Internal Tooling & Observability
- Develop dashboards and structured signals around build failures and common error states.
- Reduce reliance on tribal knowledge by encoding workflows into systems.
Technologies You’ll Use (and can help shape):
Core platforms
- Linux (expert-level): shells, process model, filesystems, toolchains, debugging, perf basics
- macOS (strong): building, testing, and release workflows across Intel + Apple Silicon
Build/release + automation
- CI/CD: GitLab CI (or equivalent CI systems and concepts)
- Scripting and automation: Bash + Python (primary)
- Performance-oriented implementation as needed: Go and/or Rust (selectively, for the hot paths)
Packaging and reproducibility
- Current + future packaging direction: