Job Title: Senior Software Engineer

Company Name: Temporal

Job Url: https://boards.greenhouse.io/embed/job_app?token=5038882007&utm_source=jobright&jr_id=697d36333f57a33569670408

Job Description: We are hiring a Senior Software Engineer to join the Cloud Enablement team, part of Temporal’s Cloud Global Services (CGS) organization.

The Cloud Enablement team focuses on applying and extending the Temporal OSS replication stack to power critical Temporal Cloud capabilities. These include High Availability (HA) namespaces, error detection and automated failover, and migration of workloads and namespaces between self-hosted Temporal clusters and Temporal Cloud, as well as within Temporal Cloud.

As a Senior Engineer, you will work on backend systems that sit at the core of Temporal Cloud’s enterprise offerings. These systems must be correct, reliable, observable, and safe to operate at scale, even in the presence of partial failures, network partitions, and evolving customer workloads. You’ll collaborate closely with other engineers on the CGS Replication Foundations, Cloud, Infrastructure, and OSS teams to deliver production-grade features used by customers running mission-critical workflows.

What You'll Do

- Design and implement backend features that apply and extend the Temporal OSS replication stack to new Temporal Cloud capabilities
- Contribute to Temporal Cloud High Availability features, including:
  - Namespace replication within and across regions and cloud providers
  - Monitoring replication health and lag
  - Supporting manual and automated failover workflows
- Build and improve namespace migration systems, including:
  - Migration of namespaces and workloads between self-hosted Temporal clusters and Temporal Cloud
  - Migration between Temporal Cloud environments or regions
  - Tooling that supports safe cutover, validation, and rollback
- Own medium-to-large features end-to-end, from design through production rollout and long-term maintenance
- Write clear design documentation describing system behavior, tradeoffs, and failure modes
- Ensure features are production-ready by delivering:
  - Service-level logs, metrics, and tracing
  - Alerts, dashboards, and operational runbooks
- Participate in operational ownership, including on-call rotations, incident response, and postmortems
- Collaborate with teammates to continuously improve reliability, operability, and development velocity