Job URL: https://jobs.lever.co/zoox/7bef3abc-9c10-44d2-9706-ade8f749f211

Job Description

Zoox is looking for an experienced Software Engineer to work on key new frameworks and infrastructure modernization for our custom High-Performance Computing (HPC) infrastructure and its supporting ecosystem of tools and services. Zoox HPC services combine industry-best scheduling and workload orchestration technologies, such as Ray.io and SLURM, with value-add workflows built specifically for Autonomous Vehicle development. These HPC services form the backbone of development workflows across all Zoox software teams, from data engineering, to training our AI models in Perception, Planner, and Prediction, to simulation and more.

You will take on a breadth of end-to-end responsibilities, including distributed system design, algorithmic job scheduling, and adaptive cloud scaling, in support of all of Zoox’s computational needs. The position comes with a high degree of independence and the opportunity to help define Zoox’s compute scaling strategy, both technically and organizationally. You will work closely with stakeholders in Autonomy and Software teams to iterate on world-class developer experiences, incorporating the latest industry tools and best practices.

In this role, you will:
- Evaluate new distributed system paradigms and technologies to meet Zoox’s ever-growing computational and storage needs.
- Strike a balance between incremental improvements to Zoox’s existing in-house HPC infrastructure and greenfield services and abstractions.
- Create production-grade web service APIs, SDKs, and other tools to provide a world-class developer experience for all of Zoox’s software teams.

Qualifications
- 7+ years of experience
- Experience with Ray.io, particularly Ray Core and Ray Data
- Experience with Kubernetes, particularly for heterogeneous workloads and clusters
- Experience with Ray.io and Kubernetes deployed on Amazon Web Services (AWS) or other similar cloud providers such as Azure or GCP
- Proficiency with Python