Job Title: Sr. Site Reliability Engineer

Company Name: Intuitive Machines

Job Details: Remote, Full Time

Job Url: https://hiring.cafe/viewjob/h69dnp0kbstlxta1

Job Description: Sr. Site Reliability Engineer @ Intuitive Machines (posted 5d ago)
United States, Remote, Full Time

Responsibilities: Define SLOs, lead incident response, design infrastructure

Requirements Summary: 15+ years in enterprise infrastructure/SRE; virtualization, networking, Linux/Windows, automation, monitoring; remote work with quarterly travel; US citizenship or permanent resident preferred due to ITAR.

Technical Tools Mentioned: VMware, Hyper-V, Citrix, VDI, AWS, PowerShell, Bash, Python, Linux, Windows Server, Active Directory, DNS, DHCP, SAN/NAS, VMware vSphere, Patching, Monitoring

Sr. Site Reliability Engineer
Remote, US

About Intuitive Machines:
Intuitive Machines is an innovative and cutting-edge space company making cislunar space accessible to both public and private customers. Our mission is to further science and exploration, communications, and economic progress from the Earth to the Moon and beyond. With multiple NASA lunar missions in development and additional private missions on our manifest, we pride ourselves in supporting NASA, our customers, and the nation in paving the way to return humans to the surface of the Moon. Our world-class team includes experts in all aspects of spacecraft subsystems design, development, and test, on-orbit operations, and safety.

About the Position:
We’re looking for a Senior Site Reliability Engineer with deep enterprise infrastructure experience to help ensure the reliability, availability, and performance of systems supporting spacecraft design, manufacturing, and mission operations. In this role, you will bridge traditional infrastructure operations with modern SRE practices, focusing on proactive reliability, scalability, and performance.

This is a remote position with quarterly travel to Bay Area facilities and occasional onsite support for critical incidents.

Responsibilities:
- Define and maintain Service Level Objectives (SLOs) and error budgets for infrastructure services
- Lead incident response efforts, perform root cause analysis, and implement preventive solutions
- Design, implement, and maintain hybrid and on-prem infrastructure with a focus on reliability and performance
- Ensure availability and performance of virtualization platforms (VMware, Hyper-V, Citrix environments)
- Manage enterprise patching across Windows and Linux systems
- Maintain and optimize storage platforms across SAN/NAS environments
- Design and validate network architecture, including firewalls and switching infrastructure
- Establish and maintain infrastructure baselines aligned with security and compliance frameworks (CIS, DISA STIG)
- Support compute infrastructure including enterprise server platforms
- Collaborate on hybrid cloud initiatives and AWS-based infrastructure
- Administer core services including Active Directory, DNS, DHCP, and virtual desktop environments
- Develop automation scripts to reduce operational overhead and improve efficiency
- Build and maintain monitoring, alerting, and documentation for infrastructure systems
- Participate in an on-call rotation supporting 24/7 mission-critical operations
- Mentor and guide engineers on reliability best practices and troubleshooting
- Collaborate cross-functionally with engineering, manufacturing, and security teams
- Travel quarterly for onsite planning, coordination, and hands-on support as needed

Requirements:
- 15+ years of experience in enterprise infrastructure, systems administration, or Site Reliability Engineering roles
- Strong experience with virtualization platforms (VMware, Hyper-V, Citrix, VDI environments)
- Deep understanding of networking and security architecture (firewalls, switching, secure design)
- Experience managing enterprise storage systems and SAN/NAS environments
- Proficiency in Linux (RHEL) and Windows Server administration
- Experience with Active Directory, DNS, DHCP, and patch management systems
- Familiarity with AWS and hybrid cloud infrastructure environments
- Scripting experience (PowerShell, Bash, Python) for automation and operations
- Strong troubleshooting skills and ability to resolve complex infrastructure issues
- Experience with monitoring, performance tuning, and system reliability practices
- Ability to work cross-functionally and communicate with technical and non-technical stakeholders
- Willingness to travel quarterly and provide onsite support during critical events
- Must be a U.S. Citizen or Permanent Resident due to ITAR requirements

Preferred Requirements:
- Experience in aerospace, defense, or regulated manufacturing environments
- Relevant certifications (VMware, Cisco, Citrix, AWS)
- Familiarity with cloud reliability engineering practices
- Experience supporting manufacturing execution systems (MES) or mission operations environments
- Background in high-performance computing (HPC) infrastructure
- Strong experience working in remote environments with distributed teams

US EEO Statement:
Intuitive Machines is an Equal Opportunity employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.