Company Name: Abnormal Security

Job Details: Hiring remotely in USA; 176K-207K annually; Senior level

Job Url: https://builtin.com/job/senior-software-engineer-site-reliability/6468556

Job Description:

About the Role

Abnormal Security is looking for a Senior Software Engineer - Site Reliability to join our Infrastructure team. In this role, you will be responsible for the reliability, scalability, and operational excellence of our systems and services. You will lead initiatives to improve the operational maturity of both SRE-managed services and critical product systems, driving change across the organization in support of stable operations.

As a senior member of the team, you will independently define and execute quarterly goals, create forward-looking roadmaps, and own cross-functional projects aligned with company-level objectives. You will serve as a key advocate for reliability, providing technical leadership, deep analysis, and mentorship while embedding with product teams as needed to improve service ownership and incident response practices.

The ideal candidate:
- Has strong technical depth in distributed systems and operational excellence
- Possesses a product-focused mindset with the ability to translate business needs into reliability goals
- Is a strong communicator and mentor, able to influence both within the SRE team and across engineering
- Has demonstrated experience leading broad technical initiatives across teams and systems

What You Will Do
- Own the operational maturity of services in the SRE software stack, driving architectural and tooling improvements
- Proactively partner with product teams to embed SRE best practices and support services with operational challenges
- Independently define and drive quarterly goals for the SRE team with measurable impact on system reliability and developer productivity
- Design and maintain systems that promote observability, automated recovery, scalability, and resilience
- Lead incident reviews and root cause analyses; ensure follow-up actions are implemented and shared across teams
- Collaborate with engineering leadership to shape the team roadmap and contribute to company-wide reliability goals
- Mentor other engineers and drive adoption of SRE principles throughout the engineering organization

Must Have
- 8+ years of experience in infrastructure, DevOps, or Site Reliability Engineering roles
- Deep knowledge of production-grade distributed systems and cloud-native architectures
- Demonstrated experience managing service availability, latency, and incident response in production environments
- Strong programming skills in Python, Go, or similar languages
- Experience with Kubernetes, Terraform, and observability tools (e.g., Prometheus, Grafana, Datadog)
- Proven ability to lead complex, multi-team initiatives and influence system design for reliability

Nice To Have
- Prior experience embedding with product engineering teams to support operational goals
- Familiarity with AWS and multi-cloud environments (e.g., Azure, GCP)
- Experience in regulated environments or with FedRAMP-compliant systems
- Contributions to open-source SRE tooling or community knowledge sharing

At Abnormal AI, certain roles are eligible for a bonus, restricted stock units (RSUs), and benefits. Individual compensation packages are based on factors unique to each candidate, including their skills, experience, qualifications, and other job-related reasons. We know that benefits are also an important piece of your total compensation package. Learn more about our Compensation and Equity Philosophy on our Benefits & Perks page.

Base pay range: $176,000 - $207,050 USD
San Francisco/New York base pay range: $195,000 - $230,000 USD

Abnormal AI is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, protected veteran status, or other characteristics protected by law.