Job Url: https://job-boards.greenhouse.io/wikimedia/jobs/6963387?gh_src=94cfded01us

Job Description:

You are responsible for:
- Design, implementation and maintenance of public facing infrastructure and services
- Use of configuration management and deployment tools
- Architectural design and operation at scale
- Monitoring of systems and services, optimization of performance and resource utilization
- Proactively identify sources of instability in distributed systems and analyze how complex systems fail from a reliability and resilience perspective
- Common operating system level tasks such as logging and backup / restore
- Cookbook / runbook implementation for common maintenance actions
- Participate in 24/7 on-call rotation and escalations for resolving production issues
- Lead incident response and post-incident reviews, contributing to failure analysis and implementing preventive measures
- Automation and streamlining of tasks as well as identifying process gaps
- Collaborating with a global and asynchronously communicating team (don’t worry if you have never worked remotely, we’ll help you get used to it)
- Mentoring peers in your areas of technical and operational strength
- Expected to travel domestically or potentially internationally 2-3 times in a year for team gatherings and conferences

Skills and Experience:
- Candidates should be based within UTC -8 to UTC -3 time zones to ensure good collaboration overlap with the team
- 5+ years of experience in an SRE/Operations/DevOps role
- Experience with operating highly available infrastructure
- Experience with running applications and services at scale
- Experience implementing containerization solutions (Docker, Kubernetes)
- Proficient with shell and a programming language used in an SRE/Operations engineering context (Python, Go, Ruby, etc.)
- Comfortable with Open Source configuration management and orchestration tools (Puppet, Ansible, Terraform, etc.)
- Communicative technical English

Additionally, we’d love it if you have:
- Experience with package management for operating systems (Debian, etc)
- We are avid supporters (and users) of open source software; history of contributing to Open Source projects is valued
- Familiarity with RFC 2549
- Prior participation in the Wikimedia movem