Job Title: Site Reliability Engineer - SRE (L1)
Company Name: Sarvin
Job Url: https://www.simplyhired.com/job/Ausx-WkgEsUEf_NrlSdjUlM4ZKvDeFjMgsFkNT1RFshBv783_dyeSQ
Job Description:

Site Reliability Engineer - SRE (L1)
Sarvin
Remote

Job Details
Full-time | Contract
$35,000 - $48,000 a year

Benefits
Health insurance, Dental insurance, Paid time off, Vision insurance, Flexible schedule

Qualifications
Go, System troubleshooting, Incident management, Automation, Technical documentation, Customer support experience (1–2 years), IT system monitoring, High availability architecture, MongoDB, Scalable systems, Bash, Incident response, Bachelor's degree, PostgreSQL, SRE, Network protocols, Improving database performance, Systems engineering, Scalability, Customer support, Linux, Prometheus, Grafana, Kafka, Root cause analysis, Distributed computing, Senior level, Data warehousing projects, Communication skills, Linux administration, Data warehouse, Python, High availability, System performance monitoring

Full Job Description

Accepting candidates in Brazil ONLY.

Professional Role Overview

We are seeking a Site Reliability Engineer (L1) to ensure the continuous availability and performance of our mission-critical production services. This role is designed for a professional who possesses the technical rigor required to manage complex distributed systems under a 100% on-call mandate within South American time zones. You will be responsible for the stewardship of high-stakes data environments (specifically those involving message queuing, relational and non-relational databases, and enterprise data warehouses), with a primary objective of maintaining strict service-level objectives (SLOs) through proactive monitoring, rapid incident response, and automated intervention.

Key Responsibilities

Production Stewardship: Serve as the first responder for production anomalies, managing the end-to-end incident lifecycle from initial detection to post-incident resolution.
Data Infrastructure Management: Ensure the reliability and scalability of high-throughput data platforms, including message brokers, relational databases (PostgreSQL or similar), non-relational databases (MongoDB or similar), and data warehouse environments.
Operational Excellence: Execute 100% on-call rotations, providing consistent coverage and rapid response to critical system alerts.
Automation & Toil Reduction: Develop and maintain scripts (Python, Go, or Bash) to automate routine operational tasks, enhancing system resilience and reducing manual overhead.
Observability & Telemetry: Configure and optimize monitoring suites (e.g., Prometheus, Grafana, Datadog) to ensure comprehensive visibility into application and system health.

Must Have:

Prior SRE/On-call Experience: A mandatory background in SRE or production support roles, with a demonstrated ability to manage high-pressure on-call rotations and run production services.
Data Systems Proficiency:
  Message Queuing: Experience managing brokers (e.g., Kafka) and topics, and troubleshooting throughput issues.
  Relational & Non-Relational Databases: Proficiency in managing database health, query optimization, and high-availability configurations.
  Data Warehouse: Experience in managing large-scale data warehouse performance and resource allocation.
Systems Engineering: Strong competency in Linux internals and networking protocols.
Regional Alignment: Must be based in and able to operate effectively within South American time zones to facilitate synchronized operations.

Preferred Skills:

Analytical Rigor: The ability to diagnose root causes in complex, interconnected systems rather than applying superficial fixes.
Communication: Exceptional technical documentation skills and the ability to provide concise, professional updates during active incidents.
Dedication: A steadfast commitment to system uptime and a proactive approach to identifying potential points of failure before they impact the user experience.
Education: Bachelor’s degree in Technology, Computing, or a related field

Job Types: Full-time, Contract
Pay: $35,000.00 - $48,000.00 per year
Benefits: Dental insurance, Flexible schedule, Health insurance, Paid time off, Vision insurance

Application Question(s):
Do you have previous on-call experience?
Are you located in South America?

Work Location: Remote