Job Url: https://boards.greenhouse.io/embed/job_app?token=7132056&utm_source=jobright

Job Description: Platform Engineer II (Observability) at Iterable

REMOTE - US

Iterable is the leading AI-powered customer engagement platform that helps leading brands like Redfin, SeatGeek, Priceline, Calm, and Box create dynamic, individualized experiences at scale. Our platform empowers organizations to activate customer data, design seamless cross-channel interactions, and optimize engagement, all with enterprise-grade security and compliance. Today, nearly 1,200 brands across 50+ countries rely on Iterable to drive growth, deepen customer relationships, and deliver joyful customer experiences.

Our success is powered by extraordinary people who bring our core values (Trust, Growth Mindset, Balance, and Humility) to life. We foster a culture of innovation, collaboration, and inclusion, where ideas are valued and individuals are empowered to do their best work. That’s why we’ve been recognized as one of Inc’s Best Workplaces and Fastest Growing Companies, and were recognized on Forbes’ list of America’s Best Startup Employers in 2022. Notably, Iterable has also been listed on Wealthfront’s Career Launching Companies List and has held a top 10 ranking on the Top 25 Companies Where Women Want to Work.

With a global presence (including offices in San Francisco, New York, Denver, London, and Lisbon, plus remote employees worldwide), we are committed to building a diverse and inclusive workplace. We welcome candidates from all backgrounds and encourage you to apply. Learn more about our story and mission on our Culture and About Us pages. Let’s shape the future of customer engagement together!

How you will make an impact:

At Iterable, the Observability team enables engineering teams to measure, diagnose, and improve system health. We own and evolve Iterable’s monitoring, logging, tracing, and metrics platforms, turning raw telemetry into actionable insight.
As a Platform Engineer II – Observability on our tight-knit team, you’ll drive reliability by implementing modern monitoring, automation, and orchestration practices that keep our systems performing at their best.

What you’ll do

Own the full observability stack (Datadog, Prometheus, Grafana, Elasticsearch, Quickwit, OpenTelemetry): design, deploy, and scale it to support petabyte-scale telemetry.
Instrument and automate monitoring, logging, tracing, and metrics to ensure system visibility across 100+ services and multiple Kubernetes clusters.
Ship platform features: contribute code that boosts reliability, performance, and developer experience across Iterable.
Partner with engineering teams to improve instrumentation, refine dashboards/alerts, and embed observability into their SDLC.
Reduce MTTR & cost: design cost-effective telemetry pipelines and create high-signal, low-noise alerting strategies.
Participate in our on-call rotation that prioritizes recovery, postmortems, and continuous