Company Name: CoreWeave
Job Details: $120-176k+, discretionary bonus & equity for eligible roles; Kubernetes, Linux, Go, Bash, Splunk, Prometheus, Grafana, Loki, Datadog; Mid and Senior level; New York, San Francisco Bay Area; office located in Livingston, NJ or New York, NY or Sunnyvale, CA
Job Url: https://app.welcometothejungle.com/jobs/iEXV2J7w?theme=take-another-look
Job Description:

Role

Who you are
- Bring 3+ years of experience in production engineering, SRE, or large-scale infrastructure/platform roles
- Are deeply knowledgeable in Kubernetes administration, container orchestration, and microservices architectures, with a bias for automating every aspect of operations
- Have a proven track record managing high-uptime, customer-facing systems in a fast-moving environment, with experience delivering measurable improvements in reliability and performance
- Possess expertise in monitoring, observability, and incident management using tools like Prometheus, Grafana, Datadog, Splunk, Loki, or VictoriaMetrics
- Demonstrate strong proficiency in infrastructure-focused programming, especially in Go and Bash, and hold a deep understanding of Linux systems
- Excel at troubleshooting complex production issues, from system failures to performance bottlenecks, and approach problems methodically with strong analytical skills
- Communicate clearly across technical and non-technical stakeholders, proactively sharing knowledge and advocating for operational best practices
- Are passionate about building systems that are not just functional, but robust, self-healing, and easy to operate at scale
- Take pride in driving continuous improvement, and helping set high standards for operational excellence and team culture

What the job involves
In this role, you will play a key part in ensuring the availability, reliability, and scalability of one of the industry’s largest Kubernetes environments. As a senior member of the team, you will drive operational excellence, implement robust automation, and help shape
the systems that keep CoreWeave’s cloud running smoothly.
- Build, operate, and scale Kubernetes-based production infrastructure that delivers CoreWeave’s products with high reliability and performance
- Develop automation, tooling, and infrastructure as code in Go and other infrastructure-focused languages to enable zero-touch operations, rapid recovery, and seamless deployments
- Design, implement, and maintain monitoring, alerting, and observability solutions, leveraging the Grafana ecosystem and related tools, to proactively identify and resolve production issues
- Drive incident response efforts, participate in on-call rotations, and lead root cause analysis to prevent recurrence and improve incident handling processes
- Partner with internal and cross-functional teams to ensure platform capabilities meet rigorous operational requirements and customer SLAs
- Engineer for resiliency, implementing best practices for redundancy, fault tolerance, and disaster recovery across complex distributed systems
- Advocate for security, reliability, and performance improvements throughout the stack, continuously seeking opportunities to strengthen operational standards
- Contribute to the development of custom Kubernetes operators and intelligent orchestration frameworks that optimize AI workload performance and resource utilization at scale
- Mentor and support other engineers in production best practices, fostering a culture of high accountability and operational awareness

What Success Looks Like
- You deliver stable, robust, and highly-available systems that consistently meet or exceed uptime and performance targets
- You champion initiatives that drive automation, reduce operational toil, and increase the efficiency of incident response
- Your leadership in root cause analysis, postmortems, and process improvement makes the Kubernetes platform more resilient and scalable
- You actively contribute to a blameless culture of learning, mentoring others in operational best practices and production engineering
principles
- You help CoreWeave maintain industry leadership through flawless execution in supporting demanding, AI-powered workloads at scale

Insights
- 194% employee growth in 12 months
- Glassdoor (4.4)

Company benefits
- Medical, dental, and vision insurance - 100% paid for by CoreWeave
- Company-paid Life Insurance
- Voluntary supplemental life insurance
- Short and long-term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Ability to Participate in Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family-Forming support provided by Carrot
- Paid Parental Leave
- Flexible, full-service childcare support with Kinside
- 401(k) with a generous employer match
- Flexible PTO
- Catered lunch each day in our office and data center locations
- A casual work environment
- A work culture focused on innovative disruption

Funding (last 2 of 5 rounds)
- May 2024: $1.1bn, Series C
- May 2023: $200m, Series B
- Total funding: $1.7bn

Our take
As AI continues to dominate the tech landscape, the demand for faster, more efficient cloud infrastructure is only intensifying. Traditional cloud providers often struggle with the sheer speed and scale required to power advanced AI models, leaving a gap for more specialized players. And that's where CoreWeave comes in, a company purpose-built to handle the heavy computational lift that the AI era demands.

CoreWeave's story is an interesting one too. What began in 2017 as a small crypto mining venture has evolved into one of the leading AI-focused cloud providers in the world. Its data centers are packed with high-end GPUs optimized for machine learning and generative AI, serving major clients like Microsoft and OpenAI.
By designing its infrastructure around the needs of modern AI workloads rather than retrofitting older systems, CoreWeave has carved out a distinct and highly valuable niche in the market.

That focus is paying off in a big way too. Analysts project CoreWeave's revenue to soar from $5.3B in 2025 to the mid-$20Bs by 2028, up from just $16M in 2022. While there is competition from tech giants like Amazon and Google, CoreWeave's speed, specialization, and bold investment strategy put it in a very powerful position to keep riding (and helping shape) the AI boom.

Steph, Company Specialist at Welcome to the Jungle