Job Title: Staff AI Ops Engineer

Company Name: Calix

Job Details: $136k-$231k/yr, Remote, Full Time

Job Url: https://hiring.cafe/viewjob/g02e9mbvpbv3s0p5

Job Description: Posted 3mo ago. Staff AI Ops Engineer @ Calix. United States or Canada. $136k-$231k/yr. Remote. Full Time.

Responsibilities: Design infrastructure, deploy pipelines, scale resources.

Requirements Summary: 8+ years software engineering; 3+ years DevOps/AIOps or similar ML infrastructure; Python; Terraform; Docker; Kubernetes; GCP; Airflow/Kubeflow; CI/CD; ML/GenAI; Vertex AI; PyTorch; strong problem solving and cross-functional collaboration.

Technical Tools Mentioned: Terraform, Docker, Kubernetes, Airflow, Kubeflow, MLflow, Prometheus, Grafana, Vertex AI, PyTorch

Calix provides the cloud, software platforms, systems and services required for communications service providers to simplify their businesses, excite their subscribers and grow their value.

Calix is where passionate innovators come together with a shared mission: to reimagine broadband experiences and empower communities like never before. As a true pioneer in broadband technology, we ignite transformation by equipping service providers of all sizes with an unrivaled platform, state-of-the-art cloud technologies, and AI-driven solutions that redefine what’s possible. Every tool and breakthrough we offer is designed to simplify operations and unlock extraordinary subscriber experiences through innovation.

Calix is seeking a highly skilled Staff AI Ops Engineer with hands-on experience with GCP to join our cutting-edge AI/ML team. In this role, you will be responsible for building, scaling, and maintaining the infrastructure that powers our machine learning and generative AI applications. You will work closely with data scientists, ML engineers, and software developers to ensure our ML/AI systems are robust, efficient, and production ready.

This is a remote-based position that can be located anywhere in the United States or Canada.

Key Responsibilities:
- Design, implement, and maintain scalable infrastructure for ML and GenAI applications
- Deploy, operate, and troubleshoot production ML/GenAI pipelines/services
- Build and optimize CI/CD pipelines for ML model deployment and serving
- Scale compute resources across CPU/GPU architectures to meet performance requirements
- Implement container orchestration with Kubernetes
- Architect and optimize cloud resources on GCP for ML training and inference
- Set up and maintain runtime frameworks and job management systems (Airflow, Kubeflow, MLflow, etc.)
- Establish monitoring, logging and alerting for systems observability
- Optimize system performance and resource utilization for cost efficiency
- Develop and enforce AIOps best practices across the organization

Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
- 8+ years of overall software engineering experience
- 3+ years of focused experience in DevOps/AIOps or similar ML infrastructure roles
- Proficient in IaC, using Terraform
- Strong experience with containerization and orchestration using Docker and Kubernetes
- Demonstrated expertise in cloud infrastructure management on GCP
- Proficiency with workflow management tools such as Airflow & Kubeflow
- Strong CI/CD expertise with experience implementing automated testing and deployment pipelines
- Experience with scaling distributed compute architectures utilizing various accelerators (CPU/GPU)
- Solid understanding of system performance optimization techniques
- Experience implementing comprehensive observability solutions for complex systems
- Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack)
- Strong proficiency in Python
- Familiarity with ML frameworks such as PyTorch and ML platforms like Vertex AI
- Excellent problem-solving skills and ability to work independently
- Strong communication skills and ability to work effectively in cross-functional teams

The base pay range for this position varies based on the geographic location. More information about the pay range specific to candidate location and other factors will be shared during the recruitment process. Individual pay is determined based on location of residence and multiple factors, including job-related knowledge, skills and experience.

San Francisco Bay Area: 156,400 - 265,700 USD Annual
All Other US Locations: 136,000 - 231,000 USD Annual

As a part of the total compensation package, this role may be eligible for a bonus.