Job Title: AI/ML Ops Engineer
Company Name: Foundation EGI
Job Details: Remote, Full Time
Job Url: https://hiring.cafe/viewjob/h9cizsxhir0n8wza
Job Description:
Posted 7mo ago
Location: United States, Remote, Full Time

Responsibilities: Architect ML pipelines, instrument monitoring and logging, automate CI/CD for ML artifacts
Requirements Summary: 5+ years in AI/ML Ops, DevOps, or infrastructure; expert Python and TypeScript; strong Docker/Kubernetes; GCP; CI/CD; LLMs and prompt engineering; excellent communication.
Technical Tools Mentioned: Python, TypeScript, Docker, Kubernetes, Terraform, Google Cloud, GitHub Actions, Prometheus, Grafana, ELK, Vertex AI, gRPC, PostgreSQL, Next.js, React.js

We are an MIT-born, venture-backed Silicon Valley startup building a real-life 'Jarvis': an AI Copilot for design and manufacturing. Our goal is to utilize advanced AI, physics simulation, and computer graphics to reduce costs and improve engineering productivity across all steps of the design and manufacturing process.
Responsibilities:
- Architect, build, and operate end-to-end ML pipelines for training, validation, and deployment on Google Cloud.
- Define, instrument, and maintain logging, monitoring, and alerting for model performance and data drift.
- Automate CI/CD for ML artifacts and infrastructure using GitHub Actions or equivalent.
- Collaborate with cross-functional teams, including frontend engineers, backend engineers, research engineers, and infrastructure engineers.
- Write clean, well-documented, fast, and maintainable code.
- Help ensure our systems have high availability and performance.

What we're looking for:
- BS in Computer Science or a related field.
- 5+ years of experience as an AI/ML Ops, DevOps, or Infrastructure Engineer, or equivalent.
- Expert-level Python and TypeScript skills.
- Experience with Docker, Kubernetes, Terraform, and Google Cloud.
- Deep understanding of large language models (LLMs) and prompt-engineering best practices.
- Experience designing and maintaining CI/CD pipelines to fine-tune or train LLMs.
- Excellent written and verbal communication skills.

Bonus Points:
- Experience in computer graphics or physics-based simulation.
- Background in setting up Prometheus/Grafana, ELK, or similar monitoring stacks.
- Experience with Vertex AI.
- Experience working with custom Domain-Specific Languages.

Our tech stack:
- Google Cloud
- Python, TypeScript
- Protobuf, gRPC
- Next.js, React.js
- GitHub Actions
- Docker, Kubernetes, Spinnaker
- PostgreSQL