Company Name: Sully.ai
Job Details: Hiring Remotely in US | Remote | 200K-230K Annually | Senior level
Job Url: https://builtin.com/job/senior-ai-systems-engineer-llm-inference-infra-optimization/7080683

Job Description:

About Us
At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure, optimizing inference pipelines to support next-generation multimodal AI agents. We're looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.

The Role
We’re looking for a senior-level engineer to lead efforts in deploying and optimizing large language models on high-end GPU hardware and in building the infrastructure that supports them. You'll work across the stack, from C++ and CUDA kernels to Python APIs, while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.

What You’ll Do
- LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving.
- Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to squeeze the most out of GPUs and high-throughput architectures.
- DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform.
  Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.
- Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.
- Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.

What We’re Looking For
- Proficiency in C++, CUDA, and Python, with experience in systems or ML infrastructure engineering.
- Deep understanding of GPU architectures, inference optimization, and large-model serving techniques.
- Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar.
- Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).
- Comfortable with DevOps workflows, containerization (Docker), CI/CD, and distributed-systems debugging.
- (Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.
- (Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.

Why Join Us
- Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.
- Work with bleeding-edge GPU infrastructure and build systems that push what's possible.
- Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.
- Help accelerate a meaningful product that improves how clinicians work and how patients are cared for.

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles.
All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy, or age. Sully.ai prohibits any form of workplace harassment.