Job Title: Senior Backend Engineer

Company Name: Kula AI

Job Url: https://careers.kula.ai/archetype-ai/27516?jr_id=69a6a33bf707784310afff66

Job Description: About the Job
We’re looking for a highly motivated backend engineer with extensive experience in designing and developing performant, scalable, and resilient inference services.

You’ll work closely with researchers, ML engineers, and product teams to bring cutting-edge AI capabilities into production—at scale, with reliability, and under real-world constraints.

This is an opportunity to own key services across our inference platform, from intelligent request routing to fleet-wide orchestration across diverse AI accelerators, and to contribute to some of the most advanced real-time AI serving systems in production today.

Core Responsibilities
Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms.

Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models.

Continuously optimize inference performance—including batching, caching, and request routing strategies—to maximize compute efficiency under explosive customer growth.

Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability.

Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability.

Own problems end-to-end—from design to deployment—with a strong bias toward quality, automation, and continuous improvement.

Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness.

Contribute to a culture of engineering excellence, mentorship, and team-first collaboration.

Minimum Qualifications
7+ years of professional software engineering experience, with a focus on inference.

Deep understanding of machine learning systems at scale, including load balancing, request routing, or traffic management.

Experience with inference optimization, batching, and caching strategies.

Ability to design APIs and service interfaces for real-time and latency-sensitive use cases.

Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP).

Strong debugging, instrumentation, and observability skills across distributed systems.

Demonstrated ownership of complex technical problems and ability to learn and adapt quickly.

Preferred Qualifications
Proven track record of scaling systems through rapid growth and rebuilding or refactoring for new demands.

Experience building systems that degrade gracefully under load: backpressure, rate limiting, circuit breaking, bulkheading, and queuing.

Strong understanding of failure modes in distributed systems and mitigation techniques.

Proven experience owning high-availability services (e.g., SLOs, incident response, on-call), including capacity planning and load testing.

Proficiency in multiple programming languages (e.g., Rust, C++, Python).

Experience designing internal tools or platforms to support developer productivity and experimentation.

Strong product intuition and the ability to collaborate closely with cross-functional teams, including research and design.

What We Value
Ownership – You take initiative, follow through, and care deeply about quality and outcomes.

Motivation – You’re driven to solve complex problems and continuously raise the bar for yourself and your team.

Excellence – You bring discipline, clarity, and rigor to your craft—and help others do the same.

Collaboration – You work well with others, mentor generously, and contribute to a high-trust, high-performance culture.