Job Title: Distributed Systems Engineer / Architect

Company Name: RapidFort

Job Url: https://job-boards.greenhouse.io/embed/job_app?for=rapidfortinc&jr_id=69b8a1d33b74eb1e2c86866c&token=4184534009&utm_source=jobright

Job Description:
Overview
We are looking for a Distributed Systems Engineer / Architect to design and build highly scalable custom systems that process large volumes of data across CPU-, disk-, and network-intensive workloads. This role is deeply hands-on and requires strong systems thinking, algorithm design, and performance optimization skills.

You will work on core infrastructure and algorithms, building systems that maximize resource utilization across distributed environments. The ideal candidate enjoys working close to the metal, writing efficient code and tooling (primarily in Python and Bash) while building the instrumentation needed to continuously measure, analyze, and improve system performance.

This role requires a data-driven mindset and a passion for building reliable, scalable systems from first principles.

Responsibilities
System Architecture
Design and implement scalable distributed systems that handle heavy CPU, disk, and network workloads.

Architect systems for high throughput, reliability, and efficient resource utilization.

Develop distributed algorithms and data processing pipelines.

Performance & Optimization
Analyze system behavior to identify bottlenecks across compute, storage, and network layers.

Optimize workloads for maximum efficiency and minimal resource waste.

Develop strategies for parallelization, batching, and workload scheduling.

Engineering & Implementation
Implement system components and tooling primarily in Python and Bash.

Build custom orchestration, automation, and distributed job execution mechanisms.

Write efficient algorithms and low-level logic to manage large-scale workloads.

Observability & Data-Driven Engineering
Build instrumentation, metrics, and telemetry to measure system performance.

Develop dashboards and analysis workflows to guide optimization decisions.

Use empirical data and experimentation to improve system behavior.

Infrastructure & Reliability
Design systems that operate reliably across distributed environments.

Implement monitoring, debugging, and recovery mechanisms for large-scale systems.

Collaborate with infrastructure and platform teams to ensure smooth deployment and operation.

Requirements
Core Experience
Strong experience building distributed systems or large-scale backend infrastructure

Deep understanding of systems performance (CPU, memory, disk I/O, networking)

Experience optimizing workloads for throughput and efficiency

Programming
Strong Python development skills

Strong Bash / shell scripting

Ability to implement and reason about algorithms and system-level logic