Job Title: Staff Software Engineer, Training (Bay Area / Paris / Remote)
Company Name: Genesis AI
Job Details: Remote, Full Time
Job Url: https://hiring.cafe/viewjob/16nmzxpm2kx8b36o

Job Description:
Posted 6 months ago. Location: San Carlos, Paris, or Remote. Full Time.

Responsibilities: profiling bottlenecks, designing systems, implementing kernels
Requirements Summary: 8+ years in distributed systems or ML infrastructure; production Python; CUDA/cuDNN/Triton; PyTorch training with data, context, pipeline, and model parallelism; strong system-level tuning.
Technical Tools Mentioned: Python, CUDA, cuDNN, Triton, PyTorch, GPU, CPU

What You’ll Do
Drive down wall-clock time to convergence by profiling and eliminating bottlenecks across the foundation model training stack, from data pipelines to GPU kernels.
Design, build, and optimize distributed training systems (PyTorch) for multi-node GPU clusters, ensuring scalability, robustness, and high utilization.
Implement efficient low-level code (CUDA, cuDNN, Triton, custom kernels) and integrate it seamlessly into high-level training frameworks.
Optimize workloads for hardware efficiency: CPU/GPU compute balance, memory management, data throughput, and networking.
Develop monitoring and debugging tools for large-scale runs, enabling rapid diagnosis of performance regressions and failures.

What You’ll Bring
Deep experience (8+ years) in distributed systems, ML infrastructure, or high-performance computing.
Production-grade expertise in Python.
Low-level performance mastery: CUDA/cuDNN/Triton, CPU–GPU interactions, data movement, and kernel optimization.
Scaling at the frontier: experience with PyTorch and with training jobs that use data, context, pipeline, and model parallelism.
A system-level mindset, with a track record of tuning hardware–software interactions for maximum utilization.