Job Title: ML Infrastructure Engineer
Company Name: Echo Neurotechnologies
Job Details: $180k-$230k/yr, Remote, Full Time
Job Url: https://hiring.cafe/viewjob/q27gr9hb6mlnwvhm
Job Description:
Posted 2mo ago
San Francisco, California, United States
Responsibilities: designing infrastructure, building pipelines, collaborating with teams
Requirements Summary: Senior ML infrastructure engineer with 5+ years in software/data infra, strong Python, PyTorch, distributed training, and collaboration with research teams.
Technical Tools Mentioned: Python, PyTorch, DeepSpeed, Megatron-LM, Ray, CUDA, Kubernetes, Docker, C++, Go

Company Overview
Echo Neurotechnologies is an exciting new startup in the Brain-Computer Interface (BCI) space, driving innovation through advanced hardware engineering and AI solutions. Our mission is to deliver cutting-edge technologies that restore autonomy to people living with disabilities and improve their quality of life.

Team Culture
Join a small, dedicated team of knowledgeable and motivated professionals. Our early-stage environment offers the opportunity to take ownership of broad decisions with significant and long-lasting impact. We emphasize continuous learning and growth, fostering cross-functional collaboration where your contributions are vital to our success.

Job Summary
We are seeking a Senior Machine Learning Infrastructure Engineer to join our team. The person who fills this role will design, build, and scale infrastructure to power massive-scale data, modeling, and analysis platforms, playing a critical role in shaping a high-performance, production-grade ML ecosystem to support rapid experimentation with diverse datasets spanning neural signals, behavior, and more. This person will have significant ownership over the ML R&D platform, working closely with domain experts to architect new cloud infrastructure, data pipelines, and modeling flows.
The work will ultimately enable the development of cutting-edge models for neuroscientific discovery and neural decoding, empowering brain-computer interface technology to improve the lives of patients living with severe neurological conditions.

Key Responsibilities

Create flexible and performant ML infrastructure
- Design and build systems ML cloud infrastructure to enable massive-scale modeling and analytics
- Support diverse model exploration, hyperparameter optimization, pretraining, fine-tuning, and evaluation processes
- Design and optimize scalable distributed training pipelines, with support for features such as model sharding, cross-GPU communication, and real-time training monitoring
- Create, operate, and maintain robust ML platforms and services across the model lifecycle
- Make informed architecture decisions that balance performance, cost, reliability, and scalability

Build diverse and scalable data platforms
- Design, build, and optimize massive-scale databases and data pipelines for scalable, flexible, and reliable data access
- Explore research-driven, tailored data solutions using existing and simulated data, comparing performance and efficiency across solutions for typical data-access patterns
- Create infrastructure and pipelines for ingesting internal and external datasets with varied shapes, formats, and associated metadata
- Design and assess custom data formats for efficient storage and slicing of high-dimensional time-series data
- Enable efficient data movement, preprocessing, and artifact management for data lineage and modeling reproducibility

Meet company standards for delivered solutions
- Establish best practices for reliability, observability, reproducibility, and operational excellence across the ML ecosystem
- Make informed and collaborative decisions with domain experts across the software & ML teams
- Foster visibility and reproducibility within the company by maintaining extensive documentation of design decisions, evaluations of viable alternatives for selected solutions, pipeline assessments, etc.
- Support ML R&D operations while preparing for eventual incorporation into product pipelines

Required Qualifications
- Bachelor's degree in Computer Science, Electrical Engineering, or a related technical discipline
- 5+ years of industry experience in software engineering, large-scale data infrastructure, or systems ML
- Extensive proficiency in Python
- Familiarity with PyTorch
- Experience designing, building, and maintaining high-throughput data pipelines for large and diverse datasets
- Experience working with distributed-training frameworks (e.g., FSDP, DeepSpeed, Megatron-LM, Ray)
- Experience building or optimizing ML training pipelines for transformers or other large neural-network models
- Demonstrated ability to partner closely with research and modeling teams to productionize workflows
- Excellent communication and collaboration skills to work effectively on cross-functional and interdisciplinary teams
- Experience having technical ownership over at least one successfully implemented collaborative project

Preferred Qualifications
- Advanced degree (MS or PhD) in Computer Science, Electrical Engineering, or a related technical discipline
- Proficiency in C++, Go, CUDA, Rust, and/or Java
- Experience in data engineering and systems ML for time-series data
- Deep understanding of the fundamentals of distributed systems, including scalability, fault tolerance, monitoring, observability, scheduling, performance tuning, and resource management
- Experience with cloud-native environments and orchestration (Kubernetes, Docker, etc.)
- Experience scaling foundation-model training infrastructure or multi-cluster computing environments

What We Offer
- An opportunity to work on exciting, cutting-edge projects to transform patients’ lives in a highly collaborative work environment.
- Competitive compensation, including stock options.
- Comprehensive benefits package.
- 401(k) program with matching contributions.

Equal Opportunity Employer
Echo Neurotechnologies is an Equal Opportunity Employer (EOE).
We celebrate diversity and are committed to creating an inclusive environment for all employees.

Confidentiality
All applications will be treated confidentially. Applicants may be asked to sign an NDA after the initial stages of the interview process.