Job URL: https://www.remoterocketship.com/company/cornelisnetworks/jobs/senior-linux-infrastructure-manager-united-states

Job Description:

Cornelis Networks

Cornelis Networks is a leading provider of intelligent, scalable, high-performance interconnects designed for AI applications. The company specializes in delivering end-to-end, purpose-built, high-performance fabrics to commercial, scientific, academic, and government organizations. Cornelis Networks’ solutions aim to improve performance, scalability, and efficiency across hyperscale, cloud AI, and on-premises AI/HPC environments. Their offerings are known for scalable architecture, high-bandwidth solutions, and universal compatibility with accelerators and GPUs. Originally an Intel spin-off, the company is positioned to challenge existing technologies such as InfiniBand and Ethernet, providing advanced interconnects that power modern AI infrastructure.

51-200 employees 🤖 Artificial Intelligence 🔧 Hardware 🏒 Enterprise 💰 $29M Series B on 2022-11

Senior Linux Infrastructure Manager
Posted yesterday 🇺🇸 United States – Remote ⏰ Full Time 🟠 Senior 🧑‍💻 Full-stack Engineer 🦅 H1B Visa Sponsor
Ansible Cloud Docker Linux NFS Python

📋 Description
• Design, implement, and manage a Linux-based HPC environment with 200+ compute nodes
• Oversee the administration of batch compute systems, such as SLURM or LSF, for optimal workload management
• Manage and optimize NFS systems and storage infrastructure to support engineering workflows
• Oversee observability systems (monitoring, logging, alerting) and drive continuous improvement in automation and root-cause analysis
• Drive adoption of "Infrastructure as Code" and automated workflows to reduce manual intervention
• Implement and enforce best practices for system availability, performance tuning, capacity planning, and lifecycle management
• Ensure high availability and performance of critical infrastructure services, including VNC, NFS, license servers, and GitHub
• Collaborate with engineering teams to understand compute requirements and optimize infrastructure accordingly
• Lead capacity planning and infrastructure expansion initiatives
• Manage the resources responsible for on-premises hardware installation, maintenance, and monitoring
• Drive adoption of AI within the infrastructure team and its workflows

🎯 Requirements
• Bachelor's degree in Computer Science, Engineering, or a related field (Master's preferred)
• Minimum of 10 years of experience in Linux systems administration with a focus on HPC environments
• Deep expertise with HPC workload managers (SLURM or LSF)
• Strong knowledge of NFS and distributed storage systems
• Experience implementing and managing monitoring solutions for large-scale computing environments
• Proficiency with infrastructure automation tools and scripting languages (Python, Bash, etc.)
• Strong troubleshooting, problem-solving, and leadership skills
• Hands-on technical expertise to drive root-cause analysis and remediation of issues

🏖️ Benefits
• Competitive compensation package that includes equity, cash, and incentives
• Health and retirement benefits
• Medical, dental, and vision coverage
• Disability and life insurance
• Dependent care flexible spending account
• Accidental injury insurance
• Pet insurance
• Generous paid holidays
• 401(k) with company match
• Open Time Off (OTO) for regular full-time exempt employees
• Sick time, bonding leave, and pregnancy disability leave