Job Title: Senior Site Reliability Engineer

Company Name: Lenovo

Job Url: https://www.simplyhired.com/job/DtB9Dm5O679Ol56XV2elB6IHktHsI0Ve8_k-RM8CfZGYhkmjWax-eg

Job Description: 
Senior Site Reliability Engineer
Lenovo
Morrisville, NC

Job Details
Full-time
$190,000 - $230,000 a year
Qualifications
Performance dashboard reports
Dashboard development
Azure
Go
Computer Science
Incident management
Continuous Delivery (CD) implementation
Operational risk management
Software deployment
Engineering
IT system monitoring
System design
Google Cloud Platform
Scalable systems
Java
Automating deployment processes
AWS
C++
Incident response
Bachelor’s degree in engineering
Bachelor's degree
SRE
Scalability
System architecture design
Linux
Prometheus
Grafana
Distributed computing
Senior level
AI
Bachelor's degree in computer science
Python
Analytics
System performance monitoring
10 years
Full Job Description
General Information
Req #: WD00095779
Career Area: Software Engineering
Country/Region: United States of America
State: North Carolina
City: Morrisville
Date: Monday, March 2, 2026
Working Time: Full-time
Additional Locations: United States of America - Illinois - Chicago
Why Work at Lenovo
We are Lenovo. We do what we say. We own what we do. We WOW our customers.

Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high-performance computing, and software-defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).


To find out more, visit www.lenovo.com and read the latest news via our StoryHub.
Description and Requirements
About Our Team

Lenovo is building Quantum, a next‑generation hybrid AI platform that spans Windows, Android, and cloud. As part of this vision, we are expanding the reliability engineering organization that powers Qira, Lenovo’s cross‑device Personal AI.

We are looking for Senior Site Reliability Engineers (SREs) to help us build and evolve the foundational reliability, observability, and operations capabilities that ensure Qira is fast, safe, and dependable for millions of users.
This role may support one of several teams within the SRE organization (e.g., Observability, Operations, or Service Reliability), depending on your strengths and interests.

Qira operates with the speed, ownership, and creative latitude of a startup, yet is supported by the scale, resources, and technical depth of Lenovo. We are building new systems, new tooling, and new operational models from the ground up, and we are doing so with clarity, intention, and high engineering standards.


Location: Open to remote work in the US. The preferred work location is Chicago, IL.


What You Might Work On

As a Senior SRE, you may be responsible for a subset of the following, depending on team placement and skill alignment:

Reliability & Performance Engineering

Improving the availability, scalability, and performance of distributed systems across device, edge, and cloud.

Defining or refining SLIs, SLOs, and error budgets for critical services.

Leading initiatives to remove single points of failure, improve resilience, and reduce operational risk.

Operational Excellence

Participating in on‑call rotations and contributing to incident response, triage, and post-incident reviews.

Developing automation, runbooks, and self‑healing systems to reduce alert noise and MTTR.

Enhancing operational readiness and supporting incident prevention programs.

Observability & Insight

Designing or improving observability systems using OpenTelemetry, Grafana, and modern signal pipelines.

Building dashboards, analytics, and alerting that illuminate system health and AI service behavior.

Ensuring telemetry is reliable, actionable, and tied to real‑world outcomes.

Deployments & Change Safety

Improving reliability of CI/CD workflows, including phased rollouts, canaries, shadow testing, and safe rollback mechanisms.

Contributing to the evolution of deployment tooling for device+edge+cloud hybrid systems.

Systems Design & Collaboration

Influencing architectural decisions by injecting reliability, observability, and operational considerations early in design.

Collaborating with AI/ML engineers, platform engineers, firmware teams, and product partners to deliver robust, dependable user experiences.


Basic Qualifications

10+ years of experience in Site Reliability Engineering, Production Engineering, DevOps, or large‑scale distributed systems operations

Bachelor’s Degree in Computer Science, Engineering, or a related technical discipline

Strong experience running production distributed systems at scale

Proficiency in at least one modern programming language (e.g., Python, Go, Java, C++)

Strong understanding of Linux systems, networking fundamentals, and system performance tuning

Experience with monitoring/observability (metrics, logs, tracing)

Hands‑on experience with cloud environments (Azure, AWS, or GCP)

Experience in incident management, on‑call rotations, and postmortem processes


Preferred Qualifications

Deep experience with Azure cloud services

Experience with OpenTelemetry for end‑to‑end instrumentation

Strong familiarity with Grafana, Prometheus, Loki, Tempo, or similar tools

Experience supporting AI/ML systems, model serving, or data‑intensive workloads

Background with hybrid architectures (device + edge + cloud)

Experience improving deployment reliability and progressive delivery systems

Passion for automation, reliability engineering, and reducing operational friction


What Success Looks Like

Systems become more observable, reliable, and predictable.

Incidents are resolved quickly, and follow‑up improvements prevent recurrence.

Alerting becomes more accurate, actionable, and trusted.

Deployments become safer and more consistent.

Teams move faster because reliability foundations are strong and intuitive.


The budgeted base salary range for this position is $190K - $230K. Individuals may also be considered for bonus and/or commission.


Lenovo’s various benefits can be found on www.lenovobenefits.com.
We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, disability, or any federal, state, or local protected class.