Job Title: Senior Site Reliability Engineer

Company Name: DemandBridge

Job Url: https://apply.workable.com/valsoft-corp/j/5F35CE24EC/?utm_source=linkedin.com&jr_id=69bd7abbb1060245628265d1

Job Description:
About DemandBridge 
DemandBridge (a Fluent Software Group/Valsoft company) operates mission-critical platforms that support core business and customer-facing systems. Our infrastructure runs on Microsoft Azure and Cloud Foundry, supporting production workloads with high availability, security, and compliance requirements. 

Reliability, automation, and operational excellence are foundational to how we operate. We invest in systems and practices that scale responsibly, reduce risk, and enable engineering teams to ship with confidence. 

 

The Opportunity 
DemandBridge is seeking a Senior Site Reliability Engineer to own the day-to-day reliability, availability, and operational readiness of our cloud platform and Azure-based infrastructure. This role serves as the primary DevOps / SRE owner for platform stability, automation, and compliance-related tooling. 

This is a hands-on, high-autonomy role ideal for someone who enjoys troubleshooting across layers, improving systems through automation and documentation, and thoughtfully adopting modern tooling (including AI-assisted operational tools) to improve incident response, observability, and operational efficiency.

You’ll work closely with a junior teammate and coordinate with external vendors, while remaining the lead for systems reliability and operational excellence. 

 

What You’ll Do 
Platform & Reliability Ownership 

Own and operate a production cloud platform running on Microsoft Azure and Cloud Foundry (or comparable platforms) 
Ensure availability, performance, and reliability across infrastructure and platform components 
Serve as the primary escalation point for platform-level incidents 
 

Incident Response & Operational Excellence 
Lead incident response, root cause analysis, and post-incident remediation 
Use modern monitoring, alerting, and AI-assisted observability tools to improve detection, diagnosis, and resolution of incidents 
Drive continuous improvements to reduce operational risk, after-hours incidents, and manual intervention 
 

Security, Certificates & Secrets 

Own certificate and secrets lifecycle management, including TLS automation and secure secrets handling (e.g., CredHub, Vault) 
Ensure secure and compliant practices around identity, access, and credential management 
Partner with engineering teams to embed security and reliability best practices into platform workflows 
 

Automation & Infrastructure 

Automate common operational tasks using Bash and/or PowerShell 
Support and extend infrastructure-as-code using Terraform and/or Bicep 
Improve platform consistency and repeatability through Git-driven, automation-first workflows 
Leverage AI-assisted tooling to support scripting, troubleshooting, and operational documentation 
 

Compliance & Documentation 

Support PCI and other compliance activities, including technical control implementation, audit support, and remediation tracking 
Maintain clear runbooks, diagrams, and documentation to enable repeatable operations and knowledge transfer 
Partner with internal teams and external auditors to support compliance requirements 
 

Collaboration & Leadership 

Work closely with application engineers, junior SRE/support staff, and vendor partners 
Provide technical guidance and mentorship to junior teammates 
Act as a trusted partner to engineering teams on reliability, performance, and operational readiness 
 

Qualifications & Experience 
Required 

5+ years of experience in SRE, DevOps, or infrastructure engineering roles supporting production environments 
Hands-on experience with Cloud Foundry, Kubernetes, or Docker in production (Cloud Foundry preferred) 
Strong experience with Microsoft Azure, including networking, compute, IAM, and monitoring 
Strong Linux systems administration experience (RHEL preferred); comfort with Windows Server environments 
Proficiency in PowerShell and/or Bash scripting 
Solid understanding of TLS/PKI workflows, including certificate management and rotation 
Proven experience managing incidents end-to-end and performing root cause analysis 
Strong written communication skills and a disciplined approach to documentation 
Experience using modern automation, observability, or AI-enabled operational tools to improve reliability and efficiency 
 

Preferred (Nice to Have) 

Experience with BOSH, CredHub, Vault, or similar infrastructure tooling 
Exposure to PCI or other compliance frameworks and audit cycles 
Familiarity with VPN gateways, DNS management, or email infrastructure (SMTP, SPF/DKIM/DMARC) 
Experience operating in Git-driven, automation-heavy environments 
 

About the Team  
You’ll be joining a small, high-impact team responsible for the infrastructure and reliability of critical systems used across DemandBridge. We operate production workloads on Azure and Cloud Foundry and prioritize stability, security, and automation in everything we do. 

While much of the engineering organization focuses on application development, this role is the operational backbone, working closely with developers, junior staff, and external partners to ensure systems remain reliable, secure, and compliant.

If you enjoy owning real systems, improving operational maturity, and building infrastructure that engineers and customers can rely on, this is a role where your impact will be immediate and meaningful. 
