Job Title: LLM Operations Engineer
Company Name: Health Business Solutions LLC
Location: Remote
Job URL: https://www.simplyhired.com/job/tN8MtGqGPpQgbsT7L239yREI_wLqO5tRUQlb4ebdMHC_2IMe_xqclw

Full Job Description

We are looking for an LLM Ops Engineer with deep Databricks experience to build, automate, and scale our machine learning delivery pipelines on the Lakehouse. You'll own the model lifecycle end-to-end, from data ingestion and feature engineering through CI/CD, deployment, monitoring, and governance, ensuring our ML systems are reliable, auditable, secure, and cost-efficient. You will partner closely with leadership, data engineers, and subject matter experts to productionize models using Databricks (Delta Lake, Unity Catalog, MLflow, Feature Store, Workflows) and modern DevOps practices across our cloud environments.

Key Responsibilities

Lakehouse & Databricks Platform
- Design and maintain Databricks workspaces, clusters, SQL Warehouses, cluster policies, and workspace governance (RBAC, SCIM, SSO, secret scopes).
- Implement robust data pipelines with Delta Lake (ACID tables, Z-ordering, OPTIMIZE/VACUUM), Delta Live Tables (DAGs, expectations), and Workflows (jobs, task orchestration); see the sketch after this list.
- Set up Unity Catalog for cross-workspace governance: data and model lineage, permissions, catalogs/schemas, data tags, and auditability.
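For illustration, here is a minimal sketch of the kind of Delta Live Tables pipeline this role would own: an Auto Loader ingest step followed by a table with quality expectations. The storage path, table names, and columns (claims_raw, member_id, event_date) are hypothetical placeholders, and `spark` is provided by the DLT runtime; routine table maintenance is shown as commented SQL.

```python
# A minimal sketch, assuming a Delta Live Tables pipeline; paths, table names,
# and columns below are hypothetical placeholders.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw claims ingested via Auto Loader (hypothetical source path).")
def claims_raw():
    return (
        spark.readStream.format("cloudFiles")   # `spark` is supplied by the DLT runtime
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/ingest/claims/")   # hypothetical landing zone
    )

@dlt.expect_or_drop("valid_member_id", "member_id IS NOT NULL")  # drop failing rows
@dlt.expect("recent_event", "event_date >= '2020-01-01'")        # record, but keep, violations
@dlt.table(comment="Validated claims with quality expectations enforced.")
def claims_validated():
    return dlt.read_stream("claims_raw").withColumn("ingested_at", F.current_timestamp())

# Periodic maintenance would typically run as a separate scheduled job, e.g.:
#   spark.sql("OPTIMIZE main.clinical.claims_validated ZORDER BY (member_id)")
#   spark.sql("VACUUM main.clinical.claims_validated RETAIN 168 HOURS")
```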
ML Lifecycle & MLOps
- Operationalize ML models using MLflow (tracking, artifacts, metrics, model registry, approvals, stages: Staging/Production); a registry sketch follows this list.
- Build/maintain Feature Store entities and feature pipelines; enforce reproducibility and feature governance.
- Establish model deployment patterns (batch scoring, streaming, microservices) using Model Serving.
- Create scalable CI/CD for notebooks, repos, and jobs using Azure DevOps, including unit/integration tests, data/feature validation, and registry promotions.
- Implement data quality and ML quality controls (e.g., Great Expectations/Delta expectations, statistical tests, drift detection, canary releases).
- Build robust monitoring and alerting for data freshness, pipeline SLAs, model performance, drift, and operational metrics.
- Design, deploy, and operate LLMOps pipelines for Retrieval-Augmented Generation (RAG), including document ingestion, embedding generation, vector storage, retrieval strategies, prompt/version management, and evaluation, using Databricks (Delta Lake, MLflow, Model Serving) to ensure secure, auditable, and production-grade GenAI systems; a minimal retrieval sketch also follows below.
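To make the MLflow lifecycle concrete, here is a minimal sketch of tracking a run, registering the resulting model, and gating a stage promotion on a metric. The model name, threshold, and toy training data are hypothetical, and this uses the classic workspace registry's stage API; Unity Catalog registries favor aliases instead.

```python
# A minimal sketch, assuming the classic MLflow workspace model registry;
# the model name, metric threshold, and toy data are hypothetical.
import mlflow
import mlflow.sklearn
import numpy as np
from mlflow.tracking import MlflowClient
from sklearn.linear_model import LogisticRegression

MODEL_NAME = "claims_risk_model"  # hypothetical registered-model name
client = MlflowClient()

with mlflow.start_run(run_name="train_candidate") as run:
    # Toy training stand-in; a real pipeline would read from the Feature Store.
    X = np.random.rand(100, 4)
    y = (X[:, 0] > 0.5).astype(int)
    model = LogisticRegression().fit(X, y)
    val_auc = 0.91  # stand-in for a real evaluation step
    mlflow.log_metric("val_auc", val_auc)
    mlflow.sklearn.log_model(model, artifact_path="model")

# Register the run's model, then promote only if the quality gate passes.
mv = mlflow.register_model(f"runs:/{run.info.run_id}/model", MODEL_NAME)
if val_auc >= 0.90:  # hypothetical promotion threshold
    client.transition_model_version_stage(
        name=MODEL_NAME, version=mv.version, stage="Staging"
    )
```

In a CI/CD pipeline, a script like this would run as a release gate, with the promotion step requiring an approval before the Production stage.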
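And for the RAG bullet, a minimal retrieval sketch, assuming the sentence-transformers package and an in-memory corpus; the model choice and documents are placeholders, and a production pipeline would persist embeddings to a Delta table and serve retrieval through a vector index.

```python
# A minimal RAG retrieval sketch; the encoder model and corpus are hypothetical.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

documents = [
    "Claims must be submitted within 90 days of service.",
    "Prior authorization is required for imaging procedures.",
    "Members can appeal denied claims within 180 days.",
]

# Embed and L2-normalize so a dot product equals cosine similarity.
doc_vecs = encoder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

context = retrieve("How long do I have to appeal a denial?")
joined = "\n".join(context)
prompt = f"Answer using only this context:\n{joined}\n\nQuestion: ..."
```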
Infrastructure & Security
- Optimize performance and cost (autoscaling, spot instances, DBR runtimes, caching, storage tiers).
- Enforce compliance and security best practices (PII handling, encryption at rest/in transit, network controls, secret management).

Collaboration & Process
- Partner with data engineers and subject matter experts to standardize templates for experiments, pipelines, model packaging, and deployment.
- Document patterns and build internal tooling (CLI utilities, Python packages) to streamline model release and observability.
- Contribute to incident response, post-mortems, and continuous improvement.

Qualifications

Required
- BS/MS in Computer Science, Engineering, Data Science, or equivalent practical experience.
- 3+ years of MLOps/ML Engineering/Platform Engineering experience with Databricks.
- Hands-on expertise with Databricks: Delta Lake, Unity Catalog, MLflow (Tracking/Registry), Feature Store, Workflows/Jobs, Repos, and Model Serving.
- Strong Python engineering skills (packaging, testing, virtual environments); familiarity with Spark (PySpark) and SQL.
- Experience with CI/CD (GitHub Actions, Azure DevOps, or GitLab), artifact registries, and environment management.
- Solid understanding of data and ML pipeline design (batch/streaming), data quality checks, and ML evaluation/monitoring.

Soft Skills
- Excellent communication and organizational abilities.
- Ability to work independently and as part of cross-functional teams.
- Comfortable operating in a fast-paced, changing environment.
- Strong analytical and problem-solving skills, with the ability to interpret data and drive recommendations.

HBiz Approval & Disclaimer

This job description is intended to describe the general nature and level of work performed by individuals assigned to this position. It is not intended to be an exhaustive list of all duties, responsibilities, or qualifications required. Responsibilities may change based on business needs, client requirements, or operational priorities. HBiz reserves the right to modify this job description at any time, with or without notice.

Employment with HBiz is at-will, meaning either the employee or the company may terminate employment at any time, with or without cause or notice, subject to applicable law.

HBiz is an Equal Opportunity Employer and is committed to providing a workplace free from discrimination and harassment. We celebrate diversity and are committed to creating an inclusive environment for all employees.