Job Title: Senior Data Engineer
Company Name: Codebridge
Job Url: https://boards.greenhouse.io/embed/job_app?token=5052277007&utm_source=jobright&jr_id=69b0a1320b2db6275c04e743

Job Description:

About the Role

We are seeking a talented Senior Data Engineer to own and evolve our data processing pipeline. You'll work across a production-scale medallion architecture that ingests, transforms, and delivers customer data through a multi-stage pipeline serving clients in the utility and financial services industries. This role sits at the center of our data infrastructure, building the pipelines and tooling that power everything from daily data refreshes to ML feature engineering to platform delivery.

Primary Responsibilities

Pipeline Development & Operations (Primary)
- Design, develop, and maintain our core Python ETL framework by writing reusable, well-tested modules that power data transformations across client pipelines.
- Develop and optimize our automated refresh pipeline, orchestrated through AWS Batch, Lambda, Step Functions, and EventBridge.
- Build Python integrations with external systems (SFTP, third-party APIs, client platforms) that are robust, testable, and reusable.
- Identify and eliminate manual bottlenecks in data onboarding and analysis through well-designed automation.
- Build and extend internal web applications (FastAPI, SQLAlchemy, PostgreSQL) that support pipeline orchestration, client configuration, and data platform operations.
- Ensure data integrity and security throughout project lifecycles.

Client Data Support (Secondary)
- Write efficient server-side Python code, leveraging the Pandas and PySpark DataFrame APIs for scalable data transformations and aggregations.
- Optimize Spark jobs for cost and performance at scale.
- Debug complex data quality issues across client pipelines.
- Mentor junior engineers on data transformation patterns, aggregation frameworks, and best practices.
Internal Tooling (Tertiary)
- Contribute to our internal metadata management application (FastAPI backend, React/TypeScript frontend).
- Build API endpoints, write database migrations, and occasionally develop frontend features.
- Maintain the metadata layer that drives pipeline configuration and data governance.

What We're Looking For

Required
- Bachelor's degree in a related field (Data Engineering, Computer Science, Data Science, Math, or Statistics) with 3+ years of experience, or 5+ years of relevant experience.
- Experience designing and maintaining production ETL/ELT pipelines with proper error handling, idempotency, and monitoring.
- Advanced proficiency in Python, with deep experience in Pandas and PySpark (DataFrame API, SQL, performance tuning, distributed joins).
- Strong SQL skills with PostgreSQL, including query optimization, indexing strategies, and schema design.
- Hands-on experience with AWS services, including but not limited to S3, Lambda, Batch, SageMaker, and Step Functions.
- Experience with PyArrow, columnar data formats (Parquet), and data lake patterns.
- Strong problem-solving skills with the ability to work autonomously, make architectural decisions, and manage multiple concurrent projects.
- Excellent communication skills with the ability to drive cross-functional collaboration, proactively engaging stakeholders to align on requirements and solutions.
- Experience using Git for version control and repository management.
- Authorized to work in the United States.