Job Title: Contract Data Engineer
Company Name: Backstroke
Job Details: Remote, Contract (United States)
Job Url: https://hiring.cafe/viewjob/r7tuawrpeka8d02y

Job Description:

Contract Duration: 6 weeks
Hours: Part-time, 15–20 hours/week
Start Date: ASAP
Reports To: Chief Data Scientist, Backstroke.com

About the Role

Backstroke.com is seeking a part-time contract Data Engineer to support critical data engineering work powering our predictive modeling efforts. In this 6-week engagement, you'll bring raw data into production-grade pipelines, improve data reliability and observability, and help maintain a large-scale dataset used for machine learning and embedding-based predictive models. The role is hands-on and execution-focused: you'll work closely with the Chief Data Scientist to accelerate modeling throughput and strengthen the stability and usability of our data foundation.

Key Responsibilities

- Ingest raw data into production data pipelines used for data science modeling (batch and/or near real-time, as needed)
- Build and enhance AWS-based data workflows, following best practices for scalability and security
- Set up alerts and notifications in AWS to monitor pipeline health, failures, latency, and data quality issues (see the first sketch at the end of this posting)
- Create and manage a database layer that stores transformed data, including the embeddings used by our predictive models (see the second sketch at the end of this posting)
- Support management of a large-scale dataset, including movement, cleaning, normalization, and keeping it consistent for modeling use

Required Qualifications

- Strong experience as a Data Engineer supporting machine learning or data science teams
- Deep working knowledge of AWS services such as S3, IAM, Lambda, CloudWatch, SNS, and EventBridge (or similar); Glue, ECS/EKS, and Step Functions are nice to have
- Experience building data pipelines with tools such as Python, SQL, Spark, dbt, Airflow, Dagster, or Prefect
- Experience designing and maintaining databases for ML workflows, including embedding stores and feature-like datasets
- Comfort working with large datasets while ensuring performance, reliability, and correctness
- Ability to work independently, communicate clearly, and deliver quickly as a contractor

Preferred / Nice-to-Have

- Familiarity with vector databases and embedding storage patterns (e.g., pgvector, OpenSearch, Pinecone, FAISS)
- Exposure to MLOps concepts (feature pipelines, training dataset versioning, model monitoring)
- Experience with data quality tooling (e.g., Great Expectations, Monte Carlo, or custom checks)

Deliverables & Outcomes (6-Week Goals)

- Reliable ingestion of raw data into modeling pipelines
- Monitoring and alerting for critical pipeline workflows in AWS
- An operational database/storage system for embedding-ready transformed data
- Improved processes for handling and cleaning a large dataset used in predictive models
- Clear documentation of pipeline architecture and handoff notes for the internal team

Working Style

You'll collaborate directly with the Chief Data Scientist and contribute to a fast-moving, data-driven team.
We value pragmatic engineering, clear documentation, and systems that are reliable and easy to operate.
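
To give a concrete flavor of the alerting responsibility above, here is a minimal sketch of a CloudWatch alarm wired to an SNS topic using boto3. The topic name, Lambda function name, and email address are hypothetical placeholders, not actual Backstroke infrastructure; the real pipelines may use different metrics and channels.

```python
"""Minimal sketch: alert on errors from a pipeline Lambda via SNS."""
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

# Create (or fetch, if it already exists) a topic for pipeline alerts
# and subscribe an on-call address to it. Both names are placeholders.
topic_arn = sns.create_topic(Name="pipeline-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email",
              Endpoint="oncall@example.com")

# Alarm whenever the (hypothetical) ingestion Lambda reports any errors
# in a 5-minute window; missing data is treated as healthy.
cloudwatch.put_metric_alarm(
    AlarmName="ingestion-lambda-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "ingest-raw-data"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[topic_arn],
)
```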
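And for the embedding-store responsibility, a minimal sketch of a pgvector-backed table, assuming a Postgres database where the pgvector extension is available. The DSN, table and column names, and the 1536-dimension vector are illustrative assumptions, not a description of Backstroke's actual schema.

```python
"""Minimal sketch: a pgvector table for storing model embeddings."""
import psycopg2

conn = psycopg2.connect("dbname=features user=etl")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Requires the pgvector extension to be installed on the server.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS item_embeddings (
            item_id     TEXT PRIMARY KEY,
            embedding   vector(1536) NOT NULL,   -- pgvector column
            model       TEXT NOT NULL,           -- which model produced it
            updated_at  TIMESTAMPTZ DEFAULT now()
        );
    """)
    # An IVFFlat index makes approximate nearest-neighbour lookups
    # (cosine distance here) fast enough for model-serving queries.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS item_embeddings_ann
        ON item_embeddings
        USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
    """)
```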