Job Title: Contract Data Engineer
Company Name: Backstroke
Job Details: Remote, Contract (United States)
Job Url: https://hiring.cafe/viewjob/r7tuawrpeka8d02y

Job Description:

Contract Duration: 6 weeks
Hours: Part-time, 15–20 hours/week
Start Date: ASAP
Reports To: Chief Data Scientist, Backstroke.com

About the Role

Backstroke.com is seeking a part-time contract Data Engineer to support critical data engineering work powering our predictive modeling efforts. In this 6-week engagement, you'll bring raw data into production-grade pipelines, improve data reliability and observability, and help maintain a large-scale dataset used for machine learning and embedding-based predictive models. The role is hands-on and execution-focused: you'll work closely with the Chief Data Scientist to accelerate modeling throughput and strengthen the stability and usability of our data foundation.

Key Responsibilities

- Ingest raw data into production data pipelines used for data science modeling (batch and/or near real-time, as needed)
- Build and enhance AWS-based data workflows, following best practices for scalability and security
- Set up alerts and notifications in AWS to monitor pipeline health, failures, latency, and data quality issues (see the first sketch at the end of this posting)
- Create and manage a database layer that stores transformed data, including the embeddings used by our predictive models (see the second sketch at the end of this posting)
- Support management of a large-scale dataset, including movement, cleaning, normalization, and keeping it consistent for modeling use

Required Qualifications

- Strong experience as a Data Engineer supporting machine learning or data science teams
- Deep working knowledge of AWS services such as S3, IAM, Lambda, CloudWatch, SNS, and EventBridge (or similar); Glue, ECS/EKS, and Step Functions are nice to have
- Experience building data pipelines with tools such as Python, SQL, Spark, dbt, Airflow, Dagster, or Prefect
- Experience designing and maintaining databases for ML workflows, including embedding stores and feature-like datasets
- Comfort working with large datasets while ensuring performance, reliability, and correctness
- Ability to work independently, communicate clearly, and deliver quickly as a contractor

Preferred / Nice-to-Have

- Familiarity with vector databases and embedding storage patterns (e.g., pgvector, OpenSearch, Pinecone, FAISS)
- Exposure to MLOps concepts (feature pipelines, training dataset versioning, model monitoring)
- Experience with data quality tooling (e.g., Great Expectations, Monte Carlo, or custom checks)

Deliverables & Outcomes (6-Week Goals)

- Reliable ingestion of raw data into modeling pipelines
- Monitoring and alerting for critical pipeline workflows in AWS
- An operational database/storage system for embedding-ready transformed data
- Improved processes for handling and cleaning a large dataset used in predictive models
- Clear documentation of pipeline architecture and handoff notes for the internal team

Working Style

You'll collaborate directly with the Chief Data Scientist and contribute to a fast-moving, data-driven team.
We value pragmatic engineering, clear documentation, and systems that are reliable and easy to operate.
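
To give a concrete flavor of the alerting responsibility above, here is a minimal sketch of a CloudWatch alarm wired to an SNS topic using boto3. The topic name, Lambda function name, and email address are hypothetical placeholders, not actual Backstroke infrastructure; the real pipelines may use different metrics and channels.

```python
"""Minimal sketch: alert on errors from a pipeline Lambda via SNS."""
import boto3

sns = boto3.client("sns")
cloudwatch = boto3.client("cloudwatch")

# Create (or fetch, if it already exists) a topic for pipeline alerts
# and subscribe an on-call address to it. Both names are placeholders.
topic_arn = sns.create_topic(Name="pipeline-alerts")["TopicArn"]
sns.subscribe(TopicArn=topic_arn, Protocol="email",
              Endpoint="oncall@example.com")

# Alarm whenever the (hypothetical) ingestion Lambda reports any errors
# in a 5-minute window; missing data is treated as healthy.
cloudwatch.put_metric_alarm(
    AlarmName="ingestion-lambda-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "ingest-raw-data"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=[topic_arn],
)
```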
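And for the embedding-store responsibility, a minimal sketch of a pgvector-backed table, assuming a Postgres database where the pgvector extension is available. The DSN, table and column names, and the 1536-dimension vector are illustrative assumptions, not a description of Backstroke's actual schema.

```python
"""Minimal sketch: a pgvector table for storing model embeddings."""
import psycopg2

conn = psycopg2.connect("dbname=features user=etl")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Requires the pgvector extension to be installed on the server.
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS item_embeddings (
            item_id     TEXT PRIMARY KEY,
            embedding   vector(1536) NOT NULL,   -- pgvector column
            model       TEXT NOT NULL,           -- which model produced it
            updated_at  TIMESTAMPTZ DEFAULT now()
        );
    """)
    # An IVFFlat index makes approximate nearest-neighbour lookups
    # (cosine distance here) fast enough for model-serving queries.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS item_embeddings_ann
        ON item_embeddings
        USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
    """)
```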