Job URL: https://wellfound.com/jobs/3673243-founding-ai-engineer-contract-project-based

Founding AI Engineer (Contract / Project-Based)
$60k – $90k • 2.0% – 5.0% equity | Remote (Everywhere) | 3 years of exp | Contract
Reposted: 3 days ago • Recruiter recently active

- Remote work policy: Remote only (hires remotely everywhere)
- Company location: Los Angeles • Kansas City
- Visa sponsorship: Not available
- Preferred timezones: Central Time
- Collaboration hours: 8:00 AM – 12:00 PM Central Time
- Relocation: Allowed
- Skills: Python, Machine Learning, Computer Vision, Artificial Intelligence, Natural Language Processing, Speech Recognition, Deep Learning, Data Processing, PyTorch, Generative AI, LLMs

About the Job

The Mission

We are building the generative voice infrastructure for the Global South. Current models are optimized for clean, formal English in high-resource environments. We are solving for the inverse: low-resource languages, high-noise environments, and heavy code-switching.

We are looking for a Systems Mechanic: a single, highly capable engineer who can own the technical spine of a generative audio engine. This is not a research role for writing papers. This is an applied engineering role for someone who can take open-source foundations and force them to perform in the real world.

The Engagement

- Structure: 3-month contract with clear deliverables.
- Objective: Deliver a functional, scalable inference engine that meets specific latency and quality benchmarks.
- Future: Successful delivery opens the door to a Founding Engineer role with significant equity.

The Engineering Challenge

You will be responsible for architecting and building the engine from the ground up. You must solve three specific constraints.

The Data Reality. You will not have clean studio data. You must build a pipeline that can ingest "noisy" real-world audio (radio archives, podcasts, street interviews) and autonomously clean, align, and diarize it to create a high-fidelity training set.
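To make the data-pipeline constraint concrete, here is a minimal sketch of one stage of such a pipeline: the curation step that sits after voice-activity detection and speaker diarization, filtering candidate segments by duration and estimated SNR before alignment. The `Segment` class, thresholds, and field names are illustrative assumptions, not part of the posting.

```python
# Hypothetical curation stage of a noisy-audio training pipeline.
# Upstream VAD/diarization is assumed to have produced speaker-labeled
# segments with an SNR estimate; this step keeps only trainable ones.

from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str    # diarization label, e.g. "spk0"
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    snr_db: float   # estimated signal-to-noise ratio in dB


def curate(segments, min_dur=1.0, max_dur=15.0, min_snr_db=10.0):
    """Keep segments long enough to align, short enough to batch,
    and clean enough to train on. Thresholds are illustrative."""
    kept = []
    for seg in segments:
        dur = seg.end_s - seg.start_s
        if min_dur <= dur <= max_dur and seg.snr_db >= min_snr_db:
            kept.append(seg)
    return kept


segments = [
    Segment("spk0", 0.0, 0.4, 22.0),   # dropped: too short to align
    Segment("spk0", 1.0, 6.5, 18.5),   # kept
    Segment("spk1", 7.0, 12.0, 4.0),   # dropped: too noisy
]
print(len(curate(segments)))  # prints 1
```

In a real pipeline each threshold would be tuned per source (radio archives tolerate lower SNR than street interviews), and the surviving segments would feed a forced-alignment step to attach transcripts.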
The Linguistic Complexity. The model must handle code-switching (fluidly mixing two languages in one sentence) and tonal markers without breaking prosody. You must understand how to modify tokenizers to respect these nuances.

The Inference Economics. We are not burning venture capital on infinite compute. You must quantize and optimize the model to run on consumer-grade GPUs with low latency. Efficiency is a constraint, not a nice-to-have.

What You Will Own

- End-to-end pipeline: from raw audio ingestion to served API response.
- Model fine-tuning: adapting foundation models to highly specific, low-resource dialects.
- Inference architecture: building a stateless, containerized inference server that handles concurrent requests with sub-200ms latency.

The DNA We Need

- Systems Thinker: You don't just train models; you build products. You understand how the model sits inside a container, how the API handles backpressure, and how the tokenizer affects the runtime.
- Data Realist: You know that 80% of the work is in the dataset. You are comfortable writing custom scripts to slice, denoise, and filter terabytes of audio.
- First-Principles Optimizer: You understand why a model is slow. You are comfortable with quantization, distillation, and kernel-level optimizations to squeeze performance out of limited hardware.

How to Apply

We do not read generic cover letters. To demonstrate your understanding of the problem space, please answer the following question in your application:

> "We need to fine-tune a generative voice model on a low-resource dialect that heavily mixes English with a tonal local language. The training data comes from noisy radio broadcasts.
> Describe your specific technical workflow to turn this raw audio into a clean, aligned dataset. How would you handle the tokenizer issues caused by the mixed languages?"

(Answer in 3-5 sentences. Focus on the architectural approach, not specific tool names.)
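As a toy illustration of the tokenizer concern raised under "The Linguistic Complexity" (not an answer to the application question): a code-switched sentence can be pre-split into language-consistent spans before subword tokenization, so tone marks are never detached from their base syllables. The script-detection heuristic below is deliberately crude and purely an assumption for the sketch; a production tokenizer would use proper language identification.

```python
# Crude sketch: group a code-switched string into (lang, span) runs,
# where any word carrying a tone mark or non-ASCII letter is treated
# as the "local" language and everything else as English.

import unicodedata


def is_tonal(ch):
    # Combining marks (category Mn, e.g. decomposed tone diacritics)
    # or any non-ASCII letter count as "local" for this toy heuristic.
    return unicodedata.category(ch) == "Mn" or (ch.isalpha() and ord(ch) > 127)


def split_language_spans(text):
    """Return a list of ('eng' | 'local', span) pairs, classified per word."""
    runs = []
    for word in text.split():
        lang = "local" if any(is_tonal(c) for c in word) else "eng"
        if runs and runs[-1][0] == lang:
            runs[-1] = (lang, runs[-1][1] + [word])
        else:
            runs.append((lang, [word]))
    return [(lang, " ".join(words)) for lang, words in runs]


print(split_language_spans("I said bàbá rírò yesterday"))
# [('eng', 'I said'), ('local', 'bàbá rírò'), ('eng', 'yesterday')]
```

Each span could then be routed to the appropriate vocabulary or tagged with a language token, which is one common way to keep subword merges from crossing language boundaries and mangling tone-marked syllables.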