Job URL: https://wellfound.com/jobs/3673243-founding-ai-engineer-contract-project-based

Founding AI Engineer (Contract / Project-Based)
$60k – $90k • 2.0% – 5.0% equity | Remote (Everywhere) | 3 years of exp | Contract
Reposted: 3 days ago • Recruiter recently active

- Remote work policy: Remote only (hires remotely everywhere)
- Company location: Los Angeles • Kansas City
- Visa sponsorship: Not available
- Preferred timezones: Central Time
- Collaboration hours: 8:00 AM – 12:00 PM Central Time
- Relocation: Allowed
- Skills: Python, Machine Learning, Computer Vision, Artificial Intelligence, Natural Language Processing, Speech Recognition, Deep Learning, Data Processing, PyTorch, Generative AI, LLMs

About the Job

The Mission

We are building the generative voice infrastructure for the Global South. Current models are optimized for clean, formal English in high-resource environments. We are solving for the inverse: low-resource languages, high-noise environments, and heavy code-switching.

We are looking for a Systems Mechanic: a single, highly capable engineer who can own the technical spine of a generative audio engine. This is not a research role for writing papers. This is an applied engineering role for someone who can take open-source foundations and force them to perform in the real world.

The Engagement

- Structure: 3-month contract with clear deliverables.
- Objective: Deliver a functional, scalable inference engine that meets specific latency and quality benchmarks.
- Future: Successful delivery opens the door to a Founding Engineer role with significant equity.

The Engineering Challenge

You will be responsible for architecting and building the engine from the ground up. You must solve three specific constraints.

The Data Reality. You will not have clean studio data. You must build a pipeline that can ingest "noisy" real-world audio (radio archives, podcasts, street interviews) and autonomously clean, align, and diarize it to create a high-fidelity training set.
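To make the data-pipeline constraint concrete, here is a minimal sketch of one stage of such a pipeline: the curation step that sits after voice-activity detection and speaker diarization, filtering candidate segments by duration and estimated SNR before alignment. The `Segment` class, thresholds, and field names are illustrative assumptions, not part of the posting.

```python
# Hypothetical curation stage of a noisy-audio training pipeline.
# Upstream VAD/diarization is assumed to have produced speaker-labeled
# segments with an SNR estimate; this step keeps only trainable ones.

from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str    # diarization label, e.g. "spk0"
    start_s: float  # segment start time in seconds
    end_s: float    # segment end time in seconds
    snr_db: float   # estimated signal-to-noise ratio in dB


def curate(segments, min_dur=1.0, max_dur=15.0, min_snr_db=10.0):
    """Keep segments long enough to align, short enough to batch,
    and clean enough to train on. Thresholds are illustrative."""
    kept = []
    for seg in segments:
        dur = seg.end_s - seg.start_s
        if min_dur <= dur <= max_dur and seg.snr_db >= min_snr_db:
            kept.append(seg)
    return kept


segments = [
    Segment("spk0", 0.0, 0.4, 22.0),   # dropped: too short to align
    Segment("spk0", 1.0, 6.5, 18.5),   # kept
    Segment("spk1", 7.0, 12.0, 4.0),   # dropped: too noisy
]
print(len(curate(segments)))  # prints 1
```

In a real pipeline each threshold would be tuned per source (radio archives tolerate lower SNR than street interviews), and the surviving segments would feed a forced-alignment step to attach transcripts.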
The Linguistic Complexity. The model must handle code-switching (fluidly mixing two languages in one sentence) and tonal markers without breaking prosody. You must understand how to modify tokenizers to respect these nuances.

The Inference Economics. We are not burning venture capital on infinite compute. You must quantize and optimize the model to run on consumer-grade GPUs with low latency. Efficiency is a constraint, not a nice-to-have.

What You Will Own

- End-to-end pipeline: from raw audio ingestion to served API response.
- Model fine-tuning: adapting foundation models to highly specific, low-resource dialects.
- Inference architecture: building a stateless, containerized inference server that handles concurrent requests with sub-200ms latency.

The DNA We Need

- Systems Thinker: You don't just train models; you build products. You understand how the model sits inside a container, how the API handles backpressure, and how the tokenizer affects the runtime.
- Data Realist: You know that 80% of the work is in the dataset. You are comfortable writing custom scripts to slice, denoise, and filter terabytes of audio.
- First-Principles Optimizer: You understand why a model is slow. You are comfortable with quantization, distillation, and kernel-level optimizations to squeeze performance out of limited hardware.

How to Apply

We do not read generic cover letters. To demonstrate your understanding of the problem space, please answer the following question in your application:

> "We need to fine-tune a generative voice model on a low-resource dialect that heavily mixes English with a tonal local language. The training data comes from noisy radio broadcasts.
> Describe your specific technical workflow to turn this raw audio into a clean, aligned dataset. How would you handle the tokenizer issues caused by the mixed languages?"

(Answer in 3-5 sentences. Focus on the architectural approach, not specific tool names.)
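As a toy illustration of the tokenizer concern raised under "The Linguistic Complexity" (not an answer to the application question): a code-switched sentence can be pre-split into language-consistent spans before subword tokenization, so tone marks are never detached from their base syllables. The script-detection heuristic below is deliberately crude and purely an assumption for the sketch; a production tokenizer would use proper language identification.

```python
# Crude sketch: group a code-switched string into (lang, span) runs,
# where any word carrying a tone mark or non-ASCII letter is treated
# as the "local" language and everything else as English.

import unicodedata


def is_tonal(ch):
    # Combining marks (category Mn, e.g. decomposed tone diacritics)
    # or any non-ASCII letter count as "local" for this toy heuristic.
    return unicodedata.category(ch) == "Mn" or (ch.isalpha() and ord(ch) > 127)


def split_language_spans(text):
    """Return a list of ('eng' | 'local', span) pairs, classified per word."""
    runs = []
    for word in text.split():
        lang = "local" if any(is_tonal(c) for c in word) else "eng"
        if runs and runs[-1][0] == lang:
            runs[-1] = (lang, runs[-1][1] + [word])
        else:
            runs.append((lang, [word]))
    return [(lang, " ".join(words)) for lang, words in runs]


print(split_language_spans("I said bàbá rírò yesterday"))
# [('eng', 'I said'), ('local', 'bàbá rírò'), ('eng', 'yesterday')]
```

Each span could then be routed to the appropriate vocabulary or tagged with a language token, which is one common way to keep subword merges from crossing language boundaries and mangling tone-marked syllables.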