Job Title: Founding AI Research Engineer

Company Name: Deep Reasoning Labs

Job Url: https://www.simplyhired.com/job/9jYt-icHuPfDqQTghmAabjNxWJkEG7Bart7NZG50KWIUFNF2xVYHHA

Job Description: Founding AI Research Engineer
Deep Reasoning Labs
Remote

Job Details
Full-time
$90,000 - $200,000 a year
Qualifications
Reinforcement learning
Automation
PyTorch
Calibration
Quantization
Analysis skills
Experimental design
Developing data pipelines
Scalability
Model training
Senior level
AI
Research & development
Machine learning frameworks
Artificial intelligence research
Python
Debugging
Full Job Description
Company

Deep Reasoning Labs is building a deep reasoning layer for LLMs focused on long-horizon coding: 10–30-minute solve loops with branch exploration, parallel execution, and search strategies grounded in real tool/runtime feedback and deep code-analysis signals. The goal is a system whose success on complex codebase tasks reliably improves as it is given more test-time compute.

Role

You will own core R&D and implementation of test-time scaling for coding: search/branching policies, verifiers/PRMs, supervised fine-tuning (SFT), RLVR-style training signals, and rigorous evaluation. A major emphasis is integrating program analysis and code transformation signals (static and dynamic analysis) into the reasoning loop. This is a hands-on role: you will ship systems that run continuously, generate data, and measurably improve.

What you’ll work on

Test-time scaling + search for code generation and debugging:

Best-first / beam variants, MCTS-like approaches, branch pruning, reranking, caching
Tool- and execution-grounded loops (compile/test/run/linters/typecheck)
Branch management policies: when to explore vs exploit, how to allocate compute, how to stop early
PRM / verifier systems:

Step-level scoring and trajectory evaluation
Training data generation, label quality control, calibration, and failure analysis
Building “decision policies” that use verifier signals to steer branching/search or prune early
Post-training strategy:

Parameter-efficient fine-tuning (e.g., QLoRA) with SFT and related methods
RL-style post-training driven by verifiable rewards (e.g., tests/compilation/static checks), using methods such as GRPO-style updates where appropriate
Combining verifiable rewards with preference signals where useful
Program analysis + transformations:

Using signals such as AST/IR structure, type system feedback, build graphs, lint/static checks, and deeper dataflow/CPG-style analysis
Applying automated code transformations and repair actions that are validated by compilation/tests
Evaluation + ablations:

Build suites for long-horizon coding tasks and measure success-vs-compute curves
Design experiments to isolate what truly drives gains
What success looks like (first ~90 days)

End-to-end “deepthink for code” loop running on a real task suite with branching + execution feedback
A clear win over a strong baseline, established with controlled ablations
A verifier/PRM stack that improves selection and branch allocation under compute constraints
Requirements (must-have)

Strong implementation ability in Python + PyTorch, with good engineering hygiene
Demonstrated experience building inference-time search/reranking/verifiers for LLMs or sequence models
Comfort with data pipelines, eval harnesses, and running systematic experiments
Clear thinking about failure modes, measurement, and falsifiable hypotheses
Nice-to-have

Experience with any of: PRMs, reward models, preference learning, RLHF/RLAIF, RL for sequence generation
Strong background in compilers/tooling, program synthesis, automated debugging, or code agents
Familiarity with code analysis signals (AST, type systems, static checks, CPG/dataflow concepts)
Tech stack

Python, PyTorch, vLLM/SGLang or equivalent serving, distributed training/inference tooling (managed by the platform engineer), internal eval infrastructure.

Location / work model

Remote-first (US/Canada). Strong preference for overlap with Pacific Time. Periodic in-person sprints in SF are a plus.

Compensation

Market-competitive base (location-based) + meaningful founding-level equity.