Job Title: Founding AI Research Engineer
Company Name: Deep Reasoning Labs
Job Url: https://www.simplyhired.com/job/9jYt-icHuPfDqQTghmAabjNxWJkEG7Bart7NZG50KWIUFNF2xVYHHA
Job Description:

Remote

Job Details
Full-time
$90,000 - $200,000 a year

Qualifications
Reinforcement learning, Automation, PyTorch, Calibration, Quantization, Analysis skills, Experimental design, Developing data pipelines, Scalability, Model training, Senior level, AI Research & development, Machine learning frameworks, Artificial intelligence research, Python, Debugging

Full Job Description

Company
Deep Reasoning Labs is building a deep reasoning layer for LLMs focused on long-horizon coding: 10–30 minute solve loops with branch exploration, parallel execution, and search strategies grounded in real tool/runtime feedback and deep code analysis signals. The goal is a system whose performance on complex codebase tasks reliably improves as more test-time compute is applied.

Role
You will own core R&D and implementation of test-time scaling for coding: search/branching policies, verifiers/PRMs, SFT fine-tuning, RLVR-style training signals, and rigorous evaluation. A major emphasis is integrating program analysis and code transformation signals (static and dynamic analysis) into the reasoning loop. This is a hands-on role: you will ship systems that run continuously, generate data, and measurably improve.

What you’ll work on
Test-time scaling + search for code generation and debugging:
  - Best-first / beam variants, MCTS-like approaches, branch pruning, reranking, caching
  - Tool- and execution-grounded loops (compile/test/run/linters/typecheck)
  - Branch management policies: when to explore vs. exploit, how to allocate compute, and when to stop early
PRM / verifier systems:
  - Step-level scoring and trajectory evaluation
  - Training data generation, label quality control, calibration, and failure analysis
  - Building “decision policies” that use verifier signals to steer branching/search or prune early
Post-training strategy:
  - Parameter-efficient fine-tuning (e.g., QLoRA) with SFT and related methods
  - RL-style post-training driven by verifiable rewards (e.g., tests, compilation, static checks), using methods such as GRPO-style updates when appropriate, plus preference signals where useful
Program analysis + transformations:
  - Using signals such as AST/IR structure, type-system feedback, build graphs, lint/static checks, and deeper dataflow/CPG-style analysis
  - Applying automated code transformations and repair actions that are validated by compilation/tests
Evaluation + ablations:
  - Building suites for long-horizon coding tasks and measuring compute-vs-success curves
  - Designing experiments to isolate what truly drives gains

What success looks like (first ~90 days)
  - An end-to-end “deepthink for code” loop running on a real task suite with branching + execution feedback
  - A clear win over a strong baseline, demonstrated with controlled ablations
  - A verifier/PRM stack that improves selection and branch allocation under compute constraints

Requirements (must-have)
  - Strong implementation ability in Python + PyTorch, with good engineering hygiene
  - Demonstrated experience building inference-time search, reranking, or verifiers for LLMs or sequence models
  - Comfort with data pipelines, eval harnesses, and running systematic experiments
  - Clear thinking about failure modes, measurement, and falsifiable hypotheses

Nice-to-have
  - Experience with any of: PRMs, reward models, preference learning, RLHF/RLAIF, RL for sequence generation
  - Strong background in compilers/tooling, program synthesis, automated debugging, or code agents
  - Familiarity with code analysis signals (AST, type systems, static checks, CPG/dataflow concepts)

Tech stack
Python, PyTorch, vLLM/SGLang or equivalent serving, distributed training/inference tooling (managed by the platform engineer), and internal eval infrastructure.

Location / work model
Remote-first (US/Canada).
Strong preference for working-hours overlap with Pacific Time. Periodic in-person sprints in SF are a plus.

Compensation
Market-competitive base (location-based) + meaningful founding-level equity.

Pay: $90,000.00 - $200,000.00 per year
Work Location: Remote