Company Name: Doctor Evidence

Job Details: Remote, Full Time

Job Url: https://hiring.cafe/viewjob/7gbqeqp75tubm7cg

Job Description: Posted 1 month ago
Senior AI Engineer @ Doctor Evidence
Santa Monica, California, United States (Remote, Full Time)
Responsibilities: Research prototyping, model development, RAG design
Requirements Summary: Experienced AI/ML engineer with NLP, RAG, and large language model experience; proficient in Python, PyTorch, and ML tooling; familiar with Docker and cloud environments.
Technical Tools Mentioned: Python, PyTorch, Hugging Face Transformers, PEFT, LoRA, QLoRA, LangChain, LangGraph, Docker, Elasticsearch, PGVector
Title: Senior AI Engineer
Reports to: VP, Software Engineering / Chief Architect
Status: Employee, Exempt
Location: Remote (must be located in the US)
Hire Date: Immediate
 
COMPANY SUMMARY 
Dr.Evidence is the preeminent AI-powered insights platform for life sciences, enabling teams to generate rapid and relevant insights grounded in published medical literature, clinical trial data, drug labels, regulatory documents and beyond, delivering value from drug discovery through commercialization. We push the boundaries of healthcare technology and allow for new possibilities in science, enabling more informed decision-making and faster time-to-market for accelerated impact.
 
RESEARCH & DEVELOPMENT (R&D) ORGANIZATION SUMMARY
The Dr.Evidence R&D organization includes software engineers, QA/QC engineers, product managers, scrum masters, and business analysts. R&D is responsible for all aspects of product development, from ideation to maintenance. Our software engineers are experienced, talented, and precise, and they move quickly to meet our customers' speed-to-market goals.
 
Responsibilities

Research & Prototyping

Rapidly prototype state-of-the-art ML/NLP/LLM architectures using Python and frameworks such as PyTorch, Hugging Face, and LangChain/LangGraph.
Assess open-source and commercial models (e.g., Llama 3, Mistral, GPT-series, Qwen, and other modern LLM architectures) for accuracy, latency, cost, hallucination risk, and compliance.
Run structured model evaluations (accuracy, relevance, precision/recall, hallucination checks).


Model Development & Training

Build scalable training pipelines for supervised and self-supervised learning paradigms.
Implement parameter-efficient fine-tuning (LoRA/QLoRA), instruction tuning, and quantization as needed.
Develop agentic workflows (tool-calling, iterative reasoning, ranking, and multi-step pipelines) for real-world use cases.
Improve and maintain existing ML pipelines.
Optimize inference performance with batching, quantization, and efficient serving frameworks.

 
Retrieval-Augmented Generation (RAG) & Search

Design and optimize RAG systems that integrate with internal data sources.
Build hybrid retrieval combining Elasticsearch (BM25) with vector search (PGVector or other vector databases).
Tune chunking, embeddings, reranking, and hallucination-prevention strategies for long documents.

 
Data Engineering for AI

Prepare and refine text datasets for training, evaluation, and fine-tuning.
Generate synthetic training data using LLMs to improve extraction accuracy, classification, and reasoning.
Build small, targeted datasets for fine-tuning domain-specific models.


LLM Specialization

Fine-tune and align open-source LLMs for domain-specific tasks (RAG, agents, tool-calling, reasoning).
Implement safety layers (prompt guards, output filters) and conduct adversarial robustness testing.


Productionization & MLOps

Containerize models with Docker; orchestrate with Docker Swarm.
Build monitoring for model quality, latency, hallucinations, and regressions.
Collaborate with DevOps and Architecture teams on cost, performance, and scalability decisions.
Implement observability around prompts, retrieval, and model outputs.


Cross-Functional Impact

Collaborate with Product to define AI roadmaps and success metrics.
Partner with Product and Engineering to design new AI capabilities across modules.

 
Qualifications

Education: MS/PhD in CS, ML, Data Science, or a related field; or a BS plus 5+ years of intensive AI industry experience.

Core Technical Skills:

Languages: Python (expert); familiarity with modern web stacks (e.g., PERN) is a plus.
Experience with PyTorch, Hugging Face Transformers, and common LLM fine-tuning tools and techniques (PEFT, LoRA, QLoRA).
Strong understanding of NLP: tokenization, embeddings, chunking, NER, classification, summarization.
Experience with RAG pipelines, vector databases, and retrieval/ranking strategies.
Experience designing and running model evaluation pipelines.
Comfortable with Docker-based deployments and cloud environments.
Knowledge of Elasticsearch.

 
Proven Track Record:

Shipped at least 3 production LLM-powered features (e.g., chatbots, summarizers, agents).
Experience working with modern large-scale models.
Experience designing clean, maintainable architectures for LLM features.
Strong documentation and communication skills.