Job Title: AI Engineer, Senior Staff
Company Name: Lattice Semiconductor
Job Details: Remote, Full Time
Job Url: https://hiring.cafe/viewjob/zsih7l5dk60phu33
Job Description: Posted 2 weeks ago. Location: United States (Remote, Full Time).
Responsibilities: selecting models, building pipelines, mentoring engineers
Requirements Summary: 8–12 years in software/ML engineering; 3–5+ years training or fine-tuning large models; expertise in LLM fine-tuning, transformers, PyTorch, HuggingFace, DeepSpeed, vector databases; experience building RAG pipelines and model serving; strong Python and MLOps skills.
Technical Tools Mentioned: PyTorch, HuggingFace, DeepSpeed, FAISS, Milvus, Chroma, vLLM, TGI, Triton, Weights & Biases, MLflow, Ray, Airflow

Lattice Overview: There is energy here… energy you can feel crackling at any of our international locations. It’s an energy generated by enthusiasm for our work, for our teams, for our results, and for our customers. Lattice is a worldwide community of engineers, designers, and manufacturing operations specialists, in partnership with world-class sales, marketing, and support teams, developing programmable logic solutions that are changing the industry. Our focus is on R&D, product innovation, and customer service, and to that focus we bring total commitment and a keenly competitive personality. Energy feeds on energy. If you flourish in a fast-paced, results-oriented environment, if you want to achieve individual success within a “team first” organization, and if you believe you can contribute and succeed in a demanding yet collegial atmosphere, then Lattice may well be just what you’re looking for.

Responsibilities & Skills:

Key Responsibilities:
- Select, evaluate, and benchmark large foundation models; lead model distillation/quantization for efficiency.
- Prepare high-quality datasets; perform data curation, filtering, labeling, and quality analysis.
- Tune task-specific cost functions, hyperparameters, and training loops for optimal performance.
- Build domain-aware RAG systems with strong retrieval metrics, evaluation pipelines, and citations.
- Architect and deploy model-serving pipelines using vLLM / TGI / Triton, including batching, caching, and streaming.
- Lead experimentation cycles: dataset → training → evaluation → deployment → monitoring.
- Mentor junior engineers and influence model architecture decisions.

Required Qualifications:
- 8–12 years total experience in software/ML engineering, with 3–5+ years hands-on experience training or fine-tuning LLMs or large models.
- Strong expertise in LLM fine-tuning techniques (LoRA, QLoRA, PEFT, SFT, RLHF).
- Deep understanding of transformers, attention mechanisms, tokenization, embeddings, and model architecture trade-offs.
- Strong experience with PyTorch, the HuggingFace ecosystem, DeepSpeed/FSDP, and vector DBs (FAISS, Milvus, Chroma).
- Experience building and optimizing RAG pipelines, including embedding optimization, chunking strategies, and retrieval evaluation.
- Hands-on experience with model serving using vLLM/TGI/Triton, GPU utilization optimization, and inference-time acceleration.
- Practical experience with quantization (INT4/FP8), sparse/structured pruning, and distillation.
- Strong Python engineering fundamentals and familiarity with MLOps tooling (Weights & Biases, MLflow, Ray, Airflow).
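To make the RAG qualification concrete (chunking strategies, embedding optimization, retrieval evaluation), here is a minimal toy sketch. It is not Lattice's pipeline: the bag-of-words `embed` is a stand-in for a learned embedding model, the linear-scan `retrieve` is a stand-in for an ANN index such as FAISS or Milvus, and all function names and the sample text are made up for illustration.

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into overlapping word-window chunks (a common RAG chunking strategy)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by similarity to the query; FAISS/Milvus replace this linear scan at scale."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = ("FPGA bitstreams configure programmable logic. "
       "Retrieval augmented generation grounds model answers in retrieved context. "
       "Vector databases store embeddings for fast similarity search.")
chunks = chunk_text(doc, chunk_size=8, overlap=2)
top = retrieve("how does retrieval augmented generation work", chunks, k=1)
```

In a production system the linear scan gives way to an approximate-nearest-neighbor index, and retrieval quality is scored over a labeled evaluation set with metrics such as recall@k, which is what "retrieval evaluation" in the qualifications refers to.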
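The INT4 quantization requirement can likewise be sketched in miniature. This is a hedged illustration, not a production recipe: real toolchains (e.g. GPTQ, AWQ, bitsandbytes) use per-group scales and calibration data, whereas this toy shows only the symmetric scale/round/clamp core; the helper names are invented for the example.

```python
def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7].

    scale maps the largest-magnitude weight onto 7; each weight is then
    rounded to the nearest integer step and clamped into range.
    """
    scale = max(abs(w) for w in weights) / 7 or 1.0  # avoid scale == 0 for all-zero weights
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

weights = [0.31, -0.87, 0.02, 1.4, -1.05]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
# The round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

The same scale/round/clamp idea underlies FP8 and per-channel schemes; what changes in practice is the code format, the granularity of the scales, and how the scales are calibrated against representative activations.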