Company Name: Luma AI
Job Details: Be an Early Applicant | 2 Locations | In-Office or Remote | 200K-300K | Mid level
Job URL: https://builtin.com/job/research-scientist-engineer-multimodal-capabilities/6869148

Job Description:

About the Role
The Multimodal Capabilities team at Luma focuses on unlocking advanced capabilities in our foundation models through strategic research into multimodal understanding and generation. This team tackles fundamental research questions around how different modalities can be combined to enable new behaviors and capabilities, working on the open-ended challenges of what makes multimodal AI systems truly powerful and versatile.

Responsibilities
- Collaborate with the Foundation Models team to identify capability gaps and research solutions
- Design datasets, experiments, and methodologies to systematically improve model capabilities across vision, audio, and language
- Develop evaluation frameworks and benchmarking approaches for multimodal AI capabilities
- Create prototypes and demonstrations that showcase new multimodal capabilities

Experience
- Strong programming skills in Python and PyTorch
- Experience with multimodal data processing pipelines and large-scale dataset curation
- Understanding of computer vision, audio processing, and/or natural language processing techniques
- (Preferred) Expertise working with interleaved multimodal data
- (Preferred) Hands-on experience with Vision Language Models, Audio Language Models, or generative video models