Staff / Principal Research Scientist - USA


Role Overview

This is a senior-level role (Staff/Principal) where you will research, build, optimize, and deploy production ML systems for speech modeling (STT & TTS) at Inworld AI. You'll work on challenging problems like data collection, training infrastructure, RL alignment, and low-latency inference, impacting thousands of developers integrating AI-driven software. The role involves collaborating with a top-tier research and engineering team to push the boundaries of voice AI technology.

Perks & Benefits

The job is remote with relocation assistance offered for in-person collaboration in Mountain View, suggesting a hybrid-friendly culture. It includes a competitive base salary range of $270,000-$400,000 plus bonus, equity, and benefits, with opportunities for career growth in a fast-paced, research-oriented environment. The company values learning and staying updated with ML advancements, fostering a collaborative team culture focused on solving complex AI problems.

⚠️ This job was posted over 31 months ago and may no longer be open. We recommend checking the company's site for the latest status.

Full Job Description

About Inworld

Inworld is a product-oriented research lab of top AI researchers and engineers, developing best-in-class realtime multimodal models and the only realtime orchestration platform optimized for thousands of queries per second.

We’ve raised more than $125M from Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, Founders Fund, Meta and Stanford, among others. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn's Top 10 Startups in the USA.

About the role

Voice is one of the key interfaces through which humans will interact with AI at scale. To make this a reality, we are building the engine for the next generation of AI-driven software. Our primary focus is pushing the boundaries of speech modeling (STT & TTS). We approach this by researching and applying ML ideas that allow us to achieve state-of-the-art results (we recently ranked #1 on Artificial Analysis for Text-to-Speech models).

Working with audio is uniquely complex - arguably more so than text - because the solution space for how a specific phrase can be spoken is effectively infinite. This creates a vast landscape of challenges, from data collection and efficient training infra to creating RL alignment environments and ultra-low latency inference optimizations.

We are seeking Staff and Principal level Research Scientists to solve these challenges. You will be responsible for researching, building, optimizing, and deploying the production ML systems that thousands of developers integrate with their systems. Your work will focus on the difficult research and engineering problems of building the engine for the next generation of AI-driven software.

Qualifications

  • Power user of AI agents for work automation.

  • A PhD in a relevant technical field, or a BA/BS degree with equivalent research and/or engineering experience.

  • 5+ years of combined experience in software development (e.g. with Python or C++) and applied ML engineering.

  • Demonstrated experience applying or researching Machine Learning in one or more of the following domains:

    • Speech or video processing

    • Natural Language Processing (NLP)

    • Action planning

  • Strong foundation in data structures, algorithms, and neural network architectures.

  • Proficiency with ML frameworks such as PyTorch.

A good fit for this role may have

  • A passion for learning and staying up-to-date with the latest advancements in ML/Voice AI research and its applications.

  • Ability to work collaboratively in a fast-paced environment with shifting priorities.

  • Familiarity with pre-training, fine-tuning, RLHF and evaluation of large language and speech models.

  • Knowledge of working with embedded systems and/or running ML on edge devices.

  • Strong background in mathematics and/or physics.

We believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our Mountain View office.

The base salary range for this full-time position is $270,000 - $400,000 + bonus + equity + benefits.
