Staff / Principal AI Engineer - USA


Role Overview

This is a senior-level role (Staff/Principal) where you will research, build, optimize, and deploy production ML systems for speech modeling (STT & TTS), focusing on challenges like data collection, training infrastructure, RL alignment, and low-latency inference. You will work on the engine for next-generation AI-driven software, impacting thousands of developers who integrate these systems. The role involves solving complex audio processing problems in a fast-paced, collaborative team environment.

Perks & Benefits

The job is listed as remote, but the company emphasizes in-person collaboration in Mountain View and offers relocation assistance. Compensation includes a base salary range of $260,000-$385,000, plus bonus, equity, and benefits. The culture values learning and staying current with ML advancements, with a focus on solving hard problems in a team setting. Time zone expectations are not specified; typical remote tech roles may require flexibility for collaboration.

⚠️ This job was posted over 29 months ago and may no longer be open. We recommend checking the company's site for the latest status.

Full Job Description

About Inworld

At Inworld, we believe that the benefits of AI should extend beyond business workflows to the applications and experiences that we enjoy every day. We began by pushing the frontier of lifelike, interactive characters for games and entertainment, pioneering real-time conversational AI at scale. Today, we apply that expertise to provide the multimodal models, pipelines and tools needed to build and evolve consumer-scale, real-time conversational AI applications across learning, health, social, assistants, games and media.

We’ve raised more than $125M from Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, Founders Fund, Meta and Stanford, among others. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn's Top 10 Startups in the USA.

About the role

Voice is one of the key interfaces through which humans will interact with AI at scale. To make this a reality, we are building the engine for the next generation of AI-driven software. Our primary focus is pushing the boundaries of speech modeling (STT & TTS). We approach this by researching and applying ML ideas that allow us to achieve state-of-the-art results (we recently ranked #1 on Artificial Analysis for Text-to-Speech models).

Working with audio is uniquely complex - arguably more so than text - because the solution space for how a specific phrase can be spoken is effectively infinite. This creates a vast landscape of challenges, from data collection and efficient training infra to creating RL alignment environments and ultra-low latency inference optimizations.

We are seeking Staff and Principal level AI Engineers to solve these challenges. You will be responsible for researching, building, optimizing, and deploying the production ML systems that thousands of developers integrate with their systems. Your work will focus on the difficult research and engineering problems of building the engine for the next generation of AI-driven software.

Qualifications

  • A PhD in a relevant technical field, or a BA/BS degree with equivalent research and/or engineering experience.

  • 5+ years of combined experience in software development (e.g. with Python or C++) and applied ML engineering.

  • Demonstrated experience applying or researching Machine Learning in one or more of the following domains:

    • Speech or video processing

    • Natural Language Processing (NLP)

    • Action planning

  • Strong foundation in data structures, algorithms, and neural network architectures.

  • Proficiency with ML frameworks such as PyTorch.

A good fit for this role may have

  • A passion for learning and staying up-to-date with the latest advancements in ML/Voice AI research and its applications.

  • Ability to work collaboratively in a fast-paced environment with shifting priorities.

  • Familiarity with pre-training, fine-tuning, RLHF and evaluation of large language and speech models.

  • Knowledge of working with embedded systems and/or running ML on edge devices.

  • Strong background in mathematics and/or physics.

We believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our Mountain View office.

The base salary range for this full-time position is $260,000 - $385,000 + bonus + equity + benefits.
