Machine Learning Engineer — Multilingual Data

Role Overview

As a Machine Learning Engineer at Featherless AI, you will design and maintain multilingual datasets, develop data pipelines, and implement quality filters to enhance model performance across diverse languages and contexts. This mid-level role requires collaboration with researchers and engineers, focusing on ensuring data quality and model generalization beyond English-speaking markets.

Perks & Benefits

This remote position offers real ownership over a core part of the product and the opportunity to work with a small, highly skilled team. Employees can expect competitive compensation, meaningful equity, and the chance to work on models used worldwide, not just in English-speaking markets. The culture promotes collaboration and continuous improvement in a supportive environment.

Full Job Description

We’re looking for a Machine Learning Engineer to own and scale our multilingual data pipeline—from sourcing and curation to evaluation and continuous improvement. You’ll work closely with researchers and infra engineers to ensure our models perform robustly across languages, scripts, and cultural contexts.

This role sits at the intersection of data, research, and production ML and is ideal for someone who cares deeply about data quality, linguistic diversity, and model generalization beyond English.

What You’ll Do

Design, build, and maintain large-scale multilingual datasets across high- and low-resource languages
Develop data pipelines for collection, cleaning, normalization, deduplication, and labeling
Implement quality filters using statistical, heuristic, and model-based methods
Work with researchers to define language coverage, benchmarks, and evaluation metrics
Analyze dataset bias, coverage gaps, and failure modes across regions and scripts
Support training, fine-tuning, and distillation workflows with high-quality multilingual data
Continuously iterate on datasets based on model performance and real-world usage

What We’re Looking For

3+ years of experience as an ML Engineer, Applied Scientist, or similar role
Strong experience working with multilingual or non-English datasets
Solid understanding of NLP fundamentals (tokenization, embeddings, language modeling)
Experience building scalable data pipelines (Python, Spark, Ray, or similar)
Familiarity with Unicode, scripts, tokenization challenges, and language-specific quirks
Comfort collaborating with researchers and translating research needs into production systems

Nice to Have

Experience with low-resource languages or multilingual benchmarks (e.g. FLORES, XTREME)
Exposure to LLM training, fine-tuning, or distillation
Linguistics background or experience working with native language experts
Contributions to open-source datasets or ML tooling
Experience with data quality evaluation at scale

Why Join

Real ownership over a core differentiator of the product
Work on models used globally, not just in English-speaking markets
Small, high-caliber team with deep ML and systems experience
Competitive compensation + meaningful equity at Series A stage

Similar jobs

Found 6 similar jobs

Founding Business Development Rep (AI Cloud US/CA)

Featherless AI • Remote

Content Marketer

Featherless AI • Remote

Founding Account Executive (AI Cloud)

Featherless AI • Remote

Business Development Rep (AI Cloud)

Featherless AI • Remote

AI Researcher — Training Optimization

Featherless AI • Remote

AI Researcher – Multilingual Data

Featherless AI • Remote

Featherless AI

featherless.ai

Featherless AI specializes in developing lightweight and efficient artificial intelligence solutions tailored for resource-constrained environments. Its typical customers include tech startups, IoT device manufacturers, and enterprises seeking to integrate AI into mobile and edge computing applications. The company's main product is a suite of optimized AI models and tools that reduce computational overhead while maintaining high performance. As a fully remote organization, Featherless AI fosters a distributed work culture that emphasizes asynchronous communication and flexible scheduling to support a global team.

Industry

Artificial Intelligence

Fully remote

23 open positions

About this company (remote-wise)

Headquarters: Distributed / remote-first
Team style: Async-ish, remote-first

About the job

Posted on: Jan 22, 2026

Location: Remote

Skills

Machine Learning, Multilingual Data, Data Pipelines, Natural Language Processing, Python, Spark, Ray, Unicode, Dataset Analysis, Collaboration
