AI Researcher – Multilingual Data

This listing is synced directly from the company ATS.

Role Overview

This senior-level AI Researcher role involves designing and executing research on multilingual datasets, focusing on data collection, filtering, and quality measurement for low-resource languages. The hire will work closely with engineers and researchers in a fast-moving startup environment to translate research insights into production systems, with a strong emphasis on publishing high-quality papers at top conferences. They will have real ownership over research direction and impact, contributing to the development and scaling of next-generation language models across diverse languages.

Perks & Benefits

The role is remote, likely with flexible hours, though time zone expectations may depend on team collaboration needs. Benefits include competitive compensation, meaningful equity at an early stage, access to large datasets and modern infrastructure for fast iteration, and a culture that values both academic papers and production impact. Career growth comes through real ownership over research direction and opportunities to publish at top venues and contribute to open-source projects.

⚠️ This job was posted over 5 months ago and may no longer be open. We recommend checking the company's site for the latest status.

Full Job Description

About the Role

We’re looking for an AI Researcher focused on multilingual data to help us build and scale next-generation language models across diverse languages and domains. You’ll own research and execution around data sourcing, curation, evaluation, and training strategies for multilingual and low-resource languages, with a strong emphasis on publishing high-quality research and translating it into production systems.

This role is ideal for someone who enjoys working close to the frontier: balancing papers, prototypes, and real-world impact in a fast-moving startup environment.

What You’ll Do

Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement
Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)
Research and improve cross-lingual transfer, alignment, and robustness in large language models
Build and maintain evaluation benchmarks for multilingual performance
Collaborate with engineers and researchers on training pipelines and model architecture decisions
Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open-source when appropriate
Translate research insights into practical improvements in production models

What We’re Looking For

Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling
Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)
Experience working with large-scale text datasets across multiple languages
Solid understanding of:
- Tokenization and vocabulary design for multilingual models
- Data quality metrics, filtering, and dataset bias
- Transfer learning and multilingual representation learning
Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)
Ability to operate independently and ship research at a startup pace

Nice to Have

Experience with low-resource languages or non-Latin scripts
Open-source contributions in NLP or data tooling
Experience training or evaluating large language models
Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)

Why Join Us

Real ownership over research direction and impact
A team that values papers and production
Access to meaningful scale: large datasets, modern infrastructure, and fast iteration
Competitive compensation and meaningful equity at an early stage


Similar jobs

Found 6 similar jobs

Founding Business Development Rep (AI Cloud US/CA)

Featherless AI • Remote

Content Marketer

Featherless AI • Remote

Founding Account Executive (AI Cloud)

Featherless AI • Remote

Business Development Rep (AI Cloud)

Featherless AI • Remote

AI Researcher — Training Optimization

Featherless AI • Remote

AI Researcher — AI Architecture Research

Featherless AI • Remote

Featherless AI

featherless.ai

Featherless AI specializes in developing lightweight and efficient artificial intelligence solutions tailored for resource-constrained environments. Its typical customers include tech startups, IoT device manufacturers, and enterprises seeking to integrate AI into mobile and edge computing applications. The company's main product is a suite of optimized AI models and tools that reduce computational overhead while maintaining high performance. As a fully remote organization, Featherless AI fosters a distributed work culture that emphasizes asynchronous communication and flexible scheduling to support a global team.

Industry

Artificial Intelligence

Fully remote

23 open positions

About this company (remote-wise)

Headquarters: Distributed / remote-first
Team style: Async-ish, remote-first


About the job

Posted on: Jan 23, 2026

Location: Remote

Skills

Python, PyTorch, JAX, NLP, Machine Learning, Multilingual Modeling, Data Curation, Research Publication
