AI Researcher — Inference Optimization

Role Overview

As an AI Researcher specializing in inference optimization, you will focus on designing and deploying high-performance inference systems for large-scale machine learning models. This senior-level role involves collaborating with engineering teams, implementing optimizations, and translating research insights into production-ready improvements, ultimately driving measurable gains in latency and cost efficiency.

Perks & Benefits

This fully remote position offers flexibility in work hours, allowing you to collaborate across time zones. Featherless AI values a culture of innovation and encourages career growth through hands-on experience with cutting-edge technologies. You'll have the opportunity to contribute to impactful projects while working with a team of experts in the field.

Full Job Description

Role Overview

We are seeking an AI Researcher with deep experience in inference optimization to design, evaluate, and deploy high-performance inference systems for large-scale machine learning models. You will work at the intersection of model architecture, systems engineering, and hardware-aware optimization, improving latency, throughput, and cost efficiency across real-world production environments.

Key Responsibilities

Research and develop techniques to optimize inference performance for large neural networks.
Improve latency, throughput, memory efficiency, and cost per inference.
Design and evaluate model-level optimizations (quantization, pruning, KV-cache optimization, architecture-aware simplifications).
Implement systems-level optimizations (dynamic batching, kernel fusion, multi-GPU inference, prefill vs. decode optimization).
Benchmark inference workloads across hardware accelerators.
Collaborate with engineering teams to deploy optimized inference pipelines.
Translate research insights into production-ready improvements.

Required Qualifications

Strong background in machine learning, deep learning, or AI systems.
Hands-on experience optimizing inference for large-scale models.
Proficiency in Python and modern ML frameworks (e.g., PyTorch).
Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).
Ability to design experiments and communicate results clearly.

Preferred / Nice-to-Have Qualifications

Experience deploying production inference systems at scale.
Familiarity with distributed and multi-GPU inference.
Experience contributing to open-source ML or inference frameworks.
Authorship or co-authorship of peer-reviewed research papers in machine learning, systems, or related fields.
Experience working close to hardware (CUDA, ROCm, profiling tools).

What Success Looks Like

Measurable gains in latency, throughput, and cost efficiency.
Optimized inference systems running reliably in production.
Research ideas successfully translated into deployable systems.
Clear benchmarks and documentation that inform product decisions.

Relevant Research Areas (Bonus)

Long-context inference optimization
Speculative decoding
KV-cache compression and paging
Efficient decoding strategies
Hardware-aware inference design

Similar jobs

Found 6 similar jobs

Founding Account Executive (AI Cloud)

Featherless AI • Remote

Business Development Rep (AI Cloud)

Featherless AI • Remote

AI Researcher — Training Optimization

Featherless AI • Remote

AI Researcher – Multilingual Data

Featherless AI • Remote

AI Researcher — AI Architecture Research

Featherless AI • Remote

AI Researcher — Distillation

Featherless AI • Remote

Featherless AI

featherless.ai

Featherless AI specializes in developing lightweight and efficient artificial intelligence solutions tailored for resource-constrained environments. Their typical customers include tech startups, IoT device manufacturers, and enterprises seeking to integrate AI into mobile and edge computing applications. The company's main product is a suite of optimized AI models and tools that reduce computational overhead while maintaining high performance. As a fully remote organization, Featherless AI fosters a distributed work culture that emphasizes asynchronous communication and flexible scheduling to support a global team.

Industry

Artificial Intelligence

Fully remote

21 open positions

About this company (remote-wise)

Headquarters: Distributed / remote-first
Team style: Async-ish, remote-first

About the job

Posted on: Jan 23, 2026

Location: Remote

Skills

Machine Learning, Deep Learning, Python, PyTorch, Inference Optimization, CUDA, Triton, TensorRT, Multi-GPU Inference
