Staff Research Engineer, Model Efficiency

Role Overview

As a Staff Research Engineer on the Model Efficiency team, you will develop, prototype, and deploy techniques to optimize the speed and efficiency of Large Language Model inference in production. This senior-level role involves exploring breakthroughs across the model execution stack, including architecture optimization, decoding improvements, and software/hardware co-design for GPU acceleration, with a focus on enhancing performance without compromising model quality. You'll work in a fast-paced, high-ambiguity startup environment, contributing directly to pushing the limits of AI inference capabilities.

Perks & Benefits

This role offers a remote-flexible setup with offices in multiple cities and a co-working stipend, though the team is concentrated in EST and PST time zones. Benefits include full health and dental coverage, mental health support, 6 weeks of vacation, and personal enrichment budgets for arts, fitness, and workspace improvement. The culture is open and inclusive, emphasizing collaboration with a team on the cutting edge of AI research, and provides opportunities for mentorship and career growth in a diverse environment.

⚠️ This job was posted over 8 months ago and may no longer be open. We recommend checking the company's site for the latest status.

Full Job Description

Who are we?

Cohere is the leading security-first enterprise AI company. We build cutting-edge foundation AI models and end-to-end products that are designed to solve real-world business problems.

We’re training and deploying frontier models for enterprises that are building AI systems. We believe that our work is instrumental to the widespread adoption of AI, and we are looking for folks who want to be part of that.

We obsess over what we build. Each one of us is responsible for contributing to increasing the capabilities of our models and the value they drive for our customers. Cohere is a team of researchers, engineers, designers, and more, who are all passionate about their craft.

We are a global technology company co-headquartered in Toronto and San Francisco, with key offices in London, New York City, Montreal, Seoul, Germany and Paris. Join us!

Why this role?

Large Language Models (LLMs) continue to push the boundaries of what AI systems can do — but inference is still the bottleneck. The Model Efficiency team is responsible for pushing the limits of LLM inference efficiency across our foundation models. We explore and ship breakthroughs across the model execution stack, including:

model architecture and MoE routing optimization
decoding and inference-time algorithm improvements
software/hardware co-design for GPU acceleration
performance optimization without compromising model quality

Please note: We have offices in Toronto, Montreal, San Francisco, New York, Paris, Seoul and London. We embrace a remote-friendly environment, and as part of this approach, we strategically distribute teams based on interests, expertise, and time zones to promote collaboration and flexibility. The Model Efficiency team is concentrated in the EST and PST time zones, which are our preferred locations.

As a Staff Research Engineer, you will develop, prototype, and deploy techniques that materially improve how fast and efficiently our models run in production.

You may be a good fit for the Model Efficiency team if you:

Have a PhD in Machine Learning or a related field
Understand LLM architecture and how to optimize LLM inference under resource constraints
Have significant experience with one or more techniques that enhance model efficiency
Have strong software engineering skills
Have an appetite for working in a fast-paced, high-ambiguity start-up environment
Have publications at top-tier conferences and venues (ICLR, ACL, NeurIPS)
Have a passion for mentoring others

Full-Time Employees at Cohere enjoy these Perks:

A weekly lunch stipend of $75/£75, or the equivalent in your local currency.
Full health and dental benefits, including a separate budget for mental health.
RRSP matching, 401K, Pension Scheme.
100% Parental Leave top-up for up to 6 months, for either parent.
Annual enrichment benefits:
Arts & culture, fitness/wellness, quality time, and a workspace improvement credit.
Education & learning stipend for conferences, courses, and coaching.

6 weeks of paid vacation (30 working days!)
Budget for traveling to other offices if you are remote, plus an annual company offsite.

How and Where We Work:

Cohere is remote-friendly. We have offices in Toronto, San Francisco, New York City, London, Paris, Montreal, and more coming soon.
For those in the office: a daily lunch program, plenty of snacks, and regular community and social events.
For those not near an office: a co-working benefit so you can work alongside others in your city.
Everyone receives a $500 home office stipend to set up their workspace properly.

If any of the above doesn’t line up exactly with your experience, we still encourage you to apply.

We strive to create an inclusive work environment for all; we welcome applicants from all backgrounds and are committed to providing equal opportunities. Should you require any accommodations during the recruitment process, please submit an Accommodations Request Form, and we will work together to meet your needs.

We may use AI-enabled tools to screen and assess applicants against the criteria for this position. This helps our recruiters identify potentially qualified candidates, but it doesn't limit the applications our recruiters may review or consider.

Cohere

cohere.com

Cohere is an AI company that specializes in natural language processing and understanding. Their primary offering is a suite of language models designed to help businesses and developers integrate advanced AI capabilities into their applications. Typical customers include tech companies, developers, and enterprises looking to leverage AI for tasks such as text generation, sentiment analysis, and more. Cohere fosters a remote-first work culture, allowing employees to collaborate seamlessly from various locations, which enhances flexibility and work-life balance.

Industry

Artificial Intelligence

Fully remote

247 open positions

About this company (remote-wise)

Headquarters: Distributed / remote-first
Team style: Async-ish, remote-first

About the job

Posted on: Nov 7, 2025

Location: Remote

Skills

Machine Learning, LLM Architecture, GPU Acceleration, Model Optimization, Software Engineering, Performance Tuning, Research Publications, Mentoring
