Senior Software Engineer II, Applied AI and Evaluations

Role Overview

This senior-level role involves owning agent quality end-to-end for Smartsheet's AI-powered SmartAssist platform, focusing on diagnosing failures, designing evaluation systems, and driving improvements across the orchestrator and subagents. The position is deeply technical, requiring work at the intersection of LLM evaluation, prompt engineering, and retrieval-augmented generation, with impact on scaling production-grade agents and enhancing the Agent Development Lifecycle.

Perks & Benefits

The role offers remote work flexibility, with collaboration likely spanning time zones. It provides high autonomy and opportunities for career growth in AI and agent development, within a culture that emphasizes innovation and measurable impact on AI-powered work management solutions.

Full Job Description

For over 20 years, Smartsheet has helped people and teams achieve, well, anything. From seamless work management to smart, scalable solutions, we've always worked with flow. We're building tools that empower teams to automate the manual, uncover insights, and scale smarter. But more than that, we're creating space: space to think big, take action, and unlock the kind of work that truly matters. Because when challenge meets purpose, and passion turns into progress, that's magic at work, and it's what we show up for every day.

Smartsheet is building the next generation of AI-powered work management through SmartAssist, our intelligent agent platform. As we scale from early demos to production-grade agents, quality is the critical frontier, and we are looking for an Agent Quality Engineer to own it.

This is not a QA role. It's a deeply technical, high-autonomy position at the intersection of LLM evaluation, prompt and context engineering, and retrieval-augmented generation. You will diagnose why our agents fail, design the systems that catch regressions, and drive measurable improvements across our orchestrator and subagent fleet.

You will work closely with our Agent Engineering and AI Platform teams, embedded in a team that has already shipped evaluation infrastructure on Databricks/MLflow and is building toward a mature Agent Development Lifecycle (ADLC).

You Will:

  • Own agent quality end-to-end: diagnosis, improvement, and validation across SmartAssist's orchestrator and subagents
  • Identify failure modes across quality dimensions (factual accuracy, completeness, tone, actionability, and latency) and prioritize what to fix
  • Drive quality improvements through prompt engineering,
