Agentic Systems Engineer
Role Overview
As an Agentic Systems Engineer at Joist AI, you will be responsible for building and enhancing modular agent components that integrate into a complex tech stack aimed at streamlining proposal writing for the architecture, engineering, and construction (AEC) industry. This mid-level role requires strong Python skills and a solid understanding of agentic and LLM concepts; your contributions will directly impact the efficiency and functionality of the software used by professionals in the AEC sector.
Perks & Benefits
The role offers a fully remote work setup, encouraging flexibility and work-life balance. Joist AI emphasizes a culture of trust, allowing team members to take ownership from day one. The interview process is streamlined, typically lasting two weeks, which reflects the company's commitment to efficiency. There are opportunities for career growth in a rapidly evolving field, particularly for those who are curious and engaged with the latest developments in AI and agent design.
Full Job Description
About the company
Joist AI is a technology company revolutionizing the way professionals in the architecture, engineering, and construction (AEC) industry manage marketing and revenue operations. Our AI-powered software streamlines workflows, making it easier for teams to collaborate, innovate, and succeed.
About the role
Joist AI is looking for an engineer with 2–4 years of experience to help build the next generation of agentic applications that streamline proposal writing for the AEC industry. These are systems that reason, use tools, remember, and collaborate with users. The stack spans multi-agent orchestration, MCP servers, skills, long-term memory, evals, retrieval, and the plumbing that makes all of it hold up in production. We're looking for someone to help us build it.
What you'll do
Build agents as modular, plug-and-play components that slot cleanly into the wider stack.
Add memory layers (short-term, long-term, summarization, retrieval-backed) into running systems.
Wire up tool integrations, MCP servers, and skills.
Own quality of the features you put out: tests, evals, observability, the works.
Dig into production traces to understand what the system is actually doing, and close the loop with fixes.
Background we're looking for
2–4 years of writing production software.
Strong Python skills. You write good Python and can tell good Python from bad, especially now that a lot of code comes out of an LLM. Separation of concerns, clean OOP, idiomatic syntax, well-structured modules, tests that actually test something.
Solid grounding in core agentic and LLM concepts: RAG, prompting patterns, tool use, structured outputs, streaming, context management, basic generative AI fundamentals.
You've built something non-trivial with the modern agent toolkit, whether that's a side project, a prototype at work, or a hackathon thing that got out of hand.
Able to drop into an unfamiliar codebase and find your way around fast.
A keen eye for detail. You sit with a problem before reaching for a solution. No jumping to the shiny fix because it sounds clever. You understand what's actually broken before you touch anything.
Data-driven by default. Decisions come from production traces, eval numbers, and logs, not vibes. Comfortable slicing through trace data to find the real signal.
Hands-on experience with Langfuse or LangSmith (or equivalent tracing/observability for LLM systems).
Genuine curiosity about the frontier. You read the blog posts, try the frameworks, and have opinions about where agent design is headed.
Experience we'd be particularly excited about
Search and retrieval: embeddings, vector databases, hybrid retrieval, rerankers, and the gap between a retrieval system that demos well and one that survives real data.
LLM evaluations end-to-end: designing evals, choosing what to measure, building the harness, keeping scores honest as models and prompts shift.
LangGraph depth: building custom graphs, understanding checkpointers, working with context-management nodes (summarizers, windowing, state pruning) inside larger agent graphs.
What to expect
We conduct a rigorous interview process based on integrity, talent, and drive. We trust our teammates from day one and move quickly to evaluate your fit for the role. The entire interview process typically takes two weeks. Here's what to expect:
A 30-minute Zoom call to talk about Joist AI and your background, and to answer any questions about the role. (Getting to know each other)
A 45-minute Python and agentic coding proficiency test: two problems, one coded by hand and one using Gen AI.
A 60-minute project deep dive into work you have done: a short presentation followed by Q&A, with the presentation concluding within 20–25 minutes.
A 45-minute interview on Gen AI / LLM fundamentals.
A 30-minute culture-fit conversation.