Cloud Infrastructure Engineer
Role Overview
This is a senior-level Cloud Infrastructure Engineer role at Braintrust, focusing on building and maintaining scalable infrastructure using Terraform, Kubernetes, and CI/CD pipelines. The engineer will work directly with customers to support self-hosting, troubleshoot issues, and improve observability across multi-cloud environments like AWS, Azure, and GCP. This high-impact position involves partnering with engineering teams to enhance deployment reliability and support enterprise customers, contributing significantly to both internal and external platform scaling.
Perks & Benefits
The role is fully remote and offers medical, dental, and vision insurance, flexible time off, a competitive salary with equity, and an AI stipend. The listing also mentions daily lunch, snacks, and beverages, though it does not say how these apply to remote employees. The posting does not specify time zone expectations or remote-specific perks such as a home office allowance. The role centers on direct customer interaction and support in a fast-paced AI observability environment.
Full Job Description
About the company
Braintrust is the AI observability platform. By connecting evals and observability in one workflow, Braintrust gives builders the visibility to understand how AI behaves in production and the tools to improve it.
Teams at Notion, Stripe, Zapier, Vercel, and Ramp use Braintrust to compare models, test prompts, and catch regressions — turning production data into better AI with every release.
About the role
We’re looking for a Cloud Infrastructure Engineer to help us build reliable, scalable infrastructure and give developers a world-class platform to ship code with speed and confidence. You’ll lead efforts across Terraform, Kubernetes, CI/CD, observability, and support, and play a key role in how we scale Braintrust both internally and for customers self-hosting our platform.
This is a high-impact role where you’ll contribute across our internal AWS environment and help customers deploy our stack in AWS, Azure, and GCP.
What you’ll do
Build and maintain Terraform modules for both internal infrastructure and customer deployments
Work directly with customers in Slack to support self-hosting and troubleshoot infrastructure issues, and build tools that help them support themselves
Own and improve our CI/CD pipeline: reduce build times, improve failure visibility, and enable safer, faster releases
Centralize and scale observability, including logs, metrics, dashboards, and alerts
Partner with engineering teams to build and evolve a secure, developer-friendly infrastructure platform
Support multi-cloud deployment patterns (AWS primarily, with Azure and GCP support for enterprise customers)
Implement tools and automation to improve deployment, rollback, and infrastructure reliability
Ideal candidate credentials
5+ years of experience in DevOps, SRE, or Infrastructure Engineering roles
Deep experience with Terraform and at least one major cloud provider (AWS strongly preferred)
Strong Kubernetes skills: deploying, debugging, and scaling real workloads
Proficient in scripting or programming (Python, TypeScript, or Go)
Experience supporting production systems and responding to incidents
Comfortable working directly with customers in a support or deployment context
Bonus: experience with multi-cloud environments or self-hosted enterprise software
Benefits include
Medical, dental, and vision insurance
Daily lunch, snacks, and beverages
Flexible time off
Competitive salary and equity
AI stipend
Equal opportunity
Braintrust is an equal opportunity employer. All applicants will be considered for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, veteran status, or disability status.