Staff+ Software Engineer, Observability
Role Overview
This is a senior-level Staff+ Software Engineer role on the Observability team within the Infrastructure organization, focused on building and maintaining monitoring and telemetry systems. Day-to-day responsibilities include developing high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers, and agentic diagnostic tools to handle massive volumes of operational data from GPU, TPU, and Trainium clusters. The hire will directly improve the reliability and operational excellence of Anthropic's research and product systems by enabling engineers to detect and resolve issues quickly.
Perks & Benefits
The role is based in London, UK; the posting does not state whether it is hybrid or on-site. It offers career growth at a quickly growing company at the forefront of AI, with a culture focused on building safe and beneficial AI systems. Benefits likely include competitive compensation and the opportunity to work on cutting-edge observability systems with direct impact on company-wide infrastructure.
Full Job Description
About Anthropic
Anthropic's mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
Anthropic is seeking talented and experienced Software Engineers to join our Observability team within the Infrastructure organization. The Observability team owns the monitoring and telemetry infrastructure that every engineer and researcher at Anthropic depends on, from metrics and logging pipelines to distributed tracing, error analytics, alerting, and the dashboards and query interfaces that make it all actionable. By joining this team, you'll have a direct impact on the reliability and operational excellence of Anthropic's research and product systems.
As Anthropic scales its infrastructure across massive GPU, TPU, and Trainium clusters, the volume and complexity of operational data are growing by orders of magnitude. We're building next-generation observability systems (high-throughput ingest pipelines, cost-efficient columnar storage, unified query layers across signals, and agentic diagnostic tools) to ensure that engineers can detect, diagnose, and resolve issues in minutes rather than hours, even as the systems they operate become exponentially more complex.