Staff Network Engineer

Role Overview

This senior-level Staff Network Engineer role involves scaling and automating Lambda's high-performance cloud network. The focus is on the design, implementation, and maintenance of software-defined networks and spine-and-leaf fabrics, with an emphasis on high availability and predictable performance. The engineer will work on large production-scale projects, manage network monitoring tools, and collaborate with a team to support AI infrastructure. The role requires presence in the San Francisco or Bellevue office 4 days per week. The position has a significant impact on building and optimizing the network backbone for AI training and inference, contributing directly to the company's mission of ubiquitous compute access.

Perks & Benefits

The role offers a hybrid setup with required presence in the San Francisco or Bellevue office 4 days per week and a designated work-from-home day on Tuesday. Compensation includes generous cash and equity, health, dental, and vision coverage for you and your dependents, and a 401k plan with a 2% company match. Benefits also include flexible paid time off, wellness and commuter stipends for select roles, and opportunities for career growth in a fast-growing company. Specific time zone expectations are not stated, but they likely align with U.S. Pacific Time given the office locations.

Full Job Description

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU.

If you'd like to build the world's best AI cloud, join us.


*Note: This position requires presence in our San Francisco or Bellevue office location 4 days per week; Lambda’s designated work from home day is currently Tuesday.

What You'll Do

  • Help scale Lambda’s high-performance cloud network

  • Contribute to the reproducible automation of network configuration

  • Contribute to the design and development of software-defined networks

  • Help manage spine-and-leaf networks

  • Ensure high availability of our network through monitoring, failover, and redundancy

  • Ensure VMs/clients have predictable networking performance through the use of QoS and other applicable technologies

  • Help with deploying and maintaining network monitoring and management tools

You

  • Have 15+ years of experience in designing and operating production datacenter networks

  • Have led the implementation of large production-scale networking projects

  • Are an expert in Clos/spine-and-leaf fabrics, EVPN/VXLAN, ECMP, BGP, and fast convergence techniques

  • Have experience with multi-data center, backbone, and hybrid cloud networks

  • Have production experience with at least two switch/router vendors (e.g., Arista, Juniper, Cisco, NVIDIA/Mellanox, Cumulus/SONiC)

  • Have experience with Next-Generation Firewalls (NGFW) (e.g., Fortigate, Juniper)

  • Have experience with load balancers such as F5 and NetScaler

  • Are comfortable on the Linux command line, and have an understanding of the Linux networking stack

  • Have strong automation skills (Python, Ansible) and experience with network APIs

Nice To Have

  • Hands-on experience with HPC/AI networking: RoCEv2 and/or InfiniBand (congestion control, VLs, partitions), GPUDirect RDMA concepts

  • Experience with DWDM technologies and SD-WAN

  • Understanding of data center power/space/cooling trade-offs and their impact on topology choices

  • Experience with observability tools like Datadog, Splunk, Grafana, Prometheus

  • Experience automating network configuration within public clouds, with tools like Terraform

  • Have led implementation of production-scale SDNs in a cloud context (e.g. helped implement the infrastructure that powers an AWS VPC-like feature)

  • Deep understanding of the Linux networking stack and its interaction with network virtualization

  • Experience with the SDN ecosystem (e.g., OVS, Neutron, DPDK, Cisco ACI or Nexus Fabric Controller, Arista CVP)

Salary Range Information

The annual salary range for this position has been set based on market data and other factors. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • Founded in 2012, with 500+ employees, and growing fast

  • Our investors notably include TWG Global, US Innovative Technology Fund (USIT), Andra Capital, SGW, Andrej Karpathy, ARK Invest, Fincadia Advisors, G Squared, In-Q-Tel (IQT), KHK & Partners, NVIDIA, Pegatron, Supermicro, Wistron, Wiwynn, Gradient Ventures, Mercato Partners, SVB, 1517, and Crescent Cove

  • We have research papers accepted at top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG

  • Our values are publicly available: https://lambda.ai/careers

  • We offer generous cash & equity compensation

  • Health, dental, and vision coverage for you and your dependents

  • Wellness and commuter stipends for select roles

  • 401k Plan with 2% company match (USA employees)

  • Flexible paid time off plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
