Senior AI Infrastructure Engineer

Role Overview

This senior-level role involves designing and developing high-performance AI infrastructure for multimodal data, serving as an expert in AI engineering frameworks like PyTorch or JAX. The engineer will work in a small, autonomous team to enhance developer experience and collaborate with customers and the open-source community, impacting cutting-edge AI applications. Responsibilities include end-to-end project delivery, feature engineering, and infrastructure scaling for production environments.

Perks & Benefits

The position is fully remote, offering flexibility with likely asynchronous collaboration and no strict time zone requirements. You'll join a world-class team of open-source builders, providing opportunities for career growth in cutting-edge AI infrastructure and a culture of autonomy and fast iteration. Benefits include working on groundbreaking projects with external customers and shaping scalable production systems in a high-caliber environment.

⚠️ This job was posted over 5 months ago and may no longer be open. We recommend checking the company's site for the latest status.

Full Job Description

About LanceDB

LanceDB is a developer-friendly, open-source data lake for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application, and powers some of the most groundbreaking applications and challenging requirements today.

About the role

We are seeking an engineer who brings both hands-on expertise in model training, fine-tuning, and feature engineering and a strong background in data/AI/ML infrastructure to join our world-class team, pushing the frontiers of multimodal data infrastructure.

Your responsibilities will include

  • Serve as the resident expert on AI engineering, bringing familiarity with frameworks such as PyTorch or JAX.

  • Champion a superior Developer Experience, maximizing productivity for AI engineers.

  • Drive the end-to-end design and development of a high-performance and large-scale feature engineering infrastructure for leading multimodal AI companies.

  • Collaborate closely with customers, design partners, and the Lance/LanceDB community.

Requirements

  • You like working with a small, high-caliber team with a lot of autonomy and drive, and you can iterate fast.

  • You have 3+ years of experience building and deploying ML/DL models in production environments, or supporting infrastructure for AI researchers and AI engineers performing these tasks, using Python and libraries such as PyTorch or TensorFlow.

  • You have a proven ability to deliver projects end-to-end, from scoping and resourcing to implementation and delivery.

  • You have a working knowledge of cloud platforms (AWS, GCP, Azure) including managed storage (S3, GCS) and compute (EC2, GKE, AKS).

  • You have knowledge of monitoring/logging stacks (Prometheus, Grafana, ELK/EFK) for alerting on data pipeline failures, resource saturation, or model skew.

It would be even better if you are someone who

  • Has a deep understanding of training architecture, from PyTorch or JAX experience to CUDA kernel fusion or TPU programming.

  • Has experience designing or operating a feature store (e.g., Feast, Tecton) or building a custom feature registry.

  • Has an understanding of Docker's layered file system, K8s, Slurm, scheduling algorithms, and orchestration services.

  • Loves Python wizardry or Rust.

  • Has experience working directly with external customers.

  • Has built sophisticated monitoring and observability features.

  • Has hands-on knowledge of Kubernetes, Terraform, Docker, and CI/CD, or experience operating Kubernetes services in production environments.

  • Is familiar with at least one of Apache Spark, Apache Flink, Delta Lake, Ray, Google Dataflow, Kafka, Airflow, Kubeflow, or similar systems.

Why Join Us

You’ll join a world-class team of open-source builders (co-authors of pandas, and contributors to HDFS, Arrow, Iceberg, and HBase) working on cutting-edge AI infrastructure. You’ll collaborate on systems that power next-generation AI workloads while shaping how LanceDB operates and scales production environments.
