Machine Learning Engineer — AI Architecture Research

Role Overview

This senior-level Machine Learning Engineer role involves designing and prototyping next-generation AI model architectures, with a focus on alternatives to Transformers and long-context systems. You'll conduct architecture-level experiments, collaborate with inference and systems engineers to ensure architectures are deployable, and contribute to research papers and open-source projects, directly influencing the technical direction of a Series-A startup.

Perks & Benefits

The role is fully remote, on a small, high-caliber team with fast feedback loops and direct impact on core model architecture. Benefits include competitive compensation, meaningful equity, and the opportunity to ship research into production, within a research-driven, collaborative culture.

Full Job Description

About the Role

We’re looking for a Machine Learning Engineer focused on AI architecture research to help design, prototype, and validate next-generation model architectures. You’ll work at the intersection of research and production — turning new ideas into scalable, real-world systems.

This role is ideal for someone who enjoys questioning architectural assumptions, experimenting with novel model designs, and pushing beyond standard Transformer-style approaches.

What You’ll Work On

  • Research and develop new neural network architectures (e.g., alternatives to or extensions of Transformers, recurrent/hybrid models, long-context systems)

  • Design and run architecture-level experiments (scaling laws, memory mechanisms, compute trade-offs)

  • Prototype models end-to-end — from research code to training-ready implementations

  • Collaborate with inference and systems engineers to ensure architectures are deployable and efficient

  • Analyze model behavior, failure modes, and inductive biases

  • Read, reproduce, and extend cutting-edge research papers

  • Contribute to internal research notes, benchmarks, and open-source efforts (where applicable)

What We’re Looking For

  • Strong background in machine learning fundamentals and deep learning

  • Hands-on experience implementing model architectures from scratch

  • Solid understanding of:

    • Attention mechanisms, RNNs, state-space models, or hybrid architectures

    • Training dynamics, scaling behavior, and optimization

    • Memory, latency, and compute constraints at the model level

  • Comfortable working in PyTorch or JAX

  • Ability to move fluidly between theory, experimentation, and engineering

  • Clear communicator who can explain architectural trade-offs

Nice to Have

  • Experience with non-Transformer architectures (RNN variants, SSMs, long-context models)

  • Background in research-driven startups or open-source ML projects

  • Experience with large-scale training or custom training loops

  • Publications, preprints, or notable research contributions

  • Familiarity with inference optimization and deployment constraints

Why Join

  • Work on core model architecture, not just fine-tuning

  • Direct influence on the technical direction of a Series-A company

  • Small, high-caliber team with fast feedback loops

  • Opportunity to ship research into production

  • Competitive compensation + meaningful equity
