Site Reliability Manager (all genders)
Role Overview
This is a mid-level Site Reliability Manager role where you will monitor and ensure the reliability of an IoT hardware fleet, focusing on observability, automated recovery, and platform scalability. You will spend 80% of your time on reliability engineering and automation projects, working within a cross-functional team to reduce error rates and improve system performance across global operations.
Perks & Benefits
The role offers a hybrid remote work model with flexibility to tailor your schedule, including anchor days in the office from Tuesday to Thursday. Benefits include workation options, mobility subsidies, measurable goals through OKRs, team events, health and fitness memberships, and a choice of equipment. The company emphasizes a collaborative culture with a focus on innovation and personal growth.
Full Job Description
Are you ready to embark on an exciting journey with us? We live #own(y)our growth and foster an environment in which innovation and personal development go hand in hand.
As a Site Reliability Manager (all genders), you are part of an experienced team that supports exciting projects and brings our product onto the parking lots. You play a key role in industrializing our platform, ensuring it scales smoothly and reliably as we grow.
What to expect
Fleet Observability: You monitor system health across our entire hardware/IoT fleet, identifying anomalies and building predictive maintenance models to ensure revenue-critical performance at scale.
Durable Problem Solving: You design automated recovery paths and durable improvements to optimize recognition accuracy, reducing error rates and operational pain across our global fleet.
Observability & Tooling: You are passionate about automating manual intervention and eliminating operational toil. To do this, you build and refine observability tools and dashboards that turn raw fleet telemetry into actionable, real-time insights.
Defining Platform Capabilities: You identify necessary platform features based on recurring operational signals, help define Service Level Objectives (SLOs) for fleet health, and assess their impact on fleet-wide reliability after implementation by our AIoT engineering teams.
Cross-Functional: You spend approximately 20% of your time on operational triage and incident response. The remaining 80% is dedicated to high-impact reliability engineering and automation projects that transform operational pain into platform capability.
Inspire us by
Your Background: You have a degree in Computer Science, Mechatronics, Industrial Engineering, Business Informatics, Electrical Engineering or a comparable field.
Professional Experience: You have 2–4 years of experience in Site Reliability Engineering (SRE), Technical Operations, or System Performance Engineering, preferably within complex IoT or distributed systems environments.
Strengths and Interests: You possess a data-driven mindset and the spatial awareness needed to translate diverse physical edge environments (cameras, scanners, payment terminals) into digital performance insights. You are passionate about root cause analysis and have a relentless focus on maintaining high reliability and accuracy standards across a growing fleet.
Technical Skills: You have advanced knowledge of Python and SQL to automate workflows and extract in-depth system insights. You also have experience with observability and monitoring frameworks (e.g., ELK, Prometheus, or Grafana) and a strong command of data analysis tools like Metabase.
Your Working Style: You have a high degree of ownership and excel at thinking in a solution-oriented way, for example when designing automated recovery paths or performing deep-dive root-cause analysis on fleet-wide anomalies.
Language Skills: You are a strong communicator in English (at least C1) and ideally also speak German.
What we offer
Flexibility: With our hybrid work model, you can tailor your work schedule individually and spend time with your team in the office on our Anchor Days (Tuesday through Thursday).
Workation: Work from inspiring locations during your workation for fresh ideas.
Mobility subsidy: You have the choice between bike leasing or a travel allowance.
Measurable goals: Our OKRs allow you to directly measure your impact on our product and company success.
Events: Celebrate our successes at our legendary team events and OKR parties.
Catering: Fresh coffee from our portafilter machine around the clock for your energy and productivity. Discover the variety of Bella&Bona, our online cafeteria, help yourself to our fruit basket, or enjoy breakfast at the cereal bar.
Health: Stay fit and work out with EGYM Wellpass or Urban Sports Club in over a thousand sports and health facilities throughout Germany.
Equipment: Decide on your own equipment to work efficiently and comfortably.
Dress code: Dress in a way that makes you feel most comfortable.
Innovation through diversity
Regardless of your background, origin, gender identity, or individual circumstances, it's your personality that interests us. That's why we're committed to building a culture of collaboration and respect, where every team member has a voice, can grow, and feels valued.
Still here?
Then we could be a perfect match!
So why not get down to business right away? Feel free to reach out to Anna-Lena Kramny at anna-lena.kramny@wemolo.com, and let's find out together if your expectations align with ours. Ready to own (y)our growth?