Senior SRE/DevOps (Platform Tribe)

Role Overview

This senior-level role involves managing daily alerts, providing 24/7 on-call support, and deploying to Kubernetes (EKS) clusters using Terraform, Helm, and GitOps tools (Flux/ArgoCD). The engineer will improve infrastructure health, introduce new technologies, and work with engineering teams to minimise user impact on a high-traffic iGaming platform. Responsibilities include proactive monitoring, incident documentation, and root cause analysis to prevent recurrence.

Perks & Benefits

The position offers 100% remote work with a flexible schedule, unlimited paid vacation and sick leave, and competitive salary plus quarterly bonuses of 15-20%. Benefits include medical insurance, financial support for life events, extended parental leave, and paid professional development courses, fostering a supportive and growth-oriented culture.

Full Job Description

About the Role

We’re looking for a Senior SRE / DevOps Engineer to join our Platform Tribe, a lean, senior team where ownership is high and expectations are even higher. This is a deeply hands-on role at the core of a high-traffic system, where you’ll be directly responsible for maintaining reliability, performance, and stability in a fast-paced environment.

You’ll be working on real-time production challenges, handling incidents, managing alerts, and being part of a critical on-call rotation. This role requires resilience, strong decision-making under pressure, and a proactive mindset to continuously improve systems operating at scale.

If you thrive in high-load environments, enjoy solving complex production issues, and want to have a direct impact on systems used by millions, this is the place for you.

Key Responsibilities

Own system reliability by actively monitoring platform health, managing alerts, and responding to incidents in real time
Participate in 24/7 on-call rotations, taking full ownership of production stability in a high-traffic (5–7k RPS) environment
Investigate incidents, perform root cause analysis, and implement long-term fixes to prevent recurrence
Build and continuously improve monitoring, alerting, and observability across the Kubernetes (EKS) ecosystem
Deploy, manage, and optimise infrastructure using Terraform, Helm, and GitOps tools (Flux/ArgoCD)
Drive automation and proactively improve system resilience, reducing manual intervention and recurring issues
Maintain and evolve CI/CD pipelines and infrastructure-as-code practices
Collaborate closely with engineering teams to support deployments and minimise user impact in a live environment
Introduce and integrate new tools and technologies to enhance scalability, reliability, and performance
Handle environment-specific requests and ensure smooth day-to-day platform operations under constant load

Requirements

Strong hands-on experience with Kubernetes (deployment, scaling, troubleshooting) in high-load environments
Experience with GitOps tools such as FluxCD or ArgoCD
Proven experience in incident response, root cause analysis, and postmortems in production systems
Solid experience with AWS, Terraform, Docker, and CI/CD pipelines
Experience with monitoring and observability tools such as Datadog, Prometheus, Grafana, and logging stacks like ELK or CloudWatch
Strong understanding of networking concepts and protocols
Proficiency in at least one scripting or programming language (e.g. Python, Go, JavaScript/Node.js)
Experience working with version control systems (Git)
Familiarity with incident management tools like PagerDuty, Opsgenie, or similar
Ability to operate effectively in a fast-paced, high-pressure environment with strong ownership and accountability
Proactive, resilient mindset with a focus on continuous improvement and system stability

What We Offer

Competitive Salary
Quarterly Bonuses
Unlimited Paid Time Off
Unlimited Paid Sick Leave
Remote & Flexible Working
Private Medical Insurance
Financial Support for Life Events
Professional Development Budget
International Exposure
Regular Company Events

*Benefits may vary depending on location and contractual agreement

Recruitment Process

1. HR Interview (30-45 min)

2. Technical Interview (90 min)

3. Final Interview with C-level (60 min)


Playson

playson.com

Playson is a leading iGaming content provider that develops and distributes high-quality online casino games. Their primary customers are online casino operators and gaming platforms seeking engaging slot games and gaming solutions. The company's main products include a diverse portfolio of HTML5 slot games featuring innovative mechanics, captivating themes, and mobile-optimized gameplay. Playson operates with a distributed team structure that supports flexible working arrangements, allowing employees to collaborate effectively across different locations while maintaining a strong focus on creativity and technical excellence in game development.

Industry

Gaming
