Site Reliability Engineer II (Remote)

Agile Lab · Work from home, Lombardy, Italy · 50€ - 70€


Job description

Agile Lab is a company founded in 2014 with the mission to create value for its customers in data-intensive environments through customisable solutions that establish performance-driven processes, sustainable architectures and automated platforms based on data governance best practices.

Having delivered over 100 successful Elite Data Engineering initiatives, we have used this experience to create Witboost: a modular, technology-agnostic platform that enables modern organisations to discover, value and produce their data in both traditional environments and fully compliant Data Mesh architectures.

With a highly skilled team of over 260 data engineers based in Europe, Agile Lab helps organisations with their data-driven transformation.

Take a look at our handbook to discover our core values and processes.

The opportunity:

We are looking for a Site Reliability Engineer II (SRE II) to join our growing team. You will play a key role in maintaining the reliability, observability, and operational efficiency of enterprise-level distributed systems.

In this role, you’ll coordinate a small technical team (3–4 people) in managing microservices in complex production environments. You will be involved in monitoring, incident management, release coordination, and performance tuning, with a strong focus on OpenShift platforms.

You’ll also work closely with multiple cross-functional teams to ensure high availability and performance of our cloud-native services.

This role includes on-call availability.

Gross annual salary (RAL): 38.5K-48.5K

Responsibilities:

  • Ensure high reliability of microservices running in OpenShift environments
  • Lead and coordinate a technical team of 3–4 engineers for operational excellence
  • Manage incident resolution and ticketing workflows via ServiceNow
  • Collaborate with development teams to drive performance optimization and tuning
  • Design, configure and maintain monitoring dashboards (Grafana, Prometheus, etc.)
  • Coordinate with Service Control Room to maintain effective alerting and response
  • Oversee release processes of new features, hotfixes, and updates in production
