Senior DevOps Engineer - Highload, Cloud & Data-Intensive Systems (EU / Remote)
Job Description
About The Project
The team develops and maintains distributed services around analytics, APIs, and transaction monitoring. The systems process very large volumes of data: terabytes of storage, trillions of records, and a continuously growing load.
Infrastructure
- 100 servers (bare metal + VPS)
- active use of IaC
- Kubernetes clusters in production
- focus on stability, observability, and automation
The project is long-term: not a hype startup, but a mature product with real users.
What The Work Looks Like
This is a hands‑on role with a clear time allocation:
- 60% — operations and incidents (including helping teams)
- 20% — infrastructure automation
- 20% — prototyping, improvements, technical initiatives
The role includes on‑call duty, but after‑hours incidents typically occur only 2‑3 times a year, not every week.
Responsibilities
- Operation of production services and infrastructure (server provisioning/decommissioning, updates, replacements, performance troubleshooting)
- Support and development of Infrastructure as Code (Terraform / Ansible: modules, roles, standards, reviews)
- Monitoring, alerting, backups, and regular recovery checks
- Development of service and infrastructure automation
- Development of CI/CD and release procedures
- Incident diagnosis and resolution, support for product teams
- Traffic analytics, bot and attack protection tools
- Responsibility for 24/7 platform stability
Requirements
- 4+ years of experience operating Linux/Ubuntu infrastructure and production services
- Strong understanding of networking and troubleshooting
- Kubernetes (cluster operations), Rancher, Docker / containerd
- Hands‑on experience with Ansible and Terraform
- Monitoring: Prometheus / Thanos / Telegraf / Grafana / Sentry
- CI/CD: Jenkins
- Automation: Bash, Python
- Experience working with LVM
Nice to have
- Experience working with blockchain nodes
- Diagnosis and tuning of ClickHouse and MongoDB in high‑load clusters
- Providers: Hetzner / OVHcloud
- Cloudflare (edge, DDoS), experience with AWS
- Handling abuse tickets with hosting providers
Technology stack
- VPN: WireGuard, OpenVPN
- Databases: ClickHouse, MongoDB, Redis, PostgreSQL
- Applications: Node.js (pm2), php-fpm, Lua, Tarantool
- Supporting services: Go (Operator SDK), Ruby, Node.js, PHP
Benefits
- 5,000 - 8,000 € net
- Format: office / hybrid / remote
- Location: Spain (Barcelona and suburbs) or remote (CET ±2)
- Full-time
- Opportunity to genuinely influence architecture and processes
- Mature engineering team and reasonable expectations