Cloud / DevOps Engineer (AI Infrastructure)
Job Offer Description
Organisation/Company: AI4I
Research Field: Engineering » Computer engineering
Researcher Profile: Other Profession
Positions: Master Positions
Application Deadline: 5 Mar 2027 - 12:59 (Europe/Rome)
Country: Italy
Type of Contract: To be defined
Job Status: Full-time
Hours Per Week: 40
Is the job funded through the EU Research Framework Programme? Not funded by an EU programme
Is the job related to a staff position within a Research Infrastructure? No
Offer Description
The Italian Institute of Artificial Intelligence (AI4I) is seeking a senior, hands‑on Cloud / DevOps Engineer to design, build, and operate the cloud foundation of its HPC/AI infrastructure.
You will own the cloud layer that enables AI and HPC workloads to run reliably at scale, with a strong focus on OpenStack-based private cloud infrastructure and the evolution toward container‑based environments built on Kubernetes, Rancher, and Harvester.
This role is central to AI4I’s industrial deployment mission: you will create and operate the infrastructure that powers real AI projects for enterprises, public institutions, and strategic partners.
Location: AI4I, OGR – Turin, Italy
Hybrid work: Flexible arrangements may be negotiated
The position will remain open until filled and multiple candidates may be hired.
About the Role
As Cloud / DevOps Engineer at AI4I, you will take ownership of the end-to-end cloud infrastructure, from architecture and automation to day‑to‑day operations and continuous improvement.
Beyond compute orchestration, this role includes responsibility for the design and operational management of distributed storage systems that support AI and HPC workloads. You will ensure that storage performance, durability, and scalability meet the needs of GPU‑intensive training, fine‑tuning, and inference environments.
This is a cross‑unit role that supports multiple AI4I teams and projects.
You will work closely with:
- AI4I Deployment and Engineering teams, enabling production‑grade AI services
- HPC / AI engineers, integrating cloud and compute environments
- External technology partners and vendors
- Internal stakeholders delivering industrial AI projects
This is a strongly execution‑oriented role that combines cloud engineering, DevOps practices, storage architecture, and operational responsibility. You will help build a robust, scalable, and secure infrastructure that supports both current deployments and future growth.
Key Responsibilities
- Design, deploy, and operate AI4I’s private cloud infrastructure, with strong ownership of OpenStack environments
- Lead the evolution toward container‑based environments leveraging tools such as Kubernetes, Rancher, and Harvester, enabling Container-as-a-Service capabilities
- Design, deploy, and manage distributed and software‑defined storage systems supporting HPC and AI workloads, ensuring high‑performance block, object, and file services integrated with GPU clusters
- Optimize storage performance, data durability, replication strategies, and overall resource utilization for compute‑intensive workloads
- Implement infrastructure automation and CI/CD practices (Infrastructure-as-Code) for reliable and reproducible operations
- Define and enforce operational standards, including monitoring, alerting, backup, disaster recovery, and incident response
- Support internal engineering teams by providing reliable infrastructure, documentation, and best practices
- Contribute to infrastructure architecture decisions and long‑term evolution of the AI4I cloud
Required Qualifications
- Strong hands‑on experience operating and troubleshooting OpenStack production environments
- Proven experience managing container orchestration environments (e.g., Kubernetes) in production settings
- Solid hands‑on experience with distributed and software‑defined storage systems in HPC or cloud environments
- Strong Linux system administration and networking fundamentals
- Experience automating infrastructure using Infrastructure-as-Code and CI/CD practices
- Demonstrated experience operating mission‑critical production services with uptime, reliability, and incident response responsibility
Additional Strengths
- Experience with Rancher or Harvester
- Experience integrating cloud environments with GPU/HPC workloads
- Experience operating multi‑tenant cloud environments
- Experience with monitoring and observability stacks (Prometheus, Grafana, ELK, etc.)
- Security hardening and identity management in private cloud environments
- Experience supporting internal engineering teams
Key Performance Metrics
- Infrastructure availability and reliability
- Mean time to detect and resolve incidents
- Time required to onboard new internal projects or users
- Resource utilization efficiency of the infrastructure
What We Offer
- A collaborative environment with engineers and researchers working on real industrial AI deployments
- Direct impact: your infrastructure will run daily AI workloads and production systems
- An office at the epicenter of tech: OGR Torino technology hub
- Competitive compensation and access to advanced computing infrastructure
How to Apply
Submit your application exclusively through the online form, including:
- Cover letter (max. 1 page) describing how your profile fits this specific position
- CV and optional links to technical projects or operational experience
About Us
The Italian Institute of Artificial Intelligence (AI4I) was founded as a research institute to perform transformative, application‑oriented research in Artificial Intelligence, driving innovation and industrial progress. The Institute is designed to engage and empower gifted, entrepreneurial, and ambitious researchers who are committed to generating real-world impact at the intersection of science, technology, and industrial transformation.
Competitive salaries, performance-based incentives, access to dedicated high-performance computing resources, state-of-the-art laboratories, and strong industrial collaborations are among the distinctive features that define AI4I. The Institute fosters a dynamic international environment and an ecosystem that supports the creation and growth of innovative startups.
AI4I’s mission is to advance scientific research, technology transfer, and, more broadly, Italy’s innovation capacity, promoting positive impact across industry, services, and public administration. To achieve this, the Institute contributes to building a research and innovation infrastructure that leverages AI methods, with a special focus on manufacturing processes and the broader Industry 4.0 value chain.
AI4I also maintains strategic relationships with leading organizations in Italy and abroad, including Competence Centers and European Digital Innovation Hubs (EDIHs), positioning itself as an attractive destination for researchers, companies, and startups seeking collaboration and impact.
Number of offers available: 1
Company/Institute: AI4I - The Italian Institute of Artificial Intelligence
Country: Italy