HPC & AI Infrastructure Engineer
Job Description
Who We Are
CDS, a Hewlett Packard Enterprise company, is a wholly owned subsidiary of Hewlett Packard Enterprise (HPE). It provides field service capabilities to Hewlett Packard Enterprise customers and delivers the difference in business-oriented services such as development, support and maintenance of applications, virtualisation, automation, cloud and infrastructure administration.
CDS is present in 11 European countries with more than 1,600 employees, and embraces all of Hewlett Packard Enterprise’s values and commitments to employees and customers alike.
Role Overview
We are looking for a DevOps/CloudOps Engineer to join our international team. The successful candidate will be involved in the design, implementation, and management of HPC and cloud infrastructures supporting large-scale AI and Machine Learning applications.
The ideal candidate combines strong system administration skills in HPC environments with a solid background in DevOps/CloudOps, helping to ensure our advanced computing systems and workflows are scalable, performant, and secure.
The tasks will include (but are not limited to):
- Design, configure, and maintain HPC infrastructures and cloud environments for AI/ML workloads.
- Optimize CPU/GPU clusters, high-speed networking, and parallel storage systems.
- Automate deployment, monitoring, and management processes (Infrastructure as Code).
- Integrate orchestration tools (Kubernetes, Slurm, PBS, etc.) to manage distributed workloads.
- Support users and research teams in running AI/HPC applications.
- Collaborate with DevOps engineers, data scientists, and system engineers to ensure performance, reliability, and security.
- Develop and maintain scripts and backend APIs to support infrastructure operations.
Required skills and attributes
- Master’s degree or PhD in Computer Science, Engineering, or a related field.
- Proven experience in designing and managing HPC clusters, including GPU-based environments and technologies such as CUDA and the NVIDIA stack.
- Knowledge of cluster management tools (e.g., Slurm, PBS).
- Strong expertise in Linux.
- Experience with containers and orchestration (Docker, Kubernetes).
- Proficiency in scripting (Python).
- Familiarity with CI/CD tools (Jenkins, GitHub Actions) and Infrastructure as Code (Ansible, Terraform).
- Good understanding of high-performance networking and storage.
- Knowledge of cloud environments (AWS, Azure, GCP).
- Awareness of cybersecurity best practices.
#SuccessThroughPeople #WeDeliverTheDifference
Our privacy and GDPR policies:
- CDS Recruitment policy and HPE GDPR policy
CDS supports and applies the principle of equal opportunity. We recruit and retain the most qualified individuals, regardless of race, ethnicity, religion, gender, sexual orientation, age, or disability.