Job Description
This role is based in Saudi Arabia on a permanent, residential basis.
Overview
We are seeking an AI/ML/LLM Systems Engineer to join our Digital & AI Center of Excellence and contribute to the development of enterprise‑scale AI platforms that support advanced machine learning and language model inference across Saudi Aramco’s operations. The Digital & AI Center of Excellence is responsible for delivering scalable, secure, and high‑performance AI/ML/LLM systems that drive innovation and operational efficiency. In this role, you will design and maintain infrastructure for deploying and optimizing large language models and vision models hosted on NVIDIA SuperPods, in the cloud, and in containerized environments. Your primary responsibility is to ensure the efficient and scalable operation of AI models within enterprise platforms. This includes deploying, monitoring, and optimizing inference workloads; integrating vector and relational databases; and implementing orchestration and DevOps pipelines to support continuous model improvement and delivery.
Duties & Responsibilities
- Deploy and manage LLMs and vision models on NVIDIA SuperPods and cloud environments, ensuring high performance and efficient use of GPU resources.
- Build and maintain scalable inference pipelines using Kubernetes (K8s), Docker, and OpenShift for enterprise AI platforms.
- Optimize inference performance through multiple techniques.
- Benchmark and evaluate LLMs for performance, accuracy, latency, and resource utilization across different hardware and software configurations.
- Implement and support LLMOps frameworks with full observability, including logging, tracing, and model performance tracking.
- Integrate and manage vector databases (Elasticsearch) and relational databases (PostgreSQL) for efficient data retrieval and user interaction history tracking.
- Implement and maintain CI/CD (Continuous Integration/Continuous Delivery) pipelines for model and platform updates using Git, Bitbucket, Jenkins, and ArgoCD.
- Ensure high availability and reliability of AI application workflows using frameworks like Haystack.
- Collaborate with infrastructure teams on GPU provisioning and resource allocation for AI workloads.
- Develop and maintain monitoring, alerting, and dashboarding systems for AI/ML workloads to ensure SLA/SLO compliance.
Qualifications
- Hold a master’s degree in Computer Science, Software Engineering, or a related field.
- Have 8 years of experience in AI/ML systems or cloud‑native infrastructure, including at least 4 years in LLM deployment and optimization.
- Proficiency in Python and SQL is required, with experience in building and optimizing AI/ML applications.
- Ability to work with Kubernetes (K8s), Docker, and OpenShift in production environments.
- Experience deploying and optimizing LLMs and vision models on NVIDIA GPU clusters, in high-performance computing (HPC) environments, and in cloud environments.
- Demonstrated proficiency in inference scaling, distributed computing, and SLA/SLO planning for AI workloads.
- Strong knowledge of Elasticsearch, PostgreSQL, and workflow frameworks like Haystack for AI application development.
- Ability to implement CI/CD pipelines using tools like Git, Bitbucket, Jenkins, and ArgoCD.
- Experience in benchmarking and evaluating LLMs for performance, accuracy, and efficiency is required.
- Experience with monitoring and dashboarding for AI/ML systems is also required.