Senior Machine Learning Engineer
Job description
Role overview
You will own end-to-end ML systems: model training, fine-tuning, deployment, monitoring, and cost/performance optimization. You will partner closely with organizational psychologists, people scientists, and software engineers to productionize LLMs, real-time conversational agents, and ML pipelines. You will report to the Head of AI & Science and drive engineering best practices, reliability, and reproducibility across the stack.
Key responsibilities
- Design, build, and maintain end-to-end ML platforms and pipelines: data ingestion, feature engineering, training, validation, deployment, and monitoring.
- Develop, fine-tune, and deploy LLMs and GenAI services for assessment tasks (prompt engineering, instruction tuning, RLHF/IL, retrieval-augmented generation).
- Implement scalable, low-latency inference systems (serverless and/or containerized), real-time voice/text conversational agents, and batching strategies for cost-effective throughput.
- Build infrastructure-as-code (Terraform/CloudFormation) for reproducible environments and secure, compliant deployments.
- Create automated CI/CD for data, models, and infra (model/data versioning, reproducible training runs, canary/blue-green deployments).
- Optimize model size and inference cost using quantization, pruning, distillation, sharding, and hardware-aware optimizations.
- Implement monitoring, observability, drift detection, and alerting for model performance and data pipeline health; run A/B and multivariate experiments to validate model changes.
- Integrate vector databases, retrieval pipelines, and caching strategies for RAG systems; manage embeddings lifecycle and similarity search performance.
- Ensure data and model governance: lineage, access controls, privacy safeguards, and auditability.
Required qualifications
- Bachelor's or Master's degree in Computer Science or a related field.
- 7+ years of experience in ML engineering/MLOps delivering production ML products.
- 3+ years of practical experience training and deploying GenAI/LLMs in production.
- Strong production experience on AWS (SageMaker, Lambda, ECS/EKS; Bedrock experience is a plus).
- Proven track record building highly scalable services and real-time systems.
- Experience with infrastructure-as-code (Terraform, CloudFormation) and container orchestration (Docker, Kubernetes).
- Hands-on experience with ML pipeline and experiment platforms (MLflow, Weights & Biases, Kubeflow, Airflow/Prefect).
- Proficiency in Python and TypeScript; solid software engineering practices and Git workflows.
- Experience implementing model monitoring, drift detection, and A/B testing for ML models.
- Familiarity with vector DBs (e.g., Qdrant), retrieval pipelines, and prompt/agent design.
- Fluency in English (C1) and strong communication for cross-functional collaboration.
Highly desirable
- Experience with Bedrock, SageMaker, or other managed LLM infrastructures.
- Experience deploying multimodal models, speech-to-text, and text-to-speech systems, and building voice-based conversational agents.
- Experience with distributed training frameworks (Horovod, DeepSpeed, ZeRO) and model parallelism.
- Knowledge of model compression, quantization toolchains (ONNX, TensorRT, Optimum), and cost-optimization strategies.
- Familiarity with feature stores and online/offline serving (Feast, Tecton).
- Prior experience in HR tech, assessment, or conversational assessment/coaching systems.
- Contributions to open-source ML infrastructure, published ML blog posts, or conference papers.
What we offer
- Opportunity to shape and scale AI systems at an early-stage company with real product impact.
- Close collaboration with researchers and product teams to deploy scientifically grounded ML features.
- Remote work (within EU timezones).
- Competitive compensation, flexible work, and budget for conferences, training, and research resources.
- A collaborative, flat environment where engineering leadership influences product and research direction.