Data Engineer - System Design & AI
Job description
Who we are
Bitrock is a high-end consulting and system integration company, strongly committed to offering cutting-edge and innovative solutions. Our tailored consulting services enable our clients to preserve the value of legacy investments while migrating to more efficient systems and infrastructure. We take a holistic approach to technology: we consider each system in its totality, as a set of interconnected elements that work together to meet business needs.
We thrive on overcoming challenges to help our clients reach their goals by supporting them in the following areas: Data, AI & ML Engineering; Back-end Engineering; Platform Engineering; Front-end Engineering; Product Design & UX Engineering; Mobile App Development; Quality Assurance; FinOps; Governance. The effectiveness of our solutions also stems from partnerships with key technology vendors such as HashiCorp, Confluent, Lightbend, Databricks, and Meterian.
Who we are looking for
We are seeking a Senior Data Engineer to architect scalable distributed systems and lead the evolution of our data platform. In this role, you will treat data infrastructure as software, combining high-level system design with production-grade programming to deliver robust pipelines that support advanced analytics, machine learning, and RAG capabilities.
We hire smart engineers, not just tool users. If you have strong fundamentals in distributed systems and software engineering but haven't used every tool in our specific stack, we still want to hear from you. We believe great engineers can learn new tools quickly.
Who you are
- Engineering First: You approach data problems with a software engineering mindset, prioritizing maintainability and scalability.
- Tool Agnostic: While we use Databricks, you understand the underlying principles of distributed computing and can adapt to any stack.
- Architectural Vision: You understand why a given architecture (e.g., Lakehouse vs. Warehouse) fits a given use case.
Key Responsibilities
- System Design & Architecture: Architect end-to-end data platforms that balance latency, throughput, and cost. Make high-level trade-off decisions (e.g., batch vs. streaming, consistency vs. availability) and select appropriate infrastructure.
- Advanced Software Engineering: Write production-ready, modular Python code. Enforce software best practices including unit/integration testing, CI/CD, and code reviews, ensuring our data pipelines are as robust as our application code.
- AI & RAG Infrastructure: Design the system topology for retrieval-based AI features. Build pipelines that ingest, chunk, and embed unstructured data, managing the flow from raw documents to Vector Search indices (see the first sketch after this list).
- Data Modeling: Implement dimensional modeling (Star Schema) and modern table formats to ensure data quality and usability for downstream analytics (see the second sketch after this list).
- Performance Optimization: Diagnose bottlenecks in distributed systems. Tune SQL queries and compute jobs (e.g., Spark) for efficiency at scale.
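
To make the RAG responsibility concrete, here is a minimal, illustrative sketch of a raw-documents-to-vector-index ingestion flow in plain Python. The chunking parameters, the `embed` placeholder, and the in-memory index are assumptions for illustration, not any specific vendor's API:

```python
# Illustrative sketch: raw documents -> chunks -> embeddings -> index.
# The embed() placeholder and the in-memory dict index are assumptions,
# not a specific product API.
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Chunk:
    doc_id: str
    text: str


def chunk_document(doc_id: str, text: str,
                   size: int = 500, overlap: int = 50) -> List[Chunk]:
    """Split raw text into overlapping fixed-size character chunks."""
    step = size - overlap
    return [Chunk(doc_id, text[i:i + size])
            for i in range(0, max(len(text), 1), step)]


def embed(texts: List[str]) -> List[List[float]]:
    """Placeholder: swap in a real embedding model client here."""
    return [[float(len(t))] for t in texts]  # dummy 1-dimensional vectors


def ingest(docs: Iterable[Tuple[str, str]],
           index: Dict[Tuple[str, int], List[float]]) -> None:
    """Chunk and embed each document, then upsert vectors into the index."""
    for doc_id, text in docs:
        chunks = chunk_document(doc_id, text)
        vectors = embed([c.text for c in chunks])
        for n, (chunk, vector) in enumerate(zip(chunks, vectors)):
            index[(chunk.doc_id, n)] = vector


if __name__ == "__main__":
    index: Dict[Tuple[str, int], List[float]] = {}
    ingest([("doc-1", "raw unstructured text " * 100)], index)
    print(f"{len(index)} chunks indexed")
```

In production the placeholder `embed` call would be replaced by a real model client, and the dict by a managed Vector Search index.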
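Likewise, a minimal star-schema sketch using Python's stdlib sqlite3, with illustrative (assumed) table and column names: one fact table keyed into two dimension tables, plus the typical join-and-aggregate query this layout is designed for:

```python
# Minimal star-schema sketch: a fact table with foreign keys into two
# dimension tables. Table and column names are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY,
    customer_name TEXT,
    country       TEXT
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20240131
    full_date TEXT,
    month     INTEGER,
    year      INTEGER
);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    revenue      REAL
);
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme', 'IT')")
conn.execute("INSERT INTO dim_date VALUES (20240131, '2024-01-31', 1, 2024)")
conn.execute("INSERT INTO fact_sales VALUES (1, 20240131, 3, 99.0)")

# Typical analytical query: join the fact to its dimensions and aggregate.
for row in conn.execute("""
    SELECT d.year, c.country, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.year, c.country
"""):
    print(row)
```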
The Stack
- Compute & Storage: Distributed processing (e.g., Spark, Databricks) and Lakehouse formats (Delta Lake, Iceberg).
- Languages: Python (Advanced/OOP), SQL.
- GenAI: LLM Orchestration, Vector Databases, Embeddings.
- Engineering: Docker, Kubernetes, Terraform, CI/CD, Git.
Requirements
- 5+ years in Data Engineering with a focus on building distributed systems.
- Programming Mastery: Expert-level Python skills.
- Distributed Computing: Experience with large-scale processing frameworks (e.g., Apache Spark, Databricks, or similar).
- GenAI Competency: Proven experience building infrastructure for LLMs, working with Vector Databases, and implementing semantic search.
- System Design: Ability to whiteboard complex data architectures and defend your technology choices.
Recruitment process:
Our recruitment process has 3 stages:
- Short discovery interview with our HR team
- Technical interview with our Team Leaders
- Final interview with our Head of Area
How to apply:
- You can apply via LinkedIn or send your CV to