Head of Data Engineering & ML

  • Hybrid
    • Limassol, Cyprus
  • Technology

Job description

We are hiring on behalf of our client for a Head of Data Engineering & Machine Learning.

In this role, you will take full ownership of both the data engineering and machine learning/data science functions — defining the strategy, building and leading the team, establishing best practices, and ensuring high-quality delivery across the board.

You will be responsible for shaping and implementing robust data governance and data quality frameworks, while simultaneously building and scaling the AI (ML/DS) capability. A key focus of the role is driving the integration of AI into the product, translating data into tangible business and product impact.

Responsibilities:

Data Platform & Architecture

  • End-to-end data architecture: ingestion, storage, and transformation

  • Data lake / warehouse design with governed layering, version-controlled and monitored pipelines

  • Integration with Power BI, product, marketing, and trading platform APIs, ML serving endpoints, and regulatory reporting

Data Governance

  • Corporate data dictionary covering all critical domains: trading, clients, accounts, leads, campaigns, instruments

  • Naming conventions enforced via CI/CD pipeline (SonarQube or equivalent) — violations caught before production

  • Semantic layer (dbt + Cube.dev / AtScale or equivalent) as the single source of truth between the warehouse and BI/AI consumers — enabling business self-service without IT involvement

  • Data lineage, ownership, and stewardship model across technical and business teams

  • GDPR, MiFID II, and audit alignment

Data Quality

  • Automated quality gates throughout the pipeline: schema validation, null checks, referential integrity, statistical drift, business rule assertions

  • Quality framework (Great Expectations, dbt tests, Soda, or equivalent) integrated as blocking pipeline checks — not just monitoring

  • Data quality KPIs with business-visible dashboards; incident response process with defined remediation SLAs

Machine Learning & Data Science

  • Lead the ML/DS team: technical direction, solution selection, delivery standards, and capacity planning

  • MLOps: versioned training pipelines, model registry, A/B infrastructure, drift monitoring (MLflow, DVC, or equivalent)

  • Business use cases: churn, lead scoring, anomaly detection, trade pattern analysis, risk segmentation

  • Model documentation standard: purpose, inputs, outputs, training data, evaluation metrics, re-training schedule

AI in the Data Pipeline

  • LLM-assisted pipeline development: SQL/transformation code review, anomaly explanation, schema change impact analysis

  • ML-based data quality detection for drift and anomalies that rule-based checks miss

Job requirements

  • 10+ years in data engineering, with 3+ years leading a team

  • Technical ownership of Airflow or equivalent orchestrator (not just usage — architectural responsibility)

  • Strong data warehouse / lake design: partitioning, SCD, incremental loading, Delta Lake / Iceberg / Hudi

  • Deep SQL — query optimization, execution plan analysis, rewrite-level proficiency

  • PostgreSQL and MySQL hands-on; MongoDB data modeling (embedding vs referencing, aggregation pipelines)

  • Databricks or equivalent cloud data platform (Snowflake, BigQuery, Redshift)

  • Implemented a corporate data dictionary in a real organisation — deployed and maintained, not presented in slides

  • Built and operated a semantic layer: dbt as minimum; Cube.dev, AtScale, or LookML as a differentiator

  • Hands-on with data lineage tooling: OpenLineage, Apache Atlas, Collibra, or equivalent

  • Automated data quality frameworks in production: Great Expectations, dbt tests, Soda, Monte Carlo, or equivalent

  • Led end-to-end ML delivery in production — churn, fraud, forecasting, anomaly detection, or classification in a commercial context

  • MLOps tooling: MLflow, DVC, Weights & Biases, or equivalent; model serving and versioning

  • Familiar with Python ML ecosystem: scikit-learn, XGBoost/LightGBM, one of PyTorch/TensorFlow

  • Power BI data model design — the data team owns what BI consumes

  • Git, CI/CD fundamentals, Docker/Kubernetes — enough to own pipeline deployments

Strong plus:

  • LLM / RAG / MCP integration experience in a data or engineering context

  • Financial services data: trading systems, client data, regulatory reporting
