Raúl Camilo Martín Bernal — Senior Data Engineer

01 / Professional life

From legacy bottlenecks to platforms that move in minutes.

Senior Data Engineer with 5+ years designing and delivering end-to-end data platforms across fintech, e-commerce, and enterprise environments — scalable, cost-efficient pipelines on AWS, Snowflake, Spark, dbt & Python, with a consistent track record of turning slow, legacy processes into fast, reliable ones.

5+

Years engineering data

16×

Pipeline speed-up · 8h→30m

3 TB

Largest tables consolidated

50M+

Records processed daily

EPAM Systems · Senior Data Integration Engineer

Nov 2025 – Present · Remote

Client: foodservice equipment & supplies distribution (USA)

Leading the modernization of a business-critical platform — migrating ELT from Spark-based AWS Glue to Airbyte + dbt + Redshift, integrating ~10 GB/day from Oracle, SQL Server, OpenEdge & IBM DB2.

Re-engineered legacy Spark pipelines to eliminate memory bottlenecks, cutting a critical pipeline from 8 h → 30 min (16× faster).

Replaced legacy SSIS packages with Airflow DAGs for observability, maintainability & scheduling control.

Rolling out Claude (skills, agents & MCP servers) as a developer-productivity and incident-response accelerator across dbt, Terraform, SQL & Airflow.

Globant · Semi Senior Advanced Data Engineer

Mar 2024 – Nov 2025

Enterprise data consolidation into Snowflake

Consolidated large on-prem & Amazon RDS Oracle databases into Snowflake (tables up to ~3 TB, 50M+ rows/day) via ETL, ELT & Reverse-ETL.

Redesigned the data model from scratch (~20 models across ~50 tables), cutting reporting time from 10 h → 5 min.

Built Snowflake stored procedures and high-volume load patterns with Snowflake + Airflow + Airbyte; enhanced pipelines with Snowpark via Notebooks, Worksheets & Streamlit.

Globant · Semi Senior Data Engineer

Dec 2022 – Mar 2024

End-to-end data lakehouse on AWS

Built end-to-end pipelines for a conflict-of-interest application — ~10 sources at ~40 GB/day into a .NET app.

Architected the data layer with AWS Neptune (entity-relationship logic) and OpenSearch (fast per-company lookups).

Implemented a data lake with Apache Spark + Iceberg on AWS Glue & Athena; orchestrated with Step Functions, Glue & Lambda. Earlier: documented IBM DataStage ETL and automated XML extraction in Python.

Rappi · Data Engineer · RappiPay

Oct 2021 – Dec 2022

Built data marts in Snowflake for BI, Data Science & Operations — ~60 GB/day · peak ~150M rows.

Migrated batch ingestion from Apache NiFi to Spark on AWS Glue, cutting run time from 6 h → 20 min.

Built ETL/ELT for the legal & regulatory reporting behind RappiPay's banking-license process in Colombia — contributing to a new bank that remains in operation today. Added Apache Kafka streaming for near-real-time statements.

Esri Colombia · Technical Marketing Engineer · GIS Advisor

Feb 2019 – Sep 2021 · Bogotá, Colombia

Built Python, R & Bash geospatial pipelines, including a census-cleaning script that did in 1 day what previously took weeks and a COVID-19 automation that cut run time from 15 h → 15 min.

Developed on-demand geospatial tooling for internal clients and supported ArcGIS / ArcGIS Online through demos and technical talks.

02 / About me

Beyond the pipelines.

Away from the data platforms, I'm happiest in motion: traveling to new places, hiking trails, and playing sport. I currently train CrossFit at an intermediate level and run regularly. Sport demands the same discipline I bring to engineering: show up, measure, improve.

Languages

Spanish

Native

English

Full Professional

Portuguese

Elementary

Sports I play

CrossFit (intermediate) · Running (current) · Swimming · Tennis · Basketball · Ping Pong

Travel

New places, new perspectives.

Hiking

Trails, elevation, quiet.

03 / Reading

Books I recommend.

A short shelf on stoicism, mindset, and the practice of staying present.

Meditations

Marcus Aurelius

Stoicism

Ego Is the Enemy

Ryan Holiday

Humility

The Power of Now

Eckhart Tolle

Presence

Mindset

Carol S. Dweck

Growth

The Courage to Be Disliked

Kishimi & Koga

Freedom

04 / Skills

The stack I build on.

Data Engineering

ETL / ELT / Reverse-ETL · Dimensional Modeling · Data Lakehouse · Data Governance

Cloud & Warehouses

AWS Glue · Athena · Redshift · Step Functions · Neptune · OpenSearch · Snowflake · Snowpark · Streamlit

Processing & Orchestration

Apache Spark · Iceberg · Airflow · Kafka · NiFi · Airbyte · dbt

Languages & Tooling

Python · SQL · Bash · R · Terraform · Git

AI / ML

Agentic AI: Claude (skills, agents, MCP) · LLM Integration · Generative AI · Deep Learning

Certifications

AWS Certified — Data Engineer, Associate

DeepLearning.AI — Data Engineering

Deep Learning Specialization

Claude Certified Architect — Foundations

Education

Ingeniero Catastral y Geodesta

Universidad Distrital Francisco José de Caldas · Cadastral & Geodetic Engineering

Aug 2021

Técnico de Sistemas

SENA · Software Systems Technician

2014
