ETL runtime cut from 14 hours to 22 minutes — with a 60% lower cloud bill.
The situation
A US financial-services firm was running its nightly reporting on a tangle of legacy SQL Server stored procedures and SSIS packages. The full ETL cycle took roughly 14 hours, which meant analysts routinely started their day on stale data, and any failure mid-run pushed reporting into the afternoon. Every schema change was a multi-day risk because nothing was tested or version-controlled.
What we did
- Migrated 47 on-premise SQL Server sources into a Delta Lake lakehouse on Azure Databricks, using the medallion (bronze/silver/gold) pattern.
- Rebuilt transformations in PySpark with incremental processing, replacing full-table reloads that were the main cause of the long runtime.
- Orchestrated the whole flow in Azure Data Factory with retries, alerting, and data-quality checks so failures surface immediately instead of silently.
- Right-sized clusters with autoscaling, which is where most of the cost reduction came from.
- Ran a parallel cutover so the old and new pipelines produced identical output before switching off the legacy system — zero data loss.
The result
The nightly cycle now finishes in about 22 minutes — a 97% reduction — so analysts have fresh data before the workday starts. Right-sizing and incremental loads cut compute cost by roughly 60%. Because everything is orchestrated, tested, and version-controlled, schema changes that used to take days now ship in hours.
"We went from dreading the morning data refresh to not thinking about it at all. That's the highest compliment I can give a pipeline."
— Head of Data, US financial-services client (name withheld under NDA)