AWS
Stop burning cash on AWS crawlers
Real-time partition discovery at a fraction of the cost
AWS Glue
How AWS Glue can bill Streaming jobs at termination (if you are not careful)
apacheiceberg
Compact your Iceberg tables using Spark or AWS Athena
Unveiling the Future of Data Architecture
AWS Glue
Custom Log4J for Spark on Glue to decrease CloudWatch cost!
AWS
Or: how to prevent your CFO from turning 50 shades of red
System Integration Test
Or how to do Data Engineering like Metallica!
great-expectations
Enable advanced dashboarding on Great Expectations results using metrics
thinnest-viable-platform
Welcome to the first part of our series on data platforms! In part 1, we start by going over the main principles for platform success. In our unique role as a data consultanc...
PySpark
Context: A few weeks ago, FrieslandCampina contacted us to help with a problem they faced with their recommendation engine. Being one of the biggest dairy companies in the world, they sell hundreds of dairy products to millions of customers a...
software development
Do you remember the last time you opened your laptop and thought, "I can't wait to spend half my day writing configs in YAML"? Yeah, neither do I. But if you use dbt heavily, I'm afraid you and YAML are in it for the long run. Even with the help of dbt...
Software Engineering
Transitioning from a web development background to data engineering, I've encountered a significant cultural shift. One of the most striking differences is the apparent lack of established software engineering practices within many data engineering t...