A version of this blog appeared in TechCrunch: Is modern data stack just new wine in an old bottle?
Remember the cable, phone, and internet combo offers that used to arrive in our mailboxes? Those offers from cable companies are highly optimized for conversion. The type of offer and the monthly price can vary significantly between two houses right next to each other, or even between different condos in the same building. I know because I was once a data engineer and built Extract-Transform-Load (ETL) data pipelines for exactly this kind of offer optimization. Part of my job involved unpacking encrypted data feeds, cleaning them to remove rows or columns with missing data, and mapping the fields to our internal data models. The clean, updated data was then used by our statistics team to model the best offer for each household. That was almost a decade ago. Now take the process I just described, run it on steroids against datasets 100x larger, and that's the scale mid-sized and large organizations are dealing with today.
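The cleaning-and-mapping step described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the feed field names, the internal schema, and the `clean_feed` helper are all made up for the example, not taken from any real pipeline.

```python
# Hypothetical mapping from raw feed field names to the internal
# data model (illustrative names only).
FIELD_MAP = {
    "household_id": "HouseholdID",
    "current_plan": "CurrentPlan",
    "monthly_price": "MonthlyPrice",
}

def clean_feed(rows):
    """Keep only complete records, renamed to the internal schema."""
    cleaned = []
    for row in rows:
        # Drop any record with a missing required field, as the
        # pipeline did before handing data to the statistics team.
        if any(row.get(field) in (None, "") for field in FIELD_MAP):
            continue
        cleaned.append({internal: row[feed] for feed, internal in FIELD_MAP.items()})
    return cleaned

feed = [
    {"household_id": "H1", "current_plan": "basic", "monthly_price": "49.99"},
    {"household_id": "H2", "current_plan": None, "monthly_price": "79.99"},
]
print(clean_feed(feed))
# -> [{'HouseholdID': 'H1', 'CurrentPlan': 'basic', 'MonthlyPrice': '49.99'}]
```

In a real pipeline this step would run against hundreds of tables and feeds, but the shape of the work is the same: filter out incomplete records, then translate external field names into the internal model.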
For example, a single video conferencing call can generate logs that span hundreds of storage tables. The cloud has fundamentally changed the way business is done by offering virtually unlimited storage and scalable compute at an affordable price. A simple comparison between the old and modern stack looks like this:

Why do data leaders today care about the modern data stack?
- Self-service analytics: citizen developers want access to critical business dashboards in real time.