Select Star raises seed to automatically document datasets for data scientists


This post is by Danny Crichton from Fundings & Exits – TechCrunch


Back when I was a wee lad with a very security-compromised MySQL installation, I used to answer every web request with multiple “SELECT *” database requests — give me all the data and I’ll figure out what to do with it myself.

Today, in a modern, data-intensive org, “SELECT *” will kill you. With petabytes of information, tens of thousands of tables (on the small side!), and millions or perhaps billions of calls flung at the database server, data science teams can no longer just ask for all the data and start working with it immediately.

Big data has led to the rise of data warehouses and data lakes (and apparently data lake houses), infrastructure to make accessing data more robust and easy. There is still a cataloguing and discovery problem though — just because you have all of your data in one place doesn’t mean a data scientist knows what the data represents, who owns it, or what that data might affect in the myriad of web and corporate reporting apps built on top of it.

That’s where Select Star comes in. The startup, which was founded about a year ago in March 2020, is designed to automatically build out metadata within the context of a data warehouse. From there, it offers full-text search that lets users quickly find data, as well as “heat map” signals in its search results that can quickly pinpoint which columns of a dataset are most used by applications within a (Read more…)