In this episode of DEMO, host Keith Shaw is joined by Matt Fuller, Co-Founder of Starburst, to explore how their data platform unifies access to distributed data, whether it's in the cloud, across regions, or still on-premises. You'll see a live demo of how Starburst enables:

- Fast, federated queries across multiple data sources (MongoDB, PostgreSQL, Salesforce, S3, and more)
- AI-ready data architecture with support for vector search and LLMs
- Simplified data product creation for analytics, apps, and AI agents
- Built-in governance tools like PII tagging and access control
- No need to move or duplicate data: query it where it lives!

Whether you're scaling AI, building data products, or modernizing legacy analytics, Starburst helps you access and activate your data faster.

Try Starburst for free: starburstdata.com

Like this video, share your thoughts, and subscribe for more tech product demos!

Timestamps:
00:00 – Welcome & Intro
00:17 – What is Starburst?
01:00 – Real-World Problems Solved
04:00 – Federated Queries Across Systems
06:30 – Creating Simplified Data Products
08:45 – Access Control & Governance Features
12:15 – LLM Integration & Vector Search Demo
15:00 – How to Get Started for Free
Keith Shaw: Hi, everybody. Welcome to DEMO, the show where companies come in and show us their latest products and platforms. Today, I'm joined by Matt Fuller. He is the co-founder of Starburst. Welcome to the show, Matt.

Matt Fuller: Thank you.
Keith: I love the name of the company; it's just a fantastic name. So, tell us a little bit about Starburst, and what you're here to show us today.

Matt: Yeah, absolutely. Starburst is an open data platform to power both data apps and AI.
You can think of it as a way to run queries whether you're using traditional BI tools like Tableau or Power BI. But we're also seeing a lot of usage by people who are trying to build data apps or power their AI applications.
Keith: All right, so I heard the word "data." I'm assuming this is designed for data scientists, data analysts, anybody dealing with data? Or is it something everyone in the company can benefit from?

Matt: Yeah, absolutely. Let's double-click a bit into what our platform is.
It’s based on an open-source project called Trino. When we founded the company back in 2017, we built it around Trino, which is a distributed SQL query engine that can connect to various data sources. That’s a core component of our product.
We've added a full set of capabilities around it to help data engineers and those building AI applications—what we sometimes call data producers—access and prepare data for the end consumers.
That consumer might be human-led, using tools like Tableau or Power BI, or machine-led—such as an AI agent or data application.
Keith: And what problems are you solving? Obviously, you wouldn't have started the company if there wasn't a problem to fix.

Matt: A few things. First, it's about getting to the data where it is. We're a powerful engine for querying data from data lakes.
But often the challenge is: how do you get the data into one place? We can augment that process by connecting to various sources. Especially with AI, the effectiveness of your model depends on the quality of the data it gets.
We think of ourselves as the fuel that powers AI. Also, we allow experimentation and production-level usage without forcing you to centralize everything. You can experiment quickly and move to production faster. Uniquely, we support both cloud and on-prem environments.
So we can reach across clouds, across regions, even back to on-prem.

Keith: Was the growth of cloud computing one of the main reasons data started getting so fragmented?

Matt: That's part of it. Companies moved to the cloud but still have legacy systems on-prem.
Some keep data on-prem due to privacy or security needs. In M&A scenarios, a company might acquire another with data in Google Cloud while they use AWS. There are lots of reasons for the fragmentation.

Keith: So, what kinds of problems would a company face if they weren't using Starburst?
Matt: They’d likely need more tools and would rely heavily on ETL processes to move everything to one place. ETL has its role, but we don’t make it a requirement. You can connect directly to the data source, experiment, and decide whether you want to move it later.
This gives you faster time-to-value with less complexity.

Keith: All right, let's jump into the demo and see what you've got.

Matt: Yeah, absolutely. Let me jump over here. In this demo, we're pretending to be an airline or travel company. We have data across different sources.
For example, we're connected to MongoDB for customer profiles, Postgres for ancillary purchases and loyalty programs, Salesforce for customer engagement, and an Icehouse location with bookings and historical flight data. These data sources could be across different regions: Postgres in AWS us-east-1, S3 in us-east-2, and so on.
Yet, we provide a centralized view.
Now, if you want to run queries, it can get tricky. I’ll use our UI for simplicity, but you could use Tableau, DataGrip, or others. In this query, I’m joining Mongo, Postgres, and bookings data.
You’ll notice I have to cast some column types and manipulate the data because I need to know where everything is. To simplify this, we created a concept called data products. This abstracts away complexity.
Your data engineers create these products, and end users just see unified tables—no need to know which data came from which source.
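The cross-source join and type casting Matt describes can be sketched in plain Python. This is an illustrative stand-in, not Starburst's API; the table names, field names, and values are all made up. Two in-memory collections play the roles of MongoDB and Postgres, and a `customer_360()` function acts as the "data product" that hides the join and the cast from consumers:

```python
# Illustrative sketch only: in-memory stand-ins for two data sources.
# All names and data here are hypothetical.

# "MongoDB" documents: customer profiles, with IDs stored as strings
mongo_customers = [
    {"customer_id": "101", "name": "Ada"},
    {"customer_id": "102", "name": "Grace"},
]

# "Postgres" rows: loyalty data, with IDs stored as integers
postgres_loyalty = [
    (101, "gold"),
    (102, "silver"),
]

def customer_360():
    """A 'data product': hides the cross-source join and the type cast."""
    loyalty_by_id = {cid: tier for cid, tier in postgres_loyalty}
    return [
        {
            "customer_id": int(c["customer_id"]),  # cast string -> int to join
            "name": c["name"],
            "loyalty_tier": loyalty_by_id.get(int(c["customer_id"])),
        }
        for c in mongo_customers
    ]

print(customer_360())  # consumers just see one unified table
```

The point mirrors the transcript: the engineer who builds `customer_360()` deals with source locations and mismatched types once, and every downstream user queries a single clean view.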
We also have a powerful search function. For example, I can search for “Customer 360,” and it pulls up everything tied to that view—bookings, ancillary purchases, customer service, etc. It’s more than a catalog; it includes access control and query processing.
We also support tagging sensitive data like PII. For example, if a data product contains customer names or emails, we can mask those fields depending on the user's role. Let me demonstrate.
I’ll switch from a data engineer role to an analyst role. Running the same query now masks the PII fields automatically. That’s because we have a built-in access control system. In this case, a policy hashes data tagged as PII—no data copying needed. You apply governance policies in place.
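The role-based masking Matt demonstrates can be sketched as a small policy function. This is not Starburst's implementation; the roles, tag set, and row data are hypothetical, and a truncated SHA-256 hash stands in for whatever masking function a real policy would use:

```python
# Illustrative sketch: hash columns tagged as PII depending on the user's role.
# Roles, tags, and data are hypothetical, not Starburst's actual policy model.
import hashlib

PII_TAGS = {"name", "email"}  # columns tagged as PII in this toy schema

def mask(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible hash."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def apply_policy(row: dict, role: str) -> dict:
    """Data engineers see raw values; analysts get PII fields hashed in place."""
    if role == "data_engineer":
        return row
    return {k: mask(v) if k in PII_TAGS else v for k, v in row.items()}

row = {"name": "Ada Lovelace", "email": "ada@example.com", "flights": 12}
print(apply_policy(row, "analyst"))        # name/email hashed, flights intact
print(apply_policy(row, "data_engineer"))  # unchanged
```

Because the policy is applied at query time, the same stored row serves both roles, which is the "no data copying" point from the transcript.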
And that’s one of the strengths of data products. Instead of data swamps with mismatched subsets, you get trusted, curated data assets with fine-grained control.
Now, behind the scenes, all of this runs on Trino clusters. Our enhanced version of Trino acts as the SQL processing engine: it pushes down parts of queries to underlying systems like Mongo or Postgres but does much of the processing itself.
We also support data quality checks. For instance, if the “delay reason” column in our flight history data is null more than 10% of the time, we flag that. In this case, it passes the check.
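The null-ratio check described here is simple to express. The 10% threshold and the "delay reason" column come from the transcript; the sample values are fabricated, and this sketch only illustrates the rule, not Starburst's quality-check feature:

```python
# Illustrative sketch of a null-ratio data quality check.
# Threshold matches the transcript's example; the data is made up.

def null_ratio_check(values, threshold=0.10):
    """Pass if the share of NULL (None) values is at or below the threshold."""
    nulls = sum(1 for v in values if v is None)
    ratio = nulls / len(values)
    return ratio <= threshold, ratio

delay_reason = ["weather", "crew", None, "weather", "mechanical",
                "weather", "crew", "weather", "mechanical", "weather"]
passed, ratio = null_ratio_check(delay_reason)
print(f"passed={passed} null_ratio={ratio:.0%}")  # 1 null out of 10 -> passes
```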
Next, I want to show some new functionality. We now support vector search and invoking LLM functions via SQL. Typically, you’d need a vector database, but we store vector embeddings directly in Iceberg tables in the data lake. This simplifies things and supports retrieval-augmented generation (RAG) pipelines.
Let’s say I want to find delays caused by “heavy rain.” I run a semantic search over the delay reason field, retrieve the relevant records, and then feed the results into an LLM prompt using standard SQL.
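The retrieval step of that flow can be sketched with cosine similarity over embeddings. A real pipeline would use an embedding model and the engine's vector search; here the three-dimensional vectors are fabricated by hand, and the final prompt string is a hypothetical example rather than the demo's actual query:

```python
# Illustrative sketch of semantic retrieval + prompt assembly for RAG.
# Embedding vectors are fabricated; real systems use an embedding model.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# delay reasons paired with made-up embedding vectors
records = [
    ("heavy rain over hub airport", [0.9, 0.1, 0.0]),
    ("crew scheduling conflict",    [0.0, 0.2, 0.9]),
    ("thunderstorms on route",      [0.8, 0.3, 0.1]),
]

query_vec = [0.95, 0.15, 0.05]  # pretend embedding of "heavy rain"
best = max(records, key=lambda r: cosine(query_vec, r[1]))

# feed the retrieved record into an LLM prompt (the RAG step)
prompt = f"Explain this flight delay given the reason: {best[0]}"
print(prompt)
```

The semantically closest record ("heavy rain over hub airport") is retrieved even though the query never matches it word for word, which is what distinguishes this from a plain text filter.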
The model analyzes the flight data and provides a possible explanation based on the route and weather patterns. It's a powerful way to integrate your data infrastructure directly with generative AI.

Keith: Very cool.
So where can people go to try this out? Do you offer a free trial?

Matt: Yes, just go to StarburstData.com and sign up for a free trial of Galaxy. You get free credits to get started.

Keith: All right. Matt Fuller, thanks again for joining us on DEMO.
Matt: Thank you.

Keith: That's all the time we have for today's show. Be sure to like the video, subscribe to the channel, and leave your thoughts below. Join us every week for new episodes of DEMO. I'm Keith Shaw. Thanks for watching.