In this episode of DEMO, host Keith Shaw is joined by Matt Fuller, Co-Founder of Starburst, to explore how their data platform unifies access to distributed data, whether it's in the cloud, across regions, or still on-premises. You'll see a live demo of how Starburst enables:

- Fast, federated queries across multiple data sources (MongoDB, PostgreSQL, Salesforce, S3, and more)
- AI-ready data architecture with support for vector search and LLMs
- Simplified data product creation for analytics, apps, and AI agents
- Built-in governance tools like PII tagging and access control
- No need to move or duplicate data: query it where it lives!

Whether you're scaling AI, building data products, or modernizing legacy analytics, Starburst helps you access and activate your data faster.

Try Starburst for free: starburstdata.com

Like this video, share your thoughts, and subscribe for more tech product demos!

Timestamps:
00:00 – Welcome & Intro
00:17 – What is Starburst?
01:00 – Real-World Problems Solved
04:00 – Federated Queries Across Systems
06:30 – Creating Simplified Data Products
08:45 – Access Control & Governance Features
12:15 – LLM Integration & Vector Search Demo
15:00 – How to Get Started for Free
Keith Shaw: Hi, everybody. Welcome to DEMO, the show where companies come in and show us their latest products and platforms. Today, I'm joined by Matt Fuller. He is the co-founder of Starburst. Welcome to the show, Matt.

Matt Fuller: Thank you.
Keith: I love the name of the company; it's just a fantastic name. So, tell us a little bit about Starburst, and what you're here to show us today.

Matt: Yeah, absolutely. Starburst is an open data platform to power both data apps and AI.
You can think of it as a way to run queries whether you're using traditional BI tools like Tableau or Power BI. But we're also seeing a lot of usage by people who are trying to build data apps or power their AI applications.
Keith: All right, so I heard the word "data." I'm assuming this is designed for data scientists, data analysts, anybody dealing with data? Or is it something everyone in the company can benefit from?

Matt: Yeah, absolutely. Let's double-click a bit into what our platform is.
It’s based on an open-source project called Trino. When we founded the company back in 2017, we built it around Trino, which is a distributed SQL query engine that can connect to various data sources. That’s a core component of our product.
We've added a full set of capabilities around it to help data engineers and those building AI applications—what we sometimes call data producers—access and prepare data for the end consumers.
That consumer might be human-led, using tools like Tableau or Power BI, or machine-led—such as an AI agent or data application.
Keith: And what problems are you solving? Obviously, you wouldn't have started the company if there wasn't a problem to fix.

Matt: A few things. First, it's about getting to the data where it is. We're a powerful engine for querying data from data lakes.
But often the challenge is: how do you get the data into one place? We can augment that process by connecting to various sources. Especially with AI, the effectiveness of your model depends on the quality of the data it gets.
We think of ourselves as the fuel that powers AI. Also, we allow experimentation and production-level usage without forcing you to centralize everything. You can experiment quickly and move to production faster. Uniquely, we support both cloud and on-prem environments.
So we can reach across clouds, across regions, even back to on-prem.

Keith: Was the growth of cloud computing one of the main reasons data started getting so fragmented?

Matt: That's part of it. Companies moved to the cloud but still have legacy systems on-prem.
Some keep data on-prem due to privacy or security needs. In M&A scenarios, a company might acquire another with data in Google Cloud while they use AWS. There are lots of reasons for the fragmentation.

Keith: So, what kinds of problems would a company face if they weren't using Starburst?
Matt: They’d likely need more tools and would rely heavily on ETL processes to move everything to one place. ETL has its role, but we don’t make it a requirement. You can connect directly to the data source, experiment, and decide whether you want to move it later.
This gives you faster time-to-value with less complexity.

Keith: All right, let's jump into the demo and see what you've got.

Matt: Yeah, absolutely. Let me jump over here. In this demo, we're pretending to be an airline or travel company. We have data across different sources.
For example, we're connected to MongoDB for customer profiles, Postgres for ancillary purchases and loyalty programs, Salesforce for customer engagement, and an Icehouse location with bookings and historical flight data. These data sources could be across different regions: Postgres in AWS us-east-1, S3 in us-east-2, and so on.
Yet, we provide a centralized view.
Now, if you want to run queries, it can get tricky. I’ll use our UI for simplicity, but you could use Tableau, DataGrip, or others. In this query, I’m joining Mongo, Postgres, and bookings data.
You’ll notice I have to cast some column types and manipulate the data because I need to know where everything is. To simplify this, we created a concept called data products. This abstracts away complexity.
Your data engineers create these products, and end users just see unified tables—no need to know which data came from which source.
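The cross-source join and type casting Matt describes can be sketched in plain Python. This is an illustrative stand-in, not Starburst's API; the table names, field names, and values are all made up. Two in-memory collections play the roles of MongoDB and Postgres, and a `customer_360()` function acts as the "data product" that hides the join and the cast from consumers:

```python
# Illustrative sketch only: in-memory stand-ins for two data sources.
# All names and data here are hypothetical.

# "MongoDB" documents: customer profiles, with IDs stored as strings
mongo_customers = [
    {"customer_id": "101", "name": "Ada"},
    {"customer_id": "102", "name": "Grace"},
]

# "Postgres" rows: loyalty data, with IDs stored as integers
postgres_loyalty = [
    (101, "gold"),
    (102, "silver"),
]

def customer_360():
    """A 'data product': hides the cross-source join and the type cast."""
    loyalty_by_id = {cid: tier for cid, tier in postgres_loyalty}
    return [
        {
            "customer_id": int(c["customer_id"]),  # cast string -> int to join
            "name": c["name"],
            "loyalty_tier": loyalty_by_id.get(int(c["customer_id"])),
        }
        for c in mongo_customers
    ]

print(customer_360())  # consumers just see one unified table
```

The point mirrors the transcript: the engineer who builds `customer_360()` deals with source locations and mismatched types once, and every downstream user queries a single clean view.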
We also have a powerful search function. For example, I can search for “Customer 360,” and it pulls up everything tied to that view—bookings, ancillary purchases, customer service, etc. It’s more than a catalog; it includes access control and query processing.
We also support tagging sensitive data like PII. For example, if a data product contains customer names or emails, we can mask those fields depending on the user's role. Let me demonstrate.
I’ll switch from a data engineer role to an analyst role. Running the same query now masks the PII fields automatically. That’s because we have a built-in access control system. In this case, a policy hashes data tagged as PII—no data copying needed. You apply governance policies in place.
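The role-based masking Matt demonstrates can be sketched as a small policy function. This is not Starburst's implementation; the roles, tag set, and row data are hypothetical, and a truncated SHA-256 hash stands in for whatever masking function a real policy would use:

```python
# Illustrative sketch: hash columns tagged as PII depending on the user's role.
# Roles, tags, and data are hypothetical, not Starburst's actual policy model.
import hashlib

PII_TAGS = {"name", "email"}  # columns tagged as PII in this toy schema

def mask(value: str) -> str:
    """Replace a sensitive value with a stable, irreversible hash."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

def apply_policy(row: dict, role: str) -> dict:
    """Data engineers see raw values; analysts get PII fields hashed in place."""
    if role == "data_engineer":
        return row
    return {k: mask(v) if k in PII_TAGS else v for k, v in row.items()}

row = {"name": "Ada Lovelace", "email": "ada@example.com", "flights": 12}
print(apply_policy(row, "analyst"))        # name/email hashed, flights intact
print(apply_policy(row, "data_engineer"))  # unchanged
```

Because the policy is applied at query time, the same stored row serves both roles, which is the "no data copying" point from the transcript.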
And that’s one of the strengths of data products. Instead of data swamps with mismatched subsets, you get trusted, curated data assets with fine-grained control.
Now, behind the scenes, all of this runs on Trino clusters. Our enhanced version of Trino acts as the SQL processing engine: it pushes down parts of queries to underlying systems like Mongo or Postgres but does much of the processing itself.
We also support data quality checks. For instance, if the “delay reason” column in our flight history data is null more than 10% of the time, we flag that. In this case, it passes the check.
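The null-ratio check described here is simple to express. The 10% threshold and the "delay reason" column come from the transcript; the sample values are fabricated, and this sketch only illustrates the rule, not Starburst's quality-check feature:

```python
# Illustrative sketch of a null-ratio data quality check.
# Threshold matches the transcript's example; the data is made up.

def null_ratio_check(values, threshold=0.10):
    """Pass if the share of NULL (None) values is at or below the threshold."""
    nulls = sum(1 for v in values if v is None)
    ratio = nulls / len(values)
    return ratio <= threshold, ratio

delay_reason = ["weather", "crew", None, "weather", "mechanical",
                "weather", "crew", "weather", "mechanical", "weather"]
passed, ratio = null_ratio_check(delay_reason)
print(f"passed={passed} null_ratio={ratio:.0%}")  # 1 null out of 10 -> passes
```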
Next, I want to show some new functionality. We now support vector search and invoking LLM functions via SQL. Typically, you’d need a vector database, but we store vector embeddings directly in Iceberg tables in the data lake. This simplifies things and supports retrieval-augmented generation (RAG) pipelines.
Let’s say I want to find delays caused by “heavy rain.” I run a semantic search over the delay reason field, retrieve the relevant records, and then feed the results into an LLM prompt using standard SQL.
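The retrieval step of that flow can be sketched with cosine similarity over embeddings. A real pipeline would use an embedding model and the engine's vector search; here the three-dimensional vectors are fabricated by hand, and the final prompt string is a hypothetical example rather than the demo's actual query:

```python
# Illustrative sketch of semantic retrieval + prompt assembly for RAG.
# Embedding vectors are fabricated; real systems use an embedding model.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# delay reasons paired with made-up embedding vectors
records = [
    ("heavy rain over hub airport", [0.9, 0.1, 0.0]),
    ("crew scheduling conflict",    [0.0, 0.2, 0.9]),
    ("thunderstorms on route",      [0.8, 0.3, 0.1]),
]

query_vec = [0.95, 0.15, 0.05]  # pretend embedding of "heavy rain"
best = max(records, key=lambda r: cosine(query_vec, r[1]))

# feed the retrieved record into an LLM prompt (the RAG step)
prompt = f"Explain this flight delay given the reason: {best[0]}"
print(prompt)
```

The semantically closest record ("heavy rain over hub airport") is retrieved even though the query never matches it word for word, which is what distinguishes this from a plain text filter.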
The model analyzes the flight data and provides a possible explanation based on the route and weather patterns. It's a powerful way to integrate your data infrastructure directly with generative AI.

Keith: Very cool.
So where can people go to try this out? Do you offer a free trial?

Matt: Yes, just go to StarburstData.com and sign up for a free trial of Galaxy. You get free credits to get started.

Keith: All right. Matt Fuller, thanks again for joining us on DEMO.
Matt: Thank you.

Keith: That's all the time we have for today's show. Be sure to like the video, subscribe to the channel, and leave your thoughts below. Join us every week for new episodes of DEMO. I'm Keith Shaw. Thanks for watching.