What happens when you move AI from the cloud to your PC? In this episode of DEMO, host Keith Shaw visits Qualcomm HQ in San Diego to explore the power of AI at the edge. Jeff Monday, VP of Global Enterprise & Channel Sales at Qualcomm, gives an in-depth look at how the Snapdragon platform, featuring an onboard Neural Processing Unit (NPU), is enabling secure, low-latency code generation on device, without sending sensitive data to the cloud.

Watch a full demo of on-device code generation using the LLaMA 3 model and Visual Studio, learn how this benefits developers and enterprise teams, and find out how Qualcomm’s AI Hub and unified architecture are driving innovation across PCs, phones, and IoT.

Key topics include:
* What is an NPU and how it compares to a CPU/GPU
* Benefits of running AI on the device vs. the cloud
* Real-world use case with Citibank
* Qualcomm AI Hub & model deployment
* Future AI experiences (translation, presentation generation, and more)

Perfect for: Developers, CIOs, enterprise IT leaders, and AI enthusiasts.

Don’t forget to like, comment, and subscribe for more tech demos every week!

#Qualcomm #Snapdragon #EdgeAI #CodeGen #VisualStudio #NPUs #AIOnDevice #KeithShaw #DemoSeries

This episode is sponsored by Qualcomm.
Keith Shaw: Hi everybody, welcome to DEMO, the show where companies showcase their latest products and services. I’m here with Jeff Monday, Vice President of Global Enterprise and Channel Sales at Qualcomm. Jeff, welcome to the show. Jeff Monday: Hey Keith, thanks for having me.
Keith: You might notice we’re in a different studio—this is Qualcomm headquarters in San Diego. Thanks for hosting us. Jeff: Thanks for coming!
Keith: So you're going to show us something exciting today—we flew all the way here for this. Jeff: Absolutely. Full disclosure: I’m not a coder, I just play one on TV.
But I’m going to walk you through a code generation demo using AI at the edge—not in the cloud, but right on the device.
Keith: And there's a special chip that enables this, right? Jeff: There is. It’s called the NPU—Neural Processing Unit—designed and optimized for AI workloads. It’s a unique part of the Snapdragon architecture.
Keith: For those unfamiliar with NPUs, can you explain the difference between a CPU, GPU, and NPU? Jeff: Sure. A CPU is your general-purpose processor. A GPU handles graphics processing. An NPU is built specifically for AI workloads.
While some AI tasks can run on GPUs, they perform best on NPUs when properly optimized.
Keith: Early AI tasks were often offloaded to GPUs, right? Jeff: Exactly. But now, we’ve evolved to dedicating a chip—the NPU—for those tasks.
Jeff: And the performance leap is huge. The model I’ll show you today used to require multiple GPUs in a cloud server farm. Now it runs entirely on your local device—no internet required.
Keith: So what are the benefits of doing AI on the device rather than in the cloud or on a GPU? Jeff: Now that NPUs are mainstream, developers are building new experiences that are faster, more secure, and more private.
We’ll show a code generation demo, but this also applies to image creation, speech recognition, real-time translation—you name it.
Jeff: Microsoft, for example, is embedding on-device AI features in their Copilot+ suite. These run locally using Snapdragon chips, enabling features like semantic search and Recall—completely disconnected from the cloud. Snapdragon will be the first and best platform for those experiences.
Keith: Who is this demo primarily for? Jeff: Today’s demo is focused on developers and coders. For example, Citibank wanted to move AI workloads from the data center to the edge to reduce GPU congestion. Now, their developers can run code generation securely and privately on the device.
Keith: And that solves a lot of security and latency problems. Jeff: Exactly. When the code never leaves the device, it's more secure and private.
Jeff: The NPU enables a new class of efficiency. Take the Lenovo T14s with Snapdragon — it gets 22 hours of battery life. But when endpoint security software runs on the CPU or GPU, that drops to 4–6 hours.
By moving those workloads to the NPU, we maintain battery life and performance.
Keith: That’s great news for anyone working long hours. Jeff: Especially here at Qualcomm!
Keith: Let’s jump into the demo. Jeff: Perfect.
Jeff: I’ll open Visual Studio, a popular development platform. We’ve partnered with Microsoft to optimize it for NPUs. I’ll launch the Qualcomm AI runtime environment — an in-house code assistant used by 1,500+ engineers at Qualcomm, responsible for millions of lines of code.
Jeff: The model I’m using today is LLaMA 3, with 8 billion parameters. Three years ago, this was a cloud-only model. Now it runs on-device. I’ll start generating code and show you the Task Manager so you can see the NPU activity.
Keith: What’s the actual prompt you’re using for this demo? Jeff: I’m asking it to generate a Python script to analyze an Excel file for market share insights. It will clean the data, find correlations, and visualize the results.
It's the kind of mundane task developers don’t love doing — but now, they don’t have to.
Jeff: I’ll paste the prompt into a new Visual Studio file and run the model. You’ll see the NPU activating in Task Manager. None of this data is going to the cloud — it’s all processed locally, securely, and with minimal latency.
Jeff: The script identifies the file, cleans the data, analyzes for correlations, and plots the output. You can even configure it to export to Power BI or another visualization tool. Once I accept the code, it becomes executable.
Jeff: This kind of local code generation frees developers to focus on more creative work. Visual Studio also makes it easy to debug and correct small errors, so even if it’s not perfect, it’s much faster overall.
Keith: You mentioned earlier that many developers don’t want their code in the cloud. This solves that, right? Jeff: That’s exactly why we built it. We didn’t want our proprietary code ending up in hyperscaler models. On-device code generation protects intellectual property.
Keith: Qualcomm is known for mobile chips. Can this tech extend to phones and other devices? Jeff: Yes. Thanks to our unified architecture, the same AI models can run across PCs, phones, XR devices, and IoT.
You can even import your own models — ONNX or PyTorch—into the Qualcomm AI Hub and optimize them for NPU use.
Keith: That explains the value of an AI PC. It’s like when people first questioned AI on smartphones. Jeff: Exactly. But now we’re creating brand-new experiences — like automatically generating slide decks from whiteboard photos or conversation transcripts. Stay tuned for that in a future episode!
Keith: Here’s an idea—use real-time translation on a mobile phone or earbud when I’m traveling. Jeff: That’s already possible. We demoed it last year at Computex. It’s an incredible on-device capability.
Keith: Where can people go to learn more? Jeff: If you're an enterprise customer, contact your local Qualcomm rep. Otherwise, head to the Qualcomm AI Hub and start exploring the 175+ pre-optimized models available now.
Keith: Are systems with NPUs available now? Jeff: Yes. Since June, any device with a Snapdragon X, X Plus, or X Elite chip includes this capability — across all price points.
Keith: Jeff Monday, thanks again for the demo. Jeff: Thanks for having me. Keith: That’s all the time we have for this episode of DEMO. Don’t forget to like the video, subscribe to the channel, and leave your thoughts in the comments. I’m Keith Shaw — thanks for watching!