Tue. Apr 14th, 2026

Nvidia’s Stephen Jones on the toolkit powering GPUs: ‘A wild ride’


Nvidia CEO Jensen Huang often shares the story of hand-delivering an AI supercomputer to OpenAI in 2016, back before it was the hotshot company it’s become in recent years.

A key ingredient in the box was Nvidia’s CUDA toolkit, which helped turn OpenAI’s experiments into a foundation for modern AI applications. Huang credits the software platform as the bedrock of Nvidia’s success in AI and high-performance computing.

At 20 years old, CUDA is still going strong. It’s driving Nvidia’s push into new areas that range from quantum computing to robotics, industrial machinery and even autonomous vehicles.

Put simply, the CUDA toolkit includes programming tools, a compiler stack and libraries, effectively unlocking the computing capacity of GPUs. For nearly two decades, CUDA architect Stephen Jones has had a front-row seat to the toolkit’s evolution, and he’s continuing the work of engineers such as the late John Nickolls, who championed CUDA’s development.
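To make the toolkit concrete: this is not code from the interview, just a minimal illustrative sketch of what a CUDA C++ program looks like. A function marked `__global__` (a "kernel") runs across thousands of GPU threads at once, and the host launches it with the `<<<blocks, threads>>>` syntax.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread computes one element of the output array.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Unified (managed) memory is accessible from both CPU and GPU.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // 1.0 + 2.0 = 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The same pattern, one thread per data element, scales from toy examples like this to the matrix workloads behind modern AI.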

Computerworld sat down recently with Jones to talk about CUDA, AI, and the future.

CUDA architect Stephen Jones has been at Nvidia for nearly two decades overseeing and shaping the CUDA toolkit’s evolution.

It’s been 20 years of CUDA. How’s it been? “A wild ride. When you build something and people do stuff with it that you had never thought of, that’s the true reward of engineering.

“We started out building a tool good at parallel programming… [and it] turns out everybody needs parallel programming. What’s ChatGPT at now? Like one-third of the world population. I wasn’t one of the founders of CUDA. I joined after CUDA 1.0…. I was in the first dozen and it’s amazing.

“And now it’s used around the world every day: everyone with a ChatGPT account has touched it. The best engineering is invisible, right?”

Did you know AI was coming before the hardware was developed? And what was the thought process at the programming level to get ready for it? “The watershed moment for AI was AlexNet in 2012. A niche mathematical subject suddenly beat humans and became interesting and powerful. 

“At that point we already had people inside Nvidia working on what was going to become cuDNN, because it was looking interesting. That watershed moment really just shifted the gears and opened this whole new avenue of things we could support.

“I worked very hard in CUDA to build as general a base as possible, because I don’t know what’s coming. Even just in AI, it changes every six months. But AI is just one piece. There’s supercomputing, there’s robotics and machinery control.

“A huge goal for us is to not build single-purpose tools [and] make sure that what we build can be applied generally. And when we spot someone doing something really interesting, we expand in that direction.

“Especially now, the explosion of the things people are doing — there’s always new stuff around the corner.”

How did CUDA evolve? “We announced it in 2006; 1.0 officially was early 2007. When we first were building it, barely anything worked. The hard part was figuring out what we could enable to make it at least a bit useful. You have to get feedback and have people trying it as you build things.

“My first week at Nvidia, they said, ‘Go write a program in CUDA.’ And I said, ‘How do I debug it? Where’s my printf?’ And they said there is no printf. So I wrote printf. The most useful thing I’ve ever done at Nvidia was writing printf in my first week. Because you need to debug your stuff.
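Device-side `printf` is still part of CUDA today. As a hypothetical illustration of why Jones calls it so useful, each GPU thread can print its own coordinates, which is often the fastest way to see what a kernel is actually doing:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Every thread reports which block and thread it is.
__global__ void debugKernel() {
    printf("hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    debugKernel<<<2, 4>>>();   // 2 blocks x 4 threads = 8 lines of output
    // Device printf output is buffered and flushed when the host synchronizes.
    cudaDeviceSynchronize();
    return 0;
}
```

Because thousands of threads run concurrently, the lines arrive in no guaranteed order, which is itself a useful first lesson in parallel debugging.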

“The hardest part of my job is that nobody ever publishes failure. There are all these papers saying, ‘I accelerated my code by 100x with CUDA,’ but what was hard? When I see half a dozen people all doing the same thing, it says we’re missing something.

“So, we’ve transitioned from how to get it functional enough to be broadly useful to what directions do we start extending it in? That was a real mental shift for us, helped by the exponential adoption that was happening.

“If we build things so people have to write less CUDA, then we have succeeded. You can get to the GPU with less effort and fewer lines of code.”
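One example of what “write less CUDA” can look like in practice (my illustration, not Jones’s): Thrust, a library shipped with the CUDA toolkit, lets a single call replace a hand-written reduction kernel and its launch configuration.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>

int main() {
    // One million (1 << 20) elements on the GPU, all set to 1.
    thrust::device_vector<int> v(1 << 20, 1);

    // A single library call runs an optimized parallel reduction on the
    // GPU: no kernel code, no block/thread sizing, no synchronization.
    int sum = thrust::reduce(v.begin(), v.end(), 0);

    printf("sum = %d\n", sum);  // 1 << 20 = 1048576
    return 0;
}
```

Getting to the GPU in a dozen lines rather than a hand-tuned kernel is exactly the “fewer lines of code” trade-off described above.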

Do you build CUDA first and then decide the GPU design, or is there cooperation between the two? “As one of the CUDA architects, literally half my job is working with the hardware team. The thing that Nvidia has really done magnificently well is the co-design between the hardware and the software.

“When the hardware [team] says, ‘Here are some things we’re thinking of building,’ we from CUDA are in the room saying, ‘Make these small changes so that we can project it outwards in software.’

“Or we’ll say, our users are lacking this thing… can we build something in hardware to make this faster? It is literally half my hours in any week talking to hardware people and half to software people.

“A chip takes about four years to build, and we are there for all those four years. We’ve got Rubin sort of around the corner, but there are ones beyond that that haven’t even been named, and we’re already working on those. 

“There is no ‘build it and then adapt the software’ to it. We really build it holistically, trying to think of the whole thing.”

Clearly, you needed to know AI was coming to make sure GPUs were ready for it. And AI is a whole new style of computing. How did you change the CUDA stack? “The secret of CUDA is that it’s not one thing, right? It’s like this massive stack of hundreds and hundreds of things…. You can pick and choose which one you want to use. 

“It’s also all the things that everybody else builds on top of it. We can support them, but we couldn’t do anything without all the people who are building on top of it.

“What’s fascinating to me about AI and HPC is that they’re really built on the same heavy computational basis. Two different angles of viewing the same thing. All the tools we built for supercomputing applied to the AI world. And the lessons in AI are now being applied back to supercomputing. It’s going both ways.

“Classical von Neumann computing is that logical, repeatable, systematic computing that’s been around for 50 years. AI is probabilistic. It’s not repeatable. It’s approximate. 

“Your LLM isn’t necessarily producing the same tokens two or three times in a row, whereas I definitely want my bank statement to produce the same numbers.

“The software and hardware teams work closely together, so we can see this coming. The emphasis on matrix workloads in AI is pushing up against the laws of physics, Moore’s law, information theory. It drives reductions in precision.

“I wouldn’t say it was predictable because we didn’t know what the models would look like, but the things the models need to function, that’s governed by the laws of physics. A lot of it is reacting to the constraints the world puts on us to take advantage of this AI universe that just opened up.”

Are you seeing the gap narrow between when CUDA features come out and when enterprises actually adopt them? “With any engineering tool, especially a platform as big as CUDA, different people pick it up at different rates. Academic researchers are on something the day we put it out. Established companies who are more risk-averse take more time.

“But you can easily imagine people who have established businesses and established software stacks don’t want to pull something in right away. And it’s more expensive for them. If you’ve got a million lines of code, adopting something new is much harder than if you’ve got 10,000 lines of code.

“So any feature we put out, we see early adopters, mid-adopters and late adopters. We have a chance to evolve it. It’s not static, we are definitely reacting and responding.

“In some ways, the time for the big players to adopt things is reducing. It’s part of that incredibly rapid change that AI is undergoing. We make heavy use of AI tools internally ourselves, so that helps us keep up. Everybody is using these tools that get invented to make more tools. The way that software is developed is even shifting in this day and age.

“It’s not like we build a thing and throw it over a wall. We’re building something in response to what people need, and as we build the tools, people can do more and there’s a feedback loop — the same feedback loop that drove how we developed CUDA in the early days. We’re seeing it show up in AI, we’re seeing it show up in quantum.

“People are developing quantum algorithms on GPUs today before quantum computers are ready, so that when the quantum computer technology catches up, they’re not starting from scratch.”

Keeping in mind that you track the future of computing for CUDA, what excites you the most? “We talked about how AI is opening new doors that have never been opened before. I was at Stanford a few months ago talking to some brilliant undergrads and graduate students about their startup ideas. Seeing the energy they have for what they want to do, I realized I was envious of where they are.

“I came of age in the ‘90s, when the internet was just showing up. That was fascinating, all these things you could do with the .com. But this is an even bigger step. Everybody coming out of college now has a million choices. There’s all these things to explore that nobody’s even thought of before.

“What they do is going to feed back and change what we build as the underlying platform. I wish I could live to see all the things that they’ll do, but they’re younger than me. 

“The next 10 years is going to be wild. I’ve had a great journey, don’t get me wrong, but my God, I am envious of a 25-year-old. It’s unlimited what you can do. I wish I was 25 again.”
