This is Part 2 of the NVIDIA Series. Part 1 is available here.


Words Become Numbers

Before a language model can process text, it has to convert every word into something a computer can calculate with: numbers.

"Tokyo" becomes something like [0.82, 0.14, 0.67, ...] — a list of several hundred numbers. "Osaka" becomes [0.79, 0.11, 0.71, ...] — similar, but not identical.

Why? Because once words are numbers, you can measure the distance between them. "Tokyo" and "Osaka" are numerically close — they're both Japanese cities. "Tokyo" and "apple" are numerically distant. This spatial representation lets the model reason about relationships between concepts without anyone writing explicit rules.

This process is called embedding — placing words on a vast, multi-dimensional map where similar concepts cluster together.
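The idea of "numerically close" can be made concrete with a few lines of NumPy. The vectors below are made-up four-dimensional toys (real models use hundreds or thousands of dimensions), and cosine similarity is one common way — not the only way — to measure distance on this map:

```python
import numpy as np

# Toy 4-dimensional embeddings. Illustrative values only, not from a real model.
tokyo = np.array([0.82, 0.14, 0.67, 0.05])
osaka = np.array([0.79, 0.11, 0.71, 0.08])
apple = np.array([0.10, 0.90, 0.05, 0.88])

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (similar concepts);
    # close to 0.0 means they are unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(tokyo, osaka))  # high: two Japanese cities
print(cosine_similarity(tokyo, apple))  # much lower: unrelated concepts
```

With these toy numbers, Tokyo–Osaka scores near 1.0 while Tokyo–apple lands far lower — the "clustering" the map metaphor describes.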

Reading the Room: How Attention Works

Here's the problem embedding alone doesn't solve.

The word "bank" means something different in "I went to the bank to deposit money" versus "we fished from the river bank." The same token, completely different meaning depending on context.

The solution is called Attention — arguably the most important algorithmic insight in modern AI.

When processing any word, the model simultaneously calculates the relevance of every other word in the sentence to that word. "Bank" near "deposit," "account," "interest" gets weighted toward financial meaning. "Bank" near "river," "fish," "current" gets weighted toward terrain.

The NFL analogy holds again here. Before a quarterback releases the ball, his eyes are processing every receiver simultaneously — tracking routes, reading the coverage, weighting probabilities in real time. He doesn't look at one receiver, decide, then look at the next. It's parallel assessment of the full field, resolved into a single throw.

Attention works the same way. Every word evaluates every other word at once. The result is a rich, context-sensitive understanding that makes language models dramatically better than anything that came before.
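A minimal sketch of that "every word evaluates every other word at once" step, using scaled dot-product self-attention in NumPy. The sizes are toy values, and real transformers first pass the embeddings through learned projection matrices to produce Q, K, and V — here we reuse the raw embeddings for simplicity:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Every token scores every other token in one matrix product,
    # then each token's output is a weighted blend of the value vectors.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (tokens, tokens) relevance grid
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))    # 5 tokens, 8-dimensional embeddings (toy sizes)
out, w = attention(X, X, X)    # self-attention: Q = K = V = X here
print(out.shape)  # (5, 8): each token is now a context-weighted mixture
```

The `scores` grid is the "bank near deposit vs. bank near river" calculation: one row per word, one relevance weight for every other word, all computed in a single parallel operation.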

Tensors: The Spreadsheet Behind the Magic

This is where the data structure enters the picture.

As words get converted to number lists, and those lists interact through Attention calculations, you end up with multi-dimensional arrays of numbers — rows, columns, depths. The technical name for these is tensors.

Think of a tensor as an extremely large, multi-dimensional spreadsheet. A sentence of twenty words, each embedded as a five-hundred-number list, produces a spreadsheet with twenty rows and five hundred columns. The Attention calculation multiplies this spreadsheet by itself, in various configurations, billions of times.
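The twenty-by-five-hundred spreadsheet above maps directly to tensor shapes in code. This sketch (with zero-filled placeholder values) shows the core shape arithmetic: multiplying the sentence tensor by its own transpose yields the twenty-by-twenty word-to-word score grid that attention uses:

```python
import numpy as np

# A 20-word sentence, each word embedded as a 500-number list,
# is a 2-D tensor: 20 rows x 500 columns.
sentence = np.zeros((20, 500))

# The core attention product multiplies this "spreadsheet" by its
# own transpose, giving a 20 x 20 grid of word-to-word scores.
scores = sentence @ sentence.T
print(sentence.shape, scores.shape)  # (20, 500) (20, 20)
```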

GPT-4-scale models are widely reported to have over a trillion parameters. Each inference — each response you receive — involves tensor operations at that scale. This is why the hardware matters so much. And this is why GPU architecture, which was designed for exactly this kind of parallel matrix multiplication, became the engine of the AI era.

The I-formation CPU processes one row at a time. The shotgun GPU processes every row simultaneously. For tensor operations, there is no comparison.
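The contrast can be sketched even on ordinary hardware, with NumPy's vectorized kernel standing in for the GPU. A Python loop handles one row at a time, the sequential style; a single batched matrix multiply hands the whole tensor to an optimized parallel routine. The matrices are random placeholders:

```python
import time
import numpy as np

X = np.random.default_rng(1).normal(size=(2000, 500))  # 2000 rows of data
W = np.random.default_rng(2).normal(size=(500, 500))   # one weight matrix

# Sequential style: process each row separately, one after another.
t0 = time.perf_counter()
rows = [x @ W for x in X]
loop_time = time.perf_counter() - t0

# Parallel style: one batched multiply over every row at once.
t0 = time.perf_counter()
batched = X @ W
batch_time = time.perf_counter() - t0

print(f"loop: {loop_time:.4f}s, batched: {batch_time:.4f}s")
```

Both compute identical results; the batched form is typically far faster because the work is dispatched as one large parallel operation — the same principle, at vastly larger scale, that makes GPUs the natural home for tensor math.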

The CUDA Lock-In: A Twenty-Year Moat

Understanding CUDA requires understanding what it replaced.

Before CUDA, programming a GPU required writing code in specialized graphics languages that were almost entirely inaccessible to non-graphics engineers. AI researchers couldn't use GPU power because the tools didn't exist.

NVIDIA changed this in 2006 by releasing CUDA — a software layer that let general-purpose programmers access GPU parallel processing using familiar programming languages. For the first time, an AI researcher could harness a GPU without becoming a graphics engineer first.

What happened over the next twenty years was compounding. Every major AI framework built itself on CUDA:

  • PyTorch — the dominant research framework at virtually every university and AI lab
  • TensorFlow — Google's production AI platform
  • Every GPU-accelerated scientific computing library in existence

Today, when an engineer sits down to train or deploy an AI model, they're not choosing between CUDA and something else. CUDA is the assumed environment. The question doesn't come up.

AMD has produced competitive GPUs for years. The hardware gap with NVIDIA has narrowed meaningfully. But market share in AI compute has barely moved. The reason is straightforward: switching away from CUDA would require rewriting or revalidating every piece of software a team relies on. Not one library. All of them. That's not a technical decision — it's an organizational one that no CTO wants to make.

NVIDIA's moat isn't the chip. It's the twenty years of accumulated institutional behavior built around the chip.

The Fabless Model: Designing the Playbook, Outsourcing the Field

One more piece of the investment thesis deserves attention.

NVIDIA employs about 35,000 people and generates roughly $130 billion in annual revenue. It owns no semiconductor factories: virtually all of its chips are manufactured by TSMC in Taiwan.

This fabless model — design in-house, manufacture externally — produces gross margins around 75% and operating margins above 60%. For context, Apple runs operating margins around 30%. Toyota runs about 10%.

The economics work because NVIDIA isn't really selling hardware. It's selling access to the CUDA ecosystem — and that ecosystem costs essentially nothing to replicate per unit sold once it exists. The R&D investment is sunk; every incremental sale flows almost entirely to profit.

TSMC runs the plays. NVIDIA collects the licensing fees on the playbook.

What This Means for Japanese Equity Investors

The companies best positioned to benefit from sustained AI demand are those supplying inputs that scale with compute volume — regardless of which AI company ultimately wins the application layer.

| Company | Role | AI Demand Link |
| --- | --- | --- |
| Tokyo Electron (8035) | Fab equipment | TSMC capex → direct orders |
| Advantest (6857) | Test equipment | AI chip complexity → testing demand |
| Shin-Etsu Chemical (4063) | Wafers, photoresist | Chip volume → material demand |
| SUMCO (3436) | Wafers | Chip volume → material demand |

These companies don't need to predict whether OpenAI or Google or a Chinese challenger wins the model race. They supply the infrastructure that all of them require. In that sense, they are the toll roads of the AI economy — and the traffic is only growing.


Part 3 — "Chips Are Approaching the Size of Atoms: Japan's Materials Moat" — is available here.


Source: Company IR materials and public filings | Japanese version

Disclaimer | This article is for informational purposes only and does not constitute investment advice.