Side Story (Lower): The Truant — China's Application Revelry and Low-Level Fracture

1:00 AM, Shanghai. The lights are on in the office of a mobile game studio. The numbers on the screen are ticking upwards—their new gacha game has been online for less than twenty-four hours, and first-day gross revenue has broken 100 million RMB. The product manager pops open a bottle of champagne. The operations director is already calculating the second week's paid conversion rate. The art team is rushing to produce the character illustrations for the next limited banner. The entire company is like a precision money-printing machine, every gear spinning at breakneck speed.

In the same time zone, Shenzhen. Inside an R&D building at Huawei, several engineers stare at red error messages on a terminal, their expressions grave. They are trying to get a server equipped with a domestic Ascend 910B AI chip to run the latest version of an open-source large language model from GitHub. The model's code is written in PyTorch. PyTorch calls CUDA underneath. Their chip is not from NVIDIA.

Error messages, line after line.

The most profitable applications and the most powerless foundation coexist within the same nation. This is not an accident. This is the settlement of a complete chain of causality stretching back thirty years.


I. The Zenith of the Spreadsheet Soul

If Google in the previous side story was the prototype of the spreadsheet soul, then China's internet giants are its ultimate form.

Tencent, NetEase, ByteDance—these three companies combined control a massive chunk of the global mobile gaming market. They did not win by making the best games. They won by making the most precise monetization machines.

The pity timers in gacha mechanics, the psychological pressure of event countdowns, the social comparison driven by rankings, the anchoring effect of first-time top-up bundles—the people standing behind these things are not game designers, but behavioral psychologists and data scientists. The exact moment a pop-up appears, the magnitude of a limited-time discount, the color and size of the "Draw Again" button—every detail is precisely calibrated through A/B testing.

The results are astounding. A successful Chinese mobile game can generate over 100 million RMB in revenue on its first day online. That exceeds the first-week global sales of many Western AAA titles, products polished for three to five years by teams of hundreds.

But what these companies do has a fundamental difference from the builders of tech hegemony described in every chapter of the main text.

Microsoft used DirectX to unify the PC gaming graphics interface—this drove the evolution of GPU hardware. NVIDIA used CUDA to stuff general-purpose compute cores into gaming graphics cards—this birthed the entire AI computing industry. TSMC honed extreme yields on the massive silicon of gaming GPUs—this paved the way for the mass production of AI chips. Every player in the main text, no matter how selfish their motives or how dirty their methods, at least objectively pushed some form of low-level technological progress.

China's gaming giants have not pushed any low-level technology.

Their games run on chips designed by others (Qualcomm's Snapdragon, Apple's A-series), fabricated in factories built by others (TSMC, Samsung), built using engines developed by others (Unity, Unreal), and operate on operating systems established by others (Android, iOS). From chip to engine to OS, not a single link in the entire technology supply chain is China's own.

This is not because Chinese engineers aren't smart enough. It's because making money was too easy.

When you can use thirty people, spend six months, make a reskinned gacha game, and break 100 million in first-month revenue—why would you use three hundred people, invest ten years, and build a low-level framework with no immediate return? When venture capital money floods like water into bike-sharing, community group buying, and short-video platforms—why would capital be invested in semiconductor R&D, where the return cycle is ten years and the success rate is abysmal?

In the NVIDIA story told in Chapter 7, Jensen Huang started stuffing CUDA cores into GeForces in 2006. He waited a full six years before AlexNet found a use for CUDA. For six years, that R&D investment looked like pure waste on a financial statement.

China's capital markets do not allow six years of waste. China's internet companies do not allow foundational investments that show no return for ten years. Their systems—quarterly reviews, KPI-driven goals, investor pressure after going public—are, like Google's OKR system, structurally incompatible with long-term, low-level R&D.

So China did one thing: it built a skyscraper piercing the clouds on someone else's foundation.

The skyscraper is full of the world's most generous mobile gamers. Every floor of the skyscraper is equipped with the world's most precise paid conversion engines. From the outside, the skyscraper is magnificent.

But the foundation isn't its own.


II. The Fracture

In October 2022, the US Department of Commerce issued new export control regulations. Simply put, they prohibited the sale of advanced AI chips and advanced semiconductor manufacturing equipment to China. NVIDIA's A100 and H100 could no longer be sold. ASML's EUV (Extreme Ultraviolet) lithography machines, already withheld from Chinese buyers since 2019, stayed out of reach. TSMC and Samsung could no longer fabricate advanced-process AI chips for Chinese clients.

Overnight, the foundation cracked.

But the degree of the cracking was not uniform. Some places cracked shallowly, while others cracked all the way to the root.

Where it cracked shallowly: Mature Processes.

The chips needed for electric vehicles, home appliances, and industrial controllers mostly remain on 28nm or older processes. These processes do not require EUV lithography machines; they can be made with traditional DUV (Deep Ultraviolet) lithography. SMIC (Semiconductor Manufacturing International Corporation) frantically expanded production in this sector, using state subsidies to suppress prices and flooding the global market with the output. A 28nm chip is not advanced, but it is the blood of modern industry: there's one inside every air conditioner, every electric vehicle, every router.

On this front, China achieved self-sufficiency. Not the best, but good enough.

Where it cracked to the root: Advanced Processes.

In August 2023, Huawei released the Mate 60 Pro without any warning. After tearing it down, semiconductor analysts worldwide gasped: the Kirin 9000S chip inside was manufactured by SMIC using a 7nm-class process.

Under the condition of being banned from using EUV lithography machines.

The method SMIC used is called multiple patterning—splitting a single layer of circuitry into multiple exposures, repeatedly stacking them using lower-precision DUV machines, ultimately achieving a result close to 7nm. This is a brute-force crack. It works. But the cost is immense.
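
As a rough illustration of the pitch-splitting idea, the sketch below assigns the lines of one dense layer to several masks in round-robin order, so that each individual exposure sees a looser pitch. The 40 nm target pitch and the 80 nm single-exposure limit are hypothetical round numbers, not SMIC's actual process parameters, and the function names are invented for this example. Real multiple patterning involves far more than this (overlay error, two-dimensional layouts, self-aligned schemes).

```python
def split_into_masks(line_positions_nm, n_masks):
    """Assign sorted line positions to masks in round-robin order."""
    ordered = sorted(line_positions_nm)
    return [ordered[i::n_masks] for i in range(n_masks)]


def min_pitch(positions_nm):
    """Smallest spacing between neighbouring lines on one mask."""
    gaps = [b - a for a, b in zip(positions_nm, positions_nm[1:])]
    return min(gaps) if gaps else float("inf")


# Hypothetical numbers, for illustration only.
TARGET_PITCH_NM = 40            # pitch the finished layer needs
SINGLE_EXPOSURE_LIMIT_NM = 80   # tightest pitch one exposure can print

# Eight parallel lines at the target pitch: 0, 40, 80, ... nm.
layer = [i * TARGET_PITCH_NM for i in range(8)]
masks = split_into_masks(layer, n_masks=2)

for index, mask in enumerate(masks):
    printable = min_pitch(mask) >= SINGLE_EXPOSURE_LIMIT_NM
    print(f"mask {index}: lines at {mask} nm, "
          f"pitch {min_pitch(mask)} nm, printable={printable}")
```

Each extra mask means additional exposure and etch passes for the same layer, and every pass is another chance for misalignment. That is where the cost described above comes from.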

There is a core concept in Chapter 8: TSMC's advanced processes are nurtured "by going with the flow." Apple's initial launch validation, gaming GPUs' stress testing, console SoCs' long-term orders: each client brings different technical demands, driving TSMC to raise yields step by step under real production conditions. TSMC's 7nm yield reached the level of commercial mass production because the process had been honed countless times across dozens of clients and hundreds of chip designs.

SMIC does not have these clients. Its advanced process orders come almost entirely from one company: Huawei. It has no small-die high-yield validation from Apple, no massive-die extreme stress testing from NVIDIA, no long-term stable console orders from Sony and Microsoft. It is brute-forcing yields upward while under blockade, using state subsidies, with extreme client homogeneity.

The result: The Kirin 9000S can ship, but yields are extremely low and costs are extremely high. The number of passing chips cut from a single wafer is far below the level of TSMC's equivalent process. The true cost of every Kirin 9000S—including the loss of defective units—might be several times that of TSMC's equivalent chip.
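
The arithmetic behind "several times" can be sketched as follows. Every number here is a hypothetical placeholder chosen to keep the calculation easy to follow; this text does not give actual wafer prices, die counts, or yields for SMIC or TSMC.

```python
def cost_per_good_die(wafer_cost, dies_per_wafer, yield_rate):
    """Spread the wafer cost over only the dies that pass testing."""
    good_dies = dies_per_wafer * yield_rate
    return wafer_cost / good_dies


# Hypothetical inputs, for illustration only.
DIES_PER_WAFER = 500

# A well-honed line: cheaper wafer, most dies pass.
established_line = cost_per_good_die(
    wafer_cost=10_000, dies_per_wafer=DIES_PER_WAFER, yield_rate=0.80)

# A brute-forced line: extra patterning steps raise the wafer cost,
# and far fewer dies pass.
brute_force_line = cost_per_good_die(
    wafer_cost=15_000, dies_per_wafer=DIES_PER_WAFER, yield_rate=0.30)

print(f"established line: {established_line:.2f} per good die")
print(f"brute-force line: {brute_force_line:.2f} per good die")
print(f"cost ratio:       {brute_force_line / established_line:.1f}x")
```

The point of the sketch is that the two penalties multiply: a more expensive wafer divided by fewer passing dies pushes the cost per usable chip up much faster than either factor alone.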

This is not a victory of the market. This is a battle for survival, using state subsidies to fill the thirty-year gap of skipping hardware classes.

Furthermore, even if Huawei managed to fabricate the chip, the story isn't over. Because there is an even deeper wall.


III. The Chinese Version of the Software Prison

The Kirin 9000S is a smartphone chip. A smartphone chip's software ecosystem is relatively simple—Android OS, applications, drivers—Huawei can rely on its own HarmonyOS to bypass Google's software blockade. Difficult, but feasible.

AI chips are another matter entirely.

Huawei's Ascend 910B is an AI training chip. On pure hardware specs, Huawei claims that in certain specific training tasks its performance approaches NVIDIA's A100, and in some cases exceeds it by about 20%. The subsequent 910C and 910D take aim straight at NVIDIA's H100.

But between "approaching in hardware compute" and "can be used as a replacement," there is an invisible abyss.

The name of that abyss is CUDA.

Chapter 7 discussed CUDA's four-layer lock. Now let's apply China's situation:

Layer One — Hardware Instruction Set: The Ascend chips use Huawei's own Da Vinci architecture, completely different from NVIDIA's PTX/SASS. This means all code compiled for NVIDIA GPUs cannot run on Ascend. It must be recompiled.

Layer Two — Compute Libraries: NVIDIA has cuDNN, cuBLAS, NCCL—each one the result of hundreds-strong teams spending ten years deeply optimizing for NVIDIA hardware. Huawei's equivalent is CANN (Compute Architecture for Neural Networks). CANN's feature coverage and performance maturity are more than a generation behind CUDA's libraries.

Layer Three — Framework Binding: AI researchers worldwide write code in PyTorch. PyTorch calls CUDA by default under the hood. Huawei developed its own AI framework, MindSpore, and also built an adaptation layer to allow PyTorch to run on Ascend. But compatibility issues with the adaptation layer pop up endlessly—the red error messages faced by those Shenzhen engineers at the beginning of this chapter are problems originating from this layer. Hidden inside the code of every open-source model are dozens of implicit CUDA dependencies. On the surface, you just change one line to device = 'npu', but in reality, the underlying library calls, memory management, and compute precision handling are fraught with traps.

Layer Four — Knowledge Binding: China's AI engineers learned CUDA in university. The textbooks they read use CUDA for examples. The Stack Overflow pages they search use CUDA to answer questions. Every open-source project they fork on GitHub runs on NVIDIA GPUs by default. Asking them to switch to Huawei's CANN and MindSpore is akin to asking them to forget the language they know and learn a completely new one from scratch.
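
To make Layer Three concrete, here is a minimal sketch of a scanner that flags the kind of hard-coded CUDA assumptions that survive a one-line device change. The sample script and the pattern list are illustrative assumptions, not an exhaustive audit tool. The flagged calls are ordinary PyTorch usage, shown here only as text so that the sketch runs without PyTorch installed.

```python
import re

# Patterns that typically indicate a hard-coded NVIDIA assumption.
# This list is an illustrative assumption, not a complete inventory.
CUDA_PATTERNS = {
    "explicit .cuda() call": re.compile(r"\.cuda\("),
    "torch.cuda namespace": re.compile(r"\btorch\.cuda\."),
    "hard-coded 'cuda' device string": re.compile(r"""["']cuda(:\d+)?["']"""),
    "NCCL communication backend": re.compile(r"""["']nccl["']"""),
}

# Hypothetical training script in which only the first line was ported.
SAMPLE_SOURCE = '''\
device = "npu"
model = build_model().to(device)
batch = batch.cuda()
torch.cuda.synchronize()
scaler = torch.cuda.amp.GradScaler()
dist.init_process_group(backend="nccl")
'''


def find_cuda_dependencies(source):
    """Return (line_number, label, line_text) for every suspicious line."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in CUDA_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label, line.strip()))
    return findings


for number, label, text in find_cuda_dependencies(SAMPLE_SOURCE):
    print(f"line {number}: {label}: {text}")
```

In this toy script the device line is already ported, yet four of the remaining five lines still assume NVIDIA hardware. A text scan like this only finds the visible cases; differences in memory management and numerical precision do not show up as strings at all.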

The concluding sentence of Chapter 7 in the main text is: "CUDA is not an API. CUDA is a prison. And you have lived inside it since the day you enrolled in school."

China's AI engineers face the exact same prison. The difference is: NVIDIA's ecosystem grew naturally over twenty years through the work of tens of millions of gamers and developers worldwide. The alternative ecosystem China is trying to build is being brute-forced by administrative directives under the desperate straits of sanctions.

The difference between these two ecosystems is not a difference of quantity. It is a difference of quality.

A naturally grown ecosystem has roots—every one of its nodes is actively chosen by someone. Someone published a paper using CUDA, someone built a product using CUDA, someone founded a company on CUDA. Cut off any single node, and the other nodes remain.

An ecosystem built by administrative directives has no roots: every one of its nodes is mandated. The moment the directives loosen, the moment subsidies decrease, the moment NVIDIA finds a way to keep selling within the limits of the sanctions (such as releasing the cut-down H20 chip specifically for the Chinese market), developers will run back to CUDA's embrace overnight.

In fact, this is already happening. In 2024, when NVIDIA lowered the price of the H20 to roughly the same level as the Ascend 910B, many Chinese AI companies resumed purchasing NVIDIA chips. Not because they don't support domestic products—but because code written in CUDA doesn't need to be changed, while code written in CANN has to be rewritten.

Engineers are rational. The rational choice is always the path of least resistance. And CUDA is the path of least resistance.


IV. The Bill for Skipping Class

Now we can string the causal chain together.

The core argument of the main text is: The money and demands of gamers inadvertently nourished the foundational infrastructure of the entire tech hegemony. From DirectX to CUDA to TSMC's advanced processes—every brick has the fingerprints of gaming on it.

China's negative example perfectly validates this argument.

China has the most profitable game companies in the world. But not a single cent of the money these companies made flowed to foundational technology. Their games run on imported chips, are made with imported engines, and are sold on imported operating systems. They only did one thing: set up shop on a stage built by someone else, using the most precise psychological tools to pick the pockets of players.

When America said, "You can no longer use this stage," China suddenly realized: Beneath its skyscraper in the application layer, it had no foundation.

This wasn't caused by sanctions. Sanctions only caused the bill for skipping class to arrive early.

The real cause is hidden thirty years ago. When Jensen Huang decided to found NVIDIA in a Denny's diner, Chinese capital was pouring into real estate. When Morris Chang was building TSMC's first factory in Hsinchu, China's manufacturing sector was doing OEM assembly. When CUDA was born in 2006 and university students worldwide started writing parallel computing programs on NVIDIA graphics cards, China's smartest graduates were going to Tencent and Alibaba—to make social software, e-commerce platforms, and mobile games.

Every step was rational. The spreadsheets for every step looked fantastic.

But every step skipped the exact same thing: the foundation.

The foundation of hardware—the ability to independently design high-performance chips. The foundation of software—the ability to independently build developer ecosystems. The foundation of manufacturing—the ability to produce advanced process chips without foreign equipment.

The main text of this book spent three chapters explaining how these three foundations were built: Chapter 7 (CUDA's software ecosystem), Chapter 8 (TSMC's manufacturing capability), and Chapter 4 (NVIDIA's accumulation of hardware design from the brink of death to its rise). Building each of these foundations took at least fifteen years. In the process of building each one, gaming played a critical role, not as an end but as a stress-test arena and a funding pool.

China skipped this process. It started directly from the application layer. Using the shortest time, the highest efficiency, and the most precise monetization designs, it built the world's most profitable digital entertainment empire.

And then discovered the empire's foundation belonged to someone else.


V. Settlement

There is a question worth asking: Can China catch up on its missed classes?

The answer is: Technically, yes, but chronologically, there might not be enough time.

Huawei's Ascend chips are approaching NVIDIA in hardware compute generation by generation. SMIC has achieved self-sufficiency in mature processes and brute-forced its way to 7nm-class advanced processes. Baidu's Kunlun chips and Alibaba's Hanguang chips are both attempts, to varying degrees, at building alternatives.

But Chapter 8 of the main text stated a cruel truth: The reason you can't catch up isn't the technological gap; it's the trust gap. TSMC's moat isn't any single technology—it is thirty years of accumulated trust from the global fabless ecosystem. You cannot catch up with a slide deck. You cannot catch up in a single quarter.

The same logic applies to the software ecosystem. CUDA's moat isn't the performance of cuDNN—it is every university in the world, every GitHub project, every textbook, and the muscle memory in the brain of every AI engineer. You cannot catch up with administrative directives. You cannot catch up with subsidies. You can only catch up with time—and during the ten years you spend chasing, NVIDIA is not standing still either.

China faces a dual catch-up problem. On the hardware end, it is chasing TSMC—a company that invests over thirty billion dollars in capital expenditures annually and has thirty years of accumulated process experience. On the software end, it is chasing CUDA—an ecosystem naturally grown by tens of millions of developers globally.

Both chases must happen simultaneously. Both targets are moving. And the pursuer is being choked at the most critical supply chains—it cannot buy advanced lithography equipment, it cannot buy the newest NVIDIA chips, it cannot use the most advanced process foundry services.

This is the price of skipping class.

You can copy a highly profitable mobile game in a few months. You can use state capital to brute-force a spike in chip production capacity for a certain process node within three years. But you cannot use money or administrative orders to buy overnight a foundational ecosystem jointly woven by tens of millions of engineers and developers over twenty years.

The core pattern of the main text is: Convenience attracts users → Lock-in binds developers → Monopoly collects rent. The cost only materializes twenty or thirty years later.

China's case is the mirror image of this pattern: However much hard work you skipped in building your foundation back then, history will demand you spit it back out with interest. The cost, similarly, only arrives twenty or thirty years later.

And the bill has arrived.

Back to the opening. The champagne at that mobile game studio in Shanghai is still bubbling. The error messages in that R&D building in Shenzhen are still scrolling. The distance between these two scenes, from the prosperity of the application layer to the fracture at the foundation, is not geographical. It is the accumulation of countless choices made by a nation over thirty years between "applications" and "foundations."

Every choice was rational. Every spreadsheet looked fantastic.

But all those rational choices added together equal a bill that history never discounts.

This sentence sounds familiar, because Chapter 5 of the main text, discussing Intel's cancellation of the Larrabee GPU project, used almost the same one:

"Nobody made a bad decision. Everyone made the most rational decision for their specific position. But all those rational decisions added up to a suicidal outcome."

Intel's version applied to a company. China's version applies to the industrial structure of an entire nation.

The scale is different. The logic is exactly the same.