Cerebras slays GPUs and breaks the record for the largest AI model trained on a single device

Cerebras, the company behind the world’s largest accelerator chip, the CS-2 Wafer Scale Engine, has just announced a milestone: training the world’s largest Natural Language Processing (NLP) AI model on a single device. While that in and of itself could mean many things (it wouldn’t be much of a record if the previous largest model had been trained on a smartwatch, for example), the model Cerebras trained reaches a staggering, and unprecedented, 20 billion parameters, all without having to scale the workload across multiple accelerators. That’s enough to fit the internet’s newest sensation, OpenAI’s 12-billion-parameter text-to-image generator DALL-E.

The most important part of Cerebras’ achievement is the reduction in infrastructure and software-complexity requirements. Granted, a single CS-2 is like a supercomputer all on its own. The Wafer Scale Engine-2, which, as the name implies, is etched onto a single wafer with an area usually enough for hundreds of mainstream chips, features 2.6 trillion 7nm transistors, 850,000 cores, and 40GB of integrated cache in a package that consumes around 15kW.

Cerebras Wafer Scale Engine-2

Cerebras’ Wafer Scale Engine-2 in all its glory. (Image credit: Cerebras)

Keeping a model of up to 20 billion parameters on a single chip drastically reduces the overhead of training across thousands of GPUs (and their associated hardware and scaling requirements) while eliminating the technical pain of partitioning the model across them. This is “one of the more painful aspects of NLP workloads,” Cerebras says, “sometimes taking months to complete.”
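For a sense of why models of this size normally force multi-GPU setups in the first place, here is a rough back-of-the-envelope estimate in Python; the precision and optimizer-state assumptions below are our own illustrative choices, not figures published by Cerebras.

```python
# Illustrative memory estimate for a 20-billion-parameter model.
# Assumptions (ours, for illustration): fp16 weights and Adam-style
# optimizer state kept in fp32 (momentum + variance per parameter).

PARAMS = 20e9            # 20 billion parameters
WEIGHT_BYTES = 2         # fp16 weight per parameter
OPTIMIZER_BYTES = 8      # two fp32 optimizer values per parameter

weights_gb = PARAMS * WEIGHT_BYTES / 1e9
optimizer_gb = PARAMS * OPTIMIZER_BYTES / 1e9

print(f"weights alone:            ~{weights_gb:.0f} GB")   # ~40 GB
print(f"optimizer state:          ~{optimizer_gb:.0f} GB") # ~160 GB
print(f"total before activations: ~{weights_gb + optimizer_gb:.0f} GB")
```

Even before activations are counted, that footprint dwarfs the memory of any mainstream accelerator, which is why such models are normally sliced across many GPUs.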

It’s a bespoke problem, unique not only to each neural network being processed but also to the specification of each GPU and the network that ties them all together, elements that must be worked out in advance before the first training run ever begins. And it can’t be ported across systems.
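As a concrete illustration of what that hand-tuned partitioning looks like, here is a minimal sketch in generic PyTorch (our own example, not Cerebras’ software stack or any vendor’s actual recipe); the split point and device assignments are chosen for one specific network and one specific two-GPU configuration, which is exactly why the result doesn’t carry over to other systems.

```python
import torch
import torch.nn as nn

# Hypothetical hand-partitioned model: the split point below is arbitrary
# and assumes exactly two CUDA devices ("cuda:0" and "cuda:1"). Change the
# network or the hardware and the partitioning has to be redone by hand.
class PartitionedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Activations are copied across the device boundary the engineer
        # chose by hand; every boundary adds communication overhead.
        return self.stage1(x.to("cuda:1"))

model = PartitionedMLP()
output = model(torch.randn(8, 4096))
print(output.shape)  # torch.Size([8, 4096])
```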

Cerebras CS-2

Cerebras’ CS-2 is a standalone computing giant that includes not only the Wafer Scale Engine-2, but all of its associated power, memory, and storage subsystems. (Image credit: Cerebras)

Sheer numbers may make Cerebras’ achievement look underwhelming: OpenAI’s GPT-3, an NLP model that can write entire articles that sometimes fool human readers, features a staggering 175 billion parameters. DeepMind’s Gopher, launched late last year, raised that number to 280 billion. The brains at Google Brain have even announced the training of a trillion-parameter-plus model, the Switch Transformer.