Cerebras, the company behind the world’s largest accelerator chip, the CS-2 Wafer Scale Engine, has just announced a milestone: training the world’s largest NLP (Natural Language Processing) AI model on a single device. While that in itself could mean many things (it wouldn’t be much of a record if the previous largest model had been trained on a smartwatch, for instance), the model Cerebras trained soared to a staggering and unprecedented 20 billion parameters, all without having to scale the workload across multiple accelerators. That’s enough to fit the internet’s latest sensation, OpenAI’s 12-billion-parameter text-to-image generator DALL-E.
The key to Cerebras’ achievement is the reduction of infrastructure requirements and software complexity. Sure enough, a single CS-2 is a supercomputer on its own. The Wafer Scale Engine-2 (which, as the name suggests, is etched onto a single 7 nm wafer, an area usually large enough for hundreds of mainstream chips) features 2.6 trillion transistors, 850,000 cores, and 40 GB of integrated cache, in a package that consumes around 15 kW.
Keeping up to 20 billion parameters in a single chip significantly reduces training overhead across thousands of GPUs (and their associated hardware and scaling requirements), while eliminating the technical difficulties of partitioning models across them. This is “one of the most painful aspects of NLP workloads,” Cerebras says, “sometimes taking months to complete.”
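As a rough sanity check (my own back-of-the-envelope arithmetic, not a figure published by Cerebras), the raw memory needed just to hold a model’s weights in half precision can be estimated in a few lines. At 2 bytes per parameter, a 20-billion-parameter model comes to about 40 GB, which happens to match the WSE-2’s 40 GB of integrated cache:

```python
# Back-of-the-envelope weight-memory estimate. Assumes 2 bytes per
# parameter (fp16); optimizer state and activations, which training
# also requires, are deliberately not counted here.

def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

for name, params in [
    ("GPT-3 XL", 1_300_000_000),
    ("GPT-J",    6_000_000_000),
    ("GPT-NeoX", 20_000_000_000),
    ("GPT-3",    175_000_000_000),
]:
    print(f"{name}: {weight_memory_gb(params):.1f} GB")
```

The same arithmetic shows why GPT-3’s 175 billion parameters (roughly 350 GB in fp16, before optimizer state) force a multi-device setup on conventional hardware.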
Partitioning is a bespoke problem, unique not only to each neural network being processed but also to the specifications of each GPU and the network that ties them all together. These elements must be worked out before the first training run ever begins, and the resulting setup cannot be transferred across systems.
Pure numbers can make Cerebras’ achievement look underwhelming: OpenAI’s GPT-3, an NLP model that can write entire articles capable of occasionally fooling human readers, features a staggering 175 billion parameters. DeepMind’s Gopher, launched late last year, raised that number to 280 billion. The brains at Google Brain have even announced the training of a trillion-parameter-plus model, the Switch Transformer.
“In NLP, larger models are shown to be more accurate. But traditionally, only a very select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of GPUs,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “As a result, very few companies could train large NLP models – it was too expensive, time-consuming, and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3 XL 1.3B, GPT-J 6B, GPT-3 13B, and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2.”
However, much like clock speed on the world’s best CPUs, the number of parameters is only one possible indicator of performance. Recent work has achieved better results with fewer parameters: DeepMind’s Chinchilla, for example, routinely outperforms both GPT-3 and Gopher with just 70 billion of them. The goal is to work smarter, not harder. As such, Cerebras’ achievement is more significant than it may first appear: researchers will be able to fit increasingly complex models on a single chip, and the company says its system has the potential to support models with “hundreds of billions, even trillions of parameters.”
This explosion in the number of workable parameters makes use of Cerebras’ Weight Streaming technology, which decouples compute and memory footprints, allowing memory to be scaled to whatever amount is needed to store the rapidly growing number of parameters in AI workloads. This cuts setup times from months to minutes and makes it possible to switch between models such as GPT-J and GPT-Neo “with just a few keystrokes.”
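Cerebras has not published its implementation, but the core idea of decoupling compute from memory can be sketched as streaming one layer’s weights at a time from an external store into the accelerator, so that on-chip memory only ever needs to hold a single layer rather than the whole model. The sketch below is purely illustrative: the names (`ExternalWeightStore`, `Accelerator`) and the toy per-layer compute are my own assumptions, not Cerebras’ actual API or hardware behavior.

```python
# Hypothetical illustration of weight streaming: all weights live in an
# external store whose capacity scales independently of the compute device,
# and are streamed in one layer at a time during the forward pass.

class ExternalWeightStore:
    """Holds every layer's weights off-chip."""
    def __init__(self, layer_weights):
        self.layer_weights = layer_weights  # list of per-layer weight vectors

class Accelerator:
    """Models a device whose on-chip memory fits only one layer at a time."""
    def forward_layer(self, weights, activations):
        # Toy stand-in for the real on-chip compute (matmul + nonlinearity).
        return [a * w for a, w in zip(activations, weights)]

def streamed_forward(store, accel, activations):
    """Stream each layer's weights in, compute, then discard them on-chip."""
    for weights in store.layer_weights:
        activations = accel.forward_layer(weights, activations)
    return activations
```

Because the device never holds more than one layer, swapping in a larger model under this scheme only means pointing at a bigger external store, which is consistent with the claimed months-to-minutes reduction in setup time.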
“Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions an easy and inexpensive on-ramp to major league NLP,” said Dan Olds, chief research officer at Intersect360 Research. “It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets.”