AI: The pattern is not in the data, it’s in the machine


The neural network transforms the input, the circles on the left, into the output, on the right. How that happens is a shifting of the weights, in the center, which we often conflate with patterns in the data itself.

Tiernan Ray for ZDNET

It is common in artificial intelligence to say that machine learning, which relies on massive amounts of data, works by finding patterns in the data.

In fact, the phrase “finding patterns in data” has been a staple of fields such as data mining and knowledge discovery for years, and it was assumed that machine learning, and its deep learning variant in particular, would simply continue that tradition of finding such patterns.

AI programs do indeed result in patterns, but, just as the fault, dear Brutus, lies not in our stars but in ourselves, the truth of those patterns is not something in the data; it is what the AI program makes of the data.

Almost all machine learning models work via a learning rule that changes the so-called weights, also known as parameters, of the program as the program is fed examples of data, and possibly labels attached to that data. It is the values of the weights that count as “knowledge” or “understanding.”

The pattern that is found is really a pattern of how the weights change. The weights mimic how real neurons are thought to “fire,” a principle developed by psychologist Donald O. Hebb, which became known as Hebbian learning: the idea that “neurons that fire together, wire together.”
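To make that concrete, here is a minimal, purely illustrative sketch of a Hebbian-style weight update in Python with NumPy. The layer sizes, learning rate, and random inputs are assumptions made up for the example, not any particular system’s design.

```python
import numpy as np

# A minimal sketch of a Hebbian-style update: connections between units that
# are active at the same time are strengthened. All sizes and the learning
# rate below are illustrative assumptions.

rng = np.random.default_rng(0)

n_inputs, n_outputs = 4, 3
weights = rng.normal(scale=0.1, size=(n_outputs, n_inputs))  # the "knowledge" lives here
learning_rate = 0.1

for _ in range(100):
    pre = rng.integers(0, 2, size=n_inputs).astype(float)  # presynaptic activity
    post = np.tanh(weights @ pre)                           # postsynaptic activity
    # "Fire together, wire together": co-active pre/post pairs strengthen.
    weights += learning_rate * np.outer(post, pre)

print(weights)  # the learned "pattern" is nothing more than these shifted values
```

Real learning rules add normalization or error signals on top of this, but the point stands: what changes is the weights, not the data.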

Also: Artificial intelligence in sixty seconds

It is the pattern of weight changes that stands for learning and understanding in machine learning, something the founders of deep learning emphasized. As it was expressed nearly forty years ago, in one of the founding texts of deep learning, Parallel Distributed Processing, Volume One, James McClelland, David Rumelhart, and Geoffrey Hinton write,

What is stored is the connection strengths between the units that allow these patterns to be created […] If the knowledge is the strengths of the connections, learning must be a matter of finding the right connection strengths so that the right patterns of activation will be produced under the right circumstances.

McClelland, Rumelhart, and Hinton were writing for a select audience of cognitive psychologists and computer scientists, and they were writing in an entirely different era, an era when people did not make easy assumptions that anything a computer did amounted to “knowledge.” They were working at a time when AI programs couldn’t do much at all, and they were mainly preoccupied with how to produce a computation, any computation at all, from a fairly limited arrangement of transistors.

Then, beginning with the advent of powerful GPU chips roughly sixteen years ago, computers really did begin to produce interesting behaviour, culminating in the landmark ImageNet performance by Hinton and his graduate students in 2012, which marked deep learning’s coming of age.

As a result of the computer’s new achievements, the popular mind began to build up all kinds of mythology about artificial intelligence and deep learning. There was a rush of really bad headlines likening the technology to superhuman performance.

Also: Why is AI reporting so bad?

The hype around artificial intelligence today has obscured what McClelland, Rumelhart and Hinton focused on, namely, the machine and how it “creates” patterns, as they put it. They were intimately familiar with the mechanics of weights constructing a pattern in response to what was, in the input, merely data.

Why does all this matter? If the machine is the pattern generator, the conclusions people draw about AI are likely to be wrong. Most people assume that a computer program is perceiving a pattern in the world, which can lead people to defer judgment to the machine. If it produces results, the thinking goes, the computer must be seeing something humans can’t.

Except that a machine that constructs patterns isn’t explicitly seeing anything. It is constructing a pattern. That means what is “seen” or “known” is not the same as the colloquial, everyday sense in which humans speak of themselves as knowing things.

Instead of starting from the human-centered question, “What does the machine know?”, it is better to start from a more precise one: “What does this program represent in the connections of its weights?”

Depending on the task, the answer to this question takes many forms.

Consider computer vision. The convolutional neural network that underlies machine learning programs for image recognition and other visual perception is composed of a set of weights that measure pixel values in a digital image.

The pixel grid is already an imposition of a two-dimensional coordinate system on the real world. Given that machine-friendly abstraction of the coordinate grid, a neural network’s representation task amounts to matching the strength of collections of pixels to a label that has been imposed, such as “bird” or “blue jay.”
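As a rough illustration, and not any real production model, here is what such a weight-based matching of a pixel grid to imposed labels might look like as a small convolutional network in Python with PyTorch. The layer sizes, the 64x64 input, and the label names are assumptions invented for the example.

```python
import torch
from torch import nn

# A toy convolutional classifier, sketched only to show the shape of the task:
# weights over a grid of pixel values are matched to externally imposed labels.
# Layer sizes and labels are illustrative assumptions, not a real model.

class TinyClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),  # weights applied to pixel neighborhoods
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(8 * 4 * 4, num_labels)    # match pooled pixel evidence to labels

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(pixels).flatten(1))

labels = ["bird", "blue jay"]               # labels imposed from outside the data
model = TinyClassifier(num_labels=len(labels))
image = torch.rand(1, 3, 64, 64)            # a stand-in 64x64 RGB pixel grid
scores = model(image)                       # one score per imposed label
print(labels[scores.argmax(dim=1).item()])  # untrained weights, so the pick is arbitrary
```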

In a scene that contains a bird, or specifically a blue jay, many things may be present, including clouds, sunlight, and passersby. But the whole scene is not the point. What matters to the program is the collection of pixels most likely to produce an appropriate label. The pattern, in other words, is a reductive act of focus and selection inherent in the activation of the neural network’s connections.

You might say a program of this type doesn’t so much “see” or “perceive” as it filters.

Also: New experiment: Does AI really know cats or dogs – or something else?

The same is true of games, where artificial intelligence has mastered chess and poker. In the full-information game of chess, for DeepMind’s AlphaZero, the machine learning task is to formulate a probability score at each moment for how much a possible next move will ultimately lead to a win, a loss, or a draw.

Since the number of potential future board configurations cannot be enumerated by even the fastest computers, the program’s weights shorten the search for moves by doing what you might call summarizing. The program summarizes the likelihood of success if one were to pursue several moves in one direction, and then compares that summary with a summary of possible moves to be made in another direction.
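DeepMind’s actual method pairs a learned value network with Monte Carlo tree search; the sketch below is a far cruder stand-in, in Python, meant only to convey the idea of “summarizing” each candidate move by averaging the outcomes of playouts and comparing the summaries. The game-specific callbacks (legal_moves, apply_move, outcome) are hypothetical placeholders, not any real library’s API.

```python
import random
from typing import Callable, List, Optional

# Not AlphaZero's algorithm: a crude illustration of "summarizing" directions
# of play. Each candidate move gets a score that summarizes the average
# outcome (+1 win, 0 draw, -1 loss) of random playouts that follow it.
# The game-specific callbacks here are hypothetical placeholders.

def summarize_move(state, move,
                   legal_moves: Callable[[object], List],
                   apply_move: Callable[[object, object], object],
                   outcome: Callable[[object], Optional[float]],
                   playouts: int = 50) -> float:
    total = 0.0
    for _ in range(playouts):
        s = apply_move(state, move)
        while outcome(s) is None:                       # play randomly until the game ends
            s = apply_move(s, random.choice(legal_moves(s)))
        total += outcome(s)
    return total / playouts                             # the "summary" for this direction

def choose_move(state, legal_moves, apply_move, outcome):
    # Compare the summary for each direction of play and keep the best one.
    return max(legal_moves(state),
               key=lambda m: summarize_move(state, m, legal_moves, apply_move, outcome))
```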

While the state of the board at any given moment – the positions of the pieces, and which pieces remain – might “mean” something to a human chess master, it’s not clear that the term “mean” has any meaning for DeepMind’s AlphaZero in such a summarizing task.

A similar summarizing task was achieved by Pluribus, the program that in 2019 beat the toughest variety of poker, no-limit Texas Hold’em. That game is more complex in that it involves hidden information, the players’ face-down cards, and additional “stochastic” elements of bluffing. But the representation is, again, a summary of likelihoods at every turn.

Even in human language, what is in the weights is different from what the casual observer might suppose. GPT-3, OpenAI’s top language program, can produce strikingly human-like output in sentences and paragraphs.

Does the program know language? Its weights represent the likelihood of how individual words, and even whole strings of text, are found in sequence with other words and strings.
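GPT-3 itself is a large transformer, but the point about weights encoding the likelihood of word sequences can be shown with something far simpler. Below is a deliberately tiny bigram counter in Python; the toy corpus is made up for the example and stands in for the web-scale text a real language model is trained on.

```python
from collections import Counter, defaultdict

# A toy stand-in for what language-model weights represent: how likely each
# word is to follow another. The corpus here is invented for illustration.

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1   # the "weights": simple co-occurrence counts

def next_word_probabilities(word: str) -> dict:
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))  # {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```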

You could call this function of the neural network a summarizing, similar to AlphaZero or Pluribus, since the problem is somewhat like chess or poker. But the states that can be represented as connections in a neural network are not merely vast; they are infinite, given the infinitely composable structure of language.

On the other hand, since the output of a language program such as GPT-3, a sentence, is a fuzzy answer rather than a discrete score, a “right answer” is somewhat less demanding than the win, loss, or draw of chess or poker. You could also call this function of GPT-3 and similar programs an “indexing” or an “inventory” of things via their weights.

Also: What is GPT-3? Everything your business needs to know about OpenAI’s breakthrough AI language program

Do humans have a similar kind of inventory or index for language? There doesn’t seem to be any indication of that yet in neuroscience. Similarly, in the expression “to know the dancer from the dance,” does GPT-3 detect multiple levels of significance in an utterance, or allusion? It is not even clear that such a question means anything in the context of a computer program.

In each of these cases, the chess board, the cards, the strings of words, the data are what they are: a fabricated substrate divided up in various ways, a collection of oblong plastic-coated paper products, a set of sounds or characters. Whether such inventions “mean” anything, collectively, to the computer is only a way of saying that the computer becomes tuned, in response, toward a purpose.

The things that such data induce in the machine – filters, summaries, indexes, inventories, or however you want to describe those representations – are not the thing in itself. They are inventions.

Also: DeepMind: Why is AI so good at language? It’s something in the language itself

But, you might say, people look at snowflakes and see their differences, and they also catalog those differences, if they have a mind to. It is true that human activity has always sought to find patterns by various means. Direct observation is one of the simplest means, and in a sense, what is being done in a neural network is a kind of extension of that.

You could say that the neural network reveals what has been true of human activity for thousands of years: that talking about patterns is something imposed on the world rather than something in the world. Snowflakes have form, but that form is only a pattern to someone who collects, catalogs, and classifies them. It is a construction, in other words.

Pattern-making activity will grow exponentially as more and more programs are run on the world’s data and their weights are tuned to form connections that, hopefully, create useful representations. Such representations can be incredibly useful. They may one day help cure cancer. It is helpful to remember, however, that the patterns they reveal are not in the world; they are in the eye of the perceiver.

Also: DeepMind’s “Gato” is mediocre, so why did they build it?