The political economy of neural networks

“Modern industry … is continually transforming not only the technical basis of production but also the functions of the worker and the social combinations of the labour process. When different parts of the working day are replaced by machinery the productiveness of labour increases. At the same time, it thereby also revolutionises the division of labour within society, and incessantly throws masses of capital and of workers from one branch of production to another. Increasing productiveness heralds more than the modern mechanical quantities of labour, but also the slave Acts of the State.” (Marx)

In 1997 IBM’s chess computer, Deep Blue, beat the world champion, Garry Kasparov. This victory marked an important step in the development of machine intelligence.

However, Deep Blue was directly told how to play chess by human experts who codified thousands of rules-of-thumb about good moves. So although Deep Blue was highly effective at chess, it couldn’t play any other games.

Just a few months ago, a new algorithm, called AlphaZero, learnt to play chess from scratch. It only knew the rules of the game. It learnt its strategies by playing against itself. After about 24 hours of self-play, AlphaZero achieved superhuman performance and consistently beat the previous best chess program.

Not only did AlphaZero play brilliantly, it also discovered entirely novel, and more powerful, move sequences. Human grandmasters described its strategy as “alien”, “unnatural” and “amazing”.

And AlphaZero can learn to play other games too. It just needs the rules. For example, it also achieves superhuman performance on the game of Go.

AlphaZero is one recent advance amongst many. Progress in the field of Artificial Intelligence has leaped forward in the last 20 years or so.

Some things that AI researchers thought were really very hard, and many decades away, are now taken for granted.

Progress is partly due to faster computers with more memory. And the internet has massively increased the availability of large quantities of data, which is the essential ingredient of machine learning. But another driver is improvements in algorithms, particularly a class of algorithms known as neural networks.

Some of the biggest tech companies in the world — such as Google, Microsoft and Facebook — are currently investing heavily in neural network research. They believe neural networks will transform our world, and they want to profit from it.

So what are neural networks? How do they really work? And what are the implications of this technology for the political economy of capitalism?

What are neural networks?

Let’s begin by examining what neural networks really are.

The human brain has billions of neuronal cells connected together in complex networks. Artificial neural networks have a similar structure.

But I want to put aside analogies with the human brain for the moment. Instead, I will explain neural networks in a purely mathematical or mechanical manner. Because this perspective can explain why animal brains are networks of neurons, and why they have different kinds of dedicated circuitry.

Hypothesis spaces

Imagine we perform a physical experiment, where we attach weights of different masses to a metal spring. Each time we attach a weight we measure the spring’s length. We record the weight, and a corresponding length, as a pair of numbers. Let’s say we collect 100 such observations.

We now have a data set. Next, we want to build a neural network that learns the underlying relationship between weights and lengths. We want the algorithm to discover the principle that connects these pairs of numbers.

No learning algorithm is a blank slate. It must, at least, have the capacity to hypothesise possible relationships between weights and lengths.

So let’s do this by defining a family of mathematical functions. For this example:

y = a·x² + b·x + c

The “x” here represents a weight. And the “y” represents the length of the spring. Given a weight, we can use this formula to predict a length.

The function has some parameters, labelled a, b and c. These could take any values. Different values specify a different function. For example, if we set a=2, b=3 and c=-4 then we’d get the function:

y = 2x² + 3x − 4

This specific function is one possible description of how weight (x) relates to spring length (y).

So we’ve defined a simple hypothesis space, where different values of a, b and c pick out a specific hypothesis from that space.
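To make this concrete, here’s the same hypothesis space as a few lines of Python (just an illustrative sketch — not the code used for the experiments in this post):

```python
# The hypothesis space as code: one quadratic function for every choice of (a, b, c).
def predict_length(a, b, c, x):
    """Predicted spring length y for a weight x, under parameters a, b and c."""
    return a * x**2 + b * x + c

# The specific hypothesis picked out by a=2, b=3, c=-4:
print(predict_length(2, 3, -4, 0.5))   # 2*0.25 + 3*0.5 - 4 = -2.0
```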

Normally we think of mathematical functions as something abstract, as some kind of symbolic expression. But we can also think of them as machines with distinct parts, which take in an input and produce an output.

In fact, we can think of the hypothesis space we’ve just defined as equivalent to a very, very simple “neural network”:

[Figure: A (very simple) “neural network” that represents a family of quadratic functions: inputs enter at the top, and outputs exit at the bottom. This network has 4 inputs. 3 of those inputs, labelled a, b, c, are parameters of a quadratic function. The remaining input, x, represents the weight attached to a spring. The network outputs a single number, y, which is the (predicted) length of the spring. The output node is a single “neuron” that has 4 incoming connections, and produces its output, y, by combining its inputs according to a specific mathematical formula. Different values of the parameters, a, b and c, pick out a specific quadratic function.]

So this “neural machine” assumes that the actual relationship between weight and length is defined by some (unknown) quadratic function. Of course, this assumption might be wrong. But that’s its starting point!

The network has 3 parameters, which change its behaviour. That may not sound like many, but this is sufficient to represent an infinite number of quadratic functions.

Forward propagation

OK, we’ve got a simple neural network. How can it learn from the data?

Basically, we want the network to choose one function, from the infinity it can represent, which reproduces our 100 examples as accurately as possible.

For instance, we set the network’s parameters, a, b and c, all equal to 1. And then we feed the first example into it, with a weight of x = 0.5 kg. We can think of the 4 input values propagating forwards through the network to the output neuron. That neuron computes an output of y = 1.75 cm. That’s its prediction of the spring’s length.

However, we measured the actual spring length as 10 cm. So the network is wrong. How wrong? Well, it made an error of 8.25 cm.
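In code, that forward pass and its error look like this (again, just an illustrative Python sketch):

```python
# Forward-propagate the first example with all three parameters set to 1.
a, b, c = 1.0, 1.0, 1.0
x, actual_length = 0.5, 10.0       # weight in kg, measured length in cm

y = a * x**2 + b * x + c           # the network's prediction: 1.75 cm
error = actual_length - y          # how wrong it is: 8.25 cm
print(y, error)
```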

The error signal as teacher

In order to learn you need to know when you are wrong, and how to be less wrong in the future. The error signal, the difference between the network’s prediction and the actual value, is like feedback from a teacher, or from reality.

We add up the network’s error on all 100 examples to get the total error. And the total error indicates how well, or how badly, the machine reproduces the relationship between weight and spring length in the data.

Different values for the machine’s parameters — a, b and c — will give different total errors on the data. We want to find the parameters that minimise this error. In Artificial Intelligence, the process of finding functions with low error on some data is normally what’s called learning.
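As a sketch, the total error is just a sum over the whole data set (here using the squared difference, which is what the “error neuron” introduced below computes):

```python
def total_error(a, b, c, data):
    """Sum of squared errors over every (weight, length) pair in the data set."""
    return sum((a * x**2 + b * x + c - length) ** 2 for x, length in data)

# For example, on two made-up (weight, length) pairs:
print(total_error(1.0, 1.0, 1.0, [(0.5, 10.0), (1.0, 20.0)]))
```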

Searching for the best hypothesis

But now we have a problem. How do we find the right values for a, b and c?

One method, which has proven surprisingly successful, is to start with an arbitrary guess. Then take an example, push it through the network, measure the error, and then change the network’s parameters a tiny, tiny bit — in exactly the right way — in order to reduce the error on that example. And then keep doing that for every example — again and again — until we can’t make the total error any smaller.

For example, let’s add an “error neuron” to our network:

[Figure: A trainable network: this is the same network as before, except we’ve added an extra “error neuron” that has two inputs: the predicted length (y) and the actual length (labelled “length”). The error neuron then computes the (squared) difference between the predicted length and the actual length in the data.]

Remember that our data contains pairs of numbers: (weight, length). By adding the extra “error neuron” we can propagate both numbers through the entire network (starting at the input nodes), which now computes not just the prediction, but also its error.

But how should we change the network’s parameters to reduce that error? Should we increase parameter “a” a little bit, and reduce “c”, and leave “b” unchanged? Or something else?

At this point we turn to differentiation. You’ll recall that differentiating a function gives its gradient. And a gradient tells us how much a small change in a function’s inputs affects its outputs.

So, if we differentiate the error function with respect to each of the machine’s parameters, we get 3 different gradients. And those gradients immediately tell us how small changes to the parameters affect the prediction error.

This is a bit like standing on an undulating landscape with your eyes closed. You want to find the lowest point on the landscape, but you can’t see it. All you can do is feel the shape of the ground where you’re standing right now, by moving your feet slightly. Those are the gradients.

The best direction to move is the direction with the steepest downhill slope, since that direction will reduce the error the quickest. So this method is called gradient descent.

Except here we have 3 parameters, so really we’re feeling and moving in 3, not 2, dimensions. But the principle is the same in higher dimensions.
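Here is that idea as a Python sketch. For our quadratic the three gradients have a simple closed form, and a gradient-descent step nudges each parameter a tiny bit downhill (the learning rate of 0.01 is just an illustrative choice):

```python
# One gradient-descent step on a single example.
# For the squared error e = (y - length)^2, with y = a*x^2 + b*x + c:
#   de/da = 2*(y - length)*x^2,   de/db = 2*(y - length)*x,   de/dc = 2*(y - length)
def gradient_step(a, b, c, x, length, lr=0.01):
    y = a * x**2 + b * x + c       # forward propagation
    residual = y - length          # the error signal
    a -= lr * 2 * residual * x**2  # move each parameter a little way downhill
    b -= lr * 2 * residual * x
    c -= lr * 2 * residual
    return a, b, c
```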

Back propagation

So we need to differentiate in order to get some gradient information.

Differentiating the error of a quadratic function with respect to 3 parameters is fairly easy. But for industrial-scale applications we need to differentiate enormously complex functions with billions of parameters. A naive approach would be computationally very expensive. But we can exploit the structure of the neural machine to save time.

[Figure: Back-propagating the error to update the network’s parameters: we forward-propagate an example, which is a pair of values (x, length), through the network and compute the error. Then we back-propagate by multiplying the error by the gradient from the “error” to the “y” neuron (see black arrow). This derived error is then further back-propagated from the “y” neuron to the parameters “a”, “b” and “c”, which we then update. Back-propagation is equivalent to differentiating the network’s error by its parameters, and then updating those parameters in the error-minimising direction. Note that back-propagation works with arbitrary network topologies (not just the very simple one shown here).]

Perhaps a good image is a spider’s web. You twang it at one place, and see the wobbles propagate through the web, which affect some webbing much further away. Back-propagating the error twangs the web in just the right way to improve the network’s parameters.

So we forward propagate each example through the network to get the error. We then back-propagate that error through the network’s gradients to update the parameters. Forwards and backwards, again and again, for every example. And we keep doing this until, eventually, the total error on the data stops getting smaller.

At that point, our search is over.

Learning as optimisation

So let’s actually do that now. Here’s a plot of the network’s error on individual examples (blue dots) over time, where the y-axis is the total error and the x-axis is the number of times we’ve pushed all the data through the network. The orange line is the network’s error on test data (data it hasn’t been trained with). Clearly, the error reduces over time, or, if we want, the machine is “learning”:

[Figure: the training curve — error against the number of passes through the data, for training examples (blue dots) and test data (orange line).]

What are the network’s final parameters?

a = 0.05
b = 0.96
c = 0.006

Why these parameters? Well, approximately, we have a ≈ 0, b ≈ 1 and c ≈ 0. So the network has learnt that the relationship between weight and spring length is:

y = x

So, from its hypothesis space of quadratic functions, it’s selected a very simple linear function. Is that a good fit to the data? Here’s a plot of the network’s prediction (of the spring’s length for a given weight) compared to the actual length observed in the data:

[Figure: the network’s predicted spring length plotted against the actual lengths observed in the data.]

So it’s a pretty good predictor. In fact, this simple neural machine has “learnt” a particular case of Hooke’s law, which states that the extension of a spring is proportional to the force applied to it.

(For those with a bit more background knowledge: this simple (nonlinear) computational graph has produced a linear regression on the data set via gradient descent.)
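For the curious, here’s the whole procedure as a self-contained Python sketch. We don’t have the original 100 measurements, so the data below is a hypothetical stand-in (a spring whose length in cm roughly equals the attached weight in kg); the point is simply that gradient descent settles on roughly a = 0, b = 1, c = 0:

```python
import random

random.seed(0)
# Hypothetical stand-in data: 100 (weight, length) pairs with length ≈ weight, plus noise.
weights = [random.uniform(0.0, 2.0) for _ in range(100)]
data = [(w, w + random.gauss(0.0, 0.02)) for w in weights]

a, b, c, lr = 1.0, 1.0, 1.0, 0.01
for epoch in range(1000):                    # forwards and backwards, again and again
    for x, length in data:
        y = a * x**2 + b * x + c             # forward-propagate
        residual = y - length                # the error signal
        a -= lr * 2 * residual * x**2        # back-propagate: nudge each parameter
        b -= lr * 2 * residual * x           # a tiny bit in the downhill direction
        c -= lr * 2 * residual

print(round(a, 3), round(b, 3), round(c, 3))  # roughly 0, 1, 0: the network finds y ≈ x
```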

Neural networks at scale

Our “neural network” was extremely simple. So simple that most practitioners wouldn’t recognise it as a neural network. But more complex neural networks operate on identical principles. Scaling up doesn’t change this basic story.

Here’s an example of a more complex network:

[Figure: A neural network with 4 inputs, 3 outputs, and 3 intermediate layers of size 8, 8 and 4 respectively. Each neuron is a simple, nonlinear function of its inputs, and has associated parameters. In total, this network has 163 parameters (not shown) that, in this context, can be interpreted as “connection strengths” between neurons.]

We train more complex networks in exactly the same way: by forward-propagating inputs, and backward-propagating errors. But, of course, this more complex function has a correspondingly more complex hypothesis space. To get an idea of the complexity, here’s the symbolic expression for just one of its outputs, y1, as a function of its 4 inputs (and its 163 parameters):

[Figure: the (very long) symbolic expression for the y1 output of this more complex neural network.]

So this network has much greater representational capacity than our simple example. In principle, therefore, it can learn much more complex relationships in data. And, with a little bit of creativity, the inputs and outputs can represent not just numbers, but images, sounds, letters, words … anything.

However, even this network, by industrial standards, is tiny.
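Tiny indeed: mechanically, it’s just a handful of small matrices. For a feel of what it does, here’s a forward pass through a network of the same shape as a Python sketch (the tanh nonlinearity and the random parameter values are illustrative assumptions; note that the parameter count comes out at 163):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 4, 3]    # 4 inputs, intermediate layers of 8, 8 and 4, then 3 outputs

# One weight matrix and one bias vector per layer (random values stand in for
# learned parameters).
weights = [rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]
print(sum(w.size for w in weights) + sum(b.size for b in biases))   # 163 parameters

def forward(x):
    """Forward-propagate an input through every layer of the network."""
    for w, b in zip(weights, biases):
        x = np.tanh(x @ w + b)   # each neuron: a simple nonlinear function of its inputs
    return x

print(forward(np.array([0.1, -0.3, 0.7, 0.2])))   # 3 output values
```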

The technology of neural networks

So that’s neural networks in a nutshell. I’d like to emphasise how very simple this really is. The basic mechanisms are satisfyingly elegant and sparse. Of course, the state-of-the-art adds many bells-and-whistles but, as of today, the core element is searching in a hypothesis space to minimise error.

So why are real brains composed of billions of neurons? Brains need to represent and predict properties of a complex environment. Lots of neurons yield a bigger and more complex hypothesis space.

But real brains also have many dedicated structures that wire-up the neurons in specialised ways. Different kinds of neuronal assemblies are dedicated to vision, to motor control, and to higher thought. Why?

The simple reason is that different network topologies define different kinds of hypothesis spaces that are better adapted to specific kinds of tasks. Learning becomes easier if you start with a good guess.

So although our brains are highly adaptable, evolution has given us a head start for learning the kinds of problems we regularly face.

Let’s now turn to the implications of neural networks for society.

Neural networks in capitalism

Technology that reproduces bits of human cognition is as old as human society. For example, tally marks on sticks or bones reproduce aspects of human memory. The abacus partially automates the mental operations of arithmetic, and so on.

Machine learning represents a continuation of this historical trend of humans alienating their causal powers in machines.

What is new, however, is that neural networks replicate the complex pattern recognition powers of humans.

Some impacts of neural networks we can be fairly confident about, because we’ve seen examples of machine automation before. Others are more speculative, because we haven’t seen this specific type of machine before.

Impact of machines on material production

So what can we be fairly confident about?

Any new machine is always introduced into an existing division of labour. The machine may automate an existing task, or some part of it.

If so, some labour is freed or saved. The machine is labour-saving if the saved labour exceeds the new labour required to reproduce the machine.

So typically this means that some workers now have nothing to do.

In purely material terms, this saved labour could remain saved, resulting in a reduction in the total working day. Everyone could work a bit less. Or, alternatively, this saved labour could be allocated to entirely new tasks, and people could have some more goods and services.

A machine may not merely automate, but make new kinds of work possible (in fact this is typically the case). For example, computers created the new role of programmer.

In this case, new kinds of concrete labours are demanded.

Machines tend to have both effects: they automate existing tasks, and create new tasks. So new machinery raises the structural-economic problem of reallocating the division of labour within society.

Impact of machines on capitalist production

What I’ve just said applies to any kind of society, whether feudal, capitalist, socialist or communist. But the reallocation of the division of labour in capitalist society takes specific forms.

Machines won’t reduce our working time

I once asked an AI researcher why they were passionate about building robots. They answered that robots will free us from work and lead to utopia. I then asked why centuries of labour-saving technical progress had failed to reduce the length of the working day.

They couldn’t answer (the typical STEM graduate doesn’t know much social science, and knows even less about Marxism).

But the answer is pretty clear. In capitalism the labour saved by automation takes the form of profit appropriated by firm owners, which they add to their capital. Capital is privately owned, and private capitals compete with each other, searching for the highest returns.

Now, if an individual capital is hoarded then its value will quickly decrease, compared to rival capitals that are reinvested in new production and accumulation. So this means that individual capitalists, in order to remain capitalists, must reinvest profits in new production.

In consequence, labour-saving technical progress doesn’t translate into a reduction in the length of the working day. Capitalism simply lacks any economic mechanism for the working population to collectively decide to work less. (Unless you buy into the neoclassical fantasy that the length of the working day is the outcome of individual choices that trade-off disliking work and loving consumption).

Automation could mean that we all work less and produce less, which would not only massively benefit everyone, but also the environment. However, capitalism cannot realise this technical possibility.

So neural networks don’t herald a utopia where robots do all the work and we take it easy. That’s not going to happen (despite what we might read). Both the robots, and us, will continue to work hard for a class of exploiters. Free time is the one commodity that’s not for sale under capitalism.

Machines will benefit the few not the many

So instead of approaching a robot utopia, we’ll experience familiar capitalist macro-dynamics: labour displaced by machinery will reduce the demand for labour and therefore wages. On the other hand, any reinvested profits may increase the demand for labour in sectors yet to be automated, which increases wages. So there’s a contradictory effect.

But the increased demand for labour is typically not significant. In general, private capitals aren’t very good at initiating entirely new industrial sectors that soak up the excess labour force. They are risk-averse and impatient for returns. Instead, profits are used to speculate on existing assets, rather than invested in new production. And high-end luxury consumption isn’t a big employer.

Also, much of the economic cost of the re-division of labour is borne by the workers themselves: some of us get laid off, have no income, and must somehow re-train or re-orient ourselves in a new technical landscape, if we can. That often puts enormous strain on workers and their families.

The churn in the division of labour is inescapable for any society with technical progress. But the accompanying misery inflicted on workers is not.

So, as a whole, the capitalist class benefits from automation, not workers.

So the technology of neural networks, in the context of capitalism, is not a leveller. Instead, it will augment the economic power of the already rich and wealthy.

Machines worsen the lot of the highly exploited and overworked

Capitalism, in virtue of both markets and the wage system, produces extreme income inequality, both within and between nations. Wages, in particular, vary greatly internationally. So automation does not proceed evenly across the globe.

For example, Foxconn, the big electronics manufacturer, plans to automate 30% of jobs in its Chinese factories by 2020 in response to rising labour costs. In one factory alone, Foxconn cut tens of thousands of jobs by introducing industrial robots.

But the average wage in Bangladeshi garment factories is about $67 per month. So these textile factories won’t be introducing robots anytime soon.

Neural networks will allow robots to perform an ever broader range of tasks. So, where wages are high, many workers will be out of a job.

But where wages are low, neural network technology will probably be used to intensify exploitation. For example, Amazon tracks warehouse workers with armbands that recognise the pattern of arm movements that indicate when they are packing goods, or not. The surveillance has been automated.

So neural networks will be used to turn humans into robot-like things, rather than freeing them from drudgery.

Typical applications of neural networks

So that’s what we can be fairly confident about. The conclusions strike a very discordant tone compared to the technical optimism of corporate press releases.

Having said that, there are applications I’m excited about. For example, neural networks are widely used to improve search engines. They detect diseases in medical scans with better accuracy than human doctors. They help search for new medicines. They increase the efficiency of production processes, by predicting breakdowns before they happen, or preemptively spinning-up or down productive capacity, for example in electricity grids, or fans that cool computers in data centres. They can augment creative tasks, such as auto-completion of partial sketches and music compositions, or automatic colouring. Self-driving cars will provide safer and more efficient travel. I could go on, since pattern recognition is really useful, and forms the basis of many human tasks.

But for every good application, there are plenty of bad ones.

Obviously, neural networks are already deployed in military machinery for detecting, tracking and destroying targets.

Also, neural networks are tremendously good at generating fake images, audio and video. This capability isn’t sufficiently appreciated. Faces can be transplanted over other faces in video seamlessly. The human voice can now be impersonated, and told to say anything. So you can no longer take anything at face value. Here’s a recent quote from a machine learning researcher:

Today was the first day I fell for an AI-generated fake video with major geopolitical implications. The world is gonna get weird.

Neural networks are routinely deployed to intensify addictions, whether shopping, games, gambling or pornography. The neural network tailors the user experience to specific individuals in order to squeeze the most money out of them.

Some of the brightest but most greedy in society waste their talents building neural network models to gamble and speculate in financial markets.

Neural networks are increasingly used for industrial-scale surveillance: for example, the US government can retrieve a semi-automated summary file for most people in the western world, at the touch of a button.

In conclusion

Despite the hype, neural networks are currently quite limited, and there’s zero chance of robots taking over anytime soon. On the other hand, neural networks do point to a future where humans alienate all their causal powers in machinery, which is an exciting thought.

Neural networks are just another kind of machine, and we’ve seen what happens in this movie before: the real winners, as ever, are the owners of the machinery.

We’re told that AI is a disruptive technology that will transform society for the better. But capitalist society also transforms AI and ensures it’s applied in ways that reproduce class society, including the power of the exploiting class. So there are lots of contradictions, most of which are glossed over by the companies that tout this technology and the journalists that report on them.


Addendum

So let’s return to the quotation of Marx at the top of this page. More wisdom from the sage, you might think. However, some of the sentences in that quote are not Marx at all. They were generated by a neural network (NN). Can you tell which ones?

There were two NN generated sentences:

When different parts of the working day are replaced by machinery the productiveness of labour increases.

and:

Increasing productiveness heralds more than the modern mechanical quantities of labour, but also the slave Acts of the State.

How were these Marx-like sentences created?

I built a deep recurrent neural network (using Mathematica with its MXNet back-end) and trained it on Marx’s Capital Volume 1. I randomly sampled millions of strings of length 100 from Capital. I then split those strings into a prefix of length 99 and the final character. The idea is to train the network to predict the next letter given the previous 99.

I used the following NN architecture, composed of 3 long short-term memory (LSTM) layers:

[Figure: the network architecture — a chain of 3 LSTM layers.]

This is a recurrent neural network, where “recurrent” means that some of the network’s output is fed back into itself, which allows it to model long-term relationships in sequences of data.
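As noted above, I built mine in Mathematica with its MXNet back-end, but for readers who want to experiment, a roughly equivalent sketch in Python/PyTorch might look like the following (the layer sizes here are assumptions, not my exact settings):

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Character-level language model: 3 stacked LSTM layers, then a linear readout."""
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        # x: (batch, 99) integer character codes -- the 99-character prefix
        out, state = self.lstm(self.embed(x), state)
        # predict the 100th character from the final time step
        return self.head(out[:, -1, :]), state

# Training minimises the cross-entropy between these predictions and the actual next
# character -- the same recipe as the spring example: forward pass, error,
# back-propagation, repeat.
```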

This network is still quite small by industrial standards, and probably a little too small for this dataset.

The NN isn’t told about English words, their spelling, or even grammar, or punctuation. Instead, it learns these concepts from individual sequences of characters. So the NN, in fact, learns grammatical structure and the content of Capital at the same time, which is really quite remarkable.

My particular network isn’t that good, since I didn’t spend time tuning its performance. But it generates reams of English text that, superficially, read like Capital Volume 1. Most of it is nonsense (a bigger network would do much better). So I just cherry-picked two output sentences, and inserted them into an authentic quotation.

The labour of producing NN models is itself in the process of being automated. Soon, people without a PhD in AI will regularly build and deploy them. The technology is becoming commodified (much like 3D graphics programming did in the ’90s and ’00s). This means that, soon, it will be much harder to know what content is real, and what is fake.
