Drawing on an impressive command of cutting-edge AI research, a deep understanding of the history of cognitive science, and four hundred interviews with leading scientists, Brian Christian presents an in-depth account of AI and how we can engineer it to align with human values.

I studied AI at university, and I try to keep up with the field through occasional reading. I’ve just finished The Alignment Problem, which has hugely extended my thinking about AI and its societal implications through a fascinating presentation of recent breakthroughs in machine learning. The book gives a good non-technical grounding in supervised and unsupervised machine learning (ML) and reinforcement learning early on, and uses this as a foundation for forays into very recent research in AI safety. It covers in some detail the problems with bias in datasets and the perils of using predictive algorithms for policing, along with the difficulties which arise when ML algorithms don’t generate human-understandable explanations of their reasoning.

Reading the book gave me pause to consider whether the concept of computational thinking – which rose to ascendancy in 2006 – needs to be updated. Christian observes that “We are now… well beyond the point where our machines can do only that which we program into them in the explicit language of math and code” (p261). This is interesting because it has long been a goal of computational thinking education to explain to learners that “computers can only do what programmers tell them to”. We may need to update intermediate-level explanations of computational thinking to include the capabilities of machine learning. Computers do indeed execute code written by humans, but with machine learning it is not always possible to read the code and know what the output will be, even for the person who wrote the program. Does this mean that probabilistic reasoning needs to become part of computational thinking? This topic deserves further exploration in the ongoing discussions of AI literacy for schools.
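To make that point concrete, here is a minimal sketch (my illustration, assuming Python and scikit-learn, neither of which the review itself mentions) of a program whose source code is entirely visible yet whose outputs cannot be predicted by reading it, because its behaviour comes from parameters fitted to data rather than rules the programmer typed in:

```python
# A small classifier: the code below is fixed, but what it predicts is
# determined by the training data, not by anything the author wrote by hand.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)        # behaviour is fixed here, by the data

print(model.predict(X_test[:5]))   # outputs depend on what was learned
print(model.coef_[0][:5])          # learned parameters, not human-written rules
```

The same dozen lines would behave differently given different training data or a different random split, which is exactly the shift that explanations of computational thinking may need to accommodate.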

In the later sections of the book, the author explores work on training complex autonomous systems to learn appropriate behaviour (and potentially underlying human values) by observing human behaviour or working in cooperation with users. In such cases, the AI would have no goals of its own, but would try to infer what the user wishes to achieve and figure out how to help them achieve it. There is interesting material here from cognitive science and education, including studies of how even very young children can figure out how to help adults who appear to be struggling with simple everyday tasks like opening cupboards. If you’re haunted by the horrifying possibility of future AIs behaving like a deranged “Clippy” (the Microsoft Office helper agent of old), you’re not alone. The author considers the problem of a toddler-like AI trying to help but actually making things worse.

He points out that AIs will likely have to serve “two masters”: the user and the company which created them. There are definitely times when the interests of these two masters do not coincide. There is already growing concern about software companies ruthlessly optimising the design of their products to attract and keep our attention at the expense of our health and wellbeing. Any AI whose purpose is to “assist” a user by building a model of their preferences may not consider the longer-term interests of the user, or allow for the idea that the user might wish to change in the future. For example, people who are overcoming an addiction will not benefit from their AI assistant showing them adverts for, or helping them to buy, products they now want to avoid, even if holding back results in reduced advertising revenue for the company which owns the AI.

Christian writes: “I think there is an important policy matter here, at least as much as a theoretical one. We should take seriously the idea that users have a right to both see and alter any preference model that a site or app or advertiser has about them. It is worth considering regulation to this effect: to say, in essence, I have a right to my own models. I have the right to say, ‘That’s not who I am’. Or aspirationally, ‘This is who I want to be. This is the person in whose interest you must work’.” (p275). This is a powerful contribution to discussions of data privacy: access to (interpretable) models about oneself, as well as the raw data which feeds them, could be a game changer.
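The idea of an assistant that holds no goal of its own, only a belief about the user’s goal, can be illustrated with a toy example. The sketch below is my own illustration, not the book’s formalism; the goals, actions, and likelihood numbers are invented. It uses simple Bayesian updating to show how a system might become more confident about what a user is trying to do as it watches them act:

```python
# Toy goal inference: the assistant keeps a belief over possible user goals
# and updates it with Bayes' rule after each observed action.
GOALS = ["make tea", "make coffee"]

# Hypothetical likelihoods: how probable each action is under each goal.
LIKELIHOOD = {
    "boil kettle":  {"make tea": 0.9,  "make coffee": 0.9},
    "fetch teabag": {"make tea": 0.8,  "make coffee": 0.05},
    "grind beans":  {"make tea": 0.05, "make coffee": 0.8},
}

def infer_goal(observed_actions, prior=None):
    """Return a posterior over the user's goal given their observed actions."""
    belief = dict(prior or {g: 1.0 / len(GOALS) for g in GOALS})
    for action in observed_actions:
        for goal in GOALS:
            belief[goal] *= LIKELIHOOD[action][goal]
        total = sum(belief.values())
        belief = {g: p / total for g, p in belief.items()}
    return belief

print(infer_goal(["boil kettle", "fetch teabag"]))
# Belief concentrates on "make tea", so the assistant can offer relevant help
# without ever having had a goal of its own.
```

Real systems of this kind are vastly more complicated, but the structure is the same: observe, update a model of what the person wants, then act in their service.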

The author makes the point that AI researchers are not blind to the ethical implications of their work: “An ecosystem of research and policy efforts to influence both the near and long term is underway across the globe… Research on bias, fairness, transparency and the myriad dimensions of safety now forms a substantial portion of all the work presented at major AI and machine-learning conferences. Indeed at the moment they are the most dynamic and fastest-growing areas arguably not just in computing but in all of science.” (p313). The book is an excellent place to start for researchers in the humanities or social sciences who wish to add their critical skills to these efforts.

This book is a must-read for students and academics who want to get to grips with the current state of AI research in non-technical terms.

‘The Alignment Problem’ by Brian Christian

By Judy Robertson 3/9/22
