Last month, I made the arduous trip to Hawaii (half joking - it is a 20hr journey!) for the AAAI/ACM Conference on AI, Ethics, and Society (AIES), co-located with AAAI's annual conference. I wanted to share some slightly delayed reflections on the trip.
Some general thoughts on AIES
This is only the second year AIES has run. I was particularly interested in going because it is the first mainstream academic conference focused explicitly on the ethical and societal aspects of advances in AI, covering a wide range of disciplines. Because this is a new and emerging area of research that’s incredibly broad, bringing that all together effectively in a single conference isn’t exactly easy - but I think it’s important to enable people across many different disciplines - philosophy, political science, economics, law, literature, history, and so on... - to collaborate and learn from one another.
Quite a large proportion of the accepted papers either (a) focused on solutions to the technical problem of AI alignment/safety, or (b) presented technical work on making ML systems transparent and fair. I found a lot of these presentations really interesting, and think this kind of work is important at a conference like AIES. But these are also two areas which are now fairly well-established subfields of AI research, and I was a bit disappointed not to see more research focused on policy or governance approaches to AI in particular. It’s also definitely challenging to bring together people from different disciplines and have them communicate effectively with one another - in some cases the talks were so steeped in the language of a discipline I wasn’t familiar with that I found them difficult to follow, and someone else commented to me that some presentations seemed insufficiently aware of relevant work in other disciplines.
All that said, AIES still did a much better job of bringing together a wide range of people and perspectives than most conferences I’ve been to! And it was only the second year of the conference, so there’s lots of room for it to develop and provide an even better environment for cross-disciplinary collaboration and conversations. I think it’d be worth thinking about how to get the call for papers out to a wider range of disciplines and groups next year, and how to build opportunities for sharing insights across relevant disciplines into the conference program (maybe a session or workshop on the side explicitly focused on this would be helpful).
Some notes and highlights
These are some pretty quick thoughts/summaries of some of the talks I found interesting and actually managed to take notes on...
There were a couple of talks on AI safety/alignment research that I thought did a really good job of explaining technical work in an engaging and accessible way. This is something I particularly care about, as I think it’s really important for those working on ethics/policy issues to have a solid grounding in what’s technically possible in general, and technical approaches to ensuring safe and beneficial AI specifically.
Anca Dragan - Specifying AI Objectives as a Human-AI Collaboration Problem
First, Anca Dragan from UC Berkeley/CHAI gave a great talk on her group’s work on inverse reward design, an approach to building reinforcement learning systems that can learn what we want them to do from what we tell them. The idea here is to avoid us having to specify a reward function precisely, which is really difficult - in large part because we’re only able to consider a relatively small subset of possible situations, and will inevitably always miss some unintended consequences. So rather than us providing the agent with a reward function and the agent taking that literally as what it should do, the idea of inverse reward design is that the agent should take the specified reward function merely as evidence of what we actually want, as evidence of the “true reward function.” Instead of optimising a single reward function, then, the agent learns a probability distribution over a range of possible reward functions (i.e. a range of things we might have meant), allowing it to then take actions which account for uncertainty, and are robust across a wide range of possibilities. This approach also makes the task of defining a reward function a much less difficult one for humans - in fact, we could define multiple different reward functions for different environments which are then all taken as evidence of what we actually want. Anca showed some nice results demonstrating how this could improve the robustness and reliability of a robotic arm.
Of course, this only works well if we have a good way to define the appropriate space of possible reward functions. We could give this to the agent explicitly, but we might not always know what the best space to consider is, and this somewhat defeats the point of wanting the agent to learn to be sensitive to situations we haven’t thought about. Something which can help with this is to enable the agent to come back and ‘query’ its designers about unknown situations, rather than being overly risk-averse (called ‘active inverse reward design’). This reframes the problem of trying to get an agent to do what we want somewhat: rather than being an optimisation problem, we might instead think of it as a human-AI collaboration problem. But we still face a tradeoff here in how we want this collaboration to work in practice - we’d rather human input didn’t constrain possibilities too much initially, but the more open we leave things the more the agent will likely need to query the human later, which is also costly and may be impractical.
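To make the idea a bit more concrete, here’s a toy numerical sketch of inverse reward design. This is entirely my own illustration - the feature vectors, the candidate reward functions, and the “noisily rational designer” likelihood are all made up for the example, not taken from the paper. The agent treats the specified proxy reward as evidence over a small discrete set of candidate true reward functions, then plans risk-aversely under that uncertainty:

```python
import numpy as np

# Toy sketch of inverse reward design (my own illustration - features,
# candidates and the "noisily rational designer" likelihood are all made up).
# Actions are described by 3 features; the designer specifies a proxy reward;
# the agent treats it only as evidence about a set of candidate true rewards.

features = {
    "safe_path":  np.array([1.0, 0.0, 0.0]),
    "fast_path":  np.array([0.0, 1.0, 0.0]),
    "novel_path": np.array([0.0, 0.0, 1.0]),  # never arose at design time
}
proxy_w = np.array([0.2, 1.0, 0.0])   # the (imperfect) specified reward weights

# Candidate "true" reward functions the agent entertains.
candidates = [
    np.array([0.2, 1.0,  0.0]),   # the proxy was exactly right
    np.array([0.2, 1.0, -1.0]),   # the novel feature is actually harmful
    np.array([0.2, 1.0,  1.0]),   # the novel feature is actually good
]

def likelihood(true_w, beta=5.0):
    # A proxy is more probable under true_w the better the proxy-optimal
    # *training* action scores on true_w (novel_path wasn't available then).
    training = [features["safe_path"], features["fast_path"]]
    proxy_choice = max(training, key=lambda f: f @ proxy_w)
    return np.exp(beta * (proxy_choice @ true_w))

posterior = np.array([likelihood(w) for w in candidates])
posterior /= posterior.sum()
# Training behaviour never exercised the novel feature, so the posterior
# stays spread across all three candidates.

# Risk-averse planning: score each action by its worst case over candidates.
worst_case = {name: min(f @ w for w in candidates) for name, f in features.items()}
best_action = max(worst_case, key=worst_case.get)
print(best_action, worst_case["novel_path"])   # → fast_path -1.0
```

Because the training situations never exercised the novel feature, the posterior stays spread over the candidates, and worst-case planning makes the agent avoid the unfamiliar option. Active inverse reward design would instead let the agent query the designer about novel_path rather than simply steering clear of it.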
A next step for research here is to build models which capture human biases, making it easier for the agent to anticipate the kinds of mistakes we are likely to make in specifying reward functions across different scenarios, and so narrow the space of possible reward functions that are likely to be relevant initially.
Alexander Peysakhovich - Reinforcement learning and inverse reinforcement learning with system 1 and system 2
Second, and relatedly, Alex Peysakhovich presented some interesting work on incorporating models of human biases from behavioural economics into reinforcement learning. Alex begins by pointing out that inferring a person’s goals from their behaviour is important to many problems in AI - including for cooperation between humans and AI in general, for products such as recommender systems, and for inverse reinforcement learning (where we aim to train an agent to learn human goals/preferences from behaviour.) Most approaches to learning goals from behaviour in AI assume a rational actor model, which has been challenged by a great deal of research in psychology/behavioural economics in recent years. Peysakhovich’s paper uses the now-popular idea that some of the irrationalities in human cognition can be modelled as “two systems” (a slow, reflective system 2, and a fast, intuitive, associative system 1), formalised as two separate utility functions. He shows that both reinforcement learning and inverse reinforcement learning still work with this distinction between s1 and s2: it’s still possible to compute an optimal policy using the two utility functions, and you can infer what both s1 and s2 want separately using inverse RL.
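As a rough illustration of the formal point (my own toy example, not the paper’s model): if the agent’s overall utility is just the sum of a “system 1” and a “system 2” reward table, standard value iteration goes through unchanged on the combined reward, which is why RL “still works” under the two-system decomposition:

```python
import numpy as np

# Toy illustration (my own, not the paper's model): overall reward is the sum
# of a fast "system 1" reward and a slow "system 2" reward, and standard value
# iteration goes through unchanged on the combined reward table.

n_states, gamma = 3, 0.9

# Hypothetical reward tables r[state, action], one per "system".
r_s1 = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 0.0]])   # immediate, intuitive pull
r_s2 = np.array([[-2.0, 0.5], [0.0, 0.3], [1.0, 0.0]])  # reflective judgement
r = r_s1 + r_s2   # the utility the agent actually optimises

# Deterministic transitions: nxt[state, action] gives the next state (made up).
nxt = np.array([[1, 2], [2, 0], [0, 1]])

V = np.zeros(n_states)
for _ in range(200):          # standard value iteration on the summed reward
    V = np.max(r + gamma * V[nxt], axis=1)
policy = np.argmax(r + gamma * V[nxt], axis=1)
print(policy)   # → [1 0 0]: in state 0, system 2's dislike of action 0 wins out
```

The inverse direction is the harder claim: given behaviour generated by the combined utility, the paper argues you can still recover what s1 and s2 each want separately via inverse RL.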
Peysakhovich ends with the broader suggestion that we need better models of human irrationalities for both RL and inverse RL. It was interesting to see this conclusion coming out pretty strongly both in this talk and in Anca’s talk described above. I also wonder whether trying to implement different models of human (ir)rationality in AI systems might yield some interesting findings for cognitive/behavioural science in return, as it could potentially provide a means of testing different models of cognition, and might encourage thinking in novel ways about how we model irrationality. For example, in my PhD I ended up suggesting that some of the things we call ‘biases’ in psychology might be better modelled as solutions to trade-offs that are better or worse suited to different environments (rather than as strict deviations from some well-defined normative standard) - I wonder if modelling the different tradeoffs people commonly face and the way they tend to resolve them could be useful here. More generally, if AI systems are using models of rationality to predict human behaviour or cooperate with humans, and we can measure how effectively they are able to do so, this might tell us something interesting about the utility (or even accuracy?) of those different models.
Daniel Susser - Invisible Influence: Artificial Intelligence and the Ethics of Adaptive Choice Architectures
A third talk I really enjoyed focused on how data and machine learning techniques can be used for online manipulation - something I’ve been thinking about myself recently. Susser opened his talk by pointing out that while a lot is being said now about the impact of AI on structural issues (bias, power, etc.), there’s somewhat less discussion about how AI is affecting and might affect individual experience. He focuses on online manipulation, defined as the use of information technologies to impose hidden influences on another person’s decision-making, which has the potential to both undermine autonomy and to diminish welfare. In particular, new forms of online manipulation are being made possible via the use of adaptive choice architectures: highly personalised choice environments, constructed based on data about individuals, that can be used to steer behaviour. Especially as we get used to new technologies, Susser points out, they recede from our conscious attention and awareness, and so we stop noticing their impact on us. (He uses the term “technological transparency” to refer to this fact that we stop noticing technologies and how they impact us, which is somewhat confusing as the term transparency is often now used to mean the goal of making people more aware of applications of AI technologies, almost the opposite!)
I enjoyed this talk and paper, especially as I’ve been thinking about some very related issues, and I think that the kinds of manipulation made possible by even today’s available data and ML techniques are worrying, and something we need to find ways to prevent. Some of the ways that Susser presented these ideas helped me to clarify some of my own half-formed thoughts on these issues, and I think this is a paper I’ll be returning to. One of the high-level points I took away from the talk was that the ‘invisibility’ of many of the technologies that are potentially influencing our choice environments is a big part of the threat to our autonomy - and that there’s therefore a real tradeoff between the benefits of a ‘seamless user experience’, and the costs of not making conscious decisions about how we use our phones, the internet, social media etc.
A few other mentions:
Gillian Hadfield presented joint work with Dylan Hadfield-Menell on Incomplete Contracting and AI Alignment, which attempts to draw insights from economics for the AI alignment problem. The idea of misalignment - between individual and societal welfare - is central to ‘principal agent analysis’ in economics, they point out, and this misalignment is governed by contracts. These contracts are generally incomplete, i.e. they do not completely specify all behaviour in all situations, due to our limited rationality and the fact that some things are not easily describable or verifiable. These incomplete contracts are supported by the ability to take disputes into external formal or informal enforcement mechanisms, including legal processes. The idea that we might learn something from this about how to align AI systems with our interests seems like an interesting one, and worth exploring the implications for current work on AI alignment (such as the work on inverse reward design Anca Dragan presented, where reward functions are specified imprecisely but agents can then query and work with humans to figure out the best behaviour in edge cases). The talk didn’t really get into this as much as I would have liked, but they did only have 12 minutes - I should instead probably just read the paper properly!
Sky Croeser and Peter Eckersley presented a paper on Theories of parenting and their application to AI. I liked this idea simply because it was an angle I’d never thought of. I wasn’t totally convinced by the analogy e.g. there was a claim that “RL agents are like rampant toddlers”, but actually I think there are a lot of ways in which they are very unlike toddlers: in particular, toddlers seem to be heavily driven by curiosity, trying out lots of new things, while this is something that is missing from standard RL (there’s no intrinsic desire to explore and try new things), and needs to be explicitly built in. That aside, the parenting perspective still raised some really interesting points. For example, Croeser and Eckersley suggested that if developers thought about building AI agents more from the perspective of parenting they might invest more effort in dataset curation relative to architecture design (we tend to care a lot about ensuring our children have the right kinds of experiences to learn from, but architecture design is currently much more popular in ML - that said, we have a lot more control over architecture design in AI systems than we do in children!) They also suggested that a parenting perspective might make people more open to differences in AI development - not necessarily seeking to create AI agents that are just like us - and reconsider the problem of control - perhaps being more open, within certain constraints, to giving up control once we have achieved a certain level of trust. I think it could be really interesting to think a bit more about how far these analogies between parenting and developing AI systems go and what their limits are, especially when it comes to questions about how “like us” we want AI systems to be and how much control we should be aiming for.
Perhaps it makes sense for us to be much more willing to allow for difference and cede control with our own children, because we already have a relatively high baseline for how “like us” they will be, and good mechanisms for understanding and trusting human children, which we may not have with AI systems.
Tom Gilbert and Mckane Andrus gave an interesting talk on their paper Towards a Just Theory of Measurement: A Principled Social Measurement Assurance Program, where they made the point that AI ethics shouldn’t just be about making ML tools fair within the constraints of existing institutions, but that we should be going a step further and trying to use these tools to make the institutions and processes themselves more just. I’m not exactly sure how we do this in practice, but I think it’s a great point and one that applies to how we think about AI ethics more broadly: a lot of discussion and writing focuses on how we ensure that AI systems don’t worsen the status quo in certain ways, but we should also be thinking about how they may be able to change that status quo.
Inioluwa Deborah Raji and Joy Buolamwini presented a really cool paper on what they call Actionable Auditing - in particular, looking at the impact of publicly naming companies whose products were found to have biased performance (in this case, racial bias in facial recognition models.) They found that those companies which were publicly named significantly reduced the bias in their models (without reducing overall performance) relative to those who were not named. I really liked this as showing that it’s actually possible to change companies’ behaviour in a direction that’s clearly positive, given a relatively small intervention.
David Danks gave a great invited talk on “The Value of Trustworthy AI”. I didn’t manage to take notes on this so I’m forgetting some of the details, but he gave a pretty in-depth, very clear analysis of what we mean by “trust” in AI and why it matters, drawing on both philosophy and psychology literature. I enjoyed how clear and precise the talk was, and Danks is an amazing speaker. By the end, though, I wasn’t sure if he’d really reached any new conclusions, or just built a much more solid and rigorous foundation below claims that we mostly all already accept and know to be true (e.g. why interpretability is important and in what contexts.) I don’t think this is necessarily a problem, and given how often these claims about trust and interpretability are made uncritically and ambiguously I think this more rigorous foundation can be incredibly useful - I’m just not quite sure how deep this foundation needs to go, and how useful it is relative to more constructive work. One new-ish thing the talk did make me think about is how we will, almost inevitably, sometimes need to build trust on something other than really understanding how a system works - and how we probably need much more research on what this might look like. This is something I’ve thought about before but seemed much clearer to me after Danks’ analysis.
A bit of AAAI
I also managed to make it to a couple of sessions at AAAI, the bigger AI conference of which AIES was a part.
The first was a panel debate on the “Future of AI”, which was surprisingly entertaining - the moderator and panelists had decided to try and make it light-hearted given it was in the early evening, and I laughed a lot more than I expected to. The proposition was “The AI community today should continue to focus mostly on ML methods.” Of course, this is frustratingly vague in a few ways - what exactly counts as “mostly”? How long past literally “today” does this extend? - but I’ll resist focusing on this. What was somewhat surprising is that a majority of the audience - 64% - voted against the proposition, and the panel seemed to come out stronger in that direction too. I’m not sure whether people wanted to be contrary or somehow ‘interesting’ or progressive in their answers, but I didn’t expect this. I’d be pretty interested to see a comparison between these votes and the proportion of the audience who themselves work mostly on methods they would refer to as ML (and plan to continue to for the foreseeable future...) I strongly suspect it would be more than 36%...
The main argument on the “for” side (i.e. the AI community should focus mostly on ML) was that ML is where we are suddenly making a lot of progress that shows no sign of slowing, we haven’t been doing this for all that long, and it’s currently the area of AI we understand least - and so here we should continue to focus, for now. But even those arguing this side suggested that “ultimately” we would need a much broader set of approaches, and emphasised the importance of combining ML with symbolic approaches and different kinds of structure. The “against” side began by taking a more, um, humorous approach - comparing the current focus on ML within AI research to the populist movements leading to the election of Trump and Brexit... And perhaps my favourite quote of the conference: “If you have any doubt that an AI winter is coming, just look outside: we’ve all come to Hawaii and today was a disaster!” (It had been an unusually cold and rainy day by Hawaii standards...)
One ambiguity in the proposition which I found a bit frustrating was that there was no clear statement or agreement on what counts as “ML methods” and what was considered “other approaches.” Much of the time it seemed like “ML” was actually being used to mean something more like “learning purely from data using deep neural networks.” One side would claim something like “we need to figure out how to incorporate more innate knowledge and cognitive architectures into ML approaches” as an argument against focusing mostly on ML, and the other side would just respond with “but that’s still ML!” This reminds me of a similar frustration I’ve felt when people talk about whether “current methods” in AI will enable us to solve certain problems, but it’s not really clear where the boundaries of current methods lie. Presumably many kinds of new architectures don’t move us that far away from current methods, and nor does incorporating insights from other disciplines to improve current methods... I think what people are trying to get at here is the possibility of new, deep algorithmic insights of some kind on a similar level to training neural networks using gradient descent - but I don’t think there’s any clear line here between what counts as totally novel and what’s just an adaptation of existing approaches. I’d be interested to see more discussion here that explicitly picks apart different types of approach/research in AI, and different kinds of novel insight/progress that might be made, rather than talking in vague terms about “ML” or “current methods.”
I also made it to Ian Goodfellow’s keynote talk on Adversarial Machine Learning. Goodfellow essentially argued that adversarial approaches underpin (or at least can be very useful for) most of the new and important areas that ML is beginning to branch out into, now that we’ve got the basics down. The basic idea of adversarial ML, as I understand it, is to train two different ML systems (normally neural networks) which have connected and ‘adversarial’ goals, such that they continually force each other to improve. The classic example, generative adversarial networks (GANs), involves training one network (the ‘discriminator’) that aims to tell real images apart from generated ones, and another (the ‘generator’) which aims to produce images that ‘fool’ the first network into classifying them as real. As the discriminator gets better at telling the difference between real and generated images, the generator must get better at producing ‘convincing’ images to achieve its objective, which then means the discriminator has to be able to discriminate more finely, and so on.
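As a schematic of that adversarial loop, here’s a deliberately tiny one-dimensional “GAN” (my own illustration with hand-derived gradients, not anything from the talk): the generator is a linear map g(z) = a·z + b, the discriminator a single logistic unit, and each update nudges one against the other:

```python
import numpy as np

# Schematic 1-D "GAN" (my own toy illustration with hand-derived gradients,
# not code from the talk). The generator is a linear map g(z) = a*z + b; the
# discriminator is a single logistic unit D(x) = sigmoid(w*x + c). Real data
# is drawn from N(4, 0.5); the generator starts off producing N(0, 1).

rng = np.random.default_rng(0)
sig = lambda u: 1.0 / (1.0 + np.exp(-np.clip(u, -30, 30)))

a, b = 1.0, 0.0      # generator parameters
w, c = 0.1, 0.0      # discriminator parameters
lr, batch = 0.05, 64

for _ in range(2000):
    real = rng.normal(4.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b

    # Discriminator step: push D(real) -> 1 and D(fake) -> 0.
    gr = sig(w * real + c) - 1.0   # grad of -log D(real) wrt its logit
    gf = sig(w * fake + c)         # grad of -log(1 - D(fake)) wrt its logit
    w -= lr * np.mean(gr * real + gf * fake)
    c -= lr * np.mean(gr + gf)

    # Generator step (non-saturating loss): push D(fake) -> 1.
    gg = (sig(w * fake + c) - 1.0) * w   # grad wrt each fake sample
    a -= lr * np.mean(gg * z)
    b -= lr * np.mean(gg)

gen_mean = float(np.mean(a * rng.normal(0.0, 1.0, 10000) + b))
print(gen_mean)   # should have drifted towards the real mean of 4
```

Even in this degenerate setting you can see the dynamic Goodfellow described: the discriminator keeps finding a boundary between real and generated samples, which gives the generator a gradient to follow until the two distributions become hard to tell apart.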
Goodfellow then spent an hour going through many of the areas where adversarial ML can be useful: including in generative modelling, security, reliability, model-based optimisation, reinforcement learning, domain adaptation, fairness, accountability and transparency (FATML), and neuroscience. There were some really interesting examples here, and it helped me to better understand what adversarial ML is actually doing, and how it has applications beyond the standard panda image adversarial example. I’m naturally a bit sceptical of anything that attempts to claim that a single method, approach, or theory, can be applied to almost all things we think are important (especially if that method happens to be the speaker’s own specialism), and Goodfellow’s talk felt a little bit like that... but at the same time, contained a lot of really interesting ideas and I liked the fact it had a really clear message and cohesive structure.