I was recently listening to the following podcast with Sam Harris and Stuart Russell. It’s a nice discussion of the future of AI and superintelligence. They give a good basic overview, while also raising some thoughts I hadn’t heard before.

They discuss what is currently known as the “value alignment problem” in AI. That is, if (when?) we create a superintelligent computer, we could be SOL if we don’t embed the proper values within it. Russell makes a fun analogy to Midas (whose wish to have all he touched turn to gold demonstrated a lack of foresight). Though we might hope our superintelligent overlords / overseers will promote flourishing, many in the AI community rightly feel that we have no clear way to convey our sense of what human flourishing is to the robot.

Here is my solution: add randomness to the AI’s objective function. At least traditionally, AIs have some sort of objective. They try to balance upright, or yield high stock returns, or maximize paperclip production. Things go wrong because we can’t perfectly and parsimoniously state a rule for an AI to follow; we human beings can’t (and wouldn’t want to) do this for ourselves.
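
For concreteness, here’s the standard picture in a few lines of Python (the objective and the candidate actions are made up):

```python
def objective(action: float) -> float:
    """One fixed goal, say 'maximize paperclip output' (hypothetical)."""
    return -(action - 3.0) ** 2  # scores highest at action = 3.0

def act(actions: list[float]) -> float:
    # The agent relentlessly picks whatever scores highest
    # on its single objective.
    return max(actions, key=objective)

print(act([0.0, 1.5, 3.0, 4.5]))  # -> 3.0, every time
```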

So perhaps we introduce a bit of randomness into the objective function. Aim for f1 for a while, then f2. By confusing an AI in this way, can’t we introduce a balance into its goal-seeking?
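
Here’s a minimal sketch of what I have in mind, again in Python. The goals f1 and f2 are made-up toys, and the “agent” is just an argmax over a handful of candidate actions (a cartoon, not a real learner):

```python
import random

def f1(action: float) -> float:
    return -(action - 3.0) ** 2  # f1 is happiest near action = 3

def f2(action: float) -> float:
    return -(action - 7.0) ** 2  # f2 is happiest near action = 7

def act(actions: list[float]) -> float:
    # Each step, flip a coin over which goal to pursue. Over many steps
    # the agent splits its effort instead of monomaniacally optimizing one.
    f = random.choice([f1, f2])
    return max(actions, key=f)

candidates = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
picks = [act(candidates) for _ in range(1000)]
print(sum(picks) / len(picks))  # hovers near 5, between the two optima
```

The coin flip is the crudest version; you could just as easily draw from a weighted mix of many objectives, or drift slowly between them. Thoughts?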