I've been thinking about what research projects I should work on, and I've posted my current view. Naturally, I think these are good projects for other people to work on as well.
Brief summaries of the projects I find most promising:
- Elaborating on apprenticeship learning. Imitating human behavior seems especially promising as a scalable approach to AI control, but there are many outstanding problems.
- Efficiently using human feedback. The limited availability of human feedback may be a serious bottleneck for realistic approaches to AI control.
- Explaining human judgments and disagreements. My preferred approach to AI control requires humans to understand AIs’ plans and beliefs. We don’t know how to solve the analogous problem for humans.
- Designing feedback mechanisms for reinforcement learning. A grab bag of problems, united by a need for proxies of hard-to-optimize, implicit objectives.
Minor naming feedback: you switched from calling something "supervised learning" to "reinforcement learning". The first images that come to my mind when I hear "reinforcement learning" are TD-Gammon and reward signals. So when I read "reinforcement learning", I first think of a computer getting smarter through iterative navel-gazing, then of a computer trying to wirehead itself, and only then stumble to the meaning I think you intend. I am a lay reader.