AI Alignment Is Turning from Alchemy Into Chemistry


For years, AI alignment seemed a strange, stuck field with no clear ideas or progress. It was often compared to alchemy, and many ML researchers steered clear of it. Recent breakthroughs, however, suggest that progress is possible: unsupervised alignment has been demonstrated, a sign that the field is becoming real. The takeaway is to ignore the "craziness" of the field and think for oneself, setting aside its history and focusing on one's own ideas and experiments. Because no one knows when progress will be made, working on the problem is a gamble worth taking.


What is AI Alignment?
AI Alignment is the field concerned with ensuring that AI does not take over from humanity or kill us all.

What progress has been made in the field of AI Alignment?
Collin Burns et al.’s “Discovering Latent Knowledge in Language Models Without Supervision” was the first alignment paper to make meaningful progress on the problem of evaluating AI systems when we don’t understand what they’re doing.

What implications does the progress of AI Alignment have?
The implications are to ignore the “craziness”, discard history, stop reading the literature, and think for yourself; to focus on your own ideas and experiments; and to remember that you cannot know whether you are capable of making a contribution until you actually make one.

What is the importance of Alignment separate from Safety?
Alignment, as distinct from Safety, matters because it addresses the problem of how to evaluate AI systems when we don’t understand what they’re doing.

How can one make contributions to the field of AI Alignment?
To contribute to the field of AI Alignment, think for yourself and focus on your own ideas and experiments.

