I believe that the current architecture of neural networks supports mesa-optimization: roughly speaking, searching across a set of vectors in order to select the one that will be most useful for producing an answer.

Three ways of performing inner optimization are already possible, and new ones will most likely appear.

  1. Optimization on the scaffolding level.
    1. Iterative diffusion models (e.g. Stable Diffusion) are already close to this, modifying an intermediate pseudo-image on each pass.
    2. An LLM with a scratchpad can write down some amount of text - several candidate plans, for instance - and then compare them to compute its output (a minimal sketch of such a selection loop appears after this list).
  2. Custom "neuron" functions embedded into the neural network that would perform search (for instance: generate arbitrary input data for a subnetwork, infer the output, and select the best one). To my knowledge, this approach is not currently used in any public models (though the name "Q*" is pretty suspicious, being reminiscent of "A*", the graph path-search algorithm). A sketch of what such a layer could look like is also given after this list.
  3. Option selection based on nonlinear activation functions - mostly, ReLU.
    If a subnetwork has inputs $A$ and $B$, it's pretty easy to output $\max(A, B)$. Additional information can be selected either by squeezing it into the input numbers, or by building a slightly bigger subnetwork.
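
As an illustration of the scaffolding-level route, here is a minimal sketch of search over scratchpad plans. The functions `propose_plans` and `score_plan` are hypothetical stand-ins for whatever model calls a real scaffold would make; only the generate-then-select structure is the point.

```python
# Minimal sketch of scaffolding-level search.
# propose_plans and score_plan are hypothetical stand-ins for LLM calls;
# the inner optimization is the generate-then-select loop itself.

def propose_plans(task: str, n: int) -> list[str]:
    """Stand-in: ask the model to write n candidate plans to a scratchpad."""
    return [f"plan {i} for {task!r}" for i in range(n)]

def score_plan(plan: str) -> float:
    """Stand-in: ask the model (or a critic) how promising a plan looks."""
    return float(len(plan) % 7)  # placeholder score

def best_plan(task: str, n: int = 8) -> str:
    # The inner optimization step: search across options, keep the argmax.
    return max(propose_plans(task, n), key=score_plan)
```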
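And here is one way the second route could look. This is my own PyTorch sketch, not something taken from any deployed model: a `SearchLayer` samples several candidate inputs to a subnetwork, scores each resulting output, and forwards only the best one.

```python
import torch
import torch.nn as nn

class SearchLayer(nn.Module):
    """Hypothetical custom 'neuron' that performs search inside the forward
    pass: generate candidate inputs to a subnetwork, infer outputs, select
    the best one."""

    def __init__(self, subnet: nn.Module, scorer: nn.Module, n_candidates: int = 16):
        super().__init__()
        self.subnet = subnet          # subnetwork being searched over
        self.scorer = scorer          # maps subnet output -> scalar score
        self.n_candidates = n_candidates

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Generate candidate inputs by perturbing x.
        noise = 0.1 * torch.randn(x.shape[0], self.n_candidates, x.shape[1],
                                  device=x.device)
        candidates = x.unsqueeze(1) + noise          # (batch, n_cand, dim)
        outputs = self.subnet(candidates)            # (batch, n_cand, out_dim)
        scores = self.scorer(outputs).squeeze(-1)    # (batch, n_cand)
        best = scores.argmax(dim=1)                  # the inner argmax
        return outputs[torch.arange(x.shape[0]), best]
```

For instance, `SearchLayer(nn.Linear(8, 4), nn.Linear(4, 1))` is a runnable toy instance. Note that the hard argmax is not differentiable, so a trainable version would need a soft selection (e.g. a softmax-weighted mixture over candidates).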

Actual subnetwork design for point #3

Let's suppose the neural network consists of layers, as is common now - each layer a composition of a matrix multiplication and an activation function - with activation function $\mathrm{ReLU}(x) = \max(x, 0)$.

  1. $A$, $B$ - inputs;
  2. The first layer computes $\mathrm{ReLU}(A - B)$, $\mathrm{ReLU}(B)$ and $\mathrm{ReLU}(-B)$;
  3. The second layer outputs $\mathrm{ReLU}(A - B) + \mathrm{ReLU}(B) - \mathrm{ReLU}(-B) = \max(A - B, 0) + B = \max(A, B)$.

This construction can be extended to select the maximum of $2^k$ options in $k$ layers, by running a tournament of pairwise maxima and folding each stage's linear readout into the next layer's matrix multiplication; possibly even more options per layer can be handled.
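
A minimal NumPy check of this two-layer construction (the weight matrices below just write out the combination derived above):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Layer 1: from (A, B) compute ReLU(A - B), ReLU(B), ReLU(-B).
W1 = np.array([[ 1.0, -1.0],
               [ 0.0,  1.0],
               [ 0.0, -1.0]])
# Layer 2 (linear readout): ReLU(A-B) + ReLU(B) - ReLU(-B) = max(A, B).
W2 = np.array([[1.0, 1.0, -1.0]])

def subnet_max(a, b):
    x = np.array([a, b])
    return float(W2 @ relu(W1 @ x))

assert subnet_max(3.0, -5.0) == 3.0
assert subnet_max(-2.5, -1.0) == -1.0
assert subnet_max(4.0, 7.0) == 7.0
```

Stacking this gadget tournament-style, with each readout row absorbed into the next layer's weight matrix, gives the $2^k$-options-in-$k$-layers extension above.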

Conclusion

I believe that inner optimization might already exist in current neural networks, and that this can be used as evidence when estimating what future AIs will be able to do at which levels of capability.

Comments

I've laid out a concrete example of this at https://www.lesswrong.com/posts/FgXjuS4R9sRxbzE5w/medical-image-registration-the-obscure-field-where-deep , following the "optimization on the scaffolding level" route. I found a real example of a misaligned inner objective outside of RL, which is cool.