Nathan Helm-Burger

AI alignment researcher, ML engineer. Master's in Neuroscience.

I believe that cheap and broadly competent AGI is attainable and will be built soon. This gives me timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is safe, contained testing on models designed from the ground up for alignability and trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, that this automated process will mine neuroscience for insights and quickly become far more effective and efficient, and that it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility.

See my prediction markets here:

 https://manifold.markets/NathanHelmBurger/will-gpt5-be-capable-of-recursive-s?r=TmF0aGFuSGVsbUJ1cmdlcg 

I also think that current AI models pose misuse risks, which may continue to worsen as models become more capable, and that this could result in catastrophic suffering if we fail to regulate them.

I now work for SecureBio on AI-Evals.

Relevant quote:

"There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training." https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe 

Comments

I'm not so sure. You might be right, but I suspect that catastrophic forgetting may still be playing an important role in limiting the peak capabilities of an LLM of a given size. Would it be possible to continue Llama3 8B's training much, much longer and have it eventually outcompete Llama3 405B stopped at its normal training endpoint?

I think probably not? And if not, I suspect that part (but not all) of the reason would be catastrophic forgetting. Another part would be the limited expressivity of smaller models, another thing which KANs seem to help with.

Wow, this is super fascinating.

A juicy tidbit:

Catastrophic forgetting is a serious problem in current machine learning [24]. When a human masters a task and switches to another task, they do not forget how to perform the first task. Unfortunately, this is not the case for neural networks. When a neural network is trained on task 1 and then shifted to being trained on task 2, the network will soon forget about how to perform task 1. A key difference between artificial neural networks and human brains is that human brains have functionally distinct modules placed locally in space. When a new task is learned, structure re-organization only occurs in local regions responsible for relevant skills [25, 26], leaving other regions intact. Most artificial neural networks, including MLPs, do not have this notion of locality, which is probably the reason for catastrophic forgetting.
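
For a toy demonstration of the phenomenon (my own sketch, assuming PyTorch; the two tasks are invented for illustration and are not from the paper): train a small MLP on one task, then on a conflicting one, and watch accuracy on the first task collapse.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(boundary):
    # Toy binary classification: label is whether x[0] exceeds `boundary`.
    x = torch.rand(512, 2) * 8 - 4          # inputs uniform in [-4, 4]
    y = (x[:, 0] > boundary).long()
    return x, y

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def train(x, y, steps=500):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

task_a = make_task(boundary=-2.0)   # task 1: decision boundary at x[0] = -2
task_b = make_task(boundary=2.0)    # task 2: conflicting boundary at x[0] = +2

train(*task_a)
print("task 1 accuracy after training on task 1:", accuracy(model, *task_a))
train(*task_b)
print("task 1 accuracy after training on task 2:", accuracy(model, *task_a))
```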

Yeah, I was playing around with using a VAE to compress the logits output from a language transformer. I did indeed settle on treating the vocab size (e.g. 100,000) as the 'channels'.

When working with language data versus image data, an interesting assumption of the ML vision research community clashes with an assumption of the language research community. For a language model, you represent the logits as a tensor with shape [batch_size, sequence_length, vocab_size]: for each position in the sequence, there is a likelihood value for each possible token.

In vision models, the assumption is that the data will be in the form [batch_size, color_channels, pixel_position]. Pixel position can be represented as a 2D tensor or flattened to 1D.

See the difference? In the vision layout, the channel dimension comes right after batch, before pixel position; in the language layout, the channel-like dimension (vocab) comes last, after sequence position. Why? Because a color channel has a particular meaning, so it is intuitive for a researcher working with vision data to think of the 'red channel' as something they might want to separate out and view on its own. What if we thought of 2nd-most-probable tokens the same way? Is it meaningful to read a sequence of all 1st-most-probable tokens, then a sequence of all 2nd-most-probable tokens? You could compare the semantic meaning, and the vibe, of the two sets. But this distinction doesn't feel as natural for language logits as it does for color channels.
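
To make the layout clash concrete, here is a minimal PyTorch sketch (sizes and the conv layer are illustrative assumptions, not the actual VAE I used) that permutes language-model logits into the channels-first layout a vision-style encoder expects:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; a real vocab would be closer to 100,000.
batch_size, seq_len, vocab_size = 4, 128, 1000

# Language-model convention: [batch, sequence_position, vocab]
logits = torch.randn(batch_size, seq_len, vocab_size)

# Vision convention: [batch, channels, position]. Treating the vocab
# dimension as "channels" means moving it in front of the position axis.
logits_channels_first = logits.permute(0, 2, 1)   # [batch, vocab, seq_position]

# A 1D conv layer (e.g. the first layer of a VAE encoder) can now consume
# the vocab dimension the way a vision model consumes color channels.
encoder = nn.Conv1d(in_channels=vocab_size, out_channels=64, kernel_size=3, padding=1)
compressed = encoder(logits_channels_first)       # [batch, 64, seq_position]
print(compressed.shape)                           # torch.Size([4, 64, 128])
```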

This is something I've been thinking about as well, and I think you do a good job explaining it. There's definitely more to break down and analyze within competence and intelligence, such as simulation being a distinct component of intelligence: a measure of how many moves a player can think ahead in a strategy game like chess or Go. How large a possibility-tree can they build in the available time? With what rate of errors? How quickly does the probability of error increase as the tree grows? How does their performance degrade as the complexity of the variables that need to be tracked for an accurate simulation increases?
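
As a rough illustration of the "how large a possibility-tree in the available time" framing, here is a hypothetical toy sketch (not tied to any real game engine) that counts the positions a depth-limited search can examine before a wall-clock deadline:

```python
import time

def expand_tree(state, legal_moves, apply_move, depth, deadline):
    """Depth-limited enumeration of a game's possibility-tree.

    Returns the number of positions examined, stopping after `depth` plies
    or once the wall-clock `deadline` has passed."""
    if depth == 0 or time.monotonic() > deadline:
        return 1
    count = 1
    for move in legal_moves(state):
        count += expand_tree(apply_move(state, move), legal_moves,
                             apply_move, depth - 1, deadline)
    return count

# Toy game: every position offers 3 legal moves; a "state" is just the move history.
legal_moves = lambda state: ["a", "b", "c"]
apply_move = lambda state, move: state + (move,)

budget_seconds = 0.1
nodes = expand_tree((), legal_moves, apply_move, depth=8,
                    deadline=time.monotonic() + budget_seconds)
print(f"positions examined within a {budget_seconds}s budget: {nodes}")
```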

For a population of AGI copies, the obvious first step towards 'taking over the world' is to try to improve itself.

I expect that the described workforce could, within a week of wall-clock time, find improvements including one or more of:

Improvements to peak intelligence without needing to fully retrain.

Improvements to inference efficiency.

Improvements to the ability to cooperate and share knowledge.

I've been using a remineralization toothpaste imported from Japan for several years now, ever since I mentioned reading about remineralization to a dentist from Japan. She recommended the brand to me. The FDA is apparently bogging down its release in the US, but it's available on Amazon anyway. It seems to have slowed, but not stopped, the formation of cavities. It does seem to result in faster plaque build-up around my gumline, as if the bacterial colonies are accumulating some of the minerals not absorbed by the teeth. The brand I use is Apagard. [Edit: I'm now also trying the mouthwash CloSys, as the link above recommended, using it before brushing and Listerine after. CloSys seems quite gentle and pleasant as a mouthwash. Listerine is harsh, but does leave my teeth feeling cleaner for much longer. I'll try this for a few years and see if it changes my rate of cavity formation.]

Thanks, faul-sname. I came to the comments to give a much lower-effort answer along the same lines, but yours is better. My answer: lazy local evaluation of the nodes surrounding either your current position or the goal's position. As long as you can estimate a direction from yourself to the goal, there's no need to embed the whole graph. This is basically gradient descent...
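
A minimal sketch of what I mean (a hypothetical grid world, not the post's actual setup): from the current node, lazily evaluate only its neighbors and step toward whichever one reduces the estimated distance to the goal.

```python
def greedy_local_walk(start, goal, neighbors, estimate_distance, max_steps=10_000):
    """Walk toward `goal` by lazily evaluating only the current node's
    neighbors, stepping to whichever has the lowest estimated distance.
    No global embedding of the graph is ever built."""
    current = start
    path = [current]
    for _ in range(max_steps):
        if current == goal:
            return path
        best = min(neighbors(current), key=lambda n: estimate_distance(n, goal))
        # Stop if no neighbor improves the estimate (stuck in a local minimum).
        if estimate_distance(best, goal) >= estimate_distance(current, goal):
            return path
        current = best
        path.append(current)
    return path

# Toy example: a 2D grid where the direction estimate is Manhattan distance.
neighbors = lambda p: [(p[0] + dx, p[1] + dy)
                       for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
manhattan = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

print(greedy_local_walk((0, 0), (3, 2), neighbors, manhattan))
```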

Personally, I most enjoyed the first one in the series, and enjoyed listening to Joe's reading of it even more than when I just read it. My top three are 1, 6, 7.

I generally agree; I just have some specific evidence which I believe should adjust the report's estimates towards expecting more accessible algorithmic improvements than some people seem to think.
