Alexander Gietelink Oldenziel

(...) the term technical is a red flag for me, as it is many times used not for the routine business of implementing ideas but for the parts, ideas and all, which are just hard to understand and many times contain the main novelties.
- Saharon Shelah

 

As a true-born Dutchman I endorse Crocker's rules.

For most of my writing, see my shortforms (new shortform, old shortform).

Twitter: @FellowHominid

Personal website: https://sites.google.com/view/afdago/home

Sequences

Singular Learning Theory

Wiki Contributions

Comments

You may be right. I don't know of course. 

At this moment in time, it seems scaffolding tricks haven't really improved the baseline performance of models that much. Overwhelmingly, the capability comes down to whether the RLHF'd base model can do the task.

To some degree yes, they were not guaranteed to hold. But by that point they held for over 10 OOMs iirc and there was no known reason they couldn't continue.

This might be the particular twitter bubble I was in but people definitely predicted capabilities beyond simple extrapolation of scaling laws.

My timelines are lengthening. 

I've long been a skeptic of scaling LLMs to AGI*. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence: davidad, dmurfet. On the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their job. They usually mumble something about scaffolding.

I won't get into this debate here but I do want to note that my timelines have lengthened, primarily because some of the never-clearly-stated but heavily implied AI developments by proponents of very short timelines have not materialized. To be clear, it has only been a year since gpt-4 was released, and gpt-5 is around the corner, so perhaps my hope is premature. Still, my timelines are lengthening.

A year ago, when gpt-3 came out, progress was blindingly fast. Part of short timelines came from a sense of 'if we got surprised so hard by gpt2-3, we are completely uncalibrated, who knows what comes next'.

People seemed surprised by gpt-4 in a way that seemed uncalibrated to me. gpt-4 performance was basically in line with what one would expect if the scaling laws continued to hold. At the time it was already clear that the only really important driver was compute and data, and that we would run out of both shortly after gpt-4. Scaling proponents suggested this was only the beginning, that there was a whole host of innovation that would be coming. Whispers of mesa-optimizers and simulators.

One year in: chain-of-thought doesn't actually improve things that much. External memory and super context lengths, ditto. A whole list of proposed architectures seems to serve solely as a paper mill. Every month there is new hype about the latest LLM or image model. Yet they never deviate from expectations based on simple extrapolation of the scaling laws. There is only one thing that really seems to matter, and that is compute and data. We have about 3 more OOMs of compute to go. Data may be milked another OOM.

A big question will be whether gpt-5 will suddenly make agentGPT work (and to what degree). It would seem that gpt-4 is in many ways far more capable than (most or all) humans, yet agentGPT is curiously bad.

All-in-all AI progress** is developing according to the naive extrapolations of Scaling Laws but nothing beyond that. The breathless twitter hype about new models is still there but it seems to be believed more at a simulacra level higher than I can parse. 

Does this mean we'll hit an AI winter? No. In my model there may be only one remaining roadblock to ASI (and I suspect I know what it is). That innovation could come at any time. I don't know how hard it is, but I suspect it is not too hard.

* the term AGI seems to denote vastly different things to different people in a way I find deeply confusing. I notice that the thing that I thought everybody meant by AGI is now being called ASI. So when I write AGI, feel free to substitute ASI. 

** or better, AI congress

Addendum: since I've been quoted in dmurfet's AXRP interview as believing that there are certain kinds of reasoning that cannot be represented by transformers/LLMs, I want to be clear that this is not really an accurate portrayal of my beliefs. E.g. I don't hold that transformers don't truly understand, are just stochastic parrots, or in other ways can't engage in the abstract reasoning that humans do. I think those claims are clearly false, as seen by interacting with any frontier model.

Wildlife Welfare Will Win

The long arc of history bends towards gentleness and compassion. Future generations will look with horror on factory farming. And already young people are following this moral thread to its logical conclusion, turning their eyes in disgust to mother nature, red in tooth and claw. Wildlife Welfare Done Right, compassion towards our pets followed to its forceful conclusion, would entail the forced uploading of all higher animals and, judging by the memetic virulence of shrimp welfare, of lower animals as well.

Morality-upon-reflexion may very well converge on a simple form of pain-pleasure utilitarianism. 

There are a few caveats: if future society is not dominated, controlled and designed by a singleton AI-supervised state, technology inevitably stalls, the invisible hand performs its inexorable logic for eons, and a Malthuso-Hansonian world emerges once again - the industrial revolution but a short blip of cornucopia.

Perhaps a theory of consciousness is discovered and proves once and for all that homo sapiens and only homo sapiens are conscious (to a significant degree). Perhaps society will wirehead itself into blissful oblivion. Or perhaps a superior machine intelligence arises, one whose final telos is the whole of and nothing but office supplies. Or perhaps stranger things still happen and the astronomo-cosmic compute of our cosmic endowment is engaged for mysterious purposes. Arise, self-made god of pancosmos. Thy name is UDASSA.

I have heard of AIXI but haven't looked deeply into it. I'm curious about it. What are some results you think are cool in this field?

This seems valuable! I'd be curious to hear more!!

Interesting...

Wouldn't I expect the evidence to come out in a few big chunks, e.g. OpenAI releasing a new product?

I agree with you. 

Epsilon machine (and MSP) construction is most likely computationally intractable [I don't know an exact statement of such a result in the literature but I suspect it is true] for realistic scenarios. 

Scaling an approximate version of epsilon reconstruction therefore seems of prime importance. Real-world architectures and data have highly specific structure & symmetry that make them different from completely generic HMMs. This must most likely be exploited.

The calculi of emergence paper has inspired many people but has not been developed much. Many of the details are somewhat obscure and vague. I also believe that most likely completely different methods are needed to push the program further. Computational Mechanics is primarily a theory of hidden Markov models - it doesn't have the tools to easily describe behaviour higher up the Chomsky hierarchy. I suspect more powerful and sophisticated algebraic, logical and categorical thinking will be needed here. I caveat this by saying that Paul Riechers has pointed out that one can actually understand all these gadgets up the Chomsky hierarchy as infinite HMMs, which may be analyzed usefully just as finite HMMs.

The still-underdeveloped theory of epsilon transducers I regard as the most promising lens on agent foundations. This is uncharted territory; I suspect the largest impact of computational mechanics will come from this direction.

Your point on True Names is well-taken. More basic examples than gauge information and synchronization order are the triple of quantities: entropy rate, excess entropy, and Crutchfield's statistical/forecasting complexity. These are the most important quantities to understand for any stochastic process (such as the structure of language and LLMs!).
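For reference, here is how these three quantities are usually defined for a stationary process. This is a sketch in standard computational-mechanics notation; the symbols $h_\mu$, $\mathbf{E}$ and $C_\mu$ are the conventional ones and are an assumption here, since the comment above does not show them.

```latex
% X_{a:b} is the block of symbols X_a, ..., X_{b-1}.
% H is Shannon entropy, I is mutual information,
% \mathcal{S} is the random variable over causal states.
\begin{align}
h_\mu &= \lim_{L \to \infty} \frac{H[X_{0:L}]}{L}
  && \text{(entropy rate: irreducible randomness per symbol)} \\
\mathbf{E} &= I[X_{-\infty:0} \,;\, X_{0:\infty}]
  && \text{(excess entropy: information shared by past and future)} \\
C_\mu &= H[\mathcal{S}] = -\sum_{\sigma} \Pr(\sigma) \log_2 \Pr(\sigma)
  && \text{(statistical complexity: memory stored in the causal states)}
\end{align}
```

A standard relation between the last two is $\mathbf{E} \le C_\mu$: a process can need more memory for optimal prediction than the amount of information its past actually shares with its future.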

I agree with you that the new/surprising thing is the linearity of the probe. I also agree that it is not entirely clear how surprising & new the linearity of the probe is.

If you understand how the causal states construction & the MSP work in computational mechanics, the experimental results aren't surprising. Indeed, it can't be any other way! That's exactly the magic of the definition of causal states.

What one person might find surprising or new another thinks trivial. The subtle magic of the right theoretical framework is that it makes the complex simple, surprising phenomena apparent.

Before learning about causal states I would not even have considered that there is a unique (!) optimal minimal predictor canonically constructible from the data. Nor that the geometry of synchronizing belief states is generically a fractal. Of course, once one has properly internalized the definitions this is almost immediate. Pretty pictures can be helpful in building that intuition!
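To make "synchronizing belief states" concrete, here is a minimal sketch of the Bayesian belief update that generates the states of the MSP. The two-state HMM used (the Golden Mean process), the names `T`, `PI`, `update_belief` and `belief_trajectory`, and the choice of pure Python are all illustrative assumptions, not anything taken from the post under discussion.

```python
# Hypothetical example HMM (Golden Mean process): hidden states A=0, B=1.
# T[x][i][j] = Pr(emit symbol x and move to state j | currently in state i).
T = {
    0: [[0.0, 0.5], [0.0, 0.0]],
    1: [[0.5, 0.0], [1.0, 0.0]],
}
PI = [2 / 3, 1 / 3]  # stationary distribution over the hidden states


def update_belief(belief, symbol):
    """Bayes update b' = b T^(x) / (b T^(x) 1); returns (b', Pr(x | b))."""
    T_x = T[symbol]
    n = len(belief)
    unnorm = [sum(belief[i] * T_x[i][j] for i in range(n)) for j in range(n)]
    p_x = sum(unnorm)
    if p_x == 0.0:
        raise ValueError("symbol has zero probability under this belief")
    return [u / p_x for u in unnorm], p_x


def belief_trajectory(symbols, belief=None):
    """Belief states visited while reading `symbols`, starting from PI."""
    belief = list(PI) if belief is None else list(belief)
    path = [belief]
    for x in symbols:
        belief, _ = update_belief(belief, x)
        path.append(belief)
    return path
```

In this toy process a single observed symbol already pins down the hidden state, so only a handful of belief states are ever visited. For a generic HMM the beliefs do not collapse like this, and the set of reachable beliefs in the probability simplex is where the fractal geometry shows up.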

Adam and I (and many others) have been preaching the gospel of computational mechanics for a while now. Most of it has fallen on deaf ears before. Like you, I have been (positively!) surprised and amused by the sudden outpouring of interest. No doubt it's in part a testimony to the Power of the Visual! Never look a gift horse in the mouth!

I would say the parts of computational mechanics I am really excited about are a little deeper - downstream of causal states & the MSP. This is just a taster.

I'm confused & intrigued by your insistence that this follows from the good regulator theorem. Like Adam, I don't understand it. My understanding is that the original 'theorem' was wordcelled nonsense but that John has been able to formulate a nontrivial version of the theorem. My experience is that the theorem is often invoked in a handwavey way that leaves me no less confused than before. No doubt due to my own ignorance!

I would be curious to hear a *precise* statement of why the result here follows from the Good Regulator Theorem.
