Epistemic status: speculative

For a while, I’ve had the intuition that current machine learning techniques, though powerful and useful, are simply not touching some of the functions of the human mind. But before I can really get at how to justify that intuition, I would have to start clarifying what different kinds of thinking there are. I’m going to be reinventing the wheel a bit here, not having read that much cognitive science, but I wanted to write down some of the distinctions that seem important, and to try to see whether they overlap. A lot of this is inspired by Dreyfus’ Being-in-the-World. I’m also trying to think about the questions raised in the post “What are Intellect and Instinct?”

Effortful vs. Effortless

In English, we have different words for perceiving passively versus actively paying attention. To see vs. to look, to hear vs. to listen, to touch vs. to feel. To go looking for a sensation means exerting a sort of mental pressure; in other words, effort. William James, in his Principles of Psychology, said “Attention and effort, as we shall see later, are but two names for the same psychic fact.” He says, in his famous introduction to attention, that

Every one knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others, and is a condition which has a real opposite in the confused, dazed, scatterbrained state which in French is called distraction, and Zerstreutheit in German.

Furthermore, James divides attention into two types:

Passive, reflex, non-voluntary, effortless; or Active and voluntary.

In other words, the mind is always selecting certain experiences or thoughts as more salient than others; but sometimes this is done in an automatic way, and sometimes it’s effortful/voluntary/active. A fluent speaker of a language will automatically notice a grammatical error; a beginner will have to try effortfully to catch the error.

In the famous gorilla experiment, where subjects instructed to count passes in a basketball game failed to notice a gorilla on the court, “counting the passes” is paying effortful attention, while “noticing the gorilla” would be effortless or passive noticing.

Activity in “flow” (playing a musical piece by muscle memory, or spatially navigating one’s own house) is effortless; activities one learns for the first time are effortful.

Oliver Sacks’ case studies are full of stories that illustrate the importance of flow. People with motor disorders like Parkinson’s can often dance or walk in rhythm to music, even when ordinary walking is difficult; people with memory problems sometimes can still recite verse; people who cannot speak can sometimes still sing. “Fluent” activities can remain undamaged when similar but more “deliberative” activities are lost.

The author of intellectualizing.net thinks about this in the context of being an autistic parent of an autistic son:

Long ago somewhere I can’t remember, I read a discussion of knowing what vs. knowing how. The author’s thought experiment was about walking. Imagine walking with conscious planning, thinking consciously about each muscle and movement involved. Attempting to do this makes us terrible at walking.

When I find myself struggling with social or motor skills, this is the feeling. My impression of my son is the same. Rather than trying something, playing, experimenting he wants the system first. First organize and analyze it, then carefully and cautiously we might try it.

A simple example. There’s a curriculum for writing called Handwriting Without Tears. Despite teaching himself to read when barely 2, my son refused to even try to write. Then someone showed him this curriculum in which letters are broken down into three named categories according to how you write them; and then each letter has numbered strokes to be done in sequence. Suddenly my son was interested in writing. He approached it by first memorizing the whole Handwriting Without Tears system, and only then was he willing to try to write. I believe this is not how most 3-year-olds work, but this is how he works.

One simple study (“Children with autism do not overimitate”) had to do with children copying “unnecessary” or “silly” actions. Given a demonstration by an adult, autistic kids would edit out pointless steps in the demonstrated procedure. Think about what’s required to do this: the procedure has to be reconstructed from first principles to edit the silly out. The autistic kids didn’t take someone’s word for it, they wanted to start over.

The author and his son learn skills by effortful conscious planning that most people learn by “picking up” or “osmosis” or “flow.”

Most of the activity described by Heidegger’s Being and Time, and Dreyfus’ commentary Being-In-The-World, is effortless flow-state “skilled coping.” Handling a familiar piece of equipment, like typing on a keyboard, is a prototypical example. You’re not thinking about how to do it except when you’re learning how for the first time, or if it breaks or becomes “disfluent” in some way. If I’m interpreting him correctly, I think Dreyfus would say that neurotypical adults spend most of their time, minute-by-minute, in an effortless flow state, punctuated by occasions when they have to plan, try hard, or figure something out.

William James would agree that voluntary attention occupies a minority of our time:

There is no such thing as voluntary attention sustained for more than a few seconds at a time. What is called sustained voluntary attention is a repetition of successive efforts which bring back the topic to the mind.

(This echoes the standard advice in mindfulness meditation that you’re not aiming for getting the longest possible period of uninterrupted focus, you’re training the mental motion of returning focus from mind-wandering.)

Effortful attention can also be viewed as the cognitive capacities which stimulants improve. Reaction times shorten, and people distinguish and remember the stimuli in front of them better.

It’s important to note that not all focused attention is effortful attention. If you are playing a familiar piece on the piano, you’re in a flow state, but you’re still being “focused” in a sense; you’re noticing the music more than you’re noticing conversation in another room, you’re playing this piece rather than any other, you’re sitting uninterrupted at the piano rather than multitasking. Effortless flow can be extremely selective and hyper-focused (like playing the piano), just as much as it can be diffuse, responsive, and easily interruptible (like navigating a crowded room). It’s not the size of your window of salience that distinguishes flow from effortful attention, it’s the pressure that you apply to that window.

Psychologists often call effortful attention cognitive disfluency, and find that experiences of disfluency (such as a difficult-to-read font) improve syllogistic reasoning and reduce reliance on heuristics, while making people more likely to make abstract generalizations. Disfluency improves results on measures of “careful thinking” like the Cognitive Reflection Test as well as on real-world high-school standardized tests, and also makes people less likely to confess embarrassing information on the internet. In other words, disfluency makes people “think before they act.” Disfluency raises heart rate and blood pressure, just like exercise, and people report it as being difficult and reliably disprefer it to cognitive ease. The psychology research seems consistent with there being such a thing as “thinking hard.” Effortful attention occupies a minority of our time, but it’s prominent in the most specifically “intellectual” tasks, from solving formal problems on paper to making prudent personal decisions.

What does it mean, on a neurological or a computational level, to expend mental effort? What, precisely, are we doing when we “try hard”? I think it might be an open question.

Do the neural networks of today simulate an agent in a state of “effortless flow” or “effortful attention”, or both or neither? My guess would be that deep neural nets and reinforcement learners are generally doing effortless flow, because they excel at the tasks that we generally do in a flow state (pattern recognition and motor learning.)

Explicit vs. Implicit

Dreyfus, as an opponent of the Representational Theory of Mind, believes that (most of) cognition is not only not based on a formal system, but not in principle formalizable. He thinks you couldn’t possibly write down a theory or a set of rules that explain what you’re doing when you drive a car, even if you had arbitrary amounts of information about the brain and human behavior and arbitrary amounts of time to analyze them.

This distinction seems to include the distinctions of “declarative vs. procedural knowledge”, “know-what vs. know-how”, savoir vs. connaître. We can often do, or recognize, things that we cannot explain.

I think this issue is related to the issue of interpretability in machine learning; the algorithm executes a behavior, but sometimes it seems difficult or impossible to explain what it’s doing in terms of a model that’s simpler than the whole algorithm itself.

The seminal 2001 article by Leo Breiman, “Statistical Modeling: The Two Cultures” and Peter Norvig’s essay “On Chomsky and the Two Cultures of Statistical Learning” are about this issue. The inverse square law of gravitation and an n-gram Markov model for predicting the next word in a sentence are both statistical models, in some sense; they allow you to predict the dependent variable given the independent variables. But the inverse square law is interpretable (it makes sense to humans) and explanatory (the variables in the model match up to distinct phenomena in reality, like masses and distances, and so the model is a relationship between things in the world.)
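To make the contrast concrete, here is a minimal sketch in Python. The function names and the toy corpus are my own illustrative choices, not anything taken from Breiman or Norvig. Both pieces of code predict an output from inputs, but only the first has symbols that each name something in the world; the second is just a table of counts.

```python
from collections import Counter, defaultdict

G = 6.674e-11  # gravitational constant, N m^2 / kg^2


def gravity_force(m1, m2, r):
    """Inverse square law: each symbol stands for a mass or a distance."""
    return G * m1 * m2 / r ** 2


def train_bigram(words):
    """Count which word follows which; the 'parameters' are a table of counts."""
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def predict_next(table, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = table.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()
bigram = train_bigram(corpus)

print(gravity_force(5.0, 10.0, 2.0))
print(predict_next(bigram, "the"))
```

You can read the first function and say what would happen if a mass doubled. The only way to say what the second will do is to look up the counts, and a realistic n-gram table has far too many entries to read.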

Modern machine learning models, like the n-gram predictor, have vast numbers of variables that don’t make sense to humans and don’t obviously correspond to things in the world. They perform well without being explanations. Statisticians tend to prefer parametric models (which are interpretable and sometimes explanatory) while machine-learning experts use a lot of non-parametric models, which are complex and opaque but often have better empirical performance. Critics of machine learning argue that a black-box model doesn’t bring understanding, and so is the province of engineering rather than science. Defenders, like Norvig, flip open a random issue of Science and note that most of the articles are not discovering theories but noting observations and correlations. Machine learning is just another form of pattern recognition or “modeling the world”, which constitutes the bulk of scientific work today.

These are heuristic descriptions; these essays don’t make explicit how to test whether a model is interpretable or not. I think it probably has something to do with model size; is the model reducible to one with fewer parameters, or not? If you think about it that way, it’s obvious that “irreducibly complex” models, of arbitrary size, can exist in principle — you can just build simulated data sets that fit them and can’t be fit by anything simpler.
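Here is one narrow way to cash out “can’t be fit by anything simpler,” as a hedged sketch rather than a definition of interpretability. I am assuming a deliberately restricted model class (lookup tables over a subset of the input bits) and a synthetic target (the parity of K bits); both choices, and all the names below, are mine. A table over all K bits fits the data perfectly, while the best table that ignores even one bit does no better than chance.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def parity(bits):
    """Synthetic target: 1 if an odd number of bits are set, else 0."""
    return sum(bits) % 2


def best_table_accuracy(data, kept):
    """Best accuracy of any lookup table that sees only the bit positions in `kept`."""
    votes = defaultdict(Counter)
    for bits, label in data:
        key = tuple(bits[i] for i in kept)
        votes[key][label] += 1
    # For each visible pattern, the best table answers with the majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in votes.values())
    return correct / len(data)


K = 4  # number of input bits (arbitrary small choice)
data = [(bits, parity(bits)) for bits in product((0, 1), repeat=K)]

full = best_table_accuracy(data, range(K))
reduced = max(
    best_table_accuracy(data, kept) for kept in combinations(range(K), K - 1)
)

print(full, reduced)
```

This only shows irreducibility relative to one family of smaller models; whether that is the right notion of “simpler” is exactly the part the essays leave open.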

How much of human thought and behavior is “irreducible” in this way, resembling the huge black-box models of contemporary machine learning? Plausibly a lot. I’m convinced by the evidence that visual perception runs on something like convolutional neural nets, and I don’t expect there to be “simpler” underlying laws. People accumulate a lot of data and feedback through life, much more than scientists ever do for an experiment, so they can “afford” to do as any good AI startup does, and eschew structured models for open-ended, non-insightful ones, compensating with an abundance of data.

Subject-Object vs. Relational

This is a concept in Dreyfus that I found fairly hard to pin down, but the distinction seems to be operating upon the world vs. relating to the world. When you are dealing with raw material — say you are a potter with a piece of clay — you think of yourself as active and the clay as passive. You have a goal (say, making a pot) and the clay has certain properties; how you act to achieve your goal depends on the clay’s properties.

By contrast, if you’re interacting with a person or an animal, or even just an object with a UI, like a stand mixer, you’re relating to your environment. The stand mixer “lets you do” a small number of things — you can change attachments or speeds, raise the bowl up and down, remove the bowl, fill it with food or empty it. You orient to these affordances. You do not, in the ordinary process of using a stand mixer, think about whether you could use it as a step-stool or a weapon or a painting tool. (Though you might if you are a child, or an engineer or an artist.) Ordinarily you relate in an almost social, almost animist, way to the stand mixer. You use it as it “wants to be used”, or rather as its designer wants you to use it; you are “playing along” in some sense, being receptive to the external intentions you intuit.

And, of course, when we are relating to other people, we do much stranger and harder-to-describe things; we become different around them, we are no longer solitary agents pursuing purely internally-driven goals. There is such a thing as becoming “part of a group.” There is the whole messy business of culture.

For the most part, I don’t think machine-learning models today are able to do either subject-object or relational thinking; the problems they’re solving are so simple that neither paradigm seems to apply. “Learn how to work a stand mixer” or “Figure out how to make a pot out of clay” both seem beyond the reach of any artificial intelligence we have today.

Aware vs. Unaware

This is the difference between sight and blindsight. It’s been shown that we can act on the basis of information that we don’t know we have. Some blind people are much better than chance at guessing where a visual stimulus is, even though they claim sincerely to be unable to see it. Being primed by a cue makes blindsight more accurate — in other words, you can have attention without awareness.

Anosognosia is another window into awareness; it is the phenomenon when disabled people are not aware of their deficits (which may be motor, sensory, speech-related, or memory-related.) In unilateral neglect, for instance, a stroke victim might be unaware that she has a left side of her body; she won’t eat the left half of her plate, make up the left side of her face, etc. Sensations may still be possible on the left side, but she won’t be aware of them. Squirting cold water in the left ear can temporarily fix this, for unknown reasons.

Awareness doesn’t need to be explicit or declarative; we aren’t formalizing words or systems constantly when we go through ordinary waking life. It also doesn’t need to be effortful attention; we’re still aware of the sights and sounds that enter our attention spontaneously.

Efference copy signals seem to provide a clue to what’s going on in awareness. When we act (such as to move a limb), we produce an “efference copy” of what we expect our sensory experience to be, while simultaneously we receive the actual sensory feedback. “This process ultimately allows sensory reafferents from motor outputs to be recognized as self-generated and therefore not requiring further sensory or cognitive processing of the feedback they produce.” This is what allows you to keep a ‘still’ picture of the world even though your eyes are constantly moving, and to tune out the sensations from your own movements and the sound of your own voice.
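The bookkeeping described above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not a model of any real neural circuit: the linear forward model, the tolerance, and the function names are all hypothetical. The idea is only that a prediction is made from the motor command, subtracted from what is actually sensed, and whatever is left over gets attributed to the outside world.

```python
def forward_model(motor_command, gain=1.0):
    """Predicted sensory feedback for a command (the 'efference copy').

    The linear gain is a hypothetical stand-in for whatever the brain computes.
    """
    return gain * motor_command


def external_component(motor_command, sensed, tolerance=0.05):
    """Return the part of the sensed signal not explained by one's own action."""
    predicted = forward_model(motor_command)
    residual = sensed - predicted
    # Small mismatches are treated as self-generated and ignored.
    return 0.0 if abs(residual) <= tolerance else residual


# Eye moves by 3.0 and the image shifts by 3.0: the world is judged to be still.
print(external_component(3.0, 3.0))
# Eye does not move but the image shifts by 3.0: the shift is judged external.
print(external_component(0.0, 3.0))
# Self-motion plus an external shift: only the unexplained part remains.
print(external_component(3.0, 4.5))
```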

Schizophrenics may be experiencing a dysfunction of this self-monitoring system; they have “delusions of passivity or thought insertion” (believing that their movements or thoughts are controlled from outside) or “delusions of grandeur or reference” (believing that they control things with their minds that they couldn’t possibly control, or that things in the outside world are “about” themselves when they aren’t.) They have a problem distinguishing self-caused from externally-caused stimuli.

We’re probably keeping track, somewhere in our minds, of things labeled as “me” and “not me” (my limbs are part of me, the table next to me is not), sensations that are self-caused and externally-caused, and maybe also experiences that we label as “ours” vs. not (we remember them, they feel like they happened to us, we can attest to them, we believe they were real rather than fantasies.)

It might be as simple as just making a parallel copy of information labeled “self,” as the efference-copy theory has it. And (probably in a variety of complicated and as-yet-unknown ways), our brains treat things differently when they are tagged as “self” vs. “other.”

Maybe when experiences are tagged as “self” or labeled as memories, we are aware that they are happening to us. Maybe we have a “Cartesian theater” somewhere in our brain, through which all experiences we’re aware of pass, while the unconscious experiences can still affect our behavior directly. This is all speculation, though.

I’m pretty sure that current robots or ML systems don’t have any special distinction between experiences inside and outside of awareness, which means that for all practical purposes they’re always operating on blindsight.

Relationships and Corollaries

I think that, in order of the proportion of ordinary neurotypical adult life they take up, awareness > effortful attention > explicit systematic thought. When you look out the window of a train, you are aware of what you see, but not using effortful attention or thinking systematically. When you are mountain-climbing, you are using effortful attention, but not thinking systematically very much. When you are writing an essay or a proof, you are using effortful attention, and using systematic thought more, though perhaps not exclusively.

I think awareness, in humans, is necessary for effortful attention, and effortful attention is usually involved in systematic thought. (For example, notice how concentration and cognitive disfluency improve the ability to generalize or follow reasoning principles.) I don’t know whether those necessary conditions hold in principle, but they seem to hold in practice.

Which means that, since present-day machine-learners aren’t aware, there’s reason to doubt that they’re going to be much good at what we’d call reasoning.

I don’t think classic planning algorithms “can reason” either; they’re hard-coding in the procedures they follow, rather than generating those procedures from simpler percepts the way we do. It seems like the same sort of misunderstanding as it would be to claim a camera can see.

(As I’ve said before, I don’t believe anything like “machines will never be able to think the way we do”, only that they’re not doing so now.)

The Weirdness of Thinking on Purpose

It’s popular these days to “debunk” the importance of the “intellect” side of “intellect vs. instinct” thinking. To point out that we aren’t always rational (true), are rarely thinking effortfully or explicitly (also true), can’t usually reduce our cognitive processes to formal systems (also true), and can be deeply affected by subconscious or subliminal processes (probably true).

Frequently, this debunking comes with a side order of sneer, whether at the defunct “Enlightenment” or “authoritarian high-modernist” notion that everything in the mind can be systematized, or at the process of abstract/deliberate thought itself and the people who like it. Jonathan Haidt’s lecture on “The Rationalist Delusion” is a good example of this kind of sneer.

The problem with the popular “debunking reason” frame is that it distracts us from noticing that the actual process of reasoning, as practiced by humans, is a phenomenon we don’t understand very well yet. Sure, Descartes may have thought he had it all figured out, and he was wrong; but thinking still exists even after you have rejected naive rationalism, and it’s a mistake to assume it’s the “easy part” to understand. Deliberative thinking, I would guess, is the hard part; that’s why the cognitive processes we understand best and can simulate best are the more “primitive” ones like sensory perception or motor learning.

I think it’s probably better to think of those cognitive processes that distinguish humans from animals as weird and mysterious and special, as “higher-level” abilities, rather than irrelevant and vestigial “degenerate cases”, which is how Heidegger seems to see them. Even if the “higher” cognitive functions occupy relatively little time in a typical day, they have outsize importance in making human life unique.

Two weirdly similar quotes:

“Three quick breaths triggered the responses: he fell into the floating awareness… focusing the consciousness… aortal dilation… avoiding the unfocused mechanism of consciousness… to be conscious by choice… blood enriched and swift-flooding the overload regions… one does not obtain food-safety-freedom by instinct alone… animal consciousness does not extend beyond the given moment nor into the idea that its victims may become extinct… the animal destroys and does not produce… animal pleasures remain close to sensation levels and avoid the perceptual… the human requires a background grid through which to see his universe… focused consciousness by choice, this forms your grid… bodily integrity follows nerve-blood flow according to the deepest awareness of cell needs… all things/cells/beings are impermanent… strive for flow-permanence within…”

–Frank Herbert, Dune, 1965

“An animal’s consciousness functions automatically: an animal perceives what it is able to perceive and survives accordingly, no further than the perceptual level permits and no better. Man cannot survive on the perceptual level of his consciousness; his senses do not provide him with an automatic guidance, they do not give him the knowledge he needs, only the material of knowledge, which his mind has to integrate. Man is the only living species who has to perceive reality, which means: to be conscious — by choice. But he shares with other species the penalty for unconsciousness: destruction. For an animal, the question of survival is primarily physical; for man, primarily epistemological.

“Man’s unique reward, however, is that while animals survive by adjusting themselves to their background, man survives by adjusting his background to himself. If a drought strikes them, animals perish — man builds irrigation canals; if a flood strikes them, animals perish — man builds dams; if a carnivorous pack attacks them animals perish — man writes the Constitution of the United States. But one does not obtain food, safety, or freedom — by instinct.”

–Ayn Rand, For the New Intellectual, 1963

(bold emphasis added, ellipses original).

“Conscious by choice” seems to be pointing at the phenomenon of effortful attention, while “the unfocused mechanism of consciousness” is more like awareness. There seems to be some intuition here that effortful attention is related to the productive abilities of humanity, our ability to live in greater security and with greater thought for the future than animals do. We don’t usually “think on purpose”, but when we do, it matters a lot.

We should be thinking of “being conscious by choice” more as a sort of weird Bene Gesserit witchcraft than as either the default state or as an irrelevant aberration. It is neither the whole of cognition, nor is it unimportant — it is a special power, and we don’t know how it works.

24 comments

Evolving brains took a long time. Learning to think, once we had a monkey's brain, was comparatively fast. If we focus on "do this algorithm's conscious thoughts at all resemble human conscious thoughts?", I expect AI progress will look like many decades of "not at all" followed by a brief blur as machines overtake humans.

I agree with you that there is a real cluster of distinctions here. I also agree with the takeaway:

We should be thinking of “being conscious by choice” more as a sort of weird Bene Gesserit witchcraft than as either the default state or as an irrelevant aberration. It is neither the whole of cognition, nor is it unimportant — it is a special power, and we don’t know how it works.

But I suspect there isn't any concise explanation of "how it works"; thinking is a big complicated machine that works for a long list of messy reasons, with the underlying meta-reason that evolution tried a bunch of stuff and kept what worked.

It also looks increasingly likely that we won't understand how thinking works until after we've built machines that think, just as we can now build machines-that-perceive much better than we understand how perception works.

It seems casually obvious that raccoons and crows engage in deliberate thought. Possibly hunting spiders and octopods do. Also obvious at this point that we have more processing power available than hunting spiders.
Everything biology does has an explanation as concise as the algorithmic complexity of the genome, which isn't obviously intractable for understanding at all of the relevant levels of description.

I could believe that. My guess is that the things brains do which are "interpretable" or "explicit" are a very small subset.

OTOH, I'm interested in the possibilities that might arise from building machines that have things similar to "attention" or "memory" or "self-other distinctions", things which mammalian brains generally have but most machine learners don't. I think there's qualitatively new stuff that we're only beginning to touch.

I still find myself confused that you don't express thinking that the more complicated architectures provide a plausible guide for what to expect these things to look like in the brain. I have come to the conclusion that all the parts of intelligence - thinking and blindsight among them - have been tried by ML people at this point, but none of them at large enough scale or integrated well enough to produce something like us. It continues to seem strange to me that your objection is "they haven't actually tried the right thing", and that you are also optimistic about attention/memory/etc as being the right sort of thing to produce it. Do you think that thinking doesn't have an obvious construction from available parts? What are the cruxes, the diff of our beliefs here?

I really wish you would give his argument for the claim that we (even plausibly) have all the pieces, Lahwran. I would also love to see an abridged transcript of a discourse wherein the two of you reached a double-crux. My best guess is that Lahwran is thinking of 'only integrating existing systems' as a triviality which can be automated by the market rather than what it actually is, a higher-level instance of the design problem.

That said, the idea that thinking has been tried seems so insane to me that I may be failing to steelman it accurately.

I was under the impression that things like "deliberative thinking" and "awareness" haven't been simulated by ML thus far, so I think that's the diff between us -- though it's not that strongly held, there are lots of ML advances I may just not have heard of.

An example of what I would mean by thinking: https://arxiv.org/pdf/1705.03633.pdf

Thanks for the paper!

At first I was very surprised that they got such good performance at answering questions about visual scenes (e.g. "what shape is the red thing?" "the red thing is a cube.")

Then I noticed that they gave ground-truth examples not just for the answers to the questions but to the programs used to compute those answers. This does not sound like the machine "learned to reason" so much as it "learned to do pattern-recognition on examples of reasoning." When humans learn, they are "trained" on examples of other people's behavior and words, but they don't get any access to the raw procedures being executed in other people's brains. This AI did get "raw downloads of thinking processes," which I'd consider "cheating" compared to what humans do. (It doesn't make it any less of an achievement by the paper authors, of course; you have to do easier things before you can do harder things.)

That seems like weaseling out of the evidence to me. This is just another instance of neural networks being able to learn to do geometric computation to produce hard-edged answers, like AlphaGo is; that they're being used to generate programs seems not super relevant to that. I certainly agree that it's not obvious exactly how to get them to learn the space of programs efficiently, but it seems surprising to expect it to be different in kind vs previous neural network stuff. This doesn't seem that different to me vs attention models in terms of what kind of problem learning the internal behavior presents.

I think thought is irrelevant.

If we have two systems A and B.

For all inputs, A and B produce the same corresponding outputs.

A and B have the same domain and the same image.

Then it doesn't matter if the algorithms of A and the algorithms of B are different, as far as I'm concerned, A and B are functionally equivalent.

@SarahConstantin I think the intricacies of human thought are irrelevant.

I don't think we need machines to approximate human thought.

If you want machines to reach human level intelligence, then you need them to approximate human behaviour.

I don't think the fact that machines don't think the way we do is at all relevant to human level machine intelligence.

It seems like a non issue.

I think the difference in internal algorithmic process is largely irrelevant.

Having matching algorithmic processes is one way to get two systems to have the same output for all input, but it's not the only way.

For any function f, it is possible to construct an infinite number of functions f'_i that are functionally equivalent to f, but possess different algorithmic processes.

f'_0(x) = f(x) + 0.

f'_i(x) = f'_{i-1}(x) + i - i. (i > 0, i in N).

The cardinality of the set of such functions is the same as the cardinality of the set of natural numbers.
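The construction above can be written out as a short Python sketch. The particular f is an arbitrary example chosen for illustration, and make_variant is a hypothetical helper name. The check sticks to integer inputs on purpose: with floating-point numbers, x + i - i is not guaranteed to equal x exactly, so the equivalence claimed here is the mathematical one.

```python
def f(x):
    """Arbitrary example function; any f would do."""
    return x * x + 1


def make_variant(i):
    """Build f'_i, where f'_0(x) = f(x) + 0 and f'_i(x) = f'_{i-1}(x) + i - i."""
    if i == 0:
        return lambda x: f(x) + 0
    prev = make_variant(i - 1)
    return lambda x: prev(x) + i - i


variants = [make_variant(i) for i in range(5)]
inputs = range(-10, 11)

# Every variant performs different steps but returns the same value as f.
all_agree = all(v(x) == f(x) for v in variants for x in inputs)
print(all_agree)
```

Each variant does strictly more work than the one before it (one extra addition and subtraction per level), which is the sense in which the processes differ while the input-output behaviour does not.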

What we really want is functional equivalence.

We do not need algorithmic equivalence (where algorithmic equivalence is when the algorithms match exactly).

We care about the output, and not the processing.

Algorithmic equivalence is a proper subset of functional equivalence.

If your only criticism of B is that B is not algorithmically equivalent to A even though it is functionally equivalent to A, then I would say that you're missing the point.

I think criticising a lack of algorithmic equivalence is missing the point. If B is functionally equivalent to A, then you don't need to criticise the lack of algorithmic equivalence.

If B is not functionally equivalent to A, then you still don't need to criticise the lack of algorithmic equivalence; you can directly criticise the lack of functional equivalence.

In short, there's no need to criticise algorithmic equivalence?

Unless algorithmic equivalence is pursued as the path to functional equivalence.

I was not under the impression that algorithmic equivalence was what is necessarily being pursued.

(Admittedly NN are based on gaining algorithmically similar architectures).

Computer vision is not algorithmically equivalent to human vision.

I think the same is true for speech recognition as well.

I don't think you can prove that algorithmic equivalence is necessary.

I think that's a fucking insane thing to prove.

I mean, mention 3 significant ML achievements that are algorithmically equivalent to their human counterparts.

I do think algorithmic similarity matters:

The human algorithm is a very general problem-solving algorithm. If we can solve problems with an algorithm that looks very similar to the human algorithm, at a similar level of competence to the human algorithm, then this is evidence about the generalizability of our algorithm, since we should expect algorithms that are very similar to the human algorithm to also have similar generality.

You'll have to decompose what you mean by "algorithmic similarity"; if I presented you a universal learning algorithm that looked very different from the human universal learning algorithm, would you say that the two algorithms were similar?

What makes two algorithms similar?

I think algorithmic equivalence is unnecessary. We do not want algorithmic equivalence; what we actually want is functional equivalence. If the human universal learning algorithm was random, and implementing it did not produce a human engine of cognition, then you would not want to implement the human universal learning algorithm. Algorithmic equivalence is a subordinate desire to functional equivalence; it is desired only insofar as it produces functional equivalence.

The true desire is functional equivalence (which is itself subordinate to Human Level Machine Intelligence).

If we have functional equivalence, we do not need to criticise algorithmic equivalence. If we do not have functional equivalence, we still do not need to criticise algorithmic equivalence, because algorithmic equivalence is not what we actually desire; it is a subordinate desire to functional equivalence.

If modern ML were pursuing algorithmic equivalence as a route to functional equivalence, then--and only then--would it make sense to criticise a lack of algorithmic equivalence. However, it has not been my impression that this is the path AI is taking.

Machine natural language processing is not algorithmically equivalent to human natural language processing.

Computer vision is not algorithmically equivalent to human vision.

Computer speech recognition is not algorithmically equivalent to human speech recognition.

AlphaGo is not algorithmically equivalent to human Go.

I can not recall a single significant concrete machine learning achievement in a specific problem domain made by gaining algorithmic equivalence with the human engine of cognition. If you can name 3 examples, I would update in favour of algorithmic equivalence.

Neural networks are human inspired, but the actual algorithms those neural networks run are not algorithmically equivalent to the algorithms we run.

Promoted to Featured, for carefully communicating a number of novel and interesting models regarding the path to AGI (a topic that's both useful and important).

This immediately reminded me of this paper which I saw on Short Science last week.

Yep yep yep!!! I am a Tenenbaumist in my outlook, pretty much. Sometimes I forget how big an influence he's had on my thinking.

I'm not influenced by him, but his (and Gary Marcus') approach seems like common sense to me. It seems likely that Sarah's ability to track ideology to its philosophical roots can contribute to Tenenbaum's and Marcus' abilities to present their perspectives compellingly to people with a more philosophical and less technical bent, who have the relatively high trust in consensus which tends to accompany technical ability but which is the opposite of Thiel's effective investment strategies.

These are heuristic descriptions; these essays don’t make explicit how to test whether a model is interpretable or not. I think it probably has something to do with model size; is the model reducible to one with fewer parameters, or not?

If you use e.g. the Akaike Information Criterion for model evaluation, you get around the size problem in theory. Model size is then something you score explicitly.

Personally, I still have intuitive problems with this approach though: many phenomenological theories in physics are easier to interpret than Quantum Mechanics, and seem to be intuitively less complex, but are more complex in a formal sense (and thus get a worse AIC score, even if they predict the same thing).
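To make the scoring concrete, here is a small sketch of the standard AIC formula, AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L is the maximised likelihood. The function name and the numbers are made up purely for illustration; they do not come from any real model.

```python
def aic(num_params: int, log_likelihood: float) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L-hat). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical case: two models that fit the data equally well
# (same maximised log-likelihood) but differ in parameter count.
same_fit = -120.0
small_model = aic(num_params=3, log_likelihood=same_fit)
large_model = aic(num_params=40, log_likelihood=same_fit)

print(small_model, large_model)  # 246.0 320.0
```

With identical fit, the model with more parameters always gets the worse (higher) score, which is exactly the penalty on formal size described above.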

How much of human thought and behavior is “irreducible” in this way, resembling the huge black-box models of contemporary machine learning? Plausibly a lot

'Irreducible' is a pretty strong stance. I agree that many things will be hard for humans to describe in a way that other humans find satisfying. But do you think that an epistemically rational entity with unlimited computational power (something like a Solomonoff Inductor) would be unable to do that?

Also, WOW I had no idea Arbital's writing was so good. (The Solomonoff Inductor link.) In case anyone else didn't click the first time, it's not just a definition, it's a dialogue, probably by Eliezer, and it's super cool.

So, it's usually possible to create "adversarial examples" -- datasets that are so "perverse" that they resist accurate prediction by simple models and actually require lots of variables. (For a very simple example, if you're trying to fit a continuous curve based on a finite number of data points, you can make the problem arbitrarily hard with functions that are nowhere differentiable.) I'm not being that rigorous here, but I think the answer to the question "are there irreducibly complex statistical models?" is yes. You can make models such that any simplification has a large accuracy cost.

Are there irreducibly complex statistical models that humans or animals use in real life? That's a different and harder question, and my answer there is more like "I don't know, but I could believe so."

I think the answer to the question "are there irreducibly complex statistical models?" is yes.

I agree that there are some sources of irreducible complexity, like 'truly random' events.

To me, the field of cognition does not pattern-match to 'irreducibly complex', but more to 'We don't have good models. Yet, growth mindset'. So, unless you have some patterns where you can prove that they are irreducible, I will stick with my priors I guess. The example you gave me,

For a very simple example, if you're trying to fit a continuous curve based on a finite number of data points, you can make the problem arbitrarily hard with functions that are nowhere differentiable.

falls squarely in the 'our models are bad' category, e.g. the Weierstrass function can be stated pretty compactly with analytic formulas.
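For reference, the classical form is W(x) = sum over n >= 0 of a^n * cos(b^n * pi * x), with 0 < a < 1 and b a positive odd integer such that a*b > 1 + 3*pi/2. A rough sketch in Python shows how short the description is. The parameter values, the function name, and the truncation to a finite number of terms are my own choices for illustration; a truncated sum is only an approximation, and floating-point precision limits how many terms are meaningful.

```python
import math


def weierstrass(x: float, a: float = 0.5, b: int = 13, n_terms: int = 12) -> float:
    """Partial sum of the Weierstrass function: sum of a^n * cos(b^n * pi * x).

    a = 0.5 and b = 13 satisfy a*b = 6.5 > 1 + 3*pi/2 (about 5.71).
    n_terms truncates the infinite series, so this is only approximate.
    """
    return sum(a**n * math.cos(b**n * math.pi * x) for n in range(n_terms))


# At x = 0 every cosine equals 1, so the partial sum is a geometric series.
print(weierstrass(0.0))  # 1.99951171875
```

The whole function is two parameters and one summation rule, even though no low-degree polynomial will track it well.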

But also, of course I can't prove the non-existence of such irreducible, important processes in the brain.

and my answer there is more like "I don't know, but I could believe so."

Fair enough.

Ah, how you think about that example helps clarify. I wasn't even thinking about the possibility of an AI that could "learn" the analytic form of the Weierstrass function; I was thinking about the fact that trying to fit a polynomial to it would be arbitrarily hard.

Obviously "not modelable by ANY means" is a much stronger claim than "if you use THESE means, then your model needs a lot of epicycles to be close to accurate." (Analyst's mindset vs. computer scientist's mindset; the computer scientist's typical class of "possible algorithms" is way broader. I'm more used to thinking like an analyst.)

I think you and I are pretty close to agreement at this point.

Yes, I completely agree with the weaker formulation "irreducible using only THESE means", e.g. Polynomials, MPTs, First-Order Logic etc.

FWIW, https://www.edge.org/response-detail/26794 is the best actual argument I have seen for deep learning being a qualitatively new capability, but it looks like the capability "analogy/metaphor", not the capability 'awareness' or 'deliberate thought'.

Yep, I'd agree on both counts.