A brief history
  • The breakthrough of reinforcing enabled early vertebrates to learn from their own actual actions (trial and error).
  • The breakthrough of simulating enabled early mammals to learn from their own imagined actions (vicarious trial and error).
  • The breakthrough of mentalizing enabled early primates to learn from other people’s actual actions (imitation learning).
  • But the breakthrough of speaking uniquely enabled early humans to learn from other people’s imagined actions.

Evolution milestones

World Before Brains

Intelligence existed before brains were a thing. Steps to life (abiogenesis):

  • Making self-duplicating info carriers (DNA).
  • Encasing the DNA in a protective membrane.
  • Manufacturing proteins: DNA is good for info storage, but not for making structures (unlike proteins) for movement, perception, …

Life then moved from extracting energy from hydrogen to photosynthesis, pioneered by cyanobacteria. Photosynthesis produced an abundance of oxygen, which enabled the first organisms doing cellular respiration. Respiratory life, however, could survive by stealing sugar from photosynthetic life. => hunting begins! (and thus the desire for mobility, perception, …).

  • Fungi decide to wait for things to die (external digestion).
  • Animals actively seek live things. (internal digestion) -> even early eukaryotes would absorb a smaller organism and break it down, which is what digestion is. Since they were hunting more complex organisms, this forced them to develop perception and mobility.

Neurons
    Are structurally equivalent across all animals! What changes is the number and the type of wiring between them.
    They exchange continuous information, not through the intensity of the electrical signal (spikes are all-or-nothing) but through the timing between spikes (the firing rate).
    Furthermore, due to adaptation, the rate is not fixed but relative to the context -> this fixes the issue of having to deal with hugely different magnitudes (e.g. perceiving light, where luminosity can vary by 1,000,000x).
    Communication within a neuron is electrical; between neurons it is chemical.
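
A minimal sketch of the adaptation idea, assuming a made-up saturating response `s / (s + context)` (the formula, the function name `firing_rate` and the units are illustrative assumptions, not something the notes specify). It shows how a rate that is relative to the context gives the same answer for the same contrast in a dim room and in daylight:

```python
def firing_rate(stimulus, context_level, max_rate=100.0):
    """Firing rate in spikes/s, saturating and relative to the adapted context.

    The s / (s + context) form is an illustrative assumption, not a claim
    about any specific neuron.
    """
    return max_rate * stimulus / (stimulus + context_level)


# Same 2x contrast, once in a dim room and once in bright daylight
# (arbitrary units, a million-fold difference in absolute intensity).
dim = firing_rate(stimulus=20.0, context_level=10.0)
bright = firing_rate(stimulus=2_000_000.0, context_level=1_000_000.0)
print(round(dim, 2), round(bright, 2))
```

Both calls stay within the neuron's limited range of rates even though the raw intensities differ enormously.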

Steering - The First Bilaterians

Animals have a similar body structure: a head with brain and most sensory organs, a bilateral body, … Bilaterality is good if you are going towards food, rather than waiting for it.

The brain evolved to steer a bilateral body

The most primitive survival technique is:

  • If positive stimuli increase, keep going
  • If they decrease, turn and retry.
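
The two rules above can be sketched as a tiny loop. This is a toy under stated assumptions: a one-dimensional world, a hypothetical `smell` function that gets stronger near the food, and an agent that only remembers the previous stimulus.

```python
def smell(position, food):
    """Hypothetical stimulus: stronger (less negative) the closer we are to food."""
    return -abs(position - food)


def steer(start, food, steps=50):
    """Keep going while the stimulus increases; turn around when it decreases."""
    position, direction = start, 1  # begin by heading right
    last = smell(position, food)
    for _ in range(steps):
        position += direction
        now = smell(position, food)
        if now < last:              # things got worse: turn and retry
            direction = -direction
        last = now
    return position


print(steer(start=0, food=-7), steer(start=0, food=10))
```

The agent has no map and no memory beyond one step, yet it ends up hovering within one step of the food in both cases.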

Steering behavior

Steering cannot happen in a decentralized organism like a polyp, since it requires a central brain to decide where to steer based on all available information.

Emotions:
    To survive effectively, we must take into account both the environment and our internal states (e.g. if hungry, take more risks).
    We control this internal state using neuromodulators, which tune the activation sensitivity of certain neurons to encourage certain behaviors.
    - Dopamine: positive, high arousal (search). -> does not indicate pleasure, but the expectation of pleasure (more wanting than liking).
    - Serotonin: positive, low arousal (rest). An indication that we are satiated.
    Arousal map
    Stress:
        - Temporary: use a lot of energy to escape the danger.
        - Chronic: means it’s inescapable, so give up and try to conserve energy.

Early Learning:
    The simplest form of learning (which requires little infrastructure) is associative: if an inherently good (unconditioned) cue such as food often appears alongside another cue, then that other cue comes to be treated as good too.
    Learning is needed on top of hard-coded genetic information since some cues are good or bad depending on the context -> some flexibility is needed.
    There are lots of heuristics for choosing how to associate cues:
    Associative learning heuristics
    Neurally, learning is all about strengthening / weakening neural connections.
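
As a rough illustration of strengthening and weakening a connection, here is a sketch using a Rescorla-Wagner style update. That model is swapped in here as a common textbook stand-in; the notes do not name it, and the learning rate and trial counts are arbitrary assumptions.

```python
def update_association(strength, outcome, learning_rate=0.2):
    """Move the cue's associative strength a step toward the observed outcome."""
    return strength + learning_rate * (outcome - strength)


strength = 0.0
for _ in range(30):   # the cue repeatedly appears together with food (outcome = 1)
    strength = update_association(strength, outcome=1.0)
after_pairing = strength

for _ in range(30):   # the cue now appears alone (outcome = 0)
    strength = update_association(strength, outcome=0.0)
after_extinction = strength

print(round(after_pairing, 3), round(after_extinction, 3))
```

The same rule handles both directions: the link grows while the cue predicts food and fades once it stops doing so.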

Learning was initially not the primary feature of the brain, it was steering!

Reinforcing - The First Vertebrates

The brain template across vertebrates is remarkably consistent. The basic template for learning is through trial and error / law of effect: satisfying behaviors are likely to be repeated (and vice versa for discomforting ones).

Reinforcement Learning is tricky! To reinforce a behavior, you need a short time window between behavior and reward, so how can you learn more complex tasks? This is the Temporal Credit Assignment Problem. We can solve it with Temporal Difference (TD) Learning, where an actor and a critic are co-trained and can bootstrap each other to improve.

With Temporal Difference learning, we are rewarding the expectation of winning rather than winning itself.
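
A minimal sketch of how the expectation of reward spreads backwards in time, using the standard TD(0) value update on a hypothetical three-step chain (cue, wait, reward). The state names, discount `gamma` and step size `alpha` are assumptions chosen for illustration.

```python
def td0_values(episodes=200, gamma=0.9, alpha=0.1):
    """Learn state values on a fixed chain: cue -> wait -> reward -> end."""
    states = ["cue", "wait", "reward"]
    value = {s: 0.0 for s in states}
    value["end"] = 0.0  # terminal state, never updated
    for _ in range(episodes):
        for i, state in enumerate(states):
            next_state = states[i + 1] if i + 1 < len(states) else "end"
            reward = 1.0 if state == "reward" else 0.0  # paid on leaving "reward"
            td_error = reward + gamma * value[next_state] - value[state]
            value[state] += alpha * td_error
    return value


values = td0_values()
print({s: round(v, 2) for s, v in values.items()})
```

Only the last step is ever rewarded, yet the cue ends up with a positive value: each state is reinforced by the change in predicted reward, not by the reward itself.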

Dopamine does exactly this! Dopamine is not a signal for reward, but rather for reinforcement. => to actually learn, we have to reinforce based on changes in predicted future rewards (both how probable they are and when they are going to happen), rather than actual rewards. With a critic, we can constantly evaluate how well we are doing at any point in time, rather than only at the end, where it’s harder to understand which actions led to our success / defeat.

Dopamine encodes probability and distance of future reward, not reward itself

In particular, this sensitivity to timing allows us to learn not only from the presence of an unexpected reward, but also from the omission of an expected one!
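
The three cases can be made concrete with the usual prediction-error formula (reward plus discounted next prediction, minus current prediction). The numbers below are illustrative assumptions, not measurements.

```python
def td_error(reward, value_next, value_now, gamma=1.0):
    """How much better or worse things turned out than predicted."""
    return reward + gamma * value_next - value_now


# Nothing was predicted, but a reward arrives: positive surprise.
unexpected_reward = td_error(reward=1.0, value_next=0.0, value_now=0.0)
# The reward was fully predicted: no surprise, nothing to learn.
expected_reward = td_error(reward=1.0, value_next=0.0, value_now=1.0)
# A reward was predicted but never came: negative surprise.
omitted_reward = td_error(reward=0.0, value_next=0.0, value_now=1.0)

print(unexpected_reward, expected_reward, omitted_reward)
```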

  • Critic <=> Dopamine + Hypothalamus: tells us the expectation of future rewards
  • Actor <=> Basal Ganglia: repeats actions that cause dopamine spikes
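
A toy sketch of the actor / critic split on a two-lever task. Everything here is an assumption for illustration: the task, the softmax action choice, the step sizes `alpha` and `beta`, and the function name `train_actor_critic`. It is not a model of the basal ganglia.

```python
import math
import random


def train_actor_critic(steps=500, alpha=0.1, beta=0.1, seed=0):
    """Two-lever task: lever 1 always pays 1.0, lever 0 always pays 0.0."""
    rng = random.Random(seed)
    preferences = [0.0, 0.0]   # actor: tendency to pull each lever
    expected_reward = 0.0      # critic: running prediction of reward
    for _ in range(steps):
        # Actor picks a lever via a softmax over its two preferences.
        p_lever1 = 1.0 / (1.0 + math.exp(preferences[0] - preferences[1]))
        action = 1 if rng.random() < p_lever1 else 0
        reward = float(action)
        # Critic computes the prediction error (the dopamine-like signal).
        error = reward - expected_reward
        expected_reward += beta * error
        # Actor reinforces the chosen action in proportion to that error.
        preferences[action] += alpha * error
    return preferences, expected_reward


preferences, critic_estimate = train_actor_critic()
print(preferences, round(critic_estimate, 2))
```

The actor never sees the reward directly; it only sees the critic's error signal, which is the division of labour described above.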

Pattern Recognition:
    -> requires combining info from different sensory neurons, not just a single one like early bilaterians, which could only detect whether there was light or not.
    It’s hard, since you have to balance:
    - Discrimination: treat overlapping sensory patterns as distinct
    - Generalization: treat similar patterns as the same
    Brains solve this by first mapping the sensory organs to a higher-dimensional neuron space AA (which helps decorrelate similar inputs, thus solving discrimination), and then achieve generalization by having the AA neurons talk with each other and, by Hebb’s law, associate with each other.
    In the brain, pattern recognition systems are thus all overlapping! There isn’t a specific section that only recognizes one specific pattern. But this means that, unlike simpler creatures where a given neuron is only for a specific task, we are at risk of catastrophic forgetting.
    Associated with the generalization problem is the Invariance Problem: how to recognize a pattern across variations that should not matter. => To do this, we have a hierarchical representation of the world, and the deeper we go, the more of this variation is abstracted away.
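
A small sketch of the "neurons associate with each other by Hebb’s law" step, using a Hopfield-style auto-associative network. This is a classic stand-in chosen for illustration, not a claim about how the brain implements it, and it leaves out the expansion to a higher-dimensional space. Units are +1 / -1 and the two stored patterns are made up.

```python
def hebbian_weights(patterns):
    """Strengthen the link between every pair of units that agree in a pattern."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, cue):
    """One synchronous update: each unit takes the sign of its weighted input."""
    return [
        1 if sum(w * s for w, s in zip(row, cue)) >= 0 else -1
        for row in weights
    ]


stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = hebbian_weights(stored)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # first pattern with one unit flipped
restored = recall(weights, noisy)
print(restored == stored[0])
```

Both patterns live in the same shared weights, which is the overlapping storage the notes mention: a similar input is pulled back to the stored pattern (generalization), and adding many more patterns to the same weights would start to corrupt the old ones.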

Curiosity:
    Exploitation / Exploration dilemma: balancing behaviors that worked vs trying out new behaviors. -> curiosity is super important! It helps you get out of local minima and learn tasks for which it’s hard to formulate short-term rewards.
    The brain of vertebrates is thus wired to also release dopamine when something unexpected happens, even if it’s not rewarding per se.
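
A hedged sketch of how a novelty bonus gets a greedy learner out of a local optimum. The two-lever task, the `1 / sqrt(pulls + 1)` bonus and the `curiosity_weight` parameter are assumptions for illustration; the notes only say that unexpected events are themselves rewarding.

```python
import math


def run_agent(curiosity_weight, steps=20):
    """Greedy agent on two levers; lever 0 pays 1.0, lever 1 pays 5.0."""
    payouts = [1.0, 5.0]
    estimates = [0.0, 0.0]   # what the agent believes each lever pays
    pulls = [0, 0]
    total = 0.0
    for _ in range(steps):
        # Score = believed payout + a novelty bonus that fades with familiarity.
        scores = [
            estimates[a] + curiosity_weight / math.sqrt(pulls[a] + 1)
            for a in range(2)
        ]
        action = scores.index(max(scores))   # ties go to lever 0
        reward = payouts[action]
        pulls[action] += 1
        estimates[action] = reward           # payouts are deterministic here
        total += reward
    return total, pulls


greedy_total, greedy_pulls = run_agent(curiosity_weight=0.0)
curious_total, curious_pulls = run_agent(curiosity_weight=2.0)
print(greedy_total, greedy_pulls, curious_total, curious_pulls)
```

The purely greedy agent settles on the first lever that ever paid and never discovers the better one; the curious agent eventually tries the unfamiliar lever because its novelty outweighs the known payout.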

Modelling the World:
    We have the ability to internally model space (most invertebrates can’t! e.g. if you pick up an ant and put it down somewhere else, it won’t be able to find its way back: it doesn’t learn a model of the world, but rather a fixed set of rules to navigate it).

Vertebrate brain systems

Simulating - The First Mammals

The predatory pressure in the ocean is immense, so organisms start moving onto land, and some of them also develop warm-bloodedness (since in the water temperature is fairly constant, while on land it shifts a lot more).

Land transition

Mammals ended up living alongside dinosaurs, and had the advantage of deciding when and how to strike => they evolved simulating, aka learning by imagining. It evolved in mammals because they were the first to generate their own heat (warm-bloodedness) and thus sustain a high energy budget, meaning that they were the first organisms to have the computational resources to simulate events!

Neo-Cortex:
    The new development in the brain which allowed for more compute.
    Made up of millions of identical columns => areas like the visual vs auditory cortex don’t differ in their structure / general functioning, but only in their inputs and outputs!
    Neocortex columns

A lot of optical illusions come from the fact that we don’t perceive the raw sensory input, but rather what we infer / simulate about the world! E.g. our brain fills in missing components because it is simulating reality.

The Neo-Cortex is nothing but a generative model! => much of human perception is just inference, a.k.a. using a generative model to maintain an inner simulation that matches external stimuli.

This is why dreaming, imagination etc. make sense: we have an internal sim that mirrors the world, which we can freely play around with to explore possible options without having to carry them out.

Imagination <=> Perception: both are types of simulation, more or less grounded in external stimuli. e.g. the same neurons are triggered when seeing a house vs imagining a house!!

The Neo-Cortex is a constantly on, world-prediction machine
But why develop this ability? Because we can predict things! We are always running a sim a little ahead in time, which we then compare with what actually happens, and adjust the model accordingly.

We, humans, give way too much importance to language and symbols as the substrate of intelligence. But primates, dogs, cats, crows, parrots, octopus, and many other animals don’t have human-like languages, yet exhibit intelligent behavior beyond that of our best AI systems. What they do have is an ability to learn powerful “world models” that allow them to predict the consequences of their actions and to search for and plan actions to achieve a goal. The ability to learn such world models is what’s missing from AI systems today.

So the main function of the Neo-Cortex is not recognition, but simulation!

So mammals, unlike other vertebrates, can learn not only through trial and error, but also through vicarious trial and error (aka simulating). e.g. if you offer a fish a salty food and a normal one and train it while it is not salt deprived, it will keep preferring the normal food even once it is salt deprived, since trial and error did not account for different internal states. A rat, instead, will go for the salty snack when salt deprived, even if the salty snack was the negative one in previous trials. => this is because the rat can simulate!

So mammals can learn from counterfactuals (hypothetical realities). e.g. a mammal that loses at rock paper scissors is more likely to play, in the next turn, the move that would have won the last round (since it can imagine winning), while a fish would just avoid the previous move (negatively reinforced) and choose either of the other 2.
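
The rock paper scissors contrast can be sketched in a few lines. The function names are hypothetical, and the two learners are reduced to their simplest caricature: one only knows which of its own moves was punished, the other replays the lost round in imagination.

```python
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def model_free_options(my_last_move):
    """After a loss, just avoid the punished move; either alternative is fine."""
    return sorted(move for move in BEATS if move != my_last_move)


def counterfactual_choice(opponent_last_move):
    """Imagine the last round again and pick the move that would have won it."""
    return next(move for move, loser in BEATS.items() if loser == opponent_last_move)


# We played scissors, the opponent played rock, and we lost.
print(model_free_options("scissors"))     # the "fish": anything but scissors
print(counterfactual_choice("rock"))      # the "mammal": what would have won
```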

Counterfactual learning

Causation is mostly constructed by our brains to enable us to learn vicariously from alternative past choices.

Even remembering is a type of simulation! (which is why memories are inaccurate).
The Neo-Cortex is split into 2:

Model-Based Reinforcement Learning:
    Model-based reinforcement learning
    While model-free is easier to implement, model-based is how mammals actually operate (it’s generally more effective provided the model is accurate).
    Stuff needed for good RL:
    - A discrete search space
    - Low-noise environment, where it’s easy to discern the consequences of one’s actions.
    - Informative intermediate rewards.
    The mammal brain is super flexible, since it can decide not only strategies (go left, go right) but also meta-strategies: stop and think, act instinctually, …
    But how does it do it (aka what is the search strategy)? It uses the frontal cortex to model the self, and uses that to quantify intent, and thus which meta-strategy to use. 3 Steps:
    - Trigger the internal sim and generate a set of possible actions. If many options look equally valid, it means we have to stop and think.
    - Simulate the chosen actions
    - Choose the best one
    The emergent result is vicarious training!
    Habits emerge when this cycle is done so many times that there is no uncertainty, and we thus skip the sim step to save energy.
    Intent is not real: it’s a computational trick for predicting one’s behavior. (Free will an illusion? Since we predict our behavior, but we only become conscious of what our behavior is afterwards).
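
The three steps (generate candidates, simulate them, choose the best) can be sketched with a tiny hand-written world model. The maze, the `WORLD_MODEL` table and the function names are all hypothetical; a real learner would have to learn the model rather than be handed it.

```python
# Hypothetical world model: (state, action) -> (next_state, reward).
WORLD_MODEL = {
    ("junction", "left"): ("left_arm", 0.0),
    ("junction", "right"): ("right_arm", 0.0),
    ("left_arm", "forward"): ("food", 1.0),
    ("right_arm", "forward"): ("dead_end", 0.0),
}


def simulate(state, depth):
    """Best total reward reachable from `state` within `depth` imagined steps."""
    if depth == 0:
        return 0.0
    outcomes = [
        reward + simulate(next_state, depth - 1)
        for (s, _), (next_state, reward) in WORLD_MODEL.items()
        if s == state
    ]
    return max(outcomes, default=0.0)


def plan(state, depth=2):
    """Step 1: list candidate actions. Step 2: simulate each. Step 3: pick the best."""
    candidates = [a for (s, a) in WORLD_MODEL if s == state]
    imagined = {
        a: WORLD_MODEL[(state, a)][1] + simulate(WORLD_MODEL[(state, a)][0], depth - 1)
        for a in candidates
    }
    return max(imagined, key=imagined.get), imagined


best_action, imagined_returns = plan("junction")
print(best_action, imagined_returns)
```

No action is actually taken while planning: every outcome comes from the model, which is why the result is only as good as the model is accurate.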

Imagination is an unconstrained sim, Attention is the same sim that has the constraint of having to be consistent with sensory data.

Motor Cortex in Primates
    The modern motor cortex was not the locus of command in early mammals, so why do primates need it to move, and why did it evolve? One hypothesis is that the motor cortex tries to explain and predict movement, and these predictions are then sent to the spinal cord to move. So it’s the locus of motor planning, not execution!
    This enables us to carry out planned, precise movements.

Motor planning

Mentalizing - The First Primates

Social grouping was the result of evolutionary pressure: more resistant against predators, but more competition for resources / reproduction. But primate societies are incredible, since they can mentalize other individuals!

The next step after a sim of the self is a sim of other individuals!

Chimps recognize the intent and knowledge of others => theory of mind: they have a working model of others’ inner selves. They are able to model not only individuals, but also the relationships within their group. Primate societies are hierarchical => being at the top improves fitness, so there is a strong evolutionary incentive to climb. They are deeply political, with dynasties, alliances, …

But why did this develop in the first place? A frugivore lifestyle gave primates both a lot of calories (bigger brain) and spare time. Being good at politics thus became a valid evolutionary strategy! This new brain could not only simulate the world, but also simulate the Self => the concept of self (and thus how the self relates to the environment) emerged! Meta-Cognition.

Suppose you put our ancestral primate in a maze. When it reached a choice point, it turned left. Suppose you could ask its different brain areas why the animal turned left. You would get very different answers at each level of abstraction. Reflexes would say, “Because I have an evolutionarily hard-coded rule to turn toward the smell coming from the left.” Vertebrate structures would say, “Because going left maximizes predicted future reward.” Mammalian structures would say, “Because left leads to food.” But primate structures would say, “Because I’m hungry, eating feels good when I am hungry, and to the best of my knowledge, going left leads to food.” In other words, the gPFC constructs explanations of the simulation itself, of what the animal wants and knows and thinks.

Mentalizing hierarchy

Primates, unlike other mammals, can reason about both their own knowledge and that of others (e.g. they understand the concept of lying, which stems from asymmetric information and thus requires a working model of both your knowledge and the other’s). So theory of mind does not derive from the need to escape predators / hunt, but rather from the social aspect! But why is this useful? Because it supercharges vicarious learning! If we can learn not only from simulating our own actions, but also from seeing others perform them, we can simulate ourselves doing them and improve. But this can only happen if we have a working model of others, which is why seeing others perform an action triggers the same neurons as performing it ourselves.

Imitation Learning:
    Learning is a lot less about ingenuity and more about transmissibility: most of what we learn, we learn by seeing others do it. This is a great evolutionary advantage: a single individual can figure out a skill and everybody else can just copy it.
    It’s easier to learn if somebody actively wants to teach you (rather than you just observing), and to be aware that somebody needs to be taught, you need theory of mind! We need to understand both that another organism has knowledge that differs from ours and that it has the intent to learn the skill we have.

Anticipating Future Needs:
    Ecological Brain Hypothesis: another explanation for our evolved brain. A frugivore diet requires you to plan in advance for when and where fruits are going to be ripe => forces you to plan for when you’re going to be hungry, rather than reacting to hunger when it arises.
    But how does the primate brain model the idea of wanting in the future? We recycle theory of mind! Simulating our wants in the future is similar to simulating another person’s wants in the present. Other mammals can’t do that: while they can simulate the environment in the future, they can’t simulate future self states, since they don’t have a model of the mind.

Future simulation

Speaking - The First Humans

The difference in mind between man and the higher animals, great as it is, is certainly one of degree and not of kind.

No substantial structural differences between chimp and human brains (mostly a difference in scale). The big difference is how we communicate:

  • Humans are the only ones with symbols (i.e. declarative labels), while for all other animals their gestures / communication patterns are genetically hardwired.
  • We have grammars -> symbols can be composed to convey more complex meaning => exponential expressiveness wrt the number of symbols combined.

Language allows us to effectively share our inner model with others, and thus learn not only from others’ actions, but also from others’ simulations!!
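
To put a number on the expressiveness that composition buys, here is a quick count of how many distinct sequences a small vocabulary allows. It is an upper bound under a loud assumption: every ordering counts as a message, whereas a real grammar rules most of them out. The toy vocabulary is made up.

```python
from itertools import product


def count_messages(num_symbols, max_length):
    """Number of distinct symbol sequences of length 1..max_length."""
    return sum(num_symbols ** length for length in range(1, max_length + 1))


# Sanity check by explicit enumeration on a toy vocabulary.
vocabulary = ["eat", "fruit", "you"]
enumerated = sum(len(list(product(vocabulary, repeat=n))) for n in (1, 2))

print(count_messages(3, 2), enumerated, count_messages(10, 5))
```

Without composition, 10 hardwired signals convey 10 meanings; allowing sequences of up to 5 of them already gives over a hundred thousand candidates.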

This is far more efficient than learning by just watching others do stuff => anyone who learns the common symbols can take part, so the pool of collaborators is potentially unbounded. Furthermore, it is easy to transfer and accumulate knowledge across time! => evolution not only in the genetic sense.

Language and culture

The real reason why humans are unique is that we accumulate our shared simulations (ideas, knowledge, concepts, thoughts) across generations. We are the hive-brain apes. We synchronize our inner simulations, turning human cultures into a kind of meta-life-form whose consciousness is instantiated within the persistent ideas and thoughts flowing through millions of human brains over generations. The bedrock of this hive brain is our language.

Neurologically, language is not an ability that comes for free with a scaled up neo-cortex, but rather from the specific re-purposing of these areas of the brain. Note that primates could also communicate, but not voluntarily, as they had hard-coded mappings between emotional states and sounds. We also have this system (e.g. smiling when told a joke) but we can also produce these things out of context, using the language areas! => control over communication.

A skill as complex as language cannot be directly hard-coded into the genes => we encode a learning system (neo-cortex) and a learning curriculum (instinct to ask questions and combine words we hear, instinct to teach language).

Language is an altruistic trait, that only makes sense to evolve in a society rather than in individuals (so subject to group selection). Altruism in nature is of 2 kinds:

  • Kin Selection: helping our relatives (e.g. offspring) survive
  • Reciprocal Altruism: helping others if we get something back

In general, altruistic genes cannot survive unless there are mechanisms in place for punishing freeloaders. -> gossip might have evolved for this purpose!
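
Why punishing freeloaders matters can be illustrated with the classic iterated prisoner’s dilemma. This is a standard game-theory toy swapped in as an illustration: the payoff numbers are the conventional ones, and tit-for-tat stands in for "a mechanism that punishes freeloaders"; the notes themselves only mention gossip.

```python
# Payoffs for (my_move, their_move); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy whatever the partner did last round."""
    return opponent_history[-1] if opponent_history else "C"


def freeloader(opponent_history):
    """Always take, never give."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the total payoff of each player after repeated interaction."""
    moves_a, moves_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(moves_b), strategy_b(moves_a)
        moves_a.append(a)
        moves_b.append(b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat), play(freeloader, tit_for_tat))
```

The freeloader wins the first round but, once partners stop cooperating with it, ends up far worse off than two reciprocators do.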

As brains expanded, humans became better hunters and cooks, which provided more calories and thereby expanded the frontier of how big brains could get. And as brains got bigger, births became earlier, which created even more opportunity for language learning, which put even more pressure on altruistic cooperation to support child-rearing, which again expanded the frontier of how big brains could get as it became possible to evolve longer time periods of childhood brain development.

Large Language Models:
    LLMs learn language purely by analyzing statistical patterns; humans instead learn it mainly by attaching labels to objects that are already present in their inner sim. But LLMs have no inner sim. Note that we also do some amount of inference based on previous words (thinking fast), but we can also leverage our inner sim to reason about complex stuff (thinking slow). (see Thinking, Fast and Slow).
    Language is the window to intelligence, but LLMs only have language.
    Also, LLMs cannot do mentalizing: conversations are built on modelling other people’s minds, and language is the way through which we gain more info about them.
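
To make "analyzing statistical patterns" concrete, here is a deliberately tiny next-word predictor built from bigram counts. It is only a caricature: real LLMs are neural networks conditioned on long contexts, not lookup tables, and the corpus and function names below are made up.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following, word):
    """Most frequent continuation seen in training, or None if the word is unseen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]


corpus = "the rat eats salt . the rat eats fruit . the rat eats salt ."
model = train_bigrams(corpus)
print(predict_next(model, "eats"), predict_next(model, "fish"))
```

The predictor continues "eats" plausibly without any notion of what a rat or salt is, and it has nothing to say about a word it never saw, which is the gap an inner sim would fill.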

the weak reasoning abilities of LLMs are partially compensated by their large associative memory capacity. They are a bit like students who have learned the material by rote but haven’t really built deep mental models of the underlying reality.