---
bibfile: ccnlab.bib
---

Networks {#sec:ch-networks}

In this chapter, we build upon the previous Neuron Chapter to understand how networks of detectors can produce emergent behavior that is more than the sum of their simple neural constituents. We focus on the networks of the neocortex ("new cortex", often just referred to as "cortex"), which is the evolutionarily most recent, outer portion of the brain, where most advanced cognitive functions take place. There are three major categories of emergent network phenomena:

  • Categorization of diverse patterns of activity into relevant groups: For example, faces can look very different from one another in terms of their raw "pixel" inputs, but we can categorize these diverse inputs in many different ways, to treat some patterns as more similar than others: male vs. female, young vs. old, happy vs. sad, "my mother" vs. "someone else", etc. Forming these categories is essential for enabling us to make the appropriate behavioral and cognitive responses (approach vs. avoid, borrow money from, etc.). Imagine trying to relate all the raw inputs of a visual image of a face to appropriate behavioral responses, without the benefit of such categories. The relationship ("mapping") between pixels and responses is just too complex. These intermediate, abstract categories organize and simplify cognition, just like file folders organize and simplify documents on your computer. One can argue that much of intelligence amounts to developing and using these abstract categories in the right ways. Biologically, we'll see how successive layers of neural detectors, organized into a hierarchy, enable this kind of increasingly abstract categorization of the world. We will also see that many individual neural detectors at each stage of processing can work together to capture the subtlety and complexity necessary to encode complex conceptual categories, in the form of a distributed representation. These distributed representations are also critical for enabling multiple different ways of categorizing an input to be active at the same time --- e.g., a given face can be simultaneously recognized as female, old, and happy. A great deal of the emergent intelligence of the human brain arises from multiple successive levels of cascading distributed representations, constituting the collective actions of billions of excitatory pyramidal neurons working together in the cortex.

  • Bidirectional excitatory dynamics are produced by the pervasive bidirectional (e.g., bottom-up and top-down or feedforward and feedback) connectivity in the neocortex. The ability of information to flow in all directions throughout the brain is critical for understanding phenomena like our ability to focus on the task at hand and not get distracted by irrelevant incoming stimuli (did my email inbox just beep??), and our ability to resolve ambiguity in inputs by bringing higher-level knowledge to bear on lower-level processing stages. For example, if you are trying to search for a companion in a big crowd of people (e.g., at a sporting event or shopping mall), you can maintain an image of what you are looking for (e.g., a red jacket), which helps to boost the relevant processing in lower-level stages. The overall effects of bidirectional connectivity can be summarized in terms of an attractor dynamic or multiple constraint satisfaction, where the network can start off in a variety of different states of activity, and end up getting "sucked into" a common attractor state, representing a cleaned-up, stable interpretation of a noisy or ambiguous input pattern. Probably the best subjective experience of this attractor dynamic is when viewing an Autostereogram (wikipedia link) --- you just stare at this random-looking pattern with your eyes crossed, until slowly your brain starts to fall into the 3D attractor, and the image slowly emerges. The underlying image contains many individual matches of the random patterns between the two eyes at different lateral offsets --- these are the constraints in the multiple constraint satisfaction problem that eventually work together to cause the 3D image to appear --- this 3D image is the one that best satisfies all those constraints.

  • Inhibitory competition, mediated by specialized inhibitory interneurons, is important for providing dynamic regulation of overall network activity, which is especially important when there are positive feedback loops between neurons as in the case of bidirectional connectivity. The existence of epilepsy in the human neocortex indicates that achieving the right balance between inhibition and excitation is difficult --- the brain obtains so many benefits from this bidirectional excitation that it apparently lives right on the edge of controlling it with inhibition. Inhibition gives rise to sparse distributed representations (having a relatively small percentage of neurons active at a time, e.g., 15% or so), which have numerous advantages over distributed representations that have many neurons active at a time. In addition, we'll see in the Learning Chapter that inhibition plays a key role in the learning process, analogous to the Darwinian "survival of the fittest" dynamic, as a result of the competitive dynamic produced by inhibition.

We begin with a brief overview of the biology of neural networks in the neocortex.

Biology of the Neocortex

Neural constituents of the neocortex. (A) shows excitatory pyramidal neurons, which constitute roughly 85% of neurons, and convey the bulk of the information content via longer-range axonal projections (some of which can go all the way across the brain). (B) shows inhibitory interneurons, which have much more local patterns of connectivity, and represent the remaining 15% of neurons. Reproduced from Crick & Asanuma (1986).{#fig:fig-cortex-bio-neurtypes width=60% }

The cerebral cortex or neocortex is composed of roughly 85% excitatory neurons (mainly pyramidal neurons, but also stellate cells in layer 4), and 15% inhibitory interneurons ([@fig:fig-cortex-bio-neurtypes]). We focus primarily on the excitatory pyramidal neurons, which perform the bulk of the information processing in the cortex. Unlike the local inhibitory interneurons, they engage in long-range connections between different cortical areas, and it is clear that learning takes place in the synapses between these excitatory neurons (evidence is more mixed for the inhibitory neurons). The inhibitory neurons can be understood as "cooling off" the excitatory heat generated by the pyramidal neurons, much like the cooling system (radiator and coolant) in a car engine. Without these inhibitory interneurons, the system would overheat with excitation and lock up in epileptic seizures (this is easily seen by blocking inhibitory GABA channels, for example). There are, however, areas outside of the cortex (e.g., the basal ganglia and cerebellum) where important information processing does take place via inhibitory neurons, and certainly some researchers will object to this stark division of labor even within cortex, but it is nevertheless a very useful simplification.

Layered Structure

A slice of the visual cortex of a cat, showing the six major cortical layers (I - VI), with sublayers of layer IV that are only present in visual cortex. The first layer (I) is primarily axons ("white matter"). Reproduced from Sejnowski and Churchland (1989).{#fig:fig-cortex-bio-layers width=40% }

The neocortex has a characteristic 6-layer structure ([@fig:fig-cortex-bio-layers]), which is present throughout all areas of cortex ([@fig:fig-cortex-bio-arealayers]). However, the different cortical areas, which have different functions, have different thicknesses of each of the 6 layers, which provides an important clue to the function of these layers, as summarized in [@fig:fig-cortical-layers-in-hid-out]. The anatomical patterns of connectivity in the cortex are also an important source of information giving rise to the following functional picture:

The thickness of the different cortical layers varies depending on the location in cortex --- this is an important clue to the function of these layers (and the cortical areas). A) shows primary visual cortex (same as [@fig:fig-cortex-bio-layers]) which emphasizes input layer 4. B) shows extrastriate cortex which processes visual information, and emphasizes superficial layers 2/3. C) shows primary motor cortex, which emphasizes deep layers 5/6. D) shows prefrontal cortex ("executive function") which has an even blend of all layers. Reproduced from Shepherd (1990).{#fig:fig-cortex-bio-arealayers width=60% }

  • Input areas of the cortex (e.g., primary visual cortex) receive sensory input (typically via the thalamus), and these areas have a greatly enlarged layer 4, which is where the axons from the thalamus primarily terminate. The input layer contains a specialized type of excitatory neuron called the stellate cell, which has a dense bushy dendrite that is relatively localized, and seems particularly good at collecting the local axonal input to this layer.

  • Hidden areas of the cortex are so-called because they don't directly receive sensory input, nor do they directly drive motor output --- they are "hidden" somewhere in between. The bulk of the cortex is "hidden" by this definition, and this makes sense if we think of these areas as creating increasingly sophisticated and abstract categories from the sensory inputs, and helping to select appropriate behavioral responses based on these high-level categories. This is what most of the cortex does, in one way or another. These areas have thicker superficial layers 2/3, which contain many pyramidal neurons that are well positioned for performing this critical categorization function.

  • Output areas of cortex have neurons that synapse directly onto muscle control areas ("motor outputs"), and are capable of causing physical movement when directly stimulated electrically. These areas have much thicker deep layers 5/6, which send axonal projections back down into many different subcortical areas.

Function of the cortical layers: layer 4 processes input information (e.g., from sensory inputs) and drives superficial layers 2/3, which provide a "hidden" internal re-processing of the inputs (extracting behaviorally-relevant categories), which then drive deep layers 5/6 to output a motor response. Green triangles indicate excitation, and red circles indicate inhibition via inhibitory interneurons. Solid lines indicate connections that are included in our models. Dotted lines indicate less strong connections that are not included in our models. BG = basal ganglia which is important for driving motor outputs, and Subcortex includes a large number of other subcortical areas.{#fig:fig-cortical-layers-in-hid-out width=40% }

In summary, the layer-wise (laminar) structure of the cortex and the area-wise function of different cortical areas converge to paint a clear picture about what the cortex does: it takes in sensory inputs, processes them in many different important ways to extract behaviorally relevant categories, which can then drive appropriate motor responses. We will adopt this same basic structure for most of the models we explore.

Patterns of Connectivity

Typical patterns of connectivity between cortical areas (Feedforward and Feedback) and within cortical areas (Lateral). Information flows from the Hidden layers "up" in a feedforward direction into the Input layers of "higher" areas (from which it flows into the Hidden layer of that area), and flows back down in a feedback direction from Hidden and Output (output typically stronger as indicated) back to Hidden and Output layers in "lower" areas. All areas have lateral connections that similarly originate in the Hidden and Output layers, and connect to all three layers within another part of the same cortical area. Based on Figure 3 from Felleman and Van Essen (1991).{#fig:fig-cortical-cons-ff-fb-lat width=45% }

The dominant patterns of longer-range connectivity between cortical areas, and lateral connections within cortical areas are shown in [@fig:fig-cortical-cons-ff-fb-lat]. Consistent with the Input-Hidden-Output laminar structure described above, the feedforward flow of information "up" the cortical hierarchy of areas (i.e., moving further away from sensory inputs) goes from Input to Hidden in one area, and then to Input to Hidden in the next area, and so on. This flow of information from sensory inputs deeper into the higher levels of the brain is what supports the formation of increasingly abstract hierarchies of categories that we discuss in greater detail in the next section.

Information flowing in the reverse direction (feedback) goes from Hidden & Output in one area to Hidden & Output in the previous area, and so on. We will see later in this chapter how this backward flow of information can support top-down cognitive control over behavior, direct attention, and help resolve ambiguities in the sensory inputs (which are ubiquitous). One might have expected this pattern to go Hidden to Output in one area, to Hidden to Output in the previous area, but this pattern is only part of the story. In addition, the Hidden layers can communicate directly to each other across areas. Furthermore, Output areas can also directly communicate with each other. We can simplify this pattern by assuming that the Output layers in many cortical areas serve more as extra copies of the Hidden layer patterns, which help make additional connections (especially to subcortical areas --- all cortical areas project to multiple subcortical areas). Thus, the essential computational functions are taking place directly in the Hidden to Hidden connections between areas (mediated by intervening Input layers for the feedforward direction), and Output layers provide an "external interface" to communicate these Hidden representations more broadly. The exception to this general idea would be in the motor output areas of cortex, where the Output layers may be doing something more independent (they are at least considerably larger in these areas).

Each cortical area also has extensive lateral connectivity among neurons within the same area, and this follows the same general pattern as the feedback projection, except that it also terminates in layer 4. These lateral connections serve a functional role very similar to that of the feedback projections --- essentially they represent "self feedback".

Connectivity matrix between cortical areas, showing that when a given area sends a feedforward projection to another area, it typically also receives a feedback projection from that same area. Thus, cortical connectivity is predominantly bidirectional. Reproduced from Sporns & Zwi (2004).{#fig:fig-cortex-bidir-cons-map width=40% }

The other significant aspect of cortical connectivity that will become quite important for our models, is that the connectivity is largely bidirectional ([@fig:fig-cortex-bidir-cons-map]). Thus, an area that sends a feedforward projection to another area also typically receives a reciprocal feedback projection from that same area. This bidirectional connectivity is important for enabling the network to converge into a coherent overall state of activity across layers, and is also important for driving error-driven learning as we'll see in the Learning Chapter.

Next, let's see how feedforward excitatory connections among areas can support intelligent behavior by developing categorical representations of inputs.

Categorization and Distributed Representations

Schematic of a hierarchical sequence of categorical representations processing a face input stimulus. Representations are distributed at each level (multiple neural detectors active). At the lowest level, there are elementary feature detectors (oriented edges). Next, these are combined into junctions of lines, followed by more complex visual features. Individual faces are recognized at the next level (even here multiple face units are active in graded proportion to how similar people look). Finally, at the highest level are important functional "semantic" categories that serve as a good basis for actions that one might take --- being able to develop such high level categories is critical for intelligent behavior.{#fig:fig-category_hierarch_dist_reps-3 width=100% }

As explained in the introduction to this chapter, the process of forming categorical representations of inputs coming into a network enables the system to behave in a much more powerful and "intelligent" fashion ([@fig:fig-category_hierarch_dist_reps-3]). Philosophically, it is an interesting question as to where our mental categories come from --- is there something objectively real underlying our mental categories, or are they merely illusions we impose upon reality? Does the notion of a "chair" really exist in the real world, or is it just something that our brains construct for us to enable us to get by (and rest our weary legs)? This issue has been contemplated since the dawn of philosophy, e.g., by Plato with his notion that we live in a cave perceiving only shadows on the wall of the true reality beyond the cave. It seems plausible that there is something "objective" about chairs that enables us to categorize them as such (i.e., they are not purely a collective hallucination), but providing a rigorous, exact definition thereof seems to be a remarkably challenging endeavor (try it! don't forget the cardboard box, or the lump of snow, or the miniature chair in a dollhouse, or the one in the museum that nobody ever sat on). It doesn't seem like most of our concepts are likely to be true "natural kinds" that have a very precise basis in nature. Things like Newton's laws of physics, which would seem to have a strong objective basis, are probably dwarfed by everyday things like chairs that are not nearly so well defined (and "naive" understanding of physics is often not actually correct in many cases either).

The messy ontological status of conceptual categories doesn't bother us very much. As we saw in the previous chapter, neurons are very capable detectors that can integrate many thousands of different input signals, and can thereby deal with complex and amorphous categories. Furthermore, we will see that learning can shape these category representations to pick up on things that are behaviorally relevant, without requiring any formality or rigor in defining what these things might be. In short, our mental categories develop because they are useful to us in some way or another, and the outside world produces enough reliable signals for our detectors to pick up on these things. Importantly, a major driver for learning these categories is social and linguistic interaction, which enables very complex and obscure things to be learned and shared --- the strangest things can be learned through social interactions (e.g., you now know that the considerable extra space in a bag of chips is called the "snackmosphere", courtesy of Rich Hall). Thus, our cultural milieu plays a critical role in shaping our mental representations, and is clearly a major force in what enables us to be as intelligent as we are (we do occasionally pick up some useful ideas along with things like "snackmosphere"). If you want to dive deeper into the philosophical issues of truth and relativism that arise from this lax perspective on mental categories, see the Chapter Appendix Philosophy of Categories.

How synaptic weights act to project input patterns along specific dimensions or bases, in this case projecting the inputs along the dimensions of Emotion and Gender.  In the left panel, the very high-dimensional face inputs (256 dimensions for a 16x16 image) are projected along two random weight vectors, allowing us to visualize this high-dimensional input space in a 2D plot.  In the right panel, the specific synaptic weights trained for discriminating along the emotion vs. gender dimensions have transformed or rotated the input space into a much more systematic and well-organized, low-dimensional space.  This is fundamentally what neurons do: organize and transform input patterns along relevant dimensions, and that is another way of stating that neurons detect stimuli along these dimensions. {#fig:fig-face-categ-dim-prjn width=100% }

[@fig:fig-face-categ-dim-prjn] provides a complementary view of the neuron and its weights, as projecting input patterns along a specific dimension in a high-dimensional space. Mathematically, the synaptic weights are a vector that multiplies the high-dimensional input vector of neural activity signals using a dot product, which is just multiplying weights times activations and adding up the total --- that is also known as the projection of the input space onto the weight vector dimension. This projection operation organizes and systematizes the inputs along dimensions of behavioral importance (e.g., emotion and gender in the case shown in the figure, which is used in the exploration below).

In linear algebra terms, the neural weights rotate the input space into a new basis set, where a basis set is a collection of different axes (like the X and Y axes) or dimensions that provides a different way of encoding the inputs. Furthermore, in these terms, learning is the process of finding a good basis set for encoding the inputs, and current deep neural networks used in AI are primarily doing exactly that, over many successive layers, each of which applies a different such "rotation", such that at the "top" of such a network, a few very informative dimensions have been extracted (e.g., the object category extracted from a set of input images).
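As a minimal sketch of this projection idea (using made-up random inputs, and weight vectors that merely stand in for learned "emotion" and "gender" detectors), the dot product of each input with each weight vector re-encodes the 256-dimensional faces as points in a new 2D basis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten hypothetical "face" inputs: 256-dimensional vectors (16x16 images
# flattened), matching the dimensionality described in the figure.
faces = rng.uniform(size=(10, 256))

# Two weight vectors stand in for the learned synaptic weights of two
# detector units; here they are random, purely for illustration.
w_emotion = rng.normal(size=256)
w_gender = rng.normal(size=256)

# The dot product projects each high-dimensional input onto each weight
# vector; together, the two projections re-encode every face as a point
# in a 2-dimensional space.
coords = np.stack([faces @ w_emotion, faces @ w_gender], axis=1)
print(coords.shape)  # (10, 2): each face is now a point in the 2D space
```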

The detector way of looking at the neuron is useful for understanding the roles of inhibition and the neural firing threshold as we saw in the previous chapter --- it specifically differentiates between active firing for detected items, vs. not firing for everything else, and provides a more "discrete" view of what the neuron is doing. By contrast, the dimension projection framework provides a more continuous, mathematical view. Both are useful ways of understanding what is going on in the brain.

One intuitive way of understanding the importance of having the right categories (and choosing them appropriately for the given situation) comes from insight problems. These problems are often designed so that our normal default way of categorizing the situation leads us in the wrong direction, and it is necessary to re-represent the problem in a new way ("thinking outside the box") to solve it. For example, consider this "conundrum": "Two men are dead in a cabin in the woods. What happened?" --- you then proceed to ask a bunch of true/false questions and eventually realize that you need to select a different way of categorizing the word "cabin" in order to solve the puzzle. Here is a list of some of these kinds of conundrums (external link).

For computer programmers, one of the most important lessons one learns is that choosing the correct representation is the most important step in solving a given problem. As a simple example, using the notion of a "heap" enables a particularly elegant solution to the sorting problem. Binary trees are also a widely used form of representation that often greatly reduce the computational time of various problems. In general, you simply want to find a representation that makes it easy to do the things you need to do. This is exactly what the brain does.
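For instance, here is a minimal Python sketch of the heap-based approach to sorting (heapsort), where the choice of representation does nearly all of the work:

```python
import heapq

# With a heap as the representation, sorting reduces to repeatedly
# removing the smallest remaining element.
def heap_sort(items):
    heap = list(items)
    heapq.heapify(heap)  # O(n): reorganize the list into a heap
    return [heapq.heappop(heap) for _ in range(len(heap))]  # O(n log n) total

print(heap_sort([5, 2, 9, 1, 7]))  # [1, 2, 5, 7, 9]
```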

One prevalent example of the brain's propensity to develop categorical encodings of things are stereotypes. A stereotype is really just a mental category applied to a group of people. The fact that everyone seems to have them is strong evidence that this is fundamentally how the brain works. We cannot help but think in terms of abstract categories like this, and as we've argued above, categories in general are essential for allowing us to deal with the world in an intelligent manner. But the obvious problems with stereotypical thinking indicate that these categories can also be problematic (for stereotypes specifically and categorical thinking more generally), and limit our ability to accurately represent the details of any given individual or situation. As we discuss next, having many different categorical representations active at the same time can potentially help mitigate these problems. The ability to entertain multiple such potential categories at the same time may be an individual difference variable associated with things like political and religious beliefs [@CritcherHuberHoEtAl09; @NamJostBavel13]. This stuff can get interesting!

Distributed Representations

Graded response as a function of similarity. This is one aspect of distributed representations, shown here in a neuron in the visual cortex of a monkey --- this neuron responds in a graded fashion to different input stimuli, in proportion to how similar they are to the thing that it responds most actively to (as far as is known from presenting a wide sample of different input images). With such graded responses ubiquitous in cortex, it follows that any given input will activate many different neuron detectors. Reproduced from Tanaka (1996).{#fig:fig-dist-rep-vis-bio width=40% }

Distributed representations of different shapes mapped across regions of inferotemporal (IT) cortex in the monkey. Each shape activates a large number of different neurons distributed across the IT cortex, and these neurons overlap partially in some places. Reproduced from Tanaka (2003).{#fig:fig-tanaka03-topo-maps width=40% }

Schematic diagram of topographically organized shape representations in monkey IT cortex, from Tanaka (2003) --- each small area of IT responds optimally to a different stimulus shape, and neighboring areas tend to have similar but not identical representations.{#fig:fig-tanaka03-topo width=40% }

In addition to our mental categories being somewhat amorphous, they are also highly polymorphous: any given input can be categorized in many different ways at the same time --- there is no such thing as the appropriate level of categorization for any given thing. A chair can also be furniture, art, trash, firewood, doorstopper, plastic and any number of other such things. Both the amorphous and polymorphous nature of categories are nicely accommodated by the notion of a distributed representation. Distributed representations are made up of many individual neurons-as-detectors, each of which is detecting something different. The aggregate pattern of output activity ("detection alarms") across this population of detectors can capture the amorphousness of a mental category, because it isn't just one single discrete factor that goes into it. There are many factors, each of which plays a role. Chairs have seating surfaces, and sometimes have a backrest, and typically have a chair-like shape, but their shapes can also be highly variable and strange. They are often made of wood or plastic or metal, but can also be made of cardboard or even glass. All of these different factors can be captured by the whole population of neurons firing away to encode these and many other features (e.g., including surrounding context, history of actions and activities involving the object in question).

The same goes for the polymorphous nature of categories. One set of neurons may be detecting chair-like aspects of a chair, while others are activating based on all the different things that it might represent (material, broader categories, appearance, style, etc.). All of these different possible meanings of the chair input can be active simultaneously, which is well captured by a distributed representation with neurons detecting all these different categories at the same time.
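To make this concrete, here is a toy sketch (with entirely made-up detectors, features, and weights) of how a single "chair" input simultaneously activates many detectors to graded degrees, yielding a distributed pattern rather than a single category label:

```python
import numpy as np

# Toy detectors, each tuned to a different feature or categorization;
# the names and weights are invented purely for illustration.
detectors = ["seating-surface", "backrest", "wooden", "furniture", "art", "firewood"]
weights = np.array([
    [0.9, 0.1, 0.3],   # one row of input weights per detector
    [0.7, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.6, 0.4, 0.3],
    [0.1, 0.2, 0.8],
    [0.2, 0.8, 0.1],
])

# A single "chair" input described by 3 features: seat-like shape,
# wood material, decorativeness.
chair = np.array([1.0, 0.8, 0.2])
activity = weights @ chair   # graded responses across the whole population

for name, act in zip(detectors, activity):
    print(f"{name:16s} {act:.2f}")   # many detectors are active at once
```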

Some real-world data on distributed representations is shown in [@fig:fig-dist-rep-vis-bio] and [@fig:fig-tanaka03-topo-maps]. These show that individual neurons respond in a graded fashion as a function of similarity to inputs relative to the optimal thing that activates them (we saw this same property in the detector exploration from the Neuron Chapter, when we lowered the leak level so that it would respond to multiple inputs). [@fig:fig-tanaka03-topo] shows an overall summary map of the topology of shape representations in monkey inferotemporal (IT) cortex, where each area has a given optimal stimulus that activates it, while neighboring areas have similar but distinct optimal stimuli. Thus, any given shape input will be encoded as a distributed pattern across all of these areas to the extent that it has features that are sufficiently similar to activate the different detectors.

Maps of neural activity in the human brain in response to different visual input stimuli (as shown --- faces, houses, chairs, shoes), recorded using functional magnetic resonance imaging (fMRI). There is a high level of overlap in neural activity across these different stimuli, in addition to some level of specialization. This is the hallmark of a distributed representation. Reproduced from Haxby et al. (2001).{#fig:fig-haxbyetal01-obj-maps width=50% }

Another demonstration of distributed representations comes from a landmark study by [@HaxbyGobbiniFureyEtAl01], using functional magnetic resonance imaging (fMRI) of the human brain, while viewing different visual stimuli ([@fig:fig-haxbyetal01-obj-maps]). They showed that contrary to prior claims that the visual system was organized in a strictly modular fashion, with completely distinct areas for faces vs. other visual categories, for example, there is in fact a high level of overlap in activation over a wide region of the visual system for these different visual inputs. They showed that you can distinguish which object is being viewed by the person in the fMRI machine based on these distributed activity patterns, at a high level of accuracy. Critically, this accuracy level does not go down appreciably when you exclude the area that exhibits the maximal response for that object. Prior "modularist" studies had only reported the existence of these maximally responding areas. But as we know from the monkey data, neurons will respond in a graded way even if the stimulus is not a perfect fit to their maximally activating input, and Haxby et al. showed that these graded responses convey a lot of information about the nature of the input stimulus.

Coarse Coding

Coarse coding, which is an instance of a distributed representation with neurons that respond in a graded fashion. This example is based on the coding of color in the eye, which uses only 3 different photoreceptors tuned to different frequencies of light (red, green, blue) to cover the entire visible spectrum. This is a very efficient representation compared to having many more receptors tuned more narrowly and discretely to different frequencies along the spectrum.{#fig:fig-coarse-coding width=40% }

[@fig:fig-coarse-coding] illustrates an important specific case of a distributed representation known as coarse coding. This is not actually different from what we've described above, but the particular example of how the eye uses only 3 photoreceptors to capture the entire visible spectrum of light is a particularly good example of the power of distributed representations. Each individual frequency of light is uniquely encoded in terms of the relative balance of graded activity across the different detectors. For example, a color between red and green (e.g., a particular shade of yellow) is encoded as partial activity of the red and green units, with the relative strength of red vs. green determining how much it looks more orange vs. chartreuse. In summary, coarse coding is very important for efficiently encoding information using relatively few neurons.
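A minimal sketch of this coarse coding scheme (with approximate peak sensitivities, and an arbitrary Gaussian tuning width chosen for illustration) shows how each wavelength is uniquely encoded by the graded pattern of activity across just three broadly tuned detectors:

```python
import numpy as np

# Three broadly tuned "photoreceptors" (blue, green, red), with approximate
# peak sensitivities in nanometers; the tuning width is arbitrary here.
centers = np.array([440.0, 530.0, 560.0])
width = 80.0

def receptor_responses(wavelength_nm):
    # Graded (Gaussian) response of each receptor to a given wavelength.
    return np.exp(-0.5 * ((wavelength_nm - centers) / width) ** 2)

# Each wavelength produces a unique relative balance of graded activity:
for wl in (470, 520, 580):   # blue-ish, green-ish, yellow-ish inputs
    print(wl, np.round(receptor_responses(wl), 2))
```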

Localist Representations

The opposite of a distributed representation is a localist representation, where a single neuron is active to encode a given category of information. Although we do not think that localist representations are characteristic of the actual brain, they are nevertheless quite convenient to use for computational models, especially for input and output patterns to present to a network. It is often quite difficult to construct a suitable distributed pattern of activity to realistically capture the similarities between different inputs, so we often resort to a localist input pattern with a single input neuron active for each different type of input, and just let the network develop its own distributed representations from there.

The famous case of a Halle Berry neuron recorded from a person with epilepsy who had electrodes implanted in their brain. The neuron appears sensitive to many different presentations of Halle Berry (including just seeing her name in text), but not to otherwise potentially similar people. Although this would seem to suggest the presence of localist "grandmother cells", in fact there are many other distributed neurons activated by any given input such as this within the same area, and even this neuron does exhibit some level of firing to similar distractor cases. Reproduced from Quiroga et al. (2005).{#fig:fig-halle-berry-neuron width=75% }

[@fig:fig-halle-berry-neuron] shows the famous case of a "Halle Berry" neuron, recorded from a person with epilepsy who had electrodes implanted in their brain [@QuirogaReddyKreimanEtAl05]. This would appear to be evidence for an extreme form of localist representation, known as a grandmother cell (a term apparently coined by Jerry Lettvin in 1969), denoting a neuron so specific yet abstract that it only responds to one's grandmother, based on any kind of input, but not to any other people or things. People had long scoffed at the notion of such grandmother cells. Even though the evidence for them is fascinating (including also other neurons for Bill Clinton and Jennifer Aniston), it does little to change our basic understanding of how the vast majority of neurons in the cortex respond. Clearly, when an image of Halle Berry is viewed, a huge number of neurons at all levels of the cortex will respond, so the overall representation is still highly distributed. But it does appear that, amongst all the different ways of categorizing such inputs, there are a few highly selective "grandmother" neurons! One other outstanding question is the extent to which these neurons actually do show graded responses to other inputs --- there is some indication of this in the figure, and more data would be required to really test this more extensively.

Explorations

Open the faces simulation in CCN Sims (Part I only) for an exploration of how face images can be categorized in different ways (emotion, gender, identity), each of which emphasizes some aspect of the input stimuli and collapses across others.

Bidirectional Excitatory Dynamics and Attractors

The feedforward flow of excitation through multiple layers of the neocortex can make us intelligent, but the feedback flow of excitation in the opposite direction is what makes us robust, flexible, and adaptive. Without this feedback pathway, the system can only respond on the basis of whatever happens to drive the system most strongly in the feedforward, bottom-up flow of information. But often our first impression is wrong, or at least incomplete. In the "searching for a friend" example from the introduction, we might not get sufficiently detailed information from scanning the crowd to drive the appropriate representation of the person. Top-down activation flow can help focus us on relevant perceptual information that we can spot (like the red coat). As this information interacts with the bottom-up information coming in as we scan the crowd, our brains suddenly converge on the right answer: There's my friend, in the red coat!

Illustration of attractor dynamics, in terms of a "gravity well". In the familiar gravity wells that suck in coins at science museums, the attractor state is the bottom hole in the well, where the coin inevitably ends up. This same dynamic can operate in more abstract cases inside bidirectionally connected networks. For example, the x and y axes in this diagram could represent the activities of two different neurons, and the attractor state indicates that the network connectivity prefers to have neuron x highly active, while neuron y is weakly active. The attractor basin indicates that regardless of what configuration of activations these two neurons start in, they'll end up in this same overall attractor state.{#fig:fig-attractor width=40% }

The overall process of converging on a good internal representation given a noisy, weak or otherwise ambiguous input can be summarized in terms of attractor dynamics ([@fig:fig-attractor]). An attractor is a concept from dynamical systems theory, representing a stable configuration that a dynamical system will tend to gravitate toward. A familiar example of attractor dynamics is the coin gravity well, often found in science museums. You roll your coin down a slot at the top of the device, and it rolls out around the rim of an upside-down bell-shaped "gravity well". It keeps orbiting around the central hole of this well, but every revolution brings it closer to the "attractor" state in the middle. No matter where you start your coin, it will always get sucked into the same final state. This is the key idea behind an attractor: many different inputs all get sucked into the same final state. If the attractor dynamic is successful, then this final state should be the correct categorization of the input pattern.
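A classic Hopfield-style network provides perhaps the simplest runnable illustration of this dynamic (this is our own simplified sketch, not one of the book's models): symmetric weights store a pattern, and different noisy starting states all get sucked into the same attractor:

```python
import numpy as np

# One stored pattern; symmetric weights from the outer-product rule.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)  # no self-connections

def settle(state, steps=20):
    state = state.copy()
    for _ in range(steps):
        for j in range(len(state)):   # asynchronous local updates
            state[j] = 1 if W[:, j] @ state >= 0 else -1
    return state

# Two different noisy starting states...
noisy1 = np.array([1,  1, 1, 1, -1,  1, 1, -1])
noisy2 = np.array([-1, -1, 1, 1, -1, -1, 1,  1])
# ...both end up in the same final state (the stored pattern).
print(settle(noisy1))
print(settle(noisy2))
```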

A well-known example of an image that is highly ambiguous, but we can figure out what is going on if an appropriate high-level cue is provided, e.g., "Dalmatian". This process of top-down knowledge helping resolve bottom-up ambiguity is a great example of bidirectional processing.{#fig:fig-dalmatian width=50% }

There are many different instances where bidirectional excitatory dynamics are evident:

  • Top-down imagery --- I can ask you to imagine what a purple hippopotamus looks like, and you can probably do it pretty well, even if you've never seen one before. Via top-down excitatory connections, high-level verbal inputs can drive corresponding visual representations. For example, imagining the locations of different things in your home or apartment produces reaction times that mirror the actual spatial distances between those objects --- we seem to be using a real spatial/visual representation in our imagery.

  • Top-down ambiguity resolution --- Many stimuli are ambiguous without further top-down constraints. For example, if you've never seen [@fig:fig-dalmatian] before, you probably won't be able to find the Dalmatian dog in it. But now that you've read that clue, your top-down semantic knowledge about what a dalmatian looks like can help your attractor dynamics converge on a coherent view of the scene.

  • Pattern completion --- If I ask you "what did you have for dinner last night", this partial input cue can partially excite the appropriate memory representation in your brain (likely in the hippocampus), but you need a bidirectional excitatory dynamic to enable this partial excitation to reverberate through the memory circuits and fill in the missing parts of the full memory trace. This reverberatory process is just like the coin orbiting around the gravity well --- different neurons get activated and inhibited as the system "orbits" around the correct memory trace, eventually converging on the full correct memory trace (or not!). Sometimes, in so-called tip of the tongue states, the memory you're trying to retrieve is just beyond grasp, and the system cannot quite converge into its attractor state. Man, that can be frustrating! Usually you try everything to get into that final attractor. We don't like to be in an unresolved state for very long.

Energy and Harmony

There is a mathematical way to capture something like the vertical axis in the attractor ([@fig:fig-attractor]), which in the physical terms of a gravity well is potential energy. Perhaps not surprisingly, this measure is called energy and it was developed by a physicist named John Hopfield. He showed that local updating of unit activation states ends up reducing a global energy measure, much in the same way that local motion of the coin in the gravity well reduces its overall potential energy [@Hopfield82; @Hopfield84]. Another physicist, Paul Smolensky, developed an alternative framework with the sign reversed, where local updating of unit activation states increases global Harmony [@Smolensky86]. That sounds nice, doesn't it? To see the mathematical details, see Chapter Appendix on Energy and Harmony. We don't actually need these equations to run our models, and the basic intuition for what they tell us is captured by the notion of an attractor, so we won't spend any more time on this idea in this main chapter.

Explorations

Open faces in CCN Sims (Part II) for an exploration of how top-down and bottom-up processing interact to produce imagery and help resolve ambiguous inputs (partially occluded faces). Open these additional simulations for further elaboration of bidirectional computation:

  • cats-and-dogs --- fun example of attractor dynamics in a simple semantic network.
  • necker-cube --- another fun example of attractor dynamics, showing also the important roles of noise and neural fatigue.

Inhibitory Competition and Activity Regulation

Inhibitory competition plays a critical role in enabling us to focus on a few things at a time, which we can then process effectively without getting overloaded. Inhibition also ensures that those detectors that do get activated are the ones that are the most excited by a given input --- in Darwinian evolutionary terms, these are the fittest detectors.

Without inhibition, the bidirectional excitatory connectivity in the cortex would quickly cause every neuron to become highly excited, because there would be nothing to check the spread of activation. There are so many excitatory connections among neurons that it doesn't take long for every neuron to become activated. A good analogy is placing a microphone near a speaker that is playing the sound from that microphone --- this is a bidirectional excitatory system, and it quickly leads to that familiar, very loud "feedback" squeal. If one's audio system had the equivalent of the inhibitory system in the cortex, it would actually be able to prevent this feedback by dynamically turning down the input gain on the microphone, and/or the output volume of the speaker.

Another helpful analogy is to an air conditioner (AC), which has a thermostat control that determines when it kicks in (and potentially how strong it is). This kind of feedback control system allows the room to warm up to a given set point (e.g., 75 degrees F) before it starts to counter the heat. Similarly, inhibition in the cortex is proportional to the amount of excitation, and it produces a similar set point behavior, where activity is prevented from getting too high: typically no more than roughly 15-25% of neurons in any given area are active at a time.

The importance of inhibition goes well beyond this basic regulatory function, however. Inhibition gives rise to competition --- only the most strongly excited neurons are capable of overcoming the inhibitory feedback signal to get activated and send action potentials to other neurons. This competitive dynamic has numerous benefits in processing and learning. For example, selective attention depends critically on inhibitory competition. In the visual domain, selective attention is evident when searching for a stimulus in a crowded scene (e.g., searching for a friend in a crowd as described in the introduction). You cannot process all of the people in the crowd at once, so only a relatively few capture your attention, while the rest are ignored. In neural terms, we say that the detectors for the attended few were sufficiently excited to out-compete all the others, which remain below the firing threshold due to the high levels of inhibition. Both bottom-up and top-down factors can contribute to which neural detectors get over threshold or not, but without inhibition, there wouldn't be any ability to select only a few to focus on in the first place. Interestingly, people with Balint's syndrome, who have bilateral damage to the parietal cortex (which plays a critical role in spatial attention of this sort), show reduced attentional effects and also are typically unable to process anything if a visual display contains more than one item (i.e., "simultanagnosia" --- the inability to recognize objects when there are multiple simultaneously present in a scene). We will explore these phenomena in the Perception Chapter.

We will see in the Learning Chapter that inhibitory competition facilitates learning by providing this selection pressure, whereby only the most excited detectors get activated, which then gets reinforced through the learning process to make the most active detectors even better tuned for the current inputs, and thus more likely to respond to them again in the future. This kind of positive feedback loop over episodes of learning leads to the development of very good detectors for the kinds of things that tend to arise in the environment. Without the inhibitory competition, a large percentage of neurons would get trained up for each input, and there would be no specialization of detectors for specific categories in the environment. Every neuron would end up weakly detecting everything, and thus accomplish nothing. Thus, again we see that competition and limitations can actually be extremely beneficial.

A summary term for the kinds of neural patterns of activity that develop in the presence of inhibitory competition is sparse distributed representations. These have relatively few (15-25%) neurons active at a time, and thus these neurons are more highly tuned for the current inputs than they would otherwise be in a fully distributed representation with much higher levels of overall activity. Thus, although technically inhibition does not contribute directly to the basic information processing functions like categorization, because inhibitory connectivity is strictly local within a given cortical area, inhibition does play a critical indirect role in shaping neural activity patterns at each level.

Feedforward and Feedback Inhibition

Feedforward and Feedback Inhibition. Feedback inhibition reacts to the actual level of activity in the excitatory neurons, by directly responding to this activity (much like an air conditioner reacts to excess heat). Feedforward inhibition anticipates the level of excitation of the excitatory neurons by measuring the level of excitatory input they are getting from the Input area. A balance of both types works best.{#fig:fig-inhib-types width=50% }

There are two distinct patterns of neural connectivity that drive inhibitory interneurons in the cortex, feedforward and feedback ([@fig:fig-inhib-types]). Just to keep things interesting, these are not the same as the connections among excitatory neurons. Functionally, feedforward inhibition can anticipate how excited the excitatory neurons will become, whereas feedback accurately reflects the actual level of activation they achieve.

Feedback inhibition is the most intuitive, so we'll start with it. Here, the inhibitory interneurons are driven by the same excitatory neurons that they then project back to and inhibit. This is the classical "feedback" circuit from the AC example. When a set of excitatory neurons starts to get active, they then communicate this activation to the inhibitory interneurons (via excitatory glutamatergic synapses onto inhibitory interneurons --- inhibitory neurons have to get excited just like everyone else). This excitation of the inhibitory neurons then causes them to fire action potentials that come right back to the excitatory neurons, opening up their inhibitory ion channels via GABA release. The influx of $Cl^-$ (chloride) ions from the inhibitory input channels on these excitatory neurons acts to drive them back down in the direction of the inhibitory driving potential (in the tug-of-war analogy, the inhibitory guy gets bigger and pulls harder). Thus, excitation begets inhibition which counteracts the excitation and keeps everything under control, just like a blast of cold air from the AC unit.

Feedforward inhibition is perhaps a bit more subtle. It operates when the excitatory synaptic inputs to excitatory neurons in a given area also drive the inhibitory interneurons in that area, causing the interneurons to inhibit the excitatory neurons in proportion to the amount of excitatory input they are currently receiving. This would be like a thermostat reacting to the anticipated amount of heat, for example, by turning on the AC based on the outside temperature. Thus, the key difference between feedforward and feedback inhibition is that feedforward reflects the net excitatory input, whereas feedback reflects the actual activation output of a given set of excitatory neurons.

As we will see in the exploration, the anticipatory function of feedforward inhibition is crucial for limiting the kinds of dramatic feedback oscillations that can develop in a purely feedback-driven system. However, too much feedforward inhibition makes the system very slow to respond, so there is an optimal balance of the two types that results in a very robust inhibitory dynamic. Furthermore, the way in which inhibition and excitation interact through the tug-of-war dynamic as we saw in the previous chapter is essential for enabling these inhibitory dynamics to be as robust as they are. For example, the shunting nature of inhibition, which only starts to resist once the membrane potential starts to rise, enables the neurons to get some level of activity and then get pulled back down --- an alternative form of inhibition (e.g., simply subtracting away from excitation) would either prevent activation entirely or not generate enough inhibition to control the excitation.

Exploration of Inhibitory Interneuron Dynamics

  • See the inhib simulation in CCN Sims --- this simulation shows how feedforward and feedback inhibitory dynamics lead to the robust control of excitatory pyramidal neurons, even in the presence of bidirectional excitation.

FFFB Inhibition Function

We can efficiently implement the feedforward (FF) and feedback (FB) form of inhibition, without actually requiring the inhibitory interneurons, by using the average excitatory input $g_e$ and activity levels in a given layer in a simple equation shown below. This works surprisingly well, without requiring subsequent parameter adaptation during learning, and this FFFB form of inhibition has now replaced the k-Winners-Take-All (kWTA) form of inhibition used in the 1st Edition of the textbook [@OReillyMunakata00].

The average excitatory conductance (net input) to a layer (or pool of units within a layer, if inhibition is operating at that level) is just the average of the $g_e$ of each unit indexed by $i$ in the layer / pool: $$ \langle g_e \rangle = \frac{1}{n} \sum_i g_{e_i} $$ Similarly, the average activation is just the average of the activation values ($y_i$): $$ \langle y \rangle = \frac{1}{n} \sum_i y_i $$

We compute the overall inhibitory conductance applied uniformly to all the units in the layer / pool with just a few key parameters applied to each of these two averages. Because the feedback component tends to drive oscillations (alternately over and under reacting to the average activation), we apply a simple time integration dynamic on that term. The feedforward does not require this time integration, but it does require an offset term, which was determined by fitting the actual inhibition generated by our earlier kWTA equations. Thus, the overall inhibitory conductance, which then drives the inhibition in the tug-of-war pulling against the excitatory conductance, is just the sum of the two terms (ff and fb), with an overall inhibitory gain constant factor Gi: $$ g_i(t) = \mbox{Gi} \left[ \mbox{ff}(t) + \mbox{fb}(t) \right] $$

This Gi factor is typically the only parameter manipulated to determine overall layer activity level. The default value is 1.8. Higher values produce sparser levels of activity. For very sparse layers (e.g., a single output unit active), values up to around 3.5 can be used.

The feedforward (ff) term is: $$ \mbox{ff}(t) = \mbox{FF} \left[ \langle g_e \rangle - \mbox{FF0} \right]_+ $$ where FF is a constant gain factor for the feedforward component (set to 1.0 by default), and FF0 is a constant offset (set to 0.1 by default).

The feedback (fb) term is: $$ \mbox{fb}(t) = \mbox{fb}(t-1) + dt \left[ \mbox{FB} \langle y \rangle - \mbox{fb}(t-1) \right] $$ where FB is the overall gain factor for the feedback component (0.5 default), dt is the time constant for integrating the feedback inhibition (0.7 default), and the t-1 indicates the previous value of the feedback inhibition --- this equation specifies a graded folding-in of the new inhibition factor on top of what was there before, and the relatively fast dt value of 0.7 makes it track the new value fairly quickly --- there is just enough lag to iron out the oscillations.
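Putting these pieces together, here is a minimal Python sketch of one FFFB update step, using the default parameter values given above (the function name and array-based layer representation are our own):

```python
import numpy as np

def fffb_inhibition(g_e, y, fb_prev, Gi=1.8, FF=1.0, FF0=0.1, FB=0.5, dt=0.7):
    """One FFFB update step for a layer / pool (a sketch of the equations above).

    g_e: array of excitatory conductances (net inputs), one per unit
    y:   array of activations, one per unit
    fb_prev: feedback inhibition value from the previous time step
    Returns (g_i, fb): the layer-wide inhibitory conductance and new fb value.
    """
    avg_ge = g_e.mean()                          # <g_e>: average net input
    avg_y = y.mean()                             # <y>: average activation
    ff = FF * max(avg_ge - FF0, 0.0)             # feedforward term, offset by FF0
    fb = fb_prev + dt * (FB * avg_y - fb_prev)   # time-integrated feedback term
    g_i = Gi * (ff + fb)                         # total inhibitory conductance
    return g_i, fb
```

On each cycle of settling, the resulting $g_i$ would be applied uniformly as the inhibitory conductance for all units in the layer / pool, pulling against their excitatory conductances in the tug-of-war dynamic.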

Overall, it should be clear that this FFFB inhibition is extremely simple to compute (much simpler than the previous kWTA computation), and it behaves in a much more proportional manner relative to the excitatory drive on the units --- if there is higher overall excitatory input, then the average activation overall in the layer will be higher, and vice-versa. The previous kWTA-based computation tended to be more rigid and imposed a stronger set-point like behavior. The FFFB dynamics, being much more closely tied to the way inhibitory interneurons actually function, should provide a more biologically accurate simulation.

Exploration of FFFB Inhibition

To see FFFB inhibition in action, you can follow the instructions at the last part of the inhib simulation at CCN Sims.

Appendix

The following optional additional topics are covered here:

  • Philosophy of Categories: philosophical issues about the truth value of mental categories.

  • Energy and Harmony: mathematics of attractor dynamics in terms of Hopfield energy or Smolensky's Harmony.

Philosophy of Categories

This section delves a bit more into the philosophical issues associated with mental categories, and their apparent lack of obvious "truth" value, and what the implications of this might be.

IMPORTANT DISCLAIMER: this will probably be an interesting topic for various folks and certainly a lot has been written on this topic in the philosophical literature. At this point, however, the views represented here are those of the first author and perhaps a few other co-authors.

  • As we noted in the main chapter, it seems that mental categories are shaped by learning, and social interaction via language etc, and that there is likely some kind of underlying regularity that drives our ability to form stable internal category representations, but really, they are not "grounded" in any solid kind of "reality".

  • This accords with many facts about human cognition: it is highly fallible, people believe all manner of completely wrong things all the time (and often hold these beliefs extremely dearly...), etc.

  • But it is somewhat unsettling to embrace this view, as it seems to put one squarely in the full "cultural relativism" camp, with no hope of ever having any sense of "universal truth". This makes objectivists puke, and is generally not great for scientists, who seek to discover the "true nature of the world".

  • However, there is a very good solution to this problem, even though it is in no way "absolute" and certainly takes a lot of time and patience (and cooperation among individuals). It also happens to be the bedrock of science. This solution is to develop an ever-broader self-consistent set of mental categories, based on replicable experiences that can be shared across individuals. In short, any given mental category you might happen to develop has a good chance of being wrong, but if you and a group of other people can all agree on a very reliable set of basic experiences and ways of categorizing those that is self-consistent over time and across the whole set of such categories, then it seems quite likely that these are "true".

  • In scientific terms, the "experiences" are experiments that can be replicated across different labs. And the mental categories are scientific theories which have to be consistent not only with a given set of experiments, but also with each other, and all the other experiments that support other such theories.

  • At this point in time, there is a collective understanding in science that encompasses a great deal of phenomena in the natural world, e.g., the "standard model" in physics, and all of chemistry, biology, molecular genetics, etc. Higher-level, more complex phenomena such as human cognition and neuroscience have a lot more unresolved issues, but progress is being made, and people would probably be surprised by how many important things have a strong overall consensus. Of course, no one individual knows all this stuff, but it is there for the knowing, and seems to constitute the closest approximation to the truth that we're going to get.

  • Short answer: if you want to find the truth, become a scientist! If not, be content to just make stuff up. The brain is very good at it, and it might serve you just fine... If you don't want to go all the way to being a scientist, you can also try to just think about your different mental categories (beliefs) and see which ones seem consistent with each other and which ones don't. Then, try to resolve the inconsistencies, in a way that best matches your actual physical experiences in the real world. In so doing, you will likely improve the quality of your mental categories, making them closer approximations to some kind of underlying truth!

Energy and Harmony

This section describes the [@Hopfield82; @Hopfield84] energy and Smolensky Harmony equations, and how they help us understand more formally what our networks are doing as they settle into an attractor state.

The Hopfield energy equation is: $$ E = - \frac{1}{2} \sum_j \sum_i x_i w_{ij} y_j $$ where $x$ and $y$ represent the sending and receiving unit activations (indexed by $i$ and $j$), respectively, and $w_{ij}$ is the weight between them.

Harmony is literally the same thing without the minus sign: $$ H = \frac{1}{2} \sum_j \sum_i x_i w_{ij} y_j $$

You can see that Harmony is maximized to the extent that, for each pair of sending and receiving units, the activations of these units $x_i$ and $y_j$ are consistent with the weight between these two units. If the weight is large and positive, the network is configured such that it is harmonious if these two units are both active together. If the weight is negative (a simple version of inhibitory projections), then those units contribute to greater harmony only if they have opposite sign (one is active and the other not active).

A key feature of these equations is that local updates drive reliable global effects on energy or Harmony (decreasing the energy or increasing Harmony). To see this, we can use the mathematics of calculus to take the derivative of the global equation with respect to changes in the receiving unit's activation: $$ \frac{\partial H}{ \partial y_j} = \sum_i x_i w_{ij} $$

Taking the derivative allows us to find the maximum of a function, which occurs when the derivative is zero. So, this gives us a prescriptive formula for deciding how $y_j$ should be changed (updated) as a function of inputs and weights so as to maximize Harmony. You might recognize this equation as essentially the net excitatory conductance or net input to a neuron, from the Neuron Chapter. This means that updating units with a linear activation function (where activation $y$ equals the net input directly) would serve to maximize Harmony or minimize energy. To accommodate a nonlinear activation function (e.g., a "sigmoidal" function of the same general shape as the XX1 function), one needs to introduce an additional "penalty" term (called entropy in the Hopfield framework, and stress in the Smolensky one), that essentially drives the saturation of the neural activation function for high or low values of net input.
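The following small sketch (our own illustration) shows these dynamics numerically: updating each receiving unit's activation in the direction of its net input monotonically increases global Harmony, with simple clipping standing in for the saturating activation function just described:

```python
import numpy as np

rng = np.random.default_rng(0)

n_x, n_y = 8, 5
W = rng.normal(size=(n_x, n_y))   # weights w_ij: sending unit i -> receiving unit j
x = rng.uniform(size=n_x)         # fixed sending activations x_i
y = rng.uniform(size=n_y)         # receiving activations y_j, updated below

def harmony(x, W, y):
    # H = 1/2 * sum_ij x_i w_ij y_j
    return 0.5 * x @ W @ y

for step in range(5):
    net = x @ W                        # dH/dy_j = sum_i x_i w_ij (the net input)
    y = np.clip(y + 0.1 * net, 0, 1)   # step up the Harmony gradient; clipping
                                       # stands in for a saturating activation
    print(f"step {step}: H = {harmony(x, W, y):.3f}")   # H never decreases
```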