Statistical learning and brain plasticity
25th Symposium: June 1-3, 2006
Perceptual and Motor Learning—David Williams, Session Chair
Mario Svirsky, Indiana University
Learning to understand frequency-shifted, spectrally degraded speech
The human brain is a remarkable speech recognizer, even in the face of such extreme distortions as sinewave speech, clipped speech, and four-channel noise vocoders. However, the simultaneous application of two types of distortion, such as a 1 or 2 octave frequency shift combined with the spectral degradation inherent in an 8-channel noise vocoder, renders the input speech signal unintelligible. This has important clinical implications because this type of combined distortion is a reasonable model of the speech input received by cochlear implant (CI) users. Fortunately, it is possible to learn how to interpret such a signal so that it becomes more intelligible over time.
It is generally believed that CIs impose a basalward shift on the acoustic input; that is, sounds stimulate neurons with a higher characteristic frequency than the acoustic frequency of the original stimulus. This frequency misalignment may have a negative influence on speech perception by postlingually deaf CI users. However, perfect frequency-place alignment between analysis filters and stimulated electrodes may result in the loss of important low-frequency speech information. One possible compromise is a gradual approach: start with correct frequency-place alignment to let listeners adapt to the spectrally degraded signal first, and then increase the basalward shift gradually so that they can adapt to it over time. A first study compared this gradual approach to the sudden approach normally used with CI users: immediate exposure to a constant frequency shift that does not change over time. The main finding was that speech perception scores were initially much higher with the gradual approach than with the sudden approach, but the difference decreased over the course of 15 one-hour training sessions. A second study also employed spectrally degraded, frequency-shifted speech, but listeners were allowed to adjust the input filter bank in real time to a preferred setting. Interestingly, the listener-selected filter banks represented a trade-off between correct frequency-place alignment and low-frequency speech information. Taken together, these results may have significant implications for the optimal fitting of sensory aids such as cochlear implants and frequency-transposition hearing aids.
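The gradual fitting schedule described above can be sketched in a few lines; this is an illustrative sketch only, not the study's actual processor settings, and the filter-bank frequencies, session count, and one-octave total shift are all assumed values:

```python
def gradual_shift_freqs(base_freqs, total_shift_octaves, session, n_sessions):
    """Analysis-filter center frequencies for a given training session under
    the gradual approach: start frequency-place aligned (no shift) and
    interpolate linearly, in octaves, up to the full basalward shift."""
    frac = session / n_sessions  # 0.0 at the first session, 1.0 at the last
    return [f * 2 ** (total_shift_octaves * frac) for f in base_freqs]

# Hypothetical 8-channel filter bank (Hz) and a 1-octave total shift
base = [250, 400, 640, 1020, 1640, 2620, 4200, 6700]
session_first = gradual_shift_freqs(base, 1.0, 0, 15)   # aligned
session_last = gradual_shift_freqs(base, 1.0, 15, 15)   # fully shifted
```

The sudden approach corresponds to presenting the final, fully shifted mapping from the first session onward.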
Jason Gold, Indiana University
Signal and noise in perceptual learning
Performance in perceptual tasks often improves dramatically with practice ('perceptual learning'). In this talk, I will discuss recent work directed at specifying the mechanisms that mediate perceptual learning, cast within the framework of signal detection theory and black-box perceptual information processing models. Within this context, I will also discuss a collection of system-identification techniques that use externally added noise to quantify the factors that limit performance and allow one to trace how these factors change as a function of practice.
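One standard formalization behind such external-noise methods is the linear-amplifier observer model, in which signal energy at threshold grows linearly with external noise power; under this assumed model (parameter values below are invented for illustration), practice effects can be decomposed into reduced equivalent internal noise versus improved calculation efficiency:

```python
def threshold_energy(n_ext, n_eq, efficiency, d_prime=1.0):
    """Signal energy needed to reach criterion d' given external noise power
    n_ext, equivalent internal noise n_eq, and calculation efficiency."""
    return (d_prime ** 2 / efficiency) * (n_ext + n_eq)

# Two hypothetical learning outcomes, relative to a pre-practice observer:
pre = dict(n_eq=1.0, efficiency=0.2)
less_noise = dict(n_eq=0.5, efficiency=0.2)  # internal noise reduced
more_eff = dict(n_eq=1.0, efficiency=0.4)    # efficiency improved

# Reduced internal noise lowers thresholds only when external noise is low;
# improved efficiency lowers thresholds at every external noise level.
```

Measuring thresholds across a range of external noise levels before and after practice is what lets the two factors be separated.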
Reza Shadmehr, Johns Hopkins University
Internal models, adaptation, and uncertainty
When the brain generates a motor command, it also predicts the sensory consequences of that command: how it will affect the environment and the body. What is the purpose of this prediction? Here I will show that the brain integrates its predictions with the actual sensory feedback, arriving at an estimate that is better than possible from sensation alone. This Bayesian integration depends on sensorimotor maps that are models of our body and the environment. Because body and environment can change, the maps must adapt. I will show that a prediction error causes changes in multiple adaptive systems. Some are highly responsive to error but forget rapidly; others are poorly responsive to error but have high retention. This explains savings and spontaneous recovery. Fast and slow adaptive processes may have arisen because disturbances to the motor system occur on various timescales (fatigue vs. disease). When faced with error, the brain confronts a credit assignment problem: what is the timescale of the disturbance? To solve this problem, the brain likely keeps a measure of uncertainty about the timescales.
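The fast/slow scheme described above can be sketched as a two-state model in which both states learn from the same prediction error but with different retention and learning rates; the retention and learning-rate values, phase lengths, and unit perturbation below are illustrative assumptions. After adaptation and a brief counter-perturbation that drives net output back to zero, output rebounds toward the adapted state when errors are clamped, i.e., spontaneous recovery:

```python
def simulate_two_state(n_adapt=200, n_counter=5, n_clamp=50,
                       a_fast=0.59, b_fast=0.21, a_slow=0.992, b_slow=0.02):
    """Fast/slow two-state model of motor adaptation. Each state updates
    from the same error with retention rate a and learning rate b; the
    motor output is their sum."""
    x_fast = x_slow = 0.0
    phases = {"adapt": [], "counter": [], "clamp": []}

    def trial(perturbation, error_clamped=False):
        nonlocal x_fast, x_slow
        output = x_fast + x_slow
        error = 0.0 if error_clamped else perturbation - output
        x_fast = a_fast * x_fast + b_fast * error  # responsive, forgets fast
        x_slow = a_slow * x_slow + b_slow * error  # sluggish, retains well
        return output

    for _ in range(n_adapt):        # adapt to a +1 perturbation
        phases["adapt"].append(trial(+1.0))
    for _ in range(n_counter):      # brief counter-perturbation
        phases["counter"].append(trial(-1.0))
    for _ in range(n_clamp):        # errors clamped to zero
        phases["clamp"].append(trial(0.0, error_clamped=True))
    return phases

out = simulate_two_state()
```

During the clamp the fast state decays quickly toward zero while the still-adapted slow state persists, so the net output rises again even though net adaptation had just been extinguished.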
Alexandre Pouget, University of Rochester
Neural basis of perceptual learning: haven't we solved this issue already?
Extensive training on simple perceptual tasks often improves behavioral performance. The neural basis of this improvement is believed to be well understood. It is thought to involve a combination of the following changes: (1) an increase in the number of neurons representing the sensory input, (2) a sharpening of the tuning properties of the sensory neurons, and (3) an increase in the average firing rates. Modeling studies have suggested that all three of these changes could indeed increase the information content of cortical population codes, which in turn could account for the improved performance. These studies, however, all rest on the assumption that noise in the brain is independent across neurons. Unfortunately, this assumption is incorrect: neurons are correlated and, importantly, correlations are likely to change during learning. I will show that when correlations are taken into account, a mechanism such as tuning curve sharpening can either increase or decrease information depending on how it is implemented. Moreover, it is possible to increase the information content of a population code by adjusting correlations while leaving the tuning curves intact. Therefore, the neural basis of perceptual learning is still very much unclear, but the issue could be resolved by using multielectrode recordings.
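The core point, that correlations alter information with tuning curves held fixed, can be illustrated with linear Fisher information, I = f'(s)ᵀ C⁻¹ f'(s); the population size, Gaussian tuning, and limited-range correlation structure below are illustrative assumptions, not the models from the talk:

```python
import numpy as np

def linear_fisher_info(f_prime, cov):
    """Linear Fisher information f'(s)^T C^-1 f'(s) of a population code."""
    return float(f_prime @ np.linalg.solve(cov, f_prime))

# Population of Gaussian tuning curves; evaluate sensitivity at s = 0
n = 50
centers = np.linspace(-10.0, 10.0, n)
width = 2.0
rates = np.exp(-centers ** 2 / (2 * width ** 2))
f_prime = rates * centers / width ** 2  # tuning-curve derivatives at s = 0

# Case 1: independent noise (identity covariance)
info_indep = linear_fisher_info(f_prime, np.eye(n))

# Case 2: identical tuning curves, but limited-range noise correlations
# c_ij = c_max * exp(-|c_i - c_j| / L)  (illustrative parameters)
c_max, L = 0.3, 2.0
corr = c_max * np.exp(-np.abs(centers[:, None] - centers[None, :]) / L)
np.fill_diagonal(corr, 1.0)
info_corr = linear_fisher_info(f_prime, corr)
```

The tuning curves are identical in the two cases, yet the information content differs, which is why measurements of single-neuron tuning alone cannot settle what learning changed.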
Learning: Role of Priors and Attention—Robbie Jacobs, Session Chair
David Knill, University of Rochester
Learning Bayesian priors for depth perception
Pictorial cues to depth rely on prior knowledge about statistical regularities in the environment; for example, about the prevalence of symmetric objects, of parallel lines, and of lighting from above. In the first part of this talk, I will discuss evidence that the human visual system uses probabilistic characterizations of this prior knowledge that incorporate mixtures of multiple possible models of objects. This account explains a number of perceptual effects, most notably non-linear robust cue integration. In the second part of the talk, I will discuss the problem of how the visual system learns the statistical regularities needed to interpret pictorial cues to depth; in particular, how it adapts its internal model to environments with very different statistics. We specifically tested the hypothesis that the visual system can adapt its model of the statistics of planar figures for estimating 3D surface orientation. Taking elliptical figures as a prototypical case, we developed a Bayesian model that effectively learns the probability density function on shape from stereoscopic images of slanted ellipses. When the model adapts to an irregular environment, it gradually down-weights the pictorial cue to slant provided by the shapes of projected ellipses relative to stereopsis. When estimating surface slant in an environment containing randomly shaped ellipses, human subjects similarly down-weight the pictorial cue over time, but not in an environment containing mostly circles. This shows that they have adapted their internal model of the shape statistics of the environment.
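The down-weighting effect follows directly from reliability-weighted cue combination: widening the prior over figure shapes inflates the effective uncertainty of the pictorial (compression) cue, which lowers its weight relative to stereo. This is a minimal sketch of that logic, not the paper's full model, and all noise values are invented:

```python
import math

def reliability_weights(sigma_pictorial, sigma_stereo):
    """Inverse-variance weights for optimally combining two slant cues."""
    r_p = 1.0 / sigma_pictorial ** 2
    r_s = 1.0 / sigma_stereo ** 2
    return r_p / (r_p + r_s), r_s / (r_p + r_s)

def pictorial_sigma(sigma_sensory, sigma_shape_prior):
    """Effective uncertainty of the compression cue: sensory noise plus
    uncertainty about the figure's true aspect ratio."""
    return math.sqrt(sigma_sensory ** 2 + sigma_shape_prior ** 2)

sigma_stereo = 4.0  # deg of slant, assumed
# Circle-rich world: shapes are nearly isotropic, so the shape prior is tight
w_pic_circles, _ = reliability_weights(pictorial_sigma(2.0, 0.5), sigma_stereo)
# Random-ellipse world: a broad shape prior inflates pictorial uncertainty
w_pic_random, _ = reliability_weights(pictorial_sigma(2.0, 6.0), sigma_stereo)
```

The learner does not need to change its cue-combination rule at all; re-estimating the shape prior from experience is enough to shift the weights.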
Marvin Chun, Yale University
Attentional control of perceptual memory
Functional magnetic resonance imaging (fMRI) reveals not only where information is processed in the brain, but also what is encoded in a stimulus-specific manner. For example, just as infants, children, and adults habituate to events that they treat as "the same," underlying neural responses measured with fMRI are lower to repeated, familiar stimuli than to novel stimuli. This difference is known as repetition attenuation, and such attenuation signals allow researchers to study stimulus-specific perceptual and memory representations in the brain (Grill-Spector et al., 2006; Schacter & Buckner, 1998). Prior work has focused on linking this repetition attenuation signal with automatic perceptual processing and unconscious, implicit memory (Wiggs & Martin, 1998). In contrast, our work indicates that repetition attenuation is subject to attentional control and also correlated with conscious, explicit memory (Turk-Browne et al., 2006; Yi & Chun, 2005). Having established a strong association between repetition attenuation and perceptual memory, a final study will use repetition attenuation to reveal systematic distortions, that is, false memories, of scene layout information that was never seen by the observer (Park et al., submitted).
Nick Chater, University College London
Simplicity, probability and perception
Perception is a problem of abductive inference—inferring the structure of the environment from patterns of stimulation on the sensory surfaces. Abductive inference can naturally be modelled in a Bayesian framework—but this requires assigning prior probabilities to what appears to be an infinite range of perceptual hypotheses. How is this possible? This talk suggests that priors can usefully be viewed as set by the choice of representation 'language' in which perceptual input is coded. This leads to a simplicity principle in perception—that is, the perceptual system is viewed as preferring short codes for sensory input. This approach is part of a long tradition in the study of perception, dating back to Mach and the Gestalt psychologists.
This 'simplicity' perspective on perception provides a natural interpretation of the Gestalt Laws, and a range of phenomena of perceptual organization. I will also consider the question of the scope and limitations of this approach, and discuss the question of how far this framework is open to empirical test.
Josh Tenenbaum, Massachusetts Institute of Technology
Statistical learning of abstract knowledge
In accounts of cognitive development, statistical learning is generally seen as an alternative to the acquisition and use of abstract knowledge and richly structured representations. Empiricists typically embrace the former while shying away from the latter; rationalists or nativists typically take the opposite stance. Yet these two concepts—statistical inference and abstract, structured representation—are two of the most powerful tools that have been offered, in over two thousand years of trying, to explain the nature and origins of intelligence. Must we be forced to choose between them?
I will argue that theories of learning and development can and must draw on the strengths of both structure and statistics. I will present a hierarchical Bayesian framework for inductive learning, in which statistical inference operates over structured representations of knowledge at multiple levels of abstraction. These knowledge representations may be thought of as simple forms of intuitive theories for various domains of entities, properties, and relations. The hierarchical Bayesian analysis shows how abstract knowledge of domain structure provides strong constraints on a learner's inductive generalizations, and how that abstract knowledge may itself be learned through rational statistical means. I will discuss applications of the framework to modeling learning and reasoning in several domains, such as natural kind categories and their properties, social relations or causal relations. (Joint work with Charles Kemp, Vikash Mansinghka, and Tom Griffiths.)
Constraints on Pattern Learning—Michael Weliky, Session Chair
Lori Holt, Carnegie Mellon University
Auditory categorization and tuning in speech perception
The ease of everyday conversation masks the cognitive and perceptual challenges of translating from acoustic signal to meaning. One of the fundamental reasons machine recognition of speech is so difficult is that the acoustics of spoken language are incredibly complex. The acoustic details of a spoken utterance vary with the rate of speech, speaker idiosyncrasies, phonetic context, accent, and even the reverberance of the speaking environment. With these diverse sources of acoustic variability - some linguistically relevant, some not - the mechanisms that transform the acoustic signal into a linguistic representation face a complex task.
However, the acoustic speech signal and the greater perceptual environment in which it is presented also possess much regularity. I will present data demonstrating that some of the perceptual challenges of speech perception, and thereby some of the challenges of early language processing, may be met by general-purpose perceptual mechanisms sensitive to regularity and change in the perceptual environment at multiple time scales.
Daniel Margoliash, University of Chicago
Pattern perception in songbirds
Temporal sequence is a rich and essential component of information in vocalizations. As part of a larger effort to characterize nonlinear receptive field properties of higher-order auditory neurons, we have been studying the sensitivity of starlings to sequences of naturally occurring motifs. After achieving baseline performance on a go/no-go task that contrasted sets of strings drawn from finite-state and context-free grammars (CFG), birds were transferred to novel sets of strings of the same order, and probed with higher-order grammatical strings and agrammatical strings. The results indicate that birds learned a simple CFG, a level of perceptual syntactic complexity previously posited to be uniquely available to humans. Starlings performed equally well on strings of human syllables, whereas humans easily solved the problem when posed with strings of syllables but struggled with strings of motifs. These results emphasize the importance of species specificity in all animal behavior, including language, and challenge the dogma that places language outside the realm of biological experimentation. Higher-order neurons in the starling auditory system respond selectively to motifs depending on learning and reward contingencies. Although we still know little about the temporal sequence sensitivity of these neurons, such analysis is likely to provide mechanistic insight into complex pattern perception in birds, which may constrain theories of the evolution of such behaviors.
Richard Aslin, University of Rochester
Statistical learning of visual patterns: Helmholtz, Bayes, and Dr. Spock
The natural visual environment is filled with objects and surfaces whose elements and their spatial arrangement form a myriad of possible underlying structures, only a subset of which correspond to the physical properties of the 3D world. How does a naive learner exposed to such a complex set of inputs manage to extract the "right" structure in finite time?
A series of experiments using arrays of simple shapes organized into "scenes" will be reviewed. These experiments show that adult learners can rapidly and efficiently extract the underlying structure of these scenes without feedback (i.e., by mere exposure). Despite the presence of many spurious coincidences, a variety of constraints enable this process of statistical learning to succeed. These empirical data are well described by a class of Bayesian models (Sigmoid Belief Networks).
From where do the constraints that enable statistical learning originate? Infants do not have access to top-down information and must either have intrinsic biases or acquire them from visual experience. These priors serve to make a seemingly intractable learning problem remarkably efficient, even for 9-month-olds.
Toby Mintz, University of Southern California
Learning syntactic categories from patterns in linguistic input
Grammatical categories—such as noun, verb, adjective, etc.—are the building blocks of syntactic structure. A crucial question in language acquisition research is how learners initially categorize words. One theory proposes that learners attend to lexical co-occurrence patterns, noting the environments in which words occur, and categorizing words together that can occur in similar environments. For example, noting that 'cat' and 'dog' both can occur after 'the' and 'a', or before 'runs', etc., would cause a learner to categorize them together. Research has shown that the linguistic input to children is structured such that distributional information of this type is indeed informative for categorizing words (Mintz et al., 2002; Redington et al., 1998). However, many behavioral studies have failed to find evidence that human learners use this kind of information on its own (e.g., Smith, 1966). I will discuss a particular sequential pattern called a frequent frame (Mintz, 2003), and show that it provides robust, cross-linguistically available cues to categories, and requires minimal computational resources. Further, I will present behavioral research with adults (Mintz, 2002) and infants (Mintz, 2006) suggesting that human learners are, in fact, especially sensitive to this kind of pattern and use it to categorize words.
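The frequent-frame computation itself is simple enough to state in a few lines: for every three-word window A_x_B, the outer pair (A, B) is the frame and the intervening word is tentatively categorized by it. The toy corpus below is invented for illustration and is far smaller than the child-directed corpora analyzed in the cited work:

```python
from collections import defaultdict

def frequent_frames(utterances, min_frame_count=2):
    """Group words that occur inside the same A_x_B frame."""
    frames = defaultdict(list)
    for utterance in utterances:
        words = utterance.lower().split()
        for a, x, b in zip(words, words[1:], words[2:]):
            frames[(a, b)].append(x)   # the middle word joins frame (a, b)
    # Keep only frames that recur often enough to be informative
    return {f: ws for f, ws in frames.items() if len(ws) >= min_frame_count}

# Toy child-directed corpus (invented)
corpus = [
    "you want it",
    "you like it",
    "you see it",
    "the dog runs",
    "the cat runs",
]
categories = frequent_frames(corpus)
# The frame you_x_it gathers verbs; the_x_runs gathers nouns.
```

On realistic corpora the most frequent frames yield clusters that align strikingly well with adult grammatical categories, while demanding only local, fixed-size windows of memory.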
Neural Mechanisms of Learning—Lizabeth Romanski, Session Chair
Takao Hensch, Riken Institute
Distinct adult perceptual learning and critical period plasticity in visual cortex
Competitive plasticity of binocular inputs following monocular deprivation (MD) is prominent in the primary visual cortex (V1) during an early critical period. In humans or animals raised with an occlusion of one eye, the consequent loss of cortical spiking response to the deprived eye (ocular dominance plasticity) leads to lifelong amblyopia (loss of visual acuity). Genetic or pharmacological manipulation of specific GABAergic circuits directly controls the timing of this critical period and its eventual anatomical consolidation (Hensch, 2005). Molecular 'brakes' in the extracellular milieu that are non-permissive for growth may further delimit the dramatic cortical plasticity and are targets for therapeutic strategies. Recently, MD has been reported to conversely enhance open eye responses in adult mice dependent upon the integrity of visual cortex. We address to what extent these two types of plasticity induced by the same MD share common mechanisms. Our observations support the view that adult perceptual learning and classical ocular dominance plasticity are independent processes.
Anthony Zador, Cold Spring Harbor Laboratory
How many synapses must change to form a memory?
To elucidate the molecular, cellular, and circuit changes that occur in the brain during learning, we investigated the role of a glutamate receptor subtype in fear conditioning. In this form of learning, animals associate two stimuli, such as a tone and a shock. Here we report that fear conditioning drives AMPA-type glutamate receptors into the synapses of a large fraction of postsynaptic neurons in the lateral amygdala, a brain structure essential for this learning process. Furthermore, memory was reduced if AMPA receptor synaptic incorporation was blocked in as few as 10% to 20% of lateral amygdala neurons. Thus, the encoding of memories in the lateral amygdala is mediated by AMPA receptor trafficking, is widely distributed, and displays little redundancy.
Nathaniel Daw, University College London, Gatsby
Reward & exploration in human decision making
We have rather detailed, if tentative, information about how organisms learn from experience to choose better actions. But it is much less clear how they arrange to obtain this experience. The problem of sampling unfamiliar options is a classic theoretical dilemma: the costs and benefits of exploring unfamiliar options must be balanced against those of exploiting the options that appear best on current knowledge.
Using behavioral analysis and functional neuroimaging, we study how humans approach this dilemma in a free-choice decision task. We assess the fit to participants' trial-by-trial choices of different exploratory strategies from reinforcement learning, and, having validated an algorithmic account of behavior, use it to infer subjective factors such as when subjects are exploring versus exploiting. These estimates are then used to search for neural signals related to these phenomena. The results support the hypothesis that exploration is encouraged by the active override of an exploitative choice system, rather than an alternative, computationally motivated hypothesis under which a single (putatively dopaminergic) choice system integrates information about both the exploitative and exploratory values of candidate actions. Although exploration is ubiquitous, it is also difficult to study in a controlled manner: we can capture it only through the tight integration of computational, behavioral, and neural methods.
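The two candidate choice rules can be caricatured in a few lines: a purely exploitative softmax, in which exploration is just decision noise, versus a single integrated system that adds an uncertainty bonus to each option's value. This is a schematic contrast under assumed parameter values, not the actual models fit to the data:

```python
import numpy as np

def softmax(values, beta):
    """Exploitative rule: choice probabilities from estimated values only;
    any exploration arises solely from decision noise (temperature 1/beta)."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def uncertainty_bonus(values, sigmas, beta, phi):
    """Integrated rule: add an exploration bonus phi * sigma to each
    option's value, then apply the same softmax."""
    bonused = np.asarray(values, dtype=float) + phi * np.asarray(sigmas)
    return softmax(bonused, beta)

q = [1.0, 0.9]        # estimated payoffs of two bandit arms
sigma = [0.1, 0.6]    # uncertainty about each estimate
p_exploit = softmax(q, beta=5.0)
p_bonus = uncertainty_bonus(q, sigma, beta=5.0, phi=0.5)
```

Under the bonus rule the more uncertain arm can become the preferred choice even though its estimated payoff is lower, which is the behavioral signature the two hypotheses disagree about.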
Leo Sugrue, Stanford University
Choosing the greater of two goods: a combined behavioral, modeling, and physiological approach to value-based decision making
To forage successfully, animals must learn and maintain an internal representation of the value of competing options and link that representation to the neural processes responsible for decision-making and motor planning. To explore the neural substrate of valuation and action, we have modeled the proximal behavioral mechanism underlying the choices of rhesus monkeys in a simple task that requires them to forage for resources in a dynamic environment. The resulting model suggests that our monkeys have learned to maximize their foraging efficiency given the underlying statistics of reward availability in this task. Moreover, the hidden variables revealed by the model provide us with a framework with which to interpret neurophysiological and brain imaging data collected while monkeys perform the task and to isolate the specific contributions of different brain areas to value-based decision making.
Maturation and Plasticity—Daphne Bavelier, Session Chair
Daphne Maurer, McMaster University
Missed sights: consequences for visual development
Newborns can see, but it takes many years for vision to reach adult levels. We have evaluated the contribution of early visual experience to later development by comparing visually normal children to children who had been deprived of patterned visual input during the first 2-9 months after birth because they were born with dense cataracts in one or both eyes. Longitudinal studies indicate that some aspects of low-level vision normalize after treatment by improving faster than normal to make up for an initial deficit (e.g., contrast sensitivity for low spatial frequencies). For other aspects there are permanent deficits, because development asymptotes at a level below normal (e.g., contrast sensitivity for mid and high spatial frequencies after binocular deprivation) or because earlier gains are lost (after monocular deprivation). Surprisingly, plasticity for higher-level visual functions cannot be predicted accurately from the results for low-level vision. This point will be illustrated by results on global form perception, holistic face processing, and biological motion.
Brian Wandell, Stanford University
Maps and reading development in visual cortex
Visual cortex has been an excellent model system for developing a quantitative understanding of brain function. We understand a great deal about the physical signals that initiate vision, and this knowledge has led to a relatively advanced understanding of the organization of major structures in visual cortex, such as visual field maps. This talk will explain several measurements and computational methods that are used to understand human brain development and plasticity.
First, we have developed functional magnetic resonance imaging (fMRI) methods for measuring and quantifying the properties of maps in individual human and macaque brains. To understand the development and plasticity of these maps, we have made measurements in several cases of abnormal development as well as in controlled experiments using macaque.
Second, we are combining fMRI with diffusion tensor imaging (DTI), a method that can be used to study white matter fibers, to understand visual development. Specifically, as children develop and learn to read, certain visual recognition skills become highly automatized and the brain develops specialized visual circuitry to support skilled reading. We are measuring how certain parts of these circuits develop, and how the signals from these circuits are transmitted to other cortical systems.
Elissa Newport, University of Rochester
Statistical language learning: computational and maturational constraints
In collaboration with Richard Aslin, I have been developing an approach to language acquisition known as 'statistical language learning.' Our basic idea is that important parts of human language acquisition involve computing, over a stream of speech, such things as how frequently sounds co-occur; how frequently words occur in similar contexts; and the like. The learner then uses these computations to determine regular versus accidental properties of the language being acquired. Our studies have shown that adults, infants, and even nonhuman primates are capable of performing such computations online and with remarkable speed, on both speech and nonspeech materials. However, when tested on more complex computations involving non-adjacent sounds, humans show strong selectivities (they can perform certain computations, but fail at others), corresponding to the patterns which natural languages do and do not exhibit. Primates are not capable of performing some of these more difficult computations.
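The basic computation, the transitional probability between adjacent elements, can be written down directly; the syllable stream below is invented for illustration and is much shorter than the familiarization streams used in the actual experiments:

```python
from collections import Counter

def transitional_probabilities(stream):
    """TP(A -> B) = count(A followed by B) / count(A), computed over a
    continuous stream of syllables."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])  # occurrences in non-final position
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Invented stream containing two recurring "words" (bi-da-ku and pa-do-ti)
# plus filler syllables; within-word transitions recur, across-word ones
# are variable.
stream = "bi da ku pa do ti bi da ku go la bu pa do ti".split()
tp = transitional_probabilities(stream)
```

Dips in transitional probability mark candidate word boundaries, which is how such statistics can support segmentation of continuous speech without any acoustic cues to boundaries.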
In addition to this basic statistical mechanism, recent research has revealed that there are maturational changes in the ways in which various types of statistical arrays are compiled into generalizations. Given most types of linguistic input, adults will reproduce the statistics of the corpus that they hear. In contrast, young children will sharpen the statistics, often producing a dramatically more systematic and regular language than the one to which they are exposed. These sharpening processes are also an important part of statistical learning, potentially explaining not only why children acquire language (and other patterns) more effectively than adults, but also how systematic languages may emerge in communities where usages are varied and inconsistent.
Randy Gallistel, Rutgers University
Is mutual information the learning-relevant parameter of conditioning protocols?
Two independent lines of evidence indicate that the number of reinforced presentations of a warning stimulus (CS) is, in and of itself, an irrelevant parameter in basic learning protocols, a finding that is, we believe, devastating to all extant associative theories of learning. It is an irrelevant parameter because, for a fixed warning interval (CS-US interval), there is a perfect trade-off between the number of reinforced presentations required to bring the subject to the point where it makes a conditioned response to the CS and the ratio of the US-US interval to the CS-US interval. If you delete half the trials in a protocol without moving the undeleted trials in time, you double the US-US interval. This operation has no effect on the progress of conditioning when that progress is plotted as a function of the duration of exposure to the protocol: the halving of the number of reinforced trials within a given exposure duration is exactly compensated by the doubling of the ratio between the mean US-US interval and the CS-US interval. So what is the relevant parameter? Under a particularly simple-minded calculation of the mutual information between the timing of CS onset and the timing of the US (an analysis that neglects several components of the mutual information), the mutual information between the CS and the US is the relevant parameter of the protocol. Indeed, the assumption that the mutual information unique to a given CS in a given protocol determines not only whether a response will ever develop to that CS but also how long the protocol must run for that to happen appears to predict a very wide range of findings in basic conditioning. Thus, the (simplified) unique mutual information between CS and US appears to determine the strength of a protocol, and the duration of exposure required before responding begins appears to be inversely proportional to that strength.
If the rate at which predictive information is acquired is proportional to the strength of a protocol, then this may imply that responding begins when a critical quantity of predictive information has been extracted from exposure to the protocol.
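The trade-off described above can be checked with a few lines of arithmetic. This is a deliberately simple sketch: it assumes trials-to-acquisition scale inversely with the US-US / CS-US ratio, takes log2 of that ratio as the simplified per-trial information, and uses an arbitrary constant and invented interval values:

```python
import math

def info_per_trial_bits(us_us_interval, cs_us_interval):
    """Simplified mutual information (bits) that CS onset conveys about the
    timing of the next US: log2 of the interval ratio."""
    return math.log2(us_us_interval / cs_us_interval)

def trials_to_acquisition(us_us_interval, cs_us_interval, k=300.0):
    """Assumed: reinforced trials needed scale inversely with the
    US-US / CS-US ratio (k is an arbitrary constant)."""
    return k / (us_us_interval / cs_us_interval)

# Protocol A, and protocol B made by deleting every other trial of A
# (undeleted trials keep their times, so the US-US interval doubles):
n_a = trials_to_acquisition(us_us_interval=60.0, cs_us_interval=6.0)
n_b = trials_to_acquisition(us_us_interval=120.0, cs_us_interval=6.0)
# Halving the trial count is exactly offset by doubling the ratio, so the
# total exposure duration (trials x US-US interval) is unchanged.
```

Under these assumptions the two protocols require the same duration of exposure, which is the invariance the abstract describes.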