Environmental Structure, Statistical Learning, & Visual Perception
21st Symposium: June 4-6, 1998
Image Statistics—Richard Aslin, Moderator
Eero Simoncelli, New York University
Visual cortical processing and the statistics of images
We examine the statistics of natural monochromatic images decomposed using a multi-scale wavelet basis. Although the coefficients of this representation are well decorrelated, they exhibit important higher-order statistical dependencies that cannot be eliminated with purely linear processing. In particular, rectified coefficients corresponding to basis functions at neighboring spatial positions, orientations, and scales are highly correlated. The optimal method of removing these dependencies is to divide each coefficient by a weighted combination of its rectified neighbors. Several successful models of neural processing in visual cortex are based on such "divisive normalization" computations, and thus our analysis provides a theoretical justification for these models. Perhaps more importantly, the statistical measurements explicitly specify the weights that should be used in computing the normalization signal. We demonstrate that this weighting is qualitatively consistent with recent physiological measurements, implying that early visual neural processing is well matched to the statistics of images.
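The divisive normalization computation described above can be sketched numerically. The following is a minimal illustration only, not the authors' fitted model: the coefficients are random stand-ins for wavelet responses, and the uniform weight matrix `w` is a placeholder for the neighbor weights that the statistical measurements would actually specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "wavelet" coefficients for 5 neighboring basis functions
# (nearby positions/orientations/scales), 1000 samples each.
coeffs = rng.standard_normal((5, 1000))

def divisive_normalization(c, weights, sigma=1.0):
    """Divide each coefficient by a weighted combination of its
    rectified (here, squared) neighbors plus a semi-saturation constant."""
    # weights[i, j]: contribution of neighbor j to the normalizer of unit i
    norm_signal = weights @ (c ** 2) + sigma ** 2
    return c / np.sqrt(norm_signal)

# Uniform illustrative weights; in the abstract these would come from the
# measured pairwise statistics of the rectified coefficients.
w = np.full((5, 5), 0.2)
normalized = divisive_normalization(coeffs, w)
```

After normalization, the strong dependence between the amplitudes of neighboring coefficients is reduced, which is the point of the divisive step.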
Charles Chubb, University of California, Irvine
Statistically certified unsupervised learning
This talk will explore the conjecture that visual neurons have evolved cooperatively to maximize their collective power to reject the null hypothesis that their input is devoid of spatial structure, thereby evolving receptive fields that efficiently represent characteristic input structures. Under this hypothesis, the visual system adaptively refines the ensemble of V1 simple cell receptive fields (along with associated point nonlinearities) into an instrument used in a novel "structure test." The refinement of these domain-specific data structures increasingly empowers the structure test to reject the null hypothesis that new input images are devoid of spatial structure (i.e., are composed of randomly scrambled pixel intensities). Thus these data structures become increasingly sensitized to structures specific to the target image domain. Adaptive structure-detection procedures based on these ideas will be described and demonstrated in applications to several image domains.
Daniel Ruderman, The Salk Institute
From scaling to color in natural images
Scale invariance is the most universal property found in natural images. In this talk I will begin by summarizing our evidence for scaling in the power spectrum. I will then present a stricter definition in terms of higher-order statistics and show that they too scale, giving a deeper and much richer meaning to natural image scaling. In the second part I present our recent data on predicted retinal photoreceptor response statistics to natural images. We find that once an appropriate metric space for cone responses is chosen, principal components analysis provides more than just the second-order decorrelation it guarantees. This "optimal" linear transformation appears similar to the blue-yellow and red-green opponent processes found both psychophysically and physiologically. Finally, I will discuss the consequences of scale invariance for the decorrelation of spatial and chromatic dimensions.
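The PCA result can be illustrated with a toy simulation. This sketch uses synthetic cone responses, not the measured data from the talk: the three signals share a common "luminance" component, mimicking the strong correlations among cone responses to natural scenes, and the noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated L, M, S cone responses dominated by a shared luminance signal;
# the 0.1 / 0.3 noise amplitudes are illustrative assumptions.
n = 10000
luminance = rng.standard_normal(n)
lms = np.stack([
    luminance + 0.1 * rng.standard_normal(n),  # L
    luminance + 0.1 * rng.standard_normal(n),  # M
    luminance + 0.3 * rng.standard_normal(n),  # S
])

# Principal components analysis: eigenvectors of the covariance matrix,
# sorted by decreasing variance.
cov = np.cov(lms)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order].T  # rows: PC1, PC2, PC3

# PC1 weights all cones with the same sign (an achromatic axis); the
# remaining components take opposite signs across cones, qualitatively
# resembling opponent chromatic axes.
pc1 = components[0]
```

Projecting the responses onto the components decorrelates them exactly at second order; the claim in the abstract is that, in the right metric space for real cone responses, PCA delivers more than this guaranteed decorrelation.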
Color Constancy—Peter Lennie, Moderator
David Brainard, University of California, Santa Barbara
Color constancy in the nearly natural image
Color constancy refers to the stability of object color appearance despite changes in illumination. Color constancy is an important problem because it embodies an ambiguity that is at the core of all perceptual problems: multiple physical configurations can produce the same image. In color constancy, the ambiguity arises because we can trade off illuminant and surface spectra while keeping the image constant.
In this talk, I will review experiments conducted in my laboratory that measure human color constancy under nearly natural viewing conditions. The first set of experiments provides baseline measurements. These experiments show that under rich viewing conditions, color constancy is a robust and regular effect that may be studied quantitatively. The second set of experiments systematically manipulates the richness of the viewing conditions. These experiments rule out the possibility that constancy results entirely from the action of simple low-level processes such as simultaneous color contrast or chromatic adaptation to the spatial mean of the image.
Steven Shevell, University of Chicago
Inferred illumination in complex scenes: Beyond simultaneous color constancy
Color appearance depends on many aspects of vision that follow light absorption by receptors. Perception of natural scenes, which are composed of meaningful objects that form a complex mosaic of chromaticities, involves perceptual processes that are seldom studied in simpler laboratory experiments. Two such processes are considered here: (i) classification of objects according to a common illuminant and (ii) color memory.
Classification of objects by illumination is a different problem than color constancy. In color constancy, the aim is a color representation of an object that does not depend on the particular illuminating light. For example, an approach to color constancy is to represent object colors in a way that takes account of ("discounts") the inferred illumination. The classification-by-illumination problem, on the other hand, considers how the parts of a complex stimulus are separated into those regions sharing the same illuminant. Experiments show that minimal changes in the spatial configuration of ten distinct patches of light can alter the illuminant classification, and result in a surprising change in perceived brightness. Lower inferred illumination, without an actual change of light entering the eye, makes a fixed light appear dimmer. Similar results are found with color: a minimal reconfiguration of a fixed set of chromatic stimuli causes color appearance to shift in the direction of the inferred illumination.
The utility of object colors depends on color memory. Consider finding your navy-blue coat in a crowded coat closet, recognizing healthy from poorly watered grass, or observing a symptom of disease characterized by subtle changes in the color of skin or retina. How is the color that one learns encoded and stored in memory? Experiments show that (a) the remembered color of a patch is less affected by the illumination during learning when the color is learned with many different chromaticities also in view, compared to learning with a uniform background (even a background at the chromaticity of the illuminant); and (b) a color recalled after 10 minutes is less affected by the illuminant during learning than a color recalled after 10 seconds. These results suggest that colors viewed within complex scenes and held in memory for many minutes, as in natural vision, tend toward object colors more than would be expected from measurements of simultaneous color constancy or successive color matching.
Anya Hurlbert, University of Newcastle, England
Surface colour and 3D shape
Laboratory measurements of colour constancy are typically made for 2D surfaces with spatially uniform surface reflectance properties under uniform direct illumination. Colour appearance and colour constancy for simple scenes with these characteristics may largely be explained by low-level sensory mechanisms such as cone adaptation and local contrast computations.
Colour constancy of 3D objects is more difficult to quantify (but I will describe some attempts to do so). 3D objects in real scenes typically have large surface chromaticity and luminance variations, due to inherent surface reflectance variations, indirect illumination from other objects (mutual illumination), multiple light sources, and 3D surface shading. Mutual illumination, in particular, may have strong effects on surface chromaticity and luminance. Mutual illumination may also provide a cue to surface reflectance, and therefore has been proposed as a factor that may enhance colour constancy in the real world.
The Chromatic Mach Card demonstrates an intrinsic link between the perception of surface colour, 3D shape, and mutual illumination. The Chromatic Mach Card is a concave folded card with one side made of red paper and the other of white paper. The light reflected from the red side casts a pinkish glow on the white side, via mutual illumination. The perceived colour of the white side dramatically changes from white to pink when the perceived shape of the card flips from concave to convex. The effect cannot be explained in the same way as the traditional achromatic Mach Card, but instead is consistent with a Bayesian analysis of generic properties in which the visual system seeks image interpretations which are robust over varying colour and position of the light source. For a convex geometry in which the red card does not face the white one, red and pink surface colours are less accidental than red and white surface colours. For the concave geometry, the visual system appears to use its knowledge of the relationship of shape and mutual illumination to discount the reddish indirect light and to recover correctly the surface colour of the white paper.
Thus: (1) the human visual system incorporates knowledge of mutual illumination to recover surface colour; (2) perceived surface colour is not a property of an object that can be dissociated from its 3D shape; and (3) to interpret physically ambiguous images, the visual system appears to accept the least accidental set of object properties, assuming unknown variations in generic environmental variables.
Surface Perception—Mary Hayhoe, Moderator
Ted Adelson, Massachusetts Institute of Technology
Lightness, transparency, and surfaces
The luminance of a surface is determined by a combination of factors, notably the reflectance of the surface, the light striking the surface, and the nature of transparent media between the surface and the eye. The viewing conditions may be described with a single transform called "atmosphere," which indicates how reflectance is mapped to luminance. Atmosphere varies from point to point and scene to scene. Humans are quite good at "seeing through" it in order to estimate the reflectance of the underlying surface. I will describe a number of illusions and other phenomena that point to some of the heuristics humans use. These include: junction analysis, grouping, and gathering local statistics on luminance.
Laurence Maloney, New York University
Measurement and modeling of cue combination in shape, depth, and color perception
There are a number of visual cues that can, in principle, provide information about the shape, location, and material surface properties of each small surface patch in a given scene. It is well known, for example, that multiple depth cues influence human perception of the locations of objects relative to the observer. When several cues are simultaneously available for a single location, the visual system may attempt to combine them. I'll discuss three key issues relevant to the experimental analysis of visual cue combination in human vision, and review recent psychophysical and computational studies of human cue combination in light of these issues. The discussion and review are organized as the development of a model of the cue combination process termed "modified weak fusion" (MWF). This model was originally developed as a model of the depth and shape cue combination process in human vision. I'll describe how it might be extended to surface color perception by considering illuminant estimation as a cue combination process.
While the MWF model is motivated by normative considerations, it is primarily intended to guide experimental analysis of cue combination in human vision. I'll describe experimental methods that permit us to analyze cue combination in novel ways. In particular, these methods allow us to investigate the key issues mentioned above. I'll summarize recent experimental tests of the MWF framework for depth cue combination that use these methods, and describe work in progress relevant to surface color perception.
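A common normative baseline for weak-fusion-style models is linear, reliability-weighted averaging of cue estimates. The sketch below is a hypothetical illustration of that baseline only, not the full MWF model (which additionally addresses cue promotion and dynamic reweighting); the cue estimates and variances are invented numbers.

```python
import numpy as np

def combine_cues(estimates, variances):
    """Combine independent, unbiased cue estimates by inverse-variance
    weighting; returns the combined estimate and its variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    weights /= weights.sum()              # normalize to sum to 1
    combined = weights @ estimates        # weighted average
    combined_var = 1.0 / (1.0 / variances).sum()
    return combined, combined_var

# Illustrative numbers: a reliable stereo estimate of 100 cm (variance 4)
# and a less reliable texture estimate of 120 cm (variance 16).
depth, var = combine_cues([100.0, 120.0], [4.0, 16.0])
```

The combined estimate (104 cm) is pulled toward the more reliable cue, and its variance (3.2) is lower than that of either cue alone, which is the normative appeal of this rule.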
Zili Liu, NEC Research Institute
New modes of generalization in perceptual learning
The learning of many visual perceptual tasks, such as motion discrimination, has been shown to be specific to the practiced stimulus, and new stimuli must be learned from scratch. This specificity, found in so many different tasks, supports the hypothesis that perceptual learning takes place in early visual cortical areas. In contrast, using a novel paradigm in motion discrimination where learning has been shown to be specific, we found generalization. We trained subjects to discriminate the directions of moving dots, and verified that learning does not transfer from the trained direction to a new one. However, by tracking the subjects' performance across time in the new direction, we found that their rate of learning doubled. Moreover, after mastering the task with an easy stimulus, subjects generalized to a difficult stimulus in a new direction after a brief practice with the easy stimulus in that direction. This generalization required both the prior mastery and the brief practice. Thus learning in motion discrimination always generalizes to new stimuli. Learning is manifested in various forms: acceleration of learning rate, indirect transfer, or direct transfer. These results challenge existing theories of perceptual learning, and suggest a more complex picture in which learning takes place at multiple levels.
Object Perception—Robbie Jacobs, Moderator
David Knill, University of Pennsylvania
Bayes or bust: What can an 18th century preacher teach us about human visual perception?
Bayesian approaches are increasingly being applied to the computational solution of vision problems. Coincident with this development has been the growing appearance of "Bayesian" explanations of perceptual phenomena in human vision research. As is so commonly the case with "named" approaches to psychological problems (e.g. the Gibsonian approach, the computational approach, the Gestalt approach, etc.), the Bayesian approach to vision has engendered a great deal of heated, almost theological, debate. My goal in this talk is to move the discussion from the lofty heights of theology to a more pragmatic level. I will consider what the approach offers as a framework for practical research in human visual perception--particularly, its strengths and limitations as a framework for building testable theories of perceptual function. I will draw on existing psychophysical work to analyze the contributions of the Bayesian approach to human vision research and to sketch out future directions for its application. Particular emphasis will be placed on the use of ideal observers in the study of higher-level visual function and on Bayesian approaches to cue integration.
Ken Nakayama, Harvard University
Surface vs. image constraints in binocular space perception
Theories of binocular vision fall into two broad classes. Image based theories address the "correspondence" problem as the primary challenge, asking two questions: What image feature in one eye is matched to the same feature in the other, and how is this matching implemented? Scene based views argue that binocular vision should be considered more generally, asking what surfaces in the world are the most likely to have given rise to the image data.
In this talk, I outline a sample of four topics, assuming a scene based view of binocular vision. In particular I will (1) examine binocular vision in terms of the generic view principle; (2) outline Da Vinci stereopsis (based on unpaired image regions) in terms of occlusion relationships; (3) extend the treatment of unpaired points, outlining a taxonomy of surface interpretations of such unpaired image regions in terms of border ownership assignments; and (4) further question the necessity of binocular matching, specifically the polarity of edge assignments, for quantitative stereopsis.
Philip Kellman, University of California, Los Angeles
How the world gets to the mind: physical structure and perceptual process in seeing objects and surfaces
Certain stimulus relationships in space and time lead to segmentation of the optic array and connecting of separate visible regions into objects. These relationships reflect facts about both the physical world and about visual processing schemes. I will review some evidence for specific stimulus relationships in object perception, such as contour discontinuities and contour relatability, as well as edge and surface interactions, in 2D, 3D, and dynamic object completion. The focus will be on what the use of particular information sources tells us about ecology on one hand and process on the other. Finally, I will review some evidence about the developmental origins of sensitivity to these relationships. Is there any role for statistical learning in the emergence of basic object perception processes?
Neural Coding and Plasticity—Tatiana Pasternak, Moderator
William Geisler, University of Texas at Austin
Neural population performance of visual cortex for naturalistic stimuli
(with Duane G. Albrecht, Robert A. Frazor, & Alison M. Crane)
An ultimate goal of vision science is to understand the neural information processing of natural visual stimuli and to predict behavioral performance in natural visual tasks. We are attempting to make some progress toward this goal by quantitatively analyzing the responses of single neurons in primary visual cortex and by comparing the responses of populations of single neurons with behavioral performance. Our approach consists of three major steps. First, the responses of individual cortical neurons are measured for sine wave grating stimuli varied along a number of stimulus dimensions: spatial frequency, orientation, phase, contrast, temporal frequency, direction of motion. Second, the measured response means and standard deviations are fitted with a descriptive functional model (one for each neuron measured), which can be used to determine the response of the neuron to any sine wave stimulus, or to estimate the response of the neuron to any arbitrary stimulus (e.g., a complex natural scene). Third, the descriptive model for each measured neuron is combined with a decision/pooling rule to determine the performance of the neural population in some task. The performance of the neural population is then compared with behavioral performance in the same task. The validity of this approach depends upon several factors: the accuracy of the functional descriptive model, the degree of statistical independence of the single neuron responses, the form of the decision/pooling rule. After discussing these factors, I will describe results for contrast discrimination, spatial frequency discrimination and feature identification in complex natural images.
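The final decision/pooling step can be illustrated with a toy rule. The sketch below is a hypothetical simplification, not the authors' fitted model: it assumes independent, roughly Gaussian single-neuron responses with known means and standard deviations for two stimuli, and the response numbers are invented for illustration.

```python
import numpy as np

def population_dprime(means_a, sds_a, means_b, sds_b):
    """Discriminability of stimuli A and B from a neural population.

    Computes each neuron's d' from its response means and standard
    deviations, then pools across the (assumed independent) population
    by summing d' values in quadrature, as an ideal observer would."""
    means_a, sds_a = np.asarray(means_a, float), np.asarray(sds_a, float)
    means_b, sds_b = np.asarray(means_b, float), np.asarray(sds_b, float)
    d = (means_b - means_a) / np.sqrt(0.5 * (sds_a**2 + sds_b**2))
    return np.sqrt(np.sum(d**2))

# Illustrative numbers: three neurons' responses to two contrast levels.
dp = population_dprime([10, 5, 2], [3, 2, 1], [14, 6, 2], [3, 2, 1])
```

Population sensitivity computed this way exceeds that of the best single neuron, and the approach in the abstract compares such pooled predictions (under a chosen decision rule and independence assumptions) with behavioral performance.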
Nikos Logothetis, Max-Planck Institute, Germany
Object perception: Psychophysics and physiology in monkeys
The notion of the temporal lobe being involved in object recognition—as it emerged from clinical and lesion studies—has received strong support from previous electrophysiological experiments in monkeys. The results of these experiments showed that neurons in the inferior temporal cortex (IT) respond to a variety of complex two-dimensional patterns, including figures of animate objects, such as faces, hands, and body parts. To better understand the role of this area in object recognition, we set out to determine whether the configurational selectivity found for IT neurons is specific for faces or body parts, or whether it can be generated for any novel object as a result of extensive training.
Monkeys were trained to become "experts" at identifying exemplars of novel, computer-generated object classes. Critically, these objects had never been experienced by the monkeys, nor did they possess any inherent biological relevance. Nonetheless, after training, the animals learned to discriminate individual objects from a set of highly similar distractors, a task not unlike identifying a specific face or a particular bird species. Because all of the objects used in testing were composed of the same basic parts, good performance in this task most likely had to rely on holistic configurational information and on the detection of subtle shape differences.
Physiological recordings from individual neurons in IT revealed a subpopulation of cells that were activated selectively by views of these previously unfamiliar objects. Many neurons fired selectively for a small set of views of spheroidal or wire objects that the monkey had learned to recognize from all viewpoints. The cells were most active when the target was presented from one particular view, and their activity declined as the object was rotated in depth. Remarkably, these cells could not be consistently activated by any other tested object, including numerous visually similar distractors and views of the target more than about 45 deg away from the preferred view. Attempts to simplify the objects led, in most cases, to a significant reduction of the neurons' responses, a finding suggesting that some neurons in IT show the type of specificity to complex configurations that was previously described for faces and other animate objects. Testing IT neurons with stimuli that can be perceived in more than one way showed that these cells discharge for their effective stimulus only when that stimulus is perceived. Phenomenal suppression of the preferred pattern almost invariably caused a profound suppression of the cells' activity. IT thus seems to be very closely related not only to shape representation but also to the conscious perception of a visual object.
Mriganka Sur, Massachusetts Institute of Technology
Vision, neural activity, and cortical development
Our experiments address the question of whether, and how, the pattern of afferent activity during development influences the function of cortical circuits. Routing visual projections to the auditory pathway in ferrets leads to visual activation of the developing auditory cortex, causing auditory cortex to receive patterns of input activity very different from normal. Visual inputs respecify the microcircuitry within primary auditory cortex, creating orientation-selective cells and maps of visual space and orientation. They appear to alter the perceptual identity of primary auditory cortex, so that its activation is identified with visual stimuli. At the same time, several features of the cortex remain unaltered. Patterns of afferent activity thus profoundly influence cortical function, but they do not write on a blank slate.
Richard Aslin, Co-Director
Robert Jacobs, Co-Director