Our visual cortex can capture images and recognize objects in a fraction of a second, even if they are barely visible or only fragmentary. One reason for this remarkable performance is the highly efficient hierarchical layer architecture of the visual cortex. It filters the visual information, recognizes connections and completes the image using familiar patterns. The process behind this is still poorly understood in its complexity. Deep learning algorithms now exist that can match or, in some cases, exceed human performance on certain pattern recognition tasks. One disadvantage of these algorithms, however, is that it is hard to understand what they have learned, how they work, or when they make mistakes.
Thomas Pock from the Institute of Computer Graphics and Vision at Graz University of Technology (TU Graz) pursued this understanding as part of his ERC Starting Grant project HOMOVIS (High Level Prior Models for Computer Vision). He worked intensively on the question of how known modes of operation of the visual cortex can be described with mathematical models and transferred to image processing applications. Five years of research, 41 publications and one patent later, the researcher and his research group have accumulated extensive knowledge that enables new image processing algorithms for a wide variety of applications.
Suggestions from Wertheimer and Euler
Pock based his work on Max Wertheimer's Gestalt laws of perception. The main founder of Gestalt psychology used these laws to try to explain the process of human vision, in which stimuli and sensory impressions are put together to form a larger whole. "Humans can correctly recognize partial or incomplete objects on the basis of single points or subjective contours (illusory contours). The human brain automatically fills in the missing image information, for example by connecting the points via curves that are as smooth as possible," says Pock. Pock and his team described this phenomenon of shape finding for the first time using mathematical models based on Euler's elastic curves – a famous equation by the mathematician Leonhard Euler that can be used to calculate curves of minimum curvature.
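To give a feel for what "as smooth as possible" can mean in numbers, here is a minimal sketch of a discrete elastica-type energy. It assumes the commonly used form, the integral of (a + b·κ²) along the curve, where κ is the curvature. The function name `elastica_energy`, the weights `a` and `b`, and the two sample curves are illustrative assumptions, not the models developed in HOMOVIS.

```python
import numpy as np

def elastica_energy(points, a=1.0, b=1.0):
    """Discrete approximation of the integral of (a + b*kappa^2) ds
    along an open polyline given as a sequence of (x, y) points.
    The weights a and b are illustrative assumptions."""
    p = np.asarray(points, dtype=float)
    seg = np.diff(p, axis=0)                      # segment vectors
    length = np.linalg.norm(seg, axis=1)          # segment lengths
    heading = np.arctan2(seg[:, 1], seg[:, 0])    # direction of each segment
    # Turning angle at each interior vertex, wrapped to [-pi, pi)
    turn = (np.diff(heading) + np.pi) % (2 * np.pi) - np.pi
    # Arc length attributed to each interior vertex
    ds = 0.5 * (length[:-1] + length[1:])
    kappa = turn / ds                             # discrete curvature
    return a * length.sum() + b * np.sum(kappa**2 * ds)

# Two ways of connecting the points (-1, 0) and (1, 0):
t = np.linspace(np.pi, 0.0, 50)
arc = np.column_stack([np.cos(t), np.sin(t)])     # smooth half circle
zigzag = [(-1, 0), (-0.5, 0.5), (0, 0), (0.5, 0.5), (1, 0)]

print("arc:   ", elastica_energy(arc))
print("zigzag:", elastica_energy(zigzag))
```

Although the zigzag is the shorter path, its sharp corners are penalized heavily by the curvature term, so the smooth arc ends up with the lower energy under these weights. Note that this discrete value depends on how finely a curve is sampled, so it should be read as an illustration only.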
Representation in a higher dimensional space
Based on Euler's elastic curves, Pock's group developed new algorithms to solve certain curvature-dependent image processing problems. These problems turn out to be much easier to solve if the (2D) images and their features are represented as data points in three-dimensional space. "In the third dimension, we get an additional variable with the orientation of the object edges," Pock explains. This, too, is modelled on human vision and goes back to the pioneering work of two Nobel laureates, David Hubel and Torsten Wiesel, who established in 1959 that the visual cortex is composed of orientation-sensitive layers.
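The idea of adding edge orientation as a third coordinate can be sketched in a few lines. The example below is a deliberately simple stand-in, assuming that orientation is estimated from the image gradient and sorted into a fixed number of bins. The function name `lift_to_orientation_space` and the parameter `n_orient` are hypothetical, and the lifting actually used by Pock's group is not described in this article.

```python
import numpy as np

def lift_to_orientation_space(image, n_orient=8):
    """Lift a 2D image to an (H, W, n_orient) volume.
    Each pixel's gradient magnitude is placed in the channel that
    matches the local edge orientation (an angle in [0, pi))."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)              # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    # An edge runs perpendicular to the gradient; fold the angle to [0, pi)
    theta = (np.arctan2(gy, gx) + np.pi / 2) % np.pi
    bins = np.floor(theta / np.pi * n_orient).astype(int) % n_orient
    lifted = np.zeros(img.shape + (n_orient,))
    rows, cols = np.indices(img.shape)
    lifted[rows, cols, bins] = magnitude   # one non-zero channel per pixel
    return lifted

# A vertical step edge: left half dark, right half bright
image = np.zeros((8, 8))
image[:, 4:] = 1.0
volume = lift_to_orientation_space(image)
print(volume.shape)                        # (height, width, orientation channels)
print(volume.sum(axis=(0, 1)))             # edge strength per orientation channel
```

In the lifted volume, edges that cross at the same pixel but run in different directions land in different channels, which is one intuition for why the extra dimension can make curvature easier to handle.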
From a mathematical and computer science point of view, the biggest advantage of this three-dimensional embedding is that image processing problems can be solved using convex optimization algorithms. In mathematical optimization, the boundary between convex and non-convex optimization is considered the great barrier that separates problems that can be solved reliably from those that cannot: in a convex problem, every local optimum is also a global one. "Thus, we are guaranteed to be able to calculate the best image for all the given input images – of course, only with respect to the mathematical model used," says Pock.
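What this guarantee means in practice can be shown with a toy problem. The sketch below assumes a simple convex model for smoothing a noisy 1D signal, minimizing 0.5·‖u − f‖² + 0.5·λ·‖Du‖² with D the forward-difference operator, and solves it by plain gradient descent. The model, the function name `denoise_quadratic` and all parameter values are illustrative assumptions and not the curvature-based models from the project.

```python
import numpy as np

def denoise_quadratic(f, lam=2.0, u0=None, n_iter=2000):
    """Minimize 0.5*||u - f||^2 + 0.5*lam*||D u||^2 by gradient descent,
    where D takes forward differences. The objective is strongly convex,
    so it has exactly one minimizer."""
    f = np.asarray(f, dtype=float)
    u = np.zeros_like(f) if u0 is None else np.asarray(u0, dtype=float).copy()
    step = 1.0 / (1.0 + 4.0 * lam)     # safe step: gradient Lipschitz constant <= 1 + 4*lam
    for _ in range(n_iter):
        du = np.diff(u)                # D u
        dtd = np.zeros_like(u)         # accumulates D^T D u
        dtd[:-1] -= du
        dtd[1:] += du
        u -= step * ((u - f) + lam * dtd)
    return u

rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 3, 40)) + 0.1 * rng.standard_normal(40)

from_zeros = denoise_quadratic(noisy)
from_random = denoise_quadratic(noisy, u0=100 * rng.standard_normal(40))
print(np.max(np.abs(from_zeros - from_random)))   # gap between the two results
```

Both runs arrive at the same result even though one starts from an arbitrary, badly scaled guess. For a non-convex model, the outcome could instead depend on the starting point.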
Now, Pock and his team are working on improved models that combine the known structural properties of the visual cortex with deep-learning algorithms. The goal is to develop models that perform as well as current deep-learning algorithms, but also allow a deeper understanding of the structures learned. Initial successes have already been achieved in the reconstruction of computed tomography and magnetic resonance images. "With the newly developed algorithms, it is now possible to reconstruct images with the highest quality despite less data being recorded. This saves time and computing power, and thus also costs," explains Pock.
The ERC research project HOMOVIS was funded by the European Research Council with a total of around 1.4 million euros. This research topic is anchored in the Field of Expertise “Information, Communication & Computing”, one of the five strategic research foci at TU Graz.
Univ.-Prof. Dipl.-Ing. Dr.techn.
TU Graz | Institute of Computer Graphics and Vision
Phone: +43 316 873 5056