Neural-Symbolic Integration for Semantic Computer Vision

Neural networks have been part of the AI field since the late 1940s, but their popularity has waxed and waned over the decades. In recent years, multilayer hierarchical neural nets (better known as deep neural nets) have become extraordinarily popular due to their successes in analyzing various sorts of data, especially visual and auditory data.

A few AI researchers believe this particular tool can be refined into a universal practical AI solution and even into an architecture for artificial general intelligence. However, most AI practitioners realize that different courses require different horses. Deep neural nets are the best solution for some problems, but other problems (in particular those requiring transparent, symbolic reasoning) call for other AI techniques.

Symbolic AI approaches, such as logic engines and program-learning systems (under development since the 1960s and 1980s, respectively), have historically demonstrated different strengths than neural networks. They have been better at generalization and abstraction, at planning processes (whether in the physical world or in the domains of discourse and science), and at formulating novel high-level hypotheses.

For example, although computer vision tasks can in principle be formulated as tasks of logical reasoning starting at the pixel level, such reasoning would be hopelessly inefficient. Neural nets shine when applied to computer vision tasks. By contrast, it is hard to imagine neural networks alone serving as automated theorem provers.

As we aim for a more flexible, broader intelligence, the need for both symbolic and neural components becomes clearer. Ultimately, the development of artificial general intelligence will most likely require a hybrid approach; indeed, almost no cognitive architectures today are purely symbolic or purely emergent (subsymbolic, neural). Most have elements of both, although the symbolic/subsymbolic gap is far from being fully bridged.

The field of “neural-symbolic AI” explores methodologies for combining neural-network and symbolic approaches into unified AI systems that manifest the strengths of both. Recent mathematical advances in AI theory, using tools such as algorithmic information theory and probabilistic programming, provide a coherent conceptual and formal framework in which to pursue this integration. Osiris AI scientist Dr. Alexey Potapov has carried out a significant body of both theoretical and practical research in this direction, much of it produced by his lab at ITMO University in St. Petersburg before he joined Osiris in 2018.

The necessity for deep neuro-symbolic integration can be seen in the example of the image-understanding (or semantic-vision) problem. On the one hand, vision cannot be considered as a peripheral module that merely forms an input to the symbolic AI system. On the other hand, even in the vision domain, which is most favorable for deep learning, purely neural systems are insufficient to capture compositional structure and to perform reasoning (especially if transparent, interpretable results are desirable).

It should also be noted that even image classification systems can benefit from external knowledge graphs. Consider the problem of learning visual concepts and their relations: solving it well might require both integrating neural networks with symbolic models and modifying traditional neural-network formalisms. Tasks such as visual question answering (e.g., asking an AI, “What is the cat in the photo wearing?”) require more top-down compositional reasoning integrated into the bottom-up image processing.
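To make the idea of compositional reasoning over neural perception concrete, here is a minimal sketch of how a visual question-answering system might execute a symbolic program over a scene graph. The scene graph below stands in for the output of a neural perception module, and the object names, attributes, relations, and helper functions are all hypothetical illustrations, not a real dataset or API:

```python
# A toy scene graph standing in for the output of a neural
# perception module (all names here are hypothetical examples).
scene_graph = {
    "objects": {
        "o1": {"category": "cat", "attributes": ["striped"]},
        "o2": {"category": "collar", "attributes": ["red"]},
        "o3": {"category": "sofa", "attributes": ["green"]},
    },
    "relations": [
        ("o1", "wearing", "o2"),  # the cat is wearing the collar
        ("o1", "on", "o3"),       # the cat is on the sofa
    ],
}

def filter_category(graph, category):
    """Symbolic step: select ids of objects with a given category."""
    return [oid for oid, obj in graph["objects"].items()
            if obj["category"] == category]

def relate(graph, subject_ids, relation):
    """Symbolic step: follow a relation edge from the given subjects."""
    return [obj for subj, rel, obj in graph["relations"]
            if subj in subject_ids and rel == relation]

def query_category(graph, object_ids):
    """Symbolic step: read off the category of the selected objects."""
    return [graph["objects"][oid]["category"] for oid in object_ids]

# "What is the cat in the photo wearing?" compiled by hand into a
# three-step program (a real system would use a learned semantic
# parser to produce this program from the question text):
cats = filter_category(scene_graph, "cat")       # -> ["o1"]
worn = relate(scene_graph, cats, "wearing")      # -> ["o2"]
answer = query_category(scene_graph, worn)       # -> ["collar"]
print(answer)
```

The point of the sketch is the division of labor: the neural side produces a structured, uncertain description of the image, while the symbolic side composes small, interpretable operations into an answer, keeping each reasoning step transparent and inspectable.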