Beyond Input-Output Reasoning: Four Key Properties of Cognitive AI

Published August 16, 2022

Image credit: James Thew via Adobe Stock.

Gadi Singer is Vice President and Director of Emergent AI Research at Intel Labs, leading the development of the third wave of AI capabilities.

The Necessity of World Model, Theory of Mind, Continual Learning, and Late-Binding Context

AI research can be a humbling experience — some even claim that it remains at a relative standstill when it comes to replicating the most fundamental aspects of human intelligence. It can correct spelling, make financial investments, and even compose music; but it cannot alert someone to the fact that they have their t-shirt on inside out without being explicitly “taught” to do so. More to the point, it cannot discern why this might be useful information to have. And yet a five-year-old, just beginning to dress herself, will notice and point out the white tag at the base of her father’s neck.

Most researchers in the AI field are well aware that, amid the numerous advances in deep neural networks, the gap between human intelligence and AI has not significantly narrowed, and solutions allowing for computationally efficient adaptive reasoning and decision-making in complex real-life environments remain out of reach. Cognitive AI (intelligence that would allow machines to comprehend, learn, and perform intellectual tasks the way humans do) remains as elusive as ever. In this blog post, we will explore why this chasm exists and where AI research must go if we are to have any hope of crossing it.

Why ‘Mr. AI’ Can’t Hold a Job

How great would it be to work side by side with an AI assistant that could shadow us throughout the day, picking up our slack? Imagine how wonderful it would be if algorithms could truly unburden us humans from the “drudge” tasks of our workday so we could focus on the more strategic and creative aspects of our jobs. The problem is that, with a partial exception for systems like GitHub Copilot, a fictional ‘Mr. AI’ based on the current state of the art would likely receive its pink slip before the workday’s end.

For starters, Mr. AI is painfully forgetful, particularly when it comes to contextual memory. It also suffers from a crippling lack of attention. To some, this may seem surprising given the extraordinary large language models (LLMs) of today, including LaMDA and GPT-3, which in some exchanges can appear as though they might well be conscious. However, even with the most advanced state-of-the-art deep learning models, Mr. AI’s work performance will invariably fall short of expectations. It doesn’t adapt well to changing environments and demands. It cannot independently ascertain that the advice it provides is epistemologically and factually sound. It cannot even come up with a simple plan! And no matter how carefully engineered its social skills are, it’s bound to stumble in a highly dynamic world with complex social and ethical norms. It simply doesn’t have what it takes to thrive in a human world.

But what is that?

Four Key Properties of Advanced Intelligence

To impart more human-like intelligence to machines, one must first explore what makes human intelligence distinct from many current (circa 2022) neural networks typically used in AI applications. One way of drawing such a distinction is through the following four properties:

1. World Model

Humans naturally develop a “world model” that allows them to envision an endless number of short- and long-term “what if” scenarios that inform their decision-making and actions. AI models could become vastly more efficient with a similar capability, one that would allow them to simulate potential scenarios from beginning to end in a resource-efficient manner. Consider an intelligent mechanism that needs to model a complex environment with multiple interacting individual agents. An input-to-output mapping function (such as can be achieved with a complex feed-forward network) would need to “unroll” all the potential paths and interactions. In a real-life environment, such an unrolled input-to-output model would quickly explode in complexity, especially when it must track an arbitrary duration of relevant history for each of the agents. In contrast, an intelligent mechanism that can model each of the factors and agents independently in a simulation environment can evaluate numerous what-if future scenarios and grow the model by replicating copies of the actors, each with its own relevant history and behavior.

Key to acquiring a world model with such capacity for simulation is the decoupling of the construction of the building blocks of the world model (an epistemic reasoning process) from their subsequent use in simulating possible outcomes. The resulting “what-if” scenarios could then be compared in a way that remains consistent even if the simulation methodology changes over time. A special-case example of such an approach can be found in Digital Twins, where the machine is equipped (through self-learning or explicit design) with a model of its environment and can simulate the potential futures of various interactions.
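To make the decoupling concrete, here is a minimal Python sketch of a world model that represents each agent independently and evaluates what-if futures by rolling cloned copies of the world forward. All names (`Agent`, `WorldModel`, the scoring rule) are illustrative assumptions, not a description of any production system:

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Each actor is modeled independently, with its own relevant history."""
    name: str
    position: float
    velocity: float
    history: list = field(default_factory=list)

    def step(self, dt: float) -> None:
        self.history.append(self.position)   # track this agent's own history
        self.position += self.velocity * dt

@dataclass
class WorldModel:
    agents: list

    def simulate(self, action: float, steps: int, dt: float = 1.0) -> float:
        """Evaluate one what-if scenario on a cloned copy of the world."""
        world = copy.deepcopy(self)           # the base model is never mutated
        world.agents[0].velocity += action    # hypothetical intervention on "our" agent
        for _ in range(steps):
            for agent in world.agents:
                agent.step(dt)
        us, *others = world.agents
        # Toy outcome score: the margin kept from the other actors.
        return min(abs(us.position - o.position) for o in others)

world = WorldModel([Agent("self", 0.0, 1.0), Agent("other", 10.0, -1.0)])
# Compare several candidate actions by simulating each what-if future.
best_action = max([-0.5, 0.0, 0.5], key=lambda a: world.simulate(a, steps=5))
print("preferred action:", best_action)
```

Because building the model (the agents and their dynamics) is decoupled from using it (cloning and rolling forward), the scores of competing scenarios remain comparable even if the rollout machinery is later replaced.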

Figure 1. Digital Twin technology creates multifaceted models of complex actors in an interactive setting. Image credit: chesky via Adobe Stock.

Intelligent beings and machines use models of the world (‘World Views’) to make sense of observations and assess potential futures in order to select the best course of action. In transitioning from a ‘generic’ large-scale setting (like replying to web queries) to direct interaction in a particular setting with multiple actors, the world model must be efficiently scalable and customizable. A decoupled, modular, customizable approach is a logically different and far less complex architecture than one that attempts to simulate and reason in a single “input-output function” step.

2. Theory of Mind

“Theory of mind” refers to a complex mental skill that has been defined by cognitive science as a capacity to understand and predict the actions and beliefs of another person by tracking that person’s attention and ascribing a mental state to them.

In the simplest of terms, it is what we do when we attempt to read another person’s mind. We develop and use this skill throughout our lives to help us navigate our social interactions. It’s why we don’t remind our dieting co-worker of the giant plate of fresh chocolate chip cookies in the breakroom.

We see traces of the theory of mind being used in AI applications such as chatbots that are uniquely attuned to the emotions of the customer they are serving, based on the reason they opened the chat, the language they are using, etc. However, performance metrics used to train such social chatbots — typically defined as conversation-turns per session, or CPS — merely train the model to maximize the attention from the human, and do not force the system to develop an explicit model of the human’s mind as measured by reasoning and planning tasks.

Figure 2. An AI system with Theory of Mind capability can modify its output based on end-user needs and preferences. Image credit: Intel Labs.

In a system that needs to interact with a particular set of individuals, theory of mind would require a more structured representation that is amenable to logical reasoning operations such as those employed in deductive and inductive reasoning, planning, inference of intent, and so on. Moreover, such a model would have to keep track of varied behavioral profiles, be predictably updateable as new information arrives, and avoid relapsing into a previous model state.
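As a hedged illustration of what such a structured, updateable representation might look like, here is a small Python sketch. The `MentalModel` class and its fields are hypothetical simplifications; a real system would use far richer logical forms:

```python
from dataclasses import dataclass, field

@dataclass
class MentalModel:
    """A structured stand-in for one person's inferred mental state."""
    beliefs: dict = field(default_factory=dict)   # proposition -> confidence
    goals: set = field(default_factory=set)
    revision: int = 0  # monotonic counter: updates never relapse to an earlier state

    def observe(self, proposition: str, confidence: float) -> None:
        """Predictably fold new evidence into the model."""
        self.beliefs[proposition] = confidence
        self.revision += 1

    def is_relevant(self, fact: str) -> bool:
        """Crude intent inference: a fact matters if it touches a known goal."""
        return any(goal in fact for goal in self.goals)

# The dieting co-worker example from above, in miniature:
coworker = MentalModel()
coworker.goals.add("diet")
coworker.observe("is avoiding sweets", 0.9)
print(coworker.is_relevant("cookies in the breakroom threaten the diet"))  # True
```

Even this toy version has the properties the paragraph calls for: per-person profiles, predictable updates, and a representation that reasoning and planning procedures can inspect.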

3. Continual Learning

With some exceptions, the standard machine learning paradigm of today is batched, offline learning, potentially followed by fine-tuning to the specific task. The resulting models are therefore unable to extract useful long-term updates from the information they are being exposed to while deployed. Humans have no such limitation. They learn continuously and use this information to build cognitive models like world views and theories of mind. In fact, continual learning is what enables humans to maintain and update their mental models.

The problem of continual learning (also called lifelong or continuous learning) is now garnering much stronger interest in the AI research community, in part due to the practical demands brought by the advent of technologies like federated learning and workflows like medical data processing. For an AI system that employs a world model of the environment it operates in, with theories of mind associated with the various agents in that environment, a continual learning capability would be critical for maintaining and updating the historical and current state descriptors of each object and agent.

While the industry’s need is very clear, much remains to be done. Specifically, solutions that enable continual learning of information that can then be used for reasoning or planning are still nascent — and such solutions would be required to enable the model-building capabilities above.
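A minimal sketch of one common ingredient of continual learning, experience replay, is shown below. It assumes scikit-learn's `SGDClassifier` and its `partial_fit` method; this is an illustration of the idea, not a statement about how any deployed system works:

```python
import random
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="log_loss")  # supports incremental updates via partial_fit
replay_buffer: list = []
BUFFER_SIZE = 1000
CLASSES = [0, 1]  # all classes must be declared before the first partial_fit

def learn_from_stream(x: list, y: int) -> None:
    """Update the deployed model on each new example as it arrives."""
    # Rehearse a few stored examples alongside the new one to reduce
    # catastrophic forgetting of what was learned earlier.
    batch = random.sample(replay_buffer, min(8, len(replay_buffer))) + [(x, y)]
    xs, ys = zip(*batch)
    model.partial_fit(list(xs), list(ys), classes=CLASSES)
    # A bounded buffer keeps a sample of the history seen so far.
    replay_buffer.append((x, y))
    if len(replay_buffer) > BUFFER_SIZE:
        replay_buffer.pop(random.randrange(len(replay_buffer)))

# Example: a stream of 2-D feature vectors observed during deployment.
for features, label in [([0.1, 1.2], 0), ([0.9, 0.3], 1), ([0.2, 1.1], 0)]:
    learn_from_stream(features, label)
```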

4. Late-Binding Context

Late-binding context refers to producing contextually specific (rather than generic) responses that utilize the latest relevant information available at the time of a query or decision. Contextual awareness embodies all the subtle nuances of human learning — it is the ‘who’, ‘why’, ‘when’ and ‘what’ that inform human decisions and behavior. It keeps humans from resorting to reasoning shortcuts and jumping to imprecise, generalized conclusions. Rather, contextual awareness allows us to build an adaptive set of behaviors tailored to the specific state of the environment that needs to be addressed. Without this capability, our decision-making would be greatly impaired. Late-binding context is also closely intertwined with continual learning. For more information on late-binding context, please see the previous blog, Advancing Machine Intelligence: Why Context Is Everything.
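As a rough illustration of the idea, the sketch below separates generic knowledge, fixed ahead of time, from situational context that is bound only at the moment of the query. `fetch_latest_context` is a hypothetical placeholder for live sources such as sensors or user state:

```python
import datetime

# Generic knowledge: fixed well before any query arrives.
GENERIC_KNOWLEDGE = {
    "umbrella": "Umbrellas keep you dry in the rain.",
}

def fetch_latest_context(user: str) -> dict:
    """Hypothetical stand-in for querying live, per-user, per-moment state."""
    return {"user": user, "weather": "rain", "time": datetime.datetime.now()}

def answer(query: str, user: str) -> str:
    context = fetch_latest_context(user)  # bound late, at decision time
    generic = GENERIC_KNOWLEDGE.get(query, "I don't know.")
    # The same generic fact yields a different, situation-specific response.
    if query == "umbrella" and context["weather"] == "rain":
        return generic + " It is raining right now, so take one."
    return generic

print(answer("umbrella", "alice"))
```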

Human Cognition as a Roadmap to the Future of AI

Without the key cognitive capabilities listed above, many of the critical needs of human industry and society will not be met. There is therefore a fairly urgent need to apportion more research to translating human cognitive capabilities into AI functionality — in particular, those properties that distinguish human intelligence from current AI models. The four properties listed above are a starting point. They lie at the center of a complex web of human cognitive function and provide a path towards computationally efficient adaptive models that can be deployed in real-life multi-actor environments. As AI proliferates from centralized, homogenized large models to a multitude of uses integrated within socially complex settings, this next set of properties will need to emerge.

References

Mitchell, M. (2021). Why AI is harder than we think. arXiv preprint arXiv:2104.12871. https://arxiv.org/abs/2104.12871
Marcus, G. (2022, July 19). Deep Learning Is Hitting a Wall. Nautilus | Science Connected. https://nautil.us/deep-learning-is-hitting-a-wall-14467/
Singer, G. (2022, January 7). The Rise of Cognitive AI — Towards Data Science. Medium. https://towardsdatascience.com/the-rise-of-cognitive-ai-a29d2b724ccc
Ziegler, A., Kalliamvakou, E., Li, X. A., Rice, A., Rifkin, D., Simister, S., … & Aftandilian, E. (2022, June). Productivity assessment of neural code completion. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming (pp. 21–29). https://dl.acm.org/doi/pdf/10.1145/3520312.3534864
Malone, T. W., Rus, D., & Laubacher, R. (2020). Artificial intelligence and the future of work. A report prepared by MIT Task Force on the work of the future, Research Brief, 17, 1–39. https://workofthefuture.mit.edu/research-post/artificial-intelligence-and-the-future-of-work/
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H. T., … & Le, Q. (2022). LaMDA: Language models for dialog applications. arXiv preprint arXiv:2201.08239. https://arxiv.org/pdf/2201.08239.pdf
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877–1901. https://arxiv.org/abs/2005.14165
Curtis, B., & Savulescu, J. (2022, June 15). Is Google’s LaMDA conscious? A philosopher’s view. The Conversation. https://theconversation.com/is-googles-lamda-conscious-a-philosophers-view-184987
Dickson, B. (2022, July 24). Large language models can’t plan, even if they write fancy essays. TechTalks. https://bdtechtalks.com/2022/07/25/large-language-models-cant-plan/
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., & Galstyan, A. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), 1–35. https://dl.acm.org/doi/abs/10.1145/3457607
LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence Version 0.9. 2, 2022–06–27. https://openreview.net/pdf?id=BZ5a1r-kVsf
El Saddik, A. (2018). Digital twins: The convergence of multimedia technologies. IEEE multimedia, 25(2), 87–92. https://ieeexplore.ieee.org/abstract/document/8424832
Frith, C., & Frith, U. (2005). Theory of mind. Current biology, 15(17), R644-R645. https://www.cell.com/current-biology/pdf/S0960-9822(05)00960-7.pdf
Apperly, I. A., & Butterfill, S. A. (2009). Do humans have two systems to track beliefs and belief-like states?. Psychological review, 116(4), 953. https://psycnet.apa.org/doiLanding?doi=10.1037%2Fa0016923
Baron-Cohen, S. (1991). Precursors to a theory of mind: Understanding attention in others. Natural theories of mind: Evolution, development and simulation of everyday mindreading, 1, 233–251.
Wikipedia contributors. (2022, August 14). Theory of mind. Wikipedia. https://en.wikipedia.org/wiki/Theory_of_mind
Shum, H. Y., He, X. D., & Li, D. (2018). From Eliza to XiaoIce: challenges and opportunities with social chatbots. Frontiers of Information Technology & Electronic Engineering, 19(1), 10–26. https://link.springer.com/article/10.1631/FITEE.1700826
Zhou, L., Gao, J., Li, D., & Shum, H. Y. (2020). The design and implementation of XiaoIce, an empathetic social chatbot. Computational Linguistics, 46(1), 53–93. https://direct.mit.edu/coli/article/46/1/53/93380/The-Design-and-Implementation-of-XiaoIce-an
Reina, G. A., Gruzdev, A., Foley, P., Perepelkina, O., Sharma, M., Davidyuk, I., … & Bakas, S. (2021). OpenFL: An open-source framework for Federated Learning. arXiv preprint arXiv:2105.06413. https://arxiv.org/abs/2105.06413
Vokinger, K. N., Feuerriegel, S., & Kesselheim, A. S. (2021). Continual learning in medical devices: FDA’s action plan and beyond. The Lancet Digital Health, 3(6), e337-e338. https://www.thelancet.com/journals/landig/article/PIIS2589-7500(21)00076-5/fulltext
Singer, G. (2022b, May 14). Advancing Machine Intelligence: Why Context Is Everything. Medium. https://towardsdatascience.com/advancing-machine-intelligence-why-context-is-everything-4bde90fb2d79


VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

We developed VL-InterpreT, an interactive tool that provides novel visualizations and analysis for interpreting the attentions and hidden representations in multimodal transformers. Our paper on VL-InterpreT won the Best Demo Award at CVPR 2022.


A Guide to AI Partner and Accelerator Programs for Startups

What to look for in AI partner and accelerator programs for startups, and how to collaborate with Intel on AI.


Enabling AI Developers on Their Journey to Scale with Intel AI

Overview of and link to Ramtin Davanlou’s (Accenture*) presentation at the oneAPI DevSummit for AI.


Multimodality: A New Frontier in Cognitive AI

In this blog, we will introduce the concept of multimodal learning along with some of its main use cases, and discuss the progress made at Intel Labs towards creating robust multimodal reasoning systems.


Accelerate Deep Learning with Intel® Optimization for TensorFlow*

Intel and Google* have been collaborating to deliver optimized implementations of some of the most compute-intensive TensorFlow operations.


Conceptualization as a Basis for Cognition — Human and Machine

To pursue this path to better AI, it is essential to understand what “understanding” really means for the human brain.


Best Practice for Text-Classification with Distillation Part (4/4)

In this post, I present the Tango architecture, a simple cascaded student-teacher model, and exploit the simplicity of task instances to gain maximum throughput for text classification.


No One Rung to Rule Them All: Addressing Scale and Expediency in Knowledge-Based AI

Published October 26, 2021 

Image adapted from Elentaris Photo / Shutterstock.com

Gadi Singer is Vice President and Director of Emergent AI Research at Intel Labs, leading the development of the third wave of AI capabilities.

Cognitive AI, hierarchy of knowledge, and the answer to the unsustainable growth of all-inclusive deep-learning models

Can we drive effectiveness and efficiency of AI at the same time?

If we want our systems to be more intelligent, do they have to become more expensive? Our goal should be to significantly increase the capabilities and improve the results of AI technologies while minimizing power and system cost, not increasing them.

Achieving this could be possible if we follow the architectural design observed time and again in natural control systems: a hierarchy of specialized levels. This article challenges the current large language model (LLM) approach of a single neural network that attempts to encompass all world knowledge. I will posit that operational efficiency within a heterogeneous environment spanning multiple orders of magnitude requires a tiered architecture.

Generative Pre-trained Transformer 3 (GPT-3) can correctly answer many factual questions. For example, it knows the location of the 1992 Olympics; it can also name the daughters of the 44th president, give the code for the San Francisco airport, or state the distance between the earth and the sun. However, as exciting as this might sound, as a long-term trend such memorization of endless factoids is as much a ‘bug’ as it is a ‘feature.’

Figure 1: GPT-3 is pretty good at answering trivia questions. (image credit: Intel Labs)

The exponential growth of neural network-based models has been a hot discussion topic over the past couple of years, and many researchers are warning about their computational unsustainability. Variants of Figure 2 are now a common occurrence in the media, and it is becoming clear that it is only a matter of time before the AI industry is forced to seek a different approach to scaling the capabilities of its solutions. This growth trend will be further driven by the emergence of multimodal applications, where the model has to capture both linguistic and visual information.

Figure 2: The exponential growth of neural network models cannot be sustained.

A key reason for this exponential growth in computational cost is that additional capabilities are being achieved through heavy overparameterization and vast training data sets. According to recent research done at MIT, this approach requires k² more data and k⁴ more computational cost to achieve a k-fold improvement.
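To make that scaling law concrete (taking the MIT result at face value), consider a tenfold capability improvement: with data ∝ k² and compute ∝ k⁴, k = 10 requires 10² = 100× the training data and 10⁴ = 10,000× the computational cost.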

A large modern neural network has a consolidated way of storing information — its parametric memory. While this type of storage can be vast and the algorithms of its encoding very sophisticated, maintaining a uniform speed of access to an exploding number of parameters is driving up the computational cost. Such a monolithic storage style is a highly inefficient approach at very large scale and is not something commonly seen in the architecture of computer hardware, in search algorithms, or even in nature.

The key insight is that single-tier systems will be fundamentally disadvantaged due to their inherent trade-off between scale and expediency. The larger the scale of items (such as chunks of information) that the system has to deal with, the slower the access to those items at inference time, assuming that computational capacity is costly or constrained. On the other hand, if the system architects choose to prioritize speed of access, the cost of retrieval can be very high. For example, GPT-3 costs 6 cents per 1000 tokens (roughly 750 words) of queries. Aside from inference time, encoding vast amounts of information into parametric memory is also very resource intensive. GPT-3 reportedly cost at least $4.6 million to train. The scale of world information relevant to AI systems is huge when considering the combination of language, visual and other knowledge sources. Even the most impressive SOTA models would pale in comparison with what could be achieved if all of that information became accessible for processing and retrieval by AI.

The way to address the contention between cost-effective expediency and ensuring access to all the needed information is to establish a tiered access structure. Important and frequently required information should be the most accessible, while less frequently used information should be stored in progressively slower repositories.

The architecture of modern computers is similar. CPU registers and caches deliver information within a nanosecond or less but have the least storage space (on the scale of megabytes). Meanwhile, larger constructs like shared platform storage retrieve information orders of magnitude more slowly (~100 ms), but are also proportionally larger in capacity (petabyte scale) and cheaper in terms of cost per bit.

Figure 3: In computer systems, high-speed modules are also the most expensive and have the lowest capacity.

Similarly, nature also shows abundant examples of tiered architectures. One example is how human bodies access energy. There is a readily available supply of glucose circulating in the bloodstream, which can instantly convert to ATP. Glycogen stored in the liver, as well as fatty and muscle tissue, offer somewhat slower access. If the internal stores are depleted, the body can use an external energy source (food) to replenish them. Human bodies do not store all energy to be instantaneously accessible, since doing so would be prohibitively biologically expensive.

Figure 4: Efficient systems in nature are tiered.

If there is anything to learn from these examples, it is that there is no one rung to rule them all — systems that operate across multiple orders of magnitude of scale and scope will need to be stratified to be feasible. As this principle is applied to information in AI, the trade-off between expediency and scale can be addressed by creating a stratified architecture for our knowledge systems. This prioritization of information access is the design constraint that systems like large language models violate by claiming to deliver scope and expediency uniformly.

As previously described in the Thrill-K post, one approach to solving the prioritization constraint is dividing information used by the system (or, eventually, its knowledge) into three tiers: instantaneous, standby and retrieved external knowledge. The order of magnitude of these information tiers could roughly correspond to giga-, tera- and zetta- scale, as illustrated in Figure 5.

Figure 5. Information used by an AI system can be broken down into three tiers.

The choice of magnitudes in the above tiering is not arbitrary. According to Cisco, annual IP traffic was forecast to reach a run rate of 3.3 zettabytes in 2021. The average capacity of a hard disk drive was projected to be 5.4 terabytes in 2021, which is also in line with the amount of data stored in what can be considered a very large database (VLDB). The full T5 model has 11 billion parameters.

In AI systems of the future, the instantaneous knowledge tier can be directly stored in the parametric memory of the neural network. In contrast, standby knowledge can be refactored and stored in an adjacent, integrated structured knowledge base. When these two sources are insufficient, the system should retrieve external knowledge, for example, by searching through a background corpus or using sensors and actuators to collect data from its physical environment.
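A minimal sketch of that three-tier access pattern follows. The functions are hypothetical placeholders (a real "instantaneous" tier would be the network's parametric memory, not a dictionary), intended only to show the control flow of escalating from fast to slow tiers:

```python
# Standby tier: an adjacent structured knowledge base (tera-scale in practice).
STANDBY_KB = {"sfo": "San Francisco International Airport"}

def parametric_lookup(query: str):
    """Instantaneous tier: stands in for the network's own parametric memory."""
    return {"1992 olympics": "Barcelona"}.get(query)

def external_retrieval(query: str):
    """Retrieved tier: search over a background corpus or live sensors (slowest)."""
    print(f"[expensive external retrieval for {query!r}]")
    return f"external result for {query!r}"

def lookup(query: str):
    # Escalate through the tiers, paying the retrieval cost only when needed.
    for tier in (parametric_lookup, STANDBY_KB.get, external_retrieval):
        result = tier(query)
        if result is not None:
            return result

print(lookup("1992 olympics"))  # served from instantaneous knowledge
print(lookup("sfo"))            # served from standby knowledge
print(lookup("obscure fact"))   # falls through to external retrieval
```

The design choice mirrors the memory hierarchy of Figure 3: each tier trades access speed for capacity and cost per item.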

With tiered knowledge-based systems, a sustainable cost of scaling can be achieved. Evolutionary pressure has forced the human brain to maintain efficiency and manage energy consumption while growing its capabilities. Similarly, design constraints need to be applied to AI system architectures to stratify the distribution of computational resources, coupled with varied expediency of information retrieval. Only then can AI systems reach the full scope of cognitive capabilities demanded by human industry and civilization without overtaxing their resources.

References

Thompson, N. C., Greenewald, K., Lee, K., & Manso, G. F. (2021, October 1). Deep Learning’s Diminishing Returns. IEEE Spectrum. https://spectrum.ieee.org/deep-learning-computational-cost

Wiggers, K. (2020, July 16). MIT researchers warn that deep learning is approaching computational limits. VentureBeat. https://venturebeat.com/2020/07/15/mit-researchers-warn-that-deep-learning-is-approaching-computational-limits/

Thompson, N. C. (2020, July 10). The Computational Limits of Deep Learning. ArXiv.Org. https://arxiv.org/abs/2007.05558

OpenAI API. (2021). https://beta.openai.com/pricing

Li, C. (2020, September 11). OpenAI’s GPT-3 Language Model: A Technical Overview. Lambda Blog. https://lambdalabs.com/blog/demystifying-gpt-3/#:%7E:text=1.%20We%20use,M

Singer, G. (2021, July 29). Thrill-K: A Blueprint for The Next Generation of Machine Intelligence. Medium. https://towardsdatascience.com/thrill-k-a-blueprint-for-the-next-generation-of-machine-intelligence-7ddacddfa0fe

Cisco. (2016). Global — 2021 Forecast Highlights. https://www.cisco.com/c/dam/m/en_us/solutions/service-provider/vni-forecast-highlights/pdf/Global_2021_Forecast_Highlights.pdf

Statista. (2021, August 12). Seagate global average hard disk drive capacity FY2015–2021. https://www.statista.com/statistics/795748/worldwide-seagate-average-hard-disk-drive-capacity/

Wikipedia contributors. (2021, October 16). Very large database. Wikipedia. https://en.wikipedia.org/wiki/Very_large_database

Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer. (2020, February 24). Google AI Blog. https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html#:%7E:text=T5%20is%20surprisingly%20good%20at,%2C%20and%20Natural%20Questions%2C%20respectively.


Intel® Labs’ Graph Neural Networks Research Featured as Groundbreaking Work in AI

Research finds that reversible Graph Neural Networks (RevGNNs) significantly outperform existing methods on multiple datasets and improve large models’ memory efficiency for AI applications.
