Generalizing Models Using Retrieval Augmented Generation

We are at the frontier of significant advances in Artificial Intelligence. This technology, capable of generating human-like essays and convincing images and videos [1], is having a disruptive effect across multiple areas - an effect that is both exciting, given its unprecedented potential, and worrisome, due to the uncertainty of its impact. A recent McKinsey Global Survey from 2023 [2] highlighted this impact: 40% of respondents reported increasing their investment in AI because of the new advances in generative AI. However, AI is far from perfect, and there are multiple issues to be addressed - many of which are discussed in the recent regulatory framework developed by the EU, the AI Act [3]. Two issues of particular importance are that:

  • AI models, such as Large Language Models (LLMs), need to be retrained with new data if we want updated information, and,
  • Relying purely on the previously trained knowledge (parametric approach) may result in unreliable responses (referred to as "hallucinations").

One novel technique to remedy these issues is Retrieval Augmented Generation (RAG) [4,5]. The core concept is to use the cognitive capabilities of LLMs to process relevant retrieved information, rather than trying to answer directly from pre-trained data [6]. Despite looking like a simple concept, it brings groundbreaking changes to the field of natural language processing (NLP) by creating hybrid frameworks that produce context-aware responses [7]. You can conceptualise this idea by imagining how much more capable LLMs could be if they had access to the entire corpus of knowledge available on the internet.

As a further example, consider how humans would attempt to answer a novel question about an unknown concept, such as an unfamiliar field of scientific research. Instead of relying only on previous experience and memory, which one could compare to a model referencing its training data, an astute human would probably attempt to answer this question by:

  • Finding relevant information on the subject matter by referencing documentation and research articles, and,
  • Using this as context to develop an answer to this question.

We can leverage language models to do the same: for a given question, we provide relevant content and context, which the model then uses to answer based on its interpretation of the input data.
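The retrieve-then-answer flow just described can be sketched in a few lines of Python. The document store, the overlap-based scoring, and the prompt template below are illustrative stand-ins of our own, not any particular library's API; a real system would use a proper retriever and pass the assembled prompt to an LLM.

```python
# Minimal sketch of the retrieve-then-generate flow: find relevant
# context, then assemble a prompt for the model to answer from.

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query - a stand-in
    for a real retriever such as tf-idf or embedding search."""
    q_words = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Prepend the retrieved context so the model answers from it
    rather than from parametric memory alone."""
    ctx = "\n".join(f"- {c}" for c in context)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{ctx}\n\nQuestion: {query}")

docs = [
    "The EU AI Act is a regulatory framework for artificial intelligence.",
    "Photosynthesis converts sunlight into chemical energy.",
]
query = "What is the EU AI Act?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
# The assembled prompt would then be passed to an LLM of your choice.
```

The key design point is that the generation step sees only the retrieved context, which is what grounds the answer in external sources rather than in the model's parameters.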

This strategy was first devised in a seminal 2020 paper from a team at Facebook AI, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks [8], which details a non-parametric approach to working with LLMs. Rather than depending solely on the model's parametric responses, this approach allows generalised contexts to be provided to the model. The authors found that RAG models effectively merge parametric knowledge (derived from the LLM's pre-trained data) with non-parametric knowledge (derived from external sources) to answer questions with a high degree of accuracy.

This opened several possibilities that go beyond giving reliable responses based on verified sources. A RAG model can give updated answers, and it can even work with topics it was not initially trained on - that is, it can generalise. The former capability is especially useful because tasks that would otherwise require retraining (which can be time-consuming and expensive) can instead be accomplished by a small adaptation that adds contextually relevant information.

In this framework, one fundamental step is information retrieval. Although it looks like a basic step, it can have a fundamental impact on performance. We can use simple approaches, such as term frequency–inverse document frequency (tf-idf), to find content similar to the query, or use complex, semantic-aware language models to create embedding vectors that can later be used to compare and find relevant documents [9]. However, we must be aware that finding relevant documents may not just be a matter of finding similar words or texts, and this remains an active research topic [10]. One example is the analysis of a judicial case where we want to provide, as context, the relevant laws and how they apply to the case. If we rely on mere textual similarity, we may retrieve similar cases as context, but not the laws that are actually relevant and helpful. For such a case, we must design different retrieval approaches.
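To make the tf-idf option concrete, here is a minimal, self-contained retriever sketched from the description above. The corpus, query, and helper names are our own illustrative choices; a production system would more likely use a library implementation (for example scikit-learn's TfidfVectorizer) or a semantic embedding model.

```python
# Toy tf-idf retrieval: weight terms by frequency in a document and
# rarity across the corpus, then rank documents by cosine similarity
# to the query.
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build a sparse tf-idf vector (term -> weight) per document."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    # idf: rarer terms get higher weight
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({w: (tf[w] / len(toks)) * idf[w] for w in tf})
    return vectors

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "tf-idf weighs rare terms more heavily than common ones",
]
query = "how does tf-idf weigh rare terms"
# Vectorise the query alongside the corpus so they share one vocabulary.
vecs = tfidf_vectors(corpus + [query])
query_vec = vecs[-1]
scores = [cosine(query_vec, v) for v in vecs[:-1]]
best = corpus[scores.index(max(scores))]
print(best)
```

Note how this retriever can only match surface forms: "weighs" and "weigh" do not match, which is precisely the limitation that motivates embedding-based and task-specific retrieval in cases like the judicial example above.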

By utilising this methodology, we are able to produce models which surpass others specifically trained for a particular subject. For instance, in open-domain question answering tasks, frameworks that use RAG perform significantly better than traditional models trained specifically for those tasks [11]. This demonstrates the powerful role RAG plays in achieving more cognitively capable models which can generalise.

In conclusion, we believe that the future AI landscape may not rely solely on huge, data-intensive models designed to answer every question we might have, but will instead employ the cognitive power of such models to do something akin to what humans do: search, understand, and apply. By achieving this, we are constrained neither by the possible lack of content the model has been trained on, nor by the temporal discrepancies that novelties may cause. Paradoxically, these results show that we can improve artificial intelligence by adding native characteristics of human intelligence.

At Diffusion, we build and deploy advanced artificial intelligence models to solve consumer problems in the blockchain industry. RAG methodologies form a key component of our approach to solving these novel problems and building models which generalise beyond their training data. For more information, please contact us.

References

[1] https://www.technologyreview.com/2022/12/23/1065852/whats-next-for-ai/

[2] https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-in-2023-generative-AIs-breakout-year

[3] https://www.europarl.europa.eu/news/en/headlines/society/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence

[4] https://arxiv.org/abs/2005.11401

[5] https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/

[6] https://research.ibm.com/blog/retrieval-augmented-generation-RAG

[7] https://www.datastax.com/guides/what-is-retrieval-augmented-generation

[8] https://arxiv.org/abs/2005.11401

[9] https://ar5iv.labs.arxiv.org/html/2301.08801

[10] https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/widm.1412

[11] https://arxiv.org/abs/2005.11401