Is 2024 going to be the year of AI RAG?

What is Retrieval-Augmented Generation (RAG)?

Before we dive into the discussion of whether RAG will be the big thing of 2024, let's take a step back and briefly introduce what RAG is.

Pre-trained large language models (LLMs) have been shown to store factual knowledge in their parameters and to achieve state-of-the-art results in understanding, transforming, and generating text. Nevertheless, their ability to access and precisely manipulate that knowledge remains limited, resulting in a performance gap on knowledge-intensive tasks compared to task-specific architectures. Furthermore, there are ongoing research challenges in providing clear provenance for their decisions and in keeping their knowledge up to date. Until recently, research on pre-trained models equipped with a differentiable mechanism for accessing explicit non-parametric memory had focused primarily on extractive downstream tasks.

Retrieval-Augmented Generation (RAG) is an approach in natural language processing that combines the strengths of information retrieval and language generation. RAG integrates a pre-trained language model, such as a variant of OpenAI's GPT, with a retriever component. The retriever is responsible for efficiently fetching relevant information from a large external knowledge base during the generation process.

The architecture of RAG involves two main components: the retriever and the generator. The retriever focuses on efficiently selecting and retrieving pertinent information from a diverse knowledge source, typically a massive collection of documents. Meanwhile, the generator is a language model capable of producing coherent and contextually relevant text. By combining these two components, RAG enhances the model's ability to generate high-quality responses by incorporating information from an external knowledge base, making it particularly effective in tasks that require context-aware responses.
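The interplay of the two components can be sketched in a few lines of Python. Everything here is a toy placeholder: the knowledge base is three hard-coded sentences, the retriever scores documents by simple word overlap, and the "generator" merely assembles the prompt that a real system would send to an LLM such as GPT.

```python
# Minimal sketch of the two RAG components: a retriever that ranks
# documents against the query, and a generator that consumes the
# retrieved context. The generator here is a stand-in for a real LLM call.
from collections import Counter

# Toy stand-in for a large external knowledge base.
KNOWLEDGE_BASE = [
    "RAG combines a retriever with a pre-trained language model.",
    "The retriever selects relevant documents from a knowledge base.",
    "The generator produces text conditioned on query and context.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = Counter(query.lower().split())
    def score(doc: str) -> int:
        return sum(q_words[w] for w in doc.lower().split() if w in q_words)
    return sorted(docs, key=score, reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the generator: builds the prompt a real system
    would send to an LLM for completion."""
    return "Context:\n" + "\n".join(context) + f"\nQuestion: {query}\nAnswer:"

def rag_answer(query: str) -> str:
    # Retrieval happens first, then generation is conditioned on it.
    return generate(query, retrieve(query, KNOWLEDGE_BASE))
```

In a production system the word-overlap scoring would be replaced by dense vector search, and `generate` would call an actual model, but the retrieve-then-generate shape stays the same.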

RAG has demonstrated its effectiveness in various natural language understanding and generation tasks, such as question-answering and dialogue systems. This approach leverages the power of pre-trained language models for creative text generation while incorporating the advantages of information retrieval to ensure the accuracy and relevance of the generated content. As a result, RAG represents a significant advancement in the field, showcasing the potential of combining retrieval and generation techniques to address complex language processing challenges.

One of the authors of the 2020 paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", Patrick Lewis, coined the term, and at the same time apologized for the unflattering acronym that now describes a growing family of methods across hundreds of papers and dozens of commercial services, which he believes represent the future of generative AI.

Is RAG going to be "bullish" or "bearish"?

It is clear that we are at a major AI juncture. The evolution of Large Language Models (LLMs) like ChatGPT has reached an exceptionally advanced and advantageous stage, marking what appears to be a transition out of the "Before ChatGPT" (BC) era into what is often referred to as the "Generative Era" (GE). This progression signifies that LLMs are now mature enough to integrate with business-specific data such as knowledge bases and databases, leading to the emergence of new use cases.

In comparison to foundational LLMs, Retrieval-Augmented Generation (RAG) carries significantly greater implications for both businesses and consumers. It can be likened to the transformation of crude oil into the refined fuel that powers vehicles. While foundational LLMs are noteworthy in their own right, it is the amalgamation of LLM capabilities with domain-specific knowledge that unleashes the true benefits for businesses.

This synergy between LLMs and knowledge, embodied in RAG, holds the potential to revolutionize various aspects of business, from customer support and employee productivity to AI-enhanced workflows. As businesses and consumers become increasingly aware of its transformative effects, the combined power of LLMs and knowledge is poised to generate substantial revenue and productivity gains across diverse sectors.

For example, a generative AI model supplemented with market data could be a great assistant for financial analysts. In fact, almost any business can turn its technical or policy manuals, videos, or logs into resources called knowledge bases that can enhance LLMs. These sources can enable use cases such as customer or field support, employee training, and developer productivity.

The broad potential is why companies including AWS, IBM, Glean, Google, Microsoft, NVIDIA, Oracle and Pinecone are adopting RAG.

Similarly, when the power of the LLM is combined with knowledge, the true benefits to businesses begin to be unleashed.

Are there winners?

There are four main categories that can clearly benefit from using RAG:

  1. "No-code" systems - these systems empower individuals with limited or no programming expertise to create functional applications or automate processes without writing traditional code. This democratization of software development is reshaping the landscape, allowing non-technical users to actively participate in building digital solutions. It is driven by the growing demand for agility and flexibility in the digital era. Organizations are recognizing the value of empowering a broader spectrum of employees to actively engage in the development process. This trend is fostering innovation, increasing efficiency, and promoting a collaborative approach to problem-solving. No-code systems are becoming integral tools in the toolkit of businesses seeking to adapt quickly to changing requirements and capitalize on the collective intelligence within their teams.

    The friction and barrier to entry are virtually zero, with even non-technical people able to create sophisticated generative AI chatbots.

  2. "RAG APIs" - With the release of OpenAI's new Assistants API, which has some very limited built-in RAG, and other more sophisticated RAG APIs, businesses can, with very little effort, create sophisticated generative AI chatbot functionality and workflows using their own data, website content, and account-specific data.

    As more developers come to understand the power of these APIs, more RAG-based systems and workflows are going to start appearing.

    If 2023 was the year of the OpenAI wrapper applications, 2024 will be the year of the RAG wrapper applications.

    They might have sophisticated names, like “Custom GPTs” or “Augmented GPTs” or maybe some thought leader or journalist might even come up with a better name.

  3. "RAG templates" - as already explained, through vector search RAG identifies and retrieves pertinent documents from databases, which it uses as context sent to the LLM along with the query, thereby improving the LLM's response quality. This approach decreases inaccuracies by anchoring responses in factual content and ensures responses remain relevant to the most current data. RAG optimizes token use without expanding an LLM's token limit, focusing on the most relevant documents to inform the response process. Templates offer a developer-friendly approach to crafting and deploying chatbot applications tailored to specific data sets.

    Templates present a selection of reference architectures that are designed for quick deployment, available to any user. These templates introduce an innovative system for the crafting, exchanging, refreshing, acquiring, and tailoring of diverse chains and agents. They are crafted in a uniform format for smooth integration with LangServe, enabling the swift deployment of production-ready APIs. Additionally, these templates provide a free sandbox for experimental and developmental purposes.

  4. "Workflows" - as we approach the conclusion of 2023, there is a notable trend among cloud platforms, where the integration of API-based workflows is becoming increasingly prevalent. These workflows play a pivotal role in facilitating access to account-level data and enabling the implementation of Retrieval-Augmented Generation (RAG) based workflows.

    The significance of these workflows is exemplified by their ability to simplify processes like capturing HTML form inputs and subsequently generating PDF documents based on the provided information. Imagine the seamless creation of dynamic travel itineraries or invoices in PDF format, incorporating a generative AI component. PDF generation is just one facet of this evolution. Consider any workflow where the fundamental data flow is now enhanced by the integration of generative AI content. The year 2023 witnessed widespread enthusiasm for Large Language Models (LLMs), but the practical applications and advantages for end users are anticipated to experience substantial growth and realization in the year 2024.
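The vector-search step described under "RAG templates" above can be illustrated with a minimal sketch. Note the assumptions: the bag-of-words embedding and the three sample documents are stand-ins for a real embedding model and a vector database, chosen only to keep the example self-contained; production templates would plug in learned embeddings and a proper vector store.

```python
# Sketch of the vector-search step behind RAG templates: documents and
# the query are embedded as vectors, and cosine similarity selects the
# most relevant context to send to the LLM alongside the query.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: term-frequency vector over lowercase words.
    A real system would call an embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
        * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Hypothetical knowledge-base entries for illustration.
docs = [
    "Refund policy: purchases can be returned within 30 days.",
    "Shipping usually takes three to five business days.",
    "Support is available by chat and email around the clock.",
]
context = top_k("refund policy details", docs, k=1)
```

The selected `context` would then be concatenated with the user's question into the prompt, which is exactly the token-efficient grounding the templates package up for deployment.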

Conclusion

ChatGPT and LLMs are mind-blowing by nature. This stems from their ability to understand, generate, and adapt to human language in ways that were previously considered challenging for machines. Keep in mind that individuals like John Doe on Wall Street might not have had much interest in ChatGPT or Large Language Models (LLMs) so far. However, as we enter 2024 and RAG-based applications begin to directly impact everyday life, the significance becomes tangible.

By addressing the limitations of traditional models, RAG paves the way for a future where AI-powered applications are not just powerful but also trustworthy, ushering in a new era of human-AI collaboration.