RAG Faith & In-Context Recall


How Faithful are RAG Models?

This new paper by Wu et al. (2024) quantifies the tug-of-war between the information retrieved by a RAG system and an LLM's internal prior.


The analysis focuses on GPT-4 and other LLMs applied to question answering.

It finds that providing the correct retrieved information corrects most model mistakes (94% accuracy).

Source: Wu et al. (2024)


When the retrieved documents contain incorrect values and the LLM's internal prior is weak, the model is more likely to recite the incorrect information. LLMs with a stronger prior, however, are found to be more resistant.


The paper also reports that "the more the modified information deviates from the model's prior, the less likely the model is to prefer it."


With so many developers and companies running RAG systems in production, this work highlights the importance of assessing the risks of feeding LLMs different kinds of contextual information, which may support, contradict, or be entirely incorrect.
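The paper's setup of perturbing retrieved facts away from a model's prior can be sketched as follows. This is a minimal, hypothetical harness, not the authors' code: `perturb_fact` and `classify_answer` are illustrative names, and the model response is stubbed with a fixed value rather than a real LLM call.

```python
def perturb_fact(prior_value: float, factor: float) -> float:
    """Scale a numeric fact in a retrieved document away from the model's prior answer."""
    return prior_value * factor


def classify_answer(answer: float, prior: float, context_value: float) -> str:
    """Label whether a model's answer sides with the retrieved context or its prior."""
    if abs(answer - context_value) < abs(answer - prior):
        return "context"
    return "prior"


# Illustrative example: the model's prior answer for a drug dosage is 50 mg,
# and the retrieved document is perturbed to claim 10x that amount.
prior = 50.0
modified = perturb_fact(prior, 10.0)  # 500.0

# Stubbed model response: suppose the model answers 50.0, sticking with its prior.
print(classify_answer(50.0, prior, modified))  # prints "prior"
```

Running this classification over many perturbation factors is one way to measure the paper's observation that larger deviations from the prior make the model less likely to adopt the context value.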


LLM In-Context Recall is Prompt Dependent


This new paper by Machlab and Battle (2024) analyzes the in-context recall performance of different LLMs using several needle-in-a-haystack tests.


It shows that different LLMs recall facts at different context lengths and placement depths, and that a model's recall performance is significantly affected by small changes in the prompt.

Source: Machlab and Battle (2024)


In addition, the interplay between prompt content and training data can degrade response quality.


A model's recall ability can be improved by increasing its size, enhancing the attention mechanism, trying different training strategies, and applying fine-tuning.

Important practical tip from the paper: "Continued evaluation will further inform the selection of LLMs for individual use cases, maximizing their impact and efficiency in real-world applications as the technology continues to evolve."


The takeaways from this paper: careful prompt design matters, a continuous evaluation protocol should be established, and different model enhancement strategies should be tested to improve recall and utility.
