Applications: Context-Caching with Gemini 1.5 Flash

Context Caching with Gemini 1.5 Flash

Google recently released a new feature called context caching, available through the Gemini API for the Gemini 1.5 Pro and Gemini 1.5 Flash models. This guide provides a basic example of how to use context caching with Gemini 1.5 Flash.

https://youtu.be/987Pd89EDPs?si=j43isgNb0uwH5AeI

The Use Case: Analyzing a Year's Worth of ML Papers


The guide demonstrates how you can use context caching to analyze the summaries of all the ML papers we've documented over the past year. We store these summaries in a text file, which can then be fed to the Gemini 1.5 Flash model and queried efficiently.
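The guide does not show how the text file is produced. A minimal sketch, assuming the summaries live in a local markdown README (the file name is hypothetical) and assuming the markdown can be kept verbatim rather than stripped, is to save the README's contents under a .txt name:

```python
from pathlib import Path


def readme_to_text(readme_path: str) -> Path:
    """Save the README's contents as a .txt file next to the original.

    Assumption: the markdown is kept verbatim; the guide does not say
    whether any formatting was removed before uploading.
    """
    source = Path(readme_path)
    target = source.with_suffix(".txt")  # e.g. README.md -> README.txt
    target.write_text(source.read_text(encoding="utf-8"), encoding="utf-8")
    return target
```

Calling readme_to_text("README.md") returns the path of the new README.txt, which is the file uploaded in the next steps.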


The Process: Uploading, Caching, and Querying

  1. Data Preparation: First, convert the README file (containing the summaries) into a plain text file.

  2. Using the Gemini API: Upload the text file with Google's generativeai Python library (the google-generativeai package).

  3. Implementing Context Caching: A cache is created using the caching.CachedContent.create() function. This involves:

    • Specifying the Gemini 1.5 Flash model.

    • Providing a name for the cache.

    • Defining an instruction for the model (e.g., "You are an expert AI researcher...").

    • Setting a time-to-live (TTL) for the cache (e.g., 15 minutes).

  4. Creating the Model: We then create a generative model instance using the cached content.

  5. Querying: We can start querying the model with natural language questions like:

    • "Can you please tell me the latest AI papers of the week?"

    • "Can you list the papers that mention Mamba? List the title of the paper and summary."

    • "What are some of the innovations around long-context LLMs? List the title of the paper and summary."
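Steps 2 to 5 can be sketched with the google-generativeai SDK as below. This is a minimal sketch, not the guide's exact code: the model string "models/gemini-1.5-flash-001" (caching required an explicitly versioned model at launch), the cache name, and the helper names cache_kwargs and run_session are assumptions. The system instruction is left elided exactly as in the text above. The SDK is imported inside run_session so the argument builder can be used and checked without network access.

```python
import datetime
import time

# Assumption: an explicitly versioned model name, as caching required at launch.
MODEL_NAME = "models/gemini-1.5-flash-001"


def cache_kwargs(uploaded_file, display_name: str, instruction: str,
                 minutes: int = 15) -> dict:
    """Collect the keyword arguments for caching.CachedContent.create()."""
    return {
        "model": MODEL_NAME,                         # the Gemini 1.5 Flash model
        "display_name": display_name,                # a name for the cache
        "system_instruction": instruction,           # the model's instruction
        "contents": [uploaded_file],                 # the uploaded text file
        "ttl": datetime.timedelta(minutes=minutes),  # time-to-live (15 min default)
    }


def run_session(api_key: str, text_path: str, questions: list) -> list:
    """Upload the text file, cache it, and answer each question from the cache."""
    # Imported lazily so cache_kwargs() works without the SDK installed.
    import google.generativeai as genai
    from google.generativeai import caching

    genai.configure(api_key=api_key)

    # Step 2: upload the text file and wait until the File API marks it ready.
    uploaded = genai.upload_file(path=text_path)
    while uploaded.state.name == "PROCESSING":
        time.sleep(2)
        uploaded = genai.get_file(uploaded.name)

    # Step 3: create the cache (hypothetical name; instruction elided as in the guide).
    cache = caching.CachedContent.create(
        **cache_kwargs(uploaded, "ml-papers-of-the-week",
                       "You are an expert AI researcher...")
    )

    # Step 4: build a model whose requests reuse the cached context.
    model = genai.GenerativeModel.from_cached_content(cached_content=cache)

    # Step 5: each request carries only the short question.
    return [model.generate_content(q).text for q in questions]
```

A call such as run_session(api_key, "README.txt", ["Can you list the papers that mention Mamba? List the title of the paper and summary."]) would return one answer string per question. Keeping the arguments in a separate builder makes the TTL easy to change (for example minutes=60) without touching the API calls. Note that the API enforces a minimum cache size (32,768 tokens for Gemini 1.5 at launch), so a small file will be rejected.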


The results were promising: the model accurately retrieved and summarized information from the text file, and context caching removed the need to resend the entire text file with each query.
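Whether a query was actually served from the cache can be checked on each response. The Gemini API reports token usage in response.usage_metadata, where prompt_token_count includes the cached tokens and cached_content_token_count counts only the cached part. The helper below is a small hypothetical sketch (its name and output keys are assumptions) that splits the two:

```python
def cache_report(usage) -> dict:
    """Split a response's prompt tokens into cached and freshly sent parts.

    `usage` is expected to expose prompt_token_count (which already includes
    cached tokens) and cached_content_token_count, as usage_metadata does.
    """
    cached = usage.cached_content_token_count
    prompt = usage.prompt_token_count
    return {
        "cached_tokens": cached,
        "new_prompt_tokens": prompt - cached,  # tokens sent with this request
        "cache_share": cached / prompt if prompt else 0.0,
    }
```

For example, cache_report(response.usage_metadata) after a generate_content call on the cached model should show nearly all prompt tokens as cached, with only the question counted as new.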


This workflow could be a valuable tool for researchers, allowing them to:

  • Quickly analyze and query large amounts of research data.

  • Retrieve specific findings without manually searching through documents.

  • Conduct interactive research sessions without resending the same large context with every prompt.


We are excited to explore further applications of context caching, especially within more complex scenarios like agentic workflows.
