Learn AI - Guides & Articles
Introduction: LLM Settings
Tweaking these settings is important for improving the reliability and desirability of responses, and it takes a bit of experimentation to figure out the proper settings for your use cases. Below are the common settings you will come across when using different LLM providers.
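As a rough illustration, the sketch below sets a few of these common settings (temperature, top_p, and a token limit) in a single request. It assumes the OpenAI Python SDK; the model name and parameter values are placeholders, not recommendations.

```python
# A minimal sketch of common LLM settings, assuming the OpenAI Python SDK.
# The model name and parameter values are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",          # assumed model name
    messages=[{"role": "user", "content": "Explain antibiotics in one sentence."}],
    temperature=0.2,   # lower = more deterministic, higher = more creative
    top_p=0.9,         # nucleus sampling; usually tune this or temperature, not both
    max_tokens=150,    # cap on the length of the generated response
)
print(response.choices[0].message.content)
```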
Introduction: Prompting Basics
You can achieve a lot with simple prompts, but the quality of results depends on how much information you provide it and how well-crafted the prompt is. A prompt can contain information like the instruction or question you are passing to the model and include other details such as context, inputs, or examples. You can use these elements to instruct the model more effectively to improve the quality of results.
Introduction: Prompt Elements
As we cover more and more examples and applications with prompt engineering, you will notice that certain elements make up a prompt.
A prompt can contain any of the following elements (illustrated in the example after this list):
- Instruction - a specific task or instruction you want the model to perform.
- Context - external information or additional context that can steer the model to better responses.
- Input Data - the input or question for which we want a response.
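The snippet below assembles a hypothetical prompt from these elements; the task and text are purely illustrative.

```python
# Illustrative only: a prompt assembled from the elements listed above.
instruction = "Classify the text into neutral, negative, or positive."
context = "The reviews come from a food delivery app."
input_data = "Text: I think the food was okay."

prompt = f"{instruction}\n\nContext: {context}\n\n{input_data}\n\nSentiment:"
print(prompt)
```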
Introduction: Prompt Design Tips
As you get started with designing prompts, you should keep in mind that it is really an iterative process that requires a lot of experimentation to get optimal results. Using a simple playground from OpenAI or Cohere is a good starting point.
Introduction: Example Prompts
This section will provide more examples of how to use prompts to achieve different tasks and introduce key concepts along the way. Often, the best way to learn concepts is by going through examples. The few examples below illustrate how you can use well-crafted prompts to perform different types of tasks.
Prompting Techniques: Zero-shot
Large language models (LLMs) today, such as GPT-3.5 Turbo, GPT-4, and Claude 3, are tuned to follow instructions and are trained on large amounts of data. Large-scale training makes these models capable of performing some tasks in a "zero-shot" manner. Zero-shot prompting means that the prompt used to interact with the model won't contain examples or demonstrations. The zero-shot prompt directly instructs the model to perform a task without any additional examples to steer it.
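A minimal zero-shot sketch, assuming the OpenAI Python SDK: the prompt carries only the instruction and the input, with no demonstrations.

```python
# Zero-shot prompting: no examples, just the instruction and the input.
# Assumes the OpenAI Python SDK; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
prompt = (
    "Classify the text into neutral, negative or positive.\n\n"
    "Text: I think the vacation is okay.\n"
    "Sentiment:"
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```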
Prompting Techniques: Few-Shot
While large language models demonstrate remarkable zero-shot capabilities, they still fall short on more complex tasks in the zero-shot setting. Few-shot prompting can be used as a technique to enable in-context learning, where we provide demonstrations in the prompt to steer the model to better performance. The demonstrations serve as conditioning for subsequent examples where we would like the model to generate a response.
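A rough few-shot sketch, again assuming the OpenAI Python SDK; the made-up-word demonstration mirrors the classic example from Brown et al. (2020), and the model name is a placeholder.

```python
# Few-shot prompting: a demonstration conditions the model before the final query.
# Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()
prompt = (
    'A "whatpu" is a small, furry animal native to Tanzania. '
    "An example of a sentence that uses the word whatpu is:\n"
    "We were traveling in Africa and we saw these very cute whatpus.\n\n"
    'To do a "farduddle" means to jump up and down really fast. '
    "An example of a sentence that uses the word farduddle is:"
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```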
Prompting Techniques: Chain-Of-Thought
Introduced in Wei et al. (2022), chain-of-thought (CoT) prompting enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.
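A minimal few-shot CoT sketch, assuming the OpenAI Python SDK: the demonstration spells out the intermediate reasoning so the model is nudged to reason before answering. The numbers are illustrative.

```python
# Chain-of-thought prompting: the demonstration includes intermediate reasoning steps.
# Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()
prompt = (
    "Q: The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.\n"
    "A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.\n\n"
    "Q: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.\n"
    "A:"
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```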
Prompting Techniques: Meta Prompting
Meta Prompting is an advanced prompting technique that focuses on the structural and syntactical aspects of tasks and problems rather than their specific content details. The goal of meta prompting is to construct a more abstract, structured way of interacting with large language models (LLMs), emphasizing the form and pattern of information over traditional content-centric methods.
Prompting Techniques: Self-Consistency
Perhaps one of the more advanced techniques out there for prompt engineering is self-consistency. Proposed by Wang et al. (2022), self-consistency aims "to replace the naive greedy decoding used in chain-of-thought prompting". The idea is to sample multiple, diverse reasoning paths through few-shot CoT, and use the generations to select the most consistent answer. This helps to boost the performance of CoT prompting on tasks involving arithmetic and commonsense reasoning.
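A rough self-consistency sketch, assuming the OpenAI Python SDK: several reasoning paths are sampled at a non-zero temperature and the most frequent final answer wins. The "Answer:" output convention and the number of samples are assumptions for illustration.

```python
# Self-consistency: sample several chain-of-thought completions with temperature > 0,
# then keep the most frequent final answer (majority vote). Assumes the OpenAI SDK.
from collections import Counter
from openai import OpenAI

client = OpenAI()
prompt = (
    "When I was 6 my sister was half my age. Now I'm 70. How old is my sister?\n"
    "Think step by step, then give the final line as 'Answer: <number>'."
)

def final_answer(text: str) -> str:
    # Naive parser for the assumed 'Answer:' convention.
    for line in reversed(text.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return text.strip().splitlines()[-1]

samples = []
for _ in range(5):  # number of sampled reasoning paths is a tunable choice
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8,  # diversity across reasoning paths
    )
    samples.append(final_answer(resp.choices[0].message.content))

print(Counter(samples).most_common(1)[0][0])  # most consistent answer
```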
Prompting Techniques: Generated Knowledge Prompting
LLMs continue to improve, and one popular technique is incorporating knowledge or information into the prompt to help the model make more accurate predictions. Using a similar idea, can the model also be used to generate knowledge before making a prediction? That's what is attempted in the paper by Liu et al. (2022): generate knowledge to be used as part of the prompt. In particular, how helpful is this for tasks such as commonsense reasoning?
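A minimal two-step sketch of generated knowledge prompting, assuming the OpenAI Python SDK: the model first produces a knowledge statement, which is then included as context when answering.

```python
# Generated knowledge prompting: generate relevant knowledge first, then answer
# the question conditioned on it. Assumes the OpenAI Python SDK; the question is illustrative.
from openai import OpenAI

client = OpenAI()
question = "Part of golf is trying to get a higher point total than others. Yes or No?"

# Step 1: generate knowledge about the question.
knowledge = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Generate a short factual statement about: {question}"}],
).choices[0].message.content

# Step 2: answer the question using the generated knowledge as context.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Question: {question}\n\nKnowledge: {knowledge}\n\nAnswer:"}],
).choices[0].message.content
print(answer)
```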
Prompting Techniques: Prompt Chaining
To improve the reliability and performance of LLMs, one important prompt engineering technique is to break a task into its subtasks. Once those subtasks have been identified, the LLM is prompted with a subtask and its response is then used as input to another prompt. This is what's referred to as prompt chaining, where a task is split into subtasks with the idea of creating a chain of prompt operations.
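A rough prompt-chaining sketch, assuming the OpenAI Python SDK: the first prompt extracts relevant quotes from a document, and its output feeds a second prompt that composes the answer. The document placeholder and question are illustrative.

```python
# Prompt chaining: the output of the first subtask feeds the second prompt.
# Assumes the OpenAI Python SDK; `document` and the question are placeholders.
from openai import OpenAI

client = OpenAI()
document = "<paste the source document here>"
question = "What prompting techniques are mentioned in the document?"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Subtask 1: extract quotes relevant to the question.
quotes = ask(
    "Extract quotes relevant to the question, delimited by ####.\n\n"
    f"Question: {question}\n####\n{document}\n####"
)

# Subtask 2: answer using only the extracted quotes.
answer = ask(
    "Using only the quotes below, answer the question.\n\n"
    f"Quotes:\n{quotes}\n\nQuestion: {question}"
)
print(answer)
```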
Prompting Techniques: Tree of Thoughts (ToT)
For complex tasks that require exploration or strategic lookahead, traditional or simple prompting techniques fall short. Yao et al. (2023) and Long (2023) recently proposed Tree of Thoughts (ToT), a framework that generalizes over chain-of-thought prompting and encourages exploration over thoughts that serve as intermediate steps for general problem solving with language models.
Prompting Techniques: Retrieval Augmented Generation (RAG)
For more complex and knowledge-intensive tasks, it's possible to build a language model-based system that accesses external knowledge sources to complete tasks. This enables more factual consistency, improves reliability of the generated responses, and helps to mitigate the problem of "hallucination".
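A minimal RAG sketch, assuming the OpenAI Python SDK: a naive word-overlap scorer stands in for a real retriever or vector database, and the retrieved text is injected into the prompt as context.

```python
# Minimal RAG: retrieve the most relevant documents for a query (placeholder retriever),
# then condition the model on the retrieved context. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()
documents = [
    "Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical.",
    "The Eiffel Tower was completed in 1889 and is located in Paris.",
]

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Placeholder retriever: score documents by word overlap with the query.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

query = "Where did Teplizumab originally come from?"
context = "\n".join(retrieve(query, documents))

prompt = f"Answer the question based on the context below.\n\nContext: {context}\n\nQuestion: {query}\nAnswer:"
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```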
Prompting Techniques: Automatic Reasoning and Tool-use (ART)
Combining CoT prompting and tools in an interleaved manner has been shown to be a strong and robust approach for addressing many tasks with LLMs. These approaches typically require hand-crafting task-specific demonstrations and carefully scripted interleaving of model generations with tool use. Paranjape et al. (2023) propose a new framework that uses a frozen LLM to automatically generate intermediate reasoning steps as a program.
Prompting Techniques: Automatic Prompt Engineer (APE)
Zhou et al. (2022) propose automatic prompt engineer (APE), a framework for automatic instruction generation and selection. The instruction generation problem is framed as natural language synthesis and addressed as a black-box optimization problem that uses LLMs to generate and search over candidate solutions.
Prompting Techniques: Directional Stimulus & Active-Prompting
Directional Stimulus Prompting trains a tuneable policy LM to generate the stimulus/hint that guides the LLM toward the desired output, reflecting the growing use of RL to optimize LLMs.
Diao et al., (2023) recently proposed a new prompting approach called Active-Prompt to adapt LLMs to different task-specific example prompts (annotated with human-designed CoT reasoning).
Prompting Techniques: PAL (Program-Aided Language Models)
Gao et al. (2022) present a method that uses LLMs to read natural language problems and generate programs as the intermediate reasoning steps. Coined program-aided language models (PAL), it differs from chain-of-thought prompting in that, instead of using free-form text to obtain the solution, it offloads the solution step to a programmatic runtime such as a Python interpreter.
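A rough PAL-style sketch, assuming the OpenAI Python SDK: the model is asked to emit Python as its intermediate reasoning, and that code is executed to obtain the answer. Executing model-generated code is unsafe outside a sandbox; this is for illustration only.

```python
# PAL-style prompting: the model emits Python as the intermediate reasoning and the
# Python runtime computes the answer. Executing model-generated code is unsafe outside
# a sandbox; illustration only. Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()
question = (
    "Today is 27 February 2023. I was born exactly 25 years ago. "
    "What is the date I was born in MM/DD/YYYY?"
)
prompt = (
    "Write Python code that computes the answer to the question and stores it in a "
    "variable named `answer`. Return only the code, no explanation.\n\n"
    f"Question: {question}"
)

def extract_code(text: str) -> str:
    # Crude cleanup in case the model wraps the code in a markdown fence.
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    return text

generation = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

namespace: dict = {}
exec(extract_code(generation), namespace)  # offload the solution step to the runtime
print(namespace.get("answer"))
```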
Prompting Techniques: ReAct
Yao et al. (2022) introduced a framework named ReAct where LLMs are used to generate both reasoning traces and task-specific actions in an interleaved manner. Generating reasoning traces allows the model to induce, track, and update action plans, and even handle exceptions. The action step allows the model to interface with and gather information from external sources such as knowledge bases or environments.
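A simplified ReAct-style loop, assuming the OpenAI Python SDK: the model emits Thought/Action lines, a stub tool supplies Observations, and the loop stops on a Finish action. The lookup tool and the Search/Finish action syntax are assumptions for illustration.

```python
# Simplified ReAct loop: the model interleaves Thought / Action lines, a stub tool
# produces Observations, and the loop stops on a Finish action. Assumes the OpenAI SDK.
from openai import OpenAI

client = OpenAI()

def lookup(query: str) -> str:
    # Placeholder tool; a real agent would call a search API or knowledge base.
    return f"(stub) No results found for: {query}"

system = (
    "Answer the question by interleaving Thought and Action lines.\n"
    "Available actions: Search[query] to look something up, Finish[answer] to stop."
)
history = "Question: Who founded the company that makes the Model S?\n"

for _ in range(5):  # cap the number of reason/act steps
    step = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": history},
        ],
        stop=["Observation:"],  # pause so our code can supply the observation
    ).choices[0].message.content
    history += step
    if "Finish[" in step:
        print(step.split("Finish[", 1)[1].split("]", 1)[0])
        break
    if "Search[" in step:
        query = step.split("Search[", 1)[1].split("]", 1)[0]
        history += f"\nObservation: {lookup(query)}\n"
```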
Prompting Techniques: Reflexion
Reflexion is a framework to reinforce language-based agents through linguistic feedback. According to Shinn et al. (2023), "Reflexion is a new paradigm for 'verbal' reinforcement that parameterizes a policy as an agent's memory encoding paired with a choice of LLM parameters."
Prompting Techniques: Multimodal CoT & Graph Prompting
Zhang et al. (2023) recently proposed a multimodal chain-of-thought prompting approach. Traditional CoT focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework. The first step involves rationale generation based on multimodal information. This is followed by the second phase, answer inference, which leverages the informative generated rationales.
Agents
Agents are revolutionizing the way we approach complex tasks, leveraging the power of large language models (LLMs) to work on our behalf and achieve remarkable results. In this guide we will dive into the fundamentals of AI agents, exploring their capabilities, design patterns, and potential applications.
Guide: Optimizing Prompts
Large Language Models (LLMs) offer immense power for various tasks, but their effectiveness hinges on the quality of the prompts. This blog post summarizes important aspects of designing effective prompts to maximize LLM performance.
Guide: OpenAI Deep Research
Deep Research is OpenAI's new agent that can perform multi-step research on the internet for complex tasks like generating reports and competitor analyses. It is an agentic reasoning system with access to tools such as Python and web browsing to perform advanced research across a wide range of domains.
Guide: Reasoning LLMs
Large reasoning models (LRMs) or simply, reasoning LLMs, are models explicitly trained to perform native thinking or chain-of-thought. Popular examples of reasoning models include Gemini 2.5 Pro, Claude 3.7 Sonnet, and o3.
Guide: 4o Image Generation
4o Image Generation is OpenAI’s latest image model embedded into ChatGPT. It can create photorealistic outputs, take images as inputs and transform them, and follow detailed instructions, including generating text into images. OpenAI has confirmed that the model is autoregressive, and uses the same architecture as the GPT-4o LLM. The model essentially generates images in the same way as the LLM generates text. This enables improved capabilities in rendering text on top of images, more granular image editing, and editing images based on image inputs.
Guide: Context Engineering
Context engineering is the next phase, where you architect the full context, which in many cases requires going beyond simple prompting and into more rigorous methods to obtain, enhance, and optimize knowledge for the system.
Applications: GPT-4o Fine-Tuning
OpenAI recently announced the availability of fine-tuning for its latest models, GPT-4o and GPT-4o mini. This new capability enables developers to customize the GPT-4o models for specific use cases, enhancing performance and tailoring outputs.
Applications: Function-Calling
Function calling is the ability to reliably connect LLMs to external tools, enabling effective tool usage and interaction with external APIs. LLMs like GPT-4 and GPT-3.5 have been fine-tuned to detect when a function needs to be called and then output JSON containing the arguments to call the function. The functions called via function calling act as tools in your AI application, and you can define more than one in a single request.
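A minimal function-calling sketch using the OpenAI Python SDK's tools interface; the get_current_weather function and its schema are illustrative.

```python
# Function calling with the OpenAI Python SDK `tools` interface.
# The function schema below is illustrative.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City, e.g. London"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the weather like in London?"}],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)                   # which function the model wants to call
print(json.loads(tool_call.function.arguments))  # the JSON arguments it produced
```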
Applications: Context-Caching with Gemini Flash 1.5
Google recently released a new feature called context-caching which is available via the Gemini APIs through the Gemini 1.5 Pro and Gemini 1.5 Flash models. This guide provides a basic example of how to use context-caching with Gemini 1.5 Flash.
Applications: Generating Data
LLMs have strong capabilities to generate coherent text. Using effective prompt strategies can steer the model to produce better, consistent, and more factual responses. LLMs can also be especially useful for generating data, which is handy for running all sorts of experiments and evaluations.
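A small data-generation sketch, assuming the OpenAI Python SDK; the sentiment-analysis task and output format are illustrative.

```python
# Data generation: prompt the model to produce labeled examples for experiments or evals.
# Assumes the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()
prompt = (
    "Produce 10 exemplars for sentiment analysis. Make 5 positive and 5 negative.\n"
    "Use this format:\nQ: <sentence>\nA: <positive or negative>"
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,  # some diversity helps when generating data
)
print(response.choices[0].message.content)
```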
Applications: Synthetic data generation for RAG
Unfortunately, in the life of a Machine Learning Engineer, there's often a lack of labeled data or very little of it. Typically, upon realizing this, projects embark on a lengthy process of data collection and labeling. Only after a couple of months can one start developing a solution.
Applications: Dataset Diversity
Let's say you are working on a legal document classification problem but are not permitted to send any data to an external API. In this situation, you would need to train a local model. However, collecting data could become a significant obstacle, causing delays in product development.
Applications: Code Generation
LLMs like ChatGPT are very effective at code generation. In this section, we will cover many examples of how to use ChatGPT for code generation.
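A minimal code-generation sketch, assuming the OpenAI Python SDK: a system message sets the assistant's role and the user message states the coding task.

```python
# Code generation: the system message sets the role, the user message states the task.
# Assumes the OpenAI Python SDK; the model name is illustrative.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful code assistant that writes Python."},
        {"role": "user", "content": "Write code that asks the user for their name and says Hello."},
    ],
)
print(response.choices[0].message.content)
```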
Applications: Prompt Function
When we draw a parallel between GPT's dialogue interface and a programming language's shell, an encapsulated prompt can be thought of as forming a function. This function has a unique name, and when we call this name with the input text, it produces results based on the set internal rules. In a nutshell, we build a reusable prompt with a name that makes it easy to engage with GPT. It's like having a handy tool that lets GPT carry out particular tasks on our behalf: we just need to give the input, and we receive the desired output.
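A rough sketch of such a prompt function in Python, assuming the OpenAI Python SDK; the trans_word name and its template are illustrative.

```python
# Prompt function: wrap a named prompt template so it can be called like a regular function.
# Assumes the OpenAI Python SDK; names and template are illustrative.
from openai import OpenAI

client = OpenAI()

def prompt_function(template: str):
    # Returns a callable that fills the template and sends it to the model.
    def run(text: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": template.format(input=text)}],
        )
        return resp.choices[0].message.content
    return run

# "Function name": trans_word, which translates the input into English and polishes it.
trans_word = prompt_function(
    "You are a translator. Translate the following text into English and improve its fluency:\n{input}"
)
print(trans_word("Este es un ejemplo de texto en español."))
```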
Models: ChatGPT & Claude 3
In this section, we cover the latest prompt engineering techniques for ChatGPT & Claude 3, including tips, applications, limitations, papers, and additional reading materials.
Models: Code-Llama
Code Llama is a family of large language models (LLMs), released by Meta, with the capability to accept text prompts and generate and discuss code. The release also includes two other variants (Code Llama Python and Code Llama Instruct) and different sizes (7B, 13B, 34B, and 70B).
Models: Flan
This paper explores the benefits of scaling instruction finetuning and how it improves performance across a variety of models (PaLM, T5), prompting setups (zero-shot, few-shot, CoT), and benchmarks (MMLU, TyDiQA). The exploration covers the following aspects: scaling the number of tasks (1.8K tasks), scaling model size, and finetuning on chain-of-thought data (9 datasets used).
Models: Gemini
In this guide, we provide an overview of the Gemini models and how to effectively prompt and use them. The guide also includes capabilities, tips, applications, limitations, papers, and additional reading materials related to the Gemini models.
Models: Gemini Advanced
Google recently introduced its latest chat-based AI product called Gemini Advanced. This AI system is a more capable version of Gemini (powered by their best-in-class multimodal model, Gemini Ultra 1.0), and it also replaces Bard.
Models: Gemini 1.5 Pro
Google introduces Gemini 1.5 Pro, a compute-efficient multimodal mixture-of-experts model. This AI model focuses on capabilities such as recalling and reasoning over long-form content. Gemini 1.5 Pro can reason over long documents potentially containing millions of tokens, including hours of video and audio...
Models: Gemma
Google DeepMind releases Gemma, a series of open language models inspired by the same research and technology used to create Gemini. The Gemma release includes 2B (trained on 2T tokens) and 7B (trained on 6T tokens) models, with both base and instruction-tuned checkpoints. The models are trained on a context length of 8192 tokens and generally outperform Llama 2 7B and Mistral 7B models on several benchmarks.
Models: GPT4
In this section, we cover the latest prompt engineering techniques for GPT-4, including tips, applications, limitations, and additional reading materials. More recently, OpenAI released GPT-4, a large multimodal model that accepts image and text inputs and emits text outputs. It achieves human-level performance on various professional and academic benchmarks.
Models: Llama 3
Meta recently introduced their new family of large language models (LLMs) called Llama 3. This release includes 8B and 70B parameter pre-trained and instruction-tuned models. Notably, Llama 3 8B (instruction-tuned) outperforms Gemma 7B and Mistral 7B Instruct. Llama 3 70B broadly outperforms Gemini Pro 1.5 and Claude 3 Sonnet but falls a bit behind Gemini Pro 1.5 on the MATH benchmark.
Models: Mistral 7B
In this guide, we provide an overview of the Mistral 7B LLM and how to prompt with it. It also includes tips, applications, limitations, papers, and additional reading materials related to Mistral 7B and finetuned models.
Models: Mixtral
In this guide, we provide an overview of the Mixtral 8x7B model, including prompts and usage examples. The guide also includes tips, applications, limitations, papers, and additional reading materials related to Mixtral 8x7B.
Models: Phi
In this guide, we provide an overview of Phi-2, a 2.7-billion-parameter language model, how to prompt Phi-2, and its capabilities. This guide also includes tips, applications, limitations, important references, and additional reading materials related to the Phi-2 LLM.
Models: Sora
OpenAI introduces Sora, its new text-to-video AI model. Sora can create videos up to a minute long featuring realistic and imaginative scenes from text instructions. OpenAI reports that its vision is to build AI systems that understand and simulate the physical world in motion and train models to solve problems requiring real-world interaction.
LLM Collection
This section consists of a collection and summary of notable and foundational LLMs.
Adversarial Prompting, Factuality & Biases
Adversarial prompting is an important topic in prompt engineering as it could help to understand the risks and safety issues involved with LLMs. It's also an important discipline to identify these risks and design techniques to address the issues.
LLM Agents
LLM-based agents, hereinafter also referred to as LLM agents for short, are LLM applications that can execute complex tasks through an architecture that combines LLMs with key modules like planning and memory. When building LLM agents, an LLM serves as the main controller or "brain" that controls the flow of operations needed to complete a task or user request. The LLM agent may require key modules such as planning, memory, and tool usage.
RAG for LLMs
There are many challenges when working with LLMs, such as domain knowledge gaps, factuality issues, and hallucination. Retrieval Augmented Generation (RAG) provides a solution to mitigate some of these issues by augmenting LLMs with external knowledge such as databases. RAG is particularly useful in knowledge-intensive scenarios or domain-specific applications that require continually updated knowledge.
LLM Reasoning
Over the last couple of years, large language models (LLMs) have made significant progress in a wide range of tasks. More recently, LLMs have shown the potential to exhibit reasoning abilities when scaled to a large enough size. Different types of reasoning are fundamental to intelligence but it's not fully understood how AI models can learn and harness this capability to solve complex problems. It is an area of huge focus and investment for many research labs.
RAG Faithfulness & In-Context Recall
This new paper by Machlab and Battle (2024) analyzes the in-context recall performance of different LLMs using several needle-in-a-haystack tests.
It shows that various LLMs recall facts at different lengths and placement depths. It finds that a model's recall performance is significantly affected by small changes in the prompt.
Reducing RAG Hallucination, Synthetic Data & ThoughtSculpt
The proposed RAG system combines a small language model with a very small retriever. It shows that RAG can enable deploying powerful LLM-powered systems in limited-resource settings while mitigating issues like hallucination and increasing the reliability of outputs.
Infini-Attention, LM-Guided CoT, LLM Trust & Tokenization
A new paper by Lee et al. (2024) proposes to improve reasoning in LLMs using small language models. It first applies knowledge distillation to a small LM with rationales generated by the large LM with the hope of narrowing the gap in reasoning capabilities. Essentially, the rationale is generated by the lightweight LM and the answer prediction is then left for the frozen large LM. This resource-efficient approach avoids the need to fine-tune the large model and instead offloads the rationale generation to the small language model.