Models: ChatGPT & Claude 3

ChatGPT Prompt Engineering

In this section, we cover the latest prompt engineering techniques for ChatGPT, including tips, applications, limitations, papers, and additional reading materials.

ChatGPT Introduction

ChatGPT is a new model trained by OpenAI that has the capability to interact in a conversational way. This model is trained to follow instructions in a prompt to provide appropriate responses in the context of a dialogue. ChatGPT can help with answering questions, suggesting recipes, writing lyrics in a certain style, generating code, and much more.

ChatGPT is trained using Reinforcement Learning from Human Feedback (RLHF). While this model is a lot more capable than previous GPT iterations (and also trained to reduce harmful and untruthful outputs), it still comes with limitations. Let's cover some of the capabilities and limitations with concrete examples.

You can use the research preview of ChatGPT, but for the examples below we will use the Chat mode on the OpenAI Playground.

Reviewing The Conversation Task

In one of the previous guides, we covered a bit about conversation capabilities and role prompting. We covered how to instruct the LLM to have a conversation in a specific style, with a specific intent, behavior, and identity.

Let's review our previous basic example where we created a conversational system that's able to generate more technical and scientific responses to questions.

Prompt:

The following is a conversation with an AI research assistant. The assistant tone is technical and scientific.Human: Hello, who are you?AI: Greeting! I am an AI research assistant. How can I help you today?Human: Can you tell me about the creation of black holes?AI:

From the example above, you can see two important components:

the intent or explanation of what the chatbot is
the identity which instructs the style or tone the chatbot will use to respond

The simple example above works well with the text completion APIs that uses text-davinci-003. More recently, OpenAI announced the ChatGPT APIs, which is a more powerful and cheaper model called gpt-3.5-turbo was specifically built for this type of functionality (chat completions). In fact, OpenAI recommends this as their best model even for non-chat use cases. Other benefits of using the ChatGPT APIs are significant cost reduction (90%) and efficiency.

Big companies like Snap Inc. and Instacart are already integrating conversational features powered by ChatGPT on their products that range from personalized recommendations to open-ended shopping goals.

Conversations with ChatGPT

Multi-turn Conversations

To begin demonstrating the capabilities of ChatGPT, we will use the chatbot assistant example above and discuss the results. Compared to text-davinci-003, the gpt-3.5-turbo model that powers ChatGPT uses a chat format as input. The model expects a series of messages as input and uses those to generate a response.

Input:

SYSTEM: You are an AI research assistant. You use a tone that is technical and scientific.USER: Hello, who are you?ASSISTANT: Greeting! I am an AI research assistant. How can I help you today?USER: Can you tell me about the creation of black holes?

Output:

ASSISTANT: Of course! Black holes are created when a massive star runs out of fuel and collapses under its own gravity. This collapse causes the star to become incredibly dense, with a gravitational field so strong that not even light can escape it. This point of no return is known as the event horizon. Once a star crosses the event horizon, it is considered a black hole...

Note that in the example above, I have simplified the input and output but the ChatGPT chat completion API requires messages to be in a specific format. I have added a snapshot below of how this example would look using the Chat Mode in the OpenAI Playground:

The more formal API call for our example would look something like the example below:

import openai openai.ChatCompletion.create(  model="gpt-3.5-turbo",  messages=[        {"role": "system", "content": "You are an AI research assistant. You use a tone that is technical and scientific."},        {"role": "user", "content": "Hello, who are you?"},        {"role": "assistant", "content": "Greeting! I am an AI research assistant. How can I help you today?"},        {"role": "user", "content": "Can you tell me about the creation of black holes?"}    ])

In fact, the way developers interact with ChatGPT in the future is expected to be done via the Chat Markup Language (ChatML for short).

Single-turn tasks

The chat format enables multi-turn conversations but it also supports single-turn tasks similar to what we used with text-davinci-003. This means we can use ChatGPT to perform similar tasks as what we have demonstrated for the original GPT models. For example, let's try to perform the following question answering task using ChatGPT:

Input:

USER: Answer the question based on the context below. Keep the answer short and concise. Respond "Unsure about answer" if not sure about the answer.Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.Question: What was OKT3 originally sourced from?Answer:

Output:

ASSISTANT: Mice.

Keep in mind that I am adding the USER and ASSISTANT labels to better demonstrate how the task can be performed using ChatGPT. Here is the example using the Playground:

More formally, this is the API call (I've only included the message component of the request):

CONTENT = """Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer. Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use. Question: What was OKT3 originally sourced from? Answer:""" response = openai.ChatCompletion.create(    model="gpt-3.5-turbo",    messages=[        {"role": "user", "content": CONTENT},    ],    temperature=0,)

Instructing Chat Models

According to the official OpenAI docs, snapshots of the gpt-3.5-turbo model will also be made available. For example, we can access the snapshot from March 1 gpt-3.5-turbo-0301. This allows developers to opt for specific model versions. This also means that the best practices for instructing models may change from version to version.

The current recommendation for gpt-3.5-turbo-0301 is to add instructions in the user message as opposed to the available system message.

Claude 3

Anthropic announces Claude 3, their new family of models that include Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus.

Claude 3 Opus (the strongest model) is reported to outperform GPT-4 and all other models on common benchmarks like MMLU and HumanEval.

Results and Capabilities

Claude 3 capabilities include advanced reasoning, basic mathematics, analysis, data extraction, forecasting, content creation, code generation, and converting in non-English languages like Spanish, Japanese, and French. The table below demonstrates how Claude 3 compares with other models on several benchmarks with Claude 3 Opus outperforming all the mentioned models:

Claude 3 Haiku is the fastest and most cost-effective model of the series. Claude 3 Sonnet is 2x faster than previous iterations of Claude and Opus is as fast as Claude 2.1 with more superior capabilities.

The Claude 3 models offer support for 200K context windows but can be extended to 1M tokens to select customers. Claude 3 Opus achieved near-perfect recall on the Needle In A Haystack (NIAH) evaluation which measures the model's ability to recall information in a large corpus and effectively process long context prompts.

The models also have strong vision capabilities for processing formats like photos, charts, and graphs.

Anthropic also claim that these models have a more nuanced understanding of requests and make fewer refusals. Opus also shows significant improvements in factual question answering in open-ended questions while reducing incorrect answers or hallucinations. Claude 3 models are also better than the Claude 2 models at producing structured outputs like JSON objects.