Reducing Hesitation and Uncertainty

Some models struggle to confidently generate responses, leading to hesitation, incomplete answers, or repeated disclaimers. For example, consider this prompt and response:

Prompt: "Tell me about climate change."

Model Response: “Well, there are many aspects to climate change. Some people think it’s caused by humans, and others think it’s just natural. It’s hard to say exactly.”

What went wrong?

The prompt did not provide enough context for confident decision-making
The model allowed too much randomness in token selection
The prompt was ambiguous in the response it expected

How it showed up in metrics

High Uncertainty: The model hesitated in its response
High Prompt Perplexity: The model struggled with predicting the next token
Mid-range Instruction Adherence: The model understood the instructions but lacked decisiveness

Improvements and solutions

For the following improvements, we will be showing how we could change a simple prompt script like the below example:

import os
from galileo.openai import openai
from dotenv import load_dotenv
load_dotenv()

client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

prompt = "Tell me about climate change."
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "system", "content": prompt}],
)
print(response.choices[0].message.content.strip())

Provide Stronger Context in Prompts

Include explicit guiding statements, for example:

  prompt = """Tell me about climate change.
    Assume the following is true:
    1. Climate change is driven by human activity.
    2. Provide an explanation based on scientific evidence."""

This should reduce the uncertaincy and perplexity in your metrics on Galileo.

Adjust Model Sampling Parameters

Lower temperature to make the model more deterministic, for example:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a joke."}],
    temperature=0.3  # Reduced temperature for more predictable responses
)

Use top-k sampling to limit options and prevent hesitation, for example:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me about climate change."}],
    # This is often how top k would be set.
    # In practice, ChatGPT does not allow modifying top-k, but other models do
    top_k=50  # Limits token selection to the top 50 choices
)

Lowering the temperature and decreasing top_k both generally increase the prompt adherence.

Modify Prompt Structure

Use direct phrasing to force a single, clear response, for example:

python example.py prompt = "Provide a single, definitive explanation of climate change in two sentences."  typescript example.ts const prompt = "Provide a single, definitive explanation of climate change in two sentences.";

Avoid prompts that allow multiple equally valid answers, for example avoiding confusing by adding the source of the opinion we care about:

prompt = "What are the primary causes of climate change according to scientists?"

Apply Uncertainty-Based Filtering

Automatically reject responses with an Uncertainty score above a set threshold

Documentation Index

​What went wrong?

​How it showed up in metrics

​Improvements and solutions

What went wrong?

How it showed up in metrics

Improvements and solutions