> ## Documentation Index
> Fetch the complete documentation index at: https://docs.galileo.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Uncertainty

> Measure and analyze model confidence in AI outputs using Galileo's Uncertainty Metric to identify potential hallucinations and improve response quality

export const DefinitionCard = ({children}) => {
  return <Card variant="secondary">
    <div style={{
    padding: '0.5rem',
    border: '5px solid var(--primary-light)',
    borderRadius: '0.5rem',
    fontSize: '1.3rem',
    lineHeight: '1.4',
    boxShadow: '0 0 10px 10px var(--primary-light)'
  }}>
        {children}
      </div>

</Card>;
};

<DefinitionCard>
  <strong>Uncertainty</strong> measures how much a model is deciding randomly between multiple ways of continuing the output, indicating the model's confidence level in its responses.
</DefinitionCard>

Uncertainty is measured at both the token level and the response level:

* **Token-level uncertainty**: Indicates how confident the model is about each individual token given the preceding tokens
* **Response-level uncertainty**: Represents the maximum token-level uncertainty across all tokens in the model's response

Higher uncertainty scores indicate the model is less certain about its output, which often correlates with:

* Hallucinations
* Made-up facts
* Citations
* Areas where the model is struggling with the content

## Calculation method

Uncertainty is calculated using log probabilities from the model:

<Steps>
  <Step title="Token Analysis">
    For each token in the sequence, the model calculates its confidence in predicting that token based on all preceding tokens in the context.
  </Step>

  <Step title="Response Aggregation">
    The system identifies the highest uncertainty value across all tokens in the response to determine the overall response-level uncertainty.
  </Step>

  <Step title="Model Integration">
    The calculation leverages log probabilities from OpenAI's Davinci models or Chat Completion models, available through both OpenAI and Azure platforms.
  </Step>
</Steps>

<Warning>
  <strong>Uncertainty canonly be calculated with LLM integrations that provide log probabilities</strong>:

  ### OpenAI

  * Any Evaluate runs created from the Galileo Playground or with `pq.run(...)`, using the chosen model
  * Any Evaluate workflow runs using `davinci-001`
  * Any Observe workflows using `davinci-001`

  ### Azure OpenAI

  * Any Evaluate runs created from the Galileo Playground or with `pq.run(...)`, using the chosen model
  * Any Evaluate workflow runs using `text-davinci-003` or `text-curie-001`, if available in your Azure deployment
  * Any Observe workflows using `text-davinci-003` or `text-curie-001`, if available in your Azure deployment
</Warning>

<Note>
  To calculate the Uncertainty metric, we require having `text-curie-001` or `text-davinci-003` models available in your Azure environment to fetch log probabilities. For Galileo's Guardrail metrics that rely on GPT calls (Factuality and Groundedness), we require using `0613` or above versions of `gpt-3-5-turbo`.
</Note>

## Optimizing your AI system

<Card>
  <div style={{display: 'flex', alignItems: 'center', gap: '0.5rem', marginBottom: '0.75rem'}}>
    <div style={{fontSize: '1.25rem', color: 'var(--primary-color)'}}>
      <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round">
        <path d="M12 20h9" />

        <path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4L16.5 3.5z" />
      </svg>
    </div>

    <h3 style={{margin: 0, fontSize: '1.25rem', fontWeight: '600'}}>Addressing High Uncertainty</h3>
  </div>

  When responses show high uncertainty scores, your model is likely struggling with the content. To improve your system:

  <div style={{ marginTop: "1rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Identify uncertainty patterns:</strong> Analyze where in responses uncertainty spikes occur.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Enhance knowledge sources:</strong> Provide better context or retrieval results for topics with high uncertainty.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Refine prompts:</strong> Add more specific instructions or constraints for areas where the model shows uncertainty.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Consider model selection:</strong> Some models may be more confident in specific domains.
  </div>
</Card>

## Best practices

<CardGroup cols={2}>
  <Card title="Monitor Uncertainty Hotspots" icon="temperature-high">
    Track tokens and phrases that consistently trigger high uncertainty to identify knowledge gaps.
  </Card>

  <Card title="Implement Confidence Thresholds" icon="filter">
    Set uncertainty thresholds to flag or reject responses that exceed acceptable uncertainty levels.
  </Card>

  <Card title="Compare Across Models" icon="chart-line">
    Evaluate how different models perform on the same inputs to identify which ones have lower uncertainty in your domain.
  </Card>

  <Card title="Combine with Factual Metrics" icon="scale-balanced">
    Use Uncertainty alongside Correctness metrics to identify correlations between model confidence and factual accuracy.
  </Card>
</CardGroup>

<Note>
  When analyzing Uncertainty, remember that some level of uncertainty is normal and even desirable in certain contexts. Very low uncertainty might indicate the model is being overly deterministic or repeating memorized patterns rather than reasoning about the content.
</Note>
