> ## Documentation Index
> Fetch the complete documentation index at: https://docs.galileo.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Prompt Perplexity

> Measure and optimize prompt quality using Galileo's Prompt Perplexity Metric to improve model performance and response generation

export const DefinitionCard = ({children}) => {
  return <Card variant="secondary">
    <div style={{
    padding: '0.5rem',
    border: '5px solid var(--primary-light)',
    borderRadius: '0.5rem',
    fontSize: '1.3rem',
    lineHeight: '1.4',
    boxShadow: '0 0 10px 10px var(--primary-light)'
  }}>
        {children}
      </div>

</Card>;
};

export const Scale = ({low, mid, high, lowLabel = "Low", midLabel = "Mid", highLabel = "High", lowDescription, midDescription, highDescription, midColor = "yellow", inverted = false}) => {
  const lowColor = inverted ? "green" : "red";
  const highColor = inverted ? "red" : "green";
  const gradientId = inverted ? "greenToRed" : "redToGreen";
  return <div style={{
    display: 'flex',
    flexDirection: 'column',
    width: '100%'
  }}>
      <svg width="100%" height="30" style={{
    marginBottom: '8px'
  }}>
        <defs>
          <linearGradient id={gradientId} x1="0%" y1="0%" x2="100%" y2="0%">
            <stop offset="0%" stopColor={lowColor} />
            <stop offset="100%" stopColor={highColor} />
          </linearGradient>
        </defs>
        <rect width="100%" height="100%" fill={`url(#${gradientId})`} rx="4" ry="4" />
      </svg>

      <div style={{
    display: 'flex',
    justifyContent: 'space-between',
    width: '100%',
    marginBottom: '16px'
  }}>
        <p style={{
    margin: 0,
    fontSize: '12px'
  }}>{low}</p>
        {mid && <p style={{
    margin: 0,
    fontSize: '12px'
  }}>{mid}</p>}
        <p style={{
    margin: 0,
    fontSize: '12px'
  }}>{high}</p>
      </div>

      <div style={{
    display: 'flex',
    justifyContent: 'space-between',
    width: '100%'
  }}>
        <div style={{
    maxWidth: '40%'
  }}>
          <div style={{
    display: 'flex',
    alignItems: 'center',
    marginBottom: '4px'
  }}>
            <div style={{
    width: '12px',
    height: '12px',
    backgroundColor: lowColor,
    borderRadius: '50%',
    marginRight: '8px'
  }}></div>
            <p style={{
    margin: 0,
    fontWeight: 'bold',
    fontSize: '14px'
  }}>{lowLabel}</p>
          </div>
          {lowDescription && <p style={{
    margin: 0,
    fontSize: '14px',
    color: '#666',
    maxWidth: '250px',
    lineHeight: '1.4'
  }}>{lowDescription}</p>}
        </div>
        {mid && <div style={{
    maxWidth: '40%',
    textAlign: 'center'
  }}>
            <div style={{
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'center',
    marginBottom: '4px'
  }}>
              <div style={{
    width: '12px',
    height: '12px',
    backgroundColor: midColor,
    borderRadius: '50%',
    marginRight: '8px'
  }}></div>
              <p style={{
    margin: 0,
    fontWeight: 'bold',
    fontSize: '14px'
  }}>{midLabel}</p>
            </div>
            {midDescription && <p style={{
    margin: 0,
    fontSize: '14px',
    color: '#666',
    maxWidth: '250px',
    textAlign: 'center',
    lineHeight: '1.4'
  }}>{midDescription}</p>}
          </div>}


        <div style={{
    maxWidth: '40%',
    textAlign: 'right'
  }}>
          <div style={{
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'flex-end',
    marginBottom: '4px'
  }}>
            <p style={{
    margin: 0,
    fontWeight: 'bold',
    fontSize: '14px'
  }}>{highLabel}</p>
            <div style={{
    width: '12px',
    height: '12px',
    backgroundColor: highColor,
    borderRadius: '50%',
    marginLeft: '8px'
  }}></div>
          </div>
          {highDescription && <p style={{
    margin: 0,
    fontSize: '14px',
    color: '#666',
    maxWidth: '250px',
    marginLeft: 'auto',
    lineHeight: '1.4'
  }}>{highDescription}</p>}
        </div>
      </div>
    </div>;
};

<DefinitionCard>
  <strong>Prompt Perplexity</strong> measures how predictable or familiar a prompt is to a language model, using the log probabilities provided by the model.
</DefinitionCard>

## How it works

Prompt Perplexity is a continuous metric ranging from 0 to infinity:

<Scale low="0" lowLabel="Low Perplexity" high="∞" highLabel="High Perplexity" lowDescription="The model is highly certain about predicting tokens in the prompt" midDescription="The model has moderate certainty about predicting tokens" highDescription="The model finds the prompt less predictable" />

This metric helps evaluate how well your prompts are tuned to your chosen model, which research has shown correlates with better response generation.

## Calculation method

Prompt Perplexity is computed through a specific mathematical process that measures how difficult it is for the model to predict each token in the prompt:

<Steps>
  <Step title="Calculate Token Probabilities" description="For each token in the prompt:">
    1. The model processes the prompt token by token
    2. For each position, it computes the probability distribution over the next token
    3. We extract the log probability of the actual next token that appears in the prompt
  </Step>

  <Step title="Average Log Probabilities" description="Combine the token-level information:">
    1. Sum all the log probabilities across the entire prompt
    2. Divide by the total number of tokens to get the average
    3. This gives us the average log probability per token
  </Step>

  <Step title="Apply Exponential" description="Transform to final perplexity score:">
    1. Take the negative of the average log probability
    2. Apply the exponential function to this value
    3. This converts log probabilities to a more interpretable perplexity score
  </Step>

  <Step title="Final Formula" description="The complete calculation:">
    Perplexity = exp(-average(log\_probabilities))
  </Step>
</Steps>

<Card>
  <div style={{display: 'flex', alignItems: 'center', gap: '0.5rem', marginBottom: '0.75rem'}}>
    <div style={{fontSize: '1.25rem', color: 'var(--primary-color)'}}>
      <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round">
        <path d="M12 22c5.523 0 10-4.477 10-10S17.523 2 12 2 2 6.477 2 12s4.477 10 10 10z" />

        <path d="m9 12 2 2 4-4" />
      </svg>
    </div>

    <h3 style={{margin: 0, fontSize: '1.25rem', fontWeight: '600'}}>Key Properties</h3>
  </div>

  Understanding the mathematical properties of perplexity:

  <div style={{ marginTop: "1rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Range:</strong> Always positive, with lower values indicating better predictability
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Scale:</strong> Exponential scale means small changes in log probabilities can lead to large perplexity differences
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Length independence:</strong> Using the average makes the metric comparable across prompts of different lengths
  </div>
</Card>

<Warning>
  ### Availability

  Prompt Perplexity can only be calculated with LLM integrations that provide log probabilities:

  ### *OpenAI*

  * Any Evaluate runs created from the Galileo Playground or with `pq.run(...)`, using the chosen model
  * Any Evaluate workflow runs using `davinci-001`
  * Any Observe workflows using `davinci-001`

  ### *Azure OpenAI*

  * Any Evaluate runs created from the Galileo Playground or with `pq.run(...)`, using the chosen model
  * Any Evaluate workflow runs using `text-davinci-003` or `text-curie-001`, if available in your Azure deployment
  * Any Observe workflows using `text-davinci-003` or `text-curie-001`, if available in your Azure deployment
</Warning>

<Note>
  To calculate the Prompt Perplexity metric, we require models that provide log probabilities. This typically includes older models like `davinci-001`, `text-davinci-003`, or `text-curie-001`.
</Note>

## Understanding perplexity

<Card>
  <div style={{display: 'flex', alignItems: 'center', gap: '0.5rem', marginBottom: '0.75rem'}}>
    <div style={{fontSize: '1.25rem', color: 'var(--primary-color)'}}>
      <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round">
        <path d="M12 22c5.523 0 10-4.477 10-10S17.523 2 12 2 2 6.477 2 12s4.477 10 10 10z" />

        <path d="m9 12 2 2 4-4" />
      </svg>
    </div>

    <h3 style={{margin: 0, fontSize: '1.25rem', fontWeight: '600'}}>Interpreting Perplexity Scores</h3>
  </div>

  Lower Prompt Perplexity scores generally indicate better prompt quality:

  <div style={{ marginTop: "1rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Lower perplexity:</strong> Suggests your model is better tuned toward your data, as it can better predict the next token.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Research findings:</strong> The paper "Demystifying Prompts in Language Models via Perplexity Estimation" has shown that lower perplexity values in prompts lead to better outcomes in the generated responses.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Monitoring value:</strong> Tracking perplexity can help you iteratively improve your prompts.
  </div>
</Card>

## Optimizing your AI system

<CardGroup cols={2}>
  <Card title="Use Familiar Language" icon="language">
    Phrase prompts using language patterns similar to the model's training data to reduce perplexity.
  </Card>

  <Card title="Provide Clear Context" icon="circle-info">
    Include sufficient context that helps the model predict what comes next in the prompt.
  </Card>

  <Card title="Avoid Unusual Formatting" icon="text-slash">
    Use standard formatting and avoid unusual syntax that might confuse the model.
  </Card>

  <Card title="Test Variations" icon="vials">
    Experiment with different phrasings of the same prompt to find lower perplexity versions.
  </Card>
</CardGroup>

<Note>
  When optimizing for Prompt Perplexity, remember that the goal isn't always to minimize perplexity at all costs. Sometimes a slightly higher perplexity prompt might be necessary to communicate specific or technical requirements. The key is finding the right balance for your use case.
</Note>
