# Prompt Perplexity

> Measure and optimize prompt quality using Galileo's Prompt Perplexity Metric to improve model performance and response generation

Prompt Perplexity measures how predictable or familiar a prompt is to a language model, using the log probabilities provided by the model.

## How it works

Prompt Perplexity is a continuous metric with no upper bound. Lower scores mean the model finds the prompt more predictable. Because a token's log probability is never above 0, the formula in the next section cannot produce a score below 1.

This metric helps evaluate how well your prompts are tuned to your chosen model, which research has shown correlates with better response generation.

## Calculation method

Prompt Perplexity is computed from how difficult it is for the model to predict each token in the prompt.

First, collect a log probability for every token:

1. The model processes the prompt token by token
2. For each position, it computes the probability distribution over the next token
3. We extract the log probability of the actual next token that appears in the prompt

Next, average the log probabilities:

1. Sum all the log probabilities across the entire prompt
2. Divide by the total number of tokens to get the average
3. This gives us the average log probability per token

Finally, convert the average into a perplexity score:

1. Take the negative of the average log probability
2. Apply the exponential function to this value
3. This converts log probabilities to a more interpretable perplexity score

`Perplexity = exp(-average(log_probabilities))`
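
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Galileo's implementation: the function name `prompt_perplexity` and the example log probabilities are assumptions made for this sketch, and the values are taken to be natural-log probabilities that the model has already returned for each prompt token.

```python
import math
from typing import Sequence


def prompt_perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-average(log_probabilities)) for one prompt.

    Hypothetical helper for illustration; expects natural-log probabilities.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log probability")
    # Sum the log probabilities and divide by the token count.
    average_log_prob = sum(token_log_probs) / len(token_log_probs)
    # Negate the average, then apply the exponential function.
    return math.exp(-average_log_prob)


# Made-up log probabilities for a four-token prompt
# (token probabilities 0.5, 0.25, 0.5, 0.25).
example_log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.25)]
score = prompt_perplexity(example_log_probs)
print(round(score, 4))  # 2.8284, i.e. 2 ** 1.5
```

The average log probability here is `-1.5 * ln(2)`, so the score is `2 ** 1.5`, which equals the inverse of the geometric mean of the four token probabilities.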

### Key properties

Understanding the mathematical properties of perplexity:

* **Range:** Always positive, with lower values indicating better predictability.
* **Scale:** Exponential scale means small changes in log probabilities can lead to large perplexity differences.
* **Length independence:** Using the average makes the metric comparable across prompts of different lengths.
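
Each of these properties can be checked numerically. The snippet below is a small sketch under stated assumptions: `perplexity` is a hypothetical helper that applies the formula `exp(-average(log_probabilities))`, and all log probabilities are invented for the demonstration.

```python
import math


def perplexity(log_probs):
    # Hypothetical helper: exp of the negated average log probability.
    return math.exp(-sum(log_probs) / len(log_probs))


# Length independence: a prompt ten times longer with the same per-token
# log probabilities gets the same score, because the sum is averaged.
short_prompt = [-0.5, -1.5]
long_prompt = short_prompt * 10
same_score = math.isclose(perplexity(short_prompt), perplexity(long_prompt))

# Exponential scale: lowering every log probability by 1.0 multiplies
# the perplexity by e (about 2.718).
shifted = [lp - 1.0 for lp in short_prompt]
ratio = perplexity(shifted) / perplexity(short_prompt)

# Range: log probabilities are at most 0, so the score stays positive;
# a prompt predicted with certainty scores exactly 1.
certain = perplexity([0.0, 0.0, 0.0])

print(same_score, round(ratio, 3), certain)
```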

### Availability

Prompt Perplexity can only be calculated with LLM integrations that provide log probabilities:

### *OpenAI*

* Any Evaluate runs created from the Galileo Playground or with `pq.run(...)`, using the chosen model
* Any Evaluate workflow runs using `davinci-001`
* Any Observe workflows using `davinci-001`

### *Azure OpenAI*

* Any Evaluate runs created from the Galileo Playground or with `pq.run(...)`, using the chosen model
* Any Evaluate workflow runs using `text-davinci-003` or `text-curie-001`, if available in your Azure deployment
* Any Observe workflows using `text-davinci-003` or `text-curie-001`, if available in your Azure deployment

To calculate the Prompt Perplexity metric, we require models that provide log probabilities. This typically includes older models like `davinci-001`, `text-davinci-003`, or `text-curie-001`.

## Understanding perplexity

### Interpreting perplexity scores

Lower Prompt Perplexity scores generally indicate better prompt quality:

* **Lower perplexity:** Suggests your model is better tuned toward your data, as it can better predict the next token.
* **Research findings:** The paper "Demystifying Prompts in Language Models via Perplexity Estimation" has shown that lower perplexity values in prompts lead to better outcomes in the generated responses.
* **Monitoring value:** Tracking perplexity can help you iteratively improve your prompts.

## Optimizing your AI system

* Phrase prompts using language patterns similar to the model's training data to reduce perplexity.
* Include sufficient context that helps the model predict what comes next in the prompt.
* Use standard formatting and avoid unusual syntax that might confuse the model.
* Experiment with different phrasings of the same prompt to find lower perplexity versions.

When optimizing for Prompt Perplexity, remember that the goal isn't always to minimize perplexity at all costs. Sometimes a slightly higher perplexity prompt might be necessary to communicate specific or technical requirements. The key is finding the right balance for your use case.
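
As a rough sketch of experimenting with different phrasings, the snippet below ranks candidate prompts by perplexity. Everything in it is assumed for illustration: the `perplexity` helper, the phrasing names, and the per-token log probabilities, which in practice would come from a model that exposes them.

```python
import math


def perplexity(log_probs):
    # Hypothetical helper: exp(-average(log_probabilities)).
    return math.exp(-sum(log_probs) / len(log_probs))


# Made-up per-token log probabilities for three phrasings of one prompt.
candidates = {
    "phrasing_a": [-2.1, -0.9, -1.4, -3.0],
    "phrasing_b": [-1.2, -0.7, -0.8, -1.1, -0.9],
    "phrasing_c": [-0.4, -2.8, -2.2],
}

scores = {name: perplexity(lps) for name, lps in candidates.items()}

# Sort from lowest (most predictable) to highest perplexity.
ranked = sorted(scores, key=scores.get)
for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

A ranking like this is a starting point for review rather than an automatic choice, since a higher-perplexity phrasing may still be the one that states a technical requirement precisely.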