> ## Documentation Index
> Fetch the complete documentation index at: https://docs.galileo.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Instruction Adherence

> Assess instruction adherence in AI outputs using Galileo Guardrail Metrics to ensure prompt-driven models generate precise and actionable results

export const DefinitionCard = ({children}) => {
  return <Card variant="secondary">
    <div style={{
    padding: '0.5rem',
    border: '5px solid var(--primary-light)',
    borderRadius: '0.5rem',
    fontSize: '1.3rem',
    lineHeight: '1.4',
    boxShadow: '0 0 10px 10px var(--primary-light)'
  }}>
        {children}
      </div>

</Card>;
};

export const Scale = ({low, mid, high, lowLabel = "Low", midLabel = "Mid", highLabel = "High", lowDescription, midDescription, highDescription, midColor = "yellow", inverted = false}) => {
  const lowColor = inverted ? "green" : "red";
  const highColor = inverted ? "red" : "green";
  const gradientId = inverted ? "greenToRed" : "redToGreen";
  return <div style={{
    display: 'flex',
    flexDirection: 'column',
    width: '100%'
  }}>
      <svg width="100%" height="30" style={{
    marginBottom: '8px'
  }}>
        <defs>
          <linearGradient id={gradientId} x1="0%" y1="0%" x2="100%" y2="0%">
            <stop offset="0%" stopColor={lowColor} />
            <stop offset="100%" stopColor={highColor} />
          </linearGradient>
        </defs>
        <rect width="100%" height="100%" fill={`url(#${gradientId})`} rx="4" ry="4" />
      </svg>

      <div style={{
    display: 'flex',
    justifyContent: 'space-between',
    width: '100%',
    marginBottom: '16px'
  }}>
        <p style={{
    margin: 0,
    fontSize: '12px'
  }}>{low}</p>
        {mid && <p style={{
    margin: 0,
    fontSize: '12px'
  }}>{mid}</p>}
        <p style={{
    margin: 0,
    fontSize: '12px'
  }}>{high}</p>
      </div>

      <div style={{
    display: 'flex',
    justifyContent: 'space-between',
    width: '100%'
  }}>
        <div style={{
    maxWidth: '40%'
  }}>
          <div style={{
    display: 'flex',
    alignItems: 'center',
    marginBottom: '4px'
  }}>
            <div style={{
    width: '12px',
    height: '12px',
    backgroundColor: lowColor,
    borderRadius: '50%',
    marginRight: '8px'
  }}></div>
            <p style={{
    margin: 0,
    fontWeight: 'bold',
    fontSize: '14px'
  }}>{lowLabel}</p>
          </div>
          {lowDescription && <p style={{
    margin: 0,
    fontSize: '14px',
    color: '#666',
    maxWidth: '250px',
    lineHeight: '1.4'
  }}>{lowDescription}</p>}
        </div>
        {mid && <div style={{
    maxWidth: '40%',
    textAlign: 'center'
  }}>
            <div style={{
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'center',
    marginBottom: '4px'
  }}>
              <div style={{
    width: '12px',
    height: '12px',
    backgroundColor: midColor,
    borderRadius: '50%',
    marginRight: '8px'
  }}></div>
              <p style={{
    margin: 0,
    fontWeight: 'bold',
    fontSize: '14px'
  }}>{midLabel}</p>
            </div>
            {midDescription && <p style={{
    margin: 0,
    fontSize: '14px',
    color: '#666',
    maxWidth: '250px',
    textAlign: 'center',
    lineHeight: '1.4'
  }}>{midDescription}</p>}
          </div>}


        <div style={{
    maxWidth: '40%',
    textAlign: 'right'
  }}>
          <div style={{
    display: 'flex',
    alignItems: 'center',
    justifyContent: 'flex-end',
    marginBottom: '4px'
  }}>
            <p style={{
    margin: 0,
    fontWeight: 'bold',
    fontSize: '14px'
  }}>{highLabel}</p>
            <div style={{
    width: '12px',
    height: '12px',
    backgroundColor: highColor,
    borderRadius: '50%',
    marginLeft: '8px'
  }}></div>
          </div>
          {highDescription && <p style={{
    margin: 0,
    fontSize: '14px',
    color: '#666',
    maxWidth: '250px',
    marginLeft: 'auto',
    lineHeight: '1.4'
  }}>{highDescription}</p>}
        </div>
      </div>
    </div>;
};

<DefinitionCard>
  <strong>Instruction Adherence</strong> measures whether a model followed or adhered to the system or prompt instructions when generating a response.
</DefinitionCard>

## How it works

This metric is particularly valuable for uncovering hallucinations where the model is ignoring instructions, which can lead to responses that don't meet user requirements or business rules.

Here's a scale that shows the relationship between Instruction Adherence and the potential impact on your AI system:

<Scale low="0" lowLabel="Low Adherence" high="1" highLabel="High Adherence" lowDescription="The model ignored its instructions when generating its response." midDescription="The model partially followed its instructions when generating its response." highDescription="The model followed its instructions when generating its response." />

## Calculation method

Instruction Adherence is computed through a multi-step process:

<Steps>
  <Step title="Model Evaluation">
    The system sends multiple evaluation requests to OpenAI's GPT4o model to analyze whether the response follows the provided instructions.
  </Step>

  <Step title="Analysis Process">
    A specialized chain-of-thought prompt guides the model through a detailed evaluation of how well the response adheres to the specific instructions given.
  </Step>

  <Step title="Multiple Assessments">
    The system requests and collects multiple distinct responses to ensure a robust evaluation through consensus.
  </Step>

  <Step title="Result Generation">
    Each evaluation produces both a detailed explanation of the reasoning and a binary judgment (yes/no) on instruction adherence.
  </Step>

  <Step title="Score Calculation">
    The final score is computed as the ratio of positive ('yes') responses to the total number of evaluation responses.
  </Step>
</Steps>

We also surface one of the generated explanations, always choosing one that aligns with the majority judgment among the responses.

<Note>
  This metric is computed by prompting an LLM multiple times, and thus requires additional LLM calls to compute, which may impact usage and billing.
</Note>

## Understanding instruction adherence

<Card>
  <div style={{display: 'flex', alignItems: 'center', gap: '0.5rem', marginBottom: '0.75rem'}}>
    <div style={{fontSize: '1.25rem', color: 'var(--primary-color)'}}>
      <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round">
        <path d="M12 22c5.523 0 10-4.477 10-10S17.523 2 12 2 2 6.477 2 12s4.477 10 10 10z" />

        <path d="m9 12 2 2 4-4" />
      </svg>
    </div>

    <h3 style={{margin: 0, fontSize: '1.25rem', fontWeight: '600'}}>Differentiating from Context Adherence</h3>
  </div>

  It's important to understand the distinction between related metrics:

  <div style={{ marginTop: "1rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Instruction Adherence:</strong> Measures whether the response follows the instructions in your prompt template.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Context Adherence:</strong> Measures whether the response adheres to the context provided (e.g., your retrieved documents).
  </div>
</Card>

## Optimizing your AI system

<Card>
  <div style={{display: 'flex', alignItems: 'center', gap: '0.5rem', marginBottom: '0.75rem'}}>
    <div style={{fontSize: '1.25rem', color: 'var(--primary-color)'}}>
      <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2" strokeLinecap="round" strokeLinejoin="round">
        <path d="M12 20h9" />

        <path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4L16.5 3.5z" />
      </svg>
    </div>

    <h3 style={{margin: 0, fontSize: '1.25rem', fontWeight: '600'}}>Addressing Low Instruction Adherence</h3>
  </div>

  When a response has a low Instruction Adherence score, the model likely ignored its instructions. To improve your system:

  <div style={{ marginTop: "1rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Flag and examine non-compliant responses:</strong> Identify patterns in responses that don't follow instructions.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Experiment with prompt engineering:</strong> Test different prompt formulations to find versions the model is more likely to adhere to.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Implement guardrails:</strong> Take precautionary measures to prevent non-compliant responses from reaching end users.
  </div>

  <div style={{ marginTop: "0.75rem", paddingTop: "0.75rem", borderTop: "1px solid rgba(209, 213, 219, 0.33)" }}>
    <strong>Consider model selection:</strong> Some models may be better at following instructions than others.
  </div>
</Card>

## Best practices

<CardGroup cols={2}>
  <Card title="Clarify Instructions" icon="pen-to-square">
    Write clear, specific instructions without ambiguity or contradictions to improve adherence rates.
  </Card>

  <Card title="Prioritize Critical Instructions" icon="list-check">
    Place the most important instructions prominently in your prompt and consider repeating them for emphasis.
  </Card>

  <Card title="Monitor Across Models" icon="chart-line">
    Compare Instruction Adherence scores across different LLMs to identify which models best follow your specific instructions.
  </Card>

  <Card title="Implement Feedback Loops" icon="rotate">
    Use low-adherence examples to refine your prompts and create test cases for future prompt iterations.
  </Card>
</CardGroup>

<Note>
  When optimizing for Instruction Adherence, balance strict adherence with allowing the model some flexibility. Overly rigid instructions may limit the model's ability to provide helpful responses in edge cases.
</Note>
