# Preventing Out of Context Information

> Learn how to prevent out of context information from being generated by your AI models

Picture this scenario: You've built a RAG system to answer questions about famous landmarks using your carefully curated knowledge base. A user asks, "When was the Eiffel Tower completed?" Your system retrieves a relevant document:

```output theme={null}
The Eiffel Tower is an iron lattice tower located in Paris, France.
It was designed by Gustave Eiffel.
```

The response comes back:

```output wrap theme={null}
The Eiffel Tower was completed in 1889 for the World's Fair in Paris and is the most visited paid monument in the world.
```

At first glance, this might seem like a helpful response. It's detailed, informative, and answers the user's question. There's just one problem: most of this information isn't from your knowledge base. The model has ventured beyond the retrieved context, drawing from its pre-trained knowledge to provide what it thinks is a helpful answer.

This is the out-of-context problem in RAG systems: a language model generates information that is not found in the retrieved documents. It is one of the most challenging issues in RAG implementations and is often referred to as "closed-domain hallucination."

## Understanding the challenge

The root of this problem lies in how modern language models work. These models are trained on vast amounts of data and retain this knowledge. When asked a question, they naturally try to be helpful by combining:

* Information from the provided context
* Their pre-existing knowledge
* Patterns they've learned from similar questions

In our Eiffel Tower example, the model:

* Drew on the context only for the location (Paris)
* Added the completion date (1889) and the World's Fair occasion from its training data
* Included a visitor claim ("most visited paid monument") it "knew" from pre-training

While this additional information might be factually correct, it creates several problems:

* Users can't verify the source of this information
* The response mixes verified knowledge base facts with external information
* There's no distinction between what came from our documents and what didn't

This behavior particularly impacts RAG systems because they're specifically designed to provide information from a controlled set of documents. When the model starts adding external information, it undermines the entire purpose of having a curated knowledge base.
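
One way to make this concrete is to look for specifics in a response that never appear in the retrieved text. The following is a minimal sketch, not part of the original demo: `unsupported_numbers` is a hypothetical helper that flags numeric tokens (such as years) present in the response but absent from the context. It catches only one narrow class of out-of-context content and would miss unsupported claims that contain no numbers.

```python
import re


def unsupported_numbers(response: str, context: str) -> list[str]:
    """Return numeric tokens that appear in the response but not in the context.

    Hypothetical helper for illustration: it only inspects digits, so it will
    not notice unsupported claims expressed purely in words.
    """
    context_numbers = set(re.findall(r"\d+", context))
    flagged = []
    for number in re.findall(r"\d+", response):
        if number not in context_numbers and number not in flagged:
            flagged.append(number)
    return flagged


context = (
    "The Eiffel Tower is an iron lattice tower located in Paris, France. "
    "It was designed by Gustave Eiffel."
)
response = (
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris "
    "and is the most visited paid monument in the world."
)

print(unsupported_numbers(response, context))  # ['1889']
```

Here the year is flagged because the retrieved document never states it, which is exactly the kind of detail a user cannot trace back to the knowledge base.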

## The path to a solution

The key to solving this problem lies in understanding that language models will naturally try to be helpful by providing complete answers. They need explicit constraints and clear instructions to override this behavior. Let's look at how we can transform our example to prevent out-of-context information.

First, here's how we typically implement a RAG system with minimal constraints:

```python theme={null}
@log(name="rag_with_hallucination")
def rag_with_hallucination(query: str):
    documents = retrieve_documents(query)
    formatted_docs = format_documents(documents)

    weak_prompt = f"""
    Answer the following question based on the context provided.

    Question: {query}
    Context: {formatted_docs}
    """

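    response = client.chat.completions.create(
        model="gpt-4o",  # same model as the complete implementation below
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": weak_prompt}
        ]
    )
    return response.choices[0].message.content.strip()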
```

This implementation has several weaknesses:

* The system message is too general
* The prompt doesn't explicitly restrict external knowledge
* There's no guidance on handling missing information
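
The snippet above calls `format_documents`, which this guide does not define on its own. A minimal sketch follows, assuming each document is a dict with a `content` string and a `metadata` dict containing a `source` field, and mirroring the inline formatting loop used in the complete implementation later in this guide.

```python
def format_documents(documents: list[dict]) -> str:
    """Render retrieved documents as numbered, source-labelled text blocks."""
    formatted = ""
    for i, doc in enumerate(documents, start=1):
        source = doc["metadata"]["source"]
        formatted += f"Document {i} (Source: {source}):\n{doc['content']}\n\n"
    return formatted


docs = [
    {
        "content": "The Eiffel Tower is an iron lattice tower located in Paris, France.",
        "metadata": {"source": "travel_guide"},
    }
]
print(format_documents(docs))
```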

Here's how we can strengthen our implementation:

```python theme={null}
@log(name="rag_with_constraint")
def rag_with_constraint(query: str):
    documents = retrieve_documents(query)
    formatted_docs = format_documents(documents)

    strong_prompt = f"""
    Answer the following question based STRICTLY on the context provided.
    If the information needed to answer the question is not explicitly
    contained in the context, respond with: "I don't have enough information
    in the provided context to answer this question."

    DO NOT use any knowledge outside of the provided context.

    Question: {query}
    Context: {formatted_docs}
    """

    response = client.chat.completions.create(
        model="gpt-4o",  # same model as the complete implementation below
        messages=[
            {
                "role": "system",
                "content": """You are a helpful assistant that ONLY answers
                based on the provided context. Never use external knowledge."""
            },
            {
                "role": "user",
                "content": strong_prompt
            }
        ]
    )
    return response.choices[0].message.content.strip()
```

Now, when we ask about the Eiffel Tower's completion date, we get a different response:

```output theme={null}
I don't have enough information in the provided context to answer this question.
The context only mentions that the Eiffel Tower is an iron lattice tower in Paris
and was designed by Gustave Eiffel.
```

This response might seem less helpful at first, but it's actually much better because:

* It's honest about what information is available
* It clearly indicates what the source documents tell us
* It maintains the integrity of our knowledge base
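
Because the prompt specifies an exact fallback sentence, downstream code can detect when the model declined to answer and react to it, for example by broadening retrieval or showing a "no answer found" message. The sketch below is an assumption about how you might do this: `FALLBACK` and `is_fallback` are hypothetical names, and the check relies on the model reproducing the sentence verbatim, which is likely but not guaranteed.

```python
FALLBACK = (
    "I don't have enough information in the provided context "
    "to answer this question."
)


def _normalize(text: str) -> str:
    """Lowercase, unify apostrophes, and collapse whitespace for comparison."""
    return " ".join(text.replace("\u2019", "'").lower().split())


def is_fallback(response: str) -> bool:
    """Return True if the response begins with the prompt's fallback sentence."""
    return _normalize(response).startswith(_normalize(FALLBACK))


constrained = (
    "I don't have enough information in the provided context to answer this question.\n"
    "The context only mentions that the Eiffel Tower is an iron lattice tower in Paris\n"
    "and was designed by Gustave Eiffel."
)
unconstrained = "The Eiffel Tower was completed in 1889 for the World's Fair in Paris."

print(is_fallback(constrained))    # True
print(is_fallback(unconstrained))  # False
```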

The improvement comes from two key changes:

1. A stronger system message that explicitly defines the model's role and limitations
2. A structured prompt that:
   * Uses clear, direct language ("STRICTLY", "DO NOT")
   * Provides explicit instructions for handling missing information
   * Reinforces the importance of staying within the provided context
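
Both changes can be collected in one place so that every call site uses the same constraints. The following is a sketch under assumptions: `build_constrained_messages` is a hypothetical helper, not part of the Galileo or OpenAI SDKs, and it simply returns the message list that the strengthened implementation above passes to `client.chat.completions.create`.

```python
SYSTEM_MESSAGE = (
    "You are a helpful assistant that ONLY answers based on the provided "
    "context. Never use external knowledge."
)

FALLBACK = (
    "I don't have enough information in the provided context "
    "to answer this question."
)


def build_constrained_messages(query: str, context: str) -> list[dict]:
    """Build chat messages that restrict the model to the supplied context."""
    user_prompt = (
        "Answer the following question based STRICTLY on the context provided.\n"
        "If the information needed to answer the question is not explicitly "
        f'contained in the context, respond with: "{FALLBACK}"\n\n'
        "DO NOT use any knowledge outside of the provided context.\n\n"
        f"Question: {query}\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": user_prompt},
    ]


messages = build_constrained_messages(
    "When was the Eiffel Tower completed?",
    "The Eiffel Tower is an iron lattice tower located in Paris, France.",
)
for message in messages:
    print(f"[{message['role']}]\n{message['content']}\n")
```

Building the prompt from explicit string pieces also avoids sending the source-code indentation of a triple-quoted string to the model.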

## Building a complete solution

To implement this approach in your own RAG system, you'll need several components working together. Let's walk through a complete implementation that demonstrates both the problem and its solution.

<Steps>
  <Step title="Setting Up the Environment">
    First, let's set up our environment with the necessary imports and configurations:

    ```python theme={null}
    import os
    from dotenv import load_dotenv
    from galileo import openai, log, galileo_context
    import questionary

    load_dotenv()

    # Check if Galileo logging is enabled
    logging_enabled = os.environ.get("GALILEO_API_KEY") is not None

    galileo_context.init(project="out-of-context", log_stream="dev")

    # Initialize OpenAI client
    client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
    ```

    This setup includes:

    * Loading environment variables for API keys
    * Setting up Galileo logging for tracking operations
    * Creating an OpenAI client for model interactions
  </Step>

  <Step title="Understanding the Document Retriever">
    The document retriever is designed to demonstrate how incomplete context can lead to out-of-context information:

    ```python theme={null}
    @log(span_type="retriever")
    def retrieve_documents(query: str):
        """
        Simulated document retrieval that intentionally returns incomplete information
        to demonstrate the out-of-context problem.
        """
        # Dictionary of queries and their intentionally incomplete contexts
        incomplete_contexts = {
            "eiffel tower": [
                {
                    "content": """The Eiffel Tower is an iron lattice tower located
                    in Paris, France. It was designed by Gustave Eiffel.""",
                    "metadata": {
                        "id": "doc1",
                        "source": "travel_guide",
                        "category": "landmarks",
                        "relevance": "high"
                    }
                }
            ],
            "python language": [
                {
                    "content": """Python is a high-level programming language known
                    for its readability and simple syntax.""",
                    "metadata": {
                        "id": "doc1",
                        "source": "programming_guide",
                        "category": "languages",
                        "relevance": "high"
                    }
                }
            ],
            "climate change": [
                {
                    "content": """Climate change refers to long-term shifts in
                    temperatures and weather patterns. Human activities have
                    been the main driver of climate change since the 1800s.""",
                    "metadata": {
                        "id": "doc1",
                        "source": "environmental_science",
                        "category": "global_issues",
                        "relevance": "high"
                    }
                }
            ],
            "artificial intelligence": [
                {
                    "content": """Artificial intelligence involves creating systems
                    capable of performing tasks that typically require human
                    intelligence.""",
                    "metadata": {
                        "id": "doc1",
                        "source": "technology_overview",
                        "category": "ai",
                        "relevance": "high"
                    }
                }
            ],
            "quantum computing": [
                {
                    "content": """Quantum computing uses quantum bits or qubits that
                    can represent multiple states simultaneously.""",
                    "metadata": {
                        "id": "doc1",
                        "source": "computing_technology",
                        "category": "quantum",
                        "relevance": "high"
                    }
                }
            ]
        }

        # Default case for queries not in our predefined list
        default_docs = [
            {
                "content": """This is a generic response with limited information
                about the query topic.""",
                "metadata": {
                    "id": "default_doc",
                    "source": "general_knowledge",
                    "category": "miscellaneous",
                    "relevance": "low"
                }
            }
        ]

        # Find the most relevant predefined query
        for key in incomplete_contexts:
            if key in query.lower():
                return incomplete_contexts[key]

        return default_docs
    ```

    Key points about the retriever:

    * Simulates real-world document retrieval with intentionally incomplete information
    * Uses predefined contexts to demonstrate the out-of-context problem
    * Includes metadata for tracking document sources and relevance
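
    The lookup is a plain substring test, so a query only receives a topic-specific document when it contains one of the keys verbatim (ignoring case). The sketch below isolates that behavior; `match_key` is a hypothetical helper written for illustration, not a function from the demo.

    ```python
    from typing import Optional

    KEYS = ["eiffel tower", "python language", "climate change",
            "artificial intelligence", "quantum computing"]


    def match_key(query: str, keys: list[str]) -> Optional[str]:
        """Return the first key contained in the lowercased query, else None."""
        lowered = query.lower()
        for key in keys:
            if key in lowered:
                return key
        return None


    print(match_key("When was the Eiffel Tower completed?", KEYS))  # eiffel tower
    print(match_key("Who created Python?", KEYS))                   # None
    ```

    A query that yields `None` here would fall through to `default_docs` in the retriever, so phrase test queries to include one of the keys.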
  </Step>

  <Step title="Demonstrating the Problem">
    Let's look at how a weak prompt can lead to out-of-context information:

    ```python theme={null}
    @log(name="rag_with_hallucination")
    def rag_with_hallucination(query: str):
        """
        RAG implementation that demonstrates the out-of-context problem by using
        a system prompt that doesn't properly constrain the model.
        """
        documents = retrieve_documents(query)

        # Format documents for better readability in the prompt
        formatted_docs = ""
        for i, doc in enumerate(documents):
            formatted_docs += f"Document {i+1} (Source: {doc['metadata']['source']}):\n{doc['content']}\n\n"

        # This prompt doesn't strongly constrain the model
        weak_prompt = f"""
        Answer the following question based on the context provided.

        Question: {query}

        Context:
        {formatted_docs}
        """

        try:
            print("Generating answer (prone to out-of-context information)...")
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": weak_prompt}
                ],
            )
            return response.choices[0].message.content.strip()
        except Exception as e:
            return f"Error generating response: {str(e)}"
    ```

    Problems with this approach:

    * The weak prompt doesn't explicitly constrain the model
    * The system message is too generic
    * No explicit instruction to avoid using external knowledge
  </Step>

  <Step title="Implementing the Solution">
    Now, let's see how to prevent out-of-context information with a stronger prompt:

    ```python theme={null}
    @log(name="rag_with_constraint")
    def rag_with_constraint(query: str):
        """
        RAG implementation that demonstrates how to mitigate the out-of-context problem
        by using a stronger system prompt and explicit instructions.
        """
        documents = retrieve_documents(query)

        # Format documents for better readability in the prompt
        formatted_docs = ""
        for i, doc in enumerate(documents):
            formatted_docs += f"Document {i+1} (Source: {doc['metadata']['source']}):\n{doc['content']}\n\n"

        # This prompt strongly constrains the model
        strong_prompt = f"""
        Answer the following question based STRICTLY on the context provided.
        If the information needed to answer the question is not explicitly
        contained in the context, respond with: "I don't have enough information
        in the provided context to answer this question."

        DO NOT use any knowledge outside of the provided context.

        Question: {query}

        Context:
        {formatted_docs}
        """

        try:
            print("Generating answer (constrained to context)...")
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {
                        "role": "system",
                        "content": """You are a helpful assistant that ONLY answers
                        based on the provided context.
                        Never use external knowledge."""
                    },
                    {
                        "role": "user",
                        "content": strong_prompt
                    }
                ],
            )
            return response.choices[0].message.content.strip()
        except Exception as e:
            return f"Error generating response: {str(e)}"
    ```

    Key improvements:

    * Explicit instruction to use only provided context
    * Clear directive to acknowledge when information is missing
    * Stronger system message that reinforces context adherence
  </Step>

  <Step title="Running the Interactive Demo">
    The main function provides an interactive way to test and compare both approaches:

    ```python theme={null}
    @log
    def main():
        print("Out-of-Context RAG Demo")
        print("This demo shows how RAG systems can generate out-of-context "
              "information and how to prevent it.")

        # Check environment setup
        if logging_enabled:
            print("Galileo logging is enabled")
        else:
            print("Galileo logging is disabled")

        api_key = os.environ.get("OPENAI_API_KEY")
        if not api_key:
            print("OpenAI API Key is missing")
            return

        # Example queries that demonstrate the problem
        suggested_queries = [
            "When was the Eiffel Tower completed?",
            "Who created the Python language and when?",
            "What are the main effects of climate change?",
            "When was artificial intelligence first developed?",
            # Must contain the retriever key "quantum computing" to match
            "How many qubits are in the most powerful quantum computing system?"
        ]

        print("\nSuggested queries (these will demonstrate the problem):")
        for i, q in enumerate(suggested_queries):
            print(f"{i+1}. {q}")

        # Main interaction loop
        while True:
            try:
                # Get user query
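                query = questionary.text(
                    "Enter your question (a number 1-5 for a suggested query, or 'exit' to quit):",
                    validate=lambda text: len(text.strip()) > 0
                ).ask()

                # ask() returns None if the prompt is cancelled (e.g. Ctrl+C)
                if query is None:
                    break
                query = query.strip()
                if query.lower() in ['exit', 'quit', 'q']:
                    break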

                # Check if user entered a number for suggested queries
                if query.isdigit() and 1 <= int(query) <= len(suggested_queries):
                    query = suggested_queries[int(query)-1]
                    print(f"Using query: {query}")

                # Generate both types of responses
                hallucinated_result = rag_with_hallucination(query)
                constrained_result = rag_with_constraint(query)

                # Display the responses
                print("\nUnconstrained Response (Prone to Out-of-Context Information):")
                print(hallucinated_result)

                print("\nConstrained Response (Limited to Context):")
                print(constrained_result)

                # Ask if user wants to continue
                continue_session = questionary.confirm(
                    "Do you want to ask another question?",
                    default=True
                ).ask()

                if not continue_session:
                    break

            except Exception as e:
                print(f"Error: {str(e)}")

    if __name__ == "__main__":
        try:
            main()
        except KeyboardInterrupt:
            print("\nExiting Out-of-Context RAG Demo. Goodbye!")
        finally:
            galileo_context.flush()  # Only flush at the very end
    ```

    Using this code, you can:

    * Test predefined queries that highlight the problem
    * Compare responses from both approaches
    * See the effectiveness of context constraints
  </Step>

  <Step title="Analyzing the Results">
    When running the demo, you'll notice:

    * **Unconstrained Responses**: May include information not present in the context
    * **Constrained Responses**: Strictly adhere to provided information
    * **Completeness vs. Accuracy**: Constrained answers are less complete, but every claim in them can be checked against the retrieved context

    Here's an example comparison:

    Query: "When was the Eiffel Tower completed?"

    Unconstrained Response:

    ```output theme={null}
    The Eiffel Tower was completed in 1889 for the World's Fair in Paris.
    ```

    Constrained Response:

    ```output theme={null}
    I don't have enough information in the provided context to answer this question.
    The context only mentions that the Eiffel Tower is an iron lattice tower in Paris
    and was designed by Gustave Eiffel.
    ```

    The constrained response demonstrates better adherence to the available context, even though it provides less information.
  </Step>

  <Step title="Best Practices and Recommendations">
    To prevent out-of-context information in your RAG system:

    1. **Strong Prompting**:

       * Be explicit about using only provided context
       * Include clear instructions for handling missing information
       * Use system messages that reinforce context adherence

    2. **Context Management**:

       * Ensure retrieved documents are relevant and complete
       * Include metadata for tracking document sources
       * Monitor and log context utilization

    3. **Response Validation**:

       * Compare responses against provided context
       * Track and measure context adherence
       * Use Galileo metrics to monitor performance

    4. **User Experience**:
       * Clearly communicate when information is limited
       * Provide transparent source attribution
       * Balance completeness with accuracy
  </Step>
</Steps>
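
As a rough illustration of the response validation and source attribution practices above, the sketch below labels each sentence of a response with the source of the document that best supports it, or `None` when no document shares enough vocabulary. This is a crude lexical-overlap heuristic under stated assumptions (the `attribute_sentences` name, the stop-word list, and the 0.6 threshold are all hypothetical). It is not how Galileo's metrics work and should not be treated as a substitute for them.

```python
import re
from typing import Optional

STOP_WORDS = {"the", "a", "an", "is", "was", "in", "of", "and", "for", "it", "by", "to"}


def content_words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with common stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def attribute_sentences(
    response: str, documents: list[dict], threshold: float = 0.6
) -> list[tuple[str, Optional[str]]]:
    """Pair each response sentence with its best-supporting source, or None."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        words = content_words(sentence)
        best_source, best_score = None, 0.0
        for doc in documents:
            overlap = len(words & content_words(doc["content"]))
            score = overlap / len(words) if words else 0.0
            if score > best_score:
                best_source, best_score = doc["metadata"]["source"], score
        # Keep the source only when enough of the sentence is covered
        results.append((sentence, best_source if best_score >= threshold else None))
    return results


docs = [
    {
        "content": "The Eiffel Tower is an iron lattice tower located in Paris, "
                   "France. It was designed by Gustave Eiffel.",
        "metadata": {"source": "travel_guide"},
    }
]
answer = "The Eiffel Tower was designed by Gustave Eiffel. It was completed in 1889."

for sentence, source in attribute_sentences(answer, docs):
    print(f"{source}: {sentence}")
# travel_guide: The Eiffel Tower was designed by Gustave Eiffel.
# None: It was completed in 1889.
```

Sentences labelled `None` are candidates for out-of-context information and are worth reviewing or logging alongside the trace.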
