# Build a RAG Application with Elasticsearch, LangGraph, and Galileo

> A guide to building a RAG chatbot with Elasticsearch and LangGraph, with logging to Galileo

## Overview

In this tutorial, you'll learn how to build a **Retrieval-Augmented Generation (RAG)** application that combines:

* **Elasticsearch** for document storage and semantic search using the ELSER model
* **LangGraph** for building conversational agents
* **Galileo** for end-to-end observability and logging

This tutorial is intended for Python developers who want to build production-ready RAG applications. By the end, you'll have a working chatbot that can answer questions about your documents with full observability.

## What you'll build

You'll create a RAG chatbot that:

1. Uses the ELSER model for semantic search
2. Stores documents in the Elasticsearch vector store
3. Uses LangGraph to orchestrate retrieval and generation steps
4. Logs traces to Galileo for monitoring

## Prerequisites

Before starting, you'll need:

* **Python 3.10+** installed
* **An Elasticsearch instance** (we'll use Elastic Cloud Serverless)
* **OpenAI API key** for the language model
* **Galileo account** for observability

## Step 1: set up Elastic Cloud Serverless

First, let's set up your Elasticsearch instance for document storage and retrieval.

### Create your Elasticsearch project

1. Navigate to [cloud.elastic.co](https://cloud.elastic.co) and create an account or log in
2. Click [**Create serverless project**](https://cloud.elastic.co/projects/create)
3. Choose **Elasticsearch** as the project type
4. Select **Optimized for Vectors** configuration
5. Name your project (e.g., "rag-chatbot") and click **Create project**

### Create your first index

1. Once your project is ready, you'll see the index creation page
2. Enter an index name: `demo`
3. Click **Create my index**
4. **Important**: Copy and save your **Elasticsearch URL** and **API key** - you won't see the API key again

### Deploy or configure the ELSER model

ELSER (Elastic Learned Sparse EncodeR) provides semantic search capabilities:

1. In your Elasticsearch project, go to **Relevance** → **Inference Endpoints**
2. If no **ELSER** endpoint exists, click **Create endpoint**
3. Follow [the ELSER docs](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser#download-deploy-elser) or the [Elastic guide](https://www.elastic.co/getting-started/enterprise-search/vector-search) to download and deploy the model
4. Note the model ID (typically `.elser_model_2_linux-x86_64`)

## Step 2: set up your Python environment

Create a new project and install the required dependencies in a virtual environment:

```bash theme={null}
# Install dependencies
pip install \
    elasticsearch \
    langchain-elasticsearch \
    langchain-openai \
    langgraph \
    galileo \
    openai \
    python-dotenv  # Provides the dotenv module used to load a .env file
```
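Before moving on, it can help to confirm that the packages actually landed in the active virtual environment. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the optional dotenv helper is left out of the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the install command above.
PACKAGES = [
    "elasticsearch",
    "langchain-elasticsearch",
    "langchain-openai",
    "langgraph",
    "galileo",
    "openai",
]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line reporting `NOT INSTALLED` usually means the install ran against a different interpreter than the one you are using now.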

## Step 3: configure environment variables

Create a `.env` file or set these environment variables:

```bash theme={null}
# Galileo Environment Variables

# Your Galileo API key
GALILEO_API_KEY=your-galileo-api-key

# Your Galileo project name
GALILEO_PROJECT=your-galileo-project-name

# The name of the Log stream you want to use for logging
GALILEO_LOG_STREAM=your-galileo-log-stream

# Set the console URL below only if you are using a custom deployment
# rather than the free tier or app.galileo.ai.
# This will look something like "console.galileo.yourcompany.com".
# GALILEO_CONSOLE_URL=your-galileo-console-url

# OpenAI properties
OPENAI_API_KEY=your-openai-api-key

# Optional. The base URL of your OpenAI deployment.
# Leave this commented out if you are using the default OpenAI API.
# OPENAI_BASE_URL=

# Optional. Your OpenAI organization.
# OPENAI_ORGANIZATION=

# Elasticsearch
ES_HOST="your-elasticsearch-host-here"  # e.g., "https://your-cluster.es.us-central1.gcp.cloud.es.io:443"
ES_API_KEY="your-api-key-here"
ES_INDEX="demo"
ES_INDEX_CHAT_HISTORY="chat-history"
ELSER_MODEL=".elser_model_2_linux-x86_64"  # Adjust based on your deployment
```

**Note**: The ELSER model name varies by platform:

* Linux x86\_64: `.elser_model_2_linux-x86_64`
* Check your Elasticsearch ML models for the exact name
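A missing variable otherwise surfaces later as a `KeyError` or an authentication failure, so a quick check at startup can save time. This is a minimal sketch, not part of the tutorial script: `missing_env_vars` is a hypothetical helper, and treating every uncommented variable from the example above as required is an assumption.

```python
import os

# Names taken from the .env example above (ELSER_MODEL has a default, so it is skipped).
REQUIRED_VARS = [
    "GALILEO_API_KEY",
    "GALILEO_PROJECT",
    "GALILEO_LOG_STREAM",
    "OPENAI_API_KEY",
    "ES_HOST",
    "ES_API_KEY",
    "ES_INDEX",
    "ES_INDEX_CHAT_HISTORY",
]


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_env_vars()
print("Missing variables:", ", ".join(missing) if missing else "none")
```

In a real script you might call this right after `load_dotenv()` and exit early if the returned list is not empty.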

## Step 4: build the RAG application

Now, let's build the RAG application step-by-step. Create a Python file (e.g., `demo.py`) and add the following code snippets.

### Imports and configuration

First, we import the necessary libraries and configure our environment variables. This part of the script loads your API keys and sets up the connection details for Elasticsearch, OpenAI, and Galileo.

```python Python theme={null}
from dotenv import load_dotenv

import os
import time
from typing import Annotated, Sequence

# Load environment variables from your .env file
load_dotenv()

from elasticsearch import Elasticsearch, NotFoundError
from langchain_core.tools.retriever import create_retriever_tool
from langchain_core.messages import BaseMessage, HumanMessage
from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore, SparseVectorStrategy
from langchain_elasticsearch import ElasticsearchChatMessageHistory
from langchain_openai import ChatOpenAI
from langgraph.graph import END, StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition
from typing_extensions import TypedDict

# --- 1. Configuration ---
# Set up your connection details and index names.
# It's recommended to use environment variables for sensitive data.
ES_HOST = os.environ["ES_HOST"]
ES_API_KEY = os.environ["ES_API_KEY"]
ES_INDEX = os.environ["ES_INDEX"]  # For example "demo"
ES_INDEX_CHAT_HISTORY = os.environ["ES_INDEX_CHAT_HISTORY"] # For example "chat-history"
ELSER_MODEL = os.getenv("ELSER_MODEL", ".elser_model_2_linux-x86_64")
```

### 1. Elasticsearch setup

```python Python theme={null}
# --- 2. Elasticsearch Setup ---
# Connect to Elasticsearch (make sure your IP address is not blocked)
print("Connecting to Elasticsearch...")
elasticsearch_client = Elasticsearch(hosts=[ES_HOST], api_key=ES_API_KEY)
print(elasticsearch_client.info())

def setup_elasticsearch():
    """
    Ensures the ELSER model is deployed and sample documents are indexed.
    """
    # 2a. Deploy ELSER Model (Elastic's NLP model for semantic search)
    try:
        elasticsearch_client.ml.get_trained_models(model_id=ELSER_MODEL)
        print(f'ELSER model "{ELSER_MODEL}" is already available.')
    except NotFoundError:
        print(f'ELSER model "{ELSER_MODEL}" not found, starting deployment...')
        elasticsearch_client.ml.put_trained_model(
            model_id=ELSER_MODEL, input={"field_names": ["text_field"]}
        )
        while True:
            status = elasticsearch_client.ml.get_trained_models(
                model_id=ELSER_MODEL,
                include="definition_status"
            )
            if status["trained_model_configs"][0]["fully_defined"]:
                break
            time.sleep(1)
        elasticsearch_client.ml.start_trained_model_deployment(
            model_id=ELSER_MODEL, wait_for="fully_allocated"
        )
        print(f'ELSER model "{ELSER_MODEL}" deployed successfully.')

    store = ElasticsearchStore(
        es_connection=elasticsearch_client,
        index_name=ES_INDEX,
        strategy=SparseVectorStrategy(model_id=ELSER_MODEL),
    )
    # Adjacent string literals are joined, so no stray newlines or
    # indentation end up in the indexed text.
    sample_docs = [
        Document(
            page_content=(
                "Our company offers comprehensive health insurance "
                "including medical, dental, and vision coverage."
            ),
            metadata={"source": "employee_handbook"},
        ),
        Document(
            page_content=(
                "Remote work policy allows employees to work from home up "
                "to 3 days per week."
            ),
            metadata={"source": "employee_handbook"},
        ),
        Document(
            page_content=(
                "The company's vacation policy provides 15 days of paid time "
                "off for new employees, increasing to 20 days after 3 "
                "years of service."
            ),
            metadata={"source": "employee_handbook"},
        ),
    ]
    store.add_documents(sample_docs)
    time.sleep(2) # Give time for indexing
    print(f"{len(sample_docs)} documents indexed successfully.")
    return store
```

The code automatically:

* Connects to your Elasticsearch instance
* Checks that the ELSER model is available and deploys it if it is missing
* Creates an index and stores sample documents
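The deployment branch above polls in a `while True` loop, which never returns if the model definition does not finish downloading. One possible variation is a polling helper with a deadline. This is a sketch under stated assumptions: `wait_until` is a hypothetical helper, not part of the Elasticsearch client, and the default timeout is arbitrary.

```python
import time


def wait_until(check, timeout=300.0, interval=1.0):
    """Call check() until it returns True or `timeout` seconds have passed.

    Returns True on success and False if the deadline is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demo with a stand-in check that succeeds on the third call.
attempts = iter([False, False, True])
print(wait_until(lambda: next(attempts), timeout=5, interval=0.01))

# Hypothetical use inside setup_elasticsearch():
# ready = wait_until(
#     lambda: elasticsearch_client.ml.get_trained_models(
#         model_id=ELSER_MODEL, include="definition_status"
#     )["trained_model_configs"][0]["fully_defined"],
#     timeout=600,
# )
```

If the helper returns `False`, the script can raise a clear error instead of hanging.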

### 2. Agent architecture

```python Python theme={null}
# --- 3. Agent Definition ---
# Define the state, tools, and the graph that powers the agent.


class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

# The retriever tool searches Elasticsearch for relevant documents.
def setup_agent_and_graph(store: ElasticsearchStore):
    """
    Sets up the agent, tools, and the LangGraph workflow.
    """
    retriever = store.as_retriever()
    retriever_tool = create_retriever_tool(
        retriever,
        "retrieve_workplace_documents",
        "Search and return information about company policies, benefits, and processes.",
    )
    tools = [retriever_tool]

    # Use a model that is good at tool use
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, streaming=True, api_key=os.environ["OPENAI_API_KEY"])
    agent_runnable = llm.bind_tools(tools)

    # Define the graph
    # The graph defines the flow of control for the agent.
    def run_agent(state: AgentState):
        """Invokes the agent to decide on the next action."""
        return {"messages": [agent_runnable.invoke(state["messages"])]}

    tool_node = ToolNode(tools)
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", run_agent)
    workflow.add_node("tools", tool_node)
    workflow.add_edge(START, "agent")
    workflow.add_conditional_edges("agent", tools_condition)
    workflow.add_edge("tools", "agent")

    graph = workflow.compile()
    return graph
```

* **State Management**: Uses `AgentState` to track conversation messages
* **Tool Integration**: Creates a retriever tool that searches Elasticsearch
* **LangGraph Workflow**: Defines the flow between agent reasoning and tool usage
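To make the control flow concrete, here is a plain-Python simulation of the agent and tools loop. It is a simplified stand-in, not LangGraph's implementation: messages are plain dicts, `fake_agent` and `fake_tools` are hypothetical replacements for the LLM and the retriever, and `route_after_agent` only mimics the decision that `tools_condition` makes.

```python
END = "__end__"  # the sentinel string LangGraph uses for its END node


def route_after_agent(state):
    """Go to the tools node if the last message requests a tool call, else finish."""
    last = state["messages"][-1]
    return "tools" if last.get("tool_calls") else END


def fake_agent(state):
    """Pretend LLM: request a search first, then answer once a tool result exists."""
    if any(m["role"] == "tool" for m in state["messages"]):
        return {"role": "ai", "content": "New hires get 15 days."}
    return {
        "role": "ai",
        "content": "",
        "tool_calls": [{"name": "retrieve_workplace_documents"}],
    }


def fake_tools(state):
    """Pretend retriever: return one canned document."""
    return {"role": "tool", "content": "Vacation policy: 15 days for new employees."}


def run(question, max_steps=5):
    """Walk the agent -> tools -> agent loop and record which nodes ran."""
    state = {"messages": [{"role": "human", "content": question}]}
    node, trace = "agent", []
    for _ in range(max_steps):
        trace.append(node)
        if node == "agent":
            state["messages"].append(fake_agent(state))
            node = route_after_agent(state)
            if node == END:
                return state, trace
        else:
            state["messages"].append(fake_tools(state))
            node = "agent"  # mirrors the edge from "tools" back to "agent"
    raise RuntimeError("step limit reached before the agent finished")


final_state, visited = run("How many vacation days do new hires get?")
print(visited)
print(final_state["messages"][-1]["content"])
```

The trace shows the shape of a typical run: the agent asks for a search, the tools node returns documents, and the agent answers on its second turn.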

### 3. Conversation flow

```python Python theme={null}
# --- 4. Run the Agent ---
# Now, we can ask questions and get answers.
def ask_question(graph, question: str, session_id: str):
    """
    Asks a question to the RAG agent and returns the answer.
    """
    chat_history = ElasticsearchChatMessageHistory(
        es_connection=elasticsearch_client,
        index=ES_INDEX_CHAT_HISTORY,
        session_id=session_id
    )

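    # Prepend earlier turns from this session so follow-up questions keep their context
    inputs = {"messages": [*chat_history.messages, HumanMessage(content=question)]}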
    final_state = graph.invoke(inputs, config={"recursion_limit": 5})
    response = final_state["messages"][-1].content

    # Save conversation history
    chat_history.add_user_message(question)
    chat_history.add_ai_message(response)
    return response
```

1. User asks a question
2. Agent decides whether to use the retriever tool
3. If needed, searches Elasticsearch for relevant documents
4. Generates a response based on retrieved context
5. Saves the conversation to chat history

Finally, we put everything together. This block of code initializes the Elasticsearch setup, compiles the agent, and starts a Q\&A session. You can see how to call the `ask_question` function with a sample query.

```python Python theme={null}
# Step 1: Set up Elasticsearch index and data
document_store = setup_elasticsearch()

# Step 2: Compile the agent and its workflow
rag_agent_graph = setup_agent_and_graph(document_store)

# Step 3: Start a Q&A session
print("\n--- Starting Q&A Session ---")
session_id = f"session-{int(time.time())}"

# Ask the first question
question1 = "How many vacation days do new hires get?"
print(f"\n❓ Question: {question1}")
answer1 = ask_question(rag_agent_graph, question1, session_id)
print(f"✅ Answer: {answer1}")

# Ask a follow-up question
question2 = "What about health insurance?"
print(f"\n❓ Question: {question2}")
answer2 = ask_question(rag_agent_graph, question2, session_id)
print(f"✅ Answer: {answer2}")
```
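The session ID above is derived from the current time in seconds, so two sessions started within the same second would share one chat history. If that matters for your use case, a random ID is one alternative. This is a small sketch; `new_session_id` is a hypothetical helper name.

```python
import uuid


def new_session_id(prefix="session"):
    """Return an ID made of the prefix plus 32 random hex characters."""
    return f"{prefix}-{uuid.uuid4().hex}"


print(new_session_id())
```

The returned value can be passed to `ask_question` in place of the time-based `session_id`.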

## Step 5: run the application

To run your RAG application, save all the code into a single `demo.py` file and execute it from your terminal:

```bash theme={null}
python demo.py
```

### 5. Adding Galileo observability

1. Open your [Galileo Console](https://app.galileo.ai/)
2. Navigate to your project (for example, `elasticsearch-rag-demo`)
3. You'll see traces for each question, showing:
   * Document retrieval steps
   * LLM generation
   * Full conversation context
   * Performance metrics
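The snippets in this tutorial do not show where a Galileo handler is attached to the graph. One way to wire it in is sketched below, assuming the `GalileoCallback` class exposed at `galileo.handlers.langchain` and that it picks up the `GALILEO_*` variables from Step 3; verify both against the Galileo Python SDK reference for your installed version. `galileo_callbacks` and `build_run_config` are hypothetical helper names.

```python
def galileo_callbacks():
    """Return [GalileoCallback()] when the galileo package is importable, else []."""
    try:
        # Assumption: import path of Galileo's LangChain/LangGraph handler.
        from galileo.handlers.langchain import GalileoCallback
    except ImportError:
        return []
    return [GalileoCallback()]


def build_run_config(callbacks, recursion_limit=5):
    """Build the config dict passed to graph.invoke()."""
    config = {"recursion_limit": recursion_limit}
    if callbacks:
        # "callbacks" is the standard LangChain config key for callback handlers.
        config["callbacks"] = list(callbacks)
    return config


# Hypothetical change inside ask_question():
# final_state = graph.invoke(inputs, config=build_run_config(galileo_callbacks()))
```

Because the handler is passed through the run config, the agent and tool nodes do not need to change.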

The script will:

1. **Connect to Elasticsearch** and verify the connection
2. **Check for the ELSER model** and deploy it if it is missing
3. **Index sample documents** about company policies
4. **Create the RAG agent** with LangGraph workflow
5. **Run sample questions**, display the answers, and log them to Galileo

Expected output:

```text theme={null}
Connecting to Elasticsearch...
{'name': 'your-cluster', 'cluster_name': '...', ...}
ELSER model ".elser_model_2_linux-x86_64" is already available.
3 documents indexed successfully.

--- Starting Q&A Session ---

❓ Question: How many vacation days do new hires get?
✅ Answer: New hires get 15 days of paid time off, which increases to 20 days
after 3 years of service.

❓ Question: What about health insurance?
✅ Answer: The company offers comprehensive health insurance including medical,
dental, and vision coverage.
```

## Troubleshooting

### Connection issues

* Verify your Elasticsearch host URL and API key
* If you are using Elastic Cloud rather than a serverless project, ensure your IP address is allowlisted
* Check that Elasticsearch is running and accessible
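A malformed `ES_HOST` value, such as a missing scheme or leftover quotes, is a common cause of connection failures. The following is a rough sketch of a format check using only the standard library; `es_host_problems` is a hypothetical helper, and it checks only the shape of the URL, not whether the cluster is reachable.

```python
from urllib.parse import urlparse


def es_host_problems(url):
    """Return a list of human-readable problems found in an Elasticsearch URL."""
    problems = []
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        problems.append("URL should start with https:// (or http:// for a local cluster)")
    if not parsed.hostname:
        problems.append("URL has no hostname")
    return problems


print(es_host_problems("https://your-cluster.es.us-central1.gcp.cloud.es.io:443"))
print(es_host_problems("your-cluster.es.us-central1.gcp.cloud.es.io:443"))
```

An empty list means the URL is well formed, so the next things to check are the API key and network access.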

### ELSER model issues

* Verify the model name matches your platform
* Ensure machine learning features are enabled
* Check that you have sufficient resources for model deployment

### Missing documents in search

* Wait a few seconds after indexing for documents to be available
* Verify the index name matches your configuration
* Check Elasticsearch logs for indexing errors

## Next steps

Now that you have a working RAG application, you can add **evaluation metrics** in Galileo to measure chunk attribution.

## Additional resources

* [Elasticsearch Vector Search Documentation](https://www.elastic.co/getting-started/enterprise-search/vector-search)
* [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
* [Galileo Python SDK](/sdk-api/python/sdk-reference)
* [Example Chat App with streaming](https://github.com/rungalileo/sdk-examples/tree/main/python/rag/elastic-chatbot-rag-app)
