Run this notebook to create this run in your Galileo cluster: https://github.com/rungalileo/examples/blob/main/examples/RAG/evaluate/integrate_galileo_evaluate_with_LangChain.ipynb

This example demonstrates how to create a Galileo Evaluate run for a Q&A workflow.

Setup: Install Libraries

    ! pip install promptquality
    ! pip install --upgrade --quiet langchain langchain-openai langchain-community chromadb langchainhub

Construct Dataset and Embed Documents

Our RAG application has the following pieces:

  • Dataset: Galileo blog post

  • Chunking: LangChain RecursiveCharacterTextSplitter

  • Embeddings: text-embedding-ada-002

  • Vector Store: ChromaDB in-memory

  • Retriever: Chroma document retriever with k=3 docs


    from langchain_openai import OpenAIEmbeddings
    from langchain_community.vectorstores import Chroma
    from langchain_community.document_loaders import WebBaseLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from google.colab import userdata
    import os

    # Load sample data (text) from webpage
    loader = WebBaseLoader("https://www.rungalileo.io/blog/deep-dive-into-llm-hallucinations-across-generative-tasks")
    data = loader.load()

    # Split text into documents
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
    splits = text_splitter.split_documents(data)

    # Set the OpenAI API key used to embed the documents (read from Colab secrets)
    os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

    # Embed split text and insert into vector db
    embedding = OpenAIEmbeddings()
    vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

    # Create our retriever
    retriever = vectordb.as_retriever(search_kwargs={'k': 3})
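
To make the `k=3` setting concrete, here is a minimal sketch of top-k retrieval over toy vectors. It assumes cosine similarity and hand-written 3-dimensional embeddings; the names `cosine_similarity`, `top_k`, and `toy_docs` are hypothetical, and Chroma's actual index and distance metric may differ.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[str]:
    """Return the text of the k documents most similar to the query vector."""
    ranked = sorted(
        docs,
        key=lambda doc: cosine_similarity(query, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy (text, embedding) pairs; real embeddings have far more dimensions.
toy_docs = [
    ("chunk about hallucinations", [0.9, 0.1, 0.0]),
    ("chunk about translation", [0.1, 0.9, 0.0]),
    ("chunk about summarization", [0.7, 0.3, 0.1]),
    ("chunk about vision models", [0.0, 0.2, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]  # stand-in for an embedded question
nearest = top_k(query_vec, toy_docs, k=3)
```

With `k=3`, the least similar chunk is dropped and the remaining three are returned in order of decreasing similarity.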

Define the Pieces of Our Chain

Now that we have the retriever, we can build our chain. The chain will:

  1. Take in a question.

  2. Feed that question to our retriever, which returns the chunks closest to it in embedding space as context.

  3. Fill out the prompt template with the question and context.

  4. Feed the prompt to our chat model.

  5. Output and parse the answer from the model.


    from langchain.prompts import ChatPromptTemplate
    from langchain.schema import StrOutputParser
    from langchain.schema.document import Document
    from langchain.schema.runnable import RunnablePassthrough
    from langchain_openai import ChatOpenAI
    from typing import List

    def format_docs(docs: List[Document]) -> str:
        return "\n\n".join([d.page_content for d in docs])

    template = """Answer the question based only on the following context:

        {context}

        Question: {question}
        """
    prompt = ChatPromptTemplate.from_template(template)
    model = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)

    chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
    )
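
The `|` composition above can be read as ordinary function composition. The following is a rough sketch of the same data flow using stand-in functions only; `fake_retriever`, `fill_prompt`, `fake_model`, and `run_chain` are hypothetical names, and nothing here calls LangChain or OpenAI.

```python
from typing import Dict, List


def fake_retriever(question: str) -> List[str]:
    """Stand-in for the retriever: always returns three fixed chunks."""
    return ["Chunk A.", "Chunk B.", "Chunk C."]


def format_docs(docs: List[str]) -> str:
    """Join retrieved chunks into one context string, as in the chain above."""
    return "\n\n".join(docs)


def fill_prompt(inputs: Dict[str, str]) -> str:
    """Stand-in for the prompt template: substitute context and question."""
    return (
        "Answer the question based only on the following context:\n\n"
        f"{inputs['context']}\n\n"
        f"Question: {inputs['question']}\n"
    )


def fake_model(prompt: str) -> str:
    """Stand-in for the chat model: reports the prompt size instead of answering."""
    return f"(stub answer from a {len(prompt)}-character prompt)"


def run_chain(question: str) -> str:
    """Mirror the chain: build the input dict, fill the prompt, call the model."""
    inputs = {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                              # RunnablePassthrough()
    }
    return fake_model(fill_prompt(inputs))


question = "What are hallucinations in LLMs?"
answer = run_chain(question)
```

The dictionary step is the part that is easy to miss: the same question is sent both to the retriever branch and, unchanged, to the `question` slot of the prompt.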

Run Our Chain and Submit Callback to Galileo

Next, we will set our Galileo cluster URL, API key, and project name to define where our results are logged.

Finally, we can run our chain and pass a GalileoPromptCallback as a callback to log our results.

    import promptquality as pq

    # login() reads the 'GALILEO_API_KEY' environment variable when connecting to the Galileo cluster URL
    os.environ['GALILEO_API_KEY'] = userdata.get('GALILEO_API_KEY_DEMO')
    os.environ['GALILEO_CONSOLE_URL'] = 'https://console.demo.rungalileo.io/'
    GALILEO_PROJECT_NAME = 'galileoblog-rag'
    config = pq.login(os.environ['GALILEO_CONSOLE_URL'])

    q_list = [
        "What are hallucinations in LLMs?",
        "What is the difference between intrinsic and extrinsic hallucinations?",
        "How do hallucinations impact abstractive summarization?",
        "What are some examples of hallucinations in dialogue generation?",
        "How does generative question answering lead to hallucinations?",
        "What intrinsic and extrinsic errors occur in neural machine translation?",
        "How does data-to-text generation exhibit hallucinations?",
        "What are intrinsic and extrinsic object hallucinations in vision-language models?",
        "Why is addressing hallucinations important for AI applications?",
        "What methods are suggested to mitigate hallucinations in LLMs?"
    ]

    # Create callback handler
    prompt_handler = pq.GalileoPromptCallback(
        project_name=GALILEO_PROJECT_NAME,
        scorers=[
            pq.Scorers.latency,
            pq.Scorers.groundedness,
            pq.Scorers.factuality,
        ],
    )

    # Run the chain over all questions, logging each call through the Galileo callback
    chain.batch(q_list, config=dict(callbacks=[prompt_handler]))

    # Publish the results of the run
    prompt_handler.finish()

The callback will return a URL for you to inspect your run in the Galileo Evaluate UI.

In the run view, you can see each question from our Q&A example. To dive deeper into the retrieved documents and metrics, click any sample in the UI.