Run this notebook to create this run in your Galileo cluster: https://github.com/rungalileo/examples/blob/main/examples/RAG/evaluate/integrate_galileo_evaluate_with_LangChain.ipynb
In this example, we will demonstrate how to create a Galileo Evaluate run for a Q&A workflow.
Setup: Install Libraries
! pip install promptquality
! pip install --upgrade --quiet langchain langchain-openai langchain-community chromadb langchainhub
Construct Dataset and Embed Documents
For our RAG application, we will have the following pieces.
- Dataset: Galileo blog post
- Chunking: LangChain RecursiveCharacterTextSplitter
- Embeddings: text-embedding-ada-002
- Vector Store: ChromaDB in-memory
- Retriever: Chroma document retriever with k=3 docs
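To give a feel for what the chunk size and chunk overlap settings control in the chunking step, here is a simplified sketch in plain Python. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers natural separators such as paragraph and line breaks); `chunk_text` is a hypothetical helper that cuts fixed-size character windows, with defaults matching the values used in this example (500 and 0).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> List[str]:
    """Hypothetical helper: split text into fixed-size character windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each window starts `step` characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# A 1200-character document yields two full chunks and one shorter tail.
chunks = chunk_text("a" * 1200)
print([len(c) for c in chunks])
```

With an overlap of 0, the chunks are disjoint and concatenating them reproduces the input; a positive overlap repeats the end of each chunk at the start of the next.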
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from google.colab import userdata
import os
loader = WebBaseLoader("https://www.rungalileo.io/blog/deep-dive-into-llm-hallucinations-across-generative-tasks")
data = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(data)
os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
embedding = OpenAIEmbeddings()
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)
retriever = vectordb.as_retriever(search_kwargs={'k': 3})
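Conceptually, a retriever with k=3 embeds the question, scores every stored chunk against it, and returns the three closest. The sketch below illustrates that idea with toy two-dimensional vectors and cosine similarity. It is an assumption-laden illustration, not Chroma's implementation: `cosine_similarity` and `top_k` are hypothetical helpers, and the distance metric Chroma actually uses depends on how the collection is configured.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real text-embedding-ada-002 vectors.
toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
toy_query = [1.0, 0.0]

ranked = top_k(toy_query, toy_docs, k=3)
print([i for i, _ in ranked])  # indices of the three closest chunks
```

A brute-force scan like this is fine for a handful of chunks; vector stores exist so the same lookup stays fast as the collection grows.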
Define the Pieces of Our Chain
Now that we have the retriever, we can build our chain. The chain will:
- Take in a question.
- Feed that question to our retriever to fetch context based on distance in embedding space.
- Fill out the prompt template with the question and context.
- Feed the prompt to our chat model.
- Parse and output the answer from the model.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.documents import Document
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from typing import List
def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])
template = """Answer the question based only on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
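The piped chain can be read as an ordinary sequence of function calls. The following dependency-free sketch mirrors that data flow under stated assumptions: `run_chain`, `fake_retrieve`, and `fake_model` are hypothetical stand-ins for the LangChain retriever and chat model, so it shows the order of the steps rather than the library's behavior.

```python
from typing import Callable, List

TEMPLATE = """Answer the question based only on the following context:
{context}
Question: {question}
"""


def run_chain(question: str,
              retrieve: Callable[[str], List[str]],
              model: Callable[[str], str]) -> str:
    """Hypothetical plain-Python equivalent of the piped chain."""
    docs = retrieve(question)                    # 1. fetch context chunks
    context = "\n\n".join(docs)                  # 2. same joining as format_docs
    prompt = TEMPLATE.format(context=context, question=question)  # 3. fill template
    raw = model(prompt)                          # 4. call the model
    return raw.strip()                           # 5. parse the output to a clean string


def fake_retrieve(question: str) -> List[str]:
    # Stand-in for the Chroma retriever; always returns two fixed chunks.
    return ["Chunk A.", "Chunk B."]


def fake_model(prompt: str) -> str:
    # Stand-in for the chat model; reports how many chunks it was given.
    return f"  (answer based on {prompt.count('Chunk')} chunks)  "


answer = run_chain("What are hallucinations in LLMs?", fake_retrieve, fake_model)
print(answer)
```

Note that the question is used twice, once for retrieval and once in the prompt, which is the role RunnablePassthrough plays in the chain definition.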
Run Our Chain and Submit Callback to Galileo
Next, we set our Galileo cluster URL, API key, and project name to define where our results are logged.
Finally, we run our chain and pass a GalileoPromptCallback to log our results.
import promptquality as pq
os.environ['GALILEO_API_KEY'] = userdata.get('GALILEO_API_KEY_DEMO')
os.environ['GALILEO_CONSOLE_URL'] = 'https://console.demo.rungalileo.io/'
GALILEO_PROJECT_NAME = 'galileoblog-rag'
config = pq.login(os.environ['GALILEO_CONSOLE_URL'])
q_list = [
"What are hallucinations in LLMs?",
"What is the difference between intrinsic and extrinsic hallucinations?",
"How do hallucinations impact abstractive summarization?",
"What are some examples of hallucinations in dialogue generation?",
"How does generative question answering lead to hallucinations?",
"What intrinsic and extrinsic errors occur in neural machine translation?",
"How does data-to-text generation exhibit hallucinations?",
"What are intrinsic and extrinsic object hallucinations in vision-language models?",
"Why is addressing hallucinations important for AI applications?",
"What methods are suggested to mitigate hallucinations in LLMs?"
]
prompt_handler = pq.GalileoPromptCallback(
    project_name=GALILEO_PROJECT_NAME,
    scorers=[pq.Scorers.latency, pq.Scorers.groundedness, pq.Scorers.factuality],
)
chain.batch(q_list, config=dict(callbacks=[prompt_handler]))
prompt_handler.finish()
The callback will return a URL for you to inspect your run in the Galileo Evaluate UI.
In the run view, you can see each question from our Q&A example. To dive deeper into the retrieved documents and metrics, click into any sample in the UI.