To log your runs with Galileo, start by logging in as usual:

import promptquality as pq

pq.login()

Next, construct your EvaluateRun object:

from promptquality import EvaluateRun

metrics = [pq.Scorers.context_adherence_plus, pq.Scorers.prompt_injection]

evaluate_run = EvaluateRun(run_name="my_run", project_name="my_project", scorers=metrics)

Then you can generate your workflows. One workflow represents one end-to-end interaction. Each input in your evaluation dataset corresponds to one workflow, which can have multiple steps, and each Evaluate run consists of multiple workflows. Here’s an example of how you can log your workflows from your LLM app:

def my_llm_app(input, evaluate_run):
    context = "You're an AI assistant helping a user with hallucinations."
    template = "Given the following context answer the question. \n Context: {context} \n Question: {question}"
    wf = evaluate_run.add_workflow(input=input)
    # Get a response from your LLM.
    prompt = template.format(context=context, question=input)
    llm_response = llm.call(prompt)  # Pseudo-code, replace with your LLM call.
    # Log the LLM step to Galileo.
    wf.add_llm(input=prompt, output=llm_response, model="<model_name>")  # Placeholder, replace with your model's name.
    # Conclude the workflow and add the final output.
    wf.conclude(output=llm_response)
    return llm_response

# Your evaluation dataset.
eval_set = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]
for input in eval_set:
    my_llm_app(input, evaluate_run)

Finally, log your Evaluate run to Galileo:

evaluate_run.finish()
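The `llm.call(prompt)` line in the example above is pseudo-code. While wiring up logging, it can help to have a deterministic stand-in so the prompt-building logic can be exercised without calling a real model. The sketch below is one way to do that. `EchoLLM`, its `call` method, and `build_prompt` are hypothetical names invented for illustration; none of them are part of promptquality.

```python
class EchoLLM:
    """Hypothetical stand-in for a real LLM client (not part of promptquality)."""

    def __init__(self, canned_answer="This is a canned answer."):
        self.canned_answer = canned_answer
        self.prompts = []  # Every prompt received, kept for inspection.

    def call(self, prompt):
        # Record the prompt and return the same fixed answer every time.
        self.prompts.append(prompt)
        return self.canned_answer


def build_prompt(context, question):
    # Same template as in the example above.
    template = "Given the following context answer the question. \n Context: {context} \n Question: {question}"
    return template.format(context=context, question=question)


llm = EchoLLM()
context = "You're an AI assistant helping a user with hallucinations."
prompt = build_prompt(context, "What are hallucinations?")
llm_response = llm.call(prompt)
```

Once the logging calls behave as expected with the stand-in, swap `EchoLLM` for your real client and keep the rest of the flow unchanged.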

Logging RAG Workflows

To log RAG workflows, add a retriever step to the workflow. Here’s an example with RAG:

def my_llm_app(input, evaluate_run):
    template = "Given the following context answer the question. \n Context: {context} \n Question: {question}"
    wf = evaluate_run.add_workflow(input=input)
    # Fetch documents from your retriever
    documents = retriever.retrieve(input) # Pseudo-code, replace with your real retriever.
    # Log retriever step to Galileo
    wf.add_retriever(input=input, documents=documents)
    # Get a response from your LLM.
    prompt = template.format(context="\n".join(documents), question=input)
    llm_response = llm.call(prompt)  # Pseudo-code, replace with your LLM call.
    # Log the LLM step to Galileo.
    wf.add_llm(input=prompt, output=llm_response, model="<model_name>")  # Placeholder, replace with your model's name.
    # Conclude the workflow and add the final output.
    wf.conclude(output=llm_response)
    return llm_response

# Your evaluation dataset.
eval_set = [
    "What are hallucinations?",
    "What are intrinsic hallucinations?",
    "What are extrinsic hallucinations?"
]
for input in eval_set:
    my_llm_app(input, evaluate_run)
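The `retriever.retrieve(input)` call above is also pseudo-code. As a rough sketch of what could sit behind it during local testing, the toy retriever below ranks an in-memory list of strings by word overlap with the query. `KeywordRetriever`, its `top_k` parameter, and the sample corpus are assumptions made up for this illustration, not part of promptquality or of any real retrieval library. It returns plain strings, which matches how the example joins `documents` into the prompt.

```python
class KeywordRetriever:
    """Hypothetical in-memory retriever (not part of promptquality)."""

    def __init__(self, corpus):
        self.corpus = list(corpus)

    def retrieve(self, query, top_k=2):
        # Score each document by how many distinct query words it contains.
        words = set(query.lower().replace("?", "").split())
        scored = []
        for doc in self.corpus:
            doc_words = set(doc.lower().replace(".", "").split())
            scored.append((len(words & doc_words), doc))
        # Drop documents with no overlap, then sort best first.
        # The sort is stable, so ties keep their corpus order.
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[:top_k]]


retriever = KeywordRetriever([
    "Intrinsic hallucinations contradict the source content.",
    "Extrinsic hallucinations cannot be verified from the source content.",
    "Bananas contain potassium.",
])
documents = retriever.retrieve("What are intrinsic hallucinations?")
context = "\n".join(documents)
```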

Logging Agent Workflows

We also support logging Agent workflows. Here’s an example of how you can log an Agent workflow:

# The angle-bracket values are placeholders; replace them with your own data.
agent_wf = evaluate_run.add_agent_workflow(input="<input>", output="<output>", duration_ns=100)
agent_wf.add_tool(
    input="<tool query>", output="<tool response>", duration_ns=50
)
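The examples on this page hard-code `duration_ns`. Assuming the field is meant to hold elapsed time in nanoseconds (the name suggests this, but the text here does not define it), one way to obtain a real value is to wrap the call with Python's `time.perf_counter_ns()`. `timed_call` and `lookup_tool` below are hypothetical helpers written for this sketch.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed time in nanoseconds)."""
    start = time.perf_counter_ns()
    result = fn(*args, **kwargs)
    duration_ns = time.perf_counter_ns() - start
    return result, duration_ns


def lookup_tool(query):
    # Hypothetical tool, standing in for whatever your agent calls.
    return f"result for {query}"


tool_output, tool_duration_ns = timed_call(lookup_tool, "weather in Paris")
```

The measured values could then be passed as the `output` and `duration_ns` arguments of the tool step in place of the fixed numbers.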

Logging Retriever and LLM Metadata

If you want to log more complex inputs and outputs to your nodes, we provide support for that as well. For retriever outputs we support the Document object.

wf = evaluate_run.add_workflow(input="Who's a good bot?", output="I am!", duration_ns=2000)
wf.add_retriever(
    input="Who's a good bot?",
    documents=[pq.Document(content="Research shows that I am a good bot.", metadata={"length": 35})],
    duration_ns=1000
)

For LLM inputs and outputs we support the Message object.

wf = evaluate_run.add_workflow(input="Who's a good bot?", output="I am!", duration_ns=2000)
wf.add_llm(
    input=pq.Message(content="Given this context: Research shows that I am a good bot. answer this: Who's a good bot?"),
    output=pq.Message(content="I am!", role=pq.MessageRole.assistant),
    model=pq.Models.chat_gpt,
    input_tokens=25,
    output_tokens=3,
    total_tokens=28,
    duration_ns=1000
)

Often an LLM interaction consists of multiple messages. You can log these as well.

wf = evaluate_run.add_workflow(input="Who's a good bot?", output="I am!", duration_ns=2000)
wf.add_llm(
    input=[
        pq.Message(content="You're a good bot that answers questions.", role=pq.MessageRole.system),
        pq.Message(content="Given this context: Research shows that I am a good bot. answer this: Who's a good bot?"),
    ],
    output=pq.Message(content="I am!", role=pq.MessageRole.assistant),
    model=pq.Models.chat_gpt,
)

Logging Nested Workflows

If you have more complex workflows that involve nesting workflows within workflows, we support that too. Here’s an example of how you can log a nested workflow, using conclude to step out of the nested workflow and back into the base workflow:

wf = evaluate_run.add_workflow("input", "output", duration_ns=100)
# Add a workflow inside the base workflow.
nested_wf = wf.add_workflow(input="inner input")
# Add an LLM step inside the nested workflow.
nested_wf.add_llm(input="prompt", output="response", model=pq.Models.chat_gpt, duration_ns=60)
# Conclude the nested workflow and step back into the base workflow.
nested_wf.conclude(output="inner output", duration_ns=60)
# Add another LLM step in the base workflow.
wf.add_llm("outer prompt", "outer response", "chatgpt", duration_ns=40)
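To make the shape of the nested example easier to picture, here is a small conceptual model of the resulting tree, written in plain Python. It is only a sketch: `Step`, `add`, `conclude`, and `outline` are hypothetical names, and nothing here reflects how promptquality stores or returns workflows internally. It mirrors the durations used above, where the nested workflow (60) and the outer LLM step (40) add up to the base workflow's 100.

```python
class Step:
    """Hypothetical node in a logged trace (not a promptquality class)."""

    def __init__(self, kind, duration_ns=0, parent=None):
        self.kind = kind  # "workflow" or "llm"
        self.duration_ns = duration_ns
        self.parent = parent
        self.children = []

    def add(self, kind, duration_ns=0):
        # Attach a child step and return it so callers can nest further.
        child = Step(kind, duration_ns=duration_ns, parent=self)
        self.children.append(child)
        return child

    def conclude(self, duration_ns):
        # Record the duration and hand control back to the parent, if any.
        self.duration_ns = duration_ns
        return self.parent


def outline(step, depth=0):
    # Render the tree as indented lines, one per step.
    lines = ["  " * depth + f"{step.kind} ({step.duration_ns} ns)"]
    for child in step.children:
        lines.extend(outline(child, depth + 1))
    return lines


base = Step("workflow", duration_ns=100)
nested = base.add("workflow")
nested.add("llm", duration_ns=60)
current = nested.conclude(duration_ns=60)  # Steps back into the base workflow.
current.add("llm", duration_ns=40)
tree = outline(base)
```

Printing `"\n".join(tree)` shows the base workflow with two children: the nested workflow, which contains its own LLM step, and the outer LLM step.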