Create a new Evaluate Run
Create a new Evaluate run with workflows.
Use this endpoint to create a new Evaluate run with workflows. The request body should contain the workflows to be ingested and evaluated.
Additionally, specify the project_id or project_name to which the workflows should be ingested. If the project does not exist, it will be created. If the project exists, the workflows will be logged to it. If both project_id and project_name are provided, project_id will take precedence. The run_name is optional and will be auto-generated (timestamp-based) if not provided.
The body is also expected to include the configuration for the scorers to be used in the evaluation. This configuration will be used to evaluate the workflows and generate the results.
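As a rough illustration of the rules above, the request body could be assembled in Python along these lines. The helper name build_run_body and the client-side check that at least one project identifier is present are assumptions made for this sketch, not part of the API; the field names are the ones used in the example request on this page.

```python
from typing import Any, Optional


def build_run_body(
    workflows: list[dict[str, Any]],
    scorers: list[dict[str, Any]],
    project_id: Optional[str] = None,
    project_name: Optional[str] = None,
    run_name: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body for creating an Evaluate run (hypothetical helper)."""
    # Assumption: fail early on the client if no project is identified.
    if project_id is None and project_name is None:
        raise ValueError("provide project_id or project_name")

    body: dict[str, Any] = {"workflows": workflows, "scorers": scorers}
    # Both may be sent; per the description above, project_id takes precedence.
    if project_id is not None:
        body["project_id"] = project_id
    if project_name is not None:
        body["project_name"] = project_name
    # Leaving run_name out lets the service generate a timestamp-based name.
    if run_name is not None:
        body["run_name"] = run_name
    return body
```

Omitting run_name from the dictionary, rather than sending an empty value, keeps the sketch aligned with the documented auto-generation behaviour.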
curl --request POST \
  --url https://api.acme.rungalileo.io/v1/evaluate/runs \
  --header 'Content-Type: application/json' \
  --header 'Galileo-API-Key: <api-key>' \
  --data '{
  "project_name": "my-evaluate-project",
  "run_name": "my-evaluate-run",
  "scorers": [
    {
      "name": "correctness"
    },
    {
      "name": "output_pii"
    }
  ],
  "workflows": [
    {
      "created_at_ns": 1739567790708355300,
      "duration_ns": 0,
      "input": "who is a smart LLM?",
      "metadata": {},
      "name": "llm",
      "output": "I am!",
      "type": "llm"
    }
  ]
}'
{
  "message": "<string>",
  "project_id": "<string>",
  "project_name": "<string>",
  "run_id": "<string>",
  "run_name": "<string>",
  "workflows_count": 123,
  "records_count": 123
}
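The same request can be sketched in Python with only the standard library. The function names make_run_request and summarize_response are hypothetical, the host is the example one from the curl command above, and the response fields are those shown in the sample response; the actual network call is left commented out because it needs a valid API key.

```python
import json
import urllib.request

# Example host taken from the curl command above; replace it with your own.
RUNS_URL = "https://api.acme.rungalileo.io/v1/evaluate/runs"


def make_run_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for creating a run."""
    return urllib.request.Request(
        RUNS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Galileo-API-Key": api_key,
        },
        method="POST",
    )


def summarize_response(raw: str) -> str:
    """Pick out the run identifiers and counts from a response body."""
    data = json.loads(raw)
    return (
        f"run {data['run_name']} ({data['run_id']}) in project "
        f"{data['project_name']}: {data['workflows_count']} workflows, "
        f"{data['records_count']} records"
    )


# Sending the request requires network access and a valid key:
# with urllib.request.urlopen(make_run_request("<api-key>", body)) as resp:
#     print(summarize_response(resp.read().decode("utf-8")))
```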
WorkflowStep
A workflow step is the atomic unit of logging to Galileo. Each step represents a single execution of a workflow, such as a chain, agent, or RAG execution. Workflows can have multiple steps, each of which can be a different type of node, such as an LLM, Retriever, or Tool.
You can log multiple workflows in a single request. Each workflow step must have the following fields:
type: The type of the workflow.
input: The input to the workflow.
output: The output of the workflow.
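A small client-side check for the three required fields might look like the following. This is only a sketch: the helper name missing_fields is hypothetical, and the service may apply additional validation that is not covered here.

```python
REQUIRED_FIELDS = ("type", "input", "output")


def missing_fields(step: dict) -> list[str]:
    """Return the required field names that are absent from a step."""
    return [field for field in REQUIRED_FIELDS if field not in step]


step = {"type": "llm", "input": "What is the capital of France?"}
print(missing_fields(step))  # ['output']
```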
Examples
LLM Step
{
  "type": "llm",
  "input": "What is the capital of France?",
  "output": "Paris"
}
Retriever Step
{
  "type": "retriever",
  "input": "What is the capital of France?",
  "output": [{ "content": "Paris is the capital and largest city of France." }]
}
Multi-Step
Workflow steps of type workflow, agent or chain can have sub-steps, listed under steps. A workflow with a retriever step and an LLM step would look like this:
{
  "type": "workflow",
  "input": "What is the capital of France?",
  "output": "Paris",
  "steps": [
    {
      "type": "retriever",
      "input": "What is the capital of France?",
      "output": [{ "content": "Paris is the capital and largest city of France." }]
    },
    {
      "type": "llm",
      "input": "What is the capital of France?",
      "output": "Paris"
    }
  ]
}
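Because sub-steps live under the steps key and can themselves contain steps, a nested workflow is naturally traversed with recursion. The sketch below walks the multi-step example above; the helper name iter_steps is hypothetical.

```python
from typing import Iterator


def iter_steps(step: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, step) for a step and all of its nested sub-steps."""
    yield depth, step
    for child in step.get("steps", []):
        yield from iter_steps(child, depth + 1)


workflow = {
    "type": "workflow",
    "input": "What is the capital of France?",
    "output": "Paris",
    "steps": [
        {
            "type": "retriever",
            "input": "What is the capital of France?",
            "output": [{"content": "Paris is the capital and largest city of France."}],
        },
        {
            "type": "llm",
            "input": "What is the capital of France?",
            "output": "Paris",
        },
    ],
}

# Print the step types, indented by nesting depth.
for depth, step in iter_steps(workflow):
    print("  " * depth + step["type"])
```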
Authorizations
Body
workflows: List of workflows to include in the run. Each workflow is a step with the following fields:
  input: Input to the step.
  type: Type of the step. By default, it is set to workflow. Available options: chain, chat, llm, retriever, tool, agent, workflow, trace.
  output: Output of the step.
  name: Name of the step.
  created_at_ns: Timestamp of the step's creation, as nanoseconds since epoch.
  duration_ns: Duration of the step in nanoseconds.
  metadata: Metadata associated with this step.
  Status code of the step. Used for logging failed/errored steps.
  Ground truth expected output for the step.
  steps: Steps in the workflow. Nested steps are described by the same fields as listed here.
  Parent node of the current node. For internal use only.
scorers: List of Galileo scorers to enable.
  "agentic_workflow_success"
  List of filters to apply to the scorer.
    eq, ne, contains
    "node_name"
    "string"
  "plus"
  Alias of the model to use for the scorer.
  Number of judges for the scorer (1 < x < 10).
List of registered scorers to enable.
  Name of the scorer to enable.
  List of filters to apply to the scorer.
    eq, ne, contains
    "node_name"
    "string"
List of generated scorers to enable.
  Name of the scorer to enable.
  List of filters to apply to the scorer.
    eq, ne, contains
    "node_name"
    "string"
project_id: Evaluate Project ID to which the run should be associated.
project_name: Evaluate Project name to which the run should be associated. If the project does not exist, it will be created.
run_name: Name of the run. If no name is provided, a timestamp-based name will be generated.
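The created_at_ns and duration_ns fields used in the example request are plain integers in nanoseconds. One way to fill them in Python is sketched below; the helper name timed_step and the idea of wrapping a callable are assumptions for illustration, not part of the API.

```python
import time
from typing import Any, Callable


def timed_step(step_type: str, step_input: Any, run: Callable[[Any], Any]) -> dict:
    """Run a callable and record it as a step with nanosecond timing."""
    created_at_ns = time.time_ns()       # wall-clock nanoseconds since epoch
    start = time.perf_counter_ns()       # monotonic clock for the duration
    output = run(step_input)
    duration_ns = time.perf_counter_ns() - start
    return {
        "type": step_type,
        "input": step_input,
        "output": output,
        "created_at_ns": created_at_ns,
        "duration_ns": duration_ns,
    }


step = timed_step("llm", "who is a smart LLM?", lambda prompt: "I am!")
```

The wall clock supplies the creation timestamp, while the monotonic counter is used for the duration so that system clock adjustments do not distort it.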