Chunk Utilization
Understand Galileo’s Chunk Utilization Metric
Definition: For each chunk retrieved in a RAG pipeline, Chunk Utilization measures the fraction of the text in that chunk that had an impact on the model’s response.
Chunk Utilization ranges from 0 to 1. A value of 1 means that the entire chunk affected the response, while a lower value like 0.5 means that the chunk contained some “extraneous” text which did not affect the response.
Chunk Utilization is closely related to Chunk Attribution: Attribution measures whether or not a chunk affected the response, and Utilization measures how much of the chunk text was involved in the effect. Only chunks that were Attributed can have Utilization scores greater than zero.
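The relationship between the two metrics can be illustrated with a minimal sketch. It assumes, purely for illustration, that each token in a chunk carries a boolean flag saying whether it influenced the response. Galileo derives this signal internally with its own models, so the flags and the function names `chunk_utilization` and `chunk_attributed` are hypothetical and not part of any Galileo API.

```python
from typing import Sequence


def chunk_utilization(token_used: Sequence[bool]) -> float:
    """Fraction of a chunk's tokens flagged as having influenced the response."""
    if not token_used:
        return 0.0  # an empty chunk cannot contribute anything
    return sum(token_used) / len(token_used)


def chunk_attributed(token_used: Sequence[bool]) -> bool:
    """A chunk counts as attributed if any part of it influenced the response."""
    return any(token_used)


# Hypothetical per-token flags for three retrieved chunks.
flags_full = [True, True, True, True]      # whole chunk was used
flags_half = [True, True, False, False]    # half of the chunk was extraneous
flags_none = [False, False, False, False]  # chunk was not attributed at all

for name, flags in [("full", flags_full), ("half", flags_half), ("none", flags_none)]:
    print(name, chunk_attributed(flags), chunk_utilization(flags))
```

Under this simplification, a chunk that is not attributed necessarily has a utilization of 0, which matches the rule that only attributed chunks can score above zero.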
What to do when Chunk Utilization is low?
Low Chunk Utilization scores usually mean one of two things: (1) your chunks are longer than they need to be, or (2) the LLM generator model is failing to incorporate all the relevant information in the chunks. You can differentiate between the two scenarios by checking the Chunk Relevance score. If Chunk Relevance is also low, you are likely experiencing scenario (1). If Chunk Relevance is high, you are likely experiencing scenario (2).
In case (1), we recommend tuning your retriever to return shorter chunks, which will improve the efficiency of the system (lower cost and latency). In case (2), we recommend exploring a different LLM that may make better use of the relevant information in the chunks.
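The decision rule above can be sketched as a small helper. This is only an illustration: the function name `diagnose_low_utilization`, the returned labels, and the 0.5 cutoff for "low" are assumptions made for the example, not Galileo defaults. Choose thresholds that suit your own data.

```python
def diagnose_low_utilization(
    utilization: float,
    relevance: float,
    low_threshold: float = 0.5,  # hypothetical cutoff for what counts as "low"
) -> str:
    """Map Chunk Utilization and Chunk Relevance scores to a likely cause."""
    if utilization >= low_threshold:
        return "ok"  # utilization is not low, nothing to diagnose
    if relevance < low_threshold:
        # Scenario (1): much of the chunk is irrelevant, so chunks are too long.
        return "shorten_chunks"
    # Scenario (2): the chunk is relevant but the generator is not using it.
    return "try_different_llm"


print(diagnose_low_utilization(utilization=0.2, relevance=0.3))
print(diagnose_low_utilization(utilization=0.2, relevance=0.9))
```

Running a check like this over aggregated scores, rather than a single chunk, gives a steadier signal before you change the retriever or swap the model.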
Luna vs Plus
We offer two ways of calculating Chunk Utilization: Luna and Plus.
Chunk Utilization Luna is computed using Galileo in-house small language models, which are free of cost. Chunk Utilization Luna is a cost-effective way to scale up your RAG evaluation workflows.
Chunk Utilization Plus is computed by sending an additional request to an LLM. It relies on OpenAI models, so it incurs an additional cost. Chunk Utilization Plus has shown better results than Luna in internal benchmarks.
Chunk Attribution and Chunk Utilization are closely related and rely on the same models for computation. The “chunk_attribution_utilization_{luna/plus}” scorer will compute both.