Integrate promptquality into your system or test a template model combination through the Playground. Choose and register your metrics to define what success means for your use case.
2
Analyze results
Identify poor perfomance, trace it to the broken step, form hypothesis on what could be behind it.
3
Debug, Fix & Run another Eval
Tweak your system and try again until your quality bar is met.