Results
After a test plan has been successfully evaluated, it is marked as "Completed." Click the Results button to access advanced metrics for the execution.
Exploring the results
The results view of the test execution offers valuable insights into your agent's performance, enabling you to refine the agent based on objective information derived from the defined test plan.
Summary
The summary card displays key information, including the execution duration, cost, number of test cases, and the date of execution.
Dimensions tabs
Within each tab, you can view results for specific dimensions. Each characteristic corresponding to the dimension will be labeled as either "Requires Attention" or "Validated," ensuring clear visibility of relevant characteristics and their performance in the test plan.
- For Qualitative Characteristics, you can see which test cases passed and which did not.
- For Quantitative Characteristics, you can view the minimum, maximum, and average scores for the test cases.
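To make the two kinds of summary concrete, here is a minimal Python sketch of how such figures could be derived from per-test-case results. The function names and sample data are hypothetical and serve only as an illustration. The results view computes these values for you, and its actual implementation is not described here.

```python
from statistics import mean


def summarize_qualitative(outcomes):
    """Split test case names into passed and failed.

    `outcomes` maps a test case name to True (passed) or False (failed).
    """
    passed = [name for name, ok in outcomes.items() if ok]
    failed = [name for name, ok in outcomes.items() if not ok]
    return {"passed": passed, "failed": failed}


def summarize_quantitative(scores):
    """Return the minimum, maximum, and average of a non-empty score list."""
    if not scores:
        raise ValueError("at least one score is required")
    return {"min": min(scores), "max": max(scores), "average": mean(scores)}


# Hypothetical sample data, not taken from a real test execution.
qualitative_outcomes = {"case-1": True, "case-2": False, "case-3": True}
quantitative_scores = [2, 4, 3]

print(summarize_qualitative(qualitative_outcomes))
print(summarize_quantitative(quantitative_scores))
```

A wide gap between the minimum and the average usually points to a few weak test cases rather than a general problem, which is a useful cue for where to look first.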
Test case results
Clicking a characteristic opens a side panel that displays the results for each evaluated test case. The panel includes summary information and details for each test. You can also open an individual test case to see the expected result, the agent response, and the assessment rationale.
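If you keep your own notes on the test cases you review, the details shown in the side panel map naturally onto a small record type. The sketch below is an assumption for illustration only: the `CaseResult` class, its field names, and the sample values are hypothetical and do not reflect an export format or API of the product.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical record mirroring the details shown for one test case."""
    name: str
    passed: bool
    expected_result: str
    agent_response: str
    rationale: str


def cases_to_review(results):
    """Return only the test cases that did not pass, in their original order."""
    return [r for r in results if not r.passed]


# Hypothetical sample data, not taken from a real test execution.
results = [
    CaseResult("greeting", True, "Greets the user", "Hello! How can I help?",
               "Response matches the expected greeting."),
    CaseResult("refund-policy", False, "States the refund window",
               "I cannot help with that.",
               "Response does not mention the refund window."),
]

for case in cases_to_review(results):
    print(f"{case.name}: {case.rationale}")
```

Reading the rationale next to the expected result and the agent response makes it easier to decide whether the agent or the test case itself needs to change.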
Tips
Once the agent results have been evaluated, it’s time to refine your agent based on the insights gained from the test execution. Use this information to enhance the agent and create a new test execution with the updated version.
- Schedule periodic reviews of the agent's performance to identify patterns and areas for improvement based on test results.
- Collect and analyze feedback from users interacting with the agent to understand its strengths and weaknesses, and make adjustments accordingly.
- Expand the training dataset with diverse examples to improve the agent's understanding and response accuracy across various scenarios.
- Develop and implement specific response strategies for different types of inquiries to ensure the agent provides relevant and helpful information.
- Create a new version of the agent and run a new test execution to evaluate the improvement before publishing.