As our AI financial coach gains traction, the volume of interactions has made comprehensive quality assurance increasingly difficult. Our growing client base has brought us to the point where we must scale, and that calls for a technology-driven approach that maintains quality without proportionally increasing our workforce.
Scaling a venture inside a corporate environment is similar to scaling a startup. We maintain a strong commercial focus and think carefully about how we allocate our resources. As our venture started gaining traction, the workload and demands on our small team grew rapidly.
We turned to technology to scale. It relieves our people of repetitive tasks and gives them space to focus on more value-adding activities.
To revolutionise our approach to quality assessment and address our scaling challenges, we developed a solution using LLM-based agents named the Vanguards. In military and leadership contexts, a vanguard is the foremost part of an advancing army, or the leading position in a movement or field. Our Vanguards embody this definition by being at the forefront of our efforts to ensure the highest standards of quality and efficiency in our AI financial coach interactions.
These autonomous, independent, and critical agents are transforming how we assess and ensure the quality of interactions between actual customers and our AI financial coach. Our Vanguards operate under three distinct yet complementary roles:
- CSAT (Customer Satisfaction) Agent: Ensures interactions meet our high customer satisfaction standards.
- Risk and Compliance Agent: Monitors and enforces adherence to regulatory and compliance guidelines, which is particularly crucial in the financial services sector.
- Product Improvements and Error Detection Agent: Identifies areas for product enhancement and detects errors in interactions, driving continuous improvement of our AI financial coach.
While our Vanguards are powered by advanced AI, we recognise the irreplaceable value of human expertise in ensuring their accuracy and reliability. Each Vanguard Agent operates using a meticulously developed scorecard tailored to its specific focus area. These scorecards are created by human experts in each domain (customer satisfaction specialists, compliance officers, and product managers), so our evaluation criteria are grounded in real-world understanding and industry best practices.
- CSAT Scorecard: Evaluates interactions based on factors like empathy, resolution effectiveness, and overall satisfaction. It helps ensure that our AI financial coach maintains high customer service even as we scale.
- Risk and Compliance Scorecard: Focuses on adherence to legal and regulatory guidelines, data privacy protocols, and ethical standards. This is particularly crucial as we scale our financial advisory services, ensuring we maintain regulatory compliance across all interactions.
- Product Improvements and Error Detection Scorecard: Assesses conversations for recurring issues, potential areas for enhancement, and any errors. This scorecard drives our continuous improvement efforts, ensuring our AI financial coach evolves to meet changing customer needs and market conditions.
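To make the idea of an expert-authored scorecard more concrete, here is a minimal sketch of how one might be represented and rendered into an evaluation prompt. The class names, the `to_prompt` helper, the criterion descriptions, and the 1-5 scale defaults are all assumptions for illustration; they are not our production schema, and no LLM call is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One line item on a scorecard (hypothetical structure)."""
    name: str
    description: str  # what the domain expert wants the agent to look for
    scale_min: int = 1
    scale_max: int = 5


@dataclass(frozen=True)
class Scorecard:
    """A set of criteria owned by one focus area (hypothetical structure)."""
    focus_area: str
    criteria: tuple[Criterion, ...]

    def to_prompt(self, transcript: str) -> str:
        """Render the scorecard and a transcript as one evaluation prompt."""
        lines = [
            f"You are a {self.focus_area} reviewer.",
            "Score the conversation on each criterion and explain your reasoning.",
            "",
        ]
        for c in self.criteria:
            lines.append(f"- {c.name} ({c.scale_min}-{c.scale_max}): {c.description}")
        lines += ["", "Conversation:", transcript]
        return "\n".join(lines)


# Illustrative criteria based on the factors named in the CSAT scorecard.
csat_scorecard = Scorecard(
    focus_area="customer satisfaction",
    criteria=(
        Criterion("Empathy", "Does the coach acknowledge the client's situation?"),
        Criterion("Resolution effectiveness", "Was the client's request addressed?"),
        Criterion("Overall satisfaction", "Would the client likely be satisfied?"),
    ),
)

transcript = "User: I need a loan.\nCoach: Let's check your credit score first."
prompt = csat_scorecard.to_prompt(transcript)
```

Keeping the scorecard as data rather than hard-coded prompt text is one way to let the domain experts adjust criteria without touching the agent logic.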
One of the key strengths of our Vanguards is their autonomy. Leveraging advanced LLM technology, these agents independently read and analyse transcripts of interactions between actual customers and our AI agents. This autonomy helps keep our assessments consistent and less exposed to individual human error or oversight.
Each week, the agents analyse a random sample of conversations, which gives them a varied set to work from. This random sampling is crucial in maintaining an accurate and representative assessment of our service quality.
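A weekly draw of this kind could be sketched as a simple random sample without replacement. The function name, the ID format, the sample size of 50, and the use of a fixed seed for reproducibility are assumptions for illustration only.

```python
import random


def draw_weekly_sample(conversation_ids, sample_size, seed=None):
    """Return a simple random sample of conversation IDs without replacement.

    If fewer conversations exist than requested, all of them are returned.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible for audits
    ids = list(conversation_ids)
    k = min(sample_size, len(ids))
    return rng.sample(ids, k)


# Hypothetical IDs for one week's conversations.
week_ids = [f"conv-{i:04d}" for i in range(1, 501)]
sample = draw_weekly_sample(week_ids, sample_size=50, seed=2024)
```

Recording the seed alongside the report is one option for letting a reviewer reconstruct exactly which conversations were assessed in a given week.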
The agents provide a comprehensive report on the sampled conversations and raise an exception whenever a metric breaches its defined threshold. These exceptions deliver actionable insights that help narrow down specific problem areas. The team is still free to read the full report produced by the Vanguards, but the exceptions let them focus on specific items, saving time and increasing efficiency.
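The exception step can be pictured as a comparison of each metric's average against a minimum acceptable value. This is a sketch under stated assumptions: the function name, the choice of the weekly mean as the aggregate, the threshold values, and the sample scores are all made up for illustration.

```python
from statistics import mean


def find_exceptions(scores_by_metric, thresholds):
    """Return one record per metric whose average falls below its threshold."""
    exceptions = []
    for metric, minimum in thresholds.items():
        scores = scores_by_metric.get(metric, [])
        if not scores:
            continue  # nothing scored for this metric in the sample
        average = mean(scores)
        if average < minimum:
            exceptions.append(
                {"metric": metric, "average": round(average, 2), "threshold": minimum}
            )
    return exceptions


# Hypothetical per-conversation scores (1-5) for one weekly sample.
weekly_scores = {
    "Accuracy": [5, 5, 4, 5],
    "First Contact Resolution": [3, 2, 4, 3],
    "User Satisfaction": [4, 4, 5, 3],
}
# Hypothetical minimum acceptable averages.
thresholds = {
    "Accuracy": 4.5,
    "First Contact Resolution": 3.5,
    "User Satisfaction": 4.0,
}

exceptions = find_exceptions(weekly_scores, thresholds)
```

With these illustrative numbers, only First Contact Resolution is flagged, which is the kind of narrowed-down item the team would look at first.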
Because the Vanguards are powered by LLMs, they can draw on the models' inference capabilities to interpret nuance, handle complexity, and generate detailed explanations. A Vanguard therefore does not merely provide a numerical score for each item in the scorecard; it also explains the reasoning behind that score. We refer to these outputs as synthetic insights, and they reflect the nuances of each conversation.
This is particularly useful in the case of CSAT, which is traditionally obtained by surveying the client directly. While we have this functionality built into the AI financial coach, we find that clients rarely complete the survey: they simply leave the conversation once they have achieved their objective. In our experience, post-conversation surveys have a very low response rate. So while we still pursue those avenues for direct client feedback, the synthetic insight derived from the conversation is invaluable in driving the continuous improvement of our AI financial coach.
While the Vanguards can operate autonomously, this does not eliminate the need for human oversight. We have defined a process in which the full reports are assessed monthly, both to confirm that the domain experts are satisfied with them and to give the experts an opportunity to refine the agents further. This human oversight is crucial to maintaining the long-term accuracy of these models.
Below is an example of a report produced by the CSAT Vanguard:
Based on the provided conversation history, here's a detailed evaluation of each metric:
Accuracy (1-5): 5
Reasoning: The AI provides accurate information throughout the conversation. It correctly identifies the user's credit score, explains why they don't qualify for a loan, and offers precise tips for improving credit scores in the South African context.
Relevance (1-5): 5
Reasoning: All responses are directly relevant to the user's query about needing a loan. The AI guides the conversation through logical steps, from checking the credit score to explaining loan eligibility and offering improvement tips.
Clarity (1-5): 5
Reasoning: The AI's responses are clear, well-structured, and easy to understand. It uses bullet points and bold text to highlight key information, making it easily digestible for the user.
Interaction Length (1-5): 4
Reasoning: The interaction is productive with several exchanges. The AI guides the user through multiple steps, from loan inquiry to credit score check and improvement tips. However, it could have been more extensive with additional user queries.
User Satisfaction (1-5): 4
Reasoning: The user seems satisfied with the responses, agreeing to have their credit score checked and expressing interest in credit improvement tips. They also agree to be contacted about debt consolidation, indicating satisfaction with the AI's suggestions.
Issue Resolution (1-5): 4
Reasoning: While the AI couldn't approve the loan due to the user's credit score, it provided a clear explanation and offered solutions to improve the situation. The query was effectively addressed, even if the initial request couldn't be fulfilled.
First Contact Resolution (1-5): 3
Reasoning: The immediate query (needing a loan) wasn't resolved in this interaction, but the AI provided valuable information and next steps. A follow-up with a human agent for debt consolidation was arranged, indicating partial resolution requiring further action.
Positive Sentiment (1-5): 4
Reasoning: The conversation maintains a positive tone throughout. The AI uses friendly language and offers constructive advice, even when delivering potentially disappointing news about loan eligibility.
Negative Sentiment (1-5): 5
Reasoning: There is no noticeable negative sentiment in the conversation. Even when discussing the user's low credit score and loan ineligibility, the AI maintains a supportive and solution-oriented approach.
Response Time (1-5): 5
Reasoning: The AI's responses appear to be immediate, with no noticeable delays between user inputs and AI responses.
Resolution Time (1-5): 4
Reasoning: The AI efficiently guides the user through the process, from loan inquiry to credit score check and improvement tips. While the initial query isn't fully resolved, a clear path forward is established quickly.
Direct Feedback (1-5): 3
Reasoning: There's no explicit feedback from the user at the end of the interaction. However, their willingness to engage throughout the conversation and accept follow-up contact suggests a neutral to positive experience.
Implicit Feedback (1-5): 4
Reasoning: The user shows several indicators of satisfaction by agreeing to have their credit score checked, expressing interest in credit improvement tips, and accepting a follow-up call about debt consolidation.
The AI Financial Coach demonstrates excellent performance across most metrics. It provides accurate, relevant, and clear information, maintaining a positive and engaging conversation despite delivering potentially disappointing news about loan eligibility. The AI effectively guides the user through a logical process, offering valuable insights and next steps. While the initial query isn't fully resolved, the AI sets a clear path for improvement and further assistance.
Overall Score: 4.3 out of 5
This high score reflects the AI's strong performance in accuracy, relevance, clarity, and positive engagement. There's room for improvement in areas like first contact resolution and gathering more explicit user feedback, but overall, the AI provides a highly satisfactory user experience.
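Because the report follows a regular "Metric (1-5): score" layout, its scores can be pulled out for aggregation. The sketch below assumes that exact line format; the regular expression and function name are illustrative, not part of the Vanguards themselves. The score lines are copied from the report above, with one reasoning line kept to show that it is skipped.

```python
import re
from statistics import mean

# Matches lines such as "Accuracy (1-5): 5"; "Reasoning: ..." lines do not match.
SCORE_LINE = re.compile(r"^(?P<metric>.+?) \(1-5\): (?P<score>[1-5])\s*$")


def parse_scores(report_text):
    """Extract {metric name: integer score} from a report in the layout above."""
    scores = {}
    for line in report_text.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            scores[match.group("metric")] = int(match.group("score"))
    return scores


report_scores = """\
Accuracy (1-5): 5
Reasoning: The AI provides accurate information throughout the conversation.
Relevance (1-5): 5
Clarity (1-5): 5
Interaction Length (1-5): 4
User Satisfaction (1-5): 4
Issue Resolution (1-5): 4
First Contact Resolution (1-5): 3
Positive Sentiment (1-5): 4
Negative Sentiment (1-5): 5
Response Time (1-5): 5
Resolution Time (1-5): 4
Direct Feedback (1-5): 3
Implicit Feedback (1-5): 4
"""

scores = parse_scores(report_scores)
unweighted_mean = mean(scores.values())
```

For the thirteen scores in this report, the unweighted mean works out to 55/13, or about 4.23. The report does not state how its overall score is derived, so it may be weighted or estimated by the model rather than computed as a simple average.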
Our Vanguards represent a significant leap forward in scaling quality assurance for AI-driven financial services. By combining advanced LLM technology with human expertise, we've created a robust, scalable, and consistent solution that allows us to maintain excellence in customer service while growing rapidly.
This approach enables us to handle a much larger volume of interactions without proportionally increasing our workforce, addressing the key challenge of scaling. As we continue to grow and evolve, the Vanguards will play a crucial role in ensuring that our AI financial coach delivers consistent, high-quality service to every client, positioning us at the forefront of the AI-driven financial advisory landscape.
Additionally, our experts have identified opportunities to extend this technology across the broader enterprise, supporting similar processes and driving further efficiency gains, even extending to human-to-human conversations.