Chatting about bots
By Dulcie Vousden, Head of Data Science at DataKind UK
Insights from our chatbot peer learning group.
A growing number of third sector organisations are developing or considering AI-powered chatbots to help staff access internal documents or to support service delivery. But for every success story, there are also controversial headlines, ethical issues, and unintended consequences.
To tackle this, we brought third sector leaders together for three months in 2025 as a small peer-learning group to discuss responsible chatbot development. Our goal was to provide a safe space for participants to share their prototypes, challenges, and insights.
Although the group was small, we were lucky to have a diversity of experience, from organisations just starting out to those working on deploying their chatbot at scale. Participants could get feedback from their peers and discuss strategies for making design choices and mitigating pitfalls. We are so grateful to everyone who generously and frankly shared their challenges. Here’s what we learned.
What can we use AI chatbots for?
Broadly, the group was interested in three main uses for chatbots:
Internal knowledge management: helping teams access organisational knowledge more efficiently, such as a chatbot that provides instant answers to internal policies from a staff handbook.
Service delivery enhancement: helping human advisors work more efficiently or access a greater breadth of resources, rather than replacing them. An excellent example is Citizens Advice's Caddy chatbot, which provides their advisors with human-approved guidance to improve their response to clients. Nesta’s chatbot for heat pump installers also helps make specialised technical information instantly accessible to professionals in the field.
Triage and self-service: providing direct assistance to clients, driven by pressure on services and a desire to help a greater number or range of beneficiaries. Several organisations were exploring how chatbots might reduce pressure on their teams by helping with simpler queries, initial triage, or self-help resources.
What did we learn?
What tech can we use?
Knowing where to start can be overwhelming, especially given the capacity gap between organisations with in-house technical staff and those without. Before choosing any tool, it’s important to be clear about your goals. Who are you making it for, and what problem are you trying to solve?
The group discussed various approaches to choosing a set of technical tools: some participants had played with customised versions of ChatGPT, which let you upload documents and customise how the chatbot responds; others were exploring enterprise solutions built on platforms like Azure and AWS. A couple had built custom applications in Python. Many organisations were interested in using alternative, smaller, or open-source models, but found it difficult to navigate the options. AI-assisted and no-code tools like Softr or v0 by Vercel make it easier to get started with a prototype, but you will still need the expertise to develop and maintain your app later on.
Participants who had already built prototypes shared reassuring advice: nearly any tool works for prototyping, so they used what they already had. It wasn’t a problem to change aspects of the prototype later.
The other key piece of advice was to build something modular. Because AI models and tech tools change all the time, chatbots work best when the parts, such as the model (GPT-4o or Claude Opus 4.1, for example), the prompts that guide it, and the user interface, can be swapped or updated without rebuilding everything. Frameworks like LangChain, LlamaIndex, and Semantic Kernel, along with enterprise platforms like Azure OpenAI, make it easier to organise these modular components.
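To make "modular" concrete, here is a minimal Python sketch of the idea; the class, the placeholder functions, and the handbook example are our own hypothetical illustrations, not any participant's actual code. The point is that the prompt, the retrieval step, and the model call are passed in separately, so any one of them can be swapped without rebuilding the rest.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chatbot:
    system_prompt: str                    # instructions that guide the model's behaviour
    retrieve: Callable[[str], List[str]]  # finds relevant passages in your documents
    generate: Callable[[str], str]        # calls whichever AI model you currently use

    def answer(self, question: str) -> str:
        passages = self.retrieve(question)
        prompt = (
            self.system_prompt
            + "\n\nContext:\n" + "\n".join(passages)
            + "\n\nQuestion: " + question
        )
        return self.generate(prompt)

# Placeholder pieces so the sketch runs end to end; a real chatbot would plug in
# a document search function and a call to a hosted model instead.
def handbook_search(question: str) -> List[str]:
    return ["Staff are entitled to 25 days of annual leave plus bank holidays."]

def model_call(prompt: str) -> str:
    return "You get 25 days of annual leave plus bank holidays."

bot = Chatbot(
    system_prompt="Answer only from the staff handbook excerpts provided.",
    retrieve=handbook_search,
    generate=model_call,
)
print(bot.answer("How much annual leave do I get?"))
```

Frameworks like those above give you these swappable pieces ready-made, but the underlying idea is the same: changing the model or the prompt should mean changing one component, not the whole application.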
What should I evaluate?
Evaluating your prototypes requires several approaches, including quantitative metrics, qualitative feedback, and preemptive testing. The good news is you don’t have to start from scratch - there are existing frameworks that can help. We like CAST’s AI Experimentation canvas as a light-touch, initial assessment. For metrics, one organisation suggested the RAGAS framework for automated evaluation alongside user testing. Another tracked efficiency (time saved), accuracy (percentage of responses needing editing), and user confidence, alongside more qualitative feedback, such as whether their testers liked using the prototype.
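As a concrete illustration of the kind of metric tracking described above, here is a small Python sketch; the test log, the baseline figure, and the numbers are entirely made up, and a real evaluation would use your own testers' records.

```python
from statistics import mean

# Hypothetical test log: how long each chatbot answer took (in seconds) and
# whether a human had to edit the response before it could be used.
test_log = [
    {"question": "What is the sick pay policy?", "seconds": 12, "edited": False},
    {"question": "How do I claim expenses?",     "seconds": 9,  "edited": True},
    {"question": "Who approves annual leave?",   "seconds": 15, "edited": False},
]

# Assumed baseline: roughly how long it takes to find the same answer by hand.
BASELINE_SECONDS = 300

time_saved = mean(BASELINE_SECONDS - entry["seconds"] for entry in test_log)
edit_rate = 100 * sum(entry["edited"] for entry in test_log) / len(test_log)

print(f"Average time saved per query: {time_saved:.0f} seconds")
print(f"Responses needing editing:    {edit_rate:.0f}%")
```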
We identified continuous user feedback and testing as really important for improving the quality of the chatbot: when one organisation found that users were asking vague questions and getting unhelpful responses, they trained staff on ‘prompt engineering’ - how to ask an AI questions to get the most useful answers. For your team to embrace using a new tool, training is as important as technology.
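To give a flavour of what that training covered, here is an illustrative, entirely made-up before-and-after showing how a more specific question gives the chatbot much more to work with:

```python
# Hypothetical example from a session on asking an AI better questions.
vague_question = "Tell me about leave."

better_question = (
    "Using the staff handbook, list the types of leave a full-time employee can take, "
    "how many days each allows, and who needs to approve a request."
)
```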
How do you ensure ethical, responsible, and inclusive design?
Beyond GDPR compliance, everyone was keen to learn how they could test tools for accessibility, accuracy, bias, and unintended harms. What happens when the bot gets something important wrong?
The main approach so far has been to keep humans in the loop and focus on enhancing staff support rather than providing direct client services. But there is interest in client-facing support for lower-risk purposes, with the goals of broadening reach or increasing the capacity of already-stretched staff.
There are useful guides to safe implementation from UNICEF and Girl Effect, as well as UNESCO’s helpful guide to red teaming, a way of testing AI systems for vulnerabilities or bias. You can also take a look at Doteveryone’s ‘consequence scanning’ manual.
Chatbot-specific accessibility testing remains an open question, so if you know of a good tool, please tell us!
What liability do we face?
The governance risks of using these tools in service delivery situations where accuracy is critical were a concern. Participants grappled with questions about liability and insurance requirements. If your chatbot gives wrong advice and someone acts on it, who's responsible?
These questions become especially urgent when serving vulnerable populations or providing critical services. The sector needs more guidance on consent, data protection, and safeguarding in chatbot contexts, aligned with the UK regulatory landscape. We need sector-specific evaluation and risk assessment frameworks that account for the particular responsibilities charities carry.
How do we move from a prototype to something ‘live’?
Several participants had prototyped a working demo chatbot, running on one person’s computer, that could be tested by small groups of people. The challenge was how to take a proof of concept and build a ‘real’ version, used by tens, hundreds, or even thousands of actual users.
This transition - called moving to production - is where you transform a prototype into a system ready for real-world use, and it can be a lot harder than prototyping. Production introduces many complex challenges: handling concurrent users, ensuring security, managing errors, and maintaining reliability. What happens when fifty people use your chatbot at once? How do you stop unauthorised access? What's your plan if it confidently gives incorrect advice? Building something robust enough for daily use by many people is complex and more costly.
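To show what even a heavily simplified "production" version has to think about, here is a sketch assuming a Python web framework (FastAPI here); the endpoint, the token check, and the answer_question placeholder are all hypothetical. Unlike a laptop prototype, it has to refuse unauthorised requests and fail gracefully when the model call breaks, and a real deployment would add proper authentication, logging, monitoring, and rate limits on top.

```python
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()

class Query(BaseModel):
    question: str

def answer_question(question: str) -> str:
    # Placeholder for the prototype's retrieval-and-model call.
    return "This is where the chatbot's answer would go."

@app.post("/ask")
def ask(query: Query, authorization: str | None = Header(default=None)):
    # Production concern 1: stop unauthorised access
    # (a real system would verify a proper token, not a fixed string).
    if authorization != "Bearer example-token":
        raise HTTPException(status_code=401, detail="Not authorised")
    # Production concern 2: handle failures gracefully instead of crashing for every user.
    try:
        return {"answer": answer_question(query.question)}
    except Exception:
        raise HTTPException(status_code=503, detail="The assistant is unavailable right now.")
```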
Can we afford this?
Several participants recognised that ongoing costs, including human resources, extend far beyond initial development. There are costs to maintenance, servers, updates, monitoring, team training, and staff oversight.
Innovation funding may cover a prototype, but it is difficult to sustain a live service on it. The jump from pilot to production requires funding models that recognise the iterative nature of AI development and the ongoing costs of responsible deployment.
Where next?
The group was a wonderful opportunity to share resources and support each other through critical choices about if, and how, to implement AI. In the future, we would love to invite guest experts to add more valuable expertise on top of the peer learning. We’d also love to move from bringing together people at different stages to developing more focussed sessions or action learning sets with those at a similar point, from ideation to production.
Please reach out if you are interested or could help us to fund and bring together future groups.