Why does implementation of AI fail so often in healthcare?
Where we explore the potential of responsive evaluation for the development of AI solutions
Written by Paul Hiemstra, PhD and Gido Hakvoort, PhD
Introduction
AI is frequently offered as an answer to rising costs and staff shortages in healthcare. Some fields within healthcare are further along in their journey to integrate AI. In medical imaging in particular, a lot of work has been done on building and deploying AI solutions. However, Huisman et al. (2023) note that although many of these AI solutions focus on relevant tasks such as detecting abnormalities in CT scans, this does not mean that the actual problem is solved: high workloads for healthcare professionals. Huisman calls for an honest appraisal of the actual impact of AI in radiology. This mismatch between the developed AI solution and the actual needs of healthcare professionals limits both the impact and the implementation of AI.
One driver of this mismatch that we identified is the lack of multidisciplinary input into the AI development process (Leenen et al, 2025; Breda et al, 2023). One of the contributors to this lack of input is the technocentric approach with which much AI is developed. Typically, this approach starts with a certain task that could be performed using AI, for example detecting cancer nodules in a CT scan. The AI researcher measures the performance of the AI using a performance metric and compares this metric to some benchmark, for example the performance of a healthcare professional. If the performance is similar or better, the AI solution is deemed satisfactory. In our opinion this technocentric approach is too reductive, puts too much emphasis on the AI researcher, and risks losing sight of the healthcare and human context. The needs, interests and requirements of the various people that have a stake in the AI solution cannot be reduced to just one or a few metrics. This reductive effect, in our view, amplifies the mismatch we see between the development of AI solutions and healthcare practice.
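The technocentric loop described above can be reduced to a few lines of code, which is precisely its weakness: a single number decides whether the solution is "done". The sketch below is purely illustrative; the data, the sensitivity metric, and the radiologist benchmark value are all made-up assumptions, not results from any actual study.

```python
# Minimal sketch of the technocentric evaluation loop: compute one
# performance metric and compare it against a human benchmark.
# All numbers here are invented for illustration.

def sensitivity(preds, labels):
    """Fraction of truly positive cases (label == 1) the model flags."""
    positives = [p for p, y in zip(preds, labels) if y == 1]
    return sum(positives) / len(positives)

# Hypothetical ground truth and model predictions for ten CT scans
labels      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
model_preds = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

RADIOLOGIST_BENCHMARK = 0.80  # assumed human sensitivity

score = sensitivity(model_preds, labels)          # 5 of 6 nodules found
deemed_satisfactory = score >= RADIOLOGIST_BENCHMARK
```

Everything the article argues for — stakeholder needs, values, context — is invisible to this comparison: the solution is "satisfactory" the moment one metric crosses one threshold.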
To remedy this mismatch we believe that a much more holistic and collaborative approach is needed. In this article, we explore responsive evaluation (RE) as one of the options to get this holistic and broader view.
Responsive evaluation in a nutshell
In the mid-20th century, a movement emerged that aimed to evaluate large governmental programs, for example programs aimed at reducing poverty. Evaluators wanted to determine what the actual impact and efficacy of these programs were. The predominant idea at the time was for an evaluator to compare intended and measured outcomes, a very technocentric and analytical approach. Stake (Abma and Stake, 2001) ran into the problem that many complex societal issues could not be captured in this reductive intended-versus-measured paradigm. The stakeholders surrounding a societal issue are hardly ever in agreement on what the correct approach should be, or even on whether there is a problem at all. This severely hampers the ability to capture societal issues in straightforward intended-measured outcomes.
Stake et al. proposed a much more collaborative approach they dubbed responsive evaluation. The "responsive" part refers to the fact that the evaluator is responsive to what happens during the evaluation process. The process is much more iterative and focuses on differences in viewpoints, feelings and convictions of the various stakeholders. This increases the chance that the evaluation process actually addresses the correct issue, since we broaden the potential sources of insight. In addition, we collect insights in a richer manner, including values, emotions, and beliefs, and involve stakeholders throughout the entire evaluation process.
Responsive evaluation and AI
A core concept in responsive evaluation (RE) is that stakeholders each have their own views and values, which determine how they relate to the challenge at hand. RE focuses on addressing "the challenge" rather than simply "developing the AI" to emphasize that the goal of the process is to fix an issue, not just to implement an AI. In our studies (Leenen et al 2025, in submission) we used the following categories of stakeholders, each with their own typical views and values:
- Healthcare professionals: they work directly with patients, and their main focus is to provide the best possible outcome for their patients. Typically, AI is not an inherent interest for them, their interest is linked to the added benefit AI can provide to their work.
- Policy makers: they provide the ethical, legal and administrative framework in which the challenge is addressed. For them, AI is not necessarily an inherent interest either. They are focused on keeping budgets in check, or on validating whether and where certain legal frameworks apply to the challenge.
- AI researchers: they create the actual AI solutions that address the challenge. They typically have a technological mindset, and an inherent interest in AI. For them, applying new and innovative AI techniques is a good thing for its own sake.
The practical consequence of these different views and values is often a lack of mutual understanding. The healthcare professional thinks the policy maker “is only focused on saving money”, the AI researcher is frustrated that the healthcare professional “is not impressed by an awesome AI solution”, and the policy maker is left scratching her head because “everyone is far too optimistic regarding the impact of new AI legislation”. This lack of understanding limits multidisciplinary collaboration, which we believe is a key factor in creating practically viable AI solutions.
RE tackles this problem head-on (Widdershoven, 2009; Abma, 2009):
- RE actively involves stakeholders during the entire process of addressing the challenge. They do not just provide input and requirements, but are an integral part of the process.
- The evaluation design and execution are emergent. One does not make assumptions in advance, but engages in an iterative and co-creative process involving all salient stakeholders. This makes the process responsive to the ever-growing understanding of the challenge we are trying to solve.
- Experiential knowledge of stakeholders is important: not just the factual information, but also the feelings and convictions people have surrounding that factual knowledge. This allows others to comprehend what drives a particular stakeholder to hold a certain opinion or take a certain action. It lets people stand in the shoes of the other person and reach a deeper understanding of where they are coming from, i.e. a form of vicarious experience.
- RE focuses on interaction and social learning between stakeholders, increasing mutual understanding. To stimulate social learning, there needs to be a strong focus on openness and respect between stakeholders.
- To enrich and extend the impact of AI solutions in solving the challenge, RE focuses explicitly on stakeholders that normally get little or no voice in the process. In the case of a medical AI solution this means patients, or the people who interact with these patients on a daily basis. Sadly, these stakeholders seldom have a strong say in the process or in the potential solution to the challenge.
RE sees the development and evaluation of AI as an inherently collaborative process, not a process where just one or two stakeholders build a solution with sporadic interaction with other stakeholders.
How does this work in practice?
The promise of responsive evaluation is enticing: a solution that fixes all our AI implementation woes. Sadly, this is not the case. In practice, a collaborative mindset such as RE takes time to introduce into an organization. And slavishly following RE as a rigid process is not in the spirit of what RE tries to accomplish.
We are at the start of this collaborative journey, and do not pretend to have all the answers. We see this article as the start of a discussion, and are very much interested in the views and values of others in the field. We can, however, offer some advice that we are applying ourselves:
- Create a diverse team that includes all relevant stakeholders: medical personnel, AI researchers, policy makers, and maybe even patients. Invest time into building a positive and safe atmosphere in this team.
- Meet regularly and discuss the results so far. Be ready to adapt the goal and focus of the research if you as stakeholders decide it is the best course of action.
- Actively involve non-AI researchers in interpreting the results of AI models.
- But above all, treat AI research as an emergent process: we find the goal and value during the interaction with all our stakeholders.
Acknowledgements
This work is part of the Medical Data Analytics Center project, a collaboration between the Isala hospital, the software company AppBakkers, and Windesheim University of Applied Sciences, and has been made possible through a grant from TechForFuture. Special thanks go out to Barbara Groot, Fenne Verhoeven, and Anouk Jansen for providing feedback on drafts of this article.