We're Overestimating Medical AI — and Underestimating the Harm (Jessica Morley, Yale) | Like A Girl Media, a podcast agency for elevating women's voices in healthcare

AI ethicist Jess Morley: these chatbots are giving medical advice — so regulate them as medical devices. Part of The Agentic Patient, a Faces of Digital Health series on how patients actually use AI — which tools, which prompts, which safeguards. In this episode, host Tjaša Zajc sits down with Dr Jess Morley, Associate Research Scientist at the Yale Digital Ethics Center and a former AI subject-matter expert at the UK Department of Health and Social Care, for a clear-eyed account of where health AI is going wrong — and how to use it well anyway. Morley argues we systematically overestimate what these tools can do and underestimate the harm. She makes the case for "skeptical optimism," explains why bioethics principles built for one-to-one care break down against many-to-many AI harms, and reframes ambient scribes as inference engines rather than transcription services — with real consequences for coding, billing and patient records. Then she gets practical: the guardrails, prompts and habits patients (and clinicians) can use today. Guest: Dr Jessica Morley — Associate Research Scientist, Yale Digital Ethics Center; formerly UK Department of Health and Social Care and the Bennett Institute, University of Oxford. 
What the conversation covers:
- Why "skeptically optimistic" is the honest position on health AI
- AI adoption as "a hammer looking for nails", and what needs-led design would look like instead
- OpenEvidence, EU rules and the question of regulatory capture
- The DeepMind–Royal Free case and why law alone isn't enough
- Beneficence, non-maleficence, autonomy, justice, and where they fail for AI
- Ambient AI scribes, miscoding, billing inflation and phantom tests
- Paid vs free models and the widening access gap
- The "ask why" rule and knowing when to walk away from a chatbot
- Red-teaming your own assumptions and playing models off each other
- Building a personal "harness" with skills so AI works from your history
- The last-mile problem and the case for regulating LLMs as medical devices
- Whether AI is narrowing how clinicians think

Chapters:
02:50 - Intro: The Agentic Patient and the case for skeptical optimism
05:52 - "A hammer looking for nails": adoption pressure without a plan
07:25 - OpenEvidence, EU rules and regulatory capture
09:42 - The DeepMind–Royal Free lesson: why law needs ethics
13:29 - The bioethics principles and what they were built to do
19:40 - Autonomy, consent and the ambient-scribe problem
21:49 - Scribes as inference engines: miscoding, fraud and phantom tests
29:06 - Paid vs free models and the access gap
33:25 - Using AI safely: the "ask why" rule
37:38 - Knowing when to walk away: engagement design and degradation
44:58 - Red-teaming and playing models off each other
49:00 - Harnesses and skills: making the model work for you
51:38 - The last-mile problem and regulating AI as a medical device
58:00 - Does AI narrow the clinician's mind?

The Agentic Patient series: https://www.facesofdigitalhealth.com/agentic-patient-blog
Website: https://www.facesofdigitalhealth.com
Newsletter: https://fodh.substack.com
LinkedIn: https://www.linkedin.com/company/faces-of-digital-health

[00:00:03] Dear listeners, welcome to Faces of Digital Health, a podcast about digital health and how healthcare systems around the world adopt technology, with me, Tjaša Zajc. And this is a special series called The Agentic Patient, where we explore the good uses of AI from the patient perspective, what we should be mindful of when we use AI from that lens, and what are some of the best practices worth keeping in mind

[00:00:32] the next time we turn to AI for medical issues. The dominant story about medical AI runs in two directions. You've got stories about the limitless promise of AI or stories about the catastrophe and dangers. Jessica Morley, an AI ethicist at Yale's Digital Ethics Center and former UK Department of Health and Social Care advisor,

[00:01:00] refuses this dichotomy, settling on what she calls skeptical optimism. She's enthusiastic about specific uses, uneasy about the overall trajectory of healthcare AI, but also uses AI herself and has some great practices to share. In the discussion you are about to hear, we cover two things.

[00:01:26] The first half of the discussion is dedicated to questioning where we currently are with healthcare AI, where Jessica's central charge is that healthcare is deploying AI as a hammer in search of nails, chasing what was cheap to build because that is where the data happens to sit, rather than the problems patients and clinicians actually struggle with in the clinical setting.

[00:01:55] In the second part of the discussion, I asked Jessica for her concrete prompts, her ideas and insights into where AI can be very helpful. So if you're interested in that, make sure to skip to the second part of the discussion. Now let's dive into today's discussion. And if you haven't yet, subscribe to the podcast. Also check out the Agentic Patient series, which you can find on our website, facesofdigitalhealth.com.

[00:02:25] And a summary with the first six tips in the newsletter, which I will link to this episode. All the newsletters can be found at fodh.substack.com. That's fodh.substack.com. Now let's dive into today's discussion.

[00:02:53] Jessica, hi, and thank you so much for joining me again on Faces of Digital Health. This time for a podcast series that I named The Agentic Patient, where we basically explore which prompts, which tools do patients use. But I don't only speak to patients, I also speak to researchers.

[00:03:14] I try to speak with doctors to get both sides of the story because it can have positive, but it can also have negative consequences. You recently had a four-part series about AI to just expand people's knowledge and basic understanding of what AI is and what AI isn't. And when I was looking at this series, I was actually thinking, what's your opinion as an AI ethicist?

[00:03:43] Are you concerned about AI or not? Where do you stand? Well, that's such a good question. So first of all, thank you for having me. And it's great to be back, and especially as part of this particular series. I suppose on "am I worried about AI?", there are different levels to that answer. At a sort of macro level, yes. Because I don't think we're harnessing it in quite the right way.

[00:04:12] I think there's a bit of a systematic tendency at the moment to underestimate the harm and to overestimate the capability. But if we dig down into a sort of lower layer, there are specific use cases I'm not worried about or I'm more excited about. And so my answer ends up being mixed, which is why I often say I'm skeptically optimistic.

[00:04:37] I feel good about the opportunities, but I'm also quite conscious that I think we're on the wrong path at the moment. And that makes me a little bit worried. Do you think when you say we're on the wrong path, do you think the world globally? Because in Europe, we often feel as patients that, yeah, innovators are complaining that regulation is stalling things and progress.

[00:05:02] But as patients, at least we feel safe about our data and everything that's happening with it. Yeah, I think across the whole world, really. But I will focus, let's focus for now on the sort of Western world. So UK, Europe, the US, and then we can maybe talk about other parts of the world later. But for now, let's just focus on them.

[00:05:24] I think where I say I think we're going a bit wrong is that it's like AI is a hammer and everything is a nail, but there's no real plan. So there's an enormous amount of pressure being put on healthcare systems from governments to say, please adopt AI. At the same time, they're also getting that same pressure from the developer companies who have produced these models or these algorithms, saying, please do this.

[00:05:52] It's going to deliver all this enormous benefit. And isn't it going to be wonderful? But a lot of that conversation is not very needs led. It's not really targeted at the exact specific main problems that the healthcare systems are actually focusing on or really struggling with, or indeed what patients want from these types of tools. It's just what's been built. And most of the time, what's been built has been built because that's where the data was.

[00:06:19] And that's what those companies thought was the lowest hanging fruit, essentially, as opposed to what I would really like to happen, which is for either healthcare institutions or patients themselves to say, this is a problem we're really stuck with. It's really difficult to do. Is there a way in which AI can help us do that? And I think if we were doing that latter option, which is basically, I'm not saying anything groundbreaking,

[00:06:47] that is the sort of foundation of user-led or needs-led design, I think we would be in a slightly better place. So many questions there. One is when you talk about the pressure to adopt AI, I think of OpenEvidence, which is built for clinicians and based on medical research.

[00:07:13] So it checks a lot of boxes of being a really useful tool for clinicians. But from the pressure perspective, they did a very interesting thing when, because of the EU regulation, they stopped making OpenEvidence available to EU and UK consumers. I just checked their website and they have since changed it.

[00:07:38] But basically, when they initially did that, they had something like a note that also said, if you disagree that this is not available, write directly to the EU Commission, which was like a direct call to action for grassroots pressure on the legislators to loosen things up. Loosen things up, yeah. Yeah, I don't know if you saw that.

[00:08:06] I saw a little bit of it, yeah. I think it's part of a wider ongoing thing that we're definitely seeing. We see it in not all industries, but it's definitely happening in AI and health, this sort of regulatory capture and lobbying effect that's happening. And people are not quite sure where that line is. Do I listen to the experts because I'm a regulator and I don't have the technical expertise in order to make decisions about this?

[00:08:34] So I have to go and speak to the people who are developing this technology, who have the technical expertise that I don't have. And when does that sort of consultation start blurring into this type of thing, which is really lobbying on behalf of the industry rather than on behalf of patients or clinicians?

[00:08:54] Earlier, you talked about basically needs-based developments and also building based on the data that you have access to. And also in your series, you mentioned the case from years ago when DeepMind was working in the UK and using hospital data. It was a whole scandal, the fact that they basically got access to the data.

[00:09:24] But the argument was that basically the hospital gave them the data in order to provide direct care. So that was the justification that felt okay to everyone. So what made me wonder here is to which extent do you think that ethics can be bent? So you find the right conviction for yourself that you're doing the right thing. It's a very tricky thing, especially because rules also change.

[00:09:54] Our perception as society changes in terms of what's acceptable and what isn't. Yeah, absolutely. Which is why I think you need both ethics and the law. So in that DeepMind and the Royal Free, it was the Royal Free Hospital in this particular instance. To be clear, nobody did this on purpose. It was a sort of misunderstanding, I think, of what had happened.

[00:10:16] But the law requires you to determine whether or not, in the UK anyway, data is being used for one of three purposes. There is direct care, which is essentially I am being directly looked after. It's something to do with your prescription or you're being seen by a clinician. Or there is research, or there's what's called service analytics or sort of service performance.

[00:10:41] Now, each of those different categories has different requirements with regards to consent. And this is where things went wrong because for direct care, you don't need direct patient consent to share the record. And that's because it's predicated on this idea that if I am in a hospital and I need to see both a head doctor and a foot doctor, it would be inefficient to ask the patient every time you need to hand over the clipboard. Whereas in the data space, that starts to break down.

[00:11:11] And the fact is really that those three categories in the AI space are collapsed. We don't have a really clear boundary between direct care, service analytics and research. They're all one, which is essentially why you need ethics. Because ethics allows you to do contextualisation and to keep pace with those changes in societal expectations in the way that you just described. In the way that the law often can't. The law is pretty slow to move.

[00:11:40] It's often left open to some degree of interpretation because there needs to be a degree of flexibility. And creating, as you say, hard rules often makes things more risky rather than less risky because it allows people to jump through gaps slightly more easily. Hence the Swiss cheese model of risk. Whereas that sort of ethical context, which is really about asking should we rather than could we, is what helps there.

[00:12:09] So had there been more, I think, ethical discussion in that particular scenario and asking patients, for example, does this feel like something you're happy with? Does it feel as though it's something that's socially acceptable? I think that probably would have got you a better answer than the hospital misinterpreting the law. So there is definitely a case where people can bend things to their own will.

[00:12:37] We talk about things, we've talked about this problem, ethics shopping, which is essentially when people go looking for the lowest common denominator. What's your interpretation of transparency versus your interpretation of transparency? Yours is much easier to be compliant with, so I'm going to just pick yours. That's not really how ethics should work. And so if they work together as a package, they should, in theory, help prevent this flexibility.

[00:13:05] But it is a tension because on the one hand, you want contextual flexibility. And on the other hand, you don't want, as I said, this sort of ethics shopping effect where people go jumping around and pick the rules that are most applicable and most appropriate and the easiest for them, which is really the problem that you are describing. This is probably a too simplified question.

[00:13:29] But are there any ethical principles that you think we should remind ourselves more often at the moment or any ethical principles that are slowly disappearing but shouldn't? That's a great question.

[00:14:13] Beneficence is the do good thing. I should be actively promoting some form of good. Non-maleficence tends to be the one that people are most familiar with because that's the one that gets subsumed in the Hippocratic oath of do no harm. And then justice is most often operationalised as fairness, although there are slightly different interpretations of that depending on which particular line of ethical reasoning you're taking.

[00:14:39] Are they being forgotten or should they be remembered is an interesting one in the context of AI because it's a little bit of both. People have got very hung up on those principles. We had those principles come out. The WHO, so the World Health Organisation, did a version of them for AI and health and care. They're slightly different wording, but it's essentially the same thing.

[00:14:59] And I think for a long time this sort of idea of, oh, as long as I say I'm abiding by those principles, it was letting people get away with things because they're very difficult to interpret. They're slippery. They're easy to manipulate, as we were just saying. They're also difficult to enforce. Because it's quite unclear what it means to design a non-maleficent algorithm.

[00:15:27] We know that with justice, fairness, there are multiple different ways in which that can be operationalised. So I think to some extent, by them becoming these sort of canonical principles that people were really familiar with, their actual meaning and their purpose, which is really to facilitate conversation, got lost.

[00:15:49] They're designed to facilitate conversation, and you make decisions with them based on the conversation, not in a sort of hard-line, this-is-what-you-always-do type of way. That probably got forgotten and could do with being brought back to the surface.

[00:16:08] The other thing that's interesting about those sort of ethical principles in the context of AI, and especially in the sort of generative AI space, is that those principles were really designed on a sort of one-to-one relationship. So they're there to help clinicians in a hospital who might be on an ethics board in a hospital make decisions with regards to a particular patient. So this patient has come in, this patient is in a coma, this patient can't consent. What does that look like?

[00:16:38] Or maybe very occasionally you might have scenarios in a hospital where you've got 10 patients all needing a ventilator and you've only got eight ventilators. Then they might have a conversation with regards to who the ventilator goes to. But by and large, they're about governing the relationship between an individual intervention and the individual patient.

[00:17:01] Most of the time what we see with AI is that harms and the things that we should be governing from an ethical perspective are not operating in that one-to-one way. They are operating in a many-to-many way. So yes, I might be talking to a chatbot or I might be getting triaged or I might be using something that's like a diagnostic algorithm.

[00:17:23] And that could get it wrong for me, but at the same time, that algorithm is producing that response based on hundreds of other patients' data that it's been trained on. And it's essentially pattern matching. And it's doing the same thing to hundreds of other people at the same time. And where you see the risks come up is things where we have bias or discrimination or group level harm.

[00:17:51] And we don't really, at least in the sort of Western world, because our tradition of ethics is really individually based, have a framework for thinking that through from a principles perspective. And I think that's what's lacking most.

[00:18:13] So more specific kind of principles based on differences in cultures and beliefs. Yes, differences in cultures, but also, so autonomy, right? Let's take autonomy. There are questions that arise with how you operationalize autonomy at an individual level in the world of AI, because there are things like, when do I consent? How do I consent?

[00:18:41] Are patients able to opt out of AI being used in their care? What happens if they do opt out of AI being used in their care? Are patients allowed to disagree with a recommendation that comes from an algorithm? All of those types of things are really questions with regards to autonomy. And autonomy is doing really good work there at helping us think through individual patient harms.

[00:19:05] Now, the problem is that most of the time where we see risk of ethical harm happening in the context of AI and healthcare, it's not really just me misdiagnosing you or me undermining your autonomy.

[00:19:23] I use an AI scribe in the hospital, and my hospital doesn't have a policy in place that says that I have to tell patients that data is being used or that I'm recording it on every interaction, every consultation on an ambient scribe. That data is going, it's being processed in the cloud by whoever the provider of that is. I have no means of knowing whether or not the privacy was intended, how those things are associating what's being spoken to different types of codes.

[00:19:54] That's a very different way of thinking about how do we protect autonomy, because that's impacting everybody at once. And that everybody at once also means that there is variability; there are various groups in that who are going to be harmed more than others. Whether that be, for example, people who are experiencing domestic violence.

[00:20:16] If they knew that there was something that was listening to them, they might not feel comfortable disclosing that information. We already know that there was a study that came out last week that literally shows when people are interacting with some form of AI, they are less disclosive. They are less likely to give all of their details away. That harm concentrates in specific vulnerable groups of people.

[00:20:42] And we do not really have a good set of principles for dealing with that problem. We never have, to be clear, which is why medicine has always had a bit of a problem with things like bias. But we are now dramatically scaling that issue up. And I think without an ethical framework for really dealing with, okay, what does this mean for public and population health rather than individual's health? I think we're missing a trick.

[00:21:14] Very interesting. When you were talking about the ethical principles, I think the intention of developers always comes from the visionary idea of how technology is going to cause good. And for example, scribes are a good example of a technology that seemed that for the first time in the digital transformation of healthcare,

[00:21:41] there's something that's making doctors happier. Even if research shows that they don't save as much time as we might think. So that's good. Doctors are happy. There are also several other side effects that we now see. Care is getting more expensive, either because sometimes scribes pick up more things than doctors would write in if they did the notes themselves.

[00:22:08] So scribes are capable of billing more. So healthcare is getting more expensive. And then the second problem is that this whole idea that doctors actually check what the scribe summarized to a degree can be missed. There's doctors that will not double check if scribes got everything right.

[00:22:35] A patient just recently published how he went to the doctor because of asthma. And now suddenly, out of nowhere in his diagnosis, he has asthma and a legal intervention involving injury by tear gas. And that's such a specific thing. And especially in the US context, it can potentially be criminalizing as if he's a rioter or something.

[00:23:02] And now he's in this whole labyrinth of trying to figure out how can he get that redacted from his medical record. The point being that with AI scribes, the dark scenario is that we are creating inaccurate EHRs on steroids. So I don't know how much of that are you observing or researching.

[00:23:29] I think that's a really big problem. And that comes back to what I said right at the start in terms of I think we're overestimating the capabilities of a lot of these technologies and underestimating the harms. I think really often the way that AI scribes are currently being perceived or pushed to healthcare providers is as if they're just really good transcription services.

[00:23:55] And it's a no, we've had transcription for years. This isn't a transcription service. It's an inference service. Those things are making quote unquote decisions with regards to what information is recorded, what information is given weight, what certain pieces of information are considered higher level or lower level. Don't forget, we don't necessarily just act in healthcare on free text. We act on the codified data of that, the sort of structured coded data.

[00:24:25] So that's when things are turned into a SNOMED code, for example, or an ICD-10 code for that particular diagnosis. There are loads of ways in which that can be gamified. And you don't know, you know, we have no idea what the incentives of these companies are who are building the scribes. I'm sure that none of them are being Machiavellian.

[00:24:47] But it's extremely easy to decide, oh, we should definitely err on the side of caution or we should always code this particular disease as this particular thing. Whereas others might actually code it in a different way for a wide variety of reasons. One of those being exactly as you've just described, maybe if this was coded in this way, it's going to cause patient harm because it's going to expose them to some sort of risk of criminal intervention or something like that.

[00:25:16] The other area that we've seen this happening loads, especially in the US, is with regards to abortion and access to abortion. But so, yes, there's an enormous problem. Those things are inaccurate. They are capable of being manipulative. They do probably over-record.

[00:25:33] And then the other thing that can happen that there's a few sort of cases, especially anecdotal cases now coming through, is if, for example, the next logical step of the conversation you've had with the clinician is that a specific test was ordered. Right? Now, the scribe might decide that, therefore, that test has just happened.

[00:25:58] It's going to record that the test was ordered because that was the logical conclusion of the consultation. If that then never does get ordered because the scribe said it had already happened, what happens to that patient? They're just stuck in an endless loop of that test never being ordered. There are all these consequences that I just think have been really poorly thought through because none of this stuff was trialed.

[00:26:25] It was just, okay, here's the scribes. Let's go. Yeah, absolutely. My mind immediately also went to medical fraud, which I think is most often discussed in the context of just, I don't know, insurances denying claims.

[00:26:48] And I've never researched yet the underlying details about how medical fraud actually happens. But that would basically, you could be accused of committing fraud because something would be charged for and never executed. So definitely a huge issue.

[00:27:07] Another challenge that I am thinking about when it comes to the data issue is this whole discussion about using a paid version or an unpaid version. So when patients use open, so not health-dedicated, chatbots like Claude or ChatGPT or Gemini or something else,

[00:27:34] one of the recommendations that we are talking about, for example, in the series is that you would potentially, if you can, use a paid version. So you have some level of privacy. And so you have the access to the latest models, which are just massively better in the outputs that they produce. But what I'm thinking is there's people who can afford that and there's people who can't afford that.

[00:27:56] So if people who can't afford that are using the free versions and basically the models are trained on the data from the people that use the free versions, like how does that impact, like how does that skew the overall data and the outputs that the AI is giving if it's trained on a very specific data set? So I don't know. What do you think? So I think there are two things there.

[00:28:22] I think, first of all, it's a really good example of why I don't think AI is the solution to the access problem in the way that it is sometimes framed as being. I think AI can partially help alleviate access issues in some ways that we could talk about later, but it's not the be all and end all. And he used to talk about it in the context of it being the people who know how to work the system will get access to a human.

[00:28:50] And then the people who don't will get access to an algorithm. I think now what we will probably see is even greater bifurcation there in that there will be people who get access to humans. There will be the people who can afford to pay to have access to the highest performing model. And then there will be the people who are left to only rely on the free versions and the worst performing models or indeed left completely abandoned. So I think there's that element.

[00:29:16] And I think that's something that's really worth us thinking through. Then in terms of what you're really describing, this is a couple of different things. There's the data set that these things are trained on. Yes. If they are being deployed at the point of deployment, they've in theory all been trained on the same thing. They've been trained on, most of them have been trained on the internet.

[00:29:41] What the different models are doing is they have different levels of parameters in them and different levels of specificity. And typically different models underneath the hood of how they quote unquote reason. But the thing that you're pointing out is reinforcement learning. So this is this idea that you tweak them over time based on your interactions with the algorithm, based on the patient's or the individual's interaction.

[00:30:03] I think that's very interesting that you could potentially have this sort of split with some people who are only ever accessing Haiku, which is Claude's lowest version of their model, versus others that are only ever interacting with, you know, sort of Opus 4.7 or whatever it currently is. I think, yes, we could see that.

[00:30:25] I think the whole generative AI issue in general is we've all talked about rubbish in, rubbish out for years, right? Everybody's talked about that so many times. Really, that is where we see, I think, in the generative AI space, that problem happening on steroids, because you are going to get them starting to be trained on their own outputs. And that, I think, is a real issue.

[00:30:51] I want to go back a little bit to the whole discussion around increasing or decreasing access. With AI use, there's two things. I think that if you ask patients, would they prefer to speak to AI or a doctor? In the majority of the cases, I think the answer would be the doctor.

[00:31:14] But at least in my experience, patients don't use AI because they wouldn't want to speak to doctors. They use AI because they can't reach them, because you're either between visits or the waiting list is 18 months long. So that's like the only thing that's left.

[00:31:32] And there are many examples of when patients did use AI, also then verified the outputs with clinicians and saw that AI was right. So given that the whole idea of this series is to bring out some advice on how to use AI and what not to do, what would your advice be? We're being very specific about guardrails you can use.

[00:32:01] And so let's just start with what's your advice and we'll go from there. So my advice is if you want to use them, absolutely use them. If you want to use things like Claude or Gemini or ChatGPT to talk through things, if you want to use them in the way of I have this list of symptoms, or can you give me some questions that I should ask once I get to the doctors, or what does this mean?

[00:32:27] I think that's all really reasonable and, to a large extent, not any different from what we've had for years with people talking to Dr. Google. Dr. Google has been a thing for a long time. I think where there is a guardrail that people can maybe keep in their minds is where the risk amplifies with the use of generative AI, which you didn't have with just using a search engine,

[00:32:57] is they are incredibly effective at saying something in a really persuasive way. And that you've sort of got to keep your wits about you. And so the best way of describing it really is that do not take anything that they say at face value. And that is where we have this problem with access, is because the ability to do that really depends on people's digital health literacy

[00:33:23] and their confidence in questioning and tweaking the algorithm. I'll give you an example. I told you already, I really hurt my neck, mostly from excessively typing, I think is the conclusion of what has happened to my neck. Your conclusion or AI conclusion? So we think the doctor plus the AI plus me, we all seem to think that Jess, you should probably at some point in the day step away from a keyboard.

[00:33:51] Otherwise you will continue to damage your neck. But the doctors here prescribed me some specific types of drugs to try and deal with the pain that I was in from hurting my neck. And I asked Claude, I use Claude all the time, I was like, Claude, do these particular drugs make sense? And at first Claude was saying, yes, this is absolutely the main line of reasoning, this is completely the right type of drug for the doctor to have prescribed.

[00:34:18] And then because I am me, and also because I have high level of literacy, and also I'm just a massive sceptic anyway, full stop, I went to look up what is the NICE guideline, so the UK's sort of official guideline for treating the particular injury that I have. And it does not involve these types of drugs. And then I went and I looked up the evidence base for these types of drugs for my condition, and there is no evidence base. So I then went back to Claude, and I said,

[00:34:47] hey, you've been convincing me that this was a really good thing of my doctor to prescribe, but actually I've just gone and looked up the evidence base, and it's really weak, and I'm not sure it actually makes sense for me to have been prescribed this drug. And it immediately comes back to me and says, you're right, that is what the evidence base is, shows me all of the links for where that evidence exists, and said I should not have been so convincing. Now, I am able to do that because I understand the model,

[00:35:16] and I also have enough understanding of health in order to question. Where I get worried is when people start over-relying on these things, forget that they're stochastic, so forget that they'll give you different answers in different scenarios, forget that they're trained to be sycophantic, meaning they are trained to cater to you in a way that a doctor is not, and forget that just because it says things in an overconfident way doesn't necessarily mean that it's accurate. And so if you're going to use them, by all means do.

[00:35:47] But please keep your sort of questioning mind on, and the best way I can describe it, the biggest tip I can give you is just behave like you're a toddler, who can only say why. Every single time this comes to you, why? You give me this recommendation, why? And just never forget the why, and that will help some things.

[00:36:12] I guess secondary to that is try to use them in a way that aids your own thinking, rather than it becomes your thinking. So to be, like, as I said, to be clear, I'm a really big fan. I think these things can make an enormous amount of difference in certain circumstances, but the issue becomes over-reliance with no questioning of the reliability of the thing that it's coming to.

[00:36:41] Which is, by the way, very hard. Like, that's at least my experience, because I do try to question, but it's like, for example, if I just name some of the tips that we've gathered so far, is don't ask simple questions, because you're going to get a simple answer. Never trust it. Then, yeah, just explain stuff in detail. There's a master class that also recommends the TILIC formula,

[00:37:10] define the time, the location, your context, and the change. How, for example, if you're talking about pain, how pain changed over time. So do a lot of that. But I did that recently also to try to figure out some of the pain. I even, yeah, I added, like, a description. I said, can you give me 20 more questions so you can make a better assessment? And it gave me an idea of what could be the problem.

[00:37:36] And then when I added the MRI scan in it, it said, this changes my previous answer significantly. So I was like, I don't know. I don't know. Personally, I've had a mixed bag of experiences. Yeah, exactly.

[00:37:55] I have really low opinion of their ability to do diagnostics for this exact reason. I do think that they can help you if you are sufficiently well-versed in making your own mind up. Tell me what I should ask my clinician or can you prompt me to think of certain things that might be going wrong or something like that.

[00:38:25] Beyond that, I'm not convinced they're that helpful. I also think that they can really amplify in ways that we haven't really thought through. So I guess a secondary tip, which will make sense in a second, is know when you want to walk away. So we forget that they are trained and they are designed to keep you engaging. They aren't going to tell you to go away.

[00:38:53] In the same way that social media is designed to keep you on the platform, that's what they want. They want you to ask on the platform. So we just tried this idea a couple of weeks ago. I was involved in a workshop with clinicians and we all had different models open and we were running the same clinical scenario. And I said to all of the clinicians, I said, please be really conscious of how many times it ends by asking you a question. Or by saying, would you like me to do X?

[00:39:24] Almost without fail. That's how I was asking. So they are designed to keep you involved. Now, if you are somebody who has health anxiety, OCD, you are worried in general or actually one of the things that's a problem is you are spending too much time on these things. And they're amplifying that because the way in which they want to keep you engaged is a real thing.

[00:39:47] Now, this is something, again, with the mental health space in particular, is that we know there's dozens now of studies and also data that shows that one of the main use cases people or patients in particular are using these models for is for some form of therapeutic process. Because they want access to mental health care. Now, there are different grades of what that might look like.

[00:40:16] I think if you are somebody and you are just trying to talk to it like a friend, that's one thing. If you are somebody who has a diagnosed condition and you tell it that you have the diagnosed condition, it should, in theory, interact with you differently. Because it should have a degree of parameters inside of understanding that. But that will also degrade over time.

[00:40:39] And it doesn't know how to match your presentation with the particular modality that might be required. So I guess that comes back to your sort of other person's tip with regards to specificity. Specificity definitely does help. So one thing you can do that I think people often, it just doesn't occur to people, is get it to write the prompt for its own self.

[00:41:06] So if you want to say, I want to know about X, Y, Z, can you write me a prompt that will help that happen? And then go into a chat, a second chat, and run the prompt it's given you, you will get a better answer. And that's one way of overcoming the literacy issue. So that's tip number three. And I'll give you a fourth and final one, which is remember the context window shrinks.

[00:41:31] So the longer you stay in one chat, the performance degrades. And so you have that sort of trade-off where it's designed to keep you involved, but its own performance is degrading over time. So if you know when to walk away from it completely and also know when you need to change context window, that will help. Awesome. Yeah, actually, the last two were mentioned, not exactly in that context.

[00:41:57] It was more about one patient said that basically because at a certain point the history disappears, if you want to have the full discussion, you need to copy paste it into a different document. And another speaker also mentioned that some patients use one AI model to create the prompt and then use that prompt on a completely different model. So not just a different chat, but actually a completely different model.
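The two habits just described (have the model draft its own prompt, then run that prompt in a fresh chat or on a different model) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's API: `ChatSession`, `meta_prompt`, `draft_then_run` and the `ask` callable are hypothetical names, `ask` stands in for whichever chat tool you actually use, and the prompt wording is only an example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for any chat model: takes the full message history
# (a list of strings) and returns the model's reply as a string.
AskFn = Callable[[List[str]], str]


@dataclass
class ChatSession:
    """One chat with its own history, so a new session starts with a clean context."""
    ask: AskFn
    history: List[str] = field(default_factory=list)

    def send(self, message: str) -> str:
        self.history.append(message)
        reply = self.ask(self.history)
        self.history.append(reply)
        return reply


def meta_prompt(topic: str) -> str:
    """Illustrative wording: ask the model to write a better prompt for us."""
    return (
        f"I want to understand: {topic}. "
        "Write me a detailed prompt I can paste into a new chat to get a careful answer. "
        "Reply with the prompt only."
    )


def draft_then_run(topic: str, drafter: AskFn, answerer: AskFn) -> str:
    """Draft the prompt in one chat, then run it in a second, empty chat.

    Pass the same function twice to stay on one model, or two different
    functions to draft on one model and answer on another.
    """
    drafted = ChatSession(drafter).send(meta_prompt(topic))
    return ChatSession(answerer).send(drafted)
```

Because each `ChatSession` starts with an empty history, the second chat sees only the drafted prompt and none of the earlier back-and-forth.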

[00:42:20] The one that I like to help with all the challenges that we were discussing around not being sure if it's critical enough, if it's skeptical enough, if it's taking you to the direction you don't even yourself know you want to go to, is the prompt perform the red team analysis, which means that it goes into the completely adversarial thing, which I thought was awesome. But again, I would say that if you are prone to medical anxiety, don't go there.

[00:42:50] Yeah, we literally have a paper on that called red teaming the text. So we basically say one of the ways in which you can use them is to be adversarial. And you can do that. This is my assumption. Please red team it. You can also do that with each other, between the models, by the way, and they definitely like to compete. So if they know that you are taking, if Claude knows, for example, that you are taking its outputs and giving it to ChatGPT to fact check it,

[00:43:18] it will respond to that, because these are things that are hard coded in. So you can also play them off each other, which can be fun. The other tip I would give you in that regard, which is something that I use a lot myself and works very well, is tell it that you want it to come up with success criteria and only give you the output once it thinks that its answer meets all those success criteria.
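As a rough sketch of the red-teaming and success-criteria tips, the prompts can be written once and reused, with one model's answer handed to a second model to challenge. Everything here is an assumption for illustration: `AskFn` is a hypothetical stand-in for any chat tool, the function names are invented, and the prompt wording is just one way to phrase the requests.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a chat model: prompt in, reply out.
AskFn = Callable[[str], str]


def red_team_prompt(assumption: str) -> str:
    """Ask the model to argue against a stated assumption."""
    return (
        f"This is my assumption: {assumption}\n"
        "Please red team it: list the strongest reasons it could be wrong "
        "and what evidence would change the conclusion."
    )


def success_criteria_prompt(question: str) -> str:
    """Ask the model to set its own bar before it answers."""
    return (
        f"{question}\n"
        "Before answering, write down the success criteria a good answer must meet. "
        "Only give me the answer once you think it meets all of those criteria."
    )


def cross_check(question: str, first_model: AskFn, second_model: AskFn) -> Dict[str, str]:
    """Ask one model, then hand its answer to a second model to challenge."""
    answer = first_model(success_criteria_prompt(question))
    critique = second_model(red_team_prompt(answer))
    return {"answer": answer, "critique": critique}
```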

[00:43:45] That also works quite well, because it means that it will just keep iterating over and over until it's, oh, I now hit. Yeah, awesome. Awesome. A similar one was one patient said that he asks AI, give me a confidence score on this answer. So tell me if on a scale of one to 10, how confident are you in what you just said? So that's similar. Okay. I'm definitely going to write that down. One thing that we didn't mention earlier was, so we talked about harm.

[00:44:15] We talked about the issue of amplifying medical anxiety, but another kind of category that I put patients in is also when you're a chronic patient, your benchmark of what's normal kind of changes. So an abnormal result in one person will be a normal result in another.

[00:44:36] So I think you can also very quickly try to minimize the problems that you have, which also, there's also research about this, that basically patients delayed care, which results in complications, more expensive care, etc. So, yeah, on a broader level on what this whole thing is going to have on healthcare systems as such,

[00:45:01] the costs, the utility, the utilization, that's something that I'm very curious about. Absolutely. Exactly. And I have, like I said, real skepticism that it's going to do anything other than amplify people's desire to have access to care rather than diminish it. I think that point around baselines is really interesting.

[00:45:26] And especially, I think we want to talk about tips and tricks in, and this really is when you are requiring people to have access to the paid model, but we'll just assume that some people do have access to paid models for the time being, is there's this concept called harnessing when you are working with an LLM or a generative bot. And the harness is essentially like your tortoise's house. So you are the tortoise and the house is your shell. The harness is the LLM's shell.

[00:45:54] And you can train that harness, you can specify it to be much more similar to you by using things like skills and apps. That mean that it gives you, any of these tools give you a much more bespoke answer to you rather than something that is more generic. So with writing, I use these tools to help me to write better, but I don't want them to make me sound like a bot. I still want to sound like myself when I write.

[00:46:21] So I have a skill, it's called the Jessica Morley voice skill that sits inside my LLM harness, so that when I say edit my text, for example, it will load my voice skill and it will edit according to my voice, not according to its own sort of generic preferences. So there are ways in which you can also do that with patient care. You can look at how you do skills and you can build apps, etc.

[00:46:47] So that any interactions that you're having with the LLM are based on your own preferences, understanding your history, your own personal requirements, rather than it being something that's just a generic average from the internet. These things are not difficult to do, by the way, and they will also tell you how to do it if you ask. So what would be the prompt? Tell me how do I create a harness for my patient profile?

[00:47:16] Yeah, or how do I create a skill or how, but can you tell me how I harness you to work better for me? Is this sort of one of them. But in general, harness the LLM to work for you will give you much better results. Okay, awesome. I know what I'm going to be doing during the weekend. It's another thing that I find very interesting in this whole AI debate is that we've got these large language models that are available to everyone.

[00:47:44] And everybody's using them for healthcare and medicine, even though that's not their intention. So they're not configured specifically for that field. And if you actually wanted to create a configured thing for the field, it's so difficult to create a scalable solution. And this is also what you talk about in your series, the problem with the last mile issue. So can you talk a little bit about that?

[00:48:14] How do you see that is going to be addressed? Are we just going to use these large language models? Because it will not make sense to create specific medically focused solutions, which will never be reliable enough to cover all edge cases.

[00:48:41] So there are a couple of different things in there. So first of all, the issue is that large language models, the generic ones that are just off the shelf, your GPTs, whatever, are not regulated as medical devices.

[00:49:02] Whereas something that gives you medical advice, which they categorically do, even though they are not really designed to do that, should 100% be classified as a medical device. And then it should be treated as software as a medical device. And then should be required to meet the sort of minimum standards that we would expect of a medical device. So there are two angles to this.

[00:49:31] The first one is that I really think we should be pushing for these things to be regulated as a medical device. But companies are clearly very comfortable with the fact that people are using these tools for medical purposes because it's driving up their usage. So of course they're going to be happy. I'm like, OK, if you're happy with that, then you need to take on the responsibility of making sure that it works in an appropriate way. And there are some things to that.

[00:49:55] The second half, the half that comes from my more pragmatic side and assuming that won't happen, is a bit like going back to this sort of tips and tricks. That it's a buyer beware market. So when you're interacting with them and they're giving you medical knowledge, know that you're in an entirely unregulated space. There are consequences to that.

[00:50:22] Now, in terms of the practicalities of deployment, there are different sort of scenarios. So you can build on top of the models and make them more bespoke to specific institutions. And that, I think, is a use case that could be very good for medical practice in hospitals. Hospitals have endless reams and reams of documentation and policies. And this is the protocol for doing this.

[00:50:52] And this is the guideline for doing this. It's very possible to take a baseline model, then refine it only on that hospital's records and protocols and care guidelines. And then tell the model in its sort of temperature settings, you are not allowed to go outside of these parameters. You only use this.
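One simple way to picture "you only use this" is sketched below. Note that it swaps in a different technique from the refinement described above: instead of retraining a model on the hospital's documents, it looks up matching local excerpts and builds a prompt restricted to them, refusing when nothing matches. The restriction lives in the prompt and the refusal path, not in a sampling setting. `PROTOCOLS`, `find_excerpts` and `grounded_prompt` are hypothetical names, the protocol text is made up, and the keyword match is deliberately naive.

```python
from typing import Dict, List, Optional

# Hypothetical, made-up protocol snippets standing in for a hospital's documents.
PROTOCOLS: Dict[str, str] = {
    "hand hygiene": "Clean hands before and after every patient contact.",
    "falls": "Assess falls risk on admission and after any change in condition.",
}


def find_excerpts(question: str, documents: Dict[str, str]) -> List[str]:
    """Naive keyword match: keep documents whose title appears in the question."""
    q = question.lower()
    return [text for title, text in documents.items() if title in q]


def grounded_prompt(question: str, documents: Dict[str, str]) -> Optional[str]:
    """Build a prompt restricted to local excerpts, or None if nothing matches."""
    excerpts = find_excerpts(question, documents)
    if not excerpts:
        return None  # refuse rather than let the model answer from general training
    sources = "\n".join(f"- {e}" for e in excerpts)
    return (
        "Answer using only the excerpts below. "
        "If they do not cover the question, say so.\n"
        f"Excerpts:\n{sources}\n"
        f"Question: {question}"
    )
```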

[00:51:10] The challenge that comes there, and this is when we start talking about scalability, is that requires there to be people in-house that have the skills and the capability to do that. And to have the knowledge in the first place, to understand that's something that's even possible. And that is missing.

[00:51:29] And then when it comes to the medical side of stuff, the real clinical side of things, I think people, we need to be having a more practical and realistic conversation about what they are and are not doing. There's a little bit, I think, of an overblown tendency to think that what they're doing is reasoning in the same way that a clinician will reason.

[00:51:56] And from that, there is this extrapolated belief that as long as I've given them access to all of the medical knowledge in a textbook in the world, then they're going to be able to correctly infer the right explanation for this particular patient's symptoms based on this scenario. And so really, all we have to do is make sure that we make a medical model that's trained on all medical textbooks ever.

[00:52:25] The reality is, that's not how most medicine is actually practiced. And very often, what you're doing is reasoning to the best explanation based on experience and your own ability to infer. That an LLM is never going to do because that's not how it's designed and that's not how it's built. It's only going to predict what's the most likely next answer. They can do that in a more sophisticated way than they could six months ago, but that's still the reality.

[00:52:54] And I think until we understand that and have honest conversations about the limitations, we're probably not going to solve that last mile problem because we're always going to be focused on a flawed assumption that the answer is give it access to all of the medical knowledge that ever existed. And that's forgetting that a lot of medical knowledge and experience and practice is not codified. It's not written down. It's not in data anywhere.

[00:53:23] And it doesn't suit the way that these models are designed to behave. Very interesting. It reminds me or takes me back to the discussion we had earlier when you mentioned that a lot of people use LLMs for mental health or for therapy because they don't have access to it. And I recently asked a therapist, what's your experience with AI?

[00:53:50] Are people coming in with ideas on what their problem is, etc.? And she said a very interesting thing, and that is that her current impression is that AI is narrowing our viewpoint exactly because it tries to agree with you. But the whole point of human interactions is that there's a certain level of unpredictability, that you do not know how a person will react.

[00:54:17] So basically now she feels like an intermediary between patients and AI, because patients tell her what they told AI. And now she needs to decipher, based on the discussion that they had with AI, what is the actual underlying need that they're trying to address and the problem that they're trying to solve. So it's very interesting.

[00:55:10] Thank you. Yeah, exactly. But that's exactly the point in terms of this comes back to what do you and do you not use it for? I think it's probably entirely reasonable for a clinician to say, I have a patient that's doing this and this presenting with these symptoms. I think it's X. Is there anything I'm missing? And for it to come back and do that sort of adversarial thing that we were talking about before or what in medical reasoning would be differential diagnosis. Have you considered this? Have you asked for this?

[00:55:39] Maybe you should order this test, blah, blah, blah. I think that's probably quite a healthy thing for people to have access to if they're unsure. But what I definitely wouldn't want to happen is when you, as a consequence of that, you turn off the brain and think, oh, I'm just going to ask this. I'm just going to basically use it like it's a searchable textbook.

[00:56:03] Because, first of all, it's inaccurate and secondly, it really does a disservice to what medicine and the practice of medicine is and undermines what clinicians are designed and trained to do. Jessica, thank you so much for joining me again.

[00:56:20] In the summary, I will focus on the top three, which were creating a skill or a harness, making sure that you tell AI to only get back to you with an answer when it's convinced it's a good one. And what else did we have? I don't know. We're going to look. We had those two, and always know when you want to walk away.

[00:56:48] And the other one was the point around, basically, don't trust it. Trust but verify. Ask it why. So between those two, you pick which was the top one. Yeah. Awesome. Awesome. Thank you so much. And good luck with your further research. I will definitely be further following you. No, thank you. And I'm sure we'll speak soon. You've been listening to Faces of Digital Health, a proud member of the Health Podcast Network.

[00:57:18] If you enjoyed the show, do leave a rating or a review wherever you get your podcasts, subscribe to the show or follow us on LinkedIn. Additionally, check out our newsletter. You can find it at fodh.substack.com. That's fodh.substack.com. Stay tuned.