How to Evaluate Veterinary AI: My Practical Checklist for Safety, Ethics, and Trust
- Dr. Karen Bolten
- Jun 25
- 8 min read

Artificial intelligence is rapidly entering veterinary medicine whether we’re ready or not. Some tools are excellent. Some are deeply flawed. Most lie somewhere in between.
That doesn’t mean AI is bad. But it does mean you need to know what you’re using – just like any other medical tool.
AI tools can influence clinical decisions, store sensitive client data, and impact patient outcomes; however, most of them are unregulated. That puts the responsibility on the user to ask important questions and evaluate what’s safe, ethical, and actually useful in practice.
This post walks you through the exact framework I currently use to evaluate AI tools for safety, ethics, and trust. It’s not meant to scare you away from AI - it’s meant to empower you to use it wisely.
Are there peer-reviewed articles?
This is always my first stop. If a company is making claims about diagnostic accuracy or clinical utility, I want to see those claims tested in published, peer-reviewed studies - the kind that go through independent expert review before publication.

If you haven’t read a lot of studies, be careful: not all scientific-looking papers are peer-reviewed. Here’s a quick breakdown:
Peer-reviewed journal articles are reviewed by experts before publication and are generally the gold standard.
Conference proceedings are papers accepted for presentation at academic conferences. Some are lightly peer-reviewed, while others are not reviewed at all, so quality can vary significantly.
Preprints (like those on arXiv) are early drafts shared before peer review. They can be promising, but they haven’t been formally validated.
White papers are often published by companies themselves, which makes them useful for context, but they are not peer-reviewed.
👉 You can explore my growing AI publications database here (which is also connected to the products).
Initially, I only included peer-reviewed studies in this database, but I do think the other types warrant inclusion. However, like all publications, you should read them and draw your own conclusions – even more so since they haven’t undergone the same level of independent critique.
Which brings me to…
What do those articles actually say?
Once I find relevant studies, I read them critically. Not all studies are created equal, and exciting stats alone don’t mean the model is trustworthy.
And please don't fall for clickbait. Read the articles.
Here are the key questions I ask:
What stage is the model in?
Is it still in training?
Has it been validated on a separate dataset?
Is it already in deployment, being used on real patients?
Understanding this tells you how mature and reliable the model might be in real-world use.
What kind of data was used?
Where did the training data come from?
How large and varied was the dataset?
Was the model tested on real-world patients, or only on synthetic or retrospective data?
Which patients were used? How diverse were they?
How many locations, hospitals, or practitioners contributed data or labels?
What types of practices did the data come from? From what socioeconomic settings?

Be cautious: Bigger and more varied isn’t always better. If a model is built for a very specific use case (for example, detecting one condition in one species), overly broad data may actually reduce performance. But if the model is intended for broad clinical use, it must be trained on diverse, representative populations. Otherwise, bias and blind spots are guaranteed.
Also, keep in mind that “statistical significance” in AI research doesn’t always require the same sample sizes as traditional clinical trials. But that doesn’t mean you should accept a model trained on scant data - it depends on the scenario.
How well does the model actually perform?
What are the training and validation accuracies?
Do the authors report sensitivity, specificity, precision, recall, or F1 score (not just accuracy)? (See the quick example after this list.)
Are failure scenarios discussed? (They should be.)
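If those terms feel fuzzy, here’s a tiny, purely illustrative example showing how each metric is calculated and why accuracy alone can be misleading. The confusion-matrix numbers are made up for this sketch - they’re not from any real study or product.

```python
# Illustrative only: made-up counts for a hypothetical 200-patient test set.
tp, fp, fn, tn = 40, 10, 5, 145  # true pos, false pos, false neg, true neg

accuracy    = (tp + tn) / (tp + fp + fn + tn)   # overall fraction correct
sensitivity = tp / (tp + fn)                    # recall: sick patients correctly flagged
specificity = tn / (tn + fp)                    # healthy patients correctly cleared
precision   = tp / (tp + fp)                    # flagged patients who are truly sick
f1          = 2 * precision * sensitivity / (precision + sensitivity)

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} precision={precision:.2f} f1={f1:.2f}")
```

In this toy example, accuracy looks great at about 93%, but precision is only 80% and sensitivity about 89% - which is exactly why you want the full set of metrics, not just the headline number.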
How did the authors address bias?
This is one of the most important - and most overlooked - parts of AI in medicine.
Did the authors describe how they intentionally identified and mitigated bias?
What types of bias were considered (demographic, geographic, species-specific, etc.)?
If they don’t mention bias at all, that’s a major red flag - especially for models used in life-or-death decisions.
A model that ignores bias is not just flawed - it can be dangerous.
👉 If you're not sure how to critically evaluate an AI study, I wrote a practical walk-through.
How do I find ethically minded vet AI tools?
The most ethically sound tools usually leave a trail:
Peer-reviewed research
Websites that address data and AI transparency
Clearly written privacy policies that favor users' preferences in data management
Thoughtful safety disclaimers that help users understand what happens with their data
👉 You can find products that meet these higher standards in my searchable AI database.
Note: The current version includes filters like GMLP, GDPR, HIPAA, and FDA status. Behind the scenes, I’ve collected much deeper transparency data on each product - especially around privacy policies, AI transparency, and how responsibly data is handled.
I’ll be reformatting and expanding the database soon to surface this data, so you’ll be able to directly compare how each tool addresses AI and data transparency. Some companies are truly setting a higher bar in this area, and I want to make it easier for you to find them.
Which brings us to…
What does the privacy policy say?
As I’ve begun evaluating AI scribes in particular, I’ve found that the privacy policy is a tell for a company’s AI and data ethics - and for its attention to detail.
There’s enormous variation in how much data AI tools collect, whether they use it for training, and how they protect user or client privacy. Some tools are quite vague. Others are transparent about practices that should raise red flags - but those details are buried in policies so long that most users will never read them.

To tackle this, I built a custom GPT that evaluated the privacy policies of all the veterinary AI scribes in my database, using international ethics guidelines as the benchmark.
The results were eye-opening. Some tools nailed it. Some… not so much. (Blog coming soon.)
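If you’re curious what that kind of automated review can look like under the hood, here’s a minimal sketch. It’s illustrative only - the checklist questions, the model name (gpt-4o), and the helper function are assumptions for this example, not my exact setup.

```python
# Minimal sketch of LLM-assisted privacy-policy review (illustrative only).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Hypothetical checklist items, loosely modeled on common data-ethics guidance.
CHECKLIST = [
    "Does the policy state whether user/client data is used to train models?",
    "Is consent opt-in, opt-out, or not offered at all?",
    "Is there a stated data-retention period and deletion process?",
    "Is data shared with or sold to third parties?",
]

def review_policy(policy_text: str) -> str:
    """Ask the model to answer each checklist question, quoting the policy's own wording."""
    prompt = (
        "Answer each question about the privacy policy below. "
        "Quote the relevant passage, or say 'not addressed'.\n\n"
        + "\n".join(f"- {q}" for q in CHECKLIST)
        + "\n\nPOLICY:\n" + policy_text
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Usage: print(review_policy(open("scribe_policy.txt").read()))
```

The point isn’t the code itself - it’s that a consistent checklist, applied the same way to every policy, makes the differences between companies jump out.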
Moral of the story: READ BEFORE YOU SIGN. You do not have to permit a company to use your data for training, and you should look for companies that lean towards “opt-in” policies rather than “opt-out” – or that at least make it very easy to opt out. Some companies don’t allow you to opt out at all.
I want to help companies that help me, but at the same time, I don’t want my awkward client conversation about Fluffy’s anal glands existing in perpetuity in someone else’s possession.
Am I overly paranoid?
AI is one of the most misunderstood - and underestimated - tools in the veterinary space. Many people are afraid of it, and they should be… but often for the wrong reasons.
The real danger isn’t “robots taking our jobs.” It’s:
Hidden training data
Undisclosed limitations
Models that influence medical decisions without meaningful oversight
Unlike traditional diagnostics, AI tools are often shrouded in secrecy. Proprietary algorithms can’t be fully peer-reviewed. Training datasets are rarely disclosed. Even basic safety testing might not exist unless it’s required by regulation (which, currently, it usually isn’t).
That’s why transparency matters. It doesn’t mean a company has to give away its secret sauce - but it should give you, the end-user, confidence that the product has been built and tested responsibly.
“More transparent” does not mean “more ethical”
Important note: just because a tool is “transparent” doesn’t mean it’s ethical.
I've read some very long, seemingly transparent privacy policies that, when you dig in, essentially give the company license to collect, use, or even sell your data in ways that don’t serve you or your clients. The fine print matters - and most people never read it.
But I still appreciate the companies that DO go above and beyond with transparency, as this is currently the exception to the rule.
Global guidelines exist, but they’re a mess

I keep returning to the idea that AI regulations and guidelines are in the "Wild West phase" right now. It’s a bit of a free-for-all. AI ethics and regulations vary dramatically by country. Some places (like the EU, Canada, China, and South Korea) are ahead of the curve. Meanwhile, with the exception of registered medical devices, my home country, the US, is in more of an “ehh…just do whatever” phase.
While I’d like to say that was just sarcasm, it’s really not too far from the truth, and it has huge implications for the tools you are using in practice. To be clear, most of the AI we are currently using in practice in the US does not fall under FDA medical device approval. In my database, there is currently only one tool that is FDA-approved (ScopioVet Digital Cytology).
This lack of government regulation can affect whether a product was tested appropriately and what protections it offers users. It can mean you’re using a tool that doesn’t work properly and will steer you wrong on important decisions for your patients.
To help others navigate this, I’ve compiled a list of international guidelines, standards, and certifications. There’s too much to cover in this post, so I’ve turned that into separate resources, linked at the end.
So... How Do You Actually Start?
If all of this feels like a lot, well, that's because it is (*you're welcome*). But so is medicine, and you didn’t master that overnight either. You've just got to start somewhere.
AI isn’t something to blindly adopt or to avoid out of fear. It’s a tool, and actually a pretty amazing one; but like any tool, it can be well-made, misused, or outright dangerous. The key is knowing how to evaluate it before you let it into your practice.
Here’s my recommendation if you’re feeling overwhelmed:
Pick your single biggest pain point: maybe it’s documentation, maybe it’s missed diagnoses, maybe it’s staff burnout.
Head over to my Veterinary AI Search Engine.
Apply the framework above:
Look for published studies.
Check the privacy policy.
Prioritize transparency and ethics.
(And then ask for a trial!)
Some tools will pass with flying colors. Others may surprise you. But you’ll be making decisions based on real data - not hype, marketing, or fear.
Keep pushing for a future where AI actually helps you practice better and more easily - and where you stay in control of the tools, not the other way around. And support the companies that make ethical choices on behalf of their users. If we prioritize these companies, our whole field will win.
If you want to dive deeper into the certifications and international guidelines mentioned above, check them out here: