February 2026

Over the past few months we’ve received hundreds of individual pieces of feedback on AskTrip answers. Around 15% were low ratings. That might sound worrying, but I actually find the low scores the most valuable.

Why? Because they’re actionable.

People who are dissatisfied are far more likely to tell you about it, so the 15% is likely to be an overestimate of overall dissatisfaction. But each low score comes with something far more useful than a number: a clue about where the product isn’t meeting expectations. And when you look across hundreds of these, clear patterns start to emerge.

Here are the main things we learned.

1. Clinicians want answers that stay tightly focused on their question

One of the most common frustrations wasn’t that the information was wrong – it was that it drifted.

A clinician might ask a very specific question (a particular population, drug comparison, route, or clinical dilemma), but the answer sometimes broadened into a more general discussion of the topic.

Interesting? Yes.
Helpful for a decision? Not always.

The lesson for us is simple: relevance beats comprehensiveness. Staying locked onto the exact clinical question matters more than covering the wider subject area.

2. Confidence must match the strength of the evidence

Another pattern was what I think of as “EBM wallpaper” – answers that looked polished and evidence-based but were built on thin or indirect evidence.

Users don’t just want citations. They want honest calibration:

Strong evidence → clear conclusions
Limited evidence → say so early and plainly
No evidence → don’t dress it up

In other words, clinicians value honest uncertainty more than polished narrative.

3. When the evidence isn’t there, don’t guess

Sometimes there is no directly relevant research – or the question uses a term that isn’t recognised in the evidence.

In these situations, the risk for AI is to be “helpful” by filling the gap with general advice, assumptions, or plausible definitions. That can create confident answers that aren’t actually evidence-based.

Our approach will be different. When evidence is missing or uncertain, AskTrip will:

Say this clearly and early
Avoid speculation or invented interpretations
Suggest related questions that are more likely to return useful evidence

Sometimes the most helpful response isn’t a longer answer – it’s helping you ask the next, better question.

4. And finally… some people want more detail

Interestingly, the feedback wasn’t all about making answers shorter or tighter.

Around one third of users told us the opposite – they’d like longer, more detailed answers.

This highlights something important: clinicians use AskTrip in different ways. Some want a quick, decision-focused summary. Others want to explore the underlying evidence in depth.

So the challenge isn’t simply length – it’s flexibility.

What we’re changing next

This feedback isn’t just interesting – it’s directly shaping the next phase of AskTrip.

We’re actively working on two key improvements.

1. Better-calibrated answers
We’re refining how answers are generated so that they:

Stay tightly focused on the exact clinical question
Match confidence to the strength of the evidence
Say clearly when evidence is limited or absent
Avoid speculation or unnecessary narrative

2. A redesigned answer format
We’re moving toward a structure that supports different user needs:

A concise clinical summary by default – clear, decision-focused, and quick to read
Expandable detail – allowing users to explore the full evidence, studies, and context when they want more depth

In short:
Short by default. Deep on demand.

Why low scores are valuable

It’s easy to focus on average ratings or overall satisfaction. But the most useful feedback often comes from the edges: the cases where we didn’t meet expectations.

Those low scores aren’t failures. They’re signals.

And if we listen carefully, they help us do what AskTrip is designed to do in the first place:

Turn evidence into answers that clinicians can actually use – clearly, honestly, and at the level of detail they need.

I introduced the idea of chunking in the post “HTML Scissors” towards the end of last year. Since then, we’ve been working on delivering on that promise, and things are starting to come online. Before expanding on that, I’ll restate the problem…

A significant element of how we order Trip search results is how relevant the search terms are to the documents in our index – and this is strongly influenced by term density: the more a document is focused on the topic, the higher it is likely to rank.

However, this creates an important problem.

Take a clinical guideline on asthma. It might be 10,000 words long, with a 1,000-word section devoted to diagnosis. That section is highly relevant to a search for asthma diagnosis. But across the document as a whole, only 10% of the content relates to diagnosis. From a search engine’s perspective, the topic is relatively diluted, so the guideline may be judged less relevant and appear lower in the results than shorter documents that focus entirely on diagnosis.

In other words, long, high-quality documents can be penalised simply because their relevant content is spread thinly.
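To make the dilution concrete, here is a minimal sketch. It assumes a deliberately simplified definition of term density (the share of words in a document that match the search term); Trip’s actual ranking is not described in this post, so the function `term_density` and the toy documents are purely illustrative, scaled down from the asthma example above.

```python
def term_density(text: str, term: str) -> float:
    """Fraction of words in `text` that equal `term` (case-insensitive).

    A simplified stand-in for relevance scoring, assumed for illustration.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# Toy stand-ins: in the long guideline one word in ten is about diagnosis;
# the short document is about diagnosis throughout.
long_guideline = " ".join(["diagnosis"] * 100 + ["treatment"] * 900)
short_document = " ".join(["diagnosis"] * 100 + ["asthma"] * 100)

long_score = term_density(long_guideline, "diagnosis")
short_score = term_density(short_document, "diagnosis")

print(f"long guideline: {long_score:.2f}")
print(f"short document: {short_score:.2f}")
```

Both documents contain the same amount of diagnosis content, yet the short one scores five times higher under this simplified measure, which is the penalty described above.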

So, we’re starting to work with chunking – cutting long documents into smaller, coherent elements. These chunks are appearing live in the Trip results and we’re getting quite excited! We haven’t ironed out all the issues yet, but using the technology live is the only way we’ll refine and improve it.

An example search that highlights chunking

A search for Meningococcal Chemoprophylaxis reveals the following top result:

A few things to point out:

The document title is “Guidance for public health management of meningococcal disease in the UK”, and we have added the chunk title “Chemoprophylaxis in Healthcare Settings (Detailed) ‒ Chemoprophylaxis Recommendations in Healthcare Settings”. As we chunk, we assign a chunk title to sit alongside the actual title. Whether this continues to be displayed is an ongoing debate.

If you look at the document’s index:

You will see that only 6 pages (pages 24–30) are about chemoprophylaxis – less than 10% of the 63-page document. As a result, the document as a whole would score relatively low for this topic and would be unlikely to appear near the top of the results, even though those six pages are highly relevant.

By treating those pages as a separate unit, the content becomes highly concentrated on chemoprophylaxis – increasing its term density and allowing it to rank much more appropriately for the search.
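The effect of scoring a chunk rather than the whole document can be sketched as follows. This is an illustration under stated assumptions, not Trip’s implementation: the `Chunk` class, the `chunk_document` helper, the section names and the toy text are all hypothetical, and term density is again simplified to the share of words matching the search term.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str    # title of the source document
    chunk_title: str  # title assigned to this chunk
    text: str


def term_density(text: str, term: str) -> float:
    """Fraction of words in `text` equal to `term` (simplified, assumed measure)."""
    words = text.lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0


def chunk_document(doc_title: str, sections: list[tuple[str, str]]) -> list[Chunk]:
    """Turn (section title, section text) pairs into separately scorable chunks."""
    return [Chunk(doc_title, title, text) for title, text in sections]


# Hypothetical long guideline: only the middle section covers the query topic.
sections = [
    ("Epidemiology", "surveillance cases incidence " * 30),
    ("Chemoprophylaxis", "chemoprophylaxis contacts antibiotics " * 10),
    ("Vaccination", "vaccine schedule booster " * 30),
]
query = "chemoprophylaxis"

# Score the document as one unit, then score each chunk on its own.
whole_text = " ".join(text for _, text in sections)
whole_density = term_density(whole_text, query)

chunks = chunk_document("Example guideline", sections)
best = max(chunks, key=lambda c: term_density(c.text, query))
best_density = term_density(best.text, query)

print(f"whole document: {whole_density:.3f}")
print(f"best chunk ({best.chunk_title}): {best_density:.3f}")
```

In this toy example the whole document is diluted by the unrelated sections, while the chemoprophylaxis chunk scores far higher once it is treated as its own unit. Each chunk keeps both the document title and its own chunk title, mirroring the way the search result above shows them side by side.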

In short, chunking helps Trip find the relevant part, not just the relevant document.

That means long, authoritative sources are no longer penalised for covering multiple topics – and clinicians are more likely to see the evidence they need, faster.

We’re just getting started, and your searches will help us make it better.

Quiet changes like this don’t always get noticed – but they make a real difference to turning research into practice.

Trip Database Blog

Liberating the literature

Learning from user feedback: how we’re improving AskTrip answers

When good evidence gets buried – and how Trip is fixing it
