One of the most common criticisms of AI in clinical search is the black box problem.

You ask a question, you get an answer, and it is not always clear how the system got there.

For clinical evidence, that matters.

AskTrip takes a different approach. Alongside each answer, users can now view a transparency page, a step-by-step account of what happened between the original question and the response they received.

It is designed to show the working behind an AskTrip answer, not just the final response.

Why this matters

AskTrip is not a general chatbot. It is an evidence-based clinical Q&A system built around Trip Database.

Its job is not simply to produce fluent medical text. Its job is to help users move from a clinical question to relevant, source-linked evidence – while preserving the intent of the question and being clear about the strength and limits of the evidence found.

That means the route to the answer matters.

A confident answer can still be weak if the question was misunderstood. A citation-rich answer can still be misleading if the cited sources do not directly support the conclusion. And an answer can sound clinically useful while quietly drifting away from what the user actually asked.

The transparency page is our attempt to make those risks visible.

Nine steps, fully visible

Take a simple query:

What is the evidence for transurethral water-jet ablation (Aquablation) in benign prostatic hyperplasia?

The transparency page, reached via the ‘Transparency’ button on each answer page, shows how AskTrip moves from that question to the final answer.

1. The question is interpreted

First, AskTrip shows how it understood the question.

In this example, the system identifies the clinical intent as Outlook & Future Care. It also identifies the key elements of the question:

  • population: patients with benign prostatic hyperplasia
  • condition: benign prostatic hyperplasia
  • intervention: transurethral water-jet ablation, also known as Aquablation
  • phase: treatment
  • outcome: evidence

Some of these are marked as explicit, because they come directly from the user’s question. Others are marked as implied, because they are necessary to make sense of the question.

This step is important because even a short clinical question carries assumptions. Here, the user is not asking about “surgical removal of the prostate” in general. They are asking about the evidence for a specific minimally invasive treatment for benign prostatic hyperplasia.

By showing the interpreted question, AskTrip makes it easier to check whether the system has preserved the user’s clinical intent.
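
To make the idea concrete, the interpreted question can be pictured as a small structured record. This is only a sketch: the class and field names are hypothetical, and the explicit/implied labels assigned to each element are illustrative guesses, not the labels the transparency page actually shows.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionElement:
    role: str        # e.g. "population", "intervention"
    value: str       # the text AskTrip identified for that role
    provenance: str  # "explicit" (stated by the user) or "implied"


@dataclass(frozen=True)
class InterpretedQuestion:
    original: str
    intent: str
    elements: tuple[QuestionElement, ...]

    def by_provenance(self, provenance: str) -> list[QuestionElement]:
        """Return the elements carrying the given provenance label."""
        return [e for e in self.elements if e.provenance == provenance]


# Element values come from the example above; the provenance labels are
# ASSUMED for illustration only.
interpreted = InterpretedQuestion(
    original=(
        "What is the evidence for transurethral water-jet ablation "
        "(Aquablation) in benign prostatic hyperplasia?"
    ),
    intent="Outlook & Future Care",
    elements=(
        QuestionElement("population", "patients with benign prostatic hyperplasia", "implied"),
        QuestionElement("condition", "benign prostatic hyperplasia", "explicit"),
        QuestionElement("intervention", "transurethral water-jet ablation (Aquablation)", "explicit"),
        QuestionElement("phase", "treatment", "implied"),
        QuestionElement("outcome", "evidence", "explicit"),
    ),
)

for element in interpreted.by_provenance("implied"):
    print(f"{element.role}: {element.value}")
```

Keeping the provenance next to each element is what lets a reader check the implied parts separately, since those are where the system has added something the user did not say.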

2. Searches are constructed

AskTrip does not rely on a single search.

For the Aquablation question, the transparency page shows several different search routes:

  • broad lexical searches
  • focused lexical searches
  • PubMed-style searches
  • vector search using the original question
  • similar previous questions

The broad searches include natural-language variants such as:

  • transurethral water-jet ablation evidence benign prostatic hyperplasia
  • Aquablation clinical trials benign prostatic hyperplasia
  • transurethral water-jet ablation outcomes BPH
  • Aquablation vs other treatments for BPH
  • Aquablation long-term results benign prostatic hyperplasia

The focused searches are narrower, using terms such as Aquablation, water-jet ablation, benign prostatic hyperplasia, symptom improvement, quality of life, efficacy, effectiveness and recurrence.

The PubMed searches use MeSH and title/abstract terms, while the vector search sends the original question as-is to the semantic search system.

This matters because different search methods find different things. A lexical search may find exact terminology. A vector search may find conceptually similar material. PubMed-style searching may retrieve biomedical literature indexed in a more formal way.

The point is not to trust one search route. It is to gather candidate evidence from several routes, then filter and prioritise it.
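
A minimal sketch of that gathering step is below. It is not AskTrip’s implementation: the function name, the route names and the stub search functions are all hypothetical, and the document IDs are placeholders rather than real results.

```python
from typing import Callable

# A search route takes the question text and returns document IDs.
SearchRoute = Callable[[str], list[str]]


def gather_candidates(
    question: str, routes: dict[str, SearchRoute]
) -> list[tuple[str, str]]:
    """Run every route and record each hit as (route name, document ID)."""
    hits = []
    for route_name, search in routes.items():
        for doc_id in search(question):
            hits.append((route_name, doc_id))
    return hits


# Stub routes standing in for real search back ends (placeholder IDs).
routes = {
    "broad_lexical": lambda q: ["doc-a", "doc-b"],
    "focused_lexical": lambda q: ["doc-b"],
    "vector": lambda q: ["doc-a", "doc-c"],
}

hits = gather_candidates("Aquablation in benign prostatic hyperplasia", routes)
print(len(hits), "hits across", len(routes), "routes")
```

Recording the route alongside each hit keeps the provenance available to later stages, so the same document found twice can still be traced back to both searches.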

3. Results are retrieved

In this example, AskTrip retrieved 292 total results:

  • 107 from vector search
  • 77 from broad lexical search
  • 75 from focused lexical search
  • 33 from similar questions (See ‘reference stripping’ in this blog post)

The PubMed-style searches found a further 53 results. These are not counted in the 292: they were held in reserve and used only if more evidence was needed.

This makes the retrieval stage visible. The final answer is not based on a single hidden query. It is built from a wider set of candidate evidence gathered through several search methods.
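
The counts can be checked with a few lines of arithmetic. The variable names here are just for illustration; the numbers are the ones reported above.

```python
# Results per route, as reported on the transparency page for this example.
retrieved_by_route = {
    "vector search": 107,
    "broad lexical search": 77,
    "focused lexical search": 75,
    "similar questions": 33,
}

# PubMed results are held in reserve and are not part of the main total.
pubmed_reserve = 53

total_retrieved = sum(retrieved_by_route.values())
print(total_retrieved)  # 292

# Share of the main pool contributed by each route.
for route, count in retrieved_by_route.items():
    print(f"{route}: {count / total_retrieved:.1%}")
```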

4. Duplicates are removed

The same guideline, review or study may be found through more than one route.

AskTrip therefore removes duplicates before moving further through the pipeline. In this example, 292 retrieved results became 202 deduplicated articles.

This matters because duplication can distort the apparent volume of evidence. A document found by three routes is not three separate pieces of evidence. Deduplication helps keep the evidence base cleaner before relevance scoring begins.
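
In the Aquablation example, that means 292 - 202 = 90 of the retrieved results were repeats of a document already found by another route. The sketch below shows one simple way such merging could work. It assumes each result carries a stable document identifier, which is an assumption made for illustration; the post does not say how AskTrip detects duplicates, and the function name and IDs are hypothetical.

```python
from collections import defaultdict


def deduplicate(hits: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Collapse (route, document ID) hits into one entry per document.

    Each document keeps the set of routes that found it, so the
    provenance is preserved even though the duplicates are gone.
    """
    routes_by_doc = defaultdict(set)
    for route, doc_id in hits:
        routes_by_doc[doc_id].add(route)
    return dict(routes_by_doc)


# Placeholder hits: one guideline found by three routes, one review by one.
hits = [
    ("vector", "guideline-1"),
    ("broad_lexical", "guideline-1"),
    ("focused_lexical", "guideline-1"),
    ("vector", "review-2"),
]

unique = deduplicate(hits)
print(len(hits), "hits ->", len(unique), "documents")
```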

5. Relevance is scored

The deduplicated documents are then scored for relevance.

In the Aquablation example, the transparency page shows the score breakdown:

  • score 10: 11 documents
  • score 9: 22 documents
  • score 8: 13 documents
  • score 7: 12 documents
  • score 6: 7 documents
  • score 5 and below: 137 documents

The standard inclusion threshold is 6. Guidelines with a relevance score of 5 can also be included, because a guideline may still be clinically important even when its wording does not closely match the user’s question.

This step helps separate documents that merely mention BPH or surgical treatment from documents that are likely to answer the specific question about Aquablation.
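
The threshold rule can be written as a short check. This is a sketch under stated assumptions: the function and constant names are hypothetical, and it treats a guideline scoring 5 as always eligible, whereas the rule above only says such guidelines can be included.

```python
STANDARD_THRESHOLD = 6   # applies to most document types
GUIDELINE_THRESHOLD = 5  # guidelines get one point of leeway


def is_eligible(score: int, is_guideline: bool) -> bool:
    """Return True if a document's relevance score clears its threshold."""
    threshold = GUIDELINE_THRESHOLD if is_guideline else STANDARD_THRESHOLD
    return score >= threshold


# Score breakdown reported for the Aquablation example (scores 6 to 10).
score_counts = {10: 11, 9: 22, 8: 13, 7: 12, 6: 7}
five_and_below = 137

at_or_above_standard = sum(score_counts.values())
print(at_or_above_standard)                   # 65
print(at_or_above_standard + five_and_below)  # 202
```

So 65 of the 202 deduplicated documents met the standard threshold, before any guidelines scoring 5 are considered.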

6. Documents are prioritised

From the scored results, AskTrip prioritised 26 documents.

These included:

  • 6 essential sources, such as guidelines and systematic reviews
  • 14 desirable sources, such as RCTs, cohort studies and high-quality primary research
  • 6 other sources, such as case reports, opinion or background material

The transparency page also shows where the prioritised documents came from:

  • 23 from vector search
  • 21 from focused lexical search
  • 18 from broad lexical search
  • 5 from similar questions

Those numbers overlap because a single document may be found by more than one route.

This stage is important because AskTrip is not simply counting results. It is trying to identify the documents most likely to support a useful, evidence-aware answer.
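
The overlap is easy to quantify from the figures above. The variable names are illustrative only.

```python
# Prioritised documents by tier, as reported for this example.
tiers = {"essential": 6, "desirable": 14, "other": 6}

# How many prioritised documents each route found (a document can
# appear under several routes, so these counts overlap).
route_counts = {
    "vector search": 23,
    "focused lexical search": 21,
    "broad lexical search": 18,
    "similar questions": 5,
}

prioritised = sum(tiers.values())
route_mentions = sum(route_counts.values())

print(prioritised)                              # 26
print(route_mentions)                           # 67
print(round(route_mentions / prioritised, 2))   # 2.58
```

The route counts add up to 67 across 26 documents, so on average each prioritised document was found by roughly 2.6 of the four routes.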

7. Evidence is extracted

The next stage is evidence extraction.

In this example, 10 documents were extracted and none were skipped.

The transparency page shows the documents that contributed evidence, including guidelines and systematic reviews. It also labels sources by type, quality and directness.

For this question, several extracted sources were marked as direct evidence. These included material on Aquablation for lower urinary tract symptoms caused by BPH, comparative outcomes against TURP, symptom improvement, safety, sexual function, ejaculatory preservation and reintervention rates.

One source, the AUA guideline amendment, was marked as indirect. That is useful because it shows that not all included sources are treated as equally direct. A document may be relevant and still not provide direct outcome data for the intervention question.

This is a key part of the transparency work. AskTrip is not just listing citations. It is trying to show what evidence was extracted, how it relates to the question, and whether it is direct or indirect.

8. Evidence quality is scored

AskTrip then provides an evidence confidence judgement.

For the Aquablation example, the answer confidence was High. The transparency page explained this using several dimensions:

  • source quality: direct, high-quality evidence
  • evidence base: 3/3
  • answer quality: 3/3
  • actionability: strong
  • directness: direct
  • consistency: consistent
  • effect signal: clear
  • sufficiency: adequate

The explanation notes that the evidence for Aquablation in BPH is direct and consistent for outcomes such as symptom reduction, safety and preservation of sexual function. It also notes that multiple systematic reviews and guidelines report improvements in lower urinary tract symptoms and quality of life, broadly comparable to established treatments such as TURP, with potential advantages around sexual function preservation.

These judgements are not meant to replace formal critical appraisal. They are practical signals to help users understand how much weight the answer deserves, and why.

9. The answer is generated and cited

Finally, the answer is generated from the evidence that has passed through the previous stages.

In this example, the final answer cited 8 documents, with a median publication year of 2025.

The cited documents included:

  • NICE interventional procedures evidence review material
  • systematic reviews on minimally invasive treatments for BPH
  • Canadian guidance on male lower urinary tract symptoms and BPH
  • systematic reviews on ejaculatory function and sexual outcomes
  • systematic reviews on reintervention rates
  • French clinical guideline material on surgical and interventional management of bladder outlet obstruction related to BPH

Users can see which documents were cited and click through to inspect them.

The important point is that the final answer is not presented in isolation. It is connected back to the question interpretation, the searches, the retrieved results, the deduplication, the scoring, the prioritisation, the extracted evidence and the evidence quality judgement.

It does not ask for blind trust

The transparency page is not there to make AskTrip look clever. It is there to make AskTrip inspectable.

It helps users see:

  • whether the question was interpreted correctly
  • which searches were run
  • what evidence was retrieved
  • how duplicates were removed
  • how relevance was scored
  • which documents were prioritised
  • what evidence was extracted
  • how evidence confidence was judged
  • which sources were finally cited

This makes it easier to spot problems: a misread question, weak retrieval, over-reliance on indirect evidence, or an answer that sounds stronger than the evidence allows.

Transparency does not make an evidence system perfect. There will still be questions where the literature is incomplete, inconsistent, indirect or poorly applicable to the patient in front of the clinician.

But transparency changes the relationship between the user and the system.

Instead of asking clinicians, librarians and evidence specialists to trust a black box, AskTrip gives them a process they can inspect, question and challenge.

Clinical AI should not just provide answers.

It should show its working.