jrbtrip

Quality and automated Q&As

Yesterday, I returned to my former workplace – Public Health Wales (PHW) – to meet with the evidence team and discuss Trip’s use of large language models (LLMs). It was a great meeting, but unexpectedly challenging – in a constructive way. The discussion highlighted our differing approaches:

Automated Q&A – focused on delivering quick, accessible answers to support health professionals.
PHW evidence reviews – aimed at producing more measured, rigorous outputs, typically developed over several months.

The conversation reminded me of when I first began manually answering clinical questions for health professionals. Back then, I worried about not conducting full systematic reviews – was that a problem? Over time, I came to realise that while our responses weren’t systematic reviews, they were often more useful and timely than what most health professionals could access or create on their own. Further down the line, after many questions, I theorised that evidence accumulation and ‘correctness’ probably looked like this:

In other words, in most cases you can get the right answer quite quickly; after that, it becomes a matter of diminishing returns. In the graph above, I would place Q&A in the ‘rapid review’ space.

Back at PHW, the team’s strong reputation and professionalism mean they’re understandably cautious about producing anything that could be seen as unreliable. Two key themes emerged in our discussion: transparency and reproducibility. Both are tied to concerns about the ‘black box’ nature of large language models: while you can see the input and the output, what happens in between isn’t always clear.

With their insights and suggestions, I’ve started sketching out a plan to address these concerns:

Transparency ‘button’ – While this may not be included in the initial open beta, the idea is to let users see what steps the system has taken. This could include the search terms used and which documents were excluded (from the top 100+ retrieved).
Peer review – Our medical director will regularly review a sample of questions and responses for quality assurance.
Encourage feedback – The system will allow users to flag responses they believe are problematic.
Reference check – We’ll take a sample of questions, ask them three separate times, and compare the clinical bottom lines and the references used.

This last point ties directly to the reproducibility challenge. We already know that LLMs can generate different answers to the same question depending on how and when they’re asked. The key questions are: How much do the references and answers vary? And more importantly, does that variation meaningfully affect the final clinical recommendation? That might make a nice research study!
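One way to put a number on the reference check is to treat each run’s cited references as a set and measure how much the runs overlap. The sketch below is an illustration only, not how Trip does it: it assumes each run yields a set of reference identifiers (e.g. PMIDs), and the function names and the choice of Jaccard similarity are hypothetical. Judging whether the clinical bottom lines differ would still need a human reviewer.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two reference sets: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0  # two empty runs are trivially identical
    return len(a & b) / len(a | b)


def reference_stability(runs: list[set[str]]) -> dict[str, float]:
    """Summarise how much cited references vary across repeated runs of one question.

    Hypothetical helper: expects at least two runs, each a set of reference IDs.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    scores = [jaccard(a, b) for a, b in combinations(runs, 2)]
    all_refs = set.union(*runs)
    core = set.intersection(*runs)  # references cited in every run
    return {
        "mean_pairwise_jaccard": sum(scores) / len(scores),
        "min_pairwise_jaccard": min(scores),
        "core_fraction": len(core) / len(all_refs) if all_refs else 1.0,
    }


# Example: the same question asked three times (made-up reference IDs)
runs = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "C"}]
summary = reference_stability(runs)
```

A low core fraction would not by itself mean the answer is wrong; it would simply flag the question for a closer look at whether the bottom line changed too.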

If you have any additional suggestions for strengthening the Q&A system’s quality, I’d love to hear them.

Two final reflections:

First, it was incredibly valuable to gain an external perspective on our Q&A system and to better understand their scepticism and viewpoint (thank you PHW).
Second, AI is advancing rapidly, and evidence producers – regardless of their focus – need to engage with it now and start planning for meaningful integration.

May 7, 2025

Q&A: Categorising clinical questions

We expect to receive a large number of clinical questions and need an effective way to organise them for easy access. While users will be able to search the questions, browsing will also be supported through a classification scheme.

We plan to classify the questions in three ways:

Clinical area (e.g. cardiology, oncology) – we have 38, from Allergy & Immunology to Urology
Question type (e.g. diagnosis, treatment)
Quality of evidence – a simple high/medium/low rating indicating how robust the evidence is for answering the question

The question type classification is an interesting one; the full list is:

Causes & Risk Factors
Screening, Detection & Diagnosis
Initial Management
Long-term Management
Complications & Adverse Effects
Special Considerations
Outlook & Future Care

We developed this approach to reflect the natural timeline of a condition – from risk factors and diagnosis through to treatment and prognosis. The idea was inspired by clinical guidelines, which provide comprehensive overviews of condition management but can’t address every possible clinical scenario. By linking relevant Q&As to each stage of the guideline, we can fill in those gaps – and potentially even allow users to submit specific questions directly from within the guideline itself.
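The three-way classification maps naturally onto a small data model. The sketch below is hypothetical (the class, field and function names are ours, not Trip’s): question types are an ordered enum, so the Q&As for a clinical area can be listed in the same timeline order as a guideline, from risk factors through to outlook.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class QuestionType(IntEnum):
    """Question types, numbered to follow the natural timeline of a condition."""
    CAUSES_RISK_FACTORS = 1
    SCREENING_DETECTION_DIAGNOSIS = 2
    INITIAL_MANAGEMENT = 3
    LONG_TERM_MANAGEMENT = 4
    COMPLICATIONS_ADVERSE_EFFECTS = 5
    SPECIAL_CONSIDERATIONS = 6
    OUTLOOK_FUTURE_CARE = 7


class EvidenceQuality(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class QA:
    question: str
    clinical_area: str  # one of the 38 areas, e.g. "Cardiology"
    question_type: QuestionType
    evidence_quality: EvidenceQuality


def by_stage(qas: list[QA], area: str) -> dict[QuestionType, list[QA]]:
    """Group one clinical area's Q&As by question type, in timeline order."""
    stages: dict[QuestionType, list[QA]] = {}
    for qa in sorted(qas, key=lambda q: q.question_type):
        if qa.clinical_area == area:
            stages.setdefault(qa.question_type, []).append(qa)
    return stages
```

Because dictionaries keep insertion order, iterating over the result walks through the stages in the same order a guideline would, which is what linking Q&As to guideline sections needs.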

May 3, 2025

Q&A – looking back at ATTRACT

ATTRACT was a clinical Q&A system that began in Gwent, Wales, in 1997. Members of the primary care team could submit questions by post, email, phone – or even fax – and we would provide an evidence-based answer. It was the inspiration behind the creation of Trip, designed to speed up the question-answering process. ATTRACT expanded from Gwent to cover all of Wales, and a few years later, I led the national Q&A service for England through the NeLH/NLH, alongside a number of other initiatives. Altogether, these services have answered over 10,000 questions – but ATTRACT remains closest to my heart. My first professional love, perhaps?

Now that we’ve successfully tested our automated Q&A system, we’re working hard to launch it as an open beta in the near future. One outstanding task is the site design – and while revisiting old ideas, I came across an image of the homepage from 2005:

And here is what an old answer looked like:

We actually wrote up an evaluation and had it published in the BMJ in 2001: Just in time information for clinicians: a questionnaire evaluation of the ATTRACT project

Given my experience in answering clinical questions, the ability to answer them automatically is still staggering to me, and I’ve been working on this project for over a year. In the ATTRACT days, a question would take 3–6 hours to answer; now it takes 20–30 seconds. In fact, I took the question from that old answer, ‘What is the efficacy of ginger in osteoarthritis of the knee or hip?’, and tried it in our system. It took 24 seconds to generate the answer (which can be viewed below). As I say, staggering.

One final thought: what do we call the Q&A service? Trip Answers, Trip Q&A, Ask Trip? These all indicate what the service is about, which is a positive. However, a more evocative (possibly even romantic) name would be ATTRACT. And, though many will not know this, ATTRACT was an acronym for Ask Trip To Rapidly Alleviate Confused Thoughts – so it is still perfectly relevant!

Answer to the question ‘What is the efficacy of ginger in osteoarthritis of the knee or hip?’ [Note: this answer predates the improvements from the beta testing, so future answers will be even better]

Clinical Bottom Line

Ginger supplements may provide symptomatic relief in knee osteoarthritis due to their anti-inflammatory properties, but the evidence is mixed. It should be considered as an adjunct to primary treatments, with attention to individual responses and study quality variability.

Answer

Evidence from Systematic Reviews and Meta-analyses

Knee Osteoarthritis
- A systematic review and meta-analysis (32214292) focused on evaluating ginger’s efficacy in knee osteoarthritis. While oral ginger did not show significant evidence to improve pain relief and function compared to placebo, topical ginger also did not provide statistically significant improvements in pain or knee function compared to standard treatments.
Mixed Findings on Efficacy
- Another systematic review reviewing ginger in osteoarthritis, including studies on both knee and hip joints, reported mixed findings with some studies showing ginger may not be superior in improving pain when compared to ibuprofen and noted that evidence supporting ginger for changes in disability and functional capacity is weak (27819836).

Randomized Controlled Trials (RCTs)

Phase I Clinical Trial
- A study evaluating a ginger-honey mixture, “G-Rup® syrup,” showed significant improvements in joint stiffness, pain, and functional scores in knee osteoarthritis patients (13070751). This suggests potential symptomatic benefits of a ginger-honey mixture in knee osteoarthritis.
Add-on Effect with Diclofenac
- In a double-blind placebo-controlled trial involving ginger and diclofenac for knee osteoarthritis, ginger in conjunction with diclofenac demonstrated superior improvements compared to either treatment alone (24617168).
Steamed Ginger Extract
- An RCT examining steamed ginger extract found significant improvements in pain and functional indices in patients with mild knee osteoarthritis, indicating its potential utility as a functional food in OA management (39212491).

Mechanism of Action and Safety Profile

Anti-inflammatory Properties
- Ginger’s active compounds, such as gingerol and shogaol, possess anti-inflammatory properties that can reduce levels of pro-inflammatory cytokines (TNF-α, IL-1β) and inflammatory markers like CRP in knee osteoarthritis patients ([BI]).
Safety and Tolerability
- Across various studies, ginger was generally well-tolerated with occasional mild gastrointestinal side effects compared to NSAIDs like ibuprofen, making it a safer alternative for long-term use (27819836).

Conclusion: Evidence on the efficacy of ginger in knee or hip osteoarthritis is mixed, with some RCTs demonstrating significant symptomatic benefits, particularly in knee osteoarthritis. However, the quality and consistency of evidence vary. While not a first-line treatment, ginger’s anti-inflammatory properties suggest it may be a valuable adjunctive therapy, especially given its favorable safety profile compared to NSAIDs.

April 28, 2025

Beta Q&A – types of questions

The questions are still coming in and we’re approaching 300, so I thought we could have a quick look at the types of questions!

Conditions: below is a list of the conditions the questions related to. The fact that diabetes, with just 4 questions, tops the list shows how wide a range of conditions was asked about.

Diabetes: 4
Heart failure: 3
Hypertension: 3
Pregnancy: 3
Tetanus: 3
VTE (Venous Thromboembolism): 3
ASCVD (Atherosclerotic Cardiovascular Disease): 3
Duchenne muscular dystrophy: 3
Perinatal mental health: 3
Surgical site infection: 2
Stroke: 2
Type 2 diabetes mellitus: 2
Infertility: 2
Ventilator associated pneumonia: 2
Obesity: 2
Frozen shoulder: 2
Alzheimer dementia: 2
Asthma: 2
Coronary revascularization: 2
Tuberculosis (TB): 2
Low back pain: 2
Bowel cancer: 2
Cirrhosis: 2
Acute myocarditis: 2
Binge eating disorder: 2
Insomnia: 2
Venous insufficiency: 2
Pain: 2
Dilated cardiomyopathy: 2

Question type:

Treatment/Management: 70–80
Evidence/Studies/Literature Review: 45–55
Diagnosis/Assessment: 20–25
Guidelines/Recommendations/Best Practices: 15–20
Drug Information/Mechanism of Action/Reviews: 15–20
Etiology/Causes/Mechanisms: 10–15
Prognosis/Outcomes: 10–15
Public Health/Prevention: 10–15
Basic Science/Pathophysiology: 5–10
Patient Experience/Qualitative Aspects: 5–10
Ethical/Societal Considerations: < 5
Other/Unclear: < 5

Broad versus narrow questions:

Broad questions: These typically cover a wide range of aspects related to a condition, treatment, or topic. They might ask for general overviews, multiple options, or the fundamental principles.
Narrow questions: These focus on a very specific aspect, such as a particular drug, a precise diagnostic criterion, a specific patient population, or a detailed mechanism.
Approximate Count of Broad vs. Narrow Questions:

Broad Questions: Approximately 60 – 75 questions seem to have a broader scope. These often start with phrases like “What are,” “Explain,” “Discuss,” “What matters to patients in their patient experience,” or ask for lists or overviews of a topic. Example of a Broad Question: “What are the core concepts in the primary prevention of ASCVD?”

Narrow Questions: Approximately 150 – 165 questions appear to be more narrowly focused. These often inquire about specific treatments (“What is the best treatment for…”), particular diagnostic methods (“How to diagnose…”), the role of a specific drug (“How do SGLT2 inhibitors affect…”), or very defined scenarios. Example of a Narrow Question: “What is the correct dose for Meropenem in patients with hemodiafiltration?”
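The post doesn’t say how these counts were produced, and the ranges suggest an approximate pass. Purely as an illustration, a first-cut heuristic built from the opening phrases quoted above might look like this. The phrase lists and the ‘unclear’ fallback are assumptions; questions such as “How do SGLT2 inhibitors affect…” can’t be placed by opening words alone and would need a human (or a model) to judge.

```python
from collections import Counter

# Opening phrases quoted in the post; anything else is left for a human to judge.
NARROW_OPENERS = ("what is the best treatment for", "how to diagnose",
                  "what is the correct dose")
BROAD_OPENERS = ("what are", "explain", "discuss", "what matters to patients")


def classify_scope(question: str) -> str:
    """Crude heuristic: label a question 'broad', 'narrow' or 'unclear' by its opening words."""
    q = question.strip().lower()
    if q.startswith(NARROW_OPENERS):  # check the more specific phrases first
        return "narrow"
    if q.startswith(BROAD_OPENERS):
        return "broad"
    return "unclear"


def tally_scope(questions: list[str]) -> Counter:
    """Count how many questions fall into each scope label."""
    return Counter(classify_scope(q) for q in questions)
```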

Fascinating…

April 24, 2025

Beta Q&A – a lovely success

Why Usage Speaks Louder Than Words

In some ways, I didn’t even need to read the beta tester feedback. Why? Because the most compelling evidence was in the behavior itself: users kept coming back. That repeated engagement spoke volumes – it showed the system was delivering real value and gaining meaningful traction.

Positive Feedback Highlights

But we did ask for feedback, and it was broadly very positive. The headlines:

70% were health professionals
Most asked 3+ questions
Accuracy was deemed high
The answers were deemed relevant and trustworthy
Speed – 70% said ‘very fast’ and 30% said ‘reassuringly paced’
100% of health professionals would recommend the system to their colleagues

Here are a few standout quotes:

Thanks for the opportunity – I feel a product from the Trip family has particular value given your history in information architecture and providing credible, evidence-tracked, healthcare information support
It is very impressive to see the speed and capacity to extract and summarise data from evidence resources
Amazing system – would use very frequently in clinical practice!
Please continue this excellent initiative
Honestly, overall the database is intriguing. It has a resiliency and foundation that lends itself to be far more trustworthy and clinically focused than most other databases. I see it also as a great tool to teach med students about building blocks of clinical reasoning and research.

What’s Next: Immediate and Future Enhancements

As well as the positives, there was plenty of constructive feedback. It falls into several stages of the Q&A process; here are some example issues:

Initial question processing – when a user submits a question, we need to do some processing to better disambiguate it; for instance, one question we received was simply ‘liver elastography’.
Answer creation – we need to handle the search process better, e.g. sending additional metadata and making the search more sensitive if too few results are returned.
Answer design – the way we included references was problematic for many, and there was also a wish for an overall statement on the strength of the evidence.
Answer placement – we need to add the Q&As to the Trip search index and have systems in place to deal with duplicates.
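The ‘make the search more sensitive if too few results’ step can be sketched as a simple fallback: run a strict query first and broaden it only when it returns too little. This is a hypothetical illustration; Trip’s actual query syntax, the `search` function and the 20-result threshold are all assumptions.

```python
from typing import Callable

# Hypothetical search interface: takes a query string, returns matching document IDs.
SearchFn = Callable[[str], list[str]]


def search_with_fallback(terms: list[str], search: SearchFn,
                         min_results: int = 20) -> tuple[str, list[str]]:
    """Try a strict AND query; if it finds too few documents, broaden to OR.

    Returns the query actually used alongside its results, so the step could be
    shown to users later (e.g. behind a transparency button).
    """
    strict = " AND ".join(terms)
    hits = search(strict)
    if len(hits) >= min_results:
        return strict, hits
    broad = " OR ".join(terms)
    return broad, search(broad)
```

Returning the query that was actually run, not just the hits, keeps the broadening step visible rather than silent.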

All of the above are ‘immediate’ action points, by which I mean they will be done before we roll this out as an open beta on Trip. There are also some medium- to long-term improvements we need to make:

Add extra content types, e.g. drug information resources.
Use location information – if the user is from the USA, favour American guidelines.
For each Q&A, give additional prompts for follow-up questions. In other words, if a user asks ‘What are the pros and cons of prostate cancer screening?’ we might suggest follow-up questions such as ‘What is the best screening tool for prostate cancer?’ or ‘What are the different mortality rates at various cancer stages in prostate cancer?’
Multilingual – allow users to ask questions in their own language and get the answer back in that language (see Apoyando el uso del idioma español en Trip Database, i.e. ‘Supporting the use of Spanish in Trip Database’).
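Favouring local guidelines could be done as a gentle re-ranking rather than a filter, so that strong evidence from elsewhere still appears. A minimal sketch, assuming each result carries a relevance score and, for guidelines, an issuing country; the `Result` fields and the 1.2 boost factor are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    title: str
    score: float                   # relevance score from the search engine
    country: Optional[str] = None  # issuing country for guidelines, else None


def rerank_for_location(results: list[Result], user_country: str,
                        boost: float = 1.2) -> list[Result]:
    """Boost results from the user's country; nothing is removed from the list."""
    def adjusted(r: Result) -> float:
        return r.score * boost if r.country == user_country else r.score
    return sorted(results, key=adjusted, reverse=True)
```

A multiplicative boost only reorders results whose scores are already close, so a much more relevant guideline from another country still ranks first.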

In conclusion

The beta test has been energising and insightful. With such a strong foundation and clear areas to build on, we’re more confident than ever that we’re creating something genuinely valuable for clinical decision-making. The next phase? Opening up the beta and continuing to learn, refine, and improve – together with our users.

April 24, 2025

Changes coming to Trip – free and Pro – a consultation

As we begin rolling out AI features in Trip (the first being the primary care new evidence summary, the second Mind Maps, with Q&A to follow), we need to consider how to do so sustainably. This includes encouraging more users to upgrade to Pro subscriptions and reviewing the pricing of those subscriptions.

To encourage free users to upgrade to Pro, we plan to limit the filtering options available to them. This will help further highlight the distinction between Free and Pro tiers. Here’s what that could look like:

We’re likely to increase Pro pricing and are considering a grandfather clause – allowing existing institutional customers to retain current pricing, while the new rates apply only to new customers.

We’re taking a considered approach to these changes, so please share your thoughts – we’d love to hear your feedback.

April 17, 2025

Beta Q&A Update – Gathering User Feedback

Over 30 people signed up to test our automated Q&A system, though it’s unclear how many actively participated. That said, we received over 200 questions – which averages out to around 7 per tester. Realistically, some asked just one or two while others were clearly more enthusiastic, which is a great sign that we’re on the right track!

We’re now moving into the feedback phase and have asked testers to share their views across several key areas:

User characteristics: Confirmation of professional status
Usage frequency: Number of questions asked during the trial
Perceived accuracy: Subjective judgement of how well answers reflected the evidence
Clinical relevance: Relevance of responses to the clinical scenario posed
Trustworthiness: Level of trust placed in the answer content
Responsiveness: Perceived speed of system response
Answer format: Feedback on the structure and style of the response (e.g. narrative vs. quantitative balance, referencing)
Likelihood of recommendation: Willingness to recommend the tool to colleagues
Improvement suggestions: Opportunities to improve usability, content quality, or design
Overall impressions: General feedback on value, potential for routine use, and any concerns

We’re a mix of nervous and excited – but that’s the whole point of testing. We know it’s not perfect, and with thoughtful feedback, we’re confident we can make it significantly better.

Let’s call it nervously optimistic.

April 17, 2025

Some sample Qs (and clinical bottom line answers) from the beta test

The volume and breadth of questions is amazing. The beta test, while challenging, has been a wonderful experience. Below are some example questions; for brevity, I have simply added the clinical bottom line. (NOTE: many of these were answered using a corpus that lacked the majority of our systematic reviews, so many are likely to improve.)

Treatment for Basedow Graves?
Clinical Bottom Line: The mainstays of treatment for Graves’ disease include antithyroid drugs, radioactive iodine, and surgical intervention, each with specific indications based on disease characteristics and patient preferences. Emerging therapies, such as monoclonal antibodies, offer promising future options, particularly for refractory cases or those with significant orbitopathy. A personalized approach, considering safety and patient-centric factors, is crucial for effective management.

What is the best treatment for binge eating disorders?
Clinical Bottom Line: For Binge Eating Disorder (BED), Cognitive Behavioral Therapy (CBT) remains the gold standard, supported by substantial evidence affirming its efficacy in reducing binge frequency and improving associated psychological outcomes. Pharmacotherapeutic options, such as lisdexamfetamine and naltrexone/bupropion, offer additional non-psychological avenues, particularly benefiting patients where BED coincides with obesity. Ultimately, a multidisciplinary approach tailored to individual needs—incorporating psychological, pharmacological, and possibly family-based interventions—is optimal for the management of BED.

Is type 2 diabetes mellitus a disease? What is the diseased organ if it is a disease?
Clinical Bottom Line: Type 2 diabetes mellitus is a complex metabolic disease involving multiple organs, primarily characterized by pancreatic β-cell dysfunction and insulin resistance across various tissues such as the liver, skeletal muscles, and adipose tissue, leading to systemic metabolic consequences.

Causes of septal panniculitis?
Clinical Bottom Line: Septal panniculitis encompasses a broad spectrum of diseases, chiefly characterized by etiologies involving inflammation of fat septa, either with or without vasculitis. Recognizing these underlying factors is imperative for targeted therapy and effective management.

What is the current evidence regarding nighttime dosing of antihypertensives? What is the rationale? What are the key trials regarding this? What are the current recommendations?
Clinical Bottom Line: While nighttime dosing of antihypertensive medications shows promise in certain contexts, overall evidence remains inconclusive. Current guidelines do not recommend routine switching to nighttime dosing, favoring individualized approaches based on patient needs and comprehensive BP management strategies.

In adult patients who have heart failure does providing early discharge teaching instead of current practice improve the 30-day readmission rate?
Clinical Bottom Line: Implementing early discharge education, particularly using interactive methods like the teach-back approach, significantly reduces 30-day readmission rates in heart failure patients compared to standard practice. Tailoring educational interventions to individual patient needs within a structured, multidisciplinary patient discharge plan is recommended to optimize outcomes and enhance self-management. Careful adoption of proven methods into practice will ensure better patient engagement, adherence, and overall health improvement.

Is niacinamide effective for primary or secondary prevention of skin cancer?
Clinical Bottom Line: Nicotinamide is beneficial for secondary prevention of non-melanoma skin cancers in high-risk patients, while its role in primary prevention warrants further investigation.

April 11, 2025

Still a rollercoaster

The Q&A beta testing has been running for just over a week. After 24 hours I described the trial as a rollercoaster; that is still the case, and the trial is currently paused – hardly ideal.

The pause was triggered by informal feedback from a user who had asked about strength training for knee osteoarthritis. They felt the response was poor and lacked key references. Curious, I ran a quick search on Trip for strength training AND knee osteoarthritis and found plenty of systematic reviews. So why did the Q&A system miss them? It turned out the system was searching the free version of Trip, not the Pro version. And that’s a crucial difference, because the free version is missing nearly half a million systematic reviews, including those vital to answering the question properly.

It’s definitely a setback, with potential implications for many of the Q&As the system has already answered. That said, this is exactly what beta testing is for: identifying issues so we can improve the final version. We’ve uncovered a major flaw, and we’re already working on fixing it.

There are plenty of positives too. We’ve had well over 100 questions submitted (a sample is shown below), which suggests testers are coming back and engaging with the system – an encouraging sign that they like it. Many of these questions likely weren’t affected by the missing systematic reviews. Plus, a number of Q&As have been externally reviewed, and the quality of the answers remains strong.

Hopefully, the systematic review issue will be resolved today, allowing us to re-open the beta. Then, we’ll move into the user feedback stage next week.

Will the rollercoaster ever stop?

A sample of the questions we’ve answered:

🧠 Mental Health
– Do different age groups have different outcomes with rTMS for depression?

– Can GLP-1 drugs increase the risk of suicidality and self-harm in people with diabetes or obesity?

– Does magnesium supplementation help improve sleep?

❤️ Cardiology & Blood Health
– SGLT2 inhibitors: Do they reduce heart failure mortality?

– What’s the target blood pressure according to ESC 2024 guidelines?

– Low-dose aspirin for primary prevention: What do the latest guidelines say?

– Are there benefits and risks to long-term anticoagulation for VTE prevention?

🦴 Musculoskeletal & Rehab
– What’s the best evidence-based treatment for frozen shoulder?

– Strength training vs. other treatments in knee osteoarthritis

– Diagnosing a syndesmosis ankle injury

– When can kidney donors return to normal activity after surgery?

🧬 Endocrinology & Metabolism
– Is type 2 diabetes a disease — and what organ is affected?

– What cholesterol changes are caused by Actemra?

– Does endometriosis cause or result from infertility?

– Zoledronic acid: How it works and its role in treating osteoporosis

🧒 Paediatrics
– Best practices to prevent surgical site infections in post-op pediatric abdominal surgery

– Are there any unique Tdap booster recommendations for adults in contact with infants or who are pregnant?

🦠 Infectious Diseases
– Managing iatrogenic UTI caused by MDR Klebsiella

– Tetanus booster after Td: Is Tdap now recommended, and what’s the schedule?

🧘‍♂️ Complementary & Lifestyle Medicine
– Does magnesium help with sleep or muscle cramps?

– How does the patient experience shape care in primary care settings?

🧪 Pharmacology & Guidelines
– What are the effects of Actemra on cholesterol?

– How do different anticoagulation durations compare in terms of safety and efficacy?

🧠 Neurology & Imaging
– Predicting hospital stay with the Modified Rankin Scale

– What conditions (other than pneumothorax) show a lung point on ultrasound?

– How to conduct a peripheral neurological assessment

🧬 Genetics & Rare Conditions
– Are people with Ehlers-Danlos Syndrome at increased risk of periodontitis?

🩺 Kidney Health
– What’s the living kidney donor work-up process?

– How long is recovery and medication use for recipients of kidney transplants?

– What are the surgical risks of living kidney donation?

April 9, 2025

Trip Database Blog

Liberating the literature
