Trip Database Blog

Liberating the literature

Beta Q&A Update – Gathering User Feedback

Over 30 people signed up to test our automated Q&A system, though it’s unclear how many actively participated. That said, we received over 200 questions – which averages out to around 7 per tester. Realistically, some asked just one or two while others were clearly more enthusiastic, which is a great sign that we’re on the right track!

We’re now moving into the feedback phase and have asked testers to share their views across several key areas:

  • User characteristics: Confirmation of professional status
  • Usage frequency: Number of questions asked during the trial
  • Perceived accuracy: Subjective judgement of how well answers reflected the evidence
  • Clinical relevance: Relevance of responses to the clinical scenario posed
  • Trustworthiness: Level of trust placed in the answer content
  • Responsiveness: Perceived speed of system response
  • Answer format: Feedback on the structure and style of the response (e.g. narrative vs. quantitative balance, referencing)
  • Likelihood of recommendation: Willingness to recommend the tool to colleagues
  • Improvement suggestions: Opportunities to improve usability, content quality, or design
  • Overall impressions: General feedback on value, potential for routine use, and any concerns

We’re a mix of nervous and excited – but that’s the whole point of testing. We know it’s not perfect, and with thoughtful feedback, we’re confident we can make it significantly better.

Let’s call it nervously optimistic.

Some sample Qs (and clinical bottom line answers) from the beta test

The volume and breadth of the questions are amazing. The beta test, while challenging, has been a wonderful experience. Below are some example questions; for brevity, I have included only the clinical bottom line. (NOTE: many of these will have been answered using a corpus lacking the majority of our systematic reviews, so many are likely to improve.)

Treatment for Basedow Graves?
Clinical Bottom Line: The mainstays of treatment for Graves’ disease include antithyroid drugs, radioactive iodine, and surgical intervention, each with specific indications based on disease characteristics and patient preferences. Emerging therapies, such as monoclonal antibodies, offer promising future options, particularly for refractory cases or those with significant orbitopathy. A personalized approach, considering safety and patient-centric factors, is crucial for effective management.

What is the best treatment for binge eating disorders?
Clinical Bottom Line: For Binge Eating Disorder (BED), Cognitive Behavioral Therapy (CBT) remains the gold standard, supported by substantial evidence affirming its efficacy in reducing binge frequency and improving associated psychological outcomes. Pharmacotherapeutic options, such as lisdexamfetamine and naltrexone/bupropion, offer additional non-psychological avenues, particularly benefiting patients where BED coincides with obesity. Ultimately, a multidisciplinary approach tailored to individual needs—incorporating psychological, pharmacological, and possibly family-based interventions—is optimal for the management of BED.

Is type 2 diabetes mellitus a disease? What is the diseased organ if it is a disease?
Clinical Bottom Line: Type 2 diabetes mellitus is a complex metabolic disease involving multiple organs, primarily characterized by pancreatic β-cell dysfunction and insulin resistance across various tissues such as the liver, skeletal muscles, and adipose tissue, leading to systemic metabolic consequences.

Causes of septal panniculitis?
Clinical Bottom Line: Septal panniculitis encompasses a broad spectrum of diseases, chiefly characterized by etiologies involving inflammation of fat septa, either with or without vasculitis. Recognizing these underlying factors is imperative for targeted therapy and effective management.

What is the current evidence regarding nighttime dosing of antihypertensives? What is the rationale? What are the key trials regarding this? What are the current recommendations?
Clinical Bottom Line: While nighttime dosing of antihypertensive medications shows promise in certain contexts, overall evidence remains inconclusive. Current guidelines do not recommend routine switching to nighttime dosing, favoring individualized approaches based on patient needs and comprehensive BP management strategies.

In adult patients who have heart failure does providing early discharge teaching instead of current practice improve the 30-day readmission rate?
Clinical Bottom Line: Implementing early discharge education, particularly using interactive methods like the teach-back approach, significantly reduces 30-day readmission rates in heart failure patients compared to standard practice. Tailoring educational interventions to individual patient needs within a structured, multidisciplinary patient discharge plan is recommended to optimize outcomes and enhance self-management. Careful adoption of proven methods into practice will ensure better patient engagement, adherence, and overall health improvement.

Is niacinamide effective for primary or secondary prevention of skin cancer?
Clinical Bottom Line: Nicotinamide is beneficial for secondary prevention of non-melanoma skin cancers in high-risk patients, while its role in primary prevention warrants further investigation.

Still a rollercoaster

The Q&A beta test has been running for just over a week. After 24 hours I described the trial as a rollercoaster; that is still the case, and the trial is currently paused, which is hardly ideal.

The pause was triggered by informal feedback from a user who had asked about strength training for knee osteoarthritis. They felt the response was poor and lacked key references. Curious, I ran a quick search on Trip for strength training AND knee osteoarthritis and found plenty of systematic reviews. So why did the Q&A system miss them? It turned out the system was searching the free version of Trip, not the Pro version. And that’s a crucial difference—because the free version is missing nearly half a million systematic reviews, including those vital to answering the question properly.

It’s definitely a setback, with potential implications for many of the Q&As the system has already answered. That said, this is exactly what beta testing is for: identifying issues so we can improve the final version. We’ve uncovered a major flaw, and we’re already working on fixing it.

There are plenty of positives too. We’ve had well over 100 questions submitted (a sample is shown below), which suggests testers are coming back and engaging with the system – an encouraging sign that they like it. Many of these questions likely weren’t affected by the missing systematic reviews. Plus, a number of Q&As have been externally reviewed, and the quality of the answers remains strong.

Hopefully, the systematic review issue will be resolved today, allowing us to re-open the beta. Then, we’ll move into the user feedback stage next week.

Will the rollercoaster ever stop?

A sample of the questions we’ve answered:

🧠 Mental Health
– Do different age groups have different outcomes with rTMS for depression?

– Can GLP-1 drugs increase the risk of suicidality and self-harm in people with diabetes or obesity?

– Does magnesium supplementation help improve sleep?

❤️ Cardiology & Blood Health
– SGLT2 inhibitors: Do they reduce heart failure mortality?

– What’s the target blood pressure according to ESC 2024 guidelines?

– Low-dose aspirin for primary prevention: What do the latest guidelines say?

– Are there benefits and risks to long-term anticoagulation for VTE prevention?

🦴 Musculoskeletal & Rehab
– What’s the best evidence-based treatment for frozen shoulder?

– Strength training vs. other treatments in knee osteoarthritis

– Diagnosing a syndesmosis ankle injury

– When can kidney donors return to normal activity after surgery?

🧬 Endocrinology & Metabolism
– Is type 2 diabetes a disease — and what organ is affected?

– What cholesterol changes are caused by Actemra?

– Does endometriosis cause or result from infertility?

– Zoledronic acid: How it works and its role in treating osteoporosis

🧒 Paediatrics
– Best practices to prevent surgical site infections in post-op pediatric abdominal surgery

– Are there any unique Tdap booster recommendations for adults in contact with infants or who are pregnant?

🦠 Infectious Diseases
– Managing iatrogenic UTI caused by MDR Klebsiella

– Tetanus booster after Td: Is Tdap now recommended, and what’s the schedule?

🧘‍♂️ Complementary & Lifestyle Medicine
– Does magnesium help with sleep or muscle cramps?

– How does the patient experience shape care in primary care settings?

🧪 Pharmacology & Guidelines
– What are the effects of Actemra on cholesterol?

– How do different anticoagulation durations compare in terms of safety and efficacy?

🧠 Neurology & Imaging
– Predicting hospital stay with the Modified Rankin Scale

– What conditions (other than pneumothorax) show a lung point on ultrasound?

– How to conduct a peripheral neurological assessment

🧬 Genetics & Rare Conditions
– Are people with Ehlers-Danlos Syndrome at increased risk of periodontitis?

🩺 Kidney Health
– What’s the living kidney donor work-up process?

– How long is recovery and medication use for recipients of kidney transplants?

– What are the surgical risks of living kidney donation?

A Rollercoaster 24 Hours of Q&A System Testing

Wow, what a 24 hours!

It’s been intense – in a good way.

Nearly 30 people signed up to test the Q&A system (I was hoping for 10!), and we’ve already answered 20 questions, including:

  • Do different age groups have different outcomes with rTMS therapy for depression?
  • What are best practices for surgical site infection prevention for post operative pediatric abdominal surgery?
  • How do SGLT2 inhibitors affect heart failure mortality?
  • Hyperthyroidism causes dilated cardiomyopathy through which mechanism?
  • Please list any studies that use the modified Rankin scale to screen for length of stay in hospital that have a p value less than .05

A really interesting and challenging range of questions!

The system did break for a couple of hours (of course!), but it’s now fixed. It was one of three issues we’ve had to resolve to improve performance – so even the hiccups were part of the progress. With over a week of testing still to go, there is still time for a lot more excitement and feedback!

The only downside? I won’t know how accurate or useful the answers were until feedback comes in—and that might take a little while. But I can’t wait to find out.

Automated Q&A – ready for beta testing

We’re excited (and a little nervous!) to share something we’ve been working hard on. After resolving many of the early teething issues, we’re now ready for real-world testing — and that’s where you come in.

Our new system is designed specifically for health professionals, allowing you to ask any medical question — from the straightforward to the complex — and receive fast, evidence-based answers. We’re now looking for users to really put it through its paces. If you’re interested in trying it out, please get in touch: jon.brassey@tripdatabase.com.

We’d love your feedback on accuracy, speed, format, use of evidence, and anything else that could help us improve.

Below is an example Q&A (What are the pros and cons of prostate cancer screening?). It took 25 seconds to generate, so it gives you an idea of the type of output to expect.

Clinical Bottom Line:

Prostate cancer screening, particularly PSA testing, presents a balanced proposition: it can lower disease-specific mortality and catch cancers early, yet it carries significant risks, including overdiagnosis and overtreatment. Decisions should be individualized, factoring in the patient’s risk profile and preferences, while ensuring informed discussions. Ultimately, the harms and benefits need careful evaluation to make appropriate clinical choices.

Answer:

Pros and Cons of Prostate Cancer Screening

Prostate cancer screening, particularly through prostate-specific antigen (PSA) testing, has been a subject of extensive debate. The evidence from several studies and guidelines highlights the following benefits and drawbacks:

Pros

  1. Reduction in Prostate Cancer-Specific Mortality: Screening can lead to a significant reduction in prostate cancer mortality. The European Randomized Study of Screening for Prostate Cancer (ERSPC) showed a 21% reduction after 13 years of follow-up (Canadian Urological Association, 2022).
  2. Early Detection: PSA tests enable the early detection of both clinically significant and insignificant prostate cancers, allowing for earlier intervention that might be curative (National Cancer Institute).
  3. Reduction in Metastatic Disease: Screening decreases the incidence of metastatic prostate cancer, potentially reducing its physical and psychological burden (Ilic et al., 2018).
  4. Refinement with Risk Stratification: Implementing tests like the 4Kscore as a reflex test can considerably reduce unnecessary MRI and biopsy procedures by predicting the aggressiveness of detected cancers (GÖTEBORG-2 Trial).

Cons

  1. Overdiagnosis and Overtreatment: A considerable proportion of screen-detected prostate cancers are clinically insignificant, leading to unnecessary treatments and associated side effects such as impotence and incontinence (IQWiG).
  2. Harms of Treatment: Treatments may lead to lasting side effects, impacting quality of life significantly. Radical prostatectomy and radiation therapy are linked to higher rates of urinary incontinence and sexual dysfunction (BI).
  3. Psychological Impact: The anxiety associated with false positives, biopsies, and even a diagnosis of indolent cancer can have significant psychological impacts (National Cancer Institute).
  4. Uncertain Benefit on All-Cause Mortality: Despite some decrease in prostate cancer mortality, the overall mortality benefit remains unclear, with no significant impact on all-cause mortality (IQWiG).
  5. False Positives and Biopsy Complications: The high false-positive rate with PSA testing leads to unnecessary biopsies, which have risks such as infection and significant patient distress (ACP, 2013).
  6. Implementation in the Elderly: The benefit of continued PSA screening in older men is contentious, with a considerable risk of overdiagnosis without significant mortality reduction (PubMed).

Considerations for Decision-Making

  • Individual Preferences: It is essential to involve patients in shared decision-making. Their individual risks, values, and preferences should guide decisions about whether to pursue screening (BI).
  • Complexity of Screening: Screening-related decisions are complex and involve balancing potential life extension against the risk of significant harm and reduced quality of life.

Automated reviews – internal testing

This morning, we released the fully automated Q&A system internally. We’ve already asked it a number of questions, and responses are coming back in 10–30 seconds, which is promising.

That said, it hasn’t all been plain sailing—maybe my expectations were a bit too high. Two main issues cropped up:

  1. Format – The answers didn’t look great. They lacked flow and polish. Thankfully, this is a relatively easy fix.
  2. Content – Moving from the web interface of the LLMs (e.g., ChatGPT) to their APIs (so we can access the data without visiting the site) and then ‘stitching’ all the steps together introduced some problems. In previous tests, we didn’t have a single bad answer. This new system, though, delivered a few that just weren’t up to scratch. I don’t mean disastrously wrong – just not good enough.
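To make the ‘stitching’ idea concrete, here is a minimal Python sketch, assuming each step is a plain function that takes the previous step’s text and returns new text. The names `run_pipeline`, `build_search_query`, `search_evidence` and `draft_answer` are hypothetical stand-ins, not Trip’s actual pipeline, and no real LLM API is called. Keeping every intermediate output makes it easier to see which step produced a weak answer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A step is (name, function). In a real system each function would wrap an
# LLM API call; here they are hypothetical placeholders so the sketch runs offline.
Step = Tuple[str, Callable[[str], str]]


@dataclass
class PipelineRun:
    question: str
    trace: List[Tuple[str, str]] = field(default_factory=list)  # (step name, output)

    @property
    def answer(self) -> str:
        # The final answer is simply the output of the last step.
        return self.trace[-1][1] if self.trace else ""


def run_pipeline(question: str, steps: List[Step]) -> PipelineRun:
    """Chain the steps, keeping every intermediate output for later inspection."""
    run = PipelineRun(question)
    text = question
    for name, step in steps:
        text = step(text)
        if not text.strip():
            # Fail loudly instead of passing an empty result to the next step.
            raise ValueError(f"step {name!r} returned empty output")
        run.trace.append((name, text))
    return run


# Hypothetical stand-ins for the real steps (query building, search, drafting).
def build_search_query(q: str) -> str:
    return q.lower().rstrip("?")


def search_evidence(query: str) -> str:
    return f"evidence for: {query}"


def draft_answer(evidence: str) -> str:
    return f"Answer based on {evidence}"


if __name__ == "__main__":
    run = run_pipeline(
        "Strength training for knee osteoarthritis?",
        [("query", build_search_query), ("search", search_evidence), ("draft", draft_answer)],
    )
    for name, output in run.trace:
        print(f"{name}: {output}")
```

Recording the trace is one way to chase down answers that are “not good enough”: if the final text is weak, you can check whether the search step or the drafting step went wrong.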

We’re now working through these issues, refining the prompts (the instructions we give to the LLMs), and tightening things up.

It’s a bit disappointing not to get it right first time, but we don’t think the fixes will be too onerous. The next version should be much stronger—and hopefully ready for external testing soon.

Automated rapid reviews

Some years ago, I worked at Public Health Wales (PHW), and I’ve remained connected to the organisation and interested in the field of public health. Recognising that the evidence needs in public health often differ from those in mainstream clinical medicine, I was curious to explore how automation—particularly large language models (LLMs)—might support the production of public health evidence syntheses.

To test this, I selected a topic at random from PHW’s evidence service: A rapid review of barriers and facilitators to cancer screening uptake (breast, cervical, and bowel) in underserved populations, which had been published on medRxiv.

To explore this further, I adapted the methodology we use in our automated Q&A system, introducing an additional step – citation chasing – into the evidence-gathering process. Starting with 11 highly relevant original articles from the initial search, we used both backward and forward citation chasing to identify an additional 28 studies. With further layers of chasing, we could have found even more, but for this proof-of-concept exercise, the goal was simply to see whether the approach would work.
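Backward and forward citation chasing can be sketched as a walk over a citation graph. This Python version is a minimal illustration under stated assumptions: an in-memory dictionary mapping each paper ID to the IDs it cites stands in for a real citation database, the paper IDs are made up, and the relevance screening that would normally follow is left out. `layers=1` corresponds to a single round of chasing; higher values add the further layers mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def invert(references: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Build a 'cited by' index from a 'references' index."""
    cited_by: Dict[str, Set[str]] = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].add(paper)
    return cited_by


def chase_citations(
    seeds: Iterable[str], references: Dict[str, List[str]], layers: int = 1
) -> Set[str]:
    """Return papers found by backward and forward chasing, excluding the seeds.

    Backward: papers that a paper cites. Forward: papers that cite it.
    Each extra layer repeats the process on the papers found in the previous layer.
    """
    cited_by = invert(references)
    seen: Set[str] = set(seeds)
    frontier: Set[str] = set(seeds)
    found: Set[str] = set()
    for _ in range(layers):
        next_frontier: Set[str] = set()
        for paper in frontier:
            neighbours = set(references.get(paper, [])) | cited_by.get(paper, set())
            next_frontier |= neighbours - seen
        seen |= next_frontier
        found |= next_frontier
        frontier = next_frontier
    return found


if __name__ == "__main__":
    # Hypothetical toy graph: paper -> papers it cites.
    references = {"A": ["C", "D"], "E": ["A"], "F": ["E"], "B": ["D"]}
    seeds = ["A", "B"]
    print(sorted(chase_citations(seeds, references, layers=1)))  # ['C', 'D', 'E']
    print(sorted(chase_citations(seeds, references, layers=2)))  # ['C', 'D', 'E', 'F']
```

In this toy graph, paper F cites E, which cites seed A, so F only appears after a second layer of chasing.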

The combined set of 39 studies was analysed by the LLM, which extracted relevant content and drafted a comprehensive review. In addition to the main review, we asked the system to generate an abstract, a one-page summary for policymakers, and a formal methods section. The final output totalled approximately 10 pages. As a taster, here is the abstract (let me know if you’d like to see the full report):

Cancer screening is a critical tool for early detection and improved health outcomes, yet underserved populations face significant barriers to accessing these life-saving services. This report synthesizes findings from multiple studies to identify key barriers and facilitators to cancer screening uptake for cervical, breast, and colorectal cancers among underserved groups. Common barriers include lack of knowledge, cultural and social norms, financial constraints, psychological fears, and distrust in the healthcare system. Facilitators include healthcare provider recommendations, culturally tailored interventions, alternative screening methods (e.g., self-sampling), and organized programs with support systems. The report highlights the need for multi-level, community-centered interventions that address systemic, cultural, and individual barriers. Policy recommendations include expanding access to affordable screening, increasing culturally competent care, and leveraging community-based outreach to improve screening rates and reduce health disparities.

To evaluate the quality and usefulness of the output, we asked ChatGPT to compare our generated review with the original medRxiv article:

“Both the automated report and the medRxiv article explore the same core issue: understanding the barriers and facilitators to cancer screening uptake among underserved populations, specifically in relation to breast, cervical, and colorectal cancers. They arrive at broadly similar conclusions, highlighting key obstacles such as limited knowledge and awareness, cultural stigma, psychological fear, logistical and financial constraints, and widespread distrust in healthcare systems—particularly among marginalised groups. Likewise, both recognise similar facilitators: strong recommendations from healthcare providers, culturally tailored interventions, community-based outreach, the use of peer advocates, and alternative screening methods like self-sampling.

Despite this common ground, the two documents differ significantly in style, methodology, and intended use. The automated report takes a narrative and practice-oriented approach. It uses large language models (LLMs) to guide literature search and synthesis, offering rich thematic analysis with detailed contextual commentary for healthcare professionals. It includes a policy-maker one-pager, targeted insights for frontline practitioners, and practical recommendations rooted in real-world applicability. Its tone is accessible and it’s structured for immediate relevance to policy and service design.

In contrast, the medRxiv article follows a more formal academic structure, presenting a systematic review conducted using PRISMA guidelines. Its distinctive feature is the use of the COM-B model (Capability, Opportunity, Motivation – Behaviour), a behavioural science framework used to categorise and interpret barriers and enablers to screening. This gives it strong utility for designing or evaluating interventions, especially in academic or research settings. It is less practice-facing but more tightly aligned to theoretical models of behaviour change.

In terms of utility, the automated report is particularly valuable for practitioners, community health programme designers, and policy-makers seeking actionable guidance and context-sensitive strategies. It is grounded in practical implementation. The medRxiv article, by contrast, would be most useful for researchers, public health academics, and those developing behaviourally-informed interventions where theoretical coherence is important.

In essence, the two are complementary. One is a tool for action; the other, a tool for theory-driven design. Used together, they offer a robust foundation for improving cancer screening in underserved populations.”

It’s worth noting that this may not be a fully fair comparison, as the medRxiv article follows a formal academic journal format. There may well be a more practitioner-focused, internal version of the review that could have provided a closer parallel to the automated output. It’s also important to emphasise that this was a proof of concept designed to test the feasibility of the approach. With further refinement and more examples, the process and the quality of outputs are likely to improve significantly.

The PHW review involved eight authors and likely required hundreds of hours of work over several months. That effort undoubtedly brought depth, rigour, and valuable expert input—qualities that may exceed the automated version in certain areas.

In contrast, the fully automated approach we tested could produce a review in under ten minutes.

To be clear, I am not suggesting that this type of automation should replace traditional review processes used by organisations like PHW. However, I do see two clear potential use cases:

  • Efficiency and Acceleration – Could automation be used to do the heavy lifting in the early stages of a review, significantly speeding up the process and freeing up expert time for interpretation and refinement?
  • Capacity Gaps – During my time at PHW, the evidence service often had to decline requests due to limited capacity. In such cases, a fully automated review – while not perfect – may be far better than no review at all, or one done hastily under resource constraints.

It’s still early days, but the potential is hugely exciting. The question now is: where could this take us next?

Automated Q&A – more milestones

Two big milestones reached…

Firstly, our system has now been tested on over 50 Q&As – we’re refining the process and the outputs all the time.

Secondly, one question had proven difficult to get right, and we now seem to have perfected the system to overcome that difficulty. The question was: “What is the best initial treatment for mild hypertension in adult Afro-Caribbean males?” The answer is below.

We hope to have a test version out before the end of the month, and then it’s over to a small group of testers to start using it in real-world conditions – now that really will be interesting!

Best Initial Treatment for Mild Hypertension in Adult Afro-Caribbean Males

For adult Afro-Caribbean males with mild hypertension, the optimal initial treatment involves a combination of lifestyle modifications and pharmacotherapy, guided by robust evidence and clinical guidelines.

Pharmacotherapy:

  1. First-Line Agents:
    • Calcium channel blockers (CCBs), such as amlodipine, are the preferred first-line treatment due to their proven efficacy in reducing blood pressure and cardiovascular events in populations of African descent. CCBs are particularly effective in this group because they counteract salt sensitivity and lower peripheral vascular resistance, which are common pathophysiological features in Afro-Caribbean individuals.
    • Thiazide-like diuretics, such as chlorthalidone, are an alternative first-line option, especially if CCBs are not tolerated. Diuretics are effective in preventing cardiovascular events and are supported by findings from the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT), which demonstrated their superiority over ACE inhibitors in Black populations.
    • ACE inhibitors and angiotensin II receptor blockers (ARBs) are less effective as monotherapy in Afro-Caribbean males due to lower renin levels in this population. However, they may be considered in combination with CCBs or diuretics if additional cardiovascular benefits are needed.
  2. Combination Therapy:
    • If monotherapy does not achieve target blood pressure levels, combining a CCB with a thiazide-like diuretic is often effective. This approach addresses multiple physiological pathways involved in hypertension and is recommended by the 2024 European Society of Cardiology (ESC) guidelines.

Lifestyle Modifications:

  • Dietary changes: Reducing sodium intake and adopting the DASH diet (Dietary Approaches to Stop Hypertension) can significantly lower blood pressure.
  • Physical activity: At least 150 minutes of moderate-intensity exercise per week is recommended.
  • Weight management: Achieving and maintaining a healthy BMI is critical.
  • Alcohol moderation: Limiting alcohol consumption to no more than 1-2 drinks per day.

These lifestyle interventions not only enhance the effectiveness of pharmacotherapy but also contribute to overall cardiovascular risk reduction.

Guideline Recommendations:

  • The 2024 ESC guidelines and NICE guidelines (2023) strongly recommend CCBs and thiazide-like diuretics as first-line agents for hypertension management in Black populations, including Afro-Caribbean males. These guidelines emphasize the importance of individualized treatment plans and regular monitoring to achieve optimal blood pressure control.

Conclusion:

The best initial treatment for mild hypertension in adult Afro-Caribbean males involves initiating therapy with a calcium channel blocker or thiazide-like diuretic, alongside lifestyle modifications. Regular follow-up and personalized adjustments are essential to ensure effective blood pressure control and reduce cardiovascular risk.


References:

  1. 2024 ESC Guidelines: https://academic.oup.com/eurheartj/advance-article/doi/10.1093/eurheartj/ehae178/7741010?login=false
  2. NICE Guidelines (2023): https://www.nice.org.uk/guidance/ng136/resources/hypertension-in-adults-diagnosis-and-management-pdf-66141722710213
  3. ALLHAT Trial: https://jamanetwork.com/journals/jama/fullarticle/195626

New mobile interface

A slightly misleading title, as it’s actually a new website design that affects all versions of the site – mobile, tablet and PC (it’s a responsive website design). However, the main motivation was to improve the mobile interface, which was far from ideal (an understatement). Here’s an example of the old mobile interface (left) and the new (right):

It’s neater, and the search box is a decent size and no longer cramped by the search button!

Lots of other subtle changes and some big ones. For instance, on the old version if you tried to filter the results (by, say, guidelines) you could end up with the screen looking like this:

This left the user having to swipe to see the results. Now it works really well and behaves as most would expect.

This is not a finished design, as there are more changes to be made, but it is a great start. I hope you enjoy it!
