Trip Database Blog

Liberating the literature

March 2025

Automated Q&A – ready for beta testing

We’re excited (and a little nervous!) to share something we’ve been working hard on. After resolving many of the early teething issues, we’re now ready for real-world testing — and that’s where you come in.

Our new system is designed specifically for health professionals, allowing you to ask any medical question — from the straightforward to the complex — and receive fast, evidence-based answers. We’re now looking for users to really put it through its paces. If you’re interested in trying it out, please get in touch: jon.brassey@tripdatabase.com.

We’d love your feedback on accuracy, speed, format, use of evidence, and anything else that could help us improve.

Below is an example Q&A: What are the pros and cons of prostate cancer screening? It took 25 seconds to generate, and it gives you an idea of the type of output to expect.

Clinical Bottom Line:

Prostate cancer screening, particularly PSA testing, presents a balanced proposition: it can lower disease-specific mortality and catch cancers early, yet it carries significant risks, including overdiagnosis and overtreatment. Decisions should be individualized, factoring in the patient’s risk profile and preferences, while ensuring informed discussions. Ultimately, the harms and benefits need careful evaluation to make appropriate clinical choices.

Answer:

Pros and Cons of Prostate Cancer Screening

Prostate cancer screening, particularly through prostate-specific antigen (PSA) testing, has been a subject of extensive debate. The evidence from several studies and guidelines highlights the following benefits and drawbacks:

Pros

  1. Reduction in Prostate Cancer-Specific Mortality: Screening can lead to a significant reduction in prostate cancer mortality. The European Randomized Study of Screening for Prostate Cancer (ERSPC) showed a 21% reduction after 13 years of follow-up (Canadian Urological Association, 2022).
  2. Early Detection: PSA tests enable the early detection of both clinically significant and insignificant prostate cancers, allowing for earlier intervention that might be curative (National Cancer Institute).
  3. Reduction in Metastatic Disease: Screening decreases the incidence of metastatic prostate cancer, potentially reducing its physical and psychological burden (Ilic et al., 2018).
  4. Refinement with Risk Stratification: Implementing tests like the 4Kscore as a reflex can considerably reduce unnecessary MRI and biopsy procedures by predicting the aggression of detected cancers (GÖTEBORG-2 Trial).

Cons

  1. Overdiagnosis and Overtreatment: A considerable proportion of screen-detected prostate cancers are clinically insignificant, leading to unnecessary treatments and associated side effects such as impotence and incontinence (IQWiG).
  2. Harms of Treatment: Treatments may lead to lasting side effects, impacting quality of life significantly. Radical prostatectomy and radiation therapy are linked to higher rates of urinary incontinence and sexual dysfunction ([BI]).
  3. Psychological Impact: The anxiety associated with false positives, biopsies, and even a diagnosis of indolent cancer can have significant psychological impacts (National Cancer Institute).
  4. Uncertain Benefit on All-Cause Mortality: Despite some decrease in prostate cancer mortality, the overall mortality benefit remains unclear, with no significant impact on all-cause mortality (IQWiG).
  5. False Positives and Biopsy Complications: The high false-positive rate with PSA testing leads to unnecessary biopsies, which have risks such as infection and significant patient distress (ACP, 2013).
  6. Implementation in the Elderly: The benefit of continued PSA screening in older men is contentious, with a considerable risk of overdiagnosis without significant mortality reduction (PubMed).

Considerations for Decision-Making

  • Individual Preferences: It is essential to involve patients in shared decision-making. Their individual risks, values, and preferences should guide decisions about whether to pursue screening ([BI]).
  • Complexity of Screening: Screening-related decisions are complex and involve balancing potential life extension against the risk of significant harm and reduced quality of life.

Automated reviews – internal testing

This morning, we released the fully automated Q&A system internally. We’ve already asked it a number of questions, and responses are coming back in 10–30 seconds, which is promising.

That said, it hasn’t all been plain sailing—maybe my expectations were a bit too high. Two main issues cropped up:

  1. Format – The answers didn’t look great. They lacked flow and polish. Thankfully, this is a relatively easy fix.
  2. Content – Moving from the LLMs’ web interfaces (e.g., ChatGPT) to their APIs (so we can access the data without visiting the site) and then ‘stitching’ all the steps together introduced some problems. In previous tests, we didn’t have a single bad answer. This new system, though, delivered a few that just weren’t up to scratch. I don’t mean disastrously wrong – just not good enough.

We’re now working through these issues, refining the prompts (the instructions we give to the LLMs), and tightening things up.

It’s a bit disappointing not to get it right first time, but we don’t think the fixes will be too onerous. The next version should be much stronger—and hopefully ready for external testing soon.

Automated rapid reviews

Some years ago, I worked at Public Health Wales (PHW), and I’ve remained connected to the organisation and interested in the field of public health. Recognising that the evidence needs in public health often differ from those in mainstream clinical medicine, I was curious to explore how automation—particularly large language models (LLMs)—might support the production of public health evidence syntheses.

To test this, I selected a topic at random from PHW’s evidence service: A rapid review of barriers and facilitators to cancer screening uptake (breast, cervical, and bowel) in underserved populations, which had been published on medRxiv.

To explore this further, I adapted the methodology we use in our automated Q&A system, introducing an additional step – citation chasing – into the evidence-gathering process. Starting with 11 highly relevant original articles from the initial search, we used both backward and forward citation chasing to identify an additional 28 studies. With further layers of chasing, we could have found even more, but for this proof-of-concept exercise, the goal was simply to see whether the approach would work.
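For readers who want to picture the citation-chasing step, here is a minimal sketch in Python. It is not our production code: the function name `chase_citations`, the toy paper IDs, and the in-memory `references` dictionary are all made up for illustration, and a real system would pull reference lists from a bibliographic source rather than a hard-coded dictionary. Backward chasing follows a paper's reference list; forward chasing finds later papers that cite it.

```python
from typing import Dict, Set


def chase_citations(
    seeds: Set[str],
    references: Dict[str, Set[str]],
    rounds: int = 1,
) -> Set[str]:
    """Return studies found by backward and forward chasing, excluding the seeds.

    `references` maps a paper ID to the IDs it cites (hypothetical toy data).
    """
    # Invert the reference lists so we can look up "who cites this paper?"
    cited_by: Dict[str, Set[str]] = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    known = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        found: Set[str] = set()
        for paper in frontier:
            found |= references.get(paper, set())  # backward: what it cites
            found |= cited_by.get(paper, set())    # forward: what cites it
        frontier = found - known                   # keep only new studies
        if not frontier:
            break
        known |= frontier
    return known - seeds


# Toy citation graph (illustrative IDs only)
toy_references = {
    "seed_a": {"old_1", "old_2"},
    "seed_b": {"old_2"},
    "new_1": {"seed_a"},
    "new_2": {"new_1"},
}
toy_seeds = {"seed_a", "seed_b"}

one_round = chase_citations(toy_seeds, toy_references, rounds=1)
two_rounds = chase_citations(toy_seeds, toy_references, rounds=2)
print(sorted(one_round))
print(sorted(two_rounds))
```

The `rounds` parameter corresponds to the "further layers of chasing" mentioned above: each extra round chases the studies found in the previous one.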

The combined set of 39 studies was analysed by the LLM, which extracted relevant content and drafted a comprehensive review. In addition to the main review, we asked the system to generate an abstract, a one-page summary for policymakers, and a formal methods section. The final output totalled approximately 10 pages. As a taster, here is the abstract (let me know if you’d like to see the full report):

Cancer screening is a critical tool for early detection and improved health outcomes, yet underserved populations face significant barriers to accessing these life-saving services. This report synthesizes findings from multiple studies to identify key barriers and facilitators to cancer screening uptake for cervical, breast, and colorectal cancers among underserved groups. Common barriers include lack of knowledge, cultural and social norms, financial constraints, psychological fears, and distrust in the healthcare system. Facilitators include healthcare provider recommendations, culturally tailored interventions, alternative screening methods (e.g., self-sampling), and organized programs with support systems. The report highlights the need for multi-level, community-centered interventions that address systemic, cultural, and individual barriers. Policy recommendations include expanding access to affordable screening, increasing culturally competent care, and leveraging community-based outreach to improve screening rates and reduce health disparities.

To evaluate the quality and usefulness of the output, we asked ChatGPT to compare our generated review with the original medRxiv article:

Both the automated report and the medRxiv article explore the same core issue: understanding the barriers and facilitators to cancer screening uptake among underserved populations, specifically in relation to breast, cervical, and colorectal cancers. They arrive at broadly similar conclusions, highlighting key obstacles such as limited knowledge and awareness, cultural stigma, psychological fear, logistical and financial constraints, and widespread distrust in healthcare systems—particularly among marginalised groups. Likewise, both recognise similar facilitators: strong recommendations from healthcare providers, culturally tailored interventions, community-based outreach, the use of peer advocates, and alternative screening methods like self-sampling.

Despite this common ground, the two documents differ significantly in style, methodology, and intended use. The automated report takes a narrative and practice-oriented approach. It uses large language models (LLMs) to guide literature search and synthesis, offering rich thematic analysis with detailed contextual commentary for healthcare professionals. It includes a policy-maker one-pager, targeted insights for frontline practitioners, and practical recommendations rooted in real-world applicability. Its tone is accessible and it’s structured for immediate relevance to policy and service design.

In contrast, the medRxiv article follows a more formal academic structure, presenting a systematic review conducted using PRISMA guidelines. Its distinctive feature is the use of the COM-B model (Capability, Opportunity, Motivation – Behaviour), a behavioural science framework used to categorise and interpret barriers and enablers to screening. This gives it strong utility for designing or evaluating interventions, especially in academic or research settings. It is less practice-facing but more tightly aligned to theoretical models of behaviour change.

In terms of utility, the automated report is particularly valuable for practitioners, community health programme designers, and policy-makers seeking actionable guidance and context-sensitive strategies. It is grounded in practical implementation. The medRxiv article, by contrast, would be most useful for researchers, public health academics, and those developing behaviourally-informed interventions where theoretical coherence is important.

In essence, the two are complementary. One is a tool for action; the other, a tool for theory-driven design. Used together, they offer a robust foundation for improving cancer screening in underserved populations.

It’s worth noting that this may not be a fully fair comparison, as the medRxiv article follows a formal academic journal format. There may well be a more practitioner-focused, internal version of the review that could have provided a closer parallel to the automated output. It’s also important to emphasise that this was a proof of concept designed to test the feasibility of the approach. With further refinement and more examples, the process and the quality of outputs are likely to improve significantly.

The PHW review involved eight authors and likely required hundreds of hours of work over several months. That effort undoubtedly brought depth, rigour, and valuable expert input—qualities that may exceed the automated version in certain areas.

In contrast, the fully automated approach we tested could produce a review in under ten minutes.

To be clear, I am not suggesting that this type of automation should replace traditional review processes used by organisations like PHW. However, I do see two clear potential use cases:

  • Efficiency and Acceleration – Could automation be used to do the heavy lifting in the early stages of a review, significantly speeding up the process and freeing up expert time for interpretation and refinement?
  • Capacity Gaps – During my time at PHW, the evidence service often had to decline requests due to limited capacity. In such cases, a fully automated review – while not perfect – may be far better than no review at all, or one done hastily under resource constraints.

It’s still early days, but the potential is hugely exciting. The question now is: where could this take us next?

Automated Q&A – more milestones

Two big milestones reached.

Firstly, our system has now been tested on over 50 Q&As – we’re refining the process and the outputs all the time.

Secondly, one question had proven difficult to get right, and we now seem to have perfected the system to overcome that difficulty. The question was: What is the best initial treatment for mild hypertension in adult Afro-Caribbean males? The answer is below.

We hope to have a test version out before the end of the month, and then it’s over to a small group of testers to start using it in real-world conditions – now that really will be interesting!

Best Initial Treatment for Mild Hypertension in Adult Afro-Caribbean Males

For adult Afro-Caribbean males with mild hypertension, the optimal initial treatment involves a combination of lifestyle modifications and pharmacotherapy, guided by robust evidence and clinical guidelines.

Pharmacotherapy:

  1. First-Line Agents:
    • Calcium channel blockers (CCBs), such as amlodipine, are the preferred first-line treatment due to their proven efficacy in reducing blood pressure and cardiovascular events in populations of African descent. CCBs are particularly effective in this group because they counteract salt sensitivity and lower peripheral vascular resistance, which are common pathophysiological features in Afro-Caribbean individuals.
    • Thiazide-like diuretics, such as chlorthalidone, are an alternative first-line option, especially if CCBs are not tolerated. Diuretics are effective in preventing cardiovascular events and are supported by findings from the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT), which demonstrated their superiority over ACE inhibitors in Black populations.
    ACE inhibitors and angiotensin II receptor blockers (ARBs) are less effective as monotherapy in Afro-Caribbean males due to lower renin levels in this population. However, they may be considered in combination with CCBs or diuretics if additional cardiovascular benefits are needed.
  2. Combination Therapy:
    • If monotherapy does not achieve target blood pressure levels, combining a CCB with a thiazide-like diuretic is often effective. This approach addresses multiple physiological pathways involved in hypertension and is recommended by the 2024 European Society of Cardiology (ESC) guidelines.

Lifestyle Modifications:

  • Dietary changes: Reducing sodium intake and adopting the DASH diet (Dietary Approaches to Stop Hypertension) can significantly lower blood pressure.
  • Physical activity: At least 150 minutes of moderate-intensity exercise per week is recommended.
  • Weight management: Achieving and maintaining a healthy BMI is critical.
  • Alcohol moderation: Limiting alcohol consumption to no more than 1-2 drinks per day.

These lifestyle interventions not only enhance the effectiveness of pharmacotherapy but also contribute to overall cardiovascular risk reduction.

Guideline Recommendations:

  • The 2024 ESC guidelines and NICE guidelines (2023) strongly recommend CCBs and thiazide-like diuretics as first-line agents for hypertension management in Black populations, including Afro-Caribbean males. These guidelines emphasize the importance of individualized treatment plans and regular monitoring to achieve optimal blood pressure control.

Conclusion:

The best initial treatment for mild hypertension in adult Afro-Caribbean males involves initiating therapy with a calcium channel blocker or thiazide-like diuretic, alongside lifestyle modifications. Regular follow-up and personalized adjustments are essential to ensure effective blood pressure control and reduce cardiovascular risk.


References:

  1. 2024 ESC Guidelines: https://academic.oup.com/eurheartj/advance-article/doi/10.1093/eurheartj/ehae178/7741010?login=false
  2. NICE Guidelines (2023): https://www.nice.org.uk/guidance/ng136/resources/hypertension-in-adults-diagnosis-and-management-pdf-66141722710213
  3. ALLHAT Trial: https://jamanetwork.com/journals/jama/fullarticle/195626

New mobile interface

A slightly misleading title, as it’s actually a new website design that affects all versions of the site: mobile, tablet and PC (it’s a responsive website design). However, the main motivation was to improve the mobile interface, which was far from ideal (an understatement). Here’s an example of the old mobile interface (left) and the new (right):

It’s neater, the search box is a decent size and not being cramped by the search button!

Lots of other subtle changes and some big ones. For instance, on the old version if you tried to filter the results (by, say, guidelines) you could end up with the screen looking like this:

Leaving the user having to swipe to see the results. Now it works really well and behaves as most would expect.

This is not a finished design, as there are more changes to be made, but it is a great start. I hope you enjoy it!

The strengths and weaknesses of scores

We recently released our systematic review score. We’ve had our clinical guideline score for ages, and we’re now exploring an RCT score.

In the spirit of transparency, and to encourage discussion and ultimately improve things, I have posted below an email I sent to the Evidence-Based Health discussion list. Hopefully it’s self-explanatory!

Hi All,

I recently asked for comments on a possible scoring system for RCTs (following on from our recent systematic review scoring system) and ended up having a really interesting discussion with someone not on this mailing list (it was forwarded to him by someone on the list). It was quite long, so I asked ChatGPT to summarise the respective positions (that’s point 1 below). I shared the summary with the respondent and his comments are point 2. Finally, point 3 is a possible enhancement to the scoring system: asking ChatGPT to critique a study (there are two examples).

To reiterate, my motivation for introducing a scoring system is to try to ensure that users of the Trip Database do not see a systematic review or RCT and assume, given its privileged position in EBM, that it is ‘good’ evidence. To me, the scores are an attempt to induce scepticism (although I do think the ChatGPT critiques are at a different level). I really would be interested in what the collective wisdom of the list makes of this, and in possible ways I could enhance how these scores are framed to ensure they have maximal effect.

Thanks in advance.

jon

1) LLM Summary of debate

Main Points of Concern:

  1. Oversimplification of Complex Issues:
    • Reducing complex study design elements (e.g., blinding, sample size) to numeric scores risks missing important nuances. Some study flaws may not be easily quantifiable, leading to a misleading or oversimplified evaluation.
  2. Subjectivity in Scoring:
    • Numeric scores are often subjective and can be arbitrary (e.g., how to weight factors like sample size or blinding). This subjectivity risks distorting the assessment of study quality.
  3. Undermining Critical Thinking:
    • A scoring system could discourage users from critically engaging with the study. It might promote the idea that a score is enough, instead of encouraging users to examine the study’s methodology in depth.
  4. LLMs and Critical Appraisal:
    • While LLMs can flag potential issues, they don’t provide a nuanced, complete critique of a study, and may not capture critical flaws or explain them in a meaningful way.

Counter Arguments (Your Position):

  1. Better Than Doing Nothing:
    • A scoring system offers a framework for users to begin evaluating study quality, providing a starting point, especially for those without advanced knowledge in study design. It can guide further investigation rather than replace critical thinking.
  2. Encouraging Engagement:
    • Even if simplified, the score encourages users to think about study quality. It can prompt users to dive deeper into flagged issues and learn more about study design, bias, and other important factors.
  3. Making Information Accessible:
    • A scoring system simplifies complex information and makes it more digestible, particularly for non-experts. It helps make the evaluation of studies more accessible without overwhelming users with technical details.
  4. Practical Decision-Making Tool:
    • In practical settings, especially in healthcare, having a quick, accessible tool for assessing study quality is better than having no tool at all. It saves time and allows for quicker decision-making when reviewing large volumes of studies or systematic reviews.

2) Commentator’s Reflection on the Summary

The summary you sent is not bad … but it feels to me like it suffers in much the same way that the scoring system would suffer — it’s a little too superficial, and doesn’t explain the criticisms it makes.  (It mentions the word “nuanced” several times … but it itself lacks depth, and thus also nuance.)  Still, I’m not willing to write a long essay about this … which is what would really be needed. 

3) LLM Supported Critical Appraisal

Example One: Treating asthma patients with probiotics: a systematic review and meta-analysis

Summary (which we show when they make the initial call)

This meta-analysis of 10 randomized controlled trials involving 1,101 asthma patients found that probiotics significantly improved lung inflammation (FeNO), symptom severity, CACT scores, and reduced acute asthma episodes. However, no significant improvements were found in lung function (FEV1 and FEV1/FVC). The study suggests that probiotics may be useful as an adjunct treatment for asthma, especially for reducing symptoms and inflammation, but not for improving lung function. Further research is needed to explore the long-term safety and effectiveness of probiotics in asthma.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Meta-Analysis of Randomized Controlled Trials (RCTs): The study is a meta-analysis, which combines data from multiple RCTs to provide a more comprehensive understanding of the effect of probiotics on asthma. RCTs are considered the gold standard in clinical research due to their ability to minimize bias.

Sample Size and Power:

  • Total Sample Size: The meta-analysis includes 1,101 participants across 10 RCTs, which provides a reasonable sample size to detect significant effects and increase the generalizability of the findings.
  • Power: The sample size from the combined trials appears large enough to detect meaningful differences in asthma-related outcomes.

Outcome Measures:

  • Primary Outcomes: The meta-analysis assessed several outcomes related to asthma treatment:
    • Fractional exhaled nitric oxide (FeNO): Used as a marker of airway inflammation. The probiotics group showed a significant reduction (MD = -7.17, 95% CI: -12.81, -1.54).
    • Asthma symptom severity: The probiotics group showed improvement (MD = -0.07, 95% CI: -0.10, -0.04).
    • Childhood Asthma Control Test (CACT): The probiotics group had a significant improvement (MD = 2.26, 95% CI: 1.14, 3.39).
    • Number of acute episodes of asthma: The probiotics group showed a reduced rate of acute episodes (OR = 0.30, 95% CI: 0.19, 0.47).
  • Secondary Outcomes:
    • Forced expiratory volume in the first second (FEV1) and FEV1/FVC (%): No significant difference between probiotics and control groups (MD = 0.11, 95% CI: -0.05, 0.26 for FEV1 and MD = 0.32, 95% CI: -1.48, 2.12 for FEV1/FVC).
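As an aside, the significance pattern in these outcomes can be read directly from the reported confidence intervals: a mean difference (MD) is significant at the 5% level when its 95% CI excludes 0, and an odds ratio (OR) when its 95% CI excludes 1. The short Python sketch below applies that rule to the intervals quoted above. The `Estimate` class and `excludes_null` helper are illustrative names only, not part of any Trip tooling.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    outcome: str
    measure: str  # "MD" (mean difference) or "OR" (odds ratio)
    lower: float  # lower bound of the 95% CI
    upper: float  # upper bound of the 95% CI


def excludes_null(e: Estimate) -> bool:
    """True if the 95% CI excludes the no-effect value (0 for MD, 1 for OR)."""
    null_value = 1.0 if e.measure == "OR" else 0.0
    return not (e.lower <= null_value <= e.upper)


# Intervals as quoted in the appraisal above
REPORTED = [
    Estimate("FeNO", "MD", -12.81, -1.54),
    Estimate("Symptom severity", "MD", -0.10, -0.04),
    Estimate("CACT", "MD", 1.14, 3.39),
    Estimate("Acute episodes", "OR", 0.19, 0.47),
    Estimate("FEV1", "MD", -0.05, 0.26),
    Estimate("FEV1/FVC (%)", "MD", -1.48, 2.12),
]

significant = [e.outcome for e in REPORTED if excludes_null(e)]
not_significant = [e.outcome for e in REPORTED if not excludes_null(e)]
print("CI excludes null:", significant)
print("CI includes null:", not_significant)
```

This only checks statistical significance; it says nothing about whether an effect is clinically meaningful or whether the underlying trials were at risk of bias.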

Statistical Methods:

  • Meta-analysis: The meta-analysis used Revman 5.4 software and combined the effect sizes from the individual studies using odds ratio (OR) and mean difference (MD), with 95% confidence intervals (CI). These are appropriate statistical methods for synthesizing data across multiple studies.
  • Significance: The significant improvements in FeNO, symptom severity, CACT, and acute episodes all had narrow confidence intervals, indicating the robustness of the findings. However, no significant changes were found in lung function (FEV1 and FEV1/FVC), suggesting that probiotics may not impact lung function directly.

Bias and Validity:

  • Inclusion Criteria: The meta-analysis included only randomized controlled trials (RCTs), which reduces the risk of bias. However, it is important to consider whether all included RCTs had proper blinding and allocation concealment.
  • Publication Bias: The abstract does not mention whether a funnel plot or other method was used to assess publication bias, which could affect the validity of the findings.
  • Heterogeneity: The abstract does not mention whether there was significant heterogeneity between the studies, which would be important to assess the consistency of the results across the included trials.

Safety and Adverse Events:

  • Adverse Events: The abstract does not mention any adverse events associated with probiotics. While probiotics are generally considered safe, any potential side effects or concerns about long-term use should be addressed in the full study.

Conflicts of Interest:

  • Not Reported: The abstract does not provide information on potential conflicts of interest or funding sources, which would help assess the potential for bias in the conduct or reporting of the meta-analysis.

Conclusions and Applicability:

  • Conclusions: The meta-analysis suggests that probiotics can improve lung inflammation, reduce asthma symptoms, and decrease the frequency of asthma attacks in patients with asthma. However, probiotics did not affect lung function (FEV1 and FEV1/FVC), which indicates that their effects may be limited to reducing inflammation and symptoms rather than improving airflow.
  • Clinical Applicability: The findings support the use of probiotics as an adjunctive treatment for asthma, particularly for improving symptoms and reducing acute episodes. However, the lack of effect on lung function means that probiotics should not be considered a replacement for conventional asthma treatments, such as inhalers or corticosteroids.

Example Two: Benralizumab improves symptoms of patients with severe, eosinophilic asthma with a diagnosis of nasal polyposis

Summary (which we show when they make the initial call)

This randomized controlled trial of benralizumab in patients with severe, eosinophilic asthma and nasal polyposis showed significant improvements in SNOT-22 scores, asthma outcomes, and reduced exacerbations compared to placebo. While adverse events were similar between groups, no serious safety concerns were raised. However, the analysis was post-hoc, and nominal p-values were used, which limits the strength of the conclusions. The study’s small sample size in the NP substudy and the lack of information on conflicts of interest or blinding are additional considerations. The findings suggest that benralizumab may be a promising treatment for this patient population, but further research is needed to confirm these results.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Randomized Controlled Trial (RCT): This design helps ensure that the results are due to the intervention (benralizumab) and not other factors. The random allocation of participants helps minimize selection bias.
  • Post-hoc Analysis: This is a secondary analysis of a previously conducted trial (ANDHI), which may introduce bias due to non-randomization for the specific analysis and can limit the strength of conclusions.

Sample Size and Power:

  • Sample Size: Of the 656 total participants in the ANDHI trial, 153 patients participated in the NP substudy (96 in the benralizumab group and 57 in the placebo group). The sample size for the NP substudy is relatively small, particularly in the placebo group, which may limit statistical power and generalizability of results.
  • Power Consideration: The study does not explicitly mention a sample size calculation for the NP substudy, which makes it difficult to assess if the study is adequately powered to detect meaningful differences.

Outcome Measures:

  • Primary Outcome: Improvement in Sino-Nasal Outcome Test-22 (SNOT-22) scores, with an improvement of at least 8.9 defined as clinically significant.
  • Secondary Outcomes: Several asthma efficacy measures, including:
    • Annualized asthma exacerbation rate (AER)
    • St. George’s Respiratory Questionnaire (SGRQ) total score
    • Forced expiratory volume in one second (FEV1)
    • Asthma Control Questionnaire-6 (ACQ-6)
  • Results: Benralizumab was associated with significant improvement in all measured outcomes (SNOT-22, AER, FEV1, SGRQ, ACQ-6).

Statistical Methods:

  • P-values: Nominal p-values were reported for primary and secondary outcomes, and all findings related to benralizumab treatment showed statistical significance, including p-values of 0.0176 for SNOT-22 and <0.0001 for AER, FEV1, and SGRQ.
  • Effect Size: The study reports percentages of responders and magnitude of change in the outcomes (e.g., 69% reduction in AER, +0.32 L change in FEV1).
  • Nominal p-values: This is a limitation because without proper adjustment for multiple comparisons, the p-values could be inflated, and the findings might not be robust.

Bias and Validity:

  • Selection Bias: Randomization helps reduce selection bias, but the post-hoc nature of the analysis could lead to selective reporting and influence the findings.
  • Blinding: The abstract does not mention blinding, which could potentially lead to performance or detection bias if the outcomes are measured subjectively.
  • Generalizability: The study population had severe, eosinophilic asthma and NP, and findings may not generalize to other asthma subtypes or patients without NP.

Safety and Adverse Events:

  • Adverse Events: The frequency of adverse events (AEs) was similar between the benralizumab (76%) and placebo (73.7%) groups, indicating no significant safety concerns between groups.
    • Most common AEs reported in benralizumab group: headache, sinusitis, pyrexia, and influenza.
  • No serious safety issues were identified, but the study does not report any severe or life-threatening adverse events, which would be important for understanding the full safety profile.

Conflicts of Interest:

  • Not Reported: The abstract does not mention any potential conflicts of interest or funding sources. This information would help determine if there is any financial interest influencing the results.

Conclusions and Applicability:

  • Effectiveness: Benralizumab showed clinically meaningful improvements in both nasal polyposis (SNOT-22) and asthma outcomes (AER, FEV1, SGRQ, ACQ-6).
  • Clinical Applicability: These results suggest that benralizumab could be an effective treatment for patients with severe, eosinophilic asthma and nasal polyposis. However, the findings are based on a substudy of a larger trial and require confirmation in larger, more diverse populations before broad clinical implementation.
  • Future Research: The conclusion indicates that further studies are needed to confirm these findings and establish broader applicability.

RCT score

Now that our systematic review score has been released, we’re exploring a similar approach for RCTs. The rationale for the RCT score is the same as for the systematic review score – to encourage healthy scepticism among our users!

We are generating a list of potential criteria to score the RCT (it’ll need to be in the abstract or the metadata) and here is the initial list:

  • Age
  • Size
  • Trial registry
  • Funding
  • Trial phase
  • Intention to treat
  • Absolute versus relative effects/risks
  • Blinding
  • Citations
  • Conflict of interest

What have we missed?
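As a rough illustration of how a checklist like the one above could be turned into a number, here is a minimal Python sketch. It is not Trip's scoring method: the `RctRecord` fields, the `rct_score` function, the one-point-per-criterion weighting and every threshold (ten years, 100 participants, phase 3, ten citations) are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RctRecord:
    """Hypothetical fields pulled from an abstract or its metadata."""
    publication_year: int
    sample_size: Optional[int] = None
    registered: bool = False                # a trial registry ID was found
    industry_funded: Optional[bool] = None  # None means funding not reported
    phase: Optional[int] = None             # trial phase, if stated
    intention_to_treat: bool = False
    reports_absolute_effects: bool = False
    blinded: bool = False
    citations: int = 0
    conflicts_declared: bool = False


def rct_score(record: RctRecord, current_year: int = 2025) -> int:
    """Return a 0-10 score, one point per criterion met.

    All thresholds are illustrative assumptions, not validated cut-offs.
    """
    checks = [
        current_year - record.publication_year <= 10,  # Age
        (record.sample_size or 0) >= 100,              # Size
        record.registered,                             # Trial registry
        record.industry_funded is False,               # Funding
        (record.phase or 0) >= 3,                      # Trial phase
        record.intention_to_treat,                     # Intention to treat
        record.reports_absolute_effects,               # Absolute vs relative effects
        record.blinded,                                # Blinding
        record.citations >= 10,                        # Citations
        record.conflicts_declared,                     # Conflict of interest
    ]
    return sum(checks)


# A made-up record: recent, reasonably sized, registered, nothing else reported.
example = RctRecord(publication_year=2021, sample_size=153, registered=True)
example_score = rct_score(example)
```

Treating "not reported" as a missed point is itself a design choice: it rewards transparent abstracts, which fits the aim of encouraging scepticism, but it would also penalise a well-run trial with a terse abstract.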

Another possibility is to use the above score as a simple guide (for all RCTs) and then allow users to ask for more in-depth analysis (using LLMs such as ChatGPT). Here’s an example of what information the LLM could extract, based on two different trials:

Example One: Treating asthma patients with probiotics: a systematic review and meta-analysis

Summary (which we show when they make the initial call)

This meta-analysis of 10 randomized controlled trials involving 1,101 asthma patients found that probiotics significantly improved lung inflammation (FeNO), symptom severity, CACT scores, and reduced acute asthma episodes. However, no significant improvements were found in lung function (FEV1 and FEV1/FVC). The study suggests that probiotics may be useful as an adjunct treatment for asthma, especially for reducing symptoms and inflammation, but not for improving lung function. Further research is needed to explore the long-term safety and effectiveness of probiotics in asthma.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Meta-Analysis of Randomized Controlled Trials (RCTs): The study is a meta-analysis, which combines data from multiple RCTs to provide a more comprehensive understanding of the effect of probiotics on asthma. RCTs are considered the gold standard in clinical research due to their ability to minimize bias.

Sample Size and Power:

  • Total Sample Size: The meta-analysis includes 1,101 participants across 10 RCTs, which provides a reasonable sample size to detect significant effects and increase the generalizability of the findings.
  • Power: The sample size from the combined trials appears large enough to detect meaningful differences in asthma-related outcomes.

Outcome Measures:

  • Primary Outcomes: The meta-analysis assessed several outcomes related to asthma treatment:
    • Fractional exhaled nitric oxide (FeNO): Used as a marker of airway inflammation. The probiotics group showed a significant reduction (MD = -7.17, 95% CI: -12.81, -1.54).
    • Asthma symptom severity: The probiotics group showed improvement (MD = -0.07, 95% CI: -0.10, -0.04).
    • Childhood Asthma Control Test (CACT): The probiotics group had a significant improvement (MD = 2.26, 95% CI: 1.14, 3.39).
    • Number of acute episodes of asthma: The probiotics group showed a reduced rate of acute episodes (OR = 0.30, 95% CI: 0.19, 0.47).
  • Secondary Outcomes:
    • Forced expiratory volume in the first second (FEV1) and FEV1/FVC (%): No significant difference between probiotics and control groups (MD = 0.11, 95% CI: -0.05, 0.26 for FEV1 and MD = 0.32, 95% CI: -1.48, 2.12 for FEV1/FVC).

Statistical Methods:

  • Meta-analysis: The meta-analysis used Revman 5.4 software and combined the effect sizes from the individual studies using odds ratio (OR) and mean difference (MD), with 95% confidence intervals (CI). These are appropriate statistical methods for synthesizing data across multiple studies.
  • Significance: The significant improvements in FeNO, symptom severity, CACT, and acute episodes all had narrow confidence intervals, indicating the robustness of the findings. However, no significant changes were found in lung function (FEV1 and FEV1/FVC), suggesting that probiotics may not impact lung function directly.

Bias and Validity:

  • Inclusion Criteria: The meta-analysis included only randomized controlled trials (RCTs), which reduces the risk of bias. However, it is important to consider whether all included RCTs had proper blinding and allocation concealment.
  • Publication Bias: The abstract does not mention whether a funnel plot or other method was used to assess publication bias, which could affect the validity of the findings.
  • Heterogeneity: The abstract does not mention whether there was significant heterogeneity between the studies, which would be important to assess the consistency of the results across the included trials.

Safety and Adverse Events:

  • Adverse Events: The abstract does not mention any adverse events associated with probiotics. While probiotics are generally considered safe, any potential side effects or concerns about long-term use should be addressed in the full study.

Conflicts of Interest:

  • Not Reported: The abstract does not provide information on potential conflicts of interest or funding sources, which would help assess the potential for bias in the conduct or reporting of the meta-analysis.

Conclusions and Applicability:

  • Conclusions: The meta-analysis suggests that probiotics can improve lung inflammation, reduce asthma symptoms, and decrease the frequency of asthma attacks in patients with asthma. However, probiotics did not affect lung function (FEV1 and FEV1/FVC), which indicates that their effects may be limited to reducing inflammation and symptoms rather than improving airflow.
  • Clinical Applicability: The findings support the use of probiotics as an adjunctive treatment for asthma, particularly for improving symptoms and reducing acute episodes. However, the lack of effect on lung function means that probiotics should not be considered a replacement for conventional asthma treatments, such as inhalers or corticosteroids.
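
The "significant" and "no significant difference" labels in Example One can be checked against the reported intervals: a 95% confidence interval that excludes the null value (0 for a mean difference, 1 for an odds ratio) roughly corresponds to a two-sided p-value below 0.05. The sketch below applies that rule to the figures quoted above; the helper name `ci_excludes_null` is made up for illustration.

```python
def ci_excludes_null(lower: float, upper: float, null_value: float) -> bool:
    """True when the whole interval lies on one side of the null value."""
    return not (lower <= null_value <= upper)


# (outcome, CI lower, CI upper, null value): 0 for MD, 1 for OR.
# Interval bounds are the ones quoted in the analysis above.
reported = [
    ("FeNO (MD)", -12.81, -1.54, 0.0),
    ("Symptom severity (MD)", -0.10, -0.04, 0.0),
    ("CACT (MD)", 1.14, 3.39, 0.0),
    ("Acute episodes (OR)", 0.19, 0.47, 1.0),
    ("FEV1 (MD)", -0.05, 0.26, 0.0),
    ("FEV1/FVC % (MD)", -1.48, 2.12, 0.0),
]

significant = {
    outcome: ci_excludes_null(lower, upper, null_value)
    for outcome, lower, upper, null_value in reported
}

for outcome, flag in significant.items():
    print(f"{outcome}: {'significant' if flag else 'not significant'}")
```

The first four outcomes come out as significant and the two lung function measures do not, which matches the extracted analysis. A check like this only confirms internal consistency; it says nothing about heterogeneity or the quality of the included trials.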

Example Two: Benralizumab improves symptoms of patients with severe, eosinophilic asthma with a diagnosis of nasal polyposis

Summary (which we show when they make the initial call)

This randomized controlled trial of benralizumab in patients with severe, eosinophilic asthma and nasal polyposis showed significant improvements in SNOT-22 scores, asthma outcomes, and reduced exacerbations compared to placebo. While adverse events were similar between groups, no serious safety concerns were raised. However, the analysis was post-hoc, and nominal p-values were used, which limits the strength of the conclusions. The study’s small sample size in the NP substudy and the lack of information on conflicts of interest or blinding are additional considerations. The findings suggest that benralizumab may be a promising treatment for this patient population, but further research is needed to confirm these results.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Randomized Controlled Trial (RCT): This design helps ensure that the results are due to the intervention (benralizumab) and not other factors. The random allocation of participants helps minimize selection bias.
  • Post-hoc Analysis: This is a secondary analysis of a previously conducted trial (ANDHI), which may introduce bias due to non-randomization for the specific analysis and can limit the strength of conclusions.

Sample Size and Power:

  • Sample Size: Of the 656 total participants in the ANDHI trial, 153 patients participated in the NP substudy (96 in the benralizumab group and 57 in the placebo group). The sample size for the NP substudy is relatively small, particularly in the placebo group, which may limit statistical power and generalizability of results.
  • Power Consideration: The study does not explicitly mention a sample size calculation for the NP substudy, which makes it difficult to assess if the study is adequately powered to detect meaningful differences.

Outcome Measures:

  • Primary Outcome: Improvement in Sino-Nasal Outcome Test-22 (SNOT-22) scores, with an improvement of at least 8.9 defined as clinically significant.
  • Secondary Outcomes: Several asthma efficacy measures, including:
    • Annualized asthma exacerbation rate (AER)
    • St. George’s Respiratory Questionnaire (SGRQ) total score
    • Forced expiratory volume in one second (FEV1)
    • Asthma Control Questionnaire-6 (ACQ-6)
  • Results: Benralizumab was associated with significant improvement in all measured outcomes (SNOT-22, AER, FEV1, SGRQ, ACQ-6).

Statistical Methods:

  • P-values: Nominal p-values were reported for primary and secondary outcomes, and all findings related to benralizumab treatment showed statistical significance, including p-values of 0.0176 for SNOT-22 and <0.0001 for AER, FEV1, and SGRQ.
  • Effect Size: The study reports percentages of responders and magnitude of change in the outcomes (e.g., 69% reduction in AER, +0.32 L change in FEV1).
  • Nominal p-values: This is a limitation because without proper adjustment for multiple comparisons, the p-values could be inflated, and the findings might not be robust.

Bias and Validity:

  • Selection Bias: Randomization helps reduce selection bias, but the post-hoc nature of the analysis could lead to selective reporting and influence the findings.
  • Blinding: The abstract does not mention blinding, which could potentially lead to performance or detection bias if the outcomes are measured subjectively.
  • Generalizability: The study population had severe, eosinophilic asthma and NP, and findings may not generalize to other asthma subtypes or patients without NP.

Safety and Adverse Events:

  • Adverse Events: The frequency of adverse events (AEs) was similar between the benralizumab (76%) and placebo (73.7%) groups, indicating no significant safety concerns between groups.
    • Most common AEs reported in benralizumab group: headache, sinusitis, pyrexia, and influenza.
  • No serious safety issues were identified, but the study does not report any severe or life-threatening adverse events, which would be important for understanding the full safety profile.

Conflicts of Interest:

  • Not Reported: The abstract does not mention any potential conflicts of interest or funding sources. This information would help determine if there is any financial interest influencing the results.

Conclusions and Applicability:

  • Effectiveness: Benralizumab showed clinically meaningful improvements in both nasal polyposis (SNOT-22) and asthma outcomes (AER, FEV1, SGRQ, ACQ-6).
  • Clinical Applicability: These results suggest that benralizumab could be an effective treatment for patients with severe, eosinophilic asthma and nasal polyposis. However, the findings are based on a substudy of a larger trial and require confirmation in larger, more diverse populations before broad clinical implementation.
  • Future Research: The conclusion indicates that further studies are needed to confirm these findings and establish broader applicability.

To me, this kind of extended analysis appears massively helpful, even if only as a way to help people structure their critical thinking around a particular article.

As ever, let me know what you think!

Automating the update of old clinical Q&As

Trip grew out of our work on clinical Q&A, which began with the ATTRACT project. Since then Trip has worked on multiple separate Q&A projects, and those teams have answered over 10,000 clinical questions. Our experience is unrivalled!

Within Trip we link to thousands of clinical Q&As, but the vast majority are over ten years old and so likely to be out of date. However, these are real clinical questions; most are likely to be relevant today, and they could be a valuable resource.

So, in testing our automated Q&A system, we thought we would update a random clinical question: A 43 year old lady wanting to conceive, is currently taking amlodipine and losartan for hypertension. Is this safe if she was to get pregnant whilst taking them and what is the alternative ‘safe’ antihypertensive for pregnancy? Clicking on the document will let you see the original answer from 2013.

I’m not sure how long the original answer would have taken; I would estimate 3-4 hours. Our updated answer took 90 seconds and, as we operationalise things, could take 10-15 seconds.

Oh yes, the new answer:

Evidence from extracted articles:

  1. Losartan (ARB):
    • The Hypertension in Pregnancy Guideline (2023) states that angiotensin receptor blockers (ARBs), such as losartan, are contraindicated in pregnancy due to the risk of fetal malformations, including renal dysfunction, oligohydramnios, and skull hypoplasia.
    • The Management of Postpartum Hypertensive Disorders of Pregnancy document reinforces this, confirming that losartan should be discontinued before conception or as soon as pregnancy is confirmed.
    • The FDA and European Medicines Agency (EMA) classify losartan as pregnancy category D, meaning there is evidence of fetal risk, and it should be avoided during pregnancy.
    • Reference: FDA Pregnancy Categories (Note: The FDA has moved away from letter categories, but losartan remains contraindicated in pregnancy.)
  2. Amlodipine (Calcium Channel Blocker):
    • The Pharmacokinetics of Common Antihypertensive Drugs in Pregnancy (2022) review notes that calcium channel blockers, including amlodipine, require further research to establish their safety profile in pregnancy.
    • A systematic review (2024) comparing amlodipine and nifedipine suggests that nifedipine is the preferred option due to more established safety data.
    • While amlodipine is not explicitly contraindicated, nifedipine (another calcium channel blocker) has a stronger safety profile for pregnancy-related hypertension and is often preferred.
    • Reference: Pharmacokinetics of Antihypertensive Drugs in Pregnancy (Note: Specific link to the 2022 review is not provided, but this is a general reference for further reading.)

What are the alternative ‘safe’ antihypertensive options for pregnancy?

Evidence from extracted articles:

  1. The Hypertension in Pregnancy Guideline (2023) recommends the following as first-line treatments for hypertension in pregnancy:
    • Labetalol (a beta-blocker)
    • Nifedipine (extended-release, a calcium channel blocker)
    • Methyldopa (a centrally acting alpha-agonist)
  2. The systematic review on pharmacokinetics of antihypertensive drugs (2022) confirms that methyldopa, labetalol, and nifedipine are generally safe options for pregnancy.

Additional knowledge from clinical guidelines (outside extracted articles):

  • Labetalol is often the first-line choice due to its efficacy, safety, and minimal fetal side effects.
  • Nifedipine is used when labetalol is not suitable or contraindicated.
  • Methyldopa, though safe, is less commonly used due to side effects such as fatigue, depression, and drowsiness.

References:


Final Clinical Recommendations:

  1. Losartan should be discontinued before conception or immediately upon confirming pregnancy due to the risk of fetal harm.
  2. Amlodipine is not absolutely contraindicated, but nifedipine is preferred as a safer calcium channel blocker during pregnancy.
  3. First-line alternatives for managing hypertension in pregnancy include:
    • Labetalol (most commonly recommended)
    • Nifedipine (extended-release)
    • Methyldopa (less preferred due to side effects)

Additional Notes:

  • If the patient is planning to conceive, it is crucial to switch from losartan to a safer antihypertensive before pregnancy.
  • Regular monitoring of blood pressure and fetal growth is essential during pregnancy.
  • Always consult a healthcare provider for personalized advice and management.
