Trip Database Blog

Liberating the literature

Author

jrbtrip

Automated rapid reviews

Some years ago, I worked at Public Health Wales (PHW), and I’ve remained connected to the organisation and interested in the field of public health. Recognising that the evidence needs in public health often differ from those in mainstream clinical medicine, I was curious to explore how automation—particularly large language models (LLMs)—might support the production of public health evidence syntheses.

To test this, I selected a topic at random from PHW’s evidence service: A rapid review of barriers and facilitators to cancer screening uptake (breast, cervical, and bowel) in underserved populations, which had been published on medRxiv.

To explore this further, I adapted the methodology we use in our automated Q&A system, introducing an additional step – citation chasing – into the evidence-gathering process. Starting with 11 highly relevant original articles from the initial search, we used both backward and forward citation chasing to identify an additional 28 studies. With further layers of chasing, we could have found even more, but for this proof-of-concept exercise, the goal was simply to see whether the approach would work.
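
For readers unfamiliar with citation chasing, here is a minimal Python sketch of the idea. It is not our production code: the `TOY_REFERENCES` data and the `chase_citations` function are invented for illustration, and in practice the "who cites whom" lookup would come from a citation index. Backward chasing follows a paper's reference list, forward chasing finds later papers that cite it, and each extra round is one more layer of chasing.

```python
from collections import defaultdict

# Hypothetical toy data: each paper ID maps to the paper IDs it cites.
TOY_REFERENCES = {
    "A": ["B", "C"],
    "D": ["A"],
    "E": ["B"],
    "B": ["F"],
}


def chase_citations(seeds, references, rounds=1):
    """Return the extra papers found by backward and forward chasing from seeds."""
    # Invert the reference map so we can look up which papers cite a given paper.
    cited_by = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].add(paper)

    found = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        next_frontier = set()
        for paper in frontier:
            backward = set(references.get(paper, ()))  # papers this one cites
            forward = cited_by.get(paper, set())       # papers that cite this one
            next_frontier |= (backward | forward) - found
        found |= next_frontier
        frontier = next_frontier
    # Report only the newly identified papers, not the seeds themselves.
    return found - set(seeds)
```

With the toy data, one round starting from paper A finds B, C and D; a second round adds E and F.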

The combined set of 39 studies was analysed by the LLM, which extracted relevant content and drafted a comprehensive review. In addition to the main review, we asked the system to generate an abstract, a one-page summary for policymakers, and a formal methods section. The final output totalled approximately 10 pages. As a taster, here is the abstract (let me know if you’d like to see the full report):

Cancer screening is a critical tool for early detection and improved health outcomes, yet underserved populations face significant barriers to accessing these life-saving services. This report synthesizes findings from multiple studies to identify key barriers and facilitators to cancer screening uptake for cervical, breast, and colorectal cancers among underserved groups. Common barriers include lack of knowledge, cultural and social norms, financial constraints, psychological fears, and distrust in the healthcare system. Facilitators include healthcare provider recommendations, culturally tailored interventions, alternative screening methods (e.g., self-sampling), and organized programs with support systems. The report highlights the need for multi-level, community-centered interventions that address systemic, cultural, and individual barriers. Policy recommendations include expanding access to affordable screening, increasing culturally competent care, and leveraging community-based outreach to improve screening rates and reduce health disparities.

To evaluate the quality and usefulness of the output, we asked ChatGPT to compare our generated review with the original medRxiv article:

Both the automated report and the medRxiv article explore the same core issue: understanding the barriers and facilitators to cancer screening uptake among underserved populations, specifically in relation to breast, cervical, and colorectal cancers. They arrive at broadly similar conclusions, highlighting key obstacles such as limited knowledge and awareness, cultural stigma, psychological fear, logistical and financial constraints, and widespread distrust in healthcare systems—particularly among marginalised groups. Likewise, both recognise similar facilitators: strong recommendations from healthcare providers, culturally tailored interventions, community-based outreach, the use of peer advocates, and alternative screening methods like self-sampling.

Despite this common ground, the two documents differ significantly in style, methodology, and intended use. The automated report takes a narrative and practice-oriented approach. It uses large language models (LLMs) to guide literature search and synthesis, offering rich thematic analysis with detailed contextual commentary for healthcare professionals. It includes a policy-maker one-pager, targeted insights for frontline practitioners, and practical recommendations rooted in real-world applicability. Its tone is accessible and it’s structured for immediate relevance to policy and service design.

In contrast, the medRxiv article follows a more formal academic structure, presenting a systematic review conducted using PRISMA guidelines. Its distinctive feature is the use of the COM-B model (Capability, Opportunity, Motivation – Behaviour), a behavioural science framework used to categorise and interpret barriers and enablers to screening. This gives it strong utility for designing or evaluating interventions, especially in academic or research settings. It is less practice-facing but more tightly aligned to theoretical models of behaviour change.

In terms of utility, the automated report is particularly valuable for practitioners, community health programme designers, and policy-makers seeking actionable guidance and context-sensitive strategies. It is grounded in practical implementation. The medRxiv article, by contrast, would be most useful for researchers, public health academics, and those developing behaviourally-informed interventions where theoretical coherence is important.

In essence, the two are complementary. One is a tool for action; the other, a tool for theory-driven design. Used together, they offer a robust foundation for improving cancer screening in underserved populations.

It’s worth noting that this may not be a fully fair comparison, as the medRxiv article follows a formal academic journal format. There may well be a more practitioner-focused, internal version of the review that could have provided a closer parallel to the automated output. It’s also important to emphasise that this was a proof of concept designed to test the feasibility of the approach. With further refinement and more examples, the process and the quality of outputs are likely to improve significantly.

The PHW review involved eight authors and likely required hundreds of hours of work over several months. That effort undoubtedly brought depth, rigour, and valuable expert input—qualities that may exceed the automated version in certain areas.

In contrast, the fully automated approach we tested could produce a review in under ten minutes.

To be clear, I am not suggesting that this type of automation should replace traditional review processes used by organisations like PHW. However, I do see two clear potential use cases:

  • Efficiency and Acceleration – Could automation be used to do the heavy lifting in the early stages of a review, significantly speeding up the process and freeing up expert time for interpretation and refinement?
  • Capacity Gaps – During my time at PHW, the evidence service often had to decline requests due to limited capacity. In such cases, a fully automated review – while not perfect – may be far better than no review at all, or one done hastily under resource constraints.

It’s still early days, but the potential is hugely exciting. The question now is: where could this take us next?

Automated Q&A – more milestones

Two big milestones reached.

Firstly, our system has now been tested on over 50 Q&As – we’re refining the process and the outputs all the time.

Secondly, one question has proven difficult to get right, and we now seem to have perfected the system to overcome that difficulty. The question is: “What is the best initial treatment for mild hypertension in adult Afro-Caribbean males?” The answer is below.

We hope to have a test version out before the end of the month, and then it’s over to a small group of testers to start using it in real-world conditions – now that really will be interesting!

Best Initial Treatment for Mild Hypertension in Adult Afro-Caribbean Males

For adult Afro-Caribbean males with mild hypertension, the optimal initial treatment involves a combination of lifestyle modifications and pharmacotherapy, guided by robust evidence and clinical guidelines.

Pharmacotherapy:

  1. First-Line Agents:
    • Calcium channel blockers (CCBs), such as amlodipine, are the preferred first-line treatment due to their proven efficacy in reducing blood pressure and cardiovascular events in populations of African descent. CCBs are particularly effective in this group because they counteract salt sensitivity and lower peripheral vascular resistance, which are common pathophysiological features in Afro-Caribbean individuals.
    • Thiazide-like diuretics, such as chlorthalidone, are an alternative first-line option, especially if CCBs are not tolerated. Diuretics are effective in preventing cardiovascular events and are supported by findings from the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT), which demonstrated their superiority over ACE inhibitors in Black populations.
    ACE inhibitors and angiotensin II receptor blockers (ARBs) are less effective as monotherapy in Afro-Caribbean males due to lower renin levels in this population. However, they may be considered in combination with CCBs or diuretics if additional cardiovascular benefits are needed.
  2. Combination Therapy:
    • If monotherapy does not achieve target blood pressure levels, combining a CCB with a thiazide-like diuretic is often effective. This approach addresses multiple physiological pathways involved in hypertension and is recommended by the 2024 European Society of Cardiology (ESC) guidelines.

Lifestyle Modifications:

  • Dietary changes: Reducing sodium intake and adopting the DASH diet (Dietary Approaches to Stop Hypertension) can significantly lower blood pressure.
  • Physical activity: At least 150 minutes of moderate-intensity exercise per week is recommended.
  • Weight management: Achieving and maintaining a healthy BMI is critical.
  • Alcohol moderation: Limiting alcohol consumption to no more than 1-2 drinks per day.

These lifestyle interventions not only enhance the effectiveness of pharmacotherapy but also contribute to overall cardiovascular risk reduction.

Guideline Recommendations:

  • The 2024 ESC guidelines and NICE guidelines (2023) strongly recommend CCBs and thiazide-like diuretics as first-line agents for hypertension management in Black populations, including Afro-Caribbean males. These guidelines emphasize the importance of individualized treatment plans and regular monitoring to achieve optimal blood pressure control.

Conclusion:

The best initial treatment for mild hypertension in adult Afro-Caribbean males involves initiating therapy with a calcium channel blocker or thiazide-like diuretic, alongside lifestyle modifications. Regular follow-up and personalized adjustments are essential to ensure effective blood pressure control and reduce cardiovascular risk.


References:

  1. 2024 ESC Guidelines: https://academic.oup.com/eurheartj/advance-article/doi/10.1093/eurheartj/ehae178/7741010?login=false
  2. NICE Guidelines (2023): https://www.nice.org.uk/guidance/ng136/resources/hypertension-in-adults-diagnosis-and-management-pdf-66141722710213
  3. ALLHAT Trial: https://jamanetwork.com/journals/jama/fullarticle/195626

New mobile interface

A slightly misleading title, as it’s actually a new website design that affects all versions of the site – mobile, tablet and PC (it’s a responsive website design). However, the main motivation was to improve the mobile interface, which was far from ideal (an understatement). Here’s an example of the old mobile interface (left) and the new (right):

It’s neater, the search box is a decent size and not being cramped by the search button!

Lots of other subtle changes and some big ones. For instance, on the old version if you tried to filter the results (by, say, guidelines) you could end up with the screen looking like this:

Leaving the user having to swipe to see the results. Now it works really well and behaves as most would expect.

This is not a finished design, as there are more changes to be made, but it is a great start. I hope you enjoy!

The strengths and weaknesses of scores

We have recently released our systematic review score, we’ve had our clinical guideline score for ages, and we’re exploring an RCT score.

In the spirit of transparency, and wanting to encourage discussion and ultimately improve things, I have posted an email I sent to the Evidence-Based Health discussion list. Hopefully it’s self-explanatory!

Hi All,

I recently asked for comments on a possible scoring system for RCTs (following on from our recent systematic review scoring system) and ended up having a really interesting discussion with someone not on this mailing list (it was forwarded to him by someone on this list). It was quite long, so I asked ChatGPT to summarise the respective positions (that’s point 1 below). I shared the summary with the respondent and his comments are point 2. Finally, point 3 is a possible enhancement to the scoring system: asking ChatGPT to critique a study (there are two examples).

To reiterate, my motivation for introducing a scoring system is to try to ensure users (of the Trip Database) do not see a systematic review or RCT and assume it – given its privileged position in EBM – to be ‘good’ evidence. To me, the scores are an attempt to induce scepticism (although I do think the ChatGPT critiques are at a different level). I really would be interested in what the collective wisdom of the list makes of this, and in possible ways I could enhance how these scores are framed to ensure they have maximal effect.

Thanks in advance.

jon

1) LLM Summary of debate

Main Points of Concern:

  1. Oversimplification of Complex Issues:
    • Reducing complex study design elements (e.g., blinding, sample size) to numeric scores risks missing important nuances. Some study flaws may not be easily quantifiable, leading to a misleading or oversimplified evaluation.
  2. Subjectivity in Scoring:
    • Numeric scores are often subjective and can be arbitrary (e.g., how to weight factors like sample size or blinding). This subjectivity risks distorting the assessment of study quality.
  3. Undermining Critical Thinking:
    • A scoring system could discourage users from critically engaging with the study. It might promote the idea that a score is enough, instead of encouraging users to examine the study’s methodology in depth.
  4. LLMs and Critical Appraisal:
    • While LLMs can flag potential issues, they don’t provide a nuanced, complete critique of a study, and may not capture critical flaws or explain them in a meaningful way.

Counter Arguments (Your Position):

  1. Better Than Doing Nothing:
    • A scoring system offers a framework for users to begin evaluating study quality, providing a starting point, especially for those without advanced knowledge in study design. It can guide further investigation rather than replace critical thinking.
  2. Encouraging Engagement:
    • Even if simplified, the score encourages users to think about study quality. It can prompt users to dive deeper into flagged issues and learn more about study design, bias, and other important factors.
  3. Making Information Accessible:
    • A scoring system simplifies complex information and makes it more digestible, particularly for non-experts. It helps make the evaluation of studies more accessible without overwhelming users with technical details.
  4. Practical Decision-Making Tool:
    • In practical settings, especially in healthcare, having a quick, accessible tool for assessing study quality is better than having no tool at all. It saves time and allows for quicker decision-making when reviewing large volumes of studies or systematic reviews.

2) Commentator’s Reflection on the summary

The summary you sent is not bad … but it feels to me like it suffers in much the same way that the scoring system would suffer — it’s a little too superficial, and doesn’t explain the criticisms it makes.  (It mentions the word “nuanced” several times … but it itself lacks depth, and thus also nuance.)  Still, I’m not willing to write a long essay about this … which is what would really be needed. 

3) LLM Supported Critical Appraisal

Example One: Treating asthma patients with probiotics: a systematic review and meta-analysis

Summary (which we show when they make the initial call)

This meta-analysis of 10 randomized controlled trials involving 1,101 asthma patients found that probiotics significantly improved lung inflammation (FeNO), symptom severity, CACT scores, and reduced acute asthma episodes. However, no significant improvements were found in lung function (FEV1 and FEV1/FVC). The study suggests that probiotics may be useful as an adjunct treatment for asthma, especially for reducing symptoms and inflammation, but not for improving lung function. Further research is needed to explore the long-term safety and effectiveness of probiotics in asthma.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Meta-Analysis of Randomized Controlled Trials (RCTs): The study is a meta-analysis, which combines data from multiple RCTs to provide a more comprehensive understanding of the effect of probiotics on asthma. RCTs are considered the gold standard in clinical research due to their ability to minimize bias.

Sample Size and Power:

  • Total Sample Size: The meta-analysis includes 1,101 participants across 10 RCTs, which provides a reasonable sample size to detect significant effects and increase the generalizability of the findings.
  • Power: The sample size from the combined trials appears large enough to detect meaningful differences in asthma-related outcomes.

Outcome Measures:

  • Primary Outcomes: The meta-analysis assessed several outcomes related to asthma treatment:
    • Fractional exhaled nitric oxide (FeNO): Used as a marker of airway inflammation. The probiotics group showed a significant reduction (MD = -7.17, 95% CI: -12.81, -1.54).
    • Asthma symptom severity: The probiotics group showed improvement (MD = -0.07, 95% CI: -0.10, -0.04).
    • Childhood Asthma Control Test (CACT): The probiotics group had a significant improvement (MD = 2.26, 95% CI: 1.14, 3.39).
    • Number of acute episodes of asthma: The probiotics group showed a reduced rate of acute episodes (OR = 0.30, 95% CI: 0.19, 0.47).
  • Secondary Outcomes:
    • Forced expiratory volume in the first second (FEV1) and FEV1/FVC (%): No significant difference between probiotics and control groups (MD = 0.11, 95% CI: -0.05, 0.26 for FEV1 and MD = 0.32, 95% CI: -1.48, 2.12 for FEV1/FVC).

Statistical Methods:

  • Meta-analysis: The meta-analysis used Revman 5.4 software and combined the effect sizes from the individual studies using odds ratio (OR) and mean difference (MD), with 95% confidence intervals (CI). These are appropriate statistical methods for synthesizing data across multiple studies.
  • Significance: The significant improvements in FeNO, symptom severity, CACT, and acute episodes all had narrow confidence intervals, indicating the robustness of the findings. However, no significant changes were found in lung function (FEV1 and FEV1/FVC), suggesting that probiotics may not impact lung function directly.

Bias and Validity:

  • Inclusion Criteria: The meta-analysis included only randomized controlled trials (RCTs), which reduces the risk of bias. However, it is important to consider whether all included RCTs had proper blinding and allocation concealment.
  • Publication Bias: The abstract does not mention whether a funnel plot or other method was used to assess publication bias, which could affect the validity of the findings.
  • Heterogeneity: The abstract does not mention whether there was significant heterogeneity between the studies, which would be important to assess the consistency of the results across the included trials.

Safety and Adverse Events:

  • Adverse Events: The abstract does not mention any adverse events associated with probiotics. While probiotics are generally considered safe, any potential side effects or concerns about long-term use should be addressed in the full study.

Conflicts of Interest:

  • Not Reported: The abstract does not provide information on potential conflicts of interest or funding sources, which would help assess the potential for bias in the conduct or reporting of the meta-analysis.

Conclusions and Applicability:

  • Conclusions: The meta-analysis suggests that probiotics can improve lung inflammation, reduce asthma symptoms, and decrease the frequency of asthma attacks in patients with asthma. However, probiotics did not affect lung function (FEV1 and FEV1/FVC), which indicates that their effects may be limited to reducing inflammation and symptoms rather than improving airflow.
  • Clinical Applicability: The findings support the use of probiotics as an adjunctive treatment for asthma, particularly for improving symptoms and reducing acute episodes. However, the lack of effect on lung function means that probiotics should not be considered a replacement for conventional asthma treatments, such as inhalers or corticosteroids.
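
A quick aside on how the "significant" and "not significant" labels in the appraisal above can be checked from the reported figures alone: a 95% confidence interval that excludes the null value (0 for a mean difference, 1 for an odds ratio) corresponds to significance at the conventional 5% level. The Python sketch below is only an illustration of that rule using the intervals quoted above; the function names are made up for this example and it is not part of the Trip system.

```python
def excludes_null(lower, upper, null_value):
    """True when the interval lies entirely on one side of the null value."""
    return not (lower <= null_value <= upper)


# (outcome, lower bound, upper bound, null value), as quoted in the appraisal.
# Mean differences (MD) have a null of 0; odds ratios (OR) have a null of 1.
INTERVALS = [
    ("FeNO (MD)", -12.81, -1.54, 0.0),
    ("Symptom severity (MD)", -0.10, -0.04, 0.0),
    ("CACT (MD)", 1.14, 3.39, 0.0),
    ("Acute episodes (OR)", 0.19, 0.47, 1.0),
    ("FEV1 (MD)", -0.05, 0.26, 0.0),
    ("FEV1/FVC (MD)", -1.48, 2.12, 0.0),
]


def significant_outcomes(intervals):
    """Names of the outcomes whose confidence interval excludes the null."""
    return [name for name, lo, hi, null in intervals if excludes_null(lo, hi, null)]


for name in significant_outcomes(INTERVALS):
    print(name)
```

Run on these figures, the check flags FeNO, symptom severity, CACT and acute episodes, and leaves out the two lung function measures, which matches the appraisal. It says nothing about heterogeneity or publication bias, which the appraisal notes are not covered in the abstract.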

Example Two: Benralizumab improves symptoms of patients with severe, eosinophilic asthma with a diagnosis of nasal polyposis

Summary (which we show when they make the initial call)

This randomized controlled trial of benralizumab in patients with severe, eosinophilic asthma and nasal polyposis showed significant improvements in SNOT-22 scores, asthma outcomes, and reduced exacerbations compared to placebo. While adverse events were similar between groups, no serious safety concerns were raised. However, the analysis was post-hoc, and nominal p-values were used, which limits the strength of the conclusions. The study’s small sample size in the NP substudy and the lack of information on conflicts of interest or blinding are additional considerations. The findings suggest that benralizumab may be a promising treatment for this patient population, but further research is needed to confirm these results.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Randomized Controlled Trial (RCT): This design helps ensure that the results are due to the intervention (benralizumab) and not other factors. The random allocation of participants helps minimize selection bias.
  • Post-hoc Analysis: This is a secondary analysis of a previously conducted trial (ANDHI), which may introduce bias due to non-randomization for the specific analysis and can limit the strength of conclusions.

Sample Size and Power:

  • Sample Size: Of the 656 total participants in the ANDHI trial, 153 patients participated in the NP substudy (96 in the benralizumab group and 57 in the placebo group). The sample size for the NP substudy is relatively small, particularly in the placebo group, which may limit statistical power and generalizability of results.
  • Power Consideration: The study does not explicitly mention a sample size calculation for the NP substudy, which makes it difficult to assess if the study is adequately powered to detect meaningful differences.

Outcome Measures:

  • Primary Outcome: Improvement in Sino-Nasal Outcome Test-22 (SNOT-22) scores, with an improvement of at least 8.9 defined as clinically significant.
  • Secondary Outcomes: Several asthma efficacy measures, including:
    • Annualized asthma exacerbation rate (AER)
    • St. George’s Respiratory Questionnaire (SGRQ) total score
    • Forced expiratory volume in one second (FEV1)
    • Asthma Control Questionnaire-6 (ACQ-6)
  • Results: Benralizumab was associated with significant improvement in all measured outcomes (SNOT-22, AER, FEV1, SGRQ, ACQ-6).

Statistical Methods:

  • P-values: Nominal p-values were reported for primary and secondary outcomes, and all findings related to benralizumab treatment showed statistical significance, including p-values of 0.0176 for SNOT-22 and <0.0001 for AER, FEV1, and SGRQ.
  • Effect Size: The study reports percentages of responders and magnitude of change in the outcomes (e.g., 69% reduction in AER, +0.32 L change in FEV1).
  • Nominal p-values: This is a limitation because without proper adjustment for multiple comparisons, the p-values could be inflated, and the findings might not be robust.

Bias and Validity:

  • Selection Bias: Randomization helps reduce selection bias, but the post-hoc nature of the analysis could lead to selective reporting and influence the findings.
  • Blinding: The abstract does not mention blinding, which could potentially lead to performance or detection bias if the outcomes are measured subjectively.
  • Generalizability: The study population had severe, eosinophilic asthma and NP, and findings may not generalize to other asthma subtypes or patients without NP.

Safety and Adverse Events:

  • Adverse Events: The frequency of adverse events (AEs) was similar between the benralizumab (76%) and placebo (73.7%) groups, indicating no significant safety concerns between groups.
    • Most common AEs reported in benralizumab group: headache, sinusitis, pyrexia, and influenza.
  • No serious safety issues were identified, but the study does not report any severe or life-threatening adverse events, which would be important for understanding the full safety profile.

Conflicts of Interest:

  • Not Reported: The abstract does not mention any potential conflicts of interest or funding sources. This information would help determine if there is any financial interest influencing the results.

Conclusions and Applicability:

  • Effectiveness: Benralizumab showed clinically meaningful improvements in both nasal polyposis (SNOT-22) and asthma outcomes (AER, FEV1, SGRQ, ACQ-6).
  • Clinical Applicability: These results suggest that benralizumab could be an effective treatment for patients with severe, eosinophilic asthma and nasal polyposis. However, the findings are based on a substudy of a larger trial and require confirmation in larger, more diverse populations before broad clinical implementation.
  • Future Research: The conclusion indicates that further studies are needed to confirm these findings and establish broader applicability.

RCT score

Now that our systematic review score has been released, we’re exploring a similar approach for RCTs. The rationale for the RCT score is the same as for the systematic review score – to encourage healthy scepticism among our users!

We are generating a list of potential criteria to score the RCT (each one will need to be available in the abstract or the metadata) and here is the initial list:

  • Age
  • Size
  • Trial registry
  • Funding
  • Trial phase
  • Intention to treat
  • Absolute versus relative effects/risks
  • Blinding
  • Citations
  • Conflict of interest

What have we missed?
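
To make the idea concrete, here is a rough Python sketch of how a checklist like this could be turned into a simple score. Everything in it is an assumption made for illustration: the criterion names, the equal weighting and the `rct_score` function are hypothetical, and we have not settled on how the real score will be calculated. Each criterion is marked favourable (True), unfavourable (False) or not reported (None), since the information has to come from the abstract or metadata and will often be missing.

```python
# Hypothetical identifiers for the ten criteria listed above.
CRITERIA = [
    "age",
    "size",
    "trial_registry",
    "funding",
    "trial_phase",
    "intention_to_treat",
    "absolute_effects",
    "blinding",
    "citations",
    "conflict_of_interest",
]


def rct_score(checks):
    """Summarise a checklist mapping criterion -> True, False or None.

    Equal weighting is an assumption for this sketch, not a decided design.
    Criteria missing from `checks` are treated as not reported.
    """
    unknown = set(checks) - set(CRITERIA)
    if unknown:
        raise ValueError(f"Unknown criteria: {sorted(unknown)}")
    met = sum(1 for name in CRITERIA if checks.get(name) is True)
    not_reported = [name for name in CRITERIA if checks.get(name) is None]
    return {"score": met, "out_of": len(CRITERIA), "not_reported": not_reported}


# Made-up example: a large, blinded trial with no registry entry found.
example = rct_score({"size": True, "blinding": True, "trial_registry": False})
print(example["score"], "out of", example["out_of"])
```

Listing the unreported criteria alongside the score is deliberate in this sketch: a low score caused by missing information is a different signal from a low score caused by known weaknesses.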

Another possibility is to use the above score as a simple guide (for all RCTs) and then allow users to ask for more in-depth analysis (using LLMs such as ChatGPT). Here’s an example of what information the LLM could extract, based on two different trials:

Example One: Treating asthma patients with probiotics: a systematic review and meta-analysis

Summary (which we show when they make the initial call)

This meta-analysis of 10 randomized controlled trials involving 1,101 asthma patients found that probiotics significantly improved lung inflammation (FeNO), symptom severity, CACT scores, and reduced acute asthma episodes. However, no significant improvements were found in lung function (FEV1 and FEV1/FVC). The study suggests that probiotics may be useful as an adjunct treatment for asthma, especially for reducing symptoms and inflammation, but not for improving lung function. Further research is needed to explore the long-term safety and effectiveness of probiotics in asthma.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Meta-Analysis of Randomized Controlled Trials (RCTs): The study is a meta-analysis, which combines data from multiple RCTs to provide a more comprehensive understanding of the effect of probiotics on asthma. RCTs are considered the gold standard in clinical research due to their ability to minimize bias.

Sample Size and Power:

  • Total Sample Size: The meta-analysis includes 1,101 participants across 10 RCTs, which provides a reasonable sample size to detect significant effects and increase the generalizability of the findings.
  • Power: The sample size from the combined trials appears large enough to detect meaningful differences in asthma-related outcomes.

Outcome Measures:

  • Primary Outcomes: The meta-analysis assessed several outcomes related to asthma treatment:
    • Fractional exhaled nitric oxide (FeNO): Used as a marker of airway inflammation. The probiotics group showed a significant reduction (MD = -7.17, 95% CI: -12.81, -1.54).
    • Asthma symptom severity: The probiotics group showed improvement (MD = -0.07, 95% CI: -0.10, -0.04).
    • Childhood Asthma Control Test (CACT): The probiotics group had a significant improvement (MD = 2.26, 95% CI: 1.14, 3.39).
    • Number of acute episodes of asthma: The probiotics group showed a reduced rate of acute episodes (OR = 0.30, 95% CI: 0.19, 0.47).
  • Secondary Outcomes:
    • Forced expiratory volume in the first second (FEV1) and FEV1/FVC (%): No significant difference between probiotics and control groups (MD = 0.11, 95% CI: -0.05, 0.26 for FEV1 and MD = 0.32, 95% CI: -1.48, 2.12 for FEV1/FVC).

Statistical Methods:

  • Meta-analysis: The meta-analysis used Revman 5.4 software and combined the effect sizes from the individual studies using odds ratio (OR) and mean difference (MD), with 95% confidence intervals (CI). These are appropriate statistical methods for synthesizing data across multiple studies.
  • Significance: The significant improvements in FeNO, symptom severity, CACT, and acute episodes all had narrow confidence intervals, indicating the robustness of the findings. However, no significant changes were found in lung function (FEV1 and FEV1/FVC), suggesting that probiotics may not impact lung function directly.

Bias and Validity:

  • Inclusion Criteria: The meta-analysis included only randomized controlled trials (RCTs), which reduces the risk of bias. However, it is important to consider whether all included RCTs had proper blinding and allocation concealment.
  • Publication Bias: The abstract does not mention whether a funnel plot or other method was used to assess publication bias, which could affect the validity of the findings.
  • Heterogeneity: The abstract does not mention whether there was significant heterogeneity between the studies, which would be important to assess the consistency of the results across the included trials.

Safety and Adverse Events:

  • Adverse Events: The abstract does not mention any adverse events associated with probiotics. While probiotics are generally considered safe, any potential side effects or concerns about long-term use should be addressed in the full study.

Conflicts of Interest:

  • Not Reported: The abstract does not provide information on potential conflicts of interest or funding sources, which would help assess the potential for bias in the conduct or reporting of the meta-analysis.

Conclusions and Applicability:

  • Conclusions: The meta-analysis suggests that probiotics can improve lung inflammation, reduce asthma symptoms, and decrease the frequency of asthma attacks in patients with asthma. However, probiotics did not affect lung function (FEV1 and FEV1/FVC), which indicates that their effects may be limited to reducing inflammation and symptoms rather than improving airflow.
  • Clinical Applicability: The findings support the use of probiotics as an adjunctive treatment for asthma, particularly for improving symptoms and reducing acute episodes. However, the lack of effect on lung function means that probiotics should not be considered a replacement for conventional asthma treatments, such as inhalers or corticosteroids.

Example Two: Benralizumab improves symptoms of patients with severe, eosinophilic asthma with a diagnosis of nasal polyposis

Summary (which we show when they make the initial call)

This randomized controlled trial of benralizumab in patients with severe, eosinophilic asthma and nasal polyposis showed significant improvements in SNOT-22 scores, asthma outcomes, and reduced exacerbations compared to placebo. While adverse events were similar between groups, no serious safety concerns were raised. However, the analysis was post-hoc, and nominal p-values were used, which limits the strength of the conclusions. The study’s small sample size in the NP substudy and the lack of information on conflicts of interest or blinding are additional considerations. The findings suggest that benralizumab may be a promising treatment for this patient population, but further research is needed to confirm these results.

Extended analysis (user would have to press ‘More’ to be shown it)

Study Design:

  • Randomized Controlled Trial (RCT): This design helps ensure that the results are due to the intervention (benralizumab) and not other factors. The random allocation of participants helps minimize selection bias.
  • Post-hoc Analysis: This is a secondary analysis of a previously conducted trial (ANDHI), which may introduce bias due to non-randomization for the specific analysis and can limit the strength of conclusions.

Sample Size and Power:

  • Sample Size: Of the 656 total participants in the ANDHI trial, 153 patients participated in the NP substudy (96 in the benralizumab group and 57 in the placebo group). The sample size for the NP substudy is relatively small, particularly in the placebo group, which may limit statistical power and generalizability of results.
  • Power Consideration: The study does not explicitly mention a sample size calculation for the NP substudy, which makes it difficult to assess if the study is adequately powered to detect meaningful differences.

Outcome Measures:

  • Primary Outcome: Improvement in Sino-Nasal Outcome Test-22 (SNOT-22) scores, with an improvement of at least 8.9 defined as clinically significant.
  • Secondary Outcomes: Several asthma efficacy measures, including:
    • Annualized asthma exacerbation rate (AER)
    • St. George’s Respiratory Questionnaire (SGRQ) total score
    • Forced expiratory volume in one second (FEV1)
    • Asthma Control Questionnaire-6 (ACQ-6)
  • Results: Benralizumab was associated with significant improvement in all measured outcomes (SNOT-22, AER, FEV1, SGRQ, ACQ-6).

Statistical Methods:

  • P-values: Nominal p-values were reported for primary and secondary outcomes, and all findings related to benralizumab treatment showed statistical significance, including p-values of 0.0176 for SNOT-22 and <0.0001 for AER, FEV1, and SGRQ.
  • Effect Size: The study reports percentages of responders and magnitude of change in the outcomes (e.g., 69% reduction in AER, +0.32 L change in FEV1).
  • Nominal p-values: This is a limitation because without proper adjustment for multiple comparisons, the p-values could be inflated, and the findings might not be robust.

Bias and Validity:

  • Selection Bias: Randomization helps reduce selection bias, but the post-hoc nature of the analysis could lead to selective reporting and influence the findings.
  • Blinding: The abstract does not mention blinding, which could potentially lead to performance or detection bias if the outcomes are measured subjectively.
  • Generalizability: The study population had severe, eosinophilic asthma and NP, and findings may not generalize to other asthma subtypes or patients without NP.

Safety and Adverse Events:

  • Adverse Events: The frequency of adverse events (AEs) was similar between the benralizumab (76%) and placebo (73.7%) groups, indicating no significant safety concerns between groups.
    • Most common AEs reported in benralizumab group: headache, sinusitis, pyrexia, and influenza.
  • No serious safety issues were identified, but the study does not report any severe or life-threatening adverse events, which would be important for understanding the full safety profile.

Conflicts of Interest:

  • Not Reported: The abstract does not mention any potential conflicts of interest or funding sources. This information would help determine if there is any financial interest influencing the results.

Conclusions and Applicability:

  • Effectiveness: Benralizumab showed clinically meaningful improvements in both nasal polyposis (SNOT-22) and asthma outcomes (AER, FEV1, SGRQ, ACQ-6).
  • Clinical Applicability: These results suggest that benralizumab could be an effective treatment for patients with severe, eosinophilic asthma and nasal polyposis. However, the findings are based on a substudy of a larger trial and require confirmation in larger, more diverse populations before broad clinical implementation.
  • Future Research: The conclusion indicates that further studies are needed to confirm these findings and establish broader applicability.

To me, this appears massively helpful, even if only as a way to help people structure their critical thinking around a particular article.

As ever, let me know what you think!

Automating the update of old clinical Q&As

Trip grew out of our work on clinical Q&A, beginning with the ATTRACT project. Since then, Trip has worked on multiple separate Q&A projects, and those teams have answered over 10,000 clinical questions. Our experience is unrivalled!

Within Trip we link to thousands of clinical Q&As, but the vast majority are over ten years old and so are likely to be out of date. However, these are real clinical questions; most are likely to be relevant today and could be a valuable resource.

So, in testing our automated Q&A system, we thought we would update a random clinical question: A 43 year old lady wanting to conceive, is currently taking amlodipine and losartan for hypertension. Is this safe if she was to get pregnant whilst taking them and what is the alternative ‘safe’ antihypertensive for pregnancy? Clicking on the document will let you see the original answer from 2013.

I’m not sure how long the original answer would have taken; I would estimate 3-4 hours. Well, our updated answer took 90 seconds and – as we operationalise things – could take 10-15 seconds.

Oh yes, the new answer:

Evidence from extracted articles:

  1. Losartan (ARB):
    • The Hypertension in Pregnancy Guideline (2023) states that angiotensin receptor blockers (ARBs), such as losartan, are contraindicated in pregnancy due to the risk of fetal malformations, including renal dysfunction, oligohydramnios, and skull hypoplasia.
    • The Management of Postpartum Hypertensive Disorders of Pregnancy document reinforces this, confirming that losartan should be discontinued before conception or as soon as pregnancy is confirmed.
    • The FDA and European Medicines Agency (EMA) classify losartan as pregnancy category D, meaning there is evidence of fetal risk, and it should be avoided during pregnancy.
    • Reference: FDA Pregnancy Categories (Note: The FDA has moved away from letter categories, but losartan remains contraindicated in pregnancy.)
  2. Amlodipine (Calcium Channel Blocker):
    • The Pharmacokinetics of Common Antihypertensive Drugs in Pregnancy (2022) review notes that calcium channel blockers, including amlodipine, require further research to establish their safety profile in pregnancy.
    • A systematic review (2024) comparing amlodipine and nifedipine suggests that nifedipine is the preferred option due to more established safety data.
    • While amlodipine is not explicitly contraindicated, nifedipine (another calcium channel blocker) has a stronger safety profile for pregnancy-related hypertension and is often preferred.
    • Reference: Pharmacokinetics of Antihypertensive Drugs in Pregnancy (Note: Specific link to the 2022 review is not provided, but this is a general reference for further reading.)

What are the alternative ‘safe’ antihypertensive options for pregnancy?

Evidence from extracted articles:

  1. The Hypertension in Pregnancy Guideline (2023) recommends the following as first-line treatments for hypertension in pregnancy:
    • Labetalol (a beta-blocker)
    • Nifedipine (extended-release, a calcium channel blocker)
    • Methyldopa (a centrally acting alpha-agonist)
  2. The systematic review on pharmacokinetics of antihypertensive drugs (2022) confirms that methyldopa, labetalol, and nifedipine are generally safe options for pregnancy.

Additional knowledge from clinical guidelines (outside extracted articles):

  • Labetalol is often the first-line choice due to its efficacy, safety, and minimal fetal side effects.
  • Nifedipine is used when labetalol is not suitable or contraindicated.
  • Methyldopa, though safe, is less commonly used due to side effects such as fatigue, depression, and drowsiness.


Final Clinical Recommendations:

  1. Losartan should be discontinued before conception or immediately upon confirming pregnancy due to the risk of fetal harm.
  2. Amlodipine is not absolutely contraindicated, but nifedipine is preferred as a safer calcium channel blocker during pregnancy.
  3. First-line alternatives for managing hypertension in pregnancy include:
    • Labetalol (most commonly recommended)
    • Nifedipine (extended-release)
    • Methyldopa (less preferred due to side effects)

Additional Notes:

  • If the patient is planning to conceive, it is crucial to switch from losartan to a safer antihypertensive before pregnancy.
  • Regular monitoring of blood pressure and fetal growth is essential during pregnancy.
  • Always consult a healthcare provider for personalized advice and management.

Identifying research gaps in Parkinson’s Disease – prioritised and compared

In our previous post we used AI to prioritise research gaps for Parkinson’s Disease. But how do we judge if it’s any good? Well, we explored the literature and found this 2024 paper in Nature Reviews Neurology: Parkinson disease therapy: current strategies and future research priorities. It is unclear how they arrived at their list; I assume expert opinion.

But the comparison—using ChatGPT—is fascinating. Not just in terms of where they align, but also in their differences. For me, the standout difference is that Nature Reviews Neurology places a strong emphasis on interventions such as pharmaceutical drugs and surgical treatments. This seems intuitively correct, but it also reminds me of Iain Chalmers’ work on research prioritisation.

Iain has done extensive work in this area and was the driving force behind the James Lind Alliance. His 2015 paper, Patients’, clinicians’ and the research communities’ priorities for treatment research: there is an important mismatch, highlights a significant divide between research agendas. The study found that patients and clinicians often prioritise non-drug treatments, such as physical or psychological therapies, while researchers tend to focus on drug treatments.

Specifically, only 18% of the treatment priorities identified by patients and clinicians were drug-related, compared to 37% in non-commercial trials and 86% in commercial trials. The authors emphasise the need for the research community to better align with the priorities of patients and healthcare professionals, advocating for increased efforts to address issues most relevant to those directly affected.

Comparison of Research Priorities: Nature Reviews Neurology vs. Trip Database Blog

I’ve compared the research priorities outlined in Nature Reviews Neurology (“Parkinson Disease Therapy: Current Strategies and Future Research Priorities”) with those in the Trip Database Blog (“Identifying Research Gaps in Parkinson’s Disease – Prioritised”). Below is a breakdown of where they align and where they diverge.


1. Shared Research Priorities

1.1 Early Detection & Biomarkers

Both sources emphasize the importance of developing reliable biomarkers for early diagnosis and monitoring disease progression.

  • Nature Reviews Neurology focuses on integrating sensitive biomarkers (biofluid, imaging, digital) to track disease progression and support early intervention.
  • The Trip Database Blog highlights the need for non-invasive biomarkers (e.g., saliva, blood, microbiota) to improve accessibility and reduce inequities in early detection.

1.2 Advancing Symptomatic Treatments

Both sources prioritize improving symptom management, particularly for motor and non-motor complications.

  • Nature Reviews Neurology discusses refining deep brain stimulation (DBS), surgical, and infusion therapies to target levodopa-resistant symptoms and delay complications.
  • The Trip Database Blog explores optimal exercise regimens and cognitive training interventions to improve motor function and cognitive symptoms.

1.3 Disease Modification Strategies

Both sources recognize the need for treatments that alter the disease course, rather than just alleviating symptoms.

  • Nature Reviews Neurology discusses potential disease-modifying therapies, including α-synuclein-targeting treatments, neuroinflammation approaches, and genetic therapies (LRRK2, GBA mutations, GLP-1 receptor agonists like Exenatide).
  • The Trip Database Blog specifically calls out GLP-1 receptor agonists (like Exenatide) as a promising intervention and highlights the need to determine which patients will benefit most from disease-modifying therapies.

2. Differences Between the Two Sources

2.1 Role of Artificial Intelligence (AI) (Trip Database Blog Only)

  • The Trip Database Blog explicitly prioritizes the use of AI in early diagnosis to improve accuracy and reduce misdiagnosis rates.
  • Nature Reviews Neurology does not specifically mention AI, though its emphasis on biomarkers could theoretically align with AI-driven diagnostics.

2.2 Personalized vs. Population-Level Approaches

  • Nature Reviews Neurology emphasizes a personalized medicine approach, tailoring therapies based on genetic and biomarker profiling.
  • The Trip Database Blog is more focused on population-level solutions, such as broad biomarker screening and AI integration to ensure equitable early diagnosis.

2.3 Focus on Exercise & Lifestyle (Trip Database Blog Only)

  • The Trip Database Blog highlights the need to identify the most effective exercise regimens for symptom management and potential disease modification.
  • Nature Reviews Neurology does not explicitly discuss exercise or lifestyle interventions, although it does acknowledge environmental and genetic risk factors in disease development.

2.4 Deep Dive into Disease Mechanisms (Nature Reviews Neurology Only)

  • Nature Reviews Neurology provides a much deeper discussion of underlying disease mechanisms, including α-synuclein aggregation, genetic mutations, and neuroinflammatory pathways.
  • The Trip Database Blog focuses more on practical applications (e.g., biomarker accessibility, AI tools, and exercise programs) rather than mechanistic research.

Conclusion

Key Overlaps:

Early Detection & Biomarkers – Both emphasize the importance of finding reliable, accessible biomarkers.
Symptom Management – Both focus on improving treatments for motor and non-motor symptoms.
Disease Modification – Both support research into disease-modifying therapies, particularly GLP-1 receptor agonists (like Exenatide).

Key Differences:

AI in Diagnosis – Only Trip Database Blog prioritizes the role of AI in improving diagnosis.
Exercise & Lifestyle – Only Trip Database Blog emphasizes exercise regimens as a research priority.
Personalized Medicine vs. Population-Level Approaches – Nature Reviews Neurology focuses on tailored treatments based on genetics and biomarkers, while the Trip Database Blog highlights broad interventions for all patients.
Mechanistic Research Focus – Nature Reviews Neurology is heavily focused on underlying disease mechanisms, while the Trip Database Blog is more focused on applied interventions.

Identifying research gaps in Parkinson’s Disease – prioritised!

UPDATE: Shortly after publishing this blog, I had a thought—does any literature exist on prioritisations? It turns out there is, which inspired a new blog: Identifying Research Gaps in Parkinson’s Disease – Prioritised and Compared.

“All gaps are weighted equally so it would also be interesting to see if it could provide some sort of ranking.” Building on yesterday’s discussion on research priorities, I leveraged AI/LLMs to do the heavy lifting. I asked two separate LLMs (ChatGPT and DeepSeek) to prioritize the original list based on criteria such as potential impact on disease burden, equity, and feasibility of implementation. I then used the LLMs to reconcile their suggestions, leading to the following final prioritization:

Top Priority (High Impact, Feasible, and Addresses Urgent Needs & Equity)

These research areas have the highest potential to significantly alter disease progression, improve early diagnosis, and enhance patient quality of life, while being feasible for near-term implementation.

1. Lack of Reliable Biomarkers for Early Detection

  • Why? Early diagnosis is critical for slowing disease progression and improving quality of life. Biomarkers (e.g., α-synuclein, mitochondrial DNA, blood-based markers) could transform Parkinson’s care by enabling earlier intervention.
  • Feasibility: Advances in genetic, imaging, and blood-based biomarkers are progressing rapidly.
  • Equity: Biomarkers would increase access to earlier diagnosis, especially if they can be non-invasive and cost-effective.

2. Non-Invasive Biomarkers for Disease Detection & Progression Monitoring

  • Why? Saliva, blood, and microbiota-based markers offer less invasive and more accessible alternatives to expensive imaging or invasive procedures.
  • Feasibility: These biomarkers are already under active investigation and could be implemented relatively quickly with further validation.
  • Equity: Improves accessibility in low-resource settings, benefiting underserved populations.

3. Integration of AI for Early & Accurate Diagnosis

  • Why? AI-based tools enhance diagnostic accuracy, potentially reducing misdiagnoses and allowing earlier intervention.
  • Feasibility: AI-based algorithms for imaging, digital cognitive assessments, and motor function tracking are already in development and testing.
  • Equity: If designed with diverse datasets, AI can expand diagnostic access to regions with limited specialists.

4. Understanding Non-Motor Symptoms as Early Indicators

  • Why? Non-motor symptoms (e.g., cognitive decline, gut microbiota changes, depression, psychiatric symptoms) often appear years before motor symptoms, making them valuable for early diagnosis.
  • Feasibility: Ongoing research suggests non-motor symptoms could be integrated into diagnostic criteria with further validation.
  • Equity: Improves early detection and personalized treatment in diverse populations.

5. Role of GLP-1 Receptor Agonists as Disease-Modifying Treatments

  • Why? GLP-1 agonists (e.g., Lixisenatide, Exenatide) have shown promise in slowing disease progression, addressing a major unmet need.
  • Feasibility: These drugs are already approved for diabetes and other conditions, making repurposing feasible.
  • Equity: If effective, they could be integrated into treatment regimens globally.

6. Best Exercise Type & Intensity for Motor Symptoms

  • Why? Exercise is a low-cost, widely accessible intervention with proven benefits for motor symptoms and no significant side effects.
  • Feasibility: Exercise is already widely studied, but optimal regimens for symptom management still need refinement.
  • Equity: Can be implemented globally at low cost, benefiting all socioeconomic groups.

7. Cognitive Training Interventions for Parkinson’s-Related Dementia & Mild Cognitive Impairment

  • Why? Cognitive decline significantly impacts quality of life and caregiver burden, yet current treatments are limited.
  • Feasibility: Digital and behavioral cognitive training interventions are emerging and could be scaled up.
  • Equity: Can be widely accessible via mobile apps, online programs, and community health initiatives.

Final Adjustments & Notes on Reconciliation

  • Both original lists shared the same core priorities, but some terminology and emphasis differed.
  • Equity considerations were harmonized—especially in how early detection, AI, and exercise could reach underserved populations.
  • Feasibility analysis was refined to ensure short-to-medium-term implementation is realistic.
  • The ranking maintains a balance between high-impact, scalable interventions and urgent research gaps.

Final comments: Looks and sounds plausible, but we have no way of verifying the ‘accuracy’ of the above. As with yesterday, an interesting experiment nonetheless!

Systematic review score – explained

We host one of the largest dedicated collections of systematic reviews on the web, with over 550,000 reviews available. With this vast resource comes a responsibility: ensuring that users critically assess the validity of the systematic reviews they access.

Systematic reviews sit at the top of the evidence pyramid/hierarchy, but their inclusion in this category does not automatically guarantee high-quality evidence. While well-conducted systematic reviews deserve their status, many are poorly executed and risk misleading users.

To address this, we have developed a scoring system designed to help users gauge the reliability of systematic reviews. By identifying markers of both rigorous methodology and potential weaknesses, this system will generate a score displayed alongside each review, providing a quick and informed assessment of its credibility.

These scores may be controversial, and we recognize the limitations of any scoring system. To ensure transparency, we provide an explanation of each scoring element and how it impacts the final score. Ultimately, the primary aim of this system is to encourage healthy scepticism among our users.

Elements

Age of review?: The older the review, the more likely it is that new research has been published, meaning the systematic review is out of date. This is not exact, as there might be no new research, but hopefully the sentiment is clear.

  • 0-2 years = Good
  • 3-5 = OK
  • 6+ = Cause for concern

Number of Authors?: Systematic reviews require rigorous methodology, and having a single author raises concerns about bias and robustness. While more isn’t always better, a minimum of three authors is generally seen as best practice.

  • 1 author = Red flag
  • 2 authors = Cause for concern
  • ≥ 3 = No concerns

Databases Searched?: A systematic review should search multiple databases to minimize bias and ensure comprehensive coverage of relevant literature. A review relying on just one database is highly problematic, and even two may be insufficient.

  • 1 database = Bad
  • 2 = Cause for concern
  • ≥ 3 = No concerns

Mention of registration?: Registering systematic reviews (e.g., in PROSPERO) is considered best practice. Reviews published before PROSPERO’s launch in 2011 won’t have this option, but they may already be affected by the Age of Review criterion.

  • Mentions PROSPERO or similar registry = Good
  • No mention of registration = Cause for concern

Mention of quality tools?: Quality assessment frameworks such as GRADE, AMSTAR, and PRISMA enhance systematic review rigor. Their inclusion signals a commitment to high methodological standards.

  • Mentions any of these tools = Good
  • No mention = Cause for concern

Retraction Watch check for the article and referenced work?: We check the Retraction Watch database for articles that are included in systematic reviews. Any retraction gets a Red Flag, regardless of the contribution to the systematic review.

  • No retractions = No cause for concern
  • Any retractions = Red flag
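To make the thresholds above concrete, here is a minimal sketch of how the six elements could be rated in code. It is an illustration only, not our production system: the ReviewFacts fields and function names are hypothetical, treating "no authors or databases detected" the same as "one" is an assumption, and the sketch deliberately stops at per-element ratings.

```python
from dataclasses import dataclass


@dataclass
class ReviewFacts:
    """Hypothetical facts pulled from a review's abstract and metadata."""
    age_years: int                # whole years since publication
    n_authors: int
    n_databases: int              # databases named as searched
    mentions_registration: bool   # e.g. PROSPERO or a similar registry
    mentions_quality_tool: bool   # e.g. GRADE, AMSTAR, PRISMA
    n_retractions: int            # Retraction Watch hits (review or its references)


def rate_age(years: int) -> str:
    if years <= 2:
        return "Good"
    if years <= 5:
        return "OK"
    return "Cause for concern"


def rate_authors(n: int) -> str:
    if n >= 3:
        return "No concerns"
    if n == 2:
        return "Cause for concern"
    return "Red flag"  # one author (assumption: also used when none are detected)


def rate_databases(n: int) -> str:
    if n >= 3:
        return "No concerns"
    if n == 2:
        return "Cause for concern"
    return "Bad"  # one database (assumption: also used when none are detected)


def rate_review(facts: ReviewFacts) -> dict[str, str]:
    """Return one rating per element, using the labels from the lists above."""
    return {
        "age": rate_age(facts.age_years),
        "authors": rate_authors(facts.n_authors),
        "databases": rate_databases(facts.n_databases),
        "registration": "Good" if facts.mentions_registration else "Cause for concern",
        "quality_tools": "Good" if facts.mentions_quality_tool else "Cause for concern",
        "retractions": "Red flag" if facts.n_retractions > 0 else "No cause for concern",
    }


# Usage: a made-up four-year-old review with two authors and three databases.
example = ReviewFacts(
    age_years=4,
    n_authors=2,
    n_databases=3,
    mentions_registration=True,
    mentions_quality_tool=False,
    n_retractions=0,
)
ratings = rate_review(example)
for element, rating in ratings.items():
    print(f"{element}: {rating}")
```

Keeping each element as a small separate function makes it easy to change a threshold later without touching the others.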

Each element is given a score and these are combined to assign one of three scores:

There are some obvious issues with the score. Here are a handful; I’m sure others will highlight more:

  • Abstract-based assessment – The system evaluates what is explicitly mentioned in the abstract, not the full text.
  • Text-matching accuracy – Automated detection of databases, PROSPERO registration, etc., may not be 100% accurate.
  • Reliance on DOIs – Missing digital object identifiers (DOIs) can disrupt the scoring process.
  • No text – Our system needs text to analyse, so no text = no score. Text might be absent if there are redirects, or other website anomalies.
  • Arbitrary thresholds – The cutoffs for each category are subjective and may evolve over time.

Whitelisted sources

For a number of trusted publishers we default to ‘Appears promising’. These are typically national bodies such as IQWiG, CADTH, NICE and Cochrane. We still look for retractions (where the publication has a DOI) and will ‘red flag’ them if there is a retraction.
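As a rough sketch, the whitelist rule could look something like the code below. The function name, the WHITELIST set (limited here to the four publishers named above) and the retracted_dois lookup are all hypothetical stand-ins; in particular, retracted_dois is just a plain set standing in for a Retraction Watch check, and the DOIs are made up.

```python
from typing import Optional

# Only the publishers named in this post; an assumption, not the full list.
WHITELIST = {"IQWiG", "CADTH", "NICE", "Cochrane"}


def whitelist_label(
    publisher: str,
    doi: Optional[str],
    retracted_dois: set[str],
) -> Optional[str]:
    """Return a label for whitelisted publishers.

    Returns None for any other publisher, meaning the normal
    element-by-element scoring should be used instead.
    """
    if publisher not in WHITELIST:
        return None
    # Retractions can only be checked when the publication has a DOI.
    if doi is not None and doi in retracted_dois:
        return "Red flag"
    return "Appears promising"


# Usage with made-up DOIs.
retracted = {"10.1000/example.1"}
print(whitelist_label("Cochrane", "10.1000/example.2", retracted))
print(whitelist_label("Cochrane", "10.1000/example.1", retracted))
print(whitelist_label("NICE", None, retracted))
```

Note that a whitelisted publication without a DOI keeps the default label, simply because there is nothing to look up.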

Summary

So, there you have the scoring system, flaws and all. Despite these imperfections, we are excited to release this scoring system. Its primary purpose is to encourage critical thinking, ensuring that users do not accept systematic reviews at face value but instead engage with them sceptically and thoughtfully.
