We recently released our systematic review score, we’ve had our clinical guideline score for ages, and we’re exploring an RCT score.
In the spirit of transparency, and to encourage discussion and ultimately improve things, I have posted an email I sent to the Evidence-Based Health discussion list. Hopefully it’s self-explanatory!
Hi All,
I recently asked for comments on a possible scoring system for RCTs (following on from our recent systematic review scoring system) and ended up having a really interesting discussion with someone not on this mailing list (my request was forwarded to him by someone on this list). The discussion was quite long, so I asked ChatGPT to summarise the respective positions (that’s point 1 below). I shared the summary with the respondent and his comments are point 2. Finally, point 3 is a possible enhancement to the scoring system: asking ChatGPT to critique a study (there are two examples).
To reiterate, my motivation for introducing a scoring system is to try to ensure users (of the Trip Database) do not see a systematic review or RCT and assume – given their privileged position in EBM – that it is ‘good’ evidence. To me, the scores are an attempt to induce scepticism (although I do think the ChatGPT critiques are at a different level). I really would be interested in what the collective wisdom of the list makes of this, and in possible ways I could improve how these scores are framed so that they have maximal effect.
Thanks in advance.
jon
1) LLM Summary of debate
Main Points of Concern:
- Oversimplification of Complex Issues:
- Reducing complex study design elements (e.g., blinding, sample size) to numeric scores risks missing important nuances. Some study flaws may not be easily quantifiable, leading to a misleading or oversimplified evaluation.
- Subjectivity in Scoring:
- Numeric scores are often subjective and can be arbitrary (e.g., how to weight factors like sample size or blinding). This subjectivity risks distorting the assessment of study quality.
- Undermining Critical Thinking:
- A scoring system could discourage users from critically engaging with the study. It might promote the idea that a score is enough, instead of encouraging users to examine the study’s methodology in depth.
- LLMs and Critical Appraisal:
- While LLMs can flag potential issues, they don’t provide a nuanced, complete critique of a study, and may not capture critical flaws or explain them in a meaningful way.
Counter Arguments (Your Position):
- Better Than Doing Nothing:
- A scoring system offers a framework for users to begin evaluating study quality, providing a starting point, especially for those without advanced knowledge in study design. It can guide further investigation rather than replace critical thinking.
- Encouraging Engagement:
- Even if simplified, the score encourages users to think about study quality. It can prompt users to dive deeper into flagged issues and learn more about study design, bias, and other important factors.
- Making Information Accessible:
- A scoring system simplifies complex information and makes it more digestible, particularly for non-experts. It helps make the evaluation of studies more accessible without overwhelming users with technical details.
- Practical Decision-Making Tool:
- In practical settings, especially in healthcare, having a quick, accessible tool for assessing study quality is better than having no tool at all. It saves time and allows for quicker decision-making when reviewing large volumes of studies or systematic reviews.
2) Commentator’s Reflection on the summary
The summary you sent is not bad … but it feels to me like it suffers in much the same way that the scoring system would suffer — it’s a little too superficial, and doesn’t explain the criticisms it makes. (It mentions the word “nuanced” several times … but it itself lacks depth, and thus also nuance.) Still, I’m not willing to write a long essay about this … which is what would really be needed.
3) LLM Supported Critical Appraisal
Example One: Treating asthma patients with probiotics: a systematic review and meta-analysis
Summary (which we show when the user makes the initial call)
This meta-analysis of 10 randomized controlled trials involving 1,101 asthma patients found that probiotics significantly improved lung inflammation (FeNO), symptom severity, CACT scores, and reduced acute asthma episodes. However, no significant improvements were found in lung function (FEV1 and FEV1/FVC). The study suggests that probiotics may be useful as an adjunct treatment for asthma, especially for reducing symptoms and inflammation, but not for improving lung function. Further research is needed to explore the long-term safety and effectiveness of probiotics in asthma.
Extended analysis (user would have to press ‘More’ to be shown it)
Study Design:
- Meta-Analysis of Randomized Controlled Trials (RCTs): The study is a meta-analysis, which combines data from multiple RCTs to provide a more comprehensive understanding of the effect of probiotics on asthma. RCTs are considered the gold standard in clinical research due to their ability to minimize bias.
Sample Size and Power:
- Total Sample Size: The meta-analysis includes 1,101 participants across 10 RCTs, which provides a reasonable sample size to detect significant effects and increase the generalizability of the findings.
- Power: The sample size from the combined trials appears large enough to detect meaningful differences in asthma-related outcomes.
Outcome Measures:
- Primary Outcomes: The meta-analysis assessed several outcomes related to asthma treatment:
- Fractional exhaled nitric oxide (FeNO): Used as a marker of airway inflammation. The probiotics group showed a significant reduction (MD = -7.17, 95% CI: -12.81, -1.54).
- Asthma symptom severity: The probiotics group showed improvement (MD = -0.07, 95% CI: -0.10, -0.04).
- Childhood Asthma Control Test (CACT): The probiotics group had a significant improvement (MD = 2.26, 95% CI: 1.14, 3.39).
- Number of acute episodes of asthma: The probiotics group showed a reduced rate of acute episodes (OR = 0.30, 95% CI: 0.19, 0.47).
- Secondary Outcomes:
- Forced expiratory volume in the first second (FEV1) and FEV1/FVC (%): No significant difference between probiotics and control groups (MD = 0.11, 95% CI: -0.05, 0.26 for FEV1 and MD = 0.32, 95% CI: -1.48, 2.12 for FEV1/FVC).
Statistical Methods:
- Meta-analysis: The meta-analysis used Revman 5.4 software and combined the effect sizes from the individual studies using odds ratio (OR) and mean difference (MD), with 95% confidence intervals (CI). These are appropriate statistical methods for synthesizing data across multiple studies.
- Significance: The significant improvements in FeNO, symptom severity, CACT, and acute episodes all had narrow confidence intervals, indicating the robustness of the findings. However, no significant changes were found in lung function (FEV1 and FEV1/FVC), suggesting that probiotics may not impact lung function directly.
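One way to sanity-check the significance statements above is to test whether each reported 95% confidence interval excludes its null value (0 for a mean difference, 1 for an odds ratio). The following is a minimal sketch, not part of the LLM critique itself: the numbers are copied from the appraisal above, while the names `REPORTED`, `excludes_null` and `summarise` are hypothetical and introduced only for illustration.

```python
# Effect estimates as quoted in the appraisal above:
# (point estimate, CI lower bound, CI upper bound, null value).
# The null value is 0 for a mean difference (MD) and 1 for an odds ratio (OR).
REPORTED = {
    "FeNO (MD)": (-7.17, -12.81, -1.54, 0.0),
    "Symptom severity (MD)": (-0.07, -0.10, -0.04, 0.0),
    "CACT (MD)": (2.26, 1.14, 3.39, 0.0),
    "Acute episodes (OR)": (0.30, 0.19, 0.47, 1.0),
    "FEV1 (MD)": (0.11, -0.05, 0.26, 0.0),
    "FEV1/FVC (MD)": (0.32, -1.48, 2.12, 0.0),
}


def excludes_null(lower, upper, null):
    """Return True when the interval [lower, upper] does not contain the null value."""
    return not (lower <= null <= upper)


def summarise(reported):
    """Map each outcome to (interval width, whether the CI excludes the null)."""
    return {
        name: (round(upper - lower, 2), excludes_null(lower, upper, null))
        for name, (_estimate, lower, upper, null) in reported.items()
    }


if __name__ == "__main__":
    for name, (width, significant) in summarise(REPORTED).items():
        print(f"{name}: CI width = {width}, excludes null = {significant}")
```

The interval width is printed alongside the significance flag so that a reader can judge for themselves how precise each estimate is, rather than relying on a verbal description of the intervals.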
Bias and Validity:
- Inclusion Criteria: The meta-analysis included only randomized controlled trials (RCTs), which reduces the risk of bias. However, it is important to consider whether all included RCTs had proper blinding and allocation concealment.
- Publication Bias: The abstract does not mention whether a funnel plot or other method was used to assess publication bias, which could affect the validity of the findings.
- Heterogeneity: The abstract does not mention whether there was significant heterogeneity between the studies, which would be important to assess the consistency of the results across the included trials.
Safety and Adverse Events:
- Adverse Events: The abstract does not mention any adverse events associated with probiotics. While probiotics are generally considered safe, any potential side effects or concerns about long-term use should be addressed in the full study.
Conflicts of Interest:
- Not Reported: The abstract does not provide information on potential conflicts of interest or funding sources, which would help assess the potential for bias in the conduct or reporting of the meta-analysis.
Conclusions and Applicability:
- Conclusions: The meta-analysis suggests that probiotics can improve lung inflammation, reduce asthma symptoms, and decrease the frequency of asthma attacks in patients with asthma. However, probiotics did not affect lung function (FEV1 and FEV1/FVC), which indicates that their effects may be limited to reducing inflammation and symptoms rather than improving airflow.
- Clinical Applicability: The findings support the use of probiotics as an adjunctive treatment for asthma, particularly for improving symptoms and reducing acute episodes. However, the lack of effect on lung function means that probiotics should not be considered a replacement for conventional asthma treatments, such as inhalers or corticosteroids.
Example Two: Benralizumab improves symptoms of patients with severe, eosinophilic asthma with a diagnosis of nasal polyposis
Summary (which we show when the user makes the initial call)
This randomized controlled trial of benralizumab in patients with severe, eosinophilic asthma and nasal polyposis showed significant improvements in SNOT-22 scores, asthma outcomes, and reduced exacerbations compared to placebo. While adverse events were similar between groups, no serious safety concerns were raised. However, the analysis was post-hoc, and nominal p-values were used, which limits the strength of the conclusions. The study’s small sample size in the NP substudy and the lack of information on conflicts of interest or blinding are additional considerations. The findings suggest that benralizumab may be a promising treatment for this patient population, but further research is needed to confirm these results.
Extended analysis (user would have to press ‘More’ to be shown it)
Study Design:
- Randomized Controlled Trial (RCT): This design helps ensure that the results are due to the intervention (benralizumab) and not other factors. The random allocation of participants helps minimize selection bias.
- Post-hoc Analysis: This is a secondary analysis of a previously conducted trial (ANDHI), which may introduce bias due to non-randomization for the specific analysis and can limit the strength of conclusions.
Sample Size and Power:
- Sample Size: Of the 656 total participants in the ANDHI trial, 153 patients participated in the NP substudy (96 in the benralizumab group and 57 in the placebo group). The sample size for the NP substudy is relatively small, particularly in the placebo group, which may limit statistical power and generalizability of results.
- Power Consideration: The study does not explicitly mention a sample size calculation for the NP substudy, which makes it difficult to assess if the study is adequately powered to detect meaningful differences.
Outcome Measures:
- Primary Outcome: Improvement in Sino-Nasal Outcome Test-22 (SNOT-22) scores, with an improvement of at least 8.9 defined as clinically significant.
- Secondary Outcomes: Several asthma efficacy measures, including:
- Annualized asthma exacerbation rate (AER)
- St. George’s Respiratory Questionnaire (SGRQ) total score
- Forced expiratory volume in one second (FEV1)
- Asthma Control Questionnaire-6 (ACQ-6)
- Results: Benralizumab was associated with significant improvement in all measured outcomes (SNOT-22, AER, FEV1, SGRQ, ACQ-6).
Statistical Methods:
- P-values: Nominal p-values were reported for primary and secondary outcomes, and all findings related to benralizumab treatment showed statistical significance, including p-values of 0.0176 for SNOT-22 and <0.0001 for AER, FEV1, and SGRQ.
- Effect Size: The study reports percentages of responders and magnitude of change in the outcomes (e.g., 69% reduction in AER, +0.32 L change in FEV1).
- Nominal p-values: This is a limitation because without proper adjustment for multiple comparisons, the p-values could be inflated, and the findings might not be robust.
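To make the multiple-comparisons point concrete, here is a minimal sketch of how the nominal p-values quoted above could be checked against Bonferroni and Holm thresholds. It rests on several assumptions that are not in the abstract: the reported "<0.0001" values are treated as an upper bound of 0.0001, ACQ-6 is left out because no p-value is quoted for it, and the four remaining outcomes are treated as a single family tested at alpha = 0.05. The trial's actual analysis plan may differ, and the function names are hypothetical.

```python
# Nominal p-values quoted in the appraisal above.
# Assumption: "<0.0001" is represented by its upper bound, 0.0001.
# Assumption: ACQ-6 is omitted because no p-value is quoted for it.
NOMINAL = {"SNOT-22": 0.0176, "AER": 0.0001, "FEV1": 0.0001, "SGRQ": 0.0001}


def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return {label: p <= alpha / m for label, p in p_values.items()}


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure: test the smallest p-value against
    alpha / m, the next against alpha / (m - 1), and so on, stopping
    at the first failure (all later hypotheses are then not rejected)."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {}
    still_rejecting = True
    for rank, (label, p) in enumerate(ordered):
        if still_rejecting and p <= alpha / (m - rank):
            rejected[label] = True
        else:
            still_rejecting = False
            rejected[label] = False
    return rejected


if __name__ == "__main__":
    print("Bonferroni:", bonferroni(NOMINAL))
    print("Holm:      ", holm(NOMINAL))
```

Under these assumptions the SNOT-22 p-value of 0.0176 clears Holm's final threshold of 0.05 but not the Bonferroni threshold of 0.05 / 4 = 0.0125, so whether that result "survives" depends on which correction and which family of comparisons is chosen.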
Bias and Validity:
- Selection Bias: Randomization helps reduce selection bias, but the post-hoc nature of the analysis could lead to selective reporting and influence the findings.
- Blinding: The abstract does not mention blinding, which could potentially lead to performance or detection bias if the outcomes are measured subjectively.
- Generalizability: The study population had severe, eosinophilic asthma and NP, and findings may not generalize to other asthma subtypes or patients without NP.
Safety and Adverse Events:
- Adverse Events: The frequency of adverse events (AEs) was similar between the benralizumab (76%) and placebo (73.7%) groups, indicating no significant safety concerns between groups.
- Most common AEs reported in benralizumab group: headache, sinusitis, pyrexia, and influenza.
- No serious safety issues were identified, but the study does not report any severe or life-threatening adverse events, which would be important for understanding the full safety profile.
Conflicts of Interest:
- Not Reported: The abstract does not mention any potential conflicts of interest or funding sources. This information would help determine if there is any financial interest influencing the results.
Conclusions and Applicability:
- Effectiveness: Benralizumab showed clinically meaningful improvements in both nasal polyposis (SNOT-22) and asthma outcomes (AER, FEV1, SGRQ, ACQ-6).
- Clinical Applicability: These results suggest that benralizumab could be an effective treatment for patients with severe, eosinophilic asthma and nasal polyposis. However, the findings are based on a substudy of a larger trial and require confirmation in larger, more diverse populations before broad clinical implementation.
- Future Research: The conclusion indicates that further studies are needed to confirm these findings and establish broader applicability.
