Trip Database Blog

Liberating the literature

De-duplications: more quality improvements

One of the development strands of Trip is improving the quality of existing content or functionality. De-duplication was mentioned in a recent post on quality and we’re pleased to announce significant progress.

Given the complex nature of Trip and the variety of sources of content, we have generated a number of duplicate records – two (or more) examples of the same article. These are often identical, but sometimes one is a link to the abstract and another to the full text. Having two copies of the same article is good for no-one and just adds ‘noise’ to the search results. Identifying and removing these has proved to be a challenging piece of work, but we’ve now finished it: we identified a total of 143,218 duplicates, which are currently being removed from the index.

Are we now duplicate free? Almost certainly not, but we’ve probably got the vast majority. If you do spot one, please let us know.

Up Next

As the de-duplication finishes, our next quality issue is to remove PubMed articles that contain no abstract. We never used to include them, but with the new system this was overlooked so they’ve crept back in. PubMed articles with no abstract contain little or no actionable information, so they add ‘noise’ to the results and very little ‘signal’.
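As a rough illustration of the rule described above, here is a minimal sketch in Python. The record layout (a dict with `source` and `abstract` keys) is an assumption made for the example, not how Trip actually stores its records.

```python
def has_usable_abstract(record: dict) -> bool:
    """True if the record has a non-empty abstract (whitespace does not count)."""
    abstract = record.get("abstract")
    return bool(abstract and abstract.strip())


def drop_pubmed_without_abstract(records: list[dict]) -> list[dict]:
    """Keep every non-PubMed record, and only those PubMed records with an abstract."""
    return [
        r for r in records
        if r.get("source") != "PubMed" or has_usable_abstract(r)
    ]


# Hypothetical records, for illustration only.
records = [
    {"source": "PubMed", "title": "Trial A", "abstract": "We randomised..."},
    {"source": "PubMed", "title": "Letter B", "abstract": "   "},
    {"source": "PubMed", "title": "Comment C"},
    {"source": "Guideline publisher", "title": "Guideline D"},
]
kept = drop_pubmed_without_abstract(records)
```

Only the PubMed records are tested for an abstract here; content from other sources passes through untouched.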

Introducing our RCT score

Hot on the heels of us releasing our guideline score we’re releasing our RCT score. We’ve been working with the wonderful RobotReviewer team for years now and one of their products is a Risk of Bias (RoB) score for RCTs. We introduced it in 2016, when we classified all the trials into categories of ‘low risk of bias’ and ‘high/unknown risk of bias’. When we recently re-wrote the site we did not immediately include the RoB score. In part this reflects that, since 2016, the thinking and technology have developed considerably. So, we’re very pleased to reintroduce it to the site.

The new score does not categorise the RoB into ‘low’ or ‘high/unknown’; instead it gives a score based on the likely RoB on a linear scale. We take that score and transform it into a graphic similar to that seen on the guideline score:

RCTs are important in the world of EBM and, as with guidelines, they are not all equally good! This score reflects the likelihood of bias and should help our users better make sense of the evidence base.

New Filter: European Guidelines

One relatively minor addition to our recent guideline enhancements has been the introduction of a new geographic guideline filter – ‘Europe’.

Over the last few years we’ve been diligently identifying and adding guidelines from Europe so it made sense to add a new filter.

One final tweak was to make the ability to filter by geographic area a ‘Pro’ only feature.

Introducing our guideline score

The production of guidelines is a complex task and there are a multitude of methods, some more rigorous than others. While Trip places guidelines at the top of the evidence pyramid we need to recognise this is an oversimplification. Our guideline score is designed to help our users understand how robust a guideline might be.

The guideline score has been a concept we’ve explored for a number of years (eg Quality and guidelines from 2019 and Grading guidelines from 2020) and involves us scoring each publisher (not individual guideline – see limitations below) based on 5 criteria:

  • Do they publish their methodology? No = 0, Yes = 1, Yes and mention AGREE (or similar) = 2
  • Do they use any evidence grading e.g. GRADE? No = 0, Yes = 2
  • Do they undertake a systematic evidence search? Unsure/No = 0, Yes = 2
  • Are they clear about funding? No = 0, Yes = 1
  • Do they mention how they handle conflict of interest? No = 0, Yes = 1
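The five criteria above can be sketched as a small scoring function. This is only an illustration of the arithmetic: the class and field names are made up for the example, and the point values are taken directly from the list.

```python
from dataclasses import dataclass

# Points for the methodology question: No = 0, Yes = 1, Yes and mentions AGREE (or similar) = 2.
METHODOLOGY_POINTS = {"no": 0, "yes": 1, "yes_agree": 2}


@dataclass
class PublisherAssessment:
    """Answers to the five questions for one guideline publisher (hypothetical structure)."""
    methodology: str                     # "no", "yes" or "yes_agree"
    uses_evidence_grading: bool          # e.g. GRADE
    systematic_search: bool              # "unsure" is treated the same as "no"
    clear_about_funding: bool
    mentions_conflict_of_interest: bool


def guideline_score(a: PublisherAssessment) -> int:
    """Sum the points for each criterion; the result is between 0 and 8."""
    score = METHODOLOGY_POINTS[a.methodology]
    score += 2 if a.uses_evidence_grading else 0
    score += 2 if a.systematic_search else 0
    score += 1 if a.clear_about_funding else 0
    score += 1 if a.mentions_conflict_of_interest else 0
    return score


best = guideline_score(PublisherAssessment("yes_agree", True, True, True, True))
partial = guideline_score(PublisherAssessment("yes", True, False, True, False))
```

A publisher that publishes its methodology, uses evidence grading and is clear about funding, but does neither of the other two, would score 1 + 2 + 1 = 4.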

The highest possible score is 8. Our work has shown that the above results give a very good approximation to the more formal methods, hence we’re using this simpler approach. And this is what it looks like:


This approach has a number of issues, for instance:

  • It is carried out at the publisher level and was done at a certain date. So, if we scored things in 2021 the scoring covers guidelines produced by the publisher in 2014 (say) and 2023. The methodology might well have changed between those dates. This is not reflected in our scoring.
  • Linked with the above point, it assumes the guideline publisher uses the same methodology for all guidelines.
  • Many of the lowest-scoring producers score poorly because they do not publish their methodologies, which makes it impossible to score them properly, so our approach may underestimate the rigour of their methods. If our approach encourages publishers to be more transparent then it’ll be a great result in itself!
  • The scoring system uses 5 elements. It might benefit from more, but we have to pragmatically balance rigour and resource.

LLMs again….

Fascinating technology and we’re continuing to explore its use. One focus is on auto-summarising documents and so far it’s pretty good/interesting (with the large caveat that it can be problematic and hallucinate)! Two examples are shown below:

  • Extract the main recommendations from a SIGN guideline
  • Summarise the methods section of a Cochrane review

The first response seems reasonably good. However, the second one was problematic. I tried it a few times – more around the wording – to get it to ‘correctly’ summarise. A few times it suggested the document had nothing to do with cranberry juice and UTIs! Also, on one occasion it stuck to summarising the methods section whereas, in the example below, it went further!

Our first use case is to use it to summarise long documents into abstracts. Initially this would just be used by our search index and it’d allow us to modify our search to default to ‘title and abstract’. Currently our default is ‘title and body of the document’ which, for long documents, is overly sensitive and returns low-relevancy documents. For instance, a document on measles which is 100,000 words long mentions prostate cancer once. This means it is returned for a search on prostate cancer! This is normally fine as it’d be low down in the results. However, if you filter the documents to a single area (eg UK guidelines) where there are few documents then suddenly the low-relevancy results can be seen on the first page of results. A summary of the document would not include prostate cancer, so the document would not be returned in the results!
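To make the measles example concrete, here is a toy sketch of field-restricted matching. It uses simple substring matching on invented documents, which is far cruder than a real search index, and the field names (`title`, `abstract`, `body`) are assumptions for the illustration.

```python
def matches(doc: dict, query: str, fields: tuple[str, ...]) -> bool:
    """True if the query phrase appears in any of the chosen fields (case-insensitive)."""
    q = query.lower()
    return any(q in doc.get(field, "").lower() for field in fields)


def search(docs: list[dict], query: str, fields: tuple[str, ...]) -> list[str]:
    """Return the titles of all documents matching the query in the given fields."""
    return [d["title"] for d in docs if matches(d, query, fields)]


# Invented documents: the long measles document mentions prostate cancer once, in its body.
docs = [
    {
        "title": "Measles: a comprehensive review",
        "abstract": "Epidemiology, vaccination and management of measles.",
        "body": "... many thousands of words ... one cohort included men with prostate cancer ...",
    },
    {
        "title": "Prostate cancer: diagnosis and management",
        "abstract": "Recommendations on prostate cancer diagnosis and treatment.",
        "body": "...",
    },
]

title_and_body = search(docs, "prostate cancer", ("title", "body"))
title_and_abstract = search(docs, "prostate cancer", ("title", "abstract"))
```

With ‘title and body’ both documents come back; with ‘title and abstract’ only the prostate cancer document does, because the single passing mention never makes it into the summary.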

The second use case would be to allow users to see the summary, helping them decide if it was useful or not. There are issues around copyright to consider….

We’re having some fun with this technology and we’re happy to hear from you if you have any suggestions as to how Trip could use it to good effect!

Extract the main recommendations from a SIGN guideline

Prompt: Can you extract the background and main recommendations from this document

Response: Yes, I can extract the background and main recommendations from the document you provided.

The document is a guideline developed by the Scottish Intercollegiate Guidelines Network (SIGN) on the assessment and management of eating disorders. It provides recommendations based on the best available evidence for healthcare professionals involved in the care of individuals with eating disorders.

Main recommendations:
The document provides recommendations for the assessment, treatment, and management of eating disorders, including anorexia nervosa, bulimia nervosa, binge eating disorder, and other specified feeding or eating disorders. Here are some of the main recommendations:

Assessment: A thorough assessment of the individual’s physical, psychological, and social needs should be conducted as soon as possible. This includes obtaining a detailed history of the eating disorder, physical examination, and laboratory investigations.

Treatment: A multidisciplinary team approach is recommended for the treatment of eating disorders. The team should include a range of healthcare professionals, such as psychiatrists, dietitians, psychologists, and physicians.

Psychological interventions: Evidence-based psychological interventions, such as cognitive-behavioral therapy, interpersonal therapy, and family therapy, should be offered to individuals with eating disorders.

Nutritional management: Nutritional management should be tailored to the individual’s needs and preferences. A dietitian should be involved in the development and implementation of the nutritional plan.

Pharmacological interventions: Medication should only be used in conjunction with psychological and nutritional interventions. The use of medication should be carefully monitored, and the benefits and risks should be weighed.

Monitoring and follow-up: Individuals with eating disorders should receive ongoing monitoring and follow-up care. This includes regular physical and mental health assessments, as well as ongoing support from the multidisciplinary team.

Overall, the document emphasizes the importance of a collaborative, evidence-based approach to the assessment and management of eating disorders.

Summarise the methods section of a Cochrane review

Prompt: Can you summarise the methods section, as bullet points, from this document


  • This Cochrane systematic review aimed to evaluate the effectiveness of cranberry products in preventing urinary tract infections (UTIs) in people of all ages.
  • The review included 28 randomized controlled trials (RCTs) involving a total of 4,473 participants, who were mostly women.
  • The studies compared cranberry products (juice, capsules, tablets, or other formulations) to placebo, no treatment, or other interventions for preventing UTIs.
  • The primary outcome was the incidence of symptomatic UTIs, defined as the presence of at least one urinary tract symptom (such as pain, burning, or frequency) and the presence of at least 10^5 colony-forming units (CFU) of bacteria per milliliter of urine.
  • The secondary outcomes included the incidence of asymptomatic bacteriuria (the presence of bacteria in the urine without symptoms), the incidence of UTIs caused by specific bacteria (such as E. coli), adverse events, and compliance with the interventions.
  • The review used standard Cochrane methods for searching, selecting, and assessing the quality of the studies, and for synthesizing the data using meta-analysis.
  • The quality of the evidence was assessed using the GRADE approach, which takes into account factors such as study design, risk of bias, inconsistency, imprecision, and publication bias.
  • The review found that cranberry products may slightly reduce the incidence of symptomatic UTIs compared to placebo or no treatment, but the effect was small and of uncertain clinical significance.
  • The absolute reduction in UTIs ranged from 0% to 13%, depending on the type and dose of cranberry product, the population studied, and other factors.
  • The review did not find evidence of a difference in the incidence of asymptomatic bacteriuria or adverse events between cranberry and control groups.
  • The review concluded that cranberry products cannot be recommended as a routine prophylaxis for UTIs, but may be considered as a non-pharmacological option for some women who are motivated to try them and understand the limitations of the evidence.

LLMs and Clickstream

We’re still playing with LLMs and it’s proving interesting. We’re moving away from a chatbot approach (like ChatGPT) and have started exploring the use of these on our clickstream data. We took the NICE guideline Acute heart failure: diagnosis and management and asked ChatGPT to analyse all the search terms used to find the document (in our sample this was 40 separate searches) and to suggest related concepts. It reported:

These search terms are related to the topic of heart failure, its diagnosis, and management, as well as related conditions such as acute coronary syndrome and pulmonary edema. Specific medications like Sodium nitroprusside and interventions like Percutaneous coronary intervention and continuous positive airway pressure (CPAP) are also mentioned.

Related search terms could include:

  • Cardiac biomarkers
  • Echocardiogram
  • B-type natriuretic peptide (BNP)
  • Angiotensin-converting enzyme (ACE) inhibitors
  • Diuretics
  • Left ventricular assist device (LVAD)
  • Coronary artery disease (CAD)
  • Myocardial infarction (MI)
  • Chest pain
  • Smoking cessation
  • Pain management

It is important to note that seeking medical advice from a healthcare professional is always recommended for proper diagnosis and treatment.

It’s interesting, but is it useful?

I also took a load of documents related to pregnancy and UTI and asked ChatGPT to find the most popular and look for themes. It reported:

The most popular documents appear to be:

  • Interventions for preventing recurrent urinary tract infection during pregnancy (appears multiple times)
  • Urinary Tract Infections in Pregnancy (Treatment) (appears multiple times)
  • Treatments for symptomatic urinary tract infections during pregnancy (appears multiple times)

The themes that emerge from the list of documents include:

  • Urinary tract infections during pregnancy, including their prevention, treatment, and recurrence
  • Guidelines and recommendations for the management of UTIs in adults, particularly uncomplicated cases
  • Bacterial infections during pregnancy, including Group B Streptococcal Bacteriuria and sepsis
  • Azoospermic males and their work-up
  • Ectopic pregnancy and miscarriage.

The popular articles look good, less convinced by the themes (azoospermic males?)

One to reflect on, and if you have any ideas about what we could look at, please let us know.

Quality, quality, quality

Now the recoding of the site is in the past we’re looking forward to adding new features to the site and improving the quality of what is already there. This latter was brought into focus by a strong, but fair, email about issues with Trip. Perhaps there was an element of complacency on our part (subscribers are increasing, we had the rewrite under our belt and little negative feedback). Perhaps we took our eye off the ball. So, it was a great email to receive – removing the complacency and reminding us that quality is really important.

So, what have we done and/or are doing in the short-term:

  • As mentioned in the last blog, we’re working on a better de-duplication system. There is a small, but significant, number of duplicate articles which we will soon be removing [Work starting next week]
  • Dual categories – another blog post highlighting some articles are classified as, say, a ‘systematic review’ and ‘primary research’. This has now been fixed! [Fixed]
  • Mis-classified results – linked to the above, some articles were appearing in the wrong filter. So, a user might select a ‘guideline’ and be shown a ‘systematic review’! [Work starting next week]
  • Covid-19 – I really dropped the ball here! At the start of Covid-19 we tried to keep up with things, which meant some shortcuts were taken. To speed up the synonym aspect we hard-wired certain terms. This meant we avoided the main synonym function and resulted in early articles having their title words being double-counted – so early documents (particularly those mentioning ‘coronavirus’) were put at the top of the results. In the early days this wasn’t an issue but became problematic as time rolled on. This has now been fixed! [Fixed]
  • Ongoing systematic reviews and ongoing clinical trials. Our results are ordered by 3 main elements: (1) relevancy, (2) date and (3) publication score. Due to the sheer volume of ongoing trials and reviews (all with high relevancy to the search terms and often many in the last few years) they had started to dominate the results. So, what we have done is to significantly reduce the remaining element (publication score) with a view to pushing these results down. This has required a complete reindex of the site, which should hopefully be finished early next week [Ongoing].
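The effect of the last fix can be sketched with a toy ranking function. Everything here is an assumption for illustration: the post does not say how relevancy, date and publication score are combined, so the sketch simply adds three numbers between 0 and 1, and the values themselves are invented.

```python
from dataclasses import dataclass


@dataclass
class Result:
    title: str
    relevancy: float          # 0..1, how well the document matches the search
    recency: float            # 0..1, newer documents score higher
    publication_score: float  # 0..1, set per publication type (hypothetical scale)


def rank(results: list[Result]) -> list[str]:
    """Order titles by the sum of the three elements, highest first (assumed formula)."""
    ordered = sorted(
        results,
        key=lambda r: r.relevancy + r.recency + r.publication_score,
        reverse=True,
    )
    return [r.title for r in ordered]


guideline = Result("Published guideline", relevancy=0.8, recency=0.6, publication_score=0.8)

# Before: the ongoing trial is very relevant and very recent, so it comes out on top.
before = rank([guideline, Result("Ongoing trial", 0.9, 0.9, 0.5)])

# After: the same trial with a much lower publication score drops below the guideline.
after = rank([guideline, Result("Ongoing trial", 0.9, 0.9, 0.1)])
```

Because the publication score lives in the index rather than being computed at search time (at least in this sketch), changing it means rescoring every affected document, which is consistent with the full reindex mentioned above.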

So, lots has happened and lots more happening next week. Including a visit to the senders of the email. I suspect I’ll have another few things to think about after that!

But, if you’re reading this and think there are other problems with the site please, please, please reach out to me directly and let me know. We’ll hopefully make the site better for you and your fellow users…

What’s next on the Trip development front?

After a prolonged battle we have finally sorted out our email system (switching to AWS). A major problem was that, while over 150k people have registered for Trip, many of these registrations are over ten years old, so the email addresses are likely to have changed. So, we needed a very intensive effort to weed out the defunct ones. Now that’s done, we can look forward to what’s next. The two immediate bigger jobs we’ll tackle are:

De-duplication: This is part of our commitment to improving the quality of Trip (as opposed to new functionality). We’re aware of a small, but significant, number of duplicate links, for instance:

As you’ll see, results 11 and 13 are the same article. This is hopefully an easy fix as we need to make better use of DOIs in the de-duplication process.
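A minimal sketch of what DOI-based de-duplication could look like is shown here. It is not Trip’s actual process: the record layout and function names are assumptions, and it relies only on the fact that DOIs are case-insensitive and are often stored with a `doi:` or `https://doi.org/` prefix.

```python
import re
from typing import Optional

# Common ways a DOI gets prefixed when it is stored as a link or a citation string.
DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:\s*)")


def normalise_doi(raw: Optional[str]) -> Optional[str]:
    """Lower-case the DOI and strip known prefixes; return None if nothing is left."""
    if not raw:
        return None
    doi = DOI_PREFIX.sub("", raw.strip().lower())
    return doi or None


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each DOI; records without a DOI are always kept."""
    seen: set[str] = set()
    kept = []
    for record in records:
        key = normalise_doi(record.get("doi"))
        if key is not None:
            if key in seen:
                continue  # same article already kept under another link
            seen.add(key)
        kept.append(record)
    return kept


# Hypothetical records: the first three point at the same article.
records = [
    {"title": "Abstract page", "doi": "10.1000/ABC123"},
    {"title": "Full-text page", "doi": "https://doi.org/10.1000/abc123"},
    {"title": "Publisher copy", "doi": "doi: 10.1000/abc123"},
    {"title": "No DOI, first", "doi": None},
    {"title": "No DOI, second"},
]
unique = deduplicate(records)
```

Records without a DOI are deliberately left alone in this sketch; catching those would need a fallback such as comparing titles, which is where the harder cases are likely to be.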

Guidelines: As we know, not all guidelines are great and we’ve been talking about this project for over three years (see original posts Dec 2019 and Jan 2020). Well, we’re hoping to finally start work shortly on this. As well as assigning a quality score for guideline publishers we will also be creating a new category for European guidelines. We are also discussing removing the country filters for free users….!

After that, the next big piece of work is improving our advanced search – in consultation most people favour Ovid’s approach to advanced search. While we don’t use MeSH we can still learn lessons from their approach. But this is in the medium term, so watch this space!

ChatGPT and Trip

Many of you will have seen the hype around ChatGPT and other Large Language Models (LLMs). ChatGPT uses artificial intelligence (AI) to generate an answer to various prompts (such as questions), and the answer appears in real time and reads as if written by a human. After a huge amount of initial hype, things have died down a bit, in part because a number of issues were raised, mainly related to quality.

Irrespective of the problems we’re keen to have a play with this technology and as such we’re exploring using one such LLM called BioMedLM, a model specifically trained on biomedical data (in theory making it more accurate). The idea being that a user can ask a natural-language question and get an AI generated answer. Applying the trained model to just Trip data (as opposed to all of PubMed) should be an interesting experiment. At the very least the answers are more likely to be based on higher quality content than PubMed alone.

We’ve secured the funding to explore this and we’re just lining up the team to deliver. Being an optimist I’d like to think we can deliver on that and, if we do, we’ll be asking for volunteers to test it….!
