Trip Database Blog

Liberating the literature

Author

jrbtrip

LLMs and Clickstream

We’re still playing with LLMs and it’s proving interesting. We’re moving away from a chatbot approach (like ChatGPT) and have started exploring the use of these models on our clickstream data. We took the NICE guideline Acute heart failure: diagnosis and management and asked ChatGPT to analyse all the search terms used to find the document (in our sample this was 40 separate searches) and to suggest related concepts. It reported:

These search terms are related to the topic of heart failure, its diagnosis, and management, as well as related conditions such as acute coronary syndrome and pulmonary edema. Specific medications like Sodium nitroprusside and interventions like Percutaneous coronary intervention and continuous positive airway pressure (CPAP) are also mentioned.

Related search terms could include:

  • Cardiac biomarkers
  • Echocardiogram
  • B-type natriuretic peptide (BNP)
  • Angiotensin-converting enzyme (ACE) inhibitors
  • Diuretics
  • Left ventricular assist device (LVAD)
  • Coronary artery disease (CAD)
  • Myocardial infarction (MI)
  • Chest pain
  • Smoking cessation
  • Pain management

It is important to note that seeking medical advice from a healthcare professional is always recommended for proper diagnosis and treatment.

It’s interesting, but is it useful?

I also took a load of documents related to pregnancy and UTI and asked ChatGPT to find the most popular and look for themes. It reported:

The most popular documents appear to be:

  • Interventions for preventing recurrent urinary tract infection during pregnancy (appears multiple times)
  • Urinary Tract Infections in Pregnancy (Treatment) (appears multiple times)
  • Treatments for symptomatic urinary tract infections during pregnancy (appears multiple times)

The themes that emerge from the list of documents include:

  • Urinary tract infections during pregnancy, including their prevention, treatment, and recurrence
  • Guidelines and recommendations for the management of UTIs in adults, particularly uncomplicated cases
  • Bacterial infections during pregnancy, including Group B Streptococcal Bacteriuria and sepsis
  • Azoospermic males and their work-up
  • Ectopic pregnancy and miscarriage.

The popular articles look good, but I’m less convinced by the themes (azoospermic males?).

One to reflect on and if you have any ideas what we could look at, please let us know: jon.brassey@tripdatabase.com.

Quality, quality, quality

Now the recoding of the site is in the past we’re looking forward to adding new features to the site and improving the quality of what is already there. This latter was brought into focus by a strong, but fair, email about issues with Trip. Perhaps there was an element of complacency on our part (subscribers are increasing, we had the rewrite under our belt and little negative feedback). Perhaps we took our eye off the ball. So, it was a great email to receive – removing the complacency and reminding us that quality is really important.

So, what have we done, and what are we doing in the short term?

  • As mentioned in the last blog, we’re working on a better de-duplication system. There is a small, but significant, number of duplicate articles, which we will soon be removing [Work starting next week]
  • Dual categories – another blog post highlighted that some articles are classified as both, say, a ‘systematic review’ and ‘primary research’. This has now been fixed! [Fixed]
  • Mis-classified results – linked to the above, some articles were appearing in the wrong filter. So, a user might select a ‘guideline’ and be shown a ‘systematic review’! [Work starting next week]
  • Covid-19 – I really dropped the ball here! At the start of Covid-19 we tried to keep up with things, which meant some shortcuts were taken. To speed up the synonym aspect we hard-wired certain terms. This bypassed the main synonym function and resulted in the title words of early articles being double-counted – so early documents (particularly those mentioning ‘coronavirus’) were put at the top of the results. In the early days this wasn’t an issue but it became problematic as time rolled on. This has now been fixed! [Fixed]
  • Ongoing systematic reviews and ongoing clinical trials. Our results are ordered by three main elements: (1) relevancy, (2) date and (3) publication score. Due to the sheer volume of ongoing trials and reviews (all with high relevancy to the search terms and often many from the last few years), they had started to dominate the results. So, what we have done is to significantly reduce the remaining element (publication score) for them, with a view to pushing these results down. This has required a complete reindex of the site, which should hopefully be finished early next week [Ongoing].
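To make the ranking change in the last point concrete, here is a minimal sketch in Python. It assumes the three elements are simply added together, which is an illustration only and not a description of how Trip actually combines them. The function names, field names and numbers are all made up.

```python
def rank_score(relevancy, recency, publication_score):
    # Assumption: the three elements are summed. The real combination is not
    # described in this post.
    return relevancy + recency + publication_score


def order_results(results):
    # Highest combined score first.
    return sorted(
        results,
        key=lambda r: rank_score(r["relevancy"], r["recency"], r["publication_score"]),
        reverse=True,
    )


# Hypothetical records with made-up scores on a 0-1 scale.
guideline = {"title": "Guideline", "relevancy": 0.8, "recency": 0.6, "publication_score": 0.9}
ongoing_trial = {"title": "Ongoing trial", "relevancy": 0.9, "recency": 0.9, "publication_score": 0.6}

before = [r["title"] for r in order_results([guideline, ongoing_trial])]

# Lowering only the publication score of the ongoing trial pushes it down.
demoted_trial = dict(ongoing_trial, publication_score=0.1)
after = [r["title"] for r in order_results([guideline, demoted_trial])]
```

In this toy example the ongoing trial is very relevant and very recent, so only the publication score is left as a lever to move it below the guideline.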

So, lots has happened and lots more is happening next week, including a visit to the senders of the email. I suspect I’ll have another few things to think about after that!

But, if you’re reading this and think there are other problems with the site please, please, please reach out to me directly at jon.brassey@tripdatabase.com and let me know. We’ll hopefully make the site better for you and your fellow users…

What’s next on the Trip development front?

After a prolonged battle we have finally sorted out our email system (switching to AWS). A major problem was that, while over 150k people have registered for Trip, many of these registrations are over ten years old, so the email addresses are likely to have changed. We needed a very intensive effort to weed the defunct ones out. Now that’s done we can look forward to what’s next. The two immediate bigger jobs we’ll tackle are:

De-duplication: This is part of our commitment to improving the quality of Trip (as opposed to new functionality). We’re aware of a small, but significant, number of duplicate links, for instance:

As you’ll see, results 11 and 13 are the same article. This is hopefully an easy fix as we need to make better use of DOIs in the de-duplication process.
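As a rough illustration of DOI-based de-duplication, here is a minimal Python sketch. It is not our actual implementation: the record layout (a dict with an optional `doi` field) and the function names are assumptions made for the example. It relies on DOIs being case-insensitive and often appearing with a `https://doi.org/` or `doi:` prefix.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Return a lower-cased DOI without resolver prefix, or None if missing."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def deduplicate(results):
    """Keep the first result for each DOI; results without a DOI are always kept."""
    seen = set()
    unique = []
    for result in results:
        key = normalise_doi(result.get("doi"))
        if key is not None:
            if key in seen:
                continue  # same DOI already shown higher up the list
            seen.add(key)
        unique.append(result)
    return unique
```

Records without a DOI are left alone in this sketch, so they would still need some other check (for example on title and source).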

Guidelines: As we know, not all guidelines are great and we’ve been talking about this project for over three years (see original posts Dec 2019 and Jan 2020). Well, we’re hoping to finally start work shortly on this. As well as assigning a quality score for guideline publishers we will also be creating a new category for European guidelines. We are also discussing removing the country filters for free users….!

After that, the next big piece of work is improving our advanced search – in our consultation most people favoured Ovid’s approach to advanced search. While we don’t use MeSH we can still learn lessons from their approach. But this is in the medium term, so watch this space!

ChatGPT and Trip

Many of you will have seen the hype around ChatGPT and other Large Language Models (LLMs). ChatGPT uses artificial intelligence (AI) to generate an answer to various prompts (such as questions), and the answer reads as if written by a human and appears in real time. After a huge amount of initial hype things have died down a bit, in part because a number of issues were raised, mainly related to quality.

Irrespective of the problems we’re keen to have a play with this technology and as such we’re exploring using one such LLM called BioMedLM, a model specifically trained on biomedical data (in theory making it more accurate). The idea being that a user can ask a natural-language question and get an AI generated answer. Applying the trained model to just Trip data (as opposed to all of PubMed) should be an interesting experiment. At the very least the answers are more likely to be based on higher quality content than PubMed alone.

We’ve secured the funding to explore this and we’re just lining up the team to deliver. Being an optimist I’d like to think we can deliver on that and, if we do, we’ll be asking for volunteers to test it….!

Confusing dual categories

In the image below you will see that the user has restricted the category to ‘Primary research’:

So, you can understand their confusion when the top result is classed as a ‘Systematic review’. This is not the first time it has been raised and, as it’s a recurring source of confusion, we need to act!

The issue arises because we treat ‘Primary research’ to mean ‘Journal articles’. So, in the example above the systematic review was published in a journal – so it gets two categories and we display the highest.

So, our plan is to make sure that any dual category article (primary research and systematic review) is treated only as a systematic review. This means a stricter interpretation of ‘Primary research’ (excluding systematic reviews, which are secondary research).
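The planned rule can be sketched in a few lines of Python. The category names come from this post. The function names and the idea of holding an article's categories as a set are assumptions made for illustration.

```python
PRIMARY = "Primary research"
SYSTEMATIC_REVIEW = "Systematic review"


def resolve_categories(categories):
    """Treat a dual-category article as a systematic review only."""
    resolved = set(categories)
    if PRIMARY in resolved and SYSTEMATIC_REVIEW in resolved:
        resolved.discard(PRIMARY)
    return resolved


def matches_filter(categories, selected_category):
    """Would this article be shown when the user restricts to one category?"""
    return selected_category in resolve_categories(categories)
```

With this rule, a systematic review published in a journal no longer appears when the user restricts the results to ‘Primary research’, but it still appears under ‘Systematic review’.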

No timeline on this but it’s in planning. And thank you to Adam T for raising it with us.

Full text articles in Trip

Trip tries to link to complete articles where these are available. For many evidence types (eg clinical guidelines and evidence-based synopses) this is fairly straightforward. However, when you look at articles obtained from journals it gets complicated as many are behind a paywall, with only the abstract freely available. But Trip has a few bits of functionality that can help.

Here are the main ways of linking to full-text articles in Trip:

Pro subscription: Working with third-party organisations we identify freely available full text (from places like PubMed Central, other full-text repositories and authors’ archived copies). If a Pro user searches Trip we always place the full-text, if we have it, as the main link-out. Users can still find the PubMed abstract via the link to the right of the article:

Approximately 70% of all the journal articles we include in Trip link out to full text versions. In the top example, if the user wishes to see the abstract, they simply click on the abstract icon (the one on the right hand side).

Free users have a different experience as they would see this:

This shows, to the user, that we have the full text in our system and if they were to subscribe they could link out to it directly.

Link resolvers: These are a tool to link articles in Trip to an organisation’s subscription journals. So, if an institution has a subscription to, say, NEJM it makes sense that a user from that organisation can easily access the full-text. Unfortunately, this technology is ‘dumb’ in that it doesn’t know if the institution actually subscribes to the journal or not. So, it’s a bit hit and miss. As such we use this icon to signify that you may be able to access a full text copy:

The title of the document still links to the PubMed abstract, while the icon is displayed to the right of the title and the user needs to click on that to attempt to access the full text. Often, if the full-text isn’t available, there is a library holding page which offers further support regarding accessing the full text.

LibKey: Is certainly not ‘dumb’ as it knows – thanks to the work of an organisation’s librarians – what journals an organisation subscribes to. LibKey is a great tool and the team behind it (Third Iron) are great to work with! When an institution has LibKey we work with them to ensure we seamlessly link out to known subscriptions. But, two things to point out:

  • There is no special icon to signify it’s a LibKey link out – you’ll just see the usual ‘full text’ icon (as mentioned above).
  • If we have the full-text in our repository we link to that, in preference to any LibKey links. So, often (as we have full text links to around 70% of all articles) you’ll not find many link outs via LibKey! Notwithstanding that, LibKey is still a great resource to use and one we know our users appreciate.
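The order of preference described above can be sketched as follows. This is a simplified Python illustration and not our production logic. The `Article` fields and the `main_link` function are hypothetical, and the sketch assumes LibKey links are used whenever an institution has them and no repository full text is being shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    abstract_url: str                               # e.g. the PubMed abstract
    repository_full_text_url: Optional[str] = None  # full text held in our repository
    libkey_url: Optional[str] = None                # known institutional subscription


def main_link(article: Article, is_pro: bool) -> str:
    """Pick the main link-out for an article title."""
    # Pro users get our repository full text first, in preference to LibKey.
    if is_pro and article.repository_full_text_url:
        return article.repository_full_text_url
    # Otherwise fall back to a known subscription via LibKey (assumed behaviour).
    if article.libkey_url:
        return article.libkey_url
    # Failing that, the title links to the abstract.
    return article.abstract_url
```

Because the repository check comes first and covers around 70% of articles, the LibKey branch is reached relatively rarely, which is why few LibKey link outs are seen in practice.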

If you have any further questions on full-text options then email us at fulltext@tripdatabase.com.

 

We’re back

We’re still trying to unpick what went on but the important thing is that Trip is back!

That was the longest outage, by a huge margin, in our history. This has been unpleasant for everyone and I’m sorry about that!

The site is down and has been for over 12 hours…

We are really sorry about our site being down.

The issue is with our server, hosted by Rackspace. Yesterday afternoon we had to restart it (a process that normally takes a couple of minutes) and doing so triggered an automatic update of the Windows software – this took ages, got to 100% and then just stopped working. Rackspace engineers then tried to get it to work but they have clearly been unsuccessful! The last we heard they are trying to roll back the changes. We are trying everything we can to get Rackspace to speed things up!

Once again, we’re sorry for this loss of service.

PICO search – it’s a special type of search

Introduced over ten years ago, the PICO search is very popular with our users. However, it is frequently misunderstood. Yesterday we received an email from a user:

We are finding different results between advanced search and PICO search. We are using the same terms. Is this possible? Is there any way to solve it?

So, what’s going on? PICO search is not designed to be an exhaustive search; it is designed to find a small number of highly relevant results. Or, to put it another way, it’s a very specific search, not a sensitive one.

At the heart of the PICO search is something called contingency searching, which helps us deliver a specific search. After the user enters their search terms, our first search treats all the PICO elements as title-only searches. If there are too few results, we make the final search term entered a ‘title and text’ search and repeat the search; if that too has too few results, we make the penultimate term a ‘title and text’ search and repeat, and so on until we get a manageable number of results. All these repeated searches are done in the background; from a user’s perspective it’s a single search.
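For those who like to see the logic written down, here is a minimal Python sketch of contingency searching as described above. It is an illustration, not Trip's code. `run_search` stands in for whatever actually queries the index, the `min_results` threshold of 10 is an invented number, and the sketch assumes that a term stays widened once it has been switched to ‘title and text’.

```python
TITLE = "title"
TITLE_AND_TEXT = "title_and_text"


def contingency_search(terms, run_search, min_results=10):
    """Start with title-only searches, then widen terms from last to first.

    `run_search` is a hypothetical callable that takes a list of
    (term, scope) pairs and returns a list of results.
    """
    scopes = [TITLE] * len(terms)
    results = run_search(list(zip(terms, scopes)))

    # Widen the final term first, then the penultimate, and so on.
    for i in range(len(terms) - 1, -1, -1):
        if len(results) >= min_results:
            break  # enough results, stop widening
        scopes[i] = TITLE_AND_TEXT
        results = run_search(list(zip(terms, scopes)))

    return results
```

The caller only ever sees the final list, which matches the point that the repeated searches happen in the background.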

So, if you want lots of results use the default or advanced search. However, if you want a more focussed set of results use PICO (although the default search is pretty good as well)!

NOTE: Do not use Boolean search terms (AND, OR, NOT).
