
Trip Database Blog

Liberating the literature

Month: April 2024

Using document clustering to show evolution of a topic area

Mpox (formerly known as Monkeypox) was designated a Public Health Emergency of International Concern (PHEIC) by the WHO from 23 July 2022 to 10 May 2023. Having used the Carrot2 technology to cluster text (as shown yesterday), we thought it might be interesting to compare the clusters before and after the outbreak. So, we did two searches:

  1. Documents with Mpox or Monkeypox in the title published between 1980 and 2021. This yielded 104 results.
  2. Documents with Mpox or Monkeypox in the title published between 2022 and 2024. This yielded 679 results, showing a huge interest in the topic.

1980-2021

2022-2024

Or, looking at the top ten topic areas in a different format:

The numbers before 2022 are small, but there is a clear difference in topic areas between the two periods. And, as an EBM source, it is nice to see systematic reviews featuring prominently.
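As a rough illustration of the before/after comparison, the sketch below splits title records by publication year into the two periods and counts the most frequent title terms in each. It is not how Carrot2 builds its clusters; it is a crude term-frequency proxy, and the records, stopword list and function names (`split_by_period`, `top_terms`) are hypothetical.

```python
from collections import Counter
import re

# Hypothetical stopword list; the search terms themselves are dropped because
# every matching title contains one of them.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with",
             "mpox", "monkeypox"}

def terms(title):
    """Lower-case word tokens of a title, de-duplicated in first-seen order."""
    words = re.findall(r"[a-z]+", title.lower())
    return list(dict.fromkeys(w for w in words if w not in STOPWORDS))

def split_by_period(records, start, end):
    """Titles of (year, title) records published in [start, end], inclusive."""
    return [title for year, title in records if start <= year <= end]

def top_terms(titles, n=10):
    """Top-n terms by document frequency (a term counts once per title)."""
    df = Counter()
    for title in titles:
        df.update(terms(title))
    return df.most_common(n)

# Toy records for illustration only; these are not the Trip search results.
records = [
    (2003, "Monkeypox outbreak in the United States"),
    (2017, "Monkeypox virus in Nigeria: an outbreak report"),
    (2022, "Tecovirimat for mpox: a systematic review"),
    (2023, "Mpox vaccine effectiveness: a systematic review"),
    (2023, "Mpox transmission in sexual networks"),
]

before = top_terms(split_by_period(records, 1980, 2021), n=3)
after = top_terms(split_by_period(records, 2022, 2024), n=3)
print("1980-2021:", before)
print("2022-2024:", after)
```

With these toy records, the earlier period is led by "outbreak" and the later one by "systematic" and "review"; real topic clustering would group related terms into labelled themes rather than counting single words.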

Clustering search results using Carrot2

I’ve been wanting to use Carrot2 for ages and have finally got the chance. As it says on the Carrot2 website, “Carrot2 organizes your search results into topics. With an instant overview of what’s available, you will quickly find what you’re looking for.”
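Carrot2’s own algorithms (such as Lingo) are considerably more sophisticated, but a minimal sketch of the general idea, grouping result titles under frequent shared terms, might look like the following. The function name `cluster_by_frequent_terms`, the stopword list and the thresholds are hypothetical; the labels here are single words rather than the phrase labels Carrot2 produces, and a title can sit in more than one group.

```python
from collections import Counter
import re

# Hypothetical stopword list for illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def tokens(text, ignore=frozenset()):
    """Lower-case word tokens, minus stopwords and ignored terms, de-duplicated."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(dict.fromkeys(w for w in words if w not in STOPWORDS and w not in ignore))

def cluster_by_frequent_terms(titles, query="", min_docs=2, max_clusters=10):
    """Group titles under frequent shared terms (a crude stand-in for topic clustering).

    Query terms are ignored because every result is likely to contain them.
    A title can appear under several labels; titles matching none go to 'Other'.
    """
    ignore = set(tokens(query))
    doc_terms = [tokens(t, ignore) for t in titles]
    # Document frequency: how many titles contain each term.
    df = Counter(term for ts in doc_terms for term in ts)
    labels = [term for term, n in df.most_common() if n >= min_docs][:max_clusters]
    clusters = {label: [t for t, ts in zip(titles, doc_terms) if label in ts]
                for label in labels}
    unassigned = [t for t, ts in zip(titles, doc_terms)
                  if not any(label in ts for label in labels)]
    if unassigned:
        clusters["Other"] = unassigned
    return clusters

# Toy titles for illustration only; these are not real Trip results.
titles = [
    "PSA screening for prostate cancer: a systematic review",
    "Overdiagnosis in prostate cancer screening",
    "MRI before biopsy in prostate cancer screening",
    "PSA testing and overdiagnosis",
    "Shared decision making for PSA screening",
]

clusters = cluster_by_frequent_terms(titles, query="prostate cancer screening")
for label, members in clusters.items():
    print(f"{label} ({len(members)})")
```

On the toy titles this prints `psa (3)`, `overdiagnosis (2)` and `Other (1)`; note that “PSA testing and overdiagnosis” appears in both the psa and overdiagnosis groups.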

So, here are two examples. The first is for the search ‘prostate cancer screening’: we used Carrot2 to process just over 250 results, and this is the output:

Treemap view

List view

The second example uses just under 700 results for the search ‘arteriovenous malformation’:

Treemap view

List view

I can’t help feeling this topic clustering might be useful in a number of situations. For instance, it could be a nice way of refining your search, or of giving you an overview of the topics covered. Let me know what you think, either via the comments or by email at jon.brassey@tripdatabase.com.

Moving to the cloud

Since the start of 2024, the vast majority of our development time has been taken up with moving Trip onto ‘the cloud’. We currently use a dedicated server; it has served us well, but it was getting old and needed replacing, so moving to the cloud was a no-brainer. We didn’t expect it to take quite so long, and we’re currently having to reindex over 5 million documents. This process should be finished in the next day or so, and then it’ll be on to testing, which will hopefully not take long.

Broadly, users shouldn’t notice any difference. We’ve upgraded the underlying search software, which has improved relevancy scoring, so the order of results might change, but I can’t imagine the change being huge. I’ll post an update, if necessary, during testing.

Sorry, this is a boring post saying we’re busy doing stuff in the background that you’ll not notice, but I wanted to be transparent. Once this is done, we can move on to forward-facing developments.

Understanding simple searches – poll

Our previous post highlighted the fact that many of our top searches are for single concepts e.g. asthma, pregnancy, aspirin. In some situations this sort of search is fine but in others it might be imprecise.

If you use simple searches, could you please answer the following question:

If your reason is not listed above, please let us know, either via the comments or by email at survey@tripdatabase.com.
