
Trip Database Blog

Liberating the literature

Author: jrbtrip

Connected articles: personalised search results

Connected articles launched at the weekend and we’ve explained how it works here. But this blog is more about why we’ve introduced it and the benefits.

Every time you search and click on an article the system starts to ‘understand’ your interests. This is important as it can be very difficult to convey, via a handful of search terms, what your intention is. In search, user intention is vitally important. Two users might both search for the same thing e.g. prostate cancer screening yet one is interested from the public health perspective while the other might be interested in the best test to use.

However, while search terms might hide the intention, a user’s clicks quickly reveal it: users click on documents they feel might answer the question they have. So, someone searching from a public health perspective might click on articles that discuss the cost-benefit of screening at a population level, while someone else might click on articles comparing PSA to DRE.

The Secret Sauce: Co-Clicks, Semantics, and Citations

So, what makes Connected articles so clever? Three words: co-clicks, semantics, and citations.

Our system takes the data available from the above sources and combines it using a special algorithm to ensure the most closely connected articles appear at the top.

Ok, so why should you care?

  • You’ll find articles your keyword search might have missed. You might have searched for atrial fibrillation but a closely related – and useful – article is on arrhythmias. Different clinical terms but closely connected.
  • You’ll save time. Results more focussed on your interests will mean fewer articles to look at to obtain the answers you need.
  • Safety! Whatever search you’ve done – superficial or in-depth – Connected articles is a really useful tool to ensure you’ve not missed really important articles!

An example of using Connected articles

Using a search for urinary tract infections, we clicked on three articles on a similar topic (can you guess what that might be?). Below are the top 6 results, but you can scroll down through the results and see many more:

And the topic? I’m sure you can see it: the results have been concentrated down to cranberry juice and UTIs!

Convinced? Curious? Sceptical? Give Connected articles a try and tell us what you think. Find an unexpected article that delighted or surprised you? Share it with us! We want you to be delighted!

Understanding Connected Articles

NEW: Watch our explainer video.

Connected articles is a system designed to find articles similar or linked to articles a user has already clicked on. This can be incredibly useful, as users often search in an imprecise way, and if they only look at a page or two of results they may miss some important articles. Connected articles looks for connections between documents and helps unearth hidden or missed gems!

Connected uses three sources of information to find the connections:

  • Clickstream data – when a user searches and clicks on more than one document, we infer that the documents clicked are connected by the user’s intention.
  • Citation data – for every document clicked we explore the articles it cites (its reference list) and we also look for articles that have cited it. This is known as backward and forward citation searching. This data is restricted to articles that have a DOI (over 95% of them).
  • Related articles – we use the data from PubMed’s “similar articles” feature and grab the top 20 articles deemed to be most semantically similar to the article(s) clicked.
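
To make the clickstream idea concrete, here’s a minimal Python sketch (not Trip’s actual code) that counts how often two documents are clicked in the same search session:

```python
from collections import Counter
from itertools import combinations

def co_click_counts(sessions):
    """Count how often each pair of documents was clicked in the
    same search session. `sessions` is a list of per-session lists
    of document IDs -- a simplified stand-in for real clickstream
    logs."""
    pairs = Counter()
    for clicked in sessions:
        # Every unordered pair clicked in one session is evidence
        # that the two documents share the user's intention.
        for a, b in combinations(sorted(set(clicked)), 2):
            pairs[(a, b)] += 1
    return pairs

sessions = [["doc1", "doc2", "doc3"], ["doc2", "doc3"], ["doc2", "doc3"]]
print(co_click_counts(sessions).most_common(1))  # [(('doc2', 'doc3'), 3)]
```

The more sessions in which two documents are clicked together, the stronger we can treat the connection.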

We take these three types of connections and combine them, using a special algorithm, to create a list of results. Those deemed most closely connected – using all three sources – appear at the top:

NOTE: Free users of Trip only see the first three results. Pro subscribers get no such restriction.
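
As an illustration of the combining step, here’s a Python sketch. The weights and scoring are invented for the example – the actual algorithm isn’t public – but the shape of the idea is the same: each source contributes a score per article, and the weighted totals decide the ranking.

```python
def combine_connections(co_clicks, citations, similar,
                        weights=(0.5, 0.3, 0.2)):
    """Blend three per-article connection scores into one ranked
    list. Each argument maps article ID -> strength of connection
    from that source; the weights are purely illustrative."""
    w_click, w_cite, w_sim = weights
    scores = {}
    for source, weight in ((co_clicks, w_click),
                           (citations, w_cite),
                           (similar, w_sim)):
        for doc, strength in source.items():
            scores[doc] = scores.get(doc, 0.0) + weight * strength
    # Most closely connected articles first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = combine_connections({"a": 2}, {"a": 1, "b": 3}, {"b": 1})
print([doc for doc, _ in ranked])  # ['a', 'b']
```

Article “a” ranks first because it is supported by two sources (co-clicks and citations), which is exactly the “using all three sources” behaviour described above.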

Connected articles: release imminent

We are planning on releasing our Connected articles feature on Sunday.

Connected articles is an amazing tool to unearth articles closely connected to those a user has already clicked. It uses three types of connection data:

  • Clickstream/co-click data
  • Citations (forward and backward)
  • Semantic similarity

For every article opened by a user we look for connections from any of the three sources mentioned above. These are then algorithmically combined and presented to the user. The results will look like this:

Advanced search – is this about right?

We’ve been working on the advanced search for a while now and it has undergone one round of testing in an un-designed format. Based on feedback we’ve arrived at something like this:

The bottom half is where you build individual lines of queries and these appear at the top of the page. You then combine them using the AND, OR and NOT commands.
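
The line-combining step might look something like this in Python – a sketch only, since the real interface and query syntax may differ:

```python
def combine_lines(lines, operators):
    """Combine individual query lines with AND/OR/NOT, as in the
    advanced-search builder described above. A sketch: the real
    Trip syntax and precedence handling may differ."""
    if len(operators) != len(lines) - 1:
        raise ValueError("need one operator between each pair of lines")
    query = f"({lines[0]})"
    for op, line in zip(operators, lines[1:]):
        if op not in ("AND", "OR", "NOT"):
            raise ValueError(f"unsupported operator: {op}")
        query += f" {op} ({line})"
    return query

print(combine_lines(
    ["diabetes OR 'type 2 diabetes'", "metformin", "pregnancy"],
    ["AND", "NOT"],
))
# (diabetes OR 'type 2 diabetes') AND (metformin) NOT (pregnancy)
```

Parenthesising each line keeps the ORs within a line from leaking across the ANDs and NOTs that join the lines.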

What do you think?

Advanced search update

After the recent post asking for volunteers, we have had a number of users testing the advanced search. We have finished analysing their feedback and most of it relates to design and usability. This is good news as it means things will probably move more quickly (no massive debugging operation is needed).

So, we will shortly pass the feedback on to our designer and after that we can deploy. Hopefully this will be by early November.

More experiments with LLMs/ChatGPT

This is really interesting. We’re busy looking at ways of using the power of LLMs to improve the user experience on Trip. One recent set of experiments relates to automated evidence reviews. While the following was semi-automatic, it is entirely possible to fully automate it. The review explored brexpiprazole in schizophrenia; the overall report was three pages long and included asking the system to generate both a conclusion and a clinical bottom line:

Clinical bottom line: Brexpiprazole shows efficacy in treating schizophrenia with most randomized controlled trials reporting it as ‘effective’, especially at non-low doses. However, its effectiveness compared to certain other antipsychotics can vary, and further studies are needed to assess its cognitive benefits.

Conclusion: Brexpiprazole’s efficacy in treating schizophrenia has been studied in several contexts. A systematic review from 2022 has raised concerns about the efficacy of low-dose brexpiprazole as it was not found to be superior to placebo or standard-dose brexpiprazole. However, another systematic review from 2023 suggests that while brexpiprazole may be less effective than a specific dose of lurasidone (160 mg), it generally demonstrates comparable efficacy and safety to both lurasidone and cariprazine.

Guidelines have noted some promising indications of brexpiprazole’s efficacy on cognitive performance in the context of second-generation antipsychotic compounds. Still, they have highlighted the need for further studies to accurately evaluate the extent of such benefits.

Randomized controlled trials (RCTs) have provided mixed but mostly positive feedback. The majority of the RCTs mentioned rated the effectiveness of brexpiprazole as ‘effective’. Notably, these trials reported consistent improvements in PANSS scores, maintained improvements over time, and a favorable side-effect profile. Only one RCT gave a ‘neutral’ rating, mainly due to mixed findings and potential bias introduced by the known side effects of an active reference.

Taking into account the weight typically given to systematic reviews and guidelines, the evidence suggests that brexpiprazole has some efficacy in treating schizophrenia, especially when not considering the low-dose. However, its effectiveness compared to some other antipsychotic compounds may vary, and further studies are recommended to determine its full range of benefits, especially in cognitive performance.

This is amazing technology and we’ll continue to test the capabilities.

Advanced search: volunteers need to test!

We have just pushed out the revamped Advanced Search onto our test site:

We’re testing it in-house but are keen to get some external users to test it. So, if you are interested please contact me via jon.brassey@tripdatabase.com.

Relevancy: Experiments with ChatGPT

As mentioned in the previous post we have been spending a lot of effort trying to improve our search and the current focus (possibly obsession) is removing low relevancy results from the search.

TL;DR: long documents might mention the search term only once in, say, 50,000 words. In that situation it’s almost an incidental result – it’s still a true hit, as it contains the user’s search terms, but it’s irrelevant to the user’s intention. One approach we have tried is to create a pseudo-abstract of guidelines – typically long documents – by removing terms not linked to the core themes of the guideline, and see how that fared. Here’s an example search taken from our testing site:

This image shows a search for diabetes and metformin; it returns 19 UK guidelines and the top results all look good. However, one was Guidelines for the investigation of chronic diarrhoea in adults, which mentions metformin once and diabetes 8 times in a 21-page document. So, another example of a poor match! This next image shows what happened when we searched just the ChatGPT summary:

3 results, so 16 were removed, including Guidelines for the investigation of chronic diarrhoea in adults. So far, so good. However, it also removed Diabetes (type 1 and type 2) in children and young people: diagnosis and management, from NICE – a 91-page document that mentions metformin 28 times. It is entirely feasible that a user searching for diabetes and metformin would think the NICE document relevant!

Bottom line: Using the ChatGPT summary, as we have, means the search is too specific. So, on to the next approach….

Relevancy

Relevancy is a key element of search. A user types a term and the intention is to retrieve documents related to the term(s) used. But relevancy is relative and in the eye of the beholder. If someone searches for measles and the document has measles in the title, then it’s clearly relevant. But there might be another document, about infectious diseases, with a chapter on measles. The document is 10,000 words long and has 50 mentions of measles = 0.5%. So, that seems a reasonable match.

But what about a 100,000-word document, entitled prostate cancer, which mentions measles once = 0.001%? The document is a true match – in that it mentions the search term – but the reality is it’s clearly not about measles. Another example from a recent presentation I gave:

It’s a contrived example, but helps illustrate the issue!
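
The two figures above boil down to a simple term-density calculation – here as a tiny Python sketch of the crude relevancy signal being discussed:

```python
def term_density(mentions, total_words):
    """Share of a document's words (as a percentage) that are the
    search term -- the crude relevancy signal discussed above."""
    return 100 * mentions / total_words

print(term_density(50, 10_000))   # 0.5   -> plausibly about measles
print(term_density(1, 100_000))   # 0.001 -> almost certainly incidental
```

A raw keyword match treats both documents identically; the density makes the difference obvious.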

For most searches this isn’t really a big deal as most of the time the top results will always be relevant. If the search returns 50 pages of results the low relevancy results will appear towards the end of the search – say from page 40. Not many people go to that results page – so it’s not an issue.

However, it is an issue when you have few results – either from a very specific search or after clicking a filter (e.g. UK guidelines). If 75% of the results are relevant and 25% poor, you can see some fairly poor results even on the first page. True hits, as they contain the search terms, but not really relevant to the user’s intention!

So, we’re exploring multiple options to help – for instance, an alternative search button. Another approach is to summarise long documents into shorter ones, removing very low-frequency words. We’ve experimented with ChatGPT and it summarised too much, so the search went from too sensitive to too specific. So, another approach is text analysis: explore word frequency (how often a word appears in a document) and remove those terms that are rarely mentioned – perhaps those mentioned only 1-3 times, depending on the document length.

We took one NICE guideline and analysed the frequency of words across the document and it looks like this:

The Y-axis denotes the number of times words appear in the document. With more granularity:

So, we’re going to run some tests where we remove terms mentioned once, twice or three times (three separate tests). These don’t remove many terms but will hopefully remove the ones that cause problematic sensitivity. In the search example above, removing terms that appear once would remove the term measles, while removing terms mentioned twice would remove prostate cancer.
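
The pruning step itself is straightforward – here’s a rough Python sketch of the frequency idea (real indexing would also handle stemming, phrases and stop-words):

```python
import re
from collections import Counter

def strip_rare_terms(text, min_count=2):
    """Remove words that appear fewer than min_count times, so
    incidental mentions no longer register as search hits. A rough
    sketch of the frequency-pruning idea described above."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return " ".join(w for w in words if counts[w] >= min_count)

doc = "measles measles prostate cancer guidance on measles vaccination"
print(strip_rare_terms(doc, min_count=2))  # measles measles measles
```

Raising `min_count` makes the index more aggressive: the three planned tests correspond to `min_count` values of 2, 3 and 4.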

This issue has been frustrating me for years so hopefully we’re edging closer to solving it!
