Trip Database Blog

Liberating the literature

August 2016

Experiments in machine learning at Trip

At Trip we like to ‘muck around’ with new techniques to make the site even better. Sometimes there is a clear reason and other times it’s just to explore these techniques to see what they can offer. Currently we’re doing lots of work involving machine learning, and recently we released our work on the automated assessment of bias in RCTs. Here are a few other things we’re involved in:

Word2Vec: Completely speculative and I have no idea what the output will be (I believe it looks for similarities and relationships between words/concepts). This is work with Vienna University of Technology (TUW) as part of our Horizon 2020 funded KConnect project. There is loads of hype around this technique, so we thought it was too good an opportunity not to get involved.
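For anyone wondering what ‘similarities and relationships between words’ means in practice, here is a minimal sketch of the idea using the gensim library (version 4.x). This is not the TUW/KConnect pipeline, and the toy sentences are invented purely for illustration: you train a Word2Vec model on tokenised text, then ask which terms sit closest together in the resulting vector space.

```python
# Minimal Word2Vec sketch using gensim (4.x). The corpus below is a toy
# stand-in for real clinical text, so the output is only illustrative.
from gensim.models import Word2Vec

sentences = [
    ["statins", "reduce", "cholesterol", "and", "cardiovascular", "risk"],
    ["metformin", "is", "first", "line", "treatment", "for", "type", "2", "diabetes"],
    ["aspirin", "for", "secondary", "prevention", "of", "myocardial", "infarction"],
]

# Words that appear in similar contexts end up close together in the vector space
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, epochs=50)

print(model.wv.most_similar("cholesterol", topn=3))
```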

Learning to Rank: Again with TUW, this is a much more understandable technique: a machine learning approach used to improve the ordering of search results. It’s one of a number of algorithm tweaks we’re attempting, and all will be thoroughly tested using interleaving or A/B testing (probably the former).
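To give a flavour of what interleaving involves, here is a generic team-draft interleaving sketch (not the evaluation code TUW or Trip will actually use): results from the current ranker and the learning-to-rank ranker are blended into one list, and each click is credited to whichever ranker contributed the clicked result.

```python
# Team-draft interleaving sketch: blend two rankings and credit clicks
# to the ranker ("A" or "B") that contributed each clicked document.
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Blend two rankings; return the blended list plus a doc -> ranker map."""
    interleaved, team = [], {}
    count_a = count_b = 0
    all_docs = set(ranking_a) | set(ranking_b)
    while len(interleaved) < len(all_docs):
        # The ranker with fewer contributions so far (ties broken by coin flip)
        # adds its highest-ranked document that hasn't been shown yet.
        a_turn = count_a < count_b or (count_a == count_b and random.random() < 0.5)
        source = ranking_a if a_turn else ranking_b
        doc = next((d for d in source if d not in team), None)
        if doc is None:  # that ranker has nothing new left, so the other picks
            a_turn = not a_turn
            source = ranking_a if a_turn else ranking_b
            doc = next(d for d in source if d not in team)
        team[doc] = "A" if a_turn else "B"
        interleaved.append(doc)
        if a_turn:
            count_a += 1
        else:
            count_b += 1
    return interleaved, team

def credit_clicks(clicked_docs, team):
    """Credit each click to the ranker that contributed the clicked document."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team:
            wins[team[doc]] += 1
    return wins

# Hypothetical example: A is the current algorithm, B the learning-to-rank one
current = ["doc1", "doc2", "doc3", "doc4"]
learned = ["doc2", "doc5", "doc1", "doc6"]
blended, team = team_draft_interleave(current, learned)
print(blended)
print(credit_clicks(["doc5"], team))  # a click on doc5 credits ranker B
```

Over many searches, the ranker whose contributions attract more clicks is judged the better one, which is why interleaving tends to need less traffic than a straight A/B test.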

Document summarisation: Another speculative venture.  Yesterday I saw that Google have opened up something called TensorFlow to support document summarisation.  This is something I’ve been interested in for a while so I contacted my freelance machine learning contact and we agreed to give it a go (he did most of the work on our 5 minute systematic review system).  I’m not sure how document summarisation fits in with Trip but seeing outputs can only help me figure it out.

Hopefully we’ll start seeing results on all these projects before the end of the year.

One important thing to point out (and something I relish) is Trip’s ability to get involved in these projects and get things moving quickly. The document summarisation work was set up within 12 hours of seeing the announcement of TensorFlow being opened up (I’d never even heard of it before). One can only imagine the bureaucratic steps a large organisation would need to go through to even start considering these ground-breaking initiatives.

Trip plays an important role in the health information retrieval ecosystem because we are so innovative. Larger, better-funded members of the ecosystem observe and copy/adopt where we succeed. It’s classic diffusion of innovations. I much prefer being at the front of the adoption curve!

Search suggestions

In our recent poll the feature most users wanted to see was a search suggestions function.  Well, we’ve delivered on that and it is freely available on Trip.

[Image: search suggestions shown to the right of the search box]

In the image above you’ll see the search suggestions to the right of the search box. The user has done a simple search and we’ve made a number of suggestions to help the user formulate a more focused search. Clicking on one of those suggestions (for example, ‘breast cancer’) results in a new search for ‘breast cancer’ and further search suggestions, the top ones being:

  • breast cancer screening
  • negative breast cancer
  • breast cancer therapy
  • breast cancer treatment
  • triple negative breast cancer
  • breast cancer metastatic
  • breast cancer risk
  • breast cancer radiotherapy

So, it’s a really simple way to get better search results. In addition, the suggestions are also available as you start typing your search in the search box.

The search suggestions system has been created as part of our involvement in the KConnect project (funded via the EU Horizon 2020 scheme). The team at the Institute of Software Technology and Interactive Systems, Technische Universität Wien (Vienna University of Technology) have taken the suggestions from two sources:

  • PubMed – they have a system which we’ve used for a number of years (but restricted to a user typing in the search box).  This has never been satisfactory and always seemed a bit ‘dry’ – hence wanting to improve on it.
  • The Trip search logs.  Users search Trip thousands of times a day and we start to build up a picture of terms that go together.  We can mine this data to come up with potential search suggestions.

And, being evidence-based, we’re mixing the search suggestions and recording which get clicked. So, will our users prefer the PubMed or the search-log suggestions? Either way, the results will help inform future development of the system. But, as it stands, the mix is already much better than the PubMed suggestions alone.
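For illustration only, here is a rough sketch of how suggestions from the two sources might be blended and the clicks recorded so the sources can be compared over time. The function names, the file-based click log and the source assignments in the example are all invented, not Trip’s actual code.

```python
# Sketch: alternate suggestions from two sources, tag each with its origin,
# and log clicks so we can later see which source users prefer.
def blend_suggestions(pubmed_suggestions, log_suggestions, limit=8):
    """Alternate suggestions from the two sources, tagging each with where it
    came from so a later click can be attributed to PubMed or the Trip logs."""
    blended, seen = [], set()
    for pm, log in zip(pubmed_suggestions, log_suggestions):
        for text, source in ((pm, "pubmed"), (log, "trip_logs")):
            if text.lower() not in seen:
                seen.add(text.lower())
                blended.append({"text": text, "source": source})
    return blended[:limit]

def record_click(query, suggestion):
    """Append the click to a simple tab-separated log for later analysis."""
    with open("suggestion_clicks.tsv", "a", encoding="utf-8") as log_file:
        log_file.write(f"{query}\t{suggestion['text']}\t{suggestion['source']}\n")

# Example using the 'breast cancer' suggestions from the post; the split
# between the two sources here is made up, purely for illustration.
pubmed = ["breast cancer screening", "breast cancer therapy", "breast cancer treatment"]
logs = ["triple negative breast cancer", "breast cancer metastatic", "breast cancer risk"]
for suggestion in blend_suggestions(pubmed, logs):
    print(suggestion)
```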

The one obvious improvement to make is the design, as it’s fairly poor. But that will have to wait until we roll out our next new feature: spelling correction (the second most wanted new feature requested in the poll). This is close to release and, again, has been created with the help of the team at the Institute of Software Technology and Interactive Systems as part of the KConnect project. When that’s released we’ll get our designer involved to make it look seamless.

Trip, making search simple!

New feature: automated assessment of bias

I love it when we roll out new features and few have been as significant and innovative as this one.  Over the last few months I’ve been working with the wonderful team at RobotReviewer to introduce two major improvements to Trip.

Identification of RCTs.

Trip has featured a search results category called ‘Controlled trials’ for years. To identify trials we used a filter to highlight trials in PubMed and imported them into Trip. This used a series of keywords and was good at identifying trials, but it was also prone to pulling in a number of articles that were not trials. In other words, there were a number of false positives (i.e. noise), and we invariably missed a few trials as well.

RobotReviewer uses machine learning to identify trials for Trip, and it works brilliantly. In internal tests our controlled trials category is about 97% accurate, which is amazing. The total ‘count’ of trials has dropped by over 200,000, which means those articles had been incorrectly identified by the old filter. So, when using the controlled trials filter you’re significantly more likely to just find trials and avoid the noise of incorrectly identified articles!
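For readers wondering what ‘machine learning to identify trials’ means in broad terms, here is a tiny, generic illustration using scikit-learn: a text classifier trained on labelled abstracts. It is emphatically not RobotReviewer’s model, and the training examples below are made up.

```python
# Generic illustration of ML-based RCT identification: a supervised text
# classifier trained on labelled abstracts (NOT RobotReviewer's actual model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

abstracts = [
    "Patients were randomly assigned to receive drug X or placebo...",
    "We randomised 240 participants to the intervention or usual care...",
    "A retrospective review of case notes from a single centre...",
    "This narrative review summarises the recent literature on...",
]
is_rct = [1, 1, 0, 0]  # 1 = randomised controlled trial, 0 = not a trial

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(abstracts, is_rct)

new_abstract = "Participants were randomly allocated to one of two arms..."
print(classifier.predict_proba([new_abstract])[0][1])  # probability it is an RCT
```

In a real system the training set would be many thousands of hand-labelled records, and the predicted probability would be thresholded to decide whether an article goes into the ‘Controlled trials’ category.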

Automatic assessment of bias.

Last year the RobotReviewer team published ‘RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials’. The paper concluded:

Risk of bias assessment may be automated with reasonable accuracy. Automatically identified text supporting bias assessment is of equal quality to the manually identified text in the CDSR. This technology could substantially reduce reviewer workload and expedite evidence syntheses.

In short, their techniques pretty much matched human ability in assessing bias. Now, in conjunction with Trip, they have extended their techniques to work on what Trip holds for controlled trials: the abstracts. With very little loss of accuracy we have just released this feature (see their blog for more technical details). This first image shows what to expect:

[Image RR1: the ‘Estimate of bias…’ line shown on a search result]

The ‘Estimate of bias…’ is clickable to reveal:

[Image RR2: the expanded risk of bias assessment]

This is a significant moment for Trip and I’m delighted that we have this feature. Assessment of bias is not most people’s idea of fun, and if we can help reduce the barriers to using evidence – which we have with this feature – then everyone should be delighted.
