Sometime tomorrow morning (1st July) we will switch over from the old site to the new site.
This has been a massive rewriting of code, taking over 12 months, and it has been well tested over the last three months. However, it’d be naïve to think there’ll be no issues. But our development team is primed to act quickly so, if there are any disruptions, they shouldn’t take too long to fix.
If a user searches for, say, prostate cancer screening, we can say these three terms are linked. If someone else then searches for breast cancer screening, those three terms are linked too, and the two searches connect back to each other via the shared terms cancer and screening. So, why not map them? The images below are based on a really small sample of our clickstream data, but they map the connections between search terms.
The above is based on a small sample of search terms around UTI. The one below uses a related but different technique:
You can see that there are different circle sizes (representing popularity of search) and some lines are thicker than others, showing these terms were searched together more often. Below is an easier-to-read sample of the above:
So what? Why am I sharing?
I can’t help feeling this is useful for highlighting search terms of interest for reviews. For instance, you may have 5 terms in your search; by harnessing the power of linked terms, a system may suggest a further ten that may be useful! A form of query expansion perhaps?!
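To make the idea a bit more concrete, here is a minimal Python sketch of how linked terms could be counted and then used to suggest extra terms. It is an illustration only, not how Trip does it: the function names (`build_term_graph`, `suggest_terms`), the sample searches, and the choice to treat each word in a search as a "term" are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_term_graph(searches):
    """Count how often each term is searched and how often pairs co-occur.

    Assumption: a "term" is a single lowercased word in a search string.
    """
    term_counts = Counter()  # would drive circle size (popularity)
    pair_counts = Counter()  # would drive line thickness (searched together)
    for search in searches:
        terms = sorted(set(search.lower().split()))
        term_counts.update(terms)
        # Each unordered pair of terms in the same search counts as one link.
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts


def suggest_terms(query_terms, pair_counts, limit=10):
    """Suggest the terms most strongly linked to the query terms."""
    query = {t.lower() for t in query_terms}
    scores = Counter()
    for (a, b), weight in pair_counts.items():
        if a in query and b not in query:
            scores[b] += weight
        elif b in query and a not in query:
            scores[a] += weight
    return [term for term, _ in scores.most_common(limit)]


# Hypothetical sample, not real clickstream data.
sample_searches = [
    "prostate cancer screening",
    "breast cancer screening",
    "breast cancer mammography",
]
term_counts, pair_counts = build_term_graph(sample_searches)
print(suggest_terms(["prostate"], pair_counts))
```

In this toy sample, a search on "prostate" picks up "cancer" and "screening" as linked terms. A real system would need something better than splitting on spaces (phrases, synonyms, stop words), but the counting idea is the same.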
The Trip algorithm is great. To explain, the algorithm is the ‘behind the scenes’ way we order the results you see on the screen. As mentioned, it works great.
However, that’s not to say it can’t be improved, and we are currently working with a number of academics to try to use our data to improve search methods generally (not just Trip). We have an accompanying paper, TripClick: The Log Files of a Large Health Web Search Engine. The idea is that, by using our clickstream data (what people search for, what they click on, etc.), machine learning techniques can be used to improve search results.
What’s particularly exciting is that we have created a competition, pitting different academic centres against each other, to see who returns the best results. Yesterday we had our first academic centre to report results:
The improvement over baseline was large.
It was from a team headed by Prof Allan Hanbury at TU Wien, the wonderful lead of Trip’s Horizon 2020 work a few years back.
The competition is likely to run for months and after that it’s a question of taking stock and seeing how we can utilise the techniques within Trip.
If we can improve on our search results, even marginally, it’ll be a great result.