Wednesday, April 15, 2015

Evidence Live, systematic reviews and the US Air Force

I'm just back from the wonderful Evidence Live.  While I was away I saw this news story Is the West losing its edge on defence? and I was particularly drawn to the following passage:

The military have also contributed to their own misfortunes by conspiring with defence contractors to build ever more expensive weapons that can only be afforded in much smaller numbers than those they are supposed to replace.

Pierre Sprey, chief designer on the F-16 fighter, noted the ruinous consequences of buying stealth aircraft at hundreds of millions of dollars a copy.

"It's a triumph of the black arts of selling an airplane that doesn't work," he said.


This fits in very nicely with my perspective on systematic review methods, and it was one of the main threads in my presentation on the future of evidence synthesis.  The current methods of systematic review production cost way too much for what they deliver.  And if you consider that the majority of systematic reviews rely on published trials, they are inherently unreliable.

In the EBM world we're buying F-16s...!

More to follow on this theme.

UPDATE: The wonderful Anne Marie Cunningham has pointed out (see comments) that the quote refers to the consequences of buying the really expensive stealth fighters (not F-16s).  That's a consequence of rushing a blog post so soon after a vigorous conference!  The point remains - purchasing overly expensive planes has caused massive problems.

Saturday, March 28, 2015

A taste of things to come

It is hopefully self-explanatory...


NOTE: click to enlarge

Sunday, March 08, 2015

The important breakthrough

Trip has been operating for over 15 years and I can easily say we have arrived at the most significant breakthrough yet.  It is still in our 'labs' section and still has much work to do before being rolled out.  But, the path is clear and, finance aside, there is no reason why we can't produce a significant increase in search performance.

In search a really important concept is intention.  So, when a user searches they may add 2-3 search terms, but what are they thinking about when they use those terms?  For instance, and this is a true story, I showed Trip to a Professor of Anaesthesiology and asked for his views on the site.  He came back saying that he was unimpressed!  The reason - his interest was in awareness (as in, when a person is under anaesthetic are they truly anaesthetised or may they be aware) and when you search Trip for awareness you get lots of results, mostly on things like the awareness of public health messages!  Another example I use to illustrate the point is the search pain.  We return the same results whether the searcher is an oncologist or a rheumatologist - which to me is ridiculous, as the intention is likely to be significantly different.  But, to date, there has been no good solution.

The below image (click to enlarge) shows a breakthrough.



In the image above there are four sets of results for the same search, antibiotics.  This is a test system and not based on the real Trip results.  On the left-hand side we have the normal/natural results for the search antibiotics in the test system.  In the top-right set of results the natural results have been reordered based on the clickstream activity of Trip users who have not logged in (85% of activity).  At the simplest level this promotes results that have been clicked on and relegates those that have not been clicked.  It really is more complex than that - but I hope you get the point!

But the bottom right is where the magic is.  Even though it accounts for only 0.2% of the activity, we have reordered the results based on the clickthrough activity of dentists.  There are a few erroneous results, but I'd like to think you can see the effect - dental articles are promoted.

So, the effect of this is that - when we eventually roll out the system and we know the user is a dentist - we can improve their results based on the previous activity of other dentists.  The reality is that this technique will work for any speciality or profession.
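To make this concrete, here is a minimal sketch of the idea in Python - my own illustration, not Trip's actual code, and the function name and data layout are assumptions.  Given the raw keyword ranking and a log of (specialty, document) click events, documents that the user's specialty clicks on a lot get promoted:

```python
from collections import defaultdict

def rerank_for_specialty(results, clicks, specialty, weight=1.0):
    """Reorder a keyword result list using click counts from one specialty.

    results   - document ids in their original (keyword) order
    clicks    - iterable of (specialty, doc_id) click events
    specialty - e.g. "dentist"; documents this group clicks on often
                are promoted, everything else keeps its original order
    """
    counts = defaultdict(int)
    for spec, doc_id in clicks:
        if spec == specialty:
            counts[doc_id] += 1

    def score(item):
        position, doc_id = item
        # lower score = higher placement: original rank minus a boost
        # proportional to this specialty's clicks on the document
        return position - weight * counts[doc_id]

    return [doc for _, doc in sorted(enumerate(results), key=score)]
```

For example, with two dentist clicks recorded against "d3", rerank_for_specialty(["d1", "d2", "d3"], [("dentist", "d3"), ("dentist", "d3")], "dentist", weight=2) moves "d3" to the top - the dentists-get-dental-articles effect described above.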

There are a few issues - the paucity of data is the biggest - and we have two significant ways of tackling this:
  • When we roll out the new Trip we will - to a large extent - make login/registration obligatory.  This will mean we get lots more clickstream data which will make the results even better.
  • Machine learning.  We've already worked on machine learning and will bring these techniques to the system to enhance/complement the clickstream work.
Oh yes, we've even figured out a way to mitigate the effects of filter bubbles.

This really has been a good few weeks.

Saturday, February 21, 2015

The light at the end of the tunnel...

...is, I hope, not the light of an oncoming train. I've nabbed that line from my favourite band - Half Man Half Biscuit (HMHB) who wrote The Light At The End Of The Tunnel (Is The Light Of An Oncoming Train) a good few years ago! My love for HMHB aside, I keep reflecting on how things seem to be going really well for Trip and I'm desperately hoping we've turned a corner.  So, why the optimism:

  • 2014 was pretty good.
  • We're working on the new Freemium version of Trip.  What's coming out is going to be impressively good and some of the premium upgrades will be great.
  • We're involved in the really interesting EU-funded project which will be doing some really innovative things.  I'll blog about that more when the final specifications are agreed, but we'll be looking at making Trip more multi-lingual, improving the Trip Rapid Review system and doing loads of work around similarity, which is useful for the next point.
  • Relatedness/similarity is looking very useful for what we want to do with regard to developing our financial viability.  The measures we're developing will allow us to do all sorts of interesting things; for instance, we can highlight a new book that's useful to a particular clinician, or a new trial that's pertinent to an existing systematic review.  Many more uses on top of that, but I've got to keep some secrets.
  • I'm starting to realise the value in our clickstream data (helped by two separate teams and soon to be joined by a PhD student as part of the EU project).  You only have to look at most of this year's blog posts to see I'm working hard on this.  It can help with the relatedness work but it can also do other useful things, such as improving the search results and better predicting new articles that are of use to a Trip user.  If our mission is to ensure health professionals get the right evidence to support their care, using clickstream data will make it so much more effective.  The advantage of the clickstream data is that it's Trip's data to utilise - it's our IP.  It's at the heart of our future.  I actually think it's this point that's making me so happy/optimistic.
  • Lots of other nice bits and bobs e.g. I've just been invited to lecture in the USA in Autumn/Fall; I'm part of a large consortium bidding to be a support team for complex reviews; I'm presenting at the wonderful Evidence Live; I'm making headway in my new NHS job (I am lead for Knowledge Mobilisation for Public Health Wales); I'm waiting to hear about a large MRC grant (not optimistic but something to look forward to).
Long may this continue!

Tuesday, February 17, 2015

Another use for clickstream data

In the previous post (Clickstream data and results reordering) I highlighted how clickstream data could be used to easily surface articles that are not picked up by usual keyword searches, and therefore to improve search results.  In my mind I was thinking this could help surface documents for a clinician trying to answer their clinical questions.

But what about in systematic reviews (or similar comprehensive searches)?  A couple of scenarios spring to mind:
  1. A user conducts a search and finds, say, 15 controlled trials.  We could create a system that highlights the most connected clinical trials that have not already been selected - possibly an in-built safety check to ensure that no trials are missed (see the sketch after this list).
  2. Related concepts.  You see some spectacularly complex search terms, no doubt human generated.  There may be other systems, but we could surface related concepts.  A simple example was shown in the earlier post (Clickstream data and results reordering) where it highlighted that obesity is related to diet.  OK, we all know that - but the computer didn't; it spontaneously highlighted it.  Doing this on a large scale using Trip's 'big data' will generate more obscure relationships - potentially very useful in generating a comprehensive search strategy!
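A minimal sketch of scenario 1, assuming the clickstream has been grouped into sessions, each a set of document ids clicked together (illustrative names only, not Trip's implementation):

```python
from collections import defaultdict

def suggest_missed_trials(selected, sessions, top_n=5):
    """Suggest trials most often co-clicked with those already selected.

    selected - set of doc ids the reviewer has already included
    sessions - list of sets, each the doc ids clicked in one search session
    """
    co_counts = defaultdict(int)
    for session in sessions:
        overlap = session & selected        # selected trials seen in this session
        if not overlap:
            continue
        for doc in session - selected:      # documents clicked alongside them
            co_counts[doc] += len(overlap)
    ranked = sorted(co_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]
```

Anything returned here that the reviewers haven't already screened would be worth a second look - the 'in-built safety check' mentioned in scenario 1.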

If there are any systematic reviewers/searchers I'd love to hear what you think!

Saturday, February 14, 2015

Clickstream data and results reordering

Recently I've been discussing the potential for using our clickstream data (our earliest post on the subject being from October 2013).  After a post earlier this year (Ok, I admit it, I'm stuck) I was contacted by two separate people who have both been very generous with their time, and on Friday I met with one of them, who talked me through what they had found.

Before I share the results there are a few points to consider:
  • This really is early days and it needs some imagination to see how it would work on Trip.
  • The image below is a single trial run, simply to illustrate a point.  The results are not based on the full Trip index, just a very small sample.
  • The search uses very simple text matching on title words only.  So, as you will see in the image below, all the articles in the left-hand column have the search term - diet - in the title.



So, what's going on?

The left-hand side shows the results of this mock-up search.  Those on the right-hand side have been reordered using simple clickstream data.  The articles surrounded by the light blue colour have been boosted (so appear higher) because lots of people clicked on them.  The results surrounded by orange are arguably more interesting - they don't include the search term in the title!

What this signifies is that users of Trip, while searching the actual Trip, have clicked on the orange articles in the same search session as one of the articles on the left-hand side.  So, it's telling us that the orange articles are related to the normal results - and are being inserted into the results - even though they were not matched in our test search by having the word diet in the title.
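As a rough sketch of what that reordering might look like in code - my own mock-up with assumed data structures, not the analysis my collaborators ran - heavily clicked keyword matches are promoted (the light blue effect) and documents co-clicked with them in the same sessions are added even though they never matched the keyword (the orange effect):

```python
from collections import defaultdict

def reorder_with_clickstream(keyword_results, click_counts, sessions,
                             insert_threshold=2):
    """Promote popular keyword matches and pull in co-clicked documents.

    keyword_results - doc ids matching the search term, in original order
    click_counts    - dict of doc id -> total clicks
    sessions        - list of sets, each the doc ids clicked in one session
    """
    # "light blue": promote keyword matches that attract lots of clicks
    boosted = sorted(keyword_results,
                     key=lambda doc: click_counts.get(doc, 0), reverse=True)

    # "orange": documents that share a session with a keyword match,
    # even though they never contained the search term themselves
    matched = set(keyword_results)
    related = defaultdict(int)
    for session in sessions:
        if matched & session:
            for doc in session - matched:
                related[doc] += 1

    extras = sorted((d for d, n in related.items() if n >= insert_threshold),
                    key=lambda d: related[d], reverse=True)
    return boosted + extras
```

In the mock-up image the orange articles are woven into the list rather than appended at the end, but the principle is the same.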

Trying to describe this in the blog is slightly difficult as I'm not sure if I've explained it particularly well.  I suppose there are two take homes:
  • Clickstream data, even using a small sample, can uncover some really useful articles that a standard keyword search might miss.
  • I am very excited by this, so have faith in that!



Thursday, January 08, 2015

People who looked at this article, also looked at...

In my previous post Ok, I admit it, I'm stuck (a title people seem to really like) I highlighted the difficulty in finding meaning in our clickstream data (the data generated by users interacting with the site).  One thing I had thought about, and a couple of people have subsequently raised, is an Amazon-style 'People who looked at this article, also looked at this one...', a feature I find really interesting and frequently useful.
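A minimal sketch of how such a list could be built from session-level click data - hypothetical names and structures, not the actual analysis behind the graph discussed below:

```python
from collections import Counter

def also_looked_at(seed_doc, sessions, top_n=7):
    """Return the documents most often viewed in the same session as seed_doc.

    sessions - list of sets of doc ids, one set per user search session
    """
    co_views = Counter()
    for session in sessions:
        if seed_doc in session:
            co_views.update(session - {seed_doc})
    return [doc for doc, _ in co_views.most_common(top_n)]
```

Called with the Tools for Practice document, something like also_looked_at("2056462", sessions) would produce a list along the lines of the one below.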

So, taking some earlier work on mapping UTI data, I started doing further analysis based on this graph.






I started with an article that looked to be in an interesting place and picked document 2056462 (Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? from the publication Tools for Practice, 2013) and then followed the links from there.  Some have since been removed or updated.  But we can say that 'People who looked at Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? also looked at...

  • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010)
  • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012)
  • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010)
  • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009)
  • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012)
  • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
  • Urinary tract infection (lower) - women (NICE Clinical Knowledge Summaries, 2009)
I then, as a way of snowballing, took the last article in the list and did a similar thing, which gives us 'People who looked at Urinary tract infection (lower) - women also looked at...

  • Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? (Tools for Practice 2013)
  • Urological infections (European Association of Urology, 2013)
  • Recurrent Urinary Tract Infection (Society of Obstetricians and Gynaecologists of Canada, 2010)
  • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
  • Urinary tract infection (lower) - men (NICE Clinical Knowledge Summaries, 2010)

Anyway, I hope it's clear what's going on!  On one level it all seems good and interesting in that all the articles seem relevant.  But does it add anything that the initial search wouldn't have found?  To help I've gone through the top list and shown where each of the results appears in the search results (coincidentally the Tools for Practice article came 5th in the results list for a search of urinary tract infection and cranberry):

  • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010) = Result #38
  • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012) = Result #18
  • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010) = Result #7
  • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009) = Result #14
  • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012) = Result #2
  • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013) = Result #13
  • Urinary tract infection (lower) - women (NICE Clinical Knowledge Summaries, 2009) = Result #54
To me these results are interesting!  The clear 'outliers' are the top and bottom results, which appeared at positions 38 and 54 respectively.  This is important as it means they are much less likely to be seen - especially the latter, which would be on the third page of results.
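For what it's worth, checking where each suggestion sits in the keyword ranking is easy to automate.  A tiny, purely illustrative helper (not part of Trip) might look like this, returning None for anything the keyword search missed entirely:

```python
def positions_in_results(also_viewed, keyword_results):
    """Map each co-click suggestion to its 1-based rank in the keyword
    results, or None if the keyword search never returned it."""
    index = {doc: rank for rank, doc in enumerate(keyword_results, start=1)}
    return {doc: index.get(doc) for doc in also_viewed}
```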

Is this useful?

It will highlight different articles from those found by browsing the search results, but is there a cost?  Will users look less at our algorithmic results (the normal results) and rely on these 'human' results?  If so, is that good or bad?  I actually think it'll encourage people to explore more and spend longer on the site - so I don't think it'll have a negative consequence.

This is really interesting!

I'm really tempted to open a can of worms by asking if there is any coherence/rationality as to how the linked articles list is generated.  However, as the above list is based on only a sample of data it'd be wrong to place too much weight on things.  Also, even if it is random, so what!?

Finally, I've even graphed this out (in not too appealing a way):