Saturday, February 21, 2015

The light at the end of the tunnel...

...is, I hope, not the light of an oncoming train. I've nabbed that line from my favourite band - Half Man Half Biscuit (HMHB), who wrote The Light At The End Of The Tunnel (Is The Light Of An Oncoming Train) a good few years ago! My love for HMHB aside, I keep reflecting on how things seem to be going really well for Trip and I'm desperately hoping we've turned a corner.  So, why the optimism?

  • 2014 was pretty good.
  • We're working on the new Freemium version of Trip.  What's going to come out is going to be impressively good and some of the premium upgrades will be great.
  • We're involved in a really interesting EU-funded project which will be doing some really innovative things.  I'll blog about that more when the final specifications are agreed, but we'll be looking at making Trip more multi-lingual, we're going to be improving the Trip Rapid Review system and there will be loads of work around similarity, which is useful for the next point.
  • Relatedness/similarity is looking very useful for what we want to do with regard to developing our financial viability.  The measures we're developing will allow us to do all sorts of interesting things.  For instance, we can highlight a new book that's useful to a particular clinician, or we can highlight a new trial that's pertinent to an existing systematic review.  There are many more uses on top of that, but I've got to keep some secrets.
  • I'm starting to realise the value in our clickstream data (helped by two separate teams and soon to be joined by a PhD student as part of the EU project).  You only have to look at most of this year's blog posts to see I'm working hard on this.  This can help with the relatedness work but it can do other useful things, such as improving the search results and better predicting new articles that are of use to a Trip user.  If our mission is to ensure health professionals get the right evidence to support their care - using clickstream data will make it so much more effective.  The advantage of the clickstream data is that it's Trip's data to utilise, it's our IP.  It's at the heart of our future.  I actually think it's this point that's making me so happy/optimistic.
  • Lots of other nice bits and bobs e.g. I've just been invited to lecture in the USA in Autumn/Fall; I'm part of a large consortium bidding to be a support team for complex reviews; I'm presenting at the wonderful Evidence Live; I'm making headway in my new NHS job (I am lead for Knowledge Mobilisation for Public Health Wales); I'm waiting to hear about a large MRC grant (not optimistic but something to look forward to).
Long may this continue!

Tuesday, February 17, 2015

Another use for clickstream data

In the previous post (Clickstream data and results reordering) I highlighted how the clickstream data could be used to easily surface articles that are not picked up by usual keyword searches.  That post highlighted how it could be used to improve search results.  In my mind I was thinking this could surface documents to help a clinician trying to answer their clinical questions.

But what about in systematic reviews (or similar comprehensive searches)?  A couple of scenarios spring to mind:
  1. A user conducts a search and finds, say, 15 controlled trials.  We could create a system that highlights the most connected clinical trials that have not been selected already.  So, possibly an in-built safety check to ensure that no trials are missed.
  2. Related concepts.  You see some spectacularly complex search terms, no doubt human generated.  There may be other systems but we could surface related concepts.  A simple example was shown in the earlier post (Clickstream data and results reordering) where it highlighted that obesity is related to diet.  OK, we all know that - but the computer didn't, it spontaneously highlighted it.  Doing this on a large scale using Trip's 'big data' will generate more obscure relationships - potentially very useful in generating a comprehensive search strategy!
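
To make the first scenario a little more concrete, here is a minimal Python sketch of how such a safety check might work.  It is not how Trip does (or will do) it: the session format, the article IDs and the function names are all invented for illustration, and it assumes every ID is a controlled trial.

```python
from collections import Counter
from itertools import combinations


def co_click_counts(sessions):
    """Count how often each pair of articles was clicked in the same session."""
    counts = Counter()
    for clicked in sessions:
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(set(clicked)), 2):
            counts[(a, b)] += 1
    return counts


def suggest_unselected(sessions, selected, top_n=5):
    """Rank articles NOT yet selected by their co-click links to the selected set."""
    selected = set(selected)
    scores = Counter()
    for (a, b), n in co_click_counts(sessions).items():
        if a in selected and b not in selected:
            scores[b] += n
        elif b in selected and a not in selected:
            scores[a] += n
    return scores.most_common(top_n)


# Hypothetical data: each list holds the trials clicked in one search session.
sessions = [
    ["trial_A", "trial_B", "trial_C"],
    ["trial_A", "trial_C"],
    ["trial_B", "trial_D"],
    ["trial_E", "trial_F"],
]

# The reviewer has already selected trial_A and trial_B.
suggestions = suggest_unselected(sessions, selected=["trial_A", "trial_B"])
print(suggestions)
```

In this toy data trial_C is the most strongly connected to the trials already selected, so it would be the first one flagged as possibly missed.  Real data would need some thought about thresholds, as a single shared click is a weak signal.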

If there are any systematic reviewers/searchers I'd love to hear what you think!

Saturday, February 14, 2015

Clickstream data and results reordering

Recently I've been discussing the potential for using our clickstream data (our earliest post on the subject being from October 2013).  After a post earlier this year (Ok, I admit it, I'm stuck) I have been contacted by two separate people who have both been very generous with their time, and on Friday I met with one of them, who talked me through what they had found.

Before I share the results there are a few points to consider:
  • This really is early days and it needs some imagination to see how it would work on Trip.
  • The image below is one trial, simply to illustrate a point.  The results are not based on the full Trip index, just a very small sample.
  • The search is using very simple text matching on title words only.  So, as you will see in the image below, all the articles in the left-hand column have the search term - diet - in the title.



So, what's going on?

On the left-hand side are the results in this mock-up search.  However, those on the right-hand side have been reordered using simple clickstream data.  Those articles that are surrounded by the light blue colour have been boosted (so appear higher) due to lots of people clicking on them.  Those results surrounded by orange are arguably more interesting - as they don't include the search term in the title!

What this signifies is that users of Trip, while searching the actual Trip, have clicked on the orange articles in the same search session as one of the articles on the left-hand side.  So, it's telling us that the orange articles are related to the normal results - and so they are being inserted into the results - even though they were not matched in our search test by having the word diet in the title.
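
For anyone who finds code easier to follow than my description, here is a rough Python sketch of the idea.  It is only my guess at the general approach, not the actual method used in the mock-up: the articles, the sessions and the scoring (raw click counts) are all made up for illustration.

```python
from collections import Counter


def rerank(articles, sessions, term):
    """Title-match baseline, then a simple clickstream reordering.

    Returns (baseline_ids, reordered_ids, inserted_ids).
    """
    # Baseline (the left-hand column): titles containing the search term.
    baseline = [aid for aid, title in articles if term.lower() in title.lower()]
    matched = set(baseline)

    clicks = Counter()     # sessions in which each article was clicked
    co_clicks = Counter()  # sessions shared with at least one matched article
    for clicked in sessions:
        unique = set(clicked)
        clicks.update(unique)
        if unique & matched:
            co_clicks.update(unique - matched)

    def score(aid):
        # Matched articles are boosted by clicks; the others by co-clicks.
        return clicks[aid] if aid in matched else co_clicks[aid]

    candidates = baseline + sorted(co_clicks)
    reordered = sorted(candidates, key=lambda aid: (-score(aid), aid))
    return baseline, reordered, set(co_clicks)


# Hypothetical sample: (article ID, title) pairs and click sessions.
articles = [
    (1, "Diet and heart disease"),
    (2, "Mediterranean diet trial"),
    (3, "Low-fat diet in diabetes"),
    (4, "Obesity management in primary care"),
    (5, "Statins for prevention"),
]
sessions = [[2, 4], [2, 4, 3], [1], [5], [2]]

baseline, reordered, inserted = rerank(articles, sessions, "diet")
print(baseline)   # the left-hand column
print(reordered)  # the right-hand column
print(inserted)   # the 'orange' articles: no search term in the title
```

In this toy example article 2 is boosted to the top (the light blue case) and article 4, on obesity, is inserted even though its title doesn't contain diet (the orange case).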

Trying to describe this in the blog is slightly difficult as I'm not sure if I've explained it particularly well.  I suppose there are two take homes:
  • Clickstream data, even using a small sample, can uncover some really useful articles that a standard keyword search might miss.
  • I am very excited by this, so have faith in that!



Thursday, January 08, 2015

People who looked at this article, also looked at...

In my previous post Ok, I admit it, I'm stuck (a title people seem to really like) I highlighted the difficulty in finding meaning in our clickstream data (the data generated by users interacting with the site).  One thing that I had thought about and a couple of people have subsequently raised is an Amazon style 'People who looked at this article, also looked at this one..', a feature I find really interesting and frequently useful.
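
As a rough illustration of how such a feature might be built from session data, here is a minimal Python sketch.  The session format and the document IDs are invented, and counting shared sessions is just the simplest approach I can think of, not a description of anything running on Trip.

```python
from collections import Counter, defaultdict


def build_also_viewed(sessions):
    """For each document, count the other documents viewed in the same session."""
    also = defaultdict(Counter)
    for viewed in sessions:
        unique = set(viewed)
        for doc in unique:
            for other in unique:
                if other != doc:
                    also[doc][other] += 1
    return also


def also_looked_at(also, doc_id, top_n=5):
    """Return the documents most often viewed alongside doc_id."""
    neighbours = also.get(doc_id, Counter())
    # Most shared sessions first; ties broken by ID for a stable order.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical sessions: each list is the documents one user viewed in a session.
sessions = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_a", "doc_b"],
    ["doc_c", "doc_d"],
]

also = build_also_viewed(sessions)
first_hop = also_looked_at(also, "doc_a")
# Snowballing: repeat the lookup from the last document in the first list.
second_hop = also_looked_at(also, first_hop[-1])
print(first_hop)
print(second_hop)
```

The second lookup shows the snowballing idea: starting again from one of the suggested documents reaches doc_d, which never shared a session with the starting document.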

So, taking some earlier work on mapping UTI data, I started doing further analysis, based on this graph.






I started with an article that looked to be in an interesting place and picked document 2056462 (Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? from the publication Tools for Practice 2013) and then followed the links from there.  Some have since been removed or updated.  But, we can say that 'People who looked at Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? also looked at...'

  • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010)
  • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012)
  • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010)
  • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009)
  • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012)
  • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
  • Urinary tract infection (lower) - women (NICE Clinical Knowledge Summaries, 2009)
I then, as a way of snowballing, took the last article in the list and did a similar thing, which results in 'People who looked at Urinary tract infection (lower) - women also looked at...'

  • Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? (Tools for Practice 2013)
  • Urological infections (European Association of Urology, 2013)
  • Recurrent Urinary Tract Infection (Society of Obstetricians and Gynaecologists of Canada, 2010)
  • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
  • Urinary tract infection (lower) - men (NICE Clinical Knowledge Summaries, 2010)

Anyway, I hope it's clear what's going on!  On one level it all seems good and interesting in that all the articles seem relevant.  But does it add anything that the initial search wouldn't have found?  To help I've gone through the top list and shown where each of the results appears in the search results (coincidentally the Tools for Practice article came 5th in the results list for a search of urinary tract infection and cranberry):

  • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010) = Result #38
  • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012) = Result #18
  • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010) = Result #7
  • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009) = Result #14
  • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012) = Result #2
  • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013) = Result #13
  • Urinary tract infection (lower) - women (NICE Clinical Knowledge Summaries, 2009) = Result #54
To me these results are interesting!  The clear 'outliers' are the top and bottom results, which appeared at result numbers 38 and 54 respectively.  This is important as it means that they are much less likely to be seen - especially the latter one, which would be on the third page of results.

Is this useful?

It will highlight articles different from those found by browsing the search results, but is there a cost?  Will users look less at our algorithmic results (the normal results) and rely on these 'human' results?  If so, is that good or bad?  I actually think it'll encourage people to explore more and spend longer on the site - so I don't think it'll have a negative consequence.

This is really interesting!

I'm really tempted to open a can of worms by asking if there is any coherence/rationality as to how the linked articles list is generated.  However, as the above list is based on only a sample of data it'd be wrong to place too much weight on things.  Also, even if it is random, so what!?

Finally, I've even graphed this out (in a not too appealing way):

Saturday, January 03, 2015

Ok, I admit it, I'm stuck

I've been talking about article social networks for a while, and last August I wrote 'Beauty is in the eye of the beholder' which contained the image below.

I've continued to be fascinated by them and below are two more images - focused on defined areas of the above graph






These are beautiful - but is there more to it?

Both images show definite structure.  So, our users, simply by using the site are adding structure and energy.  I keep getting drawn to the principle of entropy.  I'm absolutely sure that our users are ordering the articles in Trip but does that have any value?

I admit to being relatively clueless - part of the purpose of the post is to see if the wisdom of the Trip users can be brought to bear to try and help me figure out what the above might mean and what the next steps might be!




The above image (taken from Article social networks, meaning and redundancy) shows distinct clusters as well.  In the bottom left is a cluster of articles on UTI and cranberry and it consists of 19 articles.  If you do a search of Trip you find many more than this.  So, our users are not clicking on many articles - so as well as adding structure are they giving us clues as to articles that aren't worthwhile (based on their collective judgements)?
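
To show what I mean by clusters and by articles that never get clicked, here is a small Python sketch.  It treats a cluster as a group of articles linked, directly or indirectly, by being clicked in the same session (connected components).  That is a deliberately simple stand-in of my own, not the method used to draw the graphs, and the sessions and search results are invented.

```python
from collections import defaultdict
from itertools import combinations


def clusters_from_sessions(sessions):
    """Group articles into clusters linked by shared click sessions."""
    graph = defaultdict(set)
    for clicked in sessions:
        unique = set(clicked)
        for doc in unique:
            graph[doc]  # make sure single-click articles appear as nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Walk outwards from this article until the cluster is exhausted.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(sorted(component))
    return clusters


# Hypothetical click sessions and search results.
sessions = [["uti_1", "uti_2"], ["uti_2", "uti_3"], ["asthma_1", "asthma_2"]]
search_results = ["uti_1", "uti_2", "uti_3", "uti_4", "uti_5"]

clusters = clusters_from_sessions(sessions)
clicked = {doc for cluster in clusters for doc in cluster}
never_clicked = [doc for doc in search_results if doc not in clicked]
print(clusters)
print(never_clicked)
```

Here the search returns five articles but only three appear in the cluster, so the other two are the ones our users have (collectively) passed over.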

If you click on one article in that cluster, is it likely that the others will be worthwhile?  What about if a new article is published and joins the cluster based on another person searching and effectively adding the article to the cluster - is that useful?  I'm sure there are no absolutes, but these appear to be hints - surely?

A final thought - the graphs are based on all users.  I imagine the above graph would look different if the user had been a general/family practitioner compared with, say, a urologist.  Stronger clues?

I would be absolutely delighted if anyone can help me figure out the value/meaning of the data.  And, if you can think of ways of working together I'd be delighted to see how we can share the data!

Friday, January 02, 2015

2014, looking back with pride

At the end of 2013 I did a review of the year and now, in early 2015, I thought I'd repeat the exercise for 2014!

First, the stats:

  • We had over 3,600,000 page views.
  • We are up to registered user number 140,000.  However, the standard view is to discount the number by 10-20% for users who no longer use the account and/or spam accounts.  So, we probably have 115-125,000 registered users.
  • The average duration on the site continues to increase: 5.08 minutes compared with 4.11 in 2013.  This is mirrored in the number of pages per session, increasing from 3.26 to 3.89.
  • The bounce rate (people who just visit one page and then leave without engaging) has decreased by 20%.
The above represents an ongoing trend which is seeing fewer unique users but the 'quality' is higher in that the users are more engaged and making better use of the site.  It is this engagement that is so satisfying, much more important than some - ego boosting - headline of number of unique visitors (although 3.6 million page views is quite impressive)!

Financial insecurity has been a recurring theme for Trip and I'm really pleased as I think we're fine for now and this is based on two facts:
  • We've secured a couple of grants recently which help in any number of ways.
  • We've finally arrived at a business model (freemium) which we will roll out in March (I hope).  I'm optimistic as we'll be offering a great premium offering and hopefully a number of users and institutions will sign-up.
At the end of 2013 I reported on the disappointment of missing out on an honorary professorship but I was very pleased to be given an honorary fellowship at the Centre for Evidence-Based Medicine (CEBM) at Oxford University.  The CEBM runs the wonderful Evidence Live series of conferences and I'll be involved again in the session 'EBM into Practice: Future of evidence synthesis: a new paradigm' which will be alongside Carl Heneghan, Martin Burton and Tom Jefferson.

Other bits and bobs from the year:
  • One of the grants was from the EU Horizon 2020 funding and will see me getting involved in lots of interesting research relating to multi-lingual search as well as a big chunk of machine reading and learning, including an overhaul and enhancement to our rapid review system.
  • My role in Public Health Wales (PHW) seems to be working itself out as I was given the role of lead for knowledge mobilisation (a term I dislike) and I've just finished a draft strategy on making PHW more 'evidence-based'.  I believe my role will then move into delivering on the strategy - which should be a nice challenge.
  • I've continued to conduct work in the social networks of articles with the huge support of the wonderful Valdis Krebs.  As a little treat I've added two images of further analysis below - happy to share more if anyone is interested!







Other than the above there have been so many other things, but many are important to me and probably less so to others.

There is also another, really major, project I'm starting to explore but for various reasons I can't share now.  It builds on the answer engine concept and there is the potential for Trip to work with a huge commercial partner.

Finally, a very large thank you to:
  • The users, without you Trip would be nothing!
  • Those users that completed the various surveys.
  • The members of the Trip advisory board for being very generous with your time and your collective knowledge/wisdom.
  • The many incredible people who I have interacted with - I really am lucky. 

2014 has been great and I hope - given the reduced financial stress - 2015 will be even better.

Tuesday, December 16, 2014

Creating a Q&A environment in Trip

Those of you who've followed this blog for a while will see that I'm always revisiting the answer engine concept, most recently two months ago. A month before that I mentioned it in the context of a Journal of Clinical Q&A.

This all stems from my belief that Trip is a wonderful tool to answer clinical questions but also a belief that it could be even better!  After all, it was the reason I started it in the first place - to help me answer clinical questions via the ATTRACT Q&A service.  Surveys have shown that many clinicians agree, with over 70% of questions supporting clinical care being helped by using Trip.

Recapping briefly on the answer engine and the Journal of Clinical Q&A:

  • The answer engine will try to predict questions from the search terms and insert an answer above the search results.  Users will get an answer in one click.
  • Journal of Clinical Q&A is a journal idea - radically different from any other journal.  It will be a structured answer to a clinical question, posted on the site (and helping populate the answer engine) which will be peer-reviewed and given a citation.
So far, fairly radical and fairly good.

Now, another variable to consider - the PICO search system.  In the forthcoming upgrade we'll be enhancing this feature in the premium version.  It will be more guided than the existing version and it could work like this:
  1. Users type in their full-text question.
  2. Users then select the PICO elements from the question.
  3. Users view relevant results.
  4. Users are given the option to write up an answer. If they write up the answer we will show them the articles they've looked at and they can indicate which were useful (and thereby form the reference list).
  5. They can choose to keep it private or share it - feeding the answer engine.
Another powerful component for a Q&A environment, what could go wrong (I ask tentatively!)?