Trip Database Blog

Liberating the literature

Category: clickstream

Search safety net

Now that our Premium product has launched, we can start to plan future developments, and one of them is a search safety net!

Using our clickstream data we can see what articles you’re looking at and suggest other documents that you should consider.  Take this network map:

This is based on searches on Trip for urinary tract infections.  Each blue square (node) represents an article and the lines linking them are created when a user clicks on two (or more) articles in the same search session.
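For anyone wondering how such a map might be assembled, here is a minimal sketch.  It assumes each search session is available as a simple list of clicked article identifiers; the function name and the sample identifiers are made up for illustration and are not how Trip actually stores its data.

```python
from collections import Counter
from itertools import combinations

def build_coclick_edges(sessions):
    """Count how often each pair of articles was clicked in the same search session."""
    edges = Counter()
    for clicked in sessions:
        # De-duplicate and sort so (a, b) and (b, a) count as the same link.
        for a, b in combinations(sorted(set(clicked)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical sessions: each inner list is one user's clicks in one search session.
sessions = [
    ["cranberry_review", "uti_guideline", "cranberry_trial"],
    ["cranberry_review", "cranberry_trial"],
    ["uti_guideline"],  # a single click creates no link
]
edges = build_coclick_edges(sessions)
```

Each key in `edges` is one line in the map and the count is how many sessions produced it, which could be used to draw heavier lines between articles that are often viewed together.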

If we have this information we can build a really useful system.  A user searches Trip for UTI and clicks on a number of articles from the cluster in the bottom right of the above image (marked in red in the image below):

It is clear that they may have overlooked four articles (marked as blue nodes) so we alert the user.  It gives them a chance to double check the results.  They may have deliberately excluded them or they may have simply made a mistake.  If it’s the latter then the system will have served its function as a safety net.
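A rough sketch of the safety net check might look like the following.  It assumes the links are held as pairs of article identifiers, and the `min_links` threshold (how many of the user's viewed articles a candidate must be linked to before we alert) is purely my assumption for illustration.

```python
from collections import defaultdict

def safety_net(edges, viewed, min_links=2):
    """Return unviewed articles linked to at least `min_links` viewed ones."""
    viewed = set(viewed)
    links = defaultdict(set)
    for a, b in edges:
        if a in viewed and b not in viewed:
            links[b].add(a)
        elif b in viewed and a not in viewed:
            links[a].add(b)
    hits = [(art, len(srcs)) for art, srcs in links.items() if len(srcs) >= min_links]
    # Most strongly connected first; ties broken alphabetically for stable output.
    return [art for art, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]

# Hypothetical links between five articles; the user has viewed A and B.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "E")]
print(safety_net(edges, viewed=["A", "B"]))
```

Here article C is linked to both viewed articles, so it would be flagged, while E (linked to only one) would not unless the threshold were lowered.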

I’ve started using the phrase ‘Trip makes finding evidence easy’ but with this technique we could also claim that ‘Trip easily helps you not miss key evidence’.  Not quite as succinct, but you get the picture!

This is a value-added service so I envisage it only being available to Premium users of Trip.  Hopefully another reason to upgrade.

Trip tiles

We’ve been live, as a Freemium service, for a little over two weeks.  In my more pessimistic (pre-launch) moments I was thinking that at this stage I may be having to abandon the whole idea as no-one was purchasing Trip.  However, I’m delighted that this is not the case!  We’re massively ahead of schedule and as such we’re accelerating various upgrades that we’d hoped to do towards the end of 2015.

Even more exciting, we’re thinking of new ideas!

One idea springs from my desire to do something interesting with the Timeline.  The Timeline records your searches and articles viewed on Trip and not much else.  So, one idea is to create something called Trip Tiles!  A fresh tile would be created with every new search and at the top of the tile would be the search terms and underneath would be the articles viewed.  In many ways this is what the timeline currently does.  But I think there’s the potential to link other people’s searches.  So, you might search and find three articles and as part of that process we highlight that 1 or more of the articles has been viewed in someone else’s timeline and offer you the chance to see their tile.

I’d best illustrate that with one of my legendary attempts at a picture (if we roll out this feature we’ll get them properly designed):

You could go from tile to tile both browsing and looking to see if you’ve missed any useful articles that someone else has already found.  Not only that, you can see what search terms they’ve used – again possibly useful.

Implementing this would be a challenge, but I see it as an interesting one rather than a particularly tough one!  Any feedback on the idea would be appreciated – comments on the blog or via email: jon.brassey@tripdatabase.com
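As a starting point for discussion, the tile-linking step could be sketched like this.  The `Tile` structure and its field names are hypothetical, and ordering the matches by the number of shared articles is just one possible choice.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    user: str
    search_terms: str
    articles: set = field(default_factory=set)

def related_tiles(my_tile, all_tiles):
    """Other users' tiles sharing at least one viewed article, most overlap first."""
    matches = []
    for tile in all_tiles:
        if tile.user == my_tile.user:
            continue
        shared = my_tile.articles & tile.articles
        if shared:
            matches.append((tile, shared))
    return sorted(matches, key=lambda m: len(m[1]), reverse=True)

# Hypothetical tiles; article identifiers are plain integers here.
mine = Tile("me", "uti cranberry", {1, 2, 3})
others = [
    Tile("u1", "cranberry juice", {2, 3, 9}),
    Tile("u2", "asthma", {7}),
    Tile("u3", "recurrent uti", {3, 4}),
]
linked = related_tiles(mine, others)
```

Each match carries the other person's search terms as well as the shared articles, so both could be shown when a user chooses to open someone else's tile.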

Dental evidence

As part of a wider piece of work I’ve been looking through the search logs and clickstream data associated with the dental specialty.  It’s interesting, and the following information is based on the 1,100+ registered users on Trip who have ticked the clinical specialty of dentistry. 

Top twenty search terms, most frequently used at the top

  • caries
  • gingivitis
  • periodontitis
  • dentistry
  • orthodontics
  • dental caries
  • dental implants
  • restorative dentistry
  • periodontal disease
  • oral cancer
  • Caries Risk Assessment
  • Medical errors
  • dental public health
  • fluoride
  • hypnosis
  • dental materials
  • endodontic outcome
  • pit and fissure sealants
  • pediatric dentistry
  • Patient Safety

And the top twenty articles are as follows:

  1. Pregnancy and gingival inflammation – Dental Elf
  2. Patients with Amalgam Restorations Are Not at a Significantly Greater Risk for Developing Health Complications Than Those With Composite Restorations – UTHSCSA Dental CATs
  3. The Use of Dental Crowns for Vital and Endodontically Treated Teeth: A Review of the Clinical and Cost-Effectiveness and Guidelines – CADTH
  4. Dental interventions to prevent caries in children – SIGN
  5. Dental Implants and Conventional Prosthetics: Comparative Clinical Effectiveness and Safety – CADTH
  6. Composite Resin and Amalgam Dental Filling Materials: A Review of Safety, Clinical Effectiveness and Cost-effectiveness – CADTH
  7. Conscious (Moderate) Sedation Can Be Used Safely On Patients With Obstructive Sleep Apnea – UTHSCSA Dental CATs
  8. Diagnosis and Treatment of Obstructive Sleep Apnea in Adults – AHRQ
  9. 12-Year Survival of Composite vs. Amalgam Restorations – J Dent Res.
  10. Methods of Diagnosis and Treatment in Endodontics – SBU
  11. A Randomized Clinical Trial Comparing At-Home and In-Office Tooth Whitening Techniques: A Nine-Month Follow-up – J Am Dent Assoc.
  12. Composite resin and amalgam dental filling materials: a review of safety, clinical effectiveness and cost-effectiveness – NHS CRD (HTA record of number 6 above!)
  13. Fluoride varnishes for preventing dental caries in children and adolescents – Cochrane
  14. Community Water Fluoridation in Canada – Trends, Benefits, and Risks – National Collaborating Centre for Environmental Health
  15. Flossing for the management of periodontal diseases and dental caries in adults – Cochrane
  16. Interventions for replacing missing teeth: antibiotics at dental implant placement to prevent complications – Cochrane
  17. Oral Appliance Therapy and Continuous Positive Airway Pressure Demonstrate Similar Improvements in the Treatment of Mild/ Moderate Obstructive Sleep Apnea – UTHSCSA Dental CATs
  18. Cost-effectiveness of a long-term dental health education program for the prevention of early childhood caries – NHS EED
  19. Prosthetic rehabilitation of partially dentate or edentulous patients – SBU
  20. Primary clinical care manual (7th edition, 2011) – The State of Queensland (Queensland Health) and the Royal Flying Doctor Service (Queensland Section)

The important breakthrough

Trip has been operating for over 15 years and I can easily say we have arrived at the most significant breakthrough yet.  It is still in our ‘labs’ section and still needs much work before being rolled out.  But, the path is clear and, finance aside, there is no reason why we can’t produce a significant increase in search performance.

In search a really important concept is intention.  So, when a user searches they may add 2-3 search terms but what are they thinking about when they use those terms?  For instance, and this is a true story, I showed Trip to a Professor of Anaesthesiology and asked for his views on the site.  He came back saying that he was unimpressed!  The reason – his interest was in awareness (as in, when a person is under anaesthetic are they truly anaesthetised or might they be aware) and when you search Trip for ‘awareness’ you get lots of results, mostly on things like the awareness of public health messages! Another example I use to illustrate the point is the search ‘pain’.  We return the same results whether the person is an oncologist or a rheumatologist – which to me is ridiculous – as the intention is likely to be significantly different.  But, to date, there has been no good solution.

The image below shows a breakthrough.

In the image above there are 4 sets of results for the same search, ‘antibiotics’.  This is a test system and not based on the real Trip results.  On the left-hand side we have the normal/natural results for the search in the test system.  In the top right set of results the natural results have been reordered based on the clickstream activity of those users of Trip who have not logged in (85%).  At the simplest level this promotes results that have been clicked on and relegates those that have not been clicked.  It really is more complex than that – but I hope you get the point!

But the bottom right is where the magic is.  Even though it only accounts for 0.2% of the activity, we have reordered the results based on the clickthrough activity of dentists.  There are a few erroneous results, but I’d like to think you can see the effect – dental articles are promoted.

So, the effect is that, when we eventually roll out the system and we know the user is a dentist, we improve their results based on the previous activity of other dentists.  The reality is that this technique will work with any speciality and profession.
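To give a feel for the simplest version of the idea, here is a sketch of reordering by one specialty's clicks.  The real test system is more complex than this; the scoring formula, the `weight` parameter and the sample article identifiers are all assumptions made for illustration.

```python
def rerank_for_specialty(results, clicks, specialty, weight=1.0):
    """Reorder `results` so articles clicked by users of `specialty` move up."""
    counts = clicks.get(specialty, {})
    n = len(results)

    def score(item):
        position, article = item
        base = (n - position) / n  # 1.0 for the top natural result, then descending
        return base + weight * counts.get(article, 0)

    ranked = sorted(enumerate(results), key=score, reverse=True)
    return [article for _, article in ranked]

# Hypothetical natural results for a search and per-specialty click counts.
results = ["abx_sepsis", "abx_pneumonia", "abx_dental_abscess", "abx_otitis"]
clicks = {"dentistry": {"abx_dental_abscess": 3}}
for_dentist = rerank_for_specialty(results, clicks, "dentistry")
for_unknown = rerank_for_specialty(results, clicks, "cardiology")
```

With no click data for a specialty the natural order is left untouched, which is one way the approach could degrade gracefully while the data is still sparse.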

There are a few issues.  The paucity of data is the biggest, and we have two significant ways of tackling it:

  • When we roll out the new Trip we will – to a large extent – make login/registration obligatory.  This will mean we get lots more clickstream data which will make the results even better.
  • Machine learning.  We’ve already worked on machine learning and will bring these techniques to the system to enhance/complement the clickstream work.

Oh yes, we’ve even figured out a way to mitigate the effects of filter bubbles.

This really has been a good few weeks.

The light at the end of the tunnel…

…is, I hope, not the light of an oncoming train. I’ve nabbed that line from my favourite band – Half Man Half Biscuit (HMHB) who wrote The Light At The End Of The Tunnel (Is The Light Of An Oncoming Train) a good few years ago! My love for HMHB aside, I keep reflecting on how things seem to be going really well for Trip and I’m desperately hoping we’ve turned a corner.  So, why the optimism:

  • 2014 was pretty good.
  • We’re working on the new Freemium version of Trip.  What’s going to come out is going to be impressively good and some of the premium upgrades will be great.
  • We’re involved in a really interesting EU funded project which will be doing some really innovative things.  I’ll blog about that more when the final specifications are agreed, but we’ll be looking at making Trip more multi-lingual, improving the Trip Rapid Review system and doing loads of work around similarity, which is useful for the next point.
  • Relatedness/similarity is looking very useful for what we want to do with regard to developing our financial viability.  The measures we’re developing will allow us to do all sorts of interesting things, for instance we can highlight a new book that’s useful to a particular clinician, or we can highlight a new trial that’s pertinent to an existing systematic review.  Many more uses on top of that, but I’ve got to keep some secrets.
  • I’m starting to realise the value in our clickstream data (helped by two separate teams and soon to be joined by a PhD student as part of the EU project).  You only have to look at most of this year’s blog posts to see I’m working hard on this.  This can help with the relatedness work but it can do other useful things, such as improving the search results and better predicting new articles that are of use to a Trip user.  If our mission is to ensure health professionals get the right evidence to support their care – using clickstream data will make it so much more effective.  The advantage of the clickstream data is that it’s Trip’s data to utilise, it’s our IP.  It’s at the heart of our future.  I actually think it’s this point that’s making me so happy/optimistic.
  • Lots of other nice bits and bobs e.g. I’ve just been invited to lecture in the USA in Autumn/Fall; I’m part of a large consortium bidding to be a support team for complex reviews; I’m presenting at the wonderful Evidence Live; I’m making headway in my new NHS job (I am lead for Knowledge Mobilisation for Public Health Wales); I’m waiting to hear about a large MRC grant (not optimistic but something to look forward to).

Long may this continue!

Another use for clickstream data

In the previous post (Clickstream data and results reordering) I highlighted how the clickstream data could be used to easily surface articles that are not picked up by usual keyword searches.  That post highlighted how it could be used to improve search results.  In my mind I was thinking this could surface documents that help a clinician trying to answer their clinical questions.

But what about in systematic reviews (or similar comprehensive searches)?  A couple of scenarios spring to mind:

  1. A user conducts a search and finds, say, 15 controlled trials.  We could create a system that highlights the most connected clinical trials that have not been selected already.  So, possibly an in-built safety check to ensure that no trials are missed.
  2. Related concepts.  You see some spectacularly complex search terms, no doubt human generated.  There may be other systems but we could surface related concepts.  A simple example was shown in the earlier post (Clickstream data and results reordering) where it highlighted that obesity is related to diet.  OK, we all know that – but the computer didn’t, it spontaneously highlighted it.  Doing this on a large scale using Trip’s ‘big data’ will generate more obscure relationships – potentially very useful in generating a comprehensive search strategy!
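For the related concepts scenario, one possible approach (my assumption, not something we have built) is to treat two search terms as related when they lead users to click on the same articles.  The sketch below scores that with a simple Jaccard overlap; the log format, the `min_overlap` threshold and the sample data are all hypothetical.

```python
from collections import defaultdict

def related_terms(search_log, term, min_overlap=0.2):
    """Rank other search terms by Jaccard overlap of the articles they led to."""
    clicked = defaultdict(set)
    for query, article in search_log:
        clicked[query].add(article)
    target = clicked.get(term, set())
    scored = []
    for other, articles in clicked.items():
        if other == term:
            continue
        union = target | articles
        overlap = len(target & articles) / len(union) if union else 0.0
        if overlap >= min_overlap:
            scored.append((other, round(overlap, 2)))
    # Highest overlap first; ties broken alphabetically.
    return sorted(scored, key=lambda s: (-s[1], s[0]))

# Hypothetical log of (search term, clicked article id) pairs.
log = [
    ("diet", 1), ("diet", 2), ("diet", 3),
    ("obesity", 2), ("obesity", 3), ("obesity", 4),
    ("asthma", 9),
]
suggestions = related_terms(log, "diet")
```

In this toy log ‘obesity’ shares two of four articles with ‘diet’ and so is suggested, while ‘asthma’ shares none and is not.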

If there are any systematic reviewers/searchers I’d love to hear what you think!

Clickstream data and results reordering

Recently I’ve been discussing the potential for using our clickstream data (our earliest post on the subject being from October 2013).  After a post earlier this year, ‘Ok, I admit it, I’m stuck’, I have been contacted by two separate people who have both been very generous with their time, and on Friday I met with one of them, who talked me through what they had found.

Before I share the results there are a few points to consider:

  • This really is early days and it needs some imagination to see how it would work on Trip.
  • The image below is one trial, simply to illustrate a point.  The results are not based on the full Trip index, just a very small sample.
  • The search is using a very simple text matching for title words only.  So, as you will see in the image below all the articles in the left-hand column have the search term – diet – in the title.

So, what’s going on?

On the left-hand side are the results in this mock-up search.  However, those on the right-hand side have been reordered using simple clickstream data.  Those articles that are surrounded by the light blue colour have been boosted (so appear higher) due to lots of people clicking on them.  Those results surrounded by orange are arguably more interesting – as they don’t include the search term in the title!

What this signifies is that users of Trip, while searching the actual Trip, have clicked on the orange articles in the same search session as one of the articles on the left-hand side.  So, it’s telling us that the orange articles are related to the normal results – and so they are being inserted into the results – even though they were not matched in our search test by having the word diet in the title.
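A toy version of this mock-up could be sketched as follows.  It assumes a title-word match for the ‘normal’ results, a dictionary of click counts and a dictionary of co-clicked articles; the function name, data shapes and sample titles are hypothetical and this is not the code used in the trial.

```python
def reorder_with_clickstream(articles, term, click_counts, coclicks):
    """Return (article_id, reason) pairs: title matches plus co-clicked extras, by clicks."""
    # 'Normal' results: the search term appears as a word in the title.
    matched = [aid for aid, title in articles.items()
               if term.lower() in title.lower().split()]
    # Extras: articles clicked in the same session as a matched article.
    inserted = set()
    for aid in matched:
        inserted |= coclicks.get(aid, set())
    inserted -= set(matched)
    pool = [(aid, "match") for aid in matched]
    pool += [(aid, "inserted") for aid in sorted(inserted)]
    return sorted(pool, key=lambda p: click_counts.get(p[0], 0), reverse=True)

# Hypothetical index, click counts and co-click data.
articles = {
    1: "Diet and heart disease",
    2: "Mediterranean diet trial",
    3: "Obesity management in adults",
    4: "Asthma inhalers",
}
click_counts = {1: 2, 2: 9, 3: 5}
coclicks = {1: {2}, 2: {3}}
reordered = reorder_with_clickstream(articles, "diet", click_counts, coclicks)
```

Here the heavily clicked article 2 is boosted (the light blue case) and article 3 is inserted despite not having ‘diet’ in its title (the orange case).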

Trying to describe this in the blog is slightly difficult as I’m not sure if I’ve explained it particularly well.  I suppose there are two take homes:

  • Clickstream data, even using a small sample, can uncover some really useful articles that a standard keyword search might miss.
  • I am very excited by this, so have faith in that!

    People who looked at this article, also looked at…

    In my previous post Ok, I admit it, I’m stuck (a title people seem to really like) I highlighted the difficulty in finding meaning in our clickstream data (the data generated by users interacting with the site).  One thing that I had thought about and a couple of people have subsequently raised is an Amazon style ‘People who looked at this article, also looked at this one..’, a feature I find really interesting and frequently useful.

    So, taking some earlier work on mapping UTI data, I started doing further analysis, based on this graph.

    I started with an article that looked to be in an interesting place and picked document 2056462 (Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? from the publication Tools for Practice 2013) and then followed the links from there.  Some have since been removed or updated.  But, we can say that ‘People who looked at Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? also looked at…’

    • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010)
    • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012)
    • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010)
    • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009)
    • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012)
    • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
    • Urinary tract infection (lower) – women (NICE Clinical Knowledge Summaries, 2009)

    I then, as a way of snowballing, took the last article in the list and did a similar thing, which results in ‘People who looked at Urinary tract infection (lower) – women also looked at…’

    • Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? (Tools for Practice 2013)
    • Urological infections (European Association of Urology, 2013)
    • Recurrent Urinary Tract Infection (Society of Obstetricians and Gynaecologists of Canada, 2010)
    • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
    • Urinary tract infection (lower) – men (NICE Clinical Knowledge Summaries, 2010)
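The snowballing I did by hand could be automated with a simple breadth-first walk of the links.  This is a minimal sketch assuming the links are stored as a dictionary from each article to the articles viewed alongside it; the short identifiers stand in for a few of the titles above and the `max_hops` limit is my assumption.

```python
from collections import deque

def snowball(neighbours, start, max_hops=2):
    """Breadth-first walk of the co-click graph, returning {article: hops from start}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(current, ()):
            if nxt not in hops:
                hops[nxt] = hops[current] + 1
                queue.append(nxt)
    del hops[start]
    return hops

# Abbreviated stand-ins for a handful of the articles listed above.
neighbours = {
    "tfp_cranberry": ["cochrane_cranberries", "cks_uti_women"],
    "cks_uti_women": ["tfp_cranberry", "eau_urological", "cks_uti_men"],
}
found = snowball(neighbours, "tfp_cranberry")
```

Articles one hop away are the direct ‘also looked at’ list; those two hops away are what the snowballing step adds.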

    Anyway, I hope it’s clear what’s going on!  On one level it all seems good and interesting in that all the articles seem relevant.  But does it add anything that the initial search wouldn’t have found?  To help I’ve gone through the top list and shown where each of the results appears in the search results (coincidentally the Tools for Practice article came 5th in the results list for a search of urinary tract infection and cranberry):

    • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010) = Result #38
    • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012) = Result #18
    • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010) = Result #7
    • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009) = Result #14
    • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012) = Result #2
    • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013) = Result #13
    • Urinary tract infection (lower) – women (NICE Clinical Knowledge Summaries, 2009) = Result #54

    To me these results are interesting!  The clear ‘outliers’ are the top and bottom results, which appeared at result numbers 38 and 54 respectively.  This is important as it means that they are much less likely to be seen – especially the latter one, which would be on the third page of results.

    Is this useful?

    It will highlight different articles than found from browsing the search results, but is there a cost?  Will users look less at our algorithmic results (the normal results) and rely on these ‘human’ results?  If so, is that good or bad?  I actually think it’ll encourage people to explore more and spend longer on the site – so I don’t think it’ll have a negative consequence.

    This is really interesting!

    I’m really tempted to open a can of worms by asking if there is any coherence/rationality as to how the linked articles list is generated.  However, as the above list is based on only a sample of data it’d be wrong to place too much weight on things.  Also, even if it is random, so what!?

    Finally, I’ve even graphed this out (in a not too appealing way):

    Ok, I admit it, I’m stuck

    I’ve been talking about article social networks for a while, and last August I wrote ‘Beauty is in the eye of the beholder’ which contained the image below.

    I’ve continued to be fascinated by them and below are two more images, focused on defined areas of the above graph.

    These are beautiful – but is there more to it?

    Both images show definite structure.  So, our users, simply by using the site are adding structure and energy.  I keep getting drawn to the principle of entropy.  I’m absolutely sure that our users are ordering the articles in Trip but does that have any value?

    I admit to being relatively clueless – part of the purpose of the post is to see if the wisdom of the Trip users can be brought to bear to try and help me figure out what the above might mean and what might the next steps be!

    The above image (taken from Article social networks, meaning and redundancy) shows distinct clusters as well.  In the bottom left is a cluster of articles on UTI and cranberry and it consists of 19 articles.  If you do a search of Trip you find many more than this.  So, our users are not clicking on many articles – so as well as adding structure are they giving us clues as to articles that aren’t worthwhile (based on their collective judgements)?
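One crude way to pull clusters like this out of the data, rather than spotting them by eye, is to group articles that are joined by any chain of links.  The sketch below does that with connected components, which is my own stand-in and not necessarily how the clusters in the image were produced; the edge list is hypothetical.

```python
def find_clusters(edges):
    """Group articles into clusters: sets of articles joined by any chain of co-clicks."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, clusters = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(neighbours[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)

# Hypothetical links: three cranberry articles and two unrelated ones.
edges = [("c1", "c2"), ("c2", "c3"), ("a1", "a2")]
clusters = find_clusters(edges)
```

Comparing the size of a cluster with the number of articles a plain search returns would give the kind of ‘19 articles versus many more’ comparison described above.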

    If you click on one article in that cluster, is it likely that the others will be worthwhile?  What about if a new article is published and joins the cluster based on another person searching and effectively adding the article to the cluster – is that useful?  I’m sure there are no absolutes, but these appear to be hints – surely?

    A final thought – the graphs are based on all users.  I imagine the above graph would look different if the user had been a general/family practitioner compared with, say, a urologist.  Stronger clues?

    I would be absolutely delighted if anyone can help me figure out the value/meaning of the data.  And, if you can think of ways of working together I’d be delighted to see how we can share the data!
