
Trip Database Blog

Liberating the literature

Another use for clickstream data

In the previous post (Clickstream data and results reordering) I highlighted how clickstream data could be used to surface articles that are not picked up by the usual keyword searches, and so improve search results.  In my mind I was thinking this could help a clinician trying to answer their clinical questions by surfacing relevant documents.

But what about in systematic reviews (or similar comprehensive searches)?  A couple of scenarios spring to mind:

  1. A user conducts a search and finds, say, 15 controlled trials.  We could create a system that highlights the most connected clinical trials that have not already been selected.  So, possibly an in-built safety check to help ensure that no trials are missed.
  2. Related concepts.  You see some spectacularly complex search terms, no doubt human generated.  There may be other systems that do this, but we could surface related concepts.  A simple example was shown in the earlier post (Clickstream data and results reordering) where it highlighted that obesity is related to diet.  OK, we all know that – but the computer didn’t, it spontaneously highlighted it.  Doing this on a large scale using Trip’s ‘big data’ will generate more obscure relationships – potentially very useful in generating a comprehensive search strategy!
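A minimal sketch of how the first scenario might work, assuming the clickstream has already been reduced to pairs of articles with a count of the sessions in which both were clicked. The function name, the article IDs and the counts are all hypothetical, and this is one possible way to define ‘most connected’, not a description of how Trip does it.

```python
from collections import Counter


def suggest_missed_trials(co_clicks, selected, top_n=5):
    """Rank unselected articles by the weight of their co-click links to selected ones."""
    selected = set(selected)
    scores = Counter()
    for a, b, weight in co_clicks:
        # Count a link only when exactly one end has already been selected.
        if a in selected and b not in selected:
            scores[b] += weight
        elif b in selected and a not in selected:
            scores[a] += weight
    return scores.most_common(top_n)


# Hypothetical data: (article, article, sessions in which both were clicked).
co_clicks = [
    ("trial_A", "trial_B", 4),
    ("trial_A", "trial_X", 3),
    ("trial_B", "trial_X", 2),
    ("trial_B", "trial_Y", 1),
    ("trial_Y", "trial_Z", 5),
]
selected = ["trial_A", "trial_B"]

suggestions = suggest_missed_trials(co_clicks, selected)
print(suggestions)
```

Here trial_X comes out on top because it is linked to both of the selected trials, while trial_Z is never suggested as it has no direct link to anything the reviewer has chosen.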

If there are any systematic reviewers/searchers I’d love to hear what you think!

Clickstream data and results reordering

Recently I’ve been discussing the potential for using our clickstream data (our earliest post on the subject being from October 2013).  After a post earlier this year (Ok, I admit it, I’m stuck) I was contacted by two separate people who have both been very generous with their time, and on Friday I met with one of them, who talked me through what they had found.

Before I share the results there are a few points to consider:

  • This really is early days and it needs some imagination to see how it would work on Trip.
  • The image below is one trial, simply to illustrate a point.  The results are not based on the full Trip index, just a very small sample.
  • The search is using a very simple text matching for title words only.  So, as you will see in the image below all the articles in the left-hand column have the search term – diet – in the title.

So, what’s going on?

The left-hand side shows the results in this mock-up search.  However, those on the right-hand side have been reordered using simple clickstream data.  Those articles that are surrounded by the light blue colour have been boosted (so appear higher) due to lots of people clicking on them.  Those results surrounded by orange are arguably more interesting – as they don’t include the search term in the title!

What this signifies is that users of Trip, while searching the actual Trip, have clicked on the orange articles in the same search session as one of the articles on the left-hand side.  So, it’s telling us that the orange articles are related to the normal results – and so they are being inserted into the results – even though they were not matched in our search test by having the word diet in the title.
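To make the reordering a little more concrete, here is a rough sketch. It assumes we have a click count per article and, for each article, a count of sessions in which another article was also clicked. The function name, the threshold of two sessions, the document labels and the numbers are invented for illustration; the actual mock-up may well have worked differently.

```python
def reorder_with_clicks(keyword_hits, click_counts, co_clicked, min_sessions=2):
    """Boost keyword hits by clicks and insert co-clicked articles that lack the keyword."""
    hit_set = set(keyword_hits)
    # Articles clicked in the same session as a keyword hit, but not matched
    # by the keyword search itself (the 'orange' results in the mock-up).
    extras = {
        other
        for hit in keyword_hits
        for other, sessions in co_clicked.get(hit, {}).items()
        if other not in hit_set and sessions >= min_sessions
    }
    candidates = list(keyword_hits) + sorted(extras)
    # Stable sort: most clicked first, ties keep their earlier order.
    return sorted(candidates, key=lambda doc: -click_counts.get(doc, 0))


# Hypothetical title-match results for the search term 'diet'.
keyword_hits = ["diet_heart", "diet_pregnancy", "diet_mediterranean"]
click_counts = {
    "diet_heart": 3,
    "diet_pregnancy": 1,
    "diet_mediterranean": 9,
    "obesity_management": 6,
    "rarely_linked": 7,
}
co_clicked = {
    "diet_mediterranean": {"obesity_management": 4, "rarely_linked": 1},
    "diet_pregnancy": {"diet_mediterranean": 2},
}

reordered = reorder_with_clicks(keyword_hits, click_counts, co_clicked)
print(reordered)
```

The article labelled obesity_management does not contain the search term but is pulled into the list because it shares enough sessions with a keyword hit, whereas rarely_linked falls below the threshold and stays out.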

Trying to describe this in the blog is slightly difficult as I’m not sure if I’ve explained it particularly well.  I suppose there are two take homes:

  • Clickstream data, even using a small sample, can uncover some really useful articles that a standard keyword search might miss.
  • I am very excited by this, so have faith in that!

    People who looked at this article, also looked at…

    In my previous post Ok, I admit it, I’m stuck (a title people seem to really like) I highlighted the difficulty in finding meaning in our clickstream data (the data generated by users interacting with the site).  One thing that I had thought about and a couple of people have subsequently raised is an Amazon style ‘People who looked at this article, also looked at this one..’, a feature I find really interesting and frequently useful.

    So, taking some earlier work on mapping UTI data, I started doing further analysis, based on this graph.

    I started with an article that looked to be in an interesting place and picked document 2056462 (Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? from the publication Tools for Practice 2013) and then followed the links from there.  Some have since been removed or updated.  But, we can say that ‘People who looked at Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best?’ also looked at…

    • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010)
    • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012)
    • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010)
    • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009)
    • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012)
    • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
    • Urinary tract infection (lower) – women (NICE Clinical Knowledge Summaries, 2009)

    I then, as a way of snowballing, took the last article in the list and did a similar thing, which results in ‘People that looked at Urinary tract infection (lower) – women’ also looked at…

    • Cranberry juice/tablets for the prevention of urinary tract infection: Naturally the best? (Tools for Practice 2013)
    • Urological infections (European Association of Urology, 2013)
    • Recurrent Urinary Tract Infection (Society of Obstetricians and Gynaecologists of Canada, 2010)
    • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013)
    • Urinary tract infection (lower) – men (NICE Clinical Knowledge Summaries, 2010)
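The two lists above could be produced along the following lines. This is a sketch only: it assumes each session is available as a simple list of the articles clicked in it, and the function names and article labels are made up for the example (they are not Trip document IDs).

```python
from collections import defaultdict
from itertools import combinations


def build_also_viewed(sessions):
    """Map each article to the articles clicked in the same session, with session counts."""
    also = defaultdict(lambda: defaultdict(int))
    for clicked in sessions:
        # Each pair of distinct articles in a session counts once for that session.
        for a, b in combinations(sorted(set(clicked)), 2):
            also[a][b] += 1
            also[b][a] += 1
    return also


def also_looked_at(also, article, top_n=5):
    """Return the articles most often clicked alongside the given one."""
    neighbours = also.get(article, {})
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ranked[:top_n]]


# Hypothetical sessions, each a list of clicked articles.
sessions = [
    ["cranberry_tools", "cranberry_cochrane", "uti_women"],
    ["cranberry_tools", "cranberry_cochrane"],
    ["uti_women", "uti_men"],
]

also = build_also_viewed(sessions)
first_hop = also_looked_at(also, "cranberry_tools")
# Snowballing: repeat the lookup from the last article in the first list.
second_hop = also_looked_at(also, first_hop[-1])
print(first_hop)
print(second_hop)
```

As in the real lists, the second hop brings back the starting article as well as something new (uti_men here), which is what makes snowballing a way of reaching articles one step further out.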

    Anyway, I hope it’s clear what’s going on!  On one level it all seems good and interesting in that all the articles seem relevant.  But does it add anything that the initial search wouldn’t have found?  To help I’ve gone through the top list and shown where each of the results appears in the search results (coincidentally the Tools for Practice article came 5th in the results list for a search of urinary tract infection and cranberry):

    • Novel Concentrated Cranberry Liquid Blend, UTI-STAT With Proantinox, Might Help Prevent Recurrent Urinary Tract Infections in Women (Urology, 2010) = Result #38
    • Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial (Mayo Clinic proceedings, 2012) = Result #18
    • Cranberry is not effective for the prevention or treatment of urinary tract infections in individuals with spinal cord injury (DARE, 2010) = Result #7
    • Cranberries for preventing urinary tract infections (Cochrane Database of Systematic Reviews, 2009) = Result #14
    • Cranberry-containing products for prevention of urinary tract infections in susceptible populations (CRD 2012) = Result #2
    • A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection (Journal of infection and chemotherapy, 2013) = Result #13
    • Urinary tract infection (lower) – women (NICE Clinical Knowledge Summaries, 2009) = Result #54

    To me these results are interesting!  The clear ‘outliers’ are the top and bottom results, which appeared at result numbers 38 and 54 respectively.  This is important as it means that they are much less likely to be seen – especially the latter one which would be on the third page of results.
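For anyone wanting to check the page arithmetic, a small sketch follows. The page size of 20 results is an assumption on my part: it is consistent with result #54 landing on the third page, but it is not stated anywhere above. The ranks are the ones listed, keyed by source and year.

```python
RESULTS_PER_PAGE = 20  # assumption: consistent with result #54 being on page 3


def page_of(rank, per_page=RESULTS_PER_PAGE):
    """Return the 1-based results page on which a 1-based rank appears."""
    return (rank - 1) // per_page + 1


# Ranks taken from the list above.
search_rank = {
    "Urology, 2010": 38,
    "Mayo Clinic proceedings, 2012": 18,
    "DARE, 2010": 7,
    "Cochrane Database of Systematic Reviews, 2009": 14,
    "CRD 2012": 2,
    "Journal of infection and chemotherapy, 2013": 13,
    "NICE Clinical Knowledge Summaries, 2009": 54,
}

beyond_first_page = {
    source: page_of(rank)
    for source, rank in search_rank.items()
    if page_of(rank) > 1
}
print(beyond_first_page)
```

Under that assumption only the two outliers fall outside the first page, on pages 2 and 3.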

    Is this useful?

    It will highlight different articles from those found by browsing the search results, but is there a cost?  Will users look less at our algorithmic results (the normal results) and rely on these ‘human’ results?  If so, is that good or bad?  I actually think it’ll encourage people to explore more and spend longer on the site – so I don’t think it’ll have a negative consequence.

    This is really interesting!

    I’m really tempted to open a can of worms by asking if there is any coherence/rationality as to how the linked articles list is generated.  However, as the above list is based on only a sample of data it’d be wrong to place too much weight on things.  Also, even if it is random, so what!?

    Finally, I’ve even graphed this out (in a not too appealing way):

    Ok, I admit it, I’m stuck

    I’ve been talking about article social networks for a while, and last August I wrote ‘Beauty is in the eye of the beholder’ which contained the image below.

    I’ve continued to be fascinated by them and below are two more images – focused on defined areas of the above graph.

    These are beautiful – but is there more to it?

    Both images show definite structure.  So, our users, simply by using the site are adding structure and energy.  I keep getting drawn to the principle of entropy.  I’m absolutely sure that our users are ordering the articles in Trip but does that have any value?

    I admit to being relatively clueless – part of the purpose of the post is to see if the wisdom of the Trip users can be brought to bear to try and help me figure out what the above might mean and what might the next steps be!

    The above image (taken from Article social networks, meaning and redundancy) shows distinct clusters as well.  In the bottom left is a cluster of articles on UTI and cranberry and it consists of 19 articles.  If you do a search of Trip you find many more than this.  So, our users are not clicking on many articles – so as well as adding structure are they giving us clues as to articles that aren’t worthwhile (based on their collective judgements)?

    If you click on one article in that cluster, is it likely that the others will be worthwhile?  What about if a new article is published and joins the cluster based on another person searching and effectively adding the article to the cluster – is that useful?  I’m sure there are no absolutes, but these appear to be hints – surely?
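One simple way to think about clusters, and about a new article joining one, is as connected groups in the co-click graph. The sketch below uses plain connected components, which is a deliberate simplification and almost certainly not how the graphs above were produced; the function name and article labels are hypothetical.

```python
def find_clusters(edges):
    """Group articles into clusters, treated here as connected components of the co-click graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen, clusters = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from this article.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical co-click links between articles.
edges = [("uti_1", "uti_2"), ("uti_2", "uti_3"), ("asthma_1", "asthma_2")]
clusters = find_clusters(edges)

# A newly published article is clicked in the same session as uti_3,
# which effectively adds it to the UTI cluster.
updated = find_clusters(edges + [("uti_new", "uti_3")])
print(clusters)
print(updated)
```

In a real graph almost everything ends up loosely connected, so a proper analysis would need some notion of link strength or community detection, but the sketch shows the basic idea of one extra click pulling a new article into an existing group.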

    A final thought – the graphs are based on all users.  I imagine the above graph would look different if the user had been a general/family practitioner compared with, say, a urologist.  Stronger clues?

    I would be absolutely delighted if anyone can help me figure out the value/meaning of the data.  And, if you can think of ways of working together I’d be delighted to see how we can share the data!

    2014, looking back with pride

    At the end of 2013 I did a review of the year and now, in early 2015, I thought I’d repeat the exercise for 2014!

    First, the stats:

    • We had over 3,600,000 page views.
    • We are up to registered user number 140,000.  However, the standard view is to discount the number by 10-20% for users who no longer use the account and/or spam accounts.  So, we probably have 115-125,000 registered users.
    • The average duration on the site continues to increase: 5.08 minutes compared with 4.11 in 2013.  This is mirrored in the number of pages per session, increasing from 3.26 to 3.89.
    • The bounce rate (people who just visit one page and then leave without engaging) has decreased by 20%

    The above represents an ongoing trend which is seeing fewer unique users but the ‘quality’ is higher in that the users are more engaged and making better use of the site.  It is this engagement that is so satisfying, much more important than some – ego boosting – headline of number of unique visitors (although 3.6 million page views is quite impressive)!

    Financial insecurity has been a recurring theme for Trip and I’m really pleased as I think we’re fine for now and this is based on two facts:

    • We’ve secured a couple of grants recently which help in any number of ways.
    • We’ve finally arrived at a business model (freemium) which we will roll out in March (I hope).  I’m optimistic as we’ll be offering a great premium offering and hopefully a number of users and institutions will sign-up.

    At the end of 2013 I reported on the disappointment of missing out on an honorary professorship but I was very pleased to be given an honorary fellowship at the Centre for Evidence-Based Medicine (CEBM) at Oxford University.  The CEBM runs the wonderful Evidence Live series of conferences and I’ll be involved again in the session ‘EBM into Practice: Future of evidence synthesis: a new paradigm’ which will be alongside Carl Heneghan, Martin Burton and Tom Jefferson.

    Other bits and bobs from the year:

    • One of the grants was from the EU Horizon 2020 funding and will see me getting involved in lots of interesting research relating to multi-lingual search as well as a big chunk of machine reading and learning, including an overhaul and enhancement to our rapid review system.
    • My role in Public Health Wales (PHW) seems to be working itself out as I was given the role of lead for knowledge mobilisation (a term I dislike) and I’ve just finished a draft strategy on making PHW more ‘evidence-based’.  I believe my role will then move into delivering on the strategy – which should be a nice challenge.
    • I’ve continued to conduct work in the social networks of articles with the huge support of the wonderful Valdis Krebs.  As a little treat I’ve added two images of further analysis below – happy to share more if anyone is interested!

    Other than the above there have been so many other things, but many are important to me and probably less so to others.

    There is also another, really major, project I’m starting to explore but for various reasons I can’t share now.  It builds on the answer engine concept, and there is the potential for Trip to work with a huge commercial partner.

    Finally, a very large thank you to:

    • The users, without you Trip would be nothing!
    • Those users that completed the various surveys.
    • The members of the Trip advisory board for being very generous with your time and your collective knowledge/wisdom.
    • The many incredible people who I have interacted with – I really am lucky. 

    2014 has been great and I hope – given the reduced financial stress – 2015 will be even better.

    Creating a Q&A environment in Trip

    Those of you who’ve followed this blog for a while will have seen that I’m always revisiting the answer engine concept, most recently two months ago.  A month before that I mentioned it in the context of a Journal of Clinical Q&A.

    This all stems from my belief that Trip is a wonderful tool to answer clinical questions but also a belief that it could be even better!  After all, it was the reason I started it in the first place – to help me answer clinical questions via the ATTRACT Q&A service.  Surveys have shown that many clinicians agree, with over 70% of questions supporting clinical care being helped by using Trip.

    Recapping briefly on the answer engine and the Journal of Clinical Q&A:

    • The answer engine will try to predict questions from the search terms and insert an answer above the search results.  Users will get an answer in one click.
    • Journal of Clinical Q&A is a journal idea – radically different from any other journal.  It will be a structured answer to a clinical question, posted on the site (and helping populate the answer engine) which will be peer-reviewed and given a citation.

    So far, fairly radical and fairly good.

    Now, another variable to consider – the PICO search system.  In the forthcoming upgrade we’ll be enhancing this feature in the premium version.  It will be more guided than the existing version and it could work like this:

    1. Users type in their full-text question.
    2. Users then select the PICO elements from the question.
    3. Users view relevant results.
    4. Users are given the option to write up an answer. If they write up the answer we will show them the articles they’ve looked at and they can indicate which were useful (and thereby form the reference list).
    5. They can choose to keep it private or share it – feeding the answer engine.
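The five steps above could be captured in a small data structure along these lines. It is purely illustrative: the class, its method names, the example question and the article labels are all hypothetical, and nothing here reflects how the upgraded PICO search will actually be built.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionSession:
    """One pass through the guided PICO flow, from question to optional shared answer."""

    question: str
    pico: dict = field(default_factory=dict)
    viewed: list = field(default_factory=list)
    useful: set = field(default_factory=set)
    answer: str = ""
    shared: bool = False

    def select_pico(self, population, intervention, comparison, outcome):
        # Step 2: the user picks the PICO elements out of their question.
        self.pico = {"P": population, "I": intervention, "C": comparison, "O": outcome}

    def view(self, article):
        # Step 3: remember which results the user looked at, in order.
        if article not in self.viewed:
            self.viewed.append(article)

    def mark_useful(self, article):
        # Step 4: only articles the user has looked at can be marked useful.
        if article not in self.viewed:
            raise ValueError("only viewed articles can be marked useful")
        self.useful.add(article)

    def write_answer(self, text, share=False):
        # Steps 4 and 5: write up the answer and choose private or shared.
        self.answer = text
        self.shared = share

    def reference_list(self):
        # The useful articles, in the order they were viewed.
        return [a for a in self.viewed if a in self.useful]


# Hypothetical example question and articles.
session = QuestionSession("Does cranberry juice prevent recurrent UTI in women?")
session.select_pico("women with recurrent UTI", "cranberry juice", "placebo", "UTI recurrence")
for article in ["article_1", "article_2", "article_3"]:
    session.view(article)
session.mark_useful("article_3")
session.mark_useful("article_1")
session.write_answer("(answer text written by the user)", share=True)

references = session.reference_list()
print(references)
```

Only answers with shared set to True would be candidates for feeding the answer engine; private ones would stay with the user.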

    Another powerful component for a Q&A environment.  What could go wrong (I ask tentatively!)?

    Professions in Trip Profile

    When you register with Trip you are asked to select your profession, the current list is shown below:

    The above 9 options are simply not good enough as around a quarter of users select the ‘Other’ option (and I can’t imagine these users are made to feel particularly special!).  Also, as we want to offer increasingly personalised information, the more granular the detail we have on a person the better.  So, in our recent surveys I asked people to tell us their profession and from that I have come up with a more comprehensive list:

    Academic researcher
    Dentist
    Dental – other
    Dietician/nutritionist
    Doctor/physician – other
    Doctor/physician – primary care
    Doctor/physician – secondary care
    Educator
    Librarian/Information specialist
    Medical laboratory scientist
    Midwife
    Nurse
    Nurse lecturer
    Nurse practitioner
    Nurse, clinical specialist
    Ophthalmologist
    Optometrist
    Paramedic
    Patient/carer
    Public health professional
    Pharmacist
    Physical therapist/physiotherapist
    Physician assistant
    Retired
    Speech and language
    Student
    Other

    Expect to see the changes in early 2015.

    Communicating the evidence ‘types’

    Those who use Trip will possibly have noticed small thumbnails to the right of each search result (see image 1 below).  The idea is that they are a small screenshot of the actual page which people can roll over to see a preview of the actual result.  They are problematic as the feature is currently broken, so we only have screenshots for around half of the results.  Also, they are moderately resource intensive.

    So, we need to decide whether to fix them, remove them or replace them with something else – hence this post.

    One idea I’ve got is to use the space to give additional information to users to help them understand the evidence they’re looking at.  For instance, we could use it to give a clearer idea of the likely strength of evidence.  We currently do this via the use of colour flashes but unfortunately many people miss this.  The colour flashes link the individual article to the colours used in the filter section (so green indicates higher quality evidence etc.).  Below are some images that are an attempt to show what it might look like.  I’d appreciate you looking at them (click on the image to enlarge it) and then going to this survey to let us know what you think.  There are only 4 questions so it shouldn’t take long.

    Thank you in advance.

    Strange results

    Barb, one of our volunteers on our Twitter accounts, commented a while ago about seeing some strange results on Trip, so I asked her to send any new ones she found to help me understand what was going on.  She was looking for new articles in Trip that are returned for the search ‘immunisations’.  Many were fine but a few weren’t, for instance:

    • Multiple sclerosis: management of multiple sclerosis in primary and secondary care
    • Health visiting
    • Economic Evaluation of Complex Health System Interventions: A Discussion Paper
    • British Guideline on the management of asthma
    • Developing and Evaluating Methods for Record Linkage and Reducing Bias in Patient Registries

    Now, these are not specifically about immunisations but they all make reference to them.  For instance the top result has the following:

    Vaccinations
    1.4.2 Be aware that live vaccinations may be contraindicated in people with MS who are being treated with disease-modifying therapies.

    We return all the results that match the search terms (and/or synonyms).  However, our algorithm is designed to emphasise those results which are more relevant.  So, ordinarily, if you do a search with lots of results the relatively irrelevant results don’t appear (well they do, but not till way down the results).  However, if you look for things with few results (perhaps an unusual condition or you heavily restrict the results) you are more likely to see ‘strange’ results.

    So, what can we do? I see three options:

    1. Leave it ‘as is’ and hope people don’t get put off by the occasional result they find strange.
    2. We allow users to set a relevancy cut-off themselves.  Each search result gets a score from 0 to 1 (with 1 being very relevant) and every result that matches the search term gets at least 0.0001 and therefore can be shown in the results.  We could give users a ‘slider’ to allow them to choose what cut-off they want.  So some might choose 0.1 while others might choose 0.3.
    3. We effectively borrow a concept from PubMed’s Clinical Queries which has a narrow and broad search.  The narrow search returns fewer results, they’re more relevant but you may miss a few (it’s a specific search) while the broad search gets more results but more irrelevant results (it’s a sensitive search).  So, in effect, Trip currently does a highly sensitive search.  You can see the effects in PubMed for a broad and narrow search for prostate cancer screening:

    My ‘gut’ instinct is the third option.  We, at Trip, experiment to try and arrive at a reasonable relevancy cut-off which is introduced by default on all searches.  On the results page we highlight that the search is narrow, and to make it broad the user simply presses a button.
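A sketch of how the narrow/broad idea might look, assuming each result already carries a relevancy score between 0 and 1 as described in option 2. The default cut-off of 0.1 is only a placeholder (the real value would come from the experimentation mentioned above), and the function name, result labels and scores are invented.

```python
DEFAULT_CUTOFF = 0.1  # placeholder value; the real cut-off would be found by experiment


def filter_results(scored_results, narrow=True, cutoff=DEFAULT_CUTOFF):
    """Rank results by relevancy and, for a narrow search, drop those below the cut-off."""
    ranked = sorted(scored_results, key=lambda result: result[1], reverse=True)
    if not narrow:
        # Broad (sensitive) search: keep everything that matched.
        return ranked
    # Narrow (specific) search: keep only results at or above the cut-off.
    return [result for result in ranked if result[1] >= cutoff]


# Hypothetical (title, relevancy score) pairs for one search.
scored_results = [
    ("passing_mention_1", 0.004),
    ("on_topic_1", 0.82),
    ("passing_mention_2", 0.02),
    ("on_topic_2", 0.35),
]

narrow_results = filter_results(scored_results)
broad_results = filter_results(scored_results, narrow=False)
print(narrow_results)
print(broad_results)
```

The same function would also cover the slider in option 2: the user’s chosen value is simply passed in as cutoff instead of the default.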

    Feedback please and thank you – again – Barb for the input 🙂
