Trip Database Blog

Liberating the literature

Analytics

Trip is involved in the extremely interesting KConnect project, via the EU’s Horizon 2020 funding stream.  One output of this has been a nice interface to explore our clickstream data.  Clickstream data is a record of how people interact with the site, e.g. search terms used, articles viewed etc.

This gives us a glimpse of clinical uncertainties, ‘hot topics’, useful sites etc.  Below are three images to highlight potential uses, but we know there are more.  Imagination is the limitation with this data!

Image One – this is a record of searches for breast cancer and a drug.  So, which drugs are most popular when used in conjunction with the term breast cancer?  In this case, it’s trastuzumab!

Image Two – As we record the date and time of the search we can plot search term popularity.  So, you can clearly see a peak of searching for ebola around October last year.
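As a rough illustration of the kind of plot described in Image Two, here is a minimal sketch that counts searches for a term per calendar month.  The log rows, their layout and the function name are all made up for the example; this is not Trip's actual schema or code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log rows: (ISO timestamp, search term). Not Trip's real schema.
SEARCH_LOG = [
    ("2014-08-03T09:15:00", "ebola"),
    ("2014-10-02T11:00:00", "ebola"),
    ("2014-10-17T14:30:00", "Ebola"),
    ("2014-10-21T08:05:00", "ebola vaccine"),
    ("2014-11-09T16:45:00", "ebola"),
    ("2014-10-05T10:00:00", "asthma"),
]


def monthly_counts(log, term):
    """Count searches containing `term` (case-insensitive) per calendar month."""
    counts = Counter()
    needle = term.lower()
    for stamp, query in log:
        if needle in query.lower():
            month = datetime.fromisoformat(stamp).strftime("%Y-%m")
            counts[month] += 1
    # Sort by month so the result can be plotted left to right.
    return dict(sorted(counts.items()))


ebola_by_month = monthly_counts(SEARCH_LOG, "ebola")
peak_month = max(ebola_by_month, key=ebola_by_month.get)
```

Plotting `ebola_by_month` as a bar or line chart would give the sort of peak shown in the image.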

Image Three – which publications have been visited most after a search?  Top is PubMed, followed by NHS CRD, then NICE.  http://dx.doi…. refers to Cochrane and the final entry in the top five is CADTH.

 

A new analytic ‘toy’

I’m very happy to have been given access to a new analytic tool on Trip.  It analyses the near 100 million bits of clickstream data (sounds like ‘big data’ to me).

I simply enter a search term and it exports every instance of a person using that search term: the date and time, what they clicked on and whether they used any additional search terms.  For example, a quick search on an incomplete data set (we are still only halfway through fully indexing the 100 million records) for Fingolimod (or the brand name Gilenya) revealed it had been searched over 350 times in the last five years.  The top five articles viewed were:

  1. Fingolimod for the treatment of highly active relapsing-remitting multiple sclerosis. National Institute for Health and Clinical Excellence – Technology Appraisals (47 views)
  2. Fingolimod – a potential new oral treatment for multiple sclerosis? NPC Rapid Reviews (40)
  3. Fingolimod – Multiple Sclerosis. Canadian Agency for Drugs and Technologies in Health – Common Drug Review (35)
  4. Fingolimod (Gilenya) – in highly active relapsing remitting multiple sclerosis. Scottish Medicines Consortium (30)
  5. Fingolimod versus Glatiramer for Adults with Relapsing Remitting Multiple Sclerosis: Clinical and Cost-Effectiveness. Canadian Agency for Drugs and Technologies in Health – Rapid Review (26)

And below is a tag cloud of the additional terms added to the search:

 NOTE: the tag cloud formation software capped the number of words displayed to something like the top fifty.
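To make the idea concrete, here is a small sketch of how a term lookup like the one above could be summarised: count the articles clicked after searches containing the term (or its brand name), and count the additional words used alongside it for a tag cloud.  The records, titles and function name are invented for illustration, and the cap of fifty words simply mirrors the note above; the real tool may work quite differently.

```python
from collections import Counter

# Hypothetical records: (search query, title of the article clicked).
CLICKS = [
    ("fingolimod", "NICE technology appraisal"),
    ("fingolimod multiple sclerosis", "NICE technology appraisal"),
    ("gilenya", "SMC advice"),
    ("fingolimod cost", "CADTH rapid review"),
    ("fingolimod multiple sclerosis", "SMC advice"),
    ("fingolimod", "NICE technology appraisal"),
    ("natalizumab", "NICE technology appraisal"),
]


def summarise(clicks, synonyms, top_n=5, cloud_size=50):
    """Return the most-clicked articles and the extra words used with the term."""
    wanted = {s.lower() for s in synonyms}
    articles, extra_terms = Counter(), Counter()
    for query, title in clicks:
        words = query.lower().split()
        if not wanted.intersection(words):
            continue  # this search did not mention the term at all
        articles[title] += 1
        extra_terms.update(w for w in words if w not in wanted)
    return articles.most_common(top_n), extra_terms.most_common(cloud_size)


top_articles, tag_cloud_terms = summarise(CLICKS, ["fingolimod", "gilenya"])
```

Here `top_articles` plays the role of the "top five" list and `tag_cloud_terms` holds the word counts a tag cloud tool would take as input.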

Search safety net

The search safety net is a novel feature to help improve searching; helping users not miss important papers.  I wanted to explain it – simply – but have failed on that score.  It’s important so I hope you can make sense of what I’ve written.  If you have any questions, my email is jon.brassey@tripdatabase.com.

After a search you will see a new ‘Search Safety Net’ button

If you click that it’ll bring up a list of related search terms.  It does this by looking at the top 250 search results and analysing the search terms people have previously used when clicking on these results.  This works on the notion that a single document can be clicked on after numerous searches.  For instance, in the example above search terms might have been ‘prostate cancer screening’, ‘MRI screening’ etc.
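A minimal sketch of that first step, assuming we already hold, for each document, the list of queries that previously led to a click on it.  The data structure, the document ids and the function name are assumptions made for the example, not a description of Trip's code.

```python
from collections import Counter

# Hypothetical history: document id -> queries that previously led to a click on it.
QUERIES_BY_DOC = {
    "doc1": ["prostate cancer screening", "psa test"],
    "doc2": ["prostate cancer screening", "mri screening"],
    "doc3": ["psa test"],
}


def related_search_terms(result_ids, current_query, history, limit=250):
    """Tally earlier queries behind the top `limit` results, most frequent first."""
    counts = Counter()
    for doc_id in result_ids[:limit]:
        counts.update(history.get(doc_id, []))
    counts.pop(current_query, None)  # no point suggesting the query just used
    return [term for term, _ in counts.most_common()]


suggestions = related_search_terms(
    ["doc1", "doc2", "doc3", "doc4"], "prostate cancer", QUERIES_BY_DOC
)
```

Documents with no click history (like `doc4` here) simply contribute nothing.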

The next section of the search safety net happens AFTER you’ve conducted your search, found a number of documents you like AND looked at them (or simply clicked the ‘check box’ to the left of the result).  If you click on the Search Safety Net button again you see three columns of results:

The first column is closely related articles, the second is other related articles and the third is the related search terms.  The latter column is similar to the description of related search terms above, but is based purely on the documents clicked (as opposed to the top 250 results).  However, to understand the process behind the other two columns you need to understand clickstream data.

Paper 1 ———- Paper 2 ———- Paper 3

In the above there are three papers (1-3).  A user, in the same session, clicks on Paper 1 and Paper 2, therefore we can make a link between the two.  Another user might click on Paper 2 and Paper 3, again making a link.  So, Paper 1 is connected to Paper 2 (a single step, using network language) while Paper 3 is two-steps away from Paper 1.  We have this data for all articles in Trip.

Slightly simplifying things (!) the first column is the most popular related articles based on documents that are one step away from the documents clicked.  So, we look at all the articles clicked by the user and pull back all the documents that are one step away, displaying the most ‘popular’ at the top.  The second column contains all the documents that are two steps away.  This is likely to find less focused results, but also the occasional really interesting study that might have been missed.
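The one-step and two-step idea can be sketched in a few lines.  This is only an illustration under simple assumptions: sessions are sets of paper ids, a link's ‘popularity’ is the number of sessions in which two papers were clicked together, and all names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical sessions: the papers one user clicked in one session.
SESSIONS = [
    {"paper1", "paper2"},
    {"paper2", "paper3"},
    {"paper1", "paper2"},
    {"paper3", "paper4"},
]


def build_links(sessions):
    """Link every pair of papers clicked in the same session, counting repeats."""
    links = defaultdict(Counter)
    for session in sessions:
        for a, b in combinations(sorted(session), 2):
            links[a][b] += 1
            links[b][a] += 1
    return links


def safety_net(clicked, links):
    """Return (one-step papers, two-step papers), most popular first."""
    clicked = set(clicked)
    one_step = Counter()
    for doc in clicked:
        one_step.update(links.get(doc, {}))
    for doc in clicked:
        one_step.pop(doc, None)  # never recommend what was already clicked
    two_step = Counter()
    for doc in one_step:
        for neighbour, weight in links.get(doc, {}).items():
            if neighbour not in clicked and neighbour not in one_step:
                two_step[neighbour] += weight
    return ([d for d, _ in one_step.most_common()],
            [d for d, _ in two_step.most_common()])


closely_related, other_related = safety_net(["paper1"], build_links(SESSIONS))
```

With these sessions, clicking Paper 1 gives Paper 2 in the first column and Paper 3 in the second; Paper 4 is three steps away and is not shown.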

Two important issues:

  • For this to work it requires clicks: if the documents you’ve looked at have no clicks, then you’ll get no results.
  • This is being released as a ‘beta’ bit of software, as in we’re still developing it.  At present it is available to both free and Premium users of Trip.  However, this is likely to change in the near future.

Search safety net (or ‘what have I missed?’)

We’re continuing to discover uses for the clickstream data Trip has and the new use is the search safety net – which we’re currently testing.

The idea is that as you search Trip we note which articles you’ve clicked on and then, using clickstream data, predict other articles that might be related, or those you might have missed (hence the name). See the video embedded below.

There are two issues/problems/challenges:

  1. Lack of data. This only works on clickstream data, so it requires clicks!  Very new articles or obscure articles will not have the data.  As it happens, in the tests we’ve done, it’s been – broadly – really good.  But when we roll it out, it’s something to consider.  
  2. User experience.  This is the biggest challenge: how will users interact with the ‘service’?  In other words where do we put the results?  Do we automatically show them somewhere on the results page?  If so, will that annoy people who don’t want the service?  Alternatively, we create some sort of ‘search safety net’ button which would require a user to click the button.  This means many people will simply not see it and miss out.

Once we solve the user interaction side of things, we’ll roll it out.  In the interim, if you want to give it a try (and be one of the first people to use it) then drop me a line (jon.brassey@tripdatabase.com).

Related article test

For many years I’ve admired PubMed’s related articles feature.  If I was searching for an answer to a clinical question and found a useful article, related articles was a great way to see similar articles.  These similar articles had a good chance of being useful as they were so similar.  PubMed has now renamed the feature Similar Articles and this is what it does:

The Similar Articles link is as straightforward as it sounds. PubMed uses a powerful word-weighted algorithm to compare words from the Title and Abstract of each citation, as well as the MeSH headings assigned. The best matches for each citation are pre-calculated and stored as a set.

Trip’s related articles use a completely different approach – clickstream data.  Does it matter?  Does it work as well, worse or better?

Below are three comparisons.  But these are not necessarily fair. For instance, Trip’s approach relies on users clicking on the articles – so it won’t work on brand new articles.  Also, as you’ll see below a couple of the examples only have 4 related articles.  This is down to paucity of data.

In the first example below I believe that Trip’s approach is superior, but I’m not sure with the other two examples; I’d call it close!  But I’d value any input from others – those less biased than me!

Bottom line: it’s a really powerful demonstration of the potential of clickstream data but requires data, another reason to log in to Trip!

One final point, this approach is phase 1.  Phase 2 will be to start to use an approach closer to PubMed’s – using linguistic and semantic approaches.

Paper 1: Screening for prostate cancer. Cochrane 2013

PubMed’s related articles

  • Screening for prostate cancer. Cochrane Database Syst Rev. 2013
  • Screening for prostate cancer. Cochrane Database Syst Rev. 2006
  • Lycopene for the prevention of prostate cancer. Cochrane Database Syst Rev. 2011
  • Prophylactic platelet transfusion for prevention of bleeding in patients with haematological disorders after chemotherapy and stem cell transplantation. Cochrane Database Syst Rev. 2012
  • Chemoprevention of colorectal cancer: systematic review and economic evaluation. Health Technol Assess. 2010

Trip’s related articles

  • Screening for prostate cancer: a review of the evidence for the U.S. Preventive Services Task Force DARE. 2011
  • Population screening for prostate cancer: an overview of available studies and meta-analysis. DARE. 2012
  • PSA Test to Screen for Prostate Cancer. theNNT 2011
  • Update of evidence for prostate-specific antigen (PSA) testing in asymptomatic men. New Zealand Guidelines Group 2010
  • Focal therapy using high-intensity focused ultrasound (HIFU) for localised prostate cancer. National Institute for Health and Clinical Excellence – Interventional Procedures 2012

Paper 2: Comparison of conventional pulmonary rehabilitation and high-frequency chest wall oscillation in primary ciliary dyskinesia. Pediatric pulmonology 2014

PubMed

  • Comparison of conventional pulmonary rehabilitation and high-frequency chest wall oscillation in primary ciliary dyskinesia. Pediatr Pulmonol. 2014
  • Short-term comparative study of high frequency chest wall oscillation and European airway clearance techniques in patients with cystic fibrosis. Thorax. 2010
  • Effectiveness of treatment with high-frequency chest wall oscillation in patients with bronchiectasis. BMC Pulm Med. 2013
  • A pilot study of the impact of high-frequency chest wall oscillation in chronic obstructive pulmonary disease patients with mucus hypersecretion. Int J Chron Obstruct Pulmon Dis. 2011
  • Comparison of high-frequency chest wall oscillation with differing waveforms for airway clearance in cystic fibrosis. Chest. 2007

Trip

  • High frequency oscillation in patients with acute lung injury and acute respiratory distress syndrome (ARDS): systematic review and meta-analysis DARE. 2010
  • Effect of high-frequency chest wall oscillation on the central and peripheral distribution of aerosolized diethylene triamine penta-acetic acid as compared to standard chest physiotherapy in cystic fibrosis. Chest 2006
  • CNE article: pain after lung transplant: high-frequency chest wall oscillation vs chest physiotherapy. American journal of critical care.  2013
  • Effect of high-frequency chest wall oscillation versus chest physiotherapy on lung function after lung transplant. Applied nursing research. 2014


Paper 3: Glibenclamide, metformin, and insulin for the treatment of gestational diabetes: a systematic review and meta-analysis. BMJ 2015

PubMed

  • Glibenclamide, metformin, and insulin for the treatment of gestational diabetes: a systematic review and meta-analysis. BMJ. 2015
  • Metformin vs insulin in the management of gestational diabetes: a systematic review and meta-analysis. Diabetes Res Clin Pract. 2014
  • The use of oral hypoglycaemic agents in pregnancy. Diabet Med. 2014
  • Screening and diagnosing gestational diabetes mellitus. Evid Rep Technol Assess (Full Rep). 2012
  • Benefits and risks of oral diabetes agents compared with insulin in women with gestational diabetes: a systematic review. Obstet Gynecol. 2009

Trip

  • Effect comparison of metformin with insulin treatment for gestational diabetes: a meta-analysis based on RCTs. Archives of gynecology and obstetrics. 2014
  • The efficacy and safety of DPP4 inhibitors compared to sulfonylureas as add-on therapy to metformin in patients with Type 2 diabetes: A systematic review and meta-analysis. Diabetes research and clinical practice 2015
  • Evaluation of the potential for pharmacokinetic and pharmacodynamic interactions between dutogliptin, a novel DPP4 inhibitor, and metformin, in type 2 diabetic patients. Current medical research and opinion 2010
  • Metformin vs insulin in the management of gestational diabetes: a meta-analysis. PloS one 2013

Article analytics, again

Earlier today in the post Article analytics I said “This latest feature will be released soon.”  Little did I realise it would be live by the end of the day!

In the above image I’ve highlighted four key areas:

  • Analytics – appears under every link (for Premium users only), this is clicked to generate the data below.
  • Related by viewer – these are articles that users clicked on during the same search session in which they clicked on the main article (Canadian clinical practice guidelines for the management of anxiety, posttraumatic stress and obsessive-compulsive disorders).
  • Viewers by country – this highlights where the users who did the clicking come from!
  • Viewers by profession – as above but broken down by profession

NOTE: the above example is very rich as it’s clearly a very popular article.  Others will have considerably less data, another reason why we’re keen to get users to log in!
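For the curious, the country and profession breakdowns boil down to simple counting once each click is tied to a logged-in user.  A minimal sketch, with made-up view records and names (not the real data model):

```python
from collections import Counter

# Hypothetical click records for one article: (country, profession) of each viewer.
VIEWS = [
    ("Canada", "doctor"),
    ("UK", "nurse"),
    ("Canada", "doctor"),
    ("Spain", "doctor"),
    ("UK", "doctor"),
]


def breakdown(views):
    """Return total views plus counts per country and per profession."""
    by_country = Counter(country for country, _ in views)
    by_profession = Counter(profession for _, profession in views)
    return len(views), by_country, by_profession


total, by_country, by_profession = breakdown(VIEWS)
```

An article with few logged-in viewers would produce almost empty counters, which is the paucity-of-data problem mentioned in the note.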

Article analytics

This latest feature will be released soon.  For a given article premium users will be able to see related articles (based on clickstream data) as well as information on total views, views by country and views by profession…

Clever stuff with the help of QSPectral

At the start of the year I posted Ok, I admit it, I’m stuck, which was a cry for help from the Trip community to help me make sense of all our lovely clickstream data.  We had a few responses and one was from an Australian research and management consultancy QSPectral, a company specialising in providing strategic insights and predictions through advanced data science and analytics. They have been working with us to help us make sense of our clickstream data.

Article Association
QSPectral used their data science expertise to investigate the connections between the articles based on the user access data contained within the Trip Database. 

Figure 1 Snapshot of articles accessed across a session.  The colours represent user professions (doctor, nurse, etc.)

In the above image the Y-axis represents individual search sessions and the X-axis is the documentID (each article in Trip has a unique document ID).  So, we can see what professions are looking at which articles.  We can actually see what articles individuals are looking at, but the above image shows it on a profession basis.

Figure 2 A  more focused snapshot of the previous image

As a user do you want to see what other articles are similar to the one you are reading?
Do you want to know what others like you thought were similar?

To provide answers to these questions, QSPectral developed an algorithm based on association rules to explore the relationships between articles on a per session basis. We intended to identify links between articles based on different criteria of interest. 

The strength of the links was measured by statistical measures such as confidence and support factors.  These led to association rules of the form {if article x is accessed, then articles y and z were also accessed}.  The rules were further enhanced by including additional user characteristics: information such as profession (nurse, doctor, etc.) and country of origin was used to moderate the previously established article relationships.
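For readers unfamiliar with these measures, here is a small sketch using the standard textbook definitions: support is the fraction of sessions containing a set of articles, and confidence of a rule is the support of everything in the rule divided by the support of its ‘if’ part.  The sessions and article names are invented, and this is not QSPectral's implementation.

```python
# Hypothetical sessions: the set of articles accessed in each one.
SESSIONS = [
    {"x", "y", "z"},
    {"x", "y"},
    {"x", "y", "z"},
    {"y"},
    {"x"},
]


def support(itemset, sessions):
    """Fraction of sessions that contain every article in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= session for session in sessions) / len(sessions)


def confidence(antecedent, consequent, sessions):
    """Of the sessions containing `antecedent`, the share also containing `consequent`."""
    base = support(antecedent, sessions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), sessions) / base


# Rule: if article x is accessed, then articles y and z are also accessed.
rule_support = support({"x", "y", "z"}, SESSIONS)
rule_confidence = confidence({"x"}, {"y", "z"}, SESSIONS)
```

Here the rule holds in 2 of 5 sessions (support 0.4) and in 2 of the 4 sessions containing x (confidence 0.5).  Moderating by profession or country could be as simple as passing in only the sessions from, say, nurses.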

Figure 3 Snapshot of related article numbers  – if the articles on the y axis are accessed it implies those on the x axis would also be of interest.

The data can be further augmented by adding clickstream data that includes the user’s area of speciality (such as cardiology).  For example, if you are a doctor from Spain, only the relationships between articles that doctors from Spain accessed could be isolated and uncovered.  It was also possible to group the related articles in clusters based on this multi-dimensional relationship – defined by colour in the figure.

Figure 4 clusters of articles based on relationships

The purpose of this initial investigation was to set the stage for providing users with recommendations based on their initial article of interest and their particular user characteristic.  A slightly different approach to PubMed’s ‘related articles’ feature.

As well as finding closely related articles QSPectral have helped us explore recommendations of new articles.  So, if we know a user’s activity on Trip we can start to understand them and then – with QSPectral’s help – recommend new articles that should be of interest.

Article Recommendation

How will  TRIP recommend articles for you?

Machine learning methods based on clustering and classification are being investigated for providing reliable recommendations. 

We believe that initial article clusters should be identified using an algorithm known as k-means clustering.  Each user will then be classified as being interested in articles within a cluster, based on their first choice of article and user attributes (profession, country etc.).  The classification uses a decision tree: a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
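As a rough picture of what k-means does, here is a bare-bones sketch in plain Python.  The two article features (share of views by doctors, share of views from Spain) are purely hypothetical, as the post does not say which features are clustered, and the centroids are seeded with the first k points only to keep the example reproducible.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    # Assumption for reproducibility: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical article features: (share of views by doctors, share of views from Spain).
ARTICLES = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8), (0.85, 0.15)]
labels, centres = kmeans(ARTICLES, k=2)
```

Articles 0, 2 and 4 end up in one cluster and articles 1 and 3 in the other.  A production system would more likely use a library implementation with better seeding.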

Figure 5 Example of a Decision Tree where the top node could represent you and the other nodes represent related articles based on branch criteria.

QSPectral determined that decision trees are the most appropriate concept for meeting the requirements.  Decision tree methods can accommodate more data inputs over time and various transformations of inputs, are robust to the inclusion of irrelevant fields in the data, and produce transparent models for ongoing analysis.

Further, we will use other methods that take a number of simple decision trees and combine them in some way to yield a final overall picture.  We propose techniques for iteratively averaging multiple deep decision trees, trained on different parts of the collected data, with the goal of reducing the variance.  Each iteration creates a simple decision tree on randomly selected subsets of input variables and input data.  The final recommendations will come from classifying a user by aggregating all such trees.
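The description above resembles what is commonly called a random forest, although the post does not name it.  The sketch below shows the general idea in a deliberately simplified form: each ‘tree’ is only one level deep (a lookup on a single randomly chosen attribute), trained on a bootstrap sample of the data, and the final answer is a majority vote.  The users, cluster names and function names are all invented for illustration.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """One-level tree: the majority label for each value of a single feature."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], Counter())[label] += 1
    table = {value: votes.most_common(1)[0][0] for value, votes in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    return feature, table, default


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # rows drawn with replacement
        feature = rng.randrange(len(rows[0]))              # random input variable
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample], feature))
    return forest


def predict(forest, row):
    """Aggregate the trees by majority vote."""
    votes = Counter(table.get(row[feature], default)
                    for feature, table, default in forest)
    return votes.most_common(1)[0][0]


# Hypothetical users (profession, country) and the article cluster each one reads.
USERS = [("doctor", "Spain")] * 4 + [("nurse", "UK")] * 4
CLUSTERS = ["cardiology"] * 4 + ["wound care"] * 4

forest = train_forest(USERS, CLUSTERS)
prediction = predict(forest, ("doctor", "Spain"))
```

Real trees would be deeper and the attributes far messier, but the voting step is the same: no single tree decides the recommendation.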

Logging in to Trip

One change we introduced recently is the increased user ‘pressure’ to log in.  A few people have contacted me to raise this as an issue and it made me realise we’ve added a barrier to use of Trip but we’ve not communicated why.  So, here goes…

Ultimately it’s part of a longer-term strategy to improve Trip and this requires us to better understand our users (which requires the user to be logged in).

Some background: my partner’s dad was an eminent Professor of Anaesthetics (now retired).  I showed him Trip, and he said he’d use it for a bit.  He came back unimpressed!  His interest was in awareness, and a search for awareness on Trip returns no articles on awareness under anaesthesia, which was his interest/intention.

While this is an extreme example it does highlight the problem: without knowing the user, how can we optimise the search results?  Our system should have realised that the user was an anaesthetist and adjusted the results accordingly.  We’re doing lots of work in this area and are making real strides.  I blogged about this in March in the article The important breakthrough, which contained the following image:

As you can see from the results (in this experimental test system) we have detected the example user as a dentist and adjusted the results accordingly.  For an information retrieval ‘nerd’ (like myself) this is amazing.  I can think of no other innovation Trip has introduced that will come close to improving the search results as much as this.

And there are loads more things we can do if we know the user. For instance improved email alerts – better linking users with evidence that is likely to be interesting and useful, as opposed to our current crude efforts!

But for it to work we need to know the user, which requires logging in.
