Trip Database Blog

Liberating the literature


September 2016

Document summarisation

A complete stab in the dark, stimulated by Google’s release of their cutting-edge TensorFlow product, is our adventure into document summarisation.  The work below does not use TensorFlow; we’re starting gently with something a little easier to implement!  But the general idea is that you take long documents and summarise them into something shorter and easier to digest.  All the work below involves automated methods and the summarisation is pretty much instant.
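To make the general idea concrete, here is a minimal sketch of one simple approach, extractive summarisation, which scores each sentence by how frequent its words are across the whole document and keeps the top few.  This is an assumption for illustration only: it is not necessarily the method our test system uses, and the function names, the tiny stop-word list and the naive sentence splitter are all made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a standard resource).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was",
              "for", "on", "with", "that", "were", "it", "as"}


def tokenise(text):
    """Lower-case the text and keep alphanumeric words that are not stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarise(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenise(text))

    def score(sentence):
        tokens = tokenise(sentence)
        if not tokens:
            return 0.0
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Calling summarise(abstract_text, max_sentences=2) on an abstract returns two of its sentences unchanged, which is the general shape of the summaries shown in this post.  A splitter this naive will stumble on abbreviations and decimal points, so treat it as a starting point only.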

I’ve long held the idea (see Article social networks, meaning and redundancy) of trying to make sense of document clusters and this work is another exploration of this area.  So, I took 5 articles from the UTI and cranberry cluster mentioned in the article above, focusing on the prevention of UTIs, and put them through our test system.  Below are the results for the 5 articles, with the title (with embedded URL to the actual abstract) and then the summary as generated by our system.

1) Cranberry juice fails to prevent recurrent urinary tract infection: results from a randomized placebo-controlled trial.
Summary: we conducted a double-blind, placebo-controlled trial of the effects of cranberry on risk of recurring uti among 319 college women presenting with an acute uti. conclusions.: among otherwise healthy college women with an acute uti, those drinking 8 oz of 27% cranberry juice twice daily did not experience a decrease in the 6-month incidence of a second uti, compared with those drinking a placebo.

2) Cranberry-Containing Products for Prevention of Urinary Tract Infections in Susceptible Populations: A Systematic Review and Meta-analysis of Randomized Controlled Trials
Summary: the aims of this study were to evaluate cranberry-containing products for the prevention of uti and to examine the factors influencing their effectiveness. medline, embase, and the cochrane central register of controlled trials were systemically searched from inception to november 2011 for randomized controlled trials that compared prevention of utis in users of cranberry-containing products vs placebo or nonplacebo controls.

3) A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection
Summary: the subjects drank 1 bottle (125 ml) of cranberry juice or the placebo beverage once daily, before going to sleep, for 24 weeks. in the group of females aged 50 years or more, there was a significant difference in the rate of relapse of uti between groups a and p (log-rank test; p = 0.0425).

4) Cranberries for preventing urinary tract infections
Summary: cranberry products significantly reduced the incidence of utis at 12 months (rr 0.65, 95% ci 0.46 to 0.90) compared with placebo/control. cranberry products were more effective reducing the incidence of utis in women with recurrent utis, than elderly men and women or people requiring catheterisation.

5) Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial.
Summary:  to compare the time to urinary tract infection (uti) and the rates of asymptomatic bacteriuria and urinary p-fimbriated escherichia coli during a 6-month period in women ingesting cranberry vs placebo juice daily. the proportion of women with p-fimbriated urinary e coli isolates during the intervention phase was 10 of 23 (43.5%) in the cranberry juice group and 8 of 10 (80.0%) in the placebo group (p=.07). minor adverse effects were reported by 24.2% of those in the cranberry juice group and 12.5% in the placebo group (p=.07).

Then, as a further twist I took the 5 summaries above and placed them through the summarisation software to give this ‘meta-summary’:

Among otherwise healthy college women with an acute uti, those drinking 8 oz of 27% cranberry juice twice daily did not experience a decrease in the 6-month incidence of a second uti, compared with those drinking a placebo. to compare the time to urinary tract infection (uti) and the rates of asymptomatic bacteriuria and urinary p-fimbriated escherichia coli during a 6-month period in women ingesting cranberry vs placebo juice daily.

Not sure it makes much sense!

So, as a further experiment I took all the conclusions from the 5 articles (as opposed to our summaries) and this is the result:

the potential protective effect we observed is consistent with previous studies and warrants confirmation in larger, well-powered studies of women with recurrent uti. there is some evidence that cranberry juice may decrease the number of symptomatic utis over a 12 month period, particularly for women with recurrent utis.

Perhaps it is just me, but this seems interesting.  Interesting as it’s not too bad and interesting because this could be fully automatic – we just need to automate the clustering and improve on the summarisation.  So, when we get Google’s cutting-edge TensorFlow working (which uses deep learning) I’ll repeat the experiment and see where that takes us.

One small step along a winding and interesting journey!


Child health, autism and data analytics

Further to my post of yesterday (What do people look for on Trip?) I wanted to look in more depth at a topic, as much to familiarise myself with what’s possible with our analytics as anything else.  Below is some analysis based on child health, subsequently exploring autism (the most common issue relating to child health).  NOTE: All data is based on the most recent 4 weeks’ worth of data AND most users of Trip are health professionals!

Topics of interest

blog child tag cloud

Based on the titles of the top 50 articles that people have clicked on, we can explore what topics are of interest.

Autism time trend – showing how the use of the term changes over time

blog child autism timeline

Based on search terms used and plotted daily.  As we add in more historical data a weekly recording would smooth things out.  I added croup data for a comparison.
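The smoothing mentioned above can be sketched as rolling daily counts up into weekly totals.  This is only an illustration under assumed inputs: the function name and the idea that the counts arrive as a dictionary keyed by date are my assumptions, not a description of how our analytics stores the data.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily_counts):
    """Sum daily search counts into ISO weeks.

    daily_counts maps datetime.date -> number of searches that day.
    Returns a dict keyed by (ISO year, ISO week number), in date order.
    """
    totals = defaultdict(int)
    for day, count in daily_counts.items():
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        totals[(iso[0], iso[1])] += count
    return dict(sorted(totals.items()))
```

ISO weeks run Monday to Sunday, so a single quiet weekend day no longer shows up as a dip in the plot.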

Autism drugs

  • acetaminophen (paracetamol)
  • aripiprazole
  • melatonin
  • mmr

Based on searches that included autism and a drug, revealing the top drugs searched for in relation to autism.
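A rough sketch of this kind of count: go through the search queries, keep those that mention the topic, and tally any drug names that appear alongside it.  The function name, the whitespace tokenisation and the idea of a ready-made set of drug names are all assumptions for the example; a real system would need a proper drug vocabulary and would have to cope with multi-word names.

```python
from collections import Counter


def top_co_searched(queries, topic, vocabulary, n=5):
    """Count vocabulary terms that appear in queries which also mention the topic.

    queries    -- iterable of raw search strings
    topic      -- single word to filter on, e.g. "autism"
    vocabulary -- iterable of single-word terms to look for (hypothetical drug list)
    """
    topic = topic.lower()
    vocab = {term.lower() for term in vocabulary}
    counts = Counter()
    for query in queries:
        tokens = set(query.lower().split())
        if topic in tokens:
            # Each query contributes at most one count per term.
            counts.update(tokens & vocab)
    return counts.most_common(n)
```

The result is a list of (term, count) pairs, most frequent first.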

Sources of information

blog child top publications

Based on the documents users clicked on.  We aggregate this on a ‘by publisher’ basis.

What do people look for on Trip?

Another output from the Horizon 2020 funded KConnect project, this time led by the Vienna University of Technology.  This new system allows us to see what people are looking at based on clinical area.  Below are the top results from three separate clinical areas (based on 2-3 weeks’ worth of data):


  • Flossing for the management of periodontal diseases and dental caries in adults
  • The efficacy of dental floss in addition to a toothbrush on plaque and parameters of gingival inflammation: a systematic review
  • The Efficacy of Brushing and Flossing Sequence on Control of Plaque and Gingival Inflammation.


  • Management of patients with stroke: rehabilitation, prevention and management of complications, and discharge planning
  • Blood pressure monitoring
  • Chronic Heart Failure – Diagnosis and Management

Mental Health

  • A systematic review of the clinical effectiveness and cost-effectiveness of sensory, psychological and behavioural interventions for managing agitation in older adults with dementia
  • Comorbidity of mental disorders and substance use
  • Evidence based guidelines for the pharmacological management of substance abuse, harmful use, addiction and comorbidity

This data is important as it indicates what clinicians are looking for; it indicates what clinicians’ uncertainties are.  Often people plan new research, reviews or educational products based on assumptions.  With this data it can be more evidence-based!

One graphic to finish with.  Take the data for cardiology (not just the top three) and transform it to a tag-cloud:

Cardiology tag cloud

Steps away from better search results

When users interact with Trip we capture what they’re doing – the search terms, articles clicked etc.  Previously I have shown how we can map articles using this stored (clickstream) data.  Below is a map of articles relating to urinary tract infection (UTI):

UTI large map annotated

You can see, from the annotation, that similar articles cluster (bottom left is a cluster of articles on UTI and cranberry).  To better understand how we create these graphs see these two articles:

I’ve been working with this data for a while and uses keep appearing.  One that is very attractive is in improving search results.  For the sake of argument let’s say the articles in the image above (each shown as an individual node) are evenly spread across the top 2-3 pages of search results in Trip.

As soon as a user makes their first click they are telling us where they are, in relation to their interest/intention, in the map of articles (see below):

UTI large map for Sept 2016 blog

Using the above example, if a user clicks on an article in the bottom left of the image (in a cluster of articles on UTI and cranberry) the chances are they will be interested in other articles that are close by (1-2 ‘steps’ away).  This works on the same principle as normal maps – if you’re looking at a street map of New York and you’re looking at a particular road in, say, Brooklyn it’s likely that your immediate interest is in the area close by to that road as opposed to, say, the Mission in San Francisco.

So, could we create a system that allows users to re-order results as soon as they click on their first result?  Could we do this dynamically (no clicking)?  The principle seems sensible but as with most of these things it’s how to operationalise it that’s the key…!
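As a thought experiment, the re-ordering could be sketched like this: treat the map as a graph, count the number of ‘steps’ from the clicked article to every other article with a breadth-first search, and then sort the results so the closest come first.  Everything here is an assumption for illustration (the function names, the graph stored as a dictionary of neighbours, article identifiers as plain strings); it is not something we have built.

```python
from collections import deque


def steps_from(graph, start):
    """Breadth-first search: number of edges from start to each reachable article.

    graph maps an article id to an iterable of neighbouring article ids.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def reorder_results(results, graph, clicked):
    """Put articles fewest steps from the clicked one first.

    Articles with no path to the clicked one go last. Python's sort is
    stable, so ties keep their original search ranking.
    """
    distance = steps_from(graph, clicked)
    unreachable = float("inf")
    return sorted(results, key=lambda article: distance.get(article, unreachable))
```

Because the sort is stable, the original ranking still decides the order among articles that are the same number of steps away, so the click nudges the results rather than replacing the search engine’s judgement.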
