Trip Database Blog

Liberating the literature

The Answer Engine, first glimpse

We have got a number of developments that we want to get out before the end of the year, the most important being the answer engine.

The answer engine works by interpreting the search terms to infer the question and then using a variety of techniques to find the best answer.  This is then displayed at the top of the results.  An example is shown below:

[Image: mock-up of the answer engine result for acne and minocycline]

In this example the user has searched for acne and minocycline and our system has interpreted this as being a question about the efficacy of minocycline in treating acne.  The system then looks for the best answer, in this case a Cochrane Systematic Review, and pulls through the conclusion.

The above is a mock-up and is likely to change somewhat, but it gives you an idea of what it looks like.

Initially, the system will be modest in scope and will be semi-automatic.  Our aim is to harness feedback and then make the appropriate changes.  By the middle of 2017 we hope to dramatically increase the scope and be fully automatic.
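To make the idea of inferring a question from search terms concrete, here is a minimal sketch. It is not the answer engine itself: the two vocabularies, the function names and the question wording are all assumptions made for illustration, and a real system would need a far larger terminology and more careful matching.

```python
# Hypothetical vocabularies, for illustration only.
CONDITIONS = {"acne", "osteoarthritis", "autism"}
INTERVENTIONS = {"minocycline", "glucosamine", "melatonin"}


def infer_question(search_terms):
    """Return (condition, intervention) if the search fits that pattern, else None."""
    words = search_terms.lower().split()
    conditions = [w for w in words if w in CONDITIONS]
    interventions = [w for w in words if w in INTERVENTIONS]
    # Only infer a question when exactly one of each is present.
    if len(conditions) == 1 and len(interventions) == 1:
        return conditions[0], interventions[0]
    return None


def phrase_question(pair):
    """Turn an inferred pair into a readable efficacy question."""
    condition, intervention = pair
    return f"Is {intervention} effective in treating {condition}?"


pair = infer_question("acne and minocycline")
if pair is not None:
    print(phrase_question(pair))
```

Returning None when the pattern is not clear-cut mirrors the modest initial scope: it is safer to show no answer than a wrong one.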

 

Dementia networks

We’ve been playing with our clickstream data – this time visualising it.  We’ve taken a single document, Comorbidity and dementia: a mixed-method study on improving health care for people with dementia (CoDem), and mapped all the connected articles.  A connection is made if a user clicks on the above article and then, within the same search session, clicks on any other article(s).  We then use these connections to make some beautiful images; an example is below.  The article above is the big ‘blob’ towards the bottom of the image!

[Image: network of articles connected to the CoDem study]
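The way connections are built from clickstream data can be sketched roughly as follows. This is a simplified illustration, assuming each search session is available as a list of clicked article IDs (the IDs below are made up) and treating connections as undirected, so click order is ignored.

```python
from collections import Counter
from itertools import combinations


def co_click_edges(sessions):
    """Count how often each pair of articles is clicked in the same session."""
    edges = Counter()
    for clicks in sessions:
        # De-duplicate and sort so each pair has one canonical form.
        unique = sorted(set(clicks))
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sessions; "codem" stands in for the CoDem study.
sessions = [
    ["codem", "article_a", "article_b"],
    ["codem", "article_a"],
    ["article_c"],
]
edges = co_click_edges(sessions)
print(edges[("article_a", "codem")])
```

The resulting edge counts are the kind of input a graph-drawing tool would need to produce an image like the one above.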

Search patterns in Trip

In preparation for the release of our answer engine (inferring clinical questions from the search terms and showing the ‘best’ answer) we’ve been analysing search terms. An area of particular interest is searches with a disease/condition and an intervention (or a similarly complex search). So, the top five search terms in Trip that follow this pattern are:

  • children paracetamol ibuprofen temperature
  • child cancer sibling parent
  • osteoarthritis glucosamine
  • pelvic floor strength
  • pressure ulcer prevention

I imagine few could have guessed that list!

But to illustrate the answer engine idea, if someone searches for osteoarthritis and glucosamine we’ll show – at the top of the results – this answer:

“Pooled results from studies using a non-Rotta preparation or adequate allocation concealment failed to show benefit in pain and WOMAC function while those studies evaluating the Rotta preparation showed that glucosamine was superior to placebo in the treatment of pain and functional impairment resulting from symptomatic OA.”

Which is taken from this Cochrane systematic review.

Marketing Trip

The Trip Database is just amazing. I love how it works and the features that it offers. But from my experience, it just doesn’t seem as though it is well-known or is getting the recognition from the scientific community that it deserves. What efforts are being done for marketing the Trip Database?

Sincerely,
Isaac M. E. Dodd
MD Student at Howard University College of Medicine

The above is not an uncommon type of email.  Users find Trip, love it and contemplate that it was perhaps accidental that they found it, that few of their colleagues know about it and that it should be more widely known.

One can rely on word of mouth, which works to an extent as we get hundreds of thousands of searches per month.  But to push on probably requires marketing!  Unfortunately, Trip’s marketing budget has historically been virtually zero.  I say virtually zero as I’m not sure if our various Twitter accounts count as marketing or not.

While marketing is not my strength I’m increasingly drawn to the need to do some!  The main aim being to raise awareness of Trip which will hopefully lead to more subscriptions. Historically, if we had money I’d put it towards product development not marketing.  But this is sort of self-defeating.  So, when confronted with something as vast as marketing – where does one start?

Do we:

  • Go down the social media route, embracing Twitter more (for instance)?
  • Try and use adverts?  Surely not as I doubt the engagement is there.
  • Work with 3rd parties in some mutually beneficial way? They get some product from Trip in return for raised profile of Trip.
  • Write more papers about the findings of Trip in peer-reviewed journals?

There we go, my marketing thoughts – completely unsophisticated – in one go.  I can think of variants of the above but nothing much more than that.

Clearly we need some help.  So, with a finite budget, what brings the best return on investment?

HELP!

 

Document summarisation

A complete stab in the dark, stimulated by Google’s release of their cutting edge TensorFlow product, is our adventure into document summarisation.  The work below does not use TensorFlow; we’re starting gently with something a little easier to implement!  But the general idea is that you take long documents and summarise them into something shorter and easier to digest.  All the work below involves automated methods and the summarisation is pretty much instant.

I’ve long held the idea (see Article social networks, meaning and redundancy) of trying to make sense of document clusters and this work is another exploration of this area.  So, I took 5 articles from the UTI and cranberry cluster mentioned in the article above, focusing on the prevention of UTIs, and placed them through our test system.  Below are the results for the 5 articles, with the title (with embedded URL to the actual abstract) and then the summary as generated by our system.

1) Cranberry juice fails to prevent recurrent urinary tract infection: results from a randomized placebo-controlled trial.
Summary: we conducted a double-blind, placebo-controlled trial of the effects of cranberry on risk of recurring uti among 319 college women presenting with an acute uti. conclusions.: among otherwise healthy college women with an acute uti, those drinking 8 oz of 27% cranberry juice twice daily did not experience a decrease in the 6-month incidence of a second uti, compared with those drinking a placebo.

2) Cranberry-Containing Products for Prevention of Urinary Tract Infections in Susceptible Populations: A Systematic Review and Meta-analysis of Randomized Controlled Trials
Summary: the aims of this study were to evaluate cranberry-containing products for the prevention of uti and to examine the factors influencing their effectiveness. medline, embase, and the cochrane central register of controlled trials were systemically searched from inception to november 2011 for randomized controlled trials that compared prevention of utis in users of cranberry-containing products vs placebo or nonplacebo controls.

3) A randomized clinical trial to evaluate the preventive effect of cranberry juice (UR65) for patients with recurrent urinary tract infection
Summary: the subjects drank 1 bottle (125 ml) of cranberry juice or the placebo beverage once daily, before going to sleep, for 24 weeks. in the group of females aged 50 years or more, there was a significant difference in the rate of relapse of uti between groups a and p (log-rank test; p = 0.0425).

4) Cranberries for preventing urinary tract infections
Summary: cranberry products significantly reduced the incidence of utis at 12 months (rr 0.65, 95% ci 0.46 to 0.90) compared with placebo/control. cranberry products were more effective reducing the incidence of utis in women with recurrent utis, than elderly men and women or people requiring catheterisation.

5) Recurrent urinary tract infection and urinary Escherichia coli in women ingesting cranberry juice daily: a randomized controlled trial.
Summary:  to compare the time to urinary tract infection (uti) and the rates of asymptomatic bacteriuria and urinary p-fimbriated escherichia coli during a 6-month period in women ingesting cranberry vs placebo juice daily. the proportion of women with p-fimbriated urinary e coli isolates during the intervention phase was 10 of 23 (43.5%) in the cranberry juice group and 8 of 10 (80.0%) in the placebo group (p=.07). minor adverse effects were reported by 24.2% of those in the cranberry juice group and 12.5% in the placebo group (p=.07).

Then, as a further twist I took the 5 summaries above and placed them through the summarisation software to give this ‘meta-summary’:

Among otherwise healthy college women with an acute uti, those drinking 8 oz of 27% cranberry juice twice daily did not experience a decrease in the 6-month incidence of a second uti, compared with those drinking a placebo. to compare the time to urinary tract infection (uti) and the rates of asymptomatic bacteriuria and urinary p-fimbriated escherichia coli during a 6-month period in women ingesting cranberry vs placebo juice daily.

Not sure it makes much sense!

So, as a further experiment I took all the conclusions from the 5 articles (as opposed to our summaries) and this is the result:

the potential protective effect we observed is consistent with previous studies and warrants confirmation in larger, well-powered studies of women with recurrent uti. there is some evidence that cranberry juice may decrease the number of symptomatic utis over a 12 month period, particularly for women with recurrent utis.

Perhaps it is just me, but this seems interesting.  Interesting as it’s not too bad and interesting because this could be fully automatic – we just need to automate the clustering and improve on the summarisation.  So, when we get Google’s cutting edge TensorFlow working (which uses deep learning) I’ll repeat the experiment and see where that takes us.
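For anyone curious how sentence-picking summarisation of this general kind can work, here is a minimal sketch. It is not the system used for the results above: it assumes a simple word-frequency scoring approach, with a tiny hand-made stopword list, and keeps the highest-scoring sentences in their original order.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "were",
             "with", "for", "on", "is", "may"}


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarise(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


text = ("Cranberry juice may prevent uti. The weather was nice. "
        "Cranberry juice reduced uti in women.")
summary = summarise(text)
print(summary)
```

Because it only selects existing sentences, a method like this cannot reconcile conflicting conclusions, which may be part of why a ‘meta-summary’ of summaries reads oddly.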

One small step along a winding and interesting journey!

 

Child health, autism and data analytics

Further to my post of yesterday (What do people look for on Trip?) I wanted to look in more depth at a topic, as much as anything to familiarise myself with what’s possible with our analytics.  Below is some analysis based on child health and subsequently exploring autism (the most common issue relating to child health).  NOTE: All analysis is based on the most recent 4 weeks’ worth of data AND most users of Trip are health professionals!

Topics of interest

[Image: tag cloud of child health topics]

Based on the titles of the top 50 articles that people have clicked we can explore what topics are of interest.

Autism time trend – showing how the use of the term changes over time

[Image: timeline of autism searches, with croup for comparison]

Based on search terms used and plotted daily.  As we add in more historical data a weekly recording would smooth things out.  I added croup data for a comparison.
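Moving from daily to weekly recording is a simple aggregation. The sketch below assumes the daily counts are held as a mapping from date to number of searches (the figures are invented) and groups them by ISO week.

```python
from collections import defaultdict
from datetime import date


def weekly_counts(daily):
    """Sum daily search counts into (ISO year, ISO week) buckets."""
    weeks = defaultdict(int)
    for day, count in daily.items():
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])] += count
    return dict(weeks)


# Invented example data: a Monday, the following Sunday, then the next Monday.
daily = {
    date(2016, 9, 5): 3,
    date(2016, 9, 11): 2,
    date(2016, 9, 12): 4,
}
weekly = weekly_counts(daily)
print(weekly)
```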

Autism drugs

  • acetaminophen (paracetamol)
  • aripiprazole
  • melatonin
  • mmr

Based on searches that included autism and a drug, revealing the top drugs searched for in relation to autism.
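A list like this can be produced by counting drug names in searches that also mention the condition. The sketch below is illustrative only: the drug vocabulary and the sample searches are made up, and matching is on whole single words.

```python
from collections import Counter

# Hypothetical drug vocabulary, for illustration only.
DRUGS = {"acetaminophen", "paracetamol", "aripiprazole", "melatonin", "mmr"}


def top_drugs(searches, condition="autism", n=4):
    """Rank drugs by how many searches mention them alongside the condition."""
    counts = Counter()
    for query in searches:
        words = set(query.lower().split())
        if condition in words:
            counts.update(words & DRUGS)
    return counts.most_common(n)


# Invented sample searches.
searches = [
    "autism melatonin",
    "Autism melatonin sleep",
    "autism mmr",
    "croup dexamethasone",
]
print(top_drugs(searches))
```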

Sources of information

[Image: top publications for child health]

Based on the documents users clicked on.  We aggregate this on a ‘by publisher’ basis.

What do people look for on Trip?

Another output from the Horizon 2020 funded KConnect project, this time led by the Vienna University of Technology.  This new system allows us to see what people are looking at based on clinical area.  Below are the top results from three separate clinical areas (based on 2-3 weeks’ worth of data):

Dentistry

  • Flossing for the management of periodontal diseases and dental caries in adults
  • The efficacy of dental floss in addition to a toothbrush on plaque and parameters of gingival inflammation: a systematic review
  • The Efficacy of Brushing and Flossing Sequence on Control of Plaque and Gingival Inflammation.

Cardiology

  • Management of patients with stroke: rehabilitation, prevention and management of complications, and discharge planning
  • Blood pressure monitoring
  • Chronic Heart Failure – Diagnosis and Management

Mental Health

  • A systematic review of the clinical effectiveness and cost-effectiveness of sensory, psychological and behavioural interventions for managing agitation in older adults with dementia
  • Comorbidity of mental disorders and substance use
  • Evidence based guidelines for the pharmacological management of substance abuse, harmful use, addiction and comorbidity

This data is important as it indicates what clinicians are looking for; it indicates what clinicians’ uncertainties are.  Often people plan new research, reviews or educational products based on assumptions.  With this data it can be more evidence-based!

One graphic to finish with.  Take the data for cardiology (not just the top three) and transform it to a tag-cloud:

[Image: cardiology tag cloud]
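Turning a list of titles into a tag cloud mostly comes down to counting words and scaling the counts to font sizes. Here is a minimal sketch using the three cardiology titles listed above; the stopword list and the font size range are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small hypothetical stopword list.
STOPWORDS = {"of", "with", "and", "the", "for", "in", "a"}


def tag_weights(titles, min_size=10, max_size=40):
    """Map each word to a font size scaled linearly by its frequency."""
    counts = Counter(
        word
        for title in titles
        for word in re.findall(r"[a-z]+", title.lower())
        if word not in STOPWORDS
    )
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - lo) * (max_size - min_size) / span
            for w, c in counts.items()}


titles = [
    "Management of patients with stroke: rehabilitation, prevention and "
    "management of complications, and discharge planning",
    "Blood pressure monitoring",
    "Chronic Heart Failure – Diagnosis and Management",
]
weights = tag_weights(titles)
print(weights["management"], weights["stroke"])
```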

Steps away from better search results

When users interact with Trip we capture what they’re doing – the search terms, articles clicked etc.  Previously I have shown how we can map this stored data (clickstream data).  Below is a map of articles relating to urinary tract infection (UTI):

[Image: annotated map of UTI articles]

You can see, from the annotation, that similar articles cluster (bottom left is a cluster of articles on UTI and cranberry).  To better understand how we create these graphs see these two articles:

I’ve been working with this data for a while and uses keep appearing.  One that is very attractive is in improving search results.  For the sake of argument let’s say the articles in the image above (each shown as an individual node) are evenly spread in the top 2-3 pages of search results in Trip.

As soon as a user makes their first click they are telling us where they are, in relation to their interest/intention, in the map of articles (see below):

[Image: map of UTI articles]

Using the above example, if a user clicks on an article in the bottom left of the image (in a cluster of articles on UTI and cranberry), the chances are they will be interested in other articles that are close by (1-2 ‘steps’ away).  This works on the same principle as normal maps – if you’re looking at a street map of New York and you’re looking at a particular road in, say, Brooklyn, it’s likely that your immediate interest is in the area close to that road as opposed to, say, the Mission in San Francisco.

So, could we create a system that allows users to re-order results as soon as they click on their first result?  Could we do this dynamically (no clicking)?  The principle seems sensible but, as with most of these things, it’s how to operationalise it that’s the key…!
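One possible way to operationalise the ‘steps away’ idea is a breadth-first search over the article map, starting from the clicked article, followed by a re-sort of the results by distance. This is only a sketch under assumptions: the graph is given as an adjacency mapping, the article IDs are invented, and articles with no path to the clicked one simply keep their original relative order at the end.

```python
from collections import deque


def steps_from(graph, start):
    """Breadth-first search: number of steps from start to each reachable article."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def reorder(results, graph, clicked):
    """Move articles closest to the clicked one to the top (stable for ties)."""
    distance = steps_from(graph, clicked)
    unreachable = float("inf")
    return sorted(results, key=lambda doc: distance.get(doc, unreachable))


# Invented map: a cranberry cluster and a separate antibiotics cluster.
graph = {
    "cran1": ["cran2"],
    "cran2": ["cran1", "cran3"],
    "cran3": ["cran2"],
    "abx1": ["abx2"],
    "abx2": ["abx1"],
}
results = ["abx1", "cran3", "cran1", "abx2", "cran2"]
print(reorder(results, graph, clicked="cran1"))
```

Sorting purely by distance throws away the original relevance ranking for reachable articles; a real system would probably blend the two signals.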

Experiments in machine learning at Trip

At Trip we like to ‘muck around’ with new techniques to make the site even better.  Sometimes there is a clear reason and other times it’s just to explore these techniques to see what they can offer.  Currently we’re doing lots of work involving machine learning and recently we released our work on the automated assessment of bias in RCTs.  Here are a few other things we’re involved in:

Word2Vec: Completely speculative and I have no idea what the output will be (I believe that it looks for similarities and relationships between words/concepts).  This is working with Vienna University of Technology (TUW) as part of our Horizon 2020 funded KConnect project.  There is loads of hype around this technique so we thought it was too good an opportunity to not get involved.

Learning to Rank: Again with TUW this is a much more understandable technique.  It is a machine learning technique used to improve the search results.  It’s one of a number of algorithm tweaks we’re attempting and all will be thoroughly tested using interleaving or A/B testing (probably the former).

Document summarisation: Another speculative venture.  Yesterday I saw that Google have opened up something called TensorFlow to support document summarisation.  This is something I’ve been interested in for a while so I contacted my freelance machine learning contact and we agreed to give it a go (he did most of the work on our 5 minute systematic review system).  I’m not sure how document summarisation fits in with Trip but seeing outputs can only help me figure it out.

Hopefully we’ll start seeing results on all these projects before the end of the year.
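For readers unfamiliar with the interleaving mentioned under Learning to Rank, here is a sketch of one common variant, team-draft interleaving. It is an assumption that this variant is relevant here; the text above does not say which method would be used. The two rankers take turns (in random order each round) to contribute their best not-yet-shown document, and clicks are credited to whichever ranker contributed the clicked document.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, rng=None):
    """Merge two rankings, remembering which ranker contributed each document."""
    rng = rng or random.Random()
    interleaved, team_of = [], {}
    total = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total:
        order = [("A", ranking_a), ("B", ranking_b)]
        if rng.random() < 0.5:  # coin flip decides who picks first this round
            order.reverse()
        for team, ranking in order:
            pick = next((doc for doc in ranking if doc not in team_of), None)
            if pick is not None:
                interleaved.append(pick)
                team_of[pick] = team
    return interleaved, team_of


def credit_clicks(clicked, team_of):
    """Count clicks per ranker; the ranker with more clicks wins the comparison."""
    wins = {"A": 0, "B": 0}
    for doc in clicked:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins


# Invented rankings from a current (A) and a candidate (B) algorithm.
ranking_a = ["d1", "d2", "d3"]
ranking_b = ["d3", "d1", "d4"]
interleaved, team_of = team_draft_interleave(ranking_a, ranking_b, random.Random(0))
print(interleaved, credit_clicks([interleaved[0]], team_of))
```

The appeal over A/B testing is that every user sees results from both algorithms in a single list, so fewer searches are typically needed to detect a preference.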

One important thing to point out (and something I relish) is Trip’s ability to get involved in these projects and get things moving quickly.  The document summarisation work was set up within 12 hours of seeing the announcement of TensorFlow being opened up (I’d never even heard of it before).  One can only imagine the bureaucratic steps a large organisation would need to go through to even start considering these ground-breaking initiatives.

Trip plays an important role in the health information retrieval ecosystem as we are so innovative.  Larger, better funded, members of the ecosystem observe and copy/adopt where we succeed. It’s classic diffusion of innovations.   I much prefer being at the front of the adoption curve!
