Monday, December 24, 2012

Evidence collections

This is one of the most important challenges facing Trip, and one where I hope I can rely on your help.

How can we help people use Trip to capture and publish evidence collections?  Collections of evidence already exist and are typically time consuming to produce.  Four examples:
  • ATTRACT - this is part of my NHS work.  Our team receives questions and finds appropriate evidence with which to answer them.  Relatively unstructured.
  • ATTRACT CME - a good example might be this review of obesity.  In this example it's a mixed collection of the latest evidence and background information.
  • BestBETs - these are reviews, based on questions arriving in emergency medicine, that tackle a single question.  In many ways these are similar to ATTRACT but are more structured.
  • Cochrane systematic reviews - these are highly structured collections of clinical trials.
Non-health related collections are important and available, a few examples:

Summary: collections are everywhere and are clearly useful.

I see two issues in relation to Trip:
  1. Would Trip users like to make collections?
  2. If they do, what might it look like?
I think the answer to 1. is 'yes, assuming you can make it a rewarding and easy exercise'. 

But 2. is really problematic; how do we create a product that looks great, is easy to use and facilitates the production of robust and useful reviews?  We can do a few clever things such as making it easy to group articles together, auto-reference and even suggest related articles.  But you're still left with the core problem - the middle of the collection - the actual content (sandwiched between title and references).

I like the visual impact of something like Pinterest (see another example from Doctors Without Borders).  Highly visual, so engaging.  The downside is that there's not much space for text.  But again, I could see us allowing a user to pull in their documents of interest, annotating each article with the key point and then pulling it together with a summary and/or clinical bottom line.

At the top of the post I said this was one of the most important challenges facing Trip.  I believe it, and I also believe that if we get it right we will have created something hugely useful.

So, if you read this and have any suggestions, no matter how silly/random you may feel they are, please let me know (via comment below or emailing me - jon.brassey@tripdatabase.com).  Often it just takes a few novel thoughts to unblock the creative process.  This perspective is exemplified by a comment I received at a Trip training session where a user said they would love to be able to 'tag' an article (or articles) saying these helped her answer a particular question.  In other words, she wanted to group articles together around answering a clinical question.  That simple request started all this thinking...!

Saturday, December 15, 2012

Relevancy in Trip

In Trip our search algorithm (the magic that decides which order articles appear on the results page) is made up of three main components:
  • Publication score - the higher quality the publication (think Cochrane, NICE, AHRQ) the higher the score.
  • Year score - a document from 2012 scores more highly than a document from 2011.
  • Text score - this analyses documents and assigns a score based on location of matches (e.g. if the search term appears in the title it scores more highly than if it only appears in the body of the text).
These separate scores are combined and the article with the highest score appears at the top and the rest of the results appear in descending score order. This typically works very well but there can be problems.  If a document scores low on one component and high on the other two it can appear quite highly in the results.  This is typically not a problem except, I think, in the case of text relevancy.
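To make the problem concrete, here is a minimal sketch of the ranking described above.  It is not Trip's actual algorithm: I'm assuming the three scores each run from 0 to 1 and are simply added with equal weights, and the names (Doc, WEIGHTS, combined_score) and example values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    publication_score: float  # assumed range 0..1
    year_score: float         # assumed range 0..1
    text_score: float         # assumed range 0..1


# Hypothetical equal weights; how Trip really combines the scores is not stated.
WEIGHTS = (1.0, 1.0, 1.0)


def combined_score(doc: Doc) -> float:
    """Weighted sum of the three component scores (an assumption)."""
    w_pub, w_year, w_text = WEIGHTS
    return (w_pub * doc.publication_score
            + w_year * doc.year_score
            + w_text * doc.text_score)


def rank(docs: list[Doc]) -> list[Doc]:
    """Return documents in descending order of combined score."""
    return sorted(docs, key=combined_score, reverse=True)


docs = [
    # Clearly about the topic, but from a middling source and a bit older.
    Doc("Focused review", 0.5, 0.5, 0.9),
    # Top source, brand new, but the search term appears only once.
    Doc("Big guideline with one passing mention", 1.0, 1.0, 0.05),
]

ranked = rank(docs)
for doc in ranked:
    print(f"{combined_score(doc):.2f}  {doc.title}")
```

With these made-up numbers the big guideline comes out on top despite barely mentioning the search term, which is exactly the text relevancy problem.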

When someone does a search on Trip we retrieve every document that mentions the search term(s) and each of these documents is given a text score.  If we have a big document that mentions the search term once it will still be found and still get a score, even though it is obvious that the document isn't really about the subject.

So, what I'm thinking of doing is introducing a relevancy cut-off. If someone searches on Trip and the search generates a large number of results (say over 100) we introduce a text score cut-off.  This cut-off would still be quite low but high enough to remove the really irrelevant results.  For example, the text relevancy score ranges from 0 to 1.  In my mind the cut-off might be at around 0.1.

Now, the issue with this is that the results are now being restricted, which I know makes many uncomfortable.  I think this depends on the reason for searching Trip.  If you're a busy clinician wanting to just get really quick results it'd be no big deal.  However, if you're an information specialist wanting to ensure you've checked everything - it'd be seen less favourably.

Therefore, the compromise might be some sort of button/warning that says something like 'We have removed all articles Trip considers of low relevance to the search, click here to show all results'.  I'd like to think that's the best of both worlds.
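As a rough sketch of how the cut-off and the 'show all results' option might fit together (using the figures floated above: more than 100 results and a text score cut-off of 0.1).  The function and constant names are hypothetical, and results are simplified to (title, text score) pairs.

```python
# Figures taken from the post; both are tentative and would need tuning.
MIN_RESULTS_FOR_CUTOFF = 100
TEXT_SCORE_CUTOFF = 0.1


def apply_cutoff(results, show_all=False):
    """Filter out low text-relevancy results from large result sets.

    results: list of (title, text_score) pairs.
    show_all: True when the user has clicked 'show all results'.
    Returns (kept_results, number_removed).
    """
    # Small result sets, or an explicit request, are returned untouched.
    if show_all or len(results) <= MIN_RESULTS_FOR_CUTOFF:
        return list(results), 0

    kept = [r for r in results if r[1] >= TEXT_SCORE_CUTOFF]
    return kept, len(results) - len(kept)


def notice(removed):
    """Text for the button/warning, shown only if something was removed."""
    if removed == 0:
        return ""
    return (f"We have removed {removed} articles Trip considers of low "
            "relevance to the search, click here to show all results")
```

Returning the number removed means the warning only needs to appear when the cut-off has actually hidden something, and the information specialist can get everything back with show_all=True.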

Thursday, December 06, 2012

RCTs, Trip and the developing world

One thing that struck me recently was that we don't have a filter for RCTs in Trip.  Given the importance of these it seems remiss of us.  So, why not create one?  Well, I now plan to in early 2013.  We can build this by scraping content from PubMed using a suitable RCT filter (the current count is 438,900 trials) and I hope to work with Mendeley to highlight even more.  So, we should have an RCT collection of over 500,000 trials - which is hugely impressive.

While I was working through this idea I saw an email about the lack of systematic reviews suitable for the developing world (low and middle income countries, or LMIC).  As you may know we have done some work in this area (see our crowdsourcing initiative) but that's never really taken off. 

So, a thought occurred: why not use a filter for LMIC content?  A filter is a series of terms designed to highlight focussed content.  We'll use one for identifying RCTs (see here for further details) and it looks for terms/phrases such as randomised in the title.  There are a number of validated filters for RCTs, which is great.  A link from the Norwegian Satellite of the Cochrane Effective Practice and Organisation of Care Group highlights an LMIC filter.  It's unvalidated but looks like a great start.
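For anyone unfamiliar with filters, here is a toy illustration of the idea of matching terms in a title and tagging the article accordingly.  The only term taken from the real RCT filter is 'randomised'; the American spelling and all of the LMIC terms are placeholders I've made up, not the validated RCT filters or the Norwegian Satellite's LMIC filter.

```python
# "randomised" is mentioned in the post; "randomized" is an assumed variant.
RCT_TERMS = ["randomised", "randomized"]

# Placeholder terms only; the real LMIC filter is far more extensive.
LMIC_TERMS = ["low income", "middle income", "developing countr"]


def matches(title, terms):
    """True if any filter term appears in the title (case-insensitive)."""
    lowered = title.lower()
    return any(term in lowered for term in terms)


def tag(title):
    """Return the set of tags whose filter matches this title."""
    tags = set()
    if matches(title, RCT_TERMS):
        tags.add("RCT")
    if matches(title, LMIC_TERMS):
        tags.add("LMIC")
    return tags


print(tag("A randomised trial of zinc supplementation in a low income setting"))
print(tag("Cohort study of statin use"))
```

A real filter would also look at abstracts and indexing terms rather than titles alone, but the principle of tagging an article when its text matches the filter is the same.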

So, pulling these two tales together:
  • We create a wonderful RCT database
  • We can tag RCTs if they're suitable for LMIC
  • We can tag the rest of the Trip content as being suitable for LMIC
This is massive!