
Trip Database Blog

Liberating the literature

Month

August 2006

Teething problems

Well, the site is now live, which is a relief. Alas, not all the content decided to join us! Therefore, we’re currently re-indexing things. As it stands, all the titles, URLs and publication information are there. However, we’re missing a small (but significant) proportion of text from the body of some publications.

Still, it should all be there in time for our scheduled launch date of tomorrow!

Countdown to free-access – 1 day to go

We set a target of making TRIP free-access at the start of the year. By March/April the decision had been made. We then needed to give 6 months’ notice to our distributor (Update-Software), and here we are: three months before the end of the year, with free-access and a radical re-design of the site.

In previous blog entries (as part of the ‘Countdown to free-access’ series) I have highlighted new features. The last post, in this series, is to highlight a simple fact – we will be free-access. There will be no subscription charges to search TRIP. Clinicians (and non-clinicians) will be able to go to www.tripdatabase.com and have free use of the TRIP Database.

It’s free!

Countdown to free-access – 2 days to go

New search algorithm

The biggest change to the new, free TRIP is the search algorithm. For the last 5+ years the TRIP search has been dominated by the distinction between a ‘title’ and ‘title and text’ search. This allowed for great searching. The rationale was that if a document was about asthma, asthma would be mentioned in the title, and the vast majority of searches were on title only. This presents a couple of principal problems.

Firstly, if you do a search on asthma you would generate (even as a title search) a large number of results. This makes the task of identifying relevant material difficult. Why? Because users rarely want information about asthma. They may be interested in asthma and steroids, or asthma and allergies – rarely just asthma. This over-simplified search was highlighted in Professor Paul Glasziou’s evaluation which showed most people just searched for the actual disease. So if you wanted to look at asthma and steroids the best search would be:

1) Title search for asthma
2) Title and text search for steroids
3) Combine the results
4) Click on a results category to see any results

So 4 steps to see any results – in hindsight that seems ludicrous!

Secondly, Google – well it’s a nice problem. But most people who use TRIP will invariably be more familiar with Google. So they’re used to adding any number of terms and letting Google quickly return results, which it does very skilfully! Also Google tends to be searched using multiple search terms. The average number of search terms used per search is gradually increasing over time, surely a reflection that users are becoming more sophisticated/discerning. We’re hoping this increased use of terms will be reflected in the new TRIP.

So, the challenge was to try and mimic the Google search interface (i.e. no ‘title’, ‘title and text’ distinction) yet still return good results. To a large extent we’ve produced a system that works well. We’re not saying it’s perfect and our role, from now, is to continue to improve on the search algorithm. The actual algorithm is based on three main variables:

1) Publication date – more recent articles score more highly than older documents
2) Publication – each publication (e.g. Cochrane, Bandolier etc) is given a score based on its rigour and clinical usefulness. This is based on our experience of answering 5,000+ clinical questions – we tend to know which publications answer clinical questions more than others. Our scores reflect this experience.
3) Textual analysis. The main issue is where the search terms appear. If you do a search for asthma and steroids, a document with both terms in the title gets the highest score, a document with one term in the title gets a lesser score, and a document where the terms only appear in the text scores lowest. Another, lesser, component is term density. If asthma is mentioned 50 times in a document, it scores more highly than a document which only mentions it once.

The above variables are then combined to produce the results.
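To make the three variables concrete, here is a minimal sketch in Python. It is not the actual TRIP algorithm: the post does not say how the variables are weighted or combined, so the weighted sum, every numeric weight, the publication scores and all names (`Document`, `recency_score`, `text_score`, `score`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Illustrative publication weights only; the real scores are not given in the post.
PUBLICATION_SCORES = {"Cochrane": 1.0, "Bandolier": 0.8}
DEFAULT_PUBLICATION_SCORE = 0.5


@dataclass
class Document:
    title: str
    text: str
    year: int
    publication: str


def recency_score(year: int, current_year: int = 2006, horizon: int = 10) -> float:
    """Newer documents score closer to 1.0, falling to 0.0 after `horizon` years."""
    age = max(0, current_year - year)
    return max(0.0, 1.0 - age / horizon)


def text_score(doc: Document, terms: list[str]) -> float:
    """Term placement (title beats text) plus a smaller term-density bonus."""
    if not terms:
        return 0.0
    title, body = doc.title.lower(), doc.text.lower()
    terms = [t.lower() for t in terms]
    in_title = sum(1 for t in terms if t in title)
    if in_title == len(terms):
        placement = 1.0   # all terms in the title: highest score
    elif in_title > 0:
        placement = 0.6   # some terms in the title: lesser score
    elif any(t in body for t in terms):
        placement = 0.3   # terms only in the text: lowest score
    else:
        placement = 0.0
    # Density is the lesser component, so its bonus is capped at 0.2.
    mentions = sum(body.count(t) for t in terms)
    density_bonus = 0.2 * min(mentions, 50) / 50
    return placement + density_bonus


def score(doc: Document, terms: list[str]) -> float:
    """Combine the three variables with an assumed weighted sum."""
    publication = PUBLICATION_SCORES.get(doc.publication, DEFAULT_PUBLICATION_SCORE)
    return (0.5 * text_score(doc, terms)
            + 0.3 * publication
            + 0.2 * recency_score(doc.year))
```

With these made-up weights, a document with both asthma and steroids in its title outranks one with a single term in the title, which in turn outranks one that only mentions the terms in its text, all else being equal.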

Given the nature of the search system, good results for one person might be bad results for another, and in testing we occasionally get results which surprise us. However, on the whole we are getting excellent results, both in our own experience and according to feedback from our external testers. We’ll continue to refine and enhance the search – feedback welcome!

Countdown to free-access – 3 days to go

Drug Box

One of the most frequent question types we get at our various clinical question answering services relates to drug information, e.g. does this drug interact with that drug, what are the implications for pregnant women etc. For this reason we have created the Drug Box. To start with we have used the most frequently prescribed drugs (around 200 of them). Anyone searching for information on one of these drugs will be presented with the usual search results. However, the Drug Box will appear where the sponsored links usually are.

We’re very pleased with this new service and new drugs will be added over the next few weeks and months.

Countdown to free-access – 4 days to go

Linking to TRIP

Moving to free-access opens up all sorts of opportunities to ‘distribute’ TRIP. We’ve got a number of ways:

Incorporation of search box into third-party websites. We’ll be supplying HTML for webmasters to incorporate a TRIP searchbox into their own web-pages. We had this feature previously (before we went subscription-based) and this proved very popular.

Web-services. This allows third-party resources to search TRIP via a SOAP interface. The results are returned in an XML format, allowing the third-party resource to seamlessly link the TRIP results into its own application.

URL searches. These allow webmasters to add a specific URL with the search term embedded, e.g. www.tripdatabase.com/…….criteria=measles will result (if someone clicks on the URL) in an automatic search of TRIP for measles.
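As a rough illustration of the URL search idea, a webmaster could generate such links with a few lines of Python. The full TRIP search path is not spelled out above, so the sketch takes the base URL as an argument; `build_search_url` and the `https://www.example.org/search` placeholder are hypothetical, and only the `criteria` parameter name comes from the example URL.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, term: str) -> str:
    """Return base_url with the search term embedded as the 'criteria' parameter."""
    # urlencode escapes spaces and other special characters in the term.
    return f"{base_url}?{urlencode({'criteria': term})}"


# Placeholder base URL; substitute the real search address supplied by TRIP.
link = build_search_url("https://www.example.org/search", "measles")
```

Encoding the term matters once searches contain more than one word, since a raw space would break the link.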

Countdown to free-access – 5 days to go

Adverts/sponsored links

While TRIP was closed to all but subscribers, the subscriptions helped us develop the site. However, if you remove the subscriptions you remove the revenue. The bottom line is that for TRIP to survive and continue to improve it needs revenue, hence the adverts and sponsored links.

Adverts via Google ads. These are positioned in a relatively minor part of the site and are unlikely to generate a significant income. However, as we’re testing the business model we thought we should try this method. We are currently being searched around 15,000-20,000 times per day (likely to increase significantly) and if 0.1% of those searches lead to a click on an advert (roughly 15 to 20 clicks per day) that should make for an interesting income.

Sponsored links. This system allows users to purchase a keyword so that when someone searches on, say, hypertension the sponsor’s message gets displayed. This is an interesting experiment to see if anyone likes this method. We’re hoping it’ll be of some interest, especially given our volume of usage.

In an ideal world we’d prefer not to have adverts at all. We’re hoping it’s a small price to pay for free-access.

Countdown to free-access – 6 days to go

EBM, Medical Images and Patient Information tabs have been introduced. Previously, all the content was mixed into one search interface, and as the number of results categories increased the usability started to suffer.

An analysis showed that there were distinct search types, represented by our three search tabs. These tabs allow for easy movement between these domains.

EBM – our core material.

Medical images – an area we’re not renowned for but this is an excellent feature and is probably the largest, free, searchable collection of medical images on the internet.

Patient information leaflets – The principal aim in TRIP is to support clinicians in answering their clinical questions. However, these same clinicians frequently have a desire to locate patient information leaflets to give to their patients.

Tabbing is seen in most general search engines and the inclusion in TRIP will further enhance usability.

Countdown to free-access – 7 days to go

Each day between now and launch I will be highlighting a new feature on the site. With 7 days to go I’ll highlight the inclusion of selected peer-reviewed journal articles. These are being taken from 2 routes:

1) The big five general internal medicine journals – NEJM, JAMA, Lancet, BMJ and Annals of Internal Medicine. All articles, published within the last 5 years, will be included. We’re harvesting the content automatically from PubMed via the eUtilities.

2) BMJ Updates. I never fully appreciated the scale of this project. Basically, they scan over 100 ‘premier’ clinical journals and extract only those of high quality and of clinical relevance and interest (as judged by at least 3 clinicians from around the globe).
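For the first route, harvesting from PubMed, a request to the eUtilities `esearch` endpoint might be built along the following lines. This is only a sketch of how such a query could look, not our actual harvesting code: the function name `build_esearch_url`, the journal abbreviations chosen, the `retmax` value and the year range are all assumptions for illustration. The sketch only constructs the URL and does not contact PubMed.

```python
from urllib.parse import urlencode

# Public eUtilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# NLM title abbreviations for the five journals (assumed search terms).
JOURNALS = ["N Engl J Med", "JAMA", "Lancet", "BMJ", "Ann Intern Med"]


def build_esearch_url(journal: str, min_year: int, max_year: int, retmax: int = 100) -> str:
    """Build an esearch URL for one journal, limited by publication date."""
    params = {
        "db": "pubmed",
        "term": f'"{journal}"[Journal]',  # restrict the search to the journal field
        "datetype": "pdat",               # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": str(retmax),            # maximum number of IDs returned per request
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# One URL per journal, covering an assumed five-year window.
urls = [build_esearch_url(journal, 2001, 2006) for journal in JOURNALS]
```

Each URL returns a list of PubMed IDs, which would then need a second request (for example to `efetch`) to retrieve the article details.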

TRIP has historically focussed on secondary-review material and that will remain our focus. Our search algorithm will give precedence to secondary material but the system will ensure that highly relevant articles from these two sources are near the top of the results. The two sources will offer highly relevant material for our clinical users.

All systems go

We have now solved the last major issue with the new version of TRIP. Only a few really minor issues are outstanding. Not wanting to tempt fate but getting it all sorted with more than a week to spare – I think that’s a record for us!

We’re starting to get excited as, with the new tweaks, the search algorithm is behaving wonderfully. We genuinely believe this will be a significant ‘new’ tool for clinicians seeking answers to their clinical questions.
