
Trip Database Blog

Liberating the literature


May 2008

The sad end of the NLH Q&A Service

The saddest thing for me about the ending of the NLH Q&A Service is the reaction of the users. We’ve just placed a notice on the site alerting users to the ending of the service. Two comments were received within 35 minutes of the notice going live:

“I have just found this answering service through a colleague. I think it is absolutely brilliant. I have just found the answer to a question within 1 minute which would normally have taken me hours, maybe days of research. I am paid £40k p.a. so this time saving is bound to be very valuable to the NHS when multiplied by all the people who use it. Why is the service ending???”

“Hello. As an EBM practitioner I am disappointed to read that as of June 27th the NLH Q&A Service will cease operations and be unable to answer any new questions. Could you kindly inform me of the reasons behind this decision, and what will replace this very important and I think successful primary care service. Thanks,”

I’m so proud of what we’ve achieved, and the fact that clinicians are prepared to contact us to express surprise and disappointment is impressive. The NLH suffers from a lack of engagement with primary care health professionals, an area that needs more support than secondary care (which has access to librarians). Removing this key service further hampers engagement.

Still, the secondary care clinicians will keep their services…

Dramatic increase in search speed

I’ve been worried about the search speed for some time. Removing the number of results shown for each individual category was meant to improve things, but it didn’t.

Yesterday (see previous post) I introduced a system that removed search results which we considered not relevant. As well as improving the search relevancy it reduced – dramatically – the number of search results returned. By accident this has resulted in a huge increase in search speed.

Not all accidents are bad!

Improving search on TRIP

We have just released a major improvement in the TRIP search results.

This has been brought about by the refinement of our ‘text cutoff tool’. Before this change, a search would return every result that contained the search terms. So a search for acute kidney injury returned the CKS guideline on ankles and sprains! I imagine most people would agree that this is not even an average result; it simply shouldn’t be there. But how did it get there? It contains all the individual terms, so it is returned.

So how does the text cutoff tool work?

Each search on TRIP looks for matches to the search term(s) used, and these are ordered based on our algorithm. One component of the algorithm is a text score. The higher the text score, the more likely the result is to be pertinent to the search terms. Some factors that affect the text score include:

  • Location – where the term occurs (e.g. if someone searches on asthma, documents with asthma in the title tend to get a higher text score than documents that only mention it in the body text).
  • Term density. If you have two documents, both 1000 words long and without the search term in the title and one mentions the search term once and the other ten times, the latter will receive a higher text score.
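To make the two factors concrete, here is a toy text score in Python. It is a sketch under stated assumptions, not TRIP’s algorithm: the title weight, the per-1,000-words density measure and the `Document`/`text_score` names are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: the real values used by TRIP are not given in the post.
TITLE_WEIGHT = 5.0
BODY_WEIGHT = 1.0


@dataclass
class Document:
    title: str
    body: str


def count_term(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def text_score(doc: Document, terms: list[str]) -> float:
    """Toy text score: title hits get a bonus (location) and body hits are
    normalised by body length (term density)."""
    body_words = max(len(doc.body.split()), 1)  # avoid dividing by zero
    score = 0.0
    for term in terms:
        # Location: a hit in the title counts for more than one in the body.
        score += TITLE_WEIGHT * count_term(doc.title, term)
        # Density: occurrences per 1,000 words, so long documents are not favoured.
        score += BODY_WEIGHT * 1000 * count_term(doc.body, term) / body_words
    return score
```

With two 1,000-word documents, one mentioning asthma once and the other ten times, this sketch scores them 1.0 and 10.0; putting asthma in the title of the first lifts it to 6.0.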

The idea behind the text cutoff is simply to say that all results with a text score of lower than X do not get returned.
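In code, the cutoff is just a filter applied after scoring. A minimal sketch, assuming results arrive as already-ranked `(title, text_score)` pairs; the `TEXT_CUTOFF` value of 2.0 and the `apply_text_cutoff` name are hypothetical stand-ins for X.

```python
# Hypothetical value for "X"; the post does not state TRIP's actual threshold.
TEXT_CUTOFF = 2.0


def apply_text_cutoff(
    results: list[tuple[str, float]], cutoff: float = TEXT_CUTOFF
) -> list[tuple[str, float]]:
    """Drop every result whose text score is below the cutoff,
    keeping the survivors in their original ranked order."""
    return [(title, score) for title, score in results if score >= cutoff]
```

Because the filter only removes items, the ranking produced by the rest of the algorithm is left untouched.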

This sounds nice and simple, but it is fairly arbitrary: set the score too high and you may miss important documents, too low and you let too many in. It’s also complicated by the fact that there is no magic score that defines pertinence. We have tested lots of combinations of search terms and text cutoff points, and we’ve managed to remove an awful lot of noise from the searches. The cutoff point we’ve used has reduced the number of results for the acute kidney injury search from 2,609 to 440.
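One way to picture that testing is a sweep: for each candidate cutoff, count how many results survive across a set of test searches and how many documents judged relevant would be lost. This is a hypothetical sketch; the post doesn’t say how relevance was judged, so the `relevant` sets, the toy scores and the `sweep_cutoffs` name are all assumptions.

```python
def sweep_cutoffs(
    searches: dict[str, list[tuple[str, float]]],
    relevant: dict[str, set[str]],
    cutoffs: list[float],
) -> dict[float, tuple[int, int]]:
    """For each cutoff, return (results kept, relevant documents lost).

    searches: query -> [(doc_id, text_score), ...]
    relevant: query -> doc_ids judged relevant (hypothetical judgements)
    """
    report = {}
    for cutoff in cutoffs:
        kept = lost = 0
        for query, results in searches.items():
            for doc_id, score in results:
                if score >= cutoff:
                    kept += 1
                elif doc_id in relevant.get(query, set()):
                    lost += 1  # a relevant document fell below the cutoff
        report[cutoff] = (kept, lost)
    return report


# Toy data, invented purely for illustration.
searches = {
    "q1": [("d1", 8.0), ("d2", 3.0), ("d3", 0.5)],
    "q2": [("d4", 6.0), ("d5", 1.5)],
}
relevant = {"q1": {"d1", "d2"}, "q2": {"d4", "d5"}}
report = sweep_cutoffs(searches, relevant, [1.0, 2.0, 4.0])
```

On the toy data, raising the cutoff from 1.0 to 4.0 shrinks the results from 4 to 2 but loses 2 relevant documents along the way, which is the trade-off described above.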

Ironically, the CKS guideline on ankles and sprains still remains. Setting the cutoff high enough to remove it also removed other documents from other searches.

We’re not claiming it’s perfect, but it’s pretty good!

Health and Second Life

I’ve been on Facebook for a while, but for most of that time I haven’t understood the appeal. I joined because of the buzz and I wanted to understand it.

Similarly, I joined Second Life hoping to understand it. In many ways I can see the appeal of Second Life more than that of Facebook. I don’t use it, but I can appreciate the ability to immerse yourself in an alternative reality. Perhaps if I were 20 and had lots of time on my hands I’d be hooked.

But this report, Providing Consumer Health Outreach and Library Programs to Virtual World Residents in Second Life, has just been released:

“The major accomplishments of the project include: the careful and thorough development of an island in the virtual world called Second Life, replete with buildings, grounds, meeting spaces, exhibits, collections, and other information resources and services; development and deployment of a variety of informational exhibits and displays; and numerous contacts and collaborative efforts with health-related groups and organizations. Overall, HealthInfo Island has become the focal point for many health-related initiatives in Second Life. It has been very successful and well-received by the general public in Second Life, healthcare professionals, and health sciences librarians.”

Situated Question Answering in the Clinical Domain

I’m finding myself increasingly drawn to computational and statistical techniques for helping with the Q&A service. Many papers, such as the one highlighted below, look at answering the question; my main interest, currently, is in using semantic analysis for updating clinical questions.

Situated Question Answering in the Clinical Domain: Selecting the Best Drug Treatment for Diseases

Abstract: Unlike open-domain factoid questions, clinical information needs arise within the rich context of patient treatment. This environment establishes a number of constraints on the design of systems aimed at physicians in real-world settings. In this paper, we describe a clinical question answering system that focuses on a class of commonly-occurring questions: “What is the best drug treatment for X?”, where X can be any disease. To evaluate our system, we built a test collection consisting of thirty randomly-selected diseases from an existing secondary source. Both an automatic and a manual evaluation demonstrate that our system compares favorably to PubMed, the search system most commonly-used by physicians today.

Oops, we did it again

Just to show that our million-plus searches in March weren’t a fluke, we had 1,077,190 searches in April. As April only has 30 days, this is more of an achievement. Perhaps more meaningful is the average number of daily searches:

  • March – 32,302
  • April – 35,906

So roughly an 11% increase…
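For anyone who wants to check the arithmetic, the April daily average and the month-on-month change can be worked out from the figures above. A small Python sketch; the variable names are ours, and only March’s daily average (not its total) is taken from the post.

```python
# Figures from the post.
APRIL_TOTAL = 1_077_190
APRIL_DAYS = 30
MARCH_DAILY = 32_302

april_daily = APRIL_TOTAL / APRIL_DAYS
increase_pct = 100 * (april_daily - MARCH_DAILY) / MARCH_DAILY

print(f"April daily average: {april_daily:,.0f}")
print(f"Increase over March: {increase_pct:.1f}%")
```

This prints an April average of 35,906 searches a day and an increase of 11.2% over March.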

Blog at WordPress.com.
