The TRIP Database search algorithm works well most of the time, but my biggest annoyance is the ‘over-promotion’ of eTextbook records. eTextbooks have their place, but at TRIP we try to offer the highest-quality material first so that users can then work down the ‘quality’ scale.
We’ve identified the cause: an overly high weighting driven by title term density. Let me explain!
Our general view is that if a document contains the search term(s) in its title, it is likely to be more relevant than a document that mentions them only in the text. As a result, we give the title score a higher weighting. The problem is that our underlying search software, Lucene, also factors in title word density. Consider two documents with these titles:
- Prostate cancer
- Blah blah blah prostate cancer
The first gets a very high score (a 100% match: both title words are search terms), while the second gets a lower one (a 40% match: two of its five words are search terms). Users typically search with one or two terms, and eTextbook records typically have one- or two-word titles, whereas resources such as Cochrane and Bandolier have much longer titles. This has caused much frustration, and we even considered building our own bespoke search mechanism, which would be costly and much slower.
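To make the effect concrete, here is a minimal Python sketch comparing the two example titles. It assumes a naive lowercase, whitespace-only tokenizer and uses two simplified measures: the ‘% match’ described above (the share of title words that are search terms) and the length norm from Lucene’s classic TF-IDF similarity, 1/√(number of terms in the field). Real Lucene scoring also involves term frequency, IDF, query normalization and field boosts, so `title_score` and the other function names are hypothetical illustrations, not TRIP’s or Lucene’s actual code.

```python
import math

def tokenize(text: str) -> list[str]:
    # Assumption: a naive lowercase, whitespace-only tokenizer
    # (Lucene's analyzers do considerably more than this).
    return text.lower().split()

def title_density(query: str, title: str) -> float:
    """Share of title words that are search terms (the '% match' above)."""
    terms = set(tokenize(query))
    words = tokenize(title)
    if not words:
        return 0.0
    return sum(1 for w in words if w in terms) / len(words)

def classic_length_norm(title: str) -> float:
    """Length norm from Lucene's classic TF-IDF similarity: 1 / sqrt(term count)."""
    n = len(tokenize(title))
    return 1.0 / math.sqrt(n) if n else 0.0

def title_score(query: str, title: str, use_length_norm: bool = True) -> float:
    """Hypothetical, simplified title score: distinct matched terms x length norm."""
    terms = set(tokenize(query))
    matched = len(terms & set(tokenize(title)))
    norm = classic_length_norm(title) if use_length_norm else 1.0
    return matched * norm

if __name__ == "__main__":
    query = "prostate cancer"
    for title in ["Prostate cancer", "Blah blah blah prostate cancer"]:
        print(f"{title!r}: density={title_density(query, title):.0%}, "
              f"norm={classic_length_norm(title):.3f}, "
              f"score={title_score(query, title):.3f}, "
              f"score_without_norm={title_score(query, title, use_length_norm=False):.1f}")
```

With the length norm on, the short title scores about 1.41 and the longer one about 0.89, even though both contain the same two search terms; with it off, both score 2.0. Lucene does let a field omit its norms, which is one common way to flatten this effect, though this sketch does not claim that is one of the fixes Alf suggested.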
However, reading Alf Eaton’s HubLog shows he has more than a working knowledge of Lucene. A quick e-mail to Alf (he has advised us on TRIP in the past) produced a couple of suggested fixes. We’re currently building a test system to try them out. With any luck, these changes will be in place shortly.
That said, we may be tempted to wait until the current round of upgrades is finished and roll everything out in one go!