We have just released a major improvement in the TRIP search results.
This has been brought about by the refinement of our ‘text cutoff tool’. Before its introduction, a search would return every result that contained the search terms. So a search for acute kidney injury returned the CKS guideline on ankles and sprains! I imagine most people would agree that this is not even an average result – it simply shouldn’t be there. So how did it get there? It contains all the individual terms, so it was returned.
So how does the text cutoff tool work?
Each search on TRIP looks for matches to the search term(s) used, and these are ordered based on our algorithm. One component of the algorithm is a text score. The higher the text score, the more likely the result is to be pertinent to the search terms. Some factors that affect the text score include:
- Location – where the term occurs (e.g. if someone searches on asthma, documents with asthma in the title tend to get a higher text score than those where it is only mentioned in the text).
- Term density – how often the term occurs. If you have two documents, both 1,000 words long and neither with the search term in the title, and one mentions the search term once and the other ten times, the latter will receive a higher text score.
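To make those two factors concrete, here is a toy sketch in Python. It is not TRIP's actual algorithm: the function name, the title bonus of 5.0 and the "mentions per 1,000 words" density measure are all invented for illustration.

```python
import re


def toy_text_score(query, title, body, title_bonus=5.0):
    """Illustrative only: NOT TRIP's real scoring. All weights are invented."""
    terms = query.lower().split()
    title_words = re.findall(r"[a-z0-9]+", title.lower())
    body_words = re.findall(r"[a-z0-9]+", body.lower())
    score = 0.0
    for term in terms:
        # Location: a term that appears in the title earns a fixed bonus.
        if term in title_words:
            score += title_bonus
        # Term density: mentions of the term per 1,000 words of body text.
        if body_words:
            score += 1000.0 * body_words.count(term) / len(body_words)
    return score


# Two 1,000-word documents without the term in the title:
# one mentions "asthma" once, the other ten times.
doc_once = " ".join(["asthma"] + ["word"] * 999)
doc_ten = " ".join(["asthma"] * 10 + ["word"] * 990)

score_once = toy_text_score("asthma", "Respiratory review", doc_once)
score_ten = toy_text_score("asthma", "Respiratory review", doc_ten)
score_title = toy_text_score("asthma", "Asthma in adults", doc_once)
```

In this sketch the ten-mention document scores higher than the one-mention document, and putting the term in the title lifts the one-mention document above its untitled twin.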
The idea behind the text cutoff is simply that any result with a text score lower than X does not get returned.
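As a rough sketch of that rule in Python (the function name, the document labels, the scores and the cutoff value below are all hypothetical, not taken from TRIP):

```python
def apply_text_cutoff(results, cutoff):
    """Drop every result whose text score is lower than `cutoff`.

    `results` is a list of (label, text_score) pairs. Hypothetical data shape.
    """
    return [(label, score) for label, score in results if score >= cutoff]


# Made-up results and scores, purely for illustration.
results = [
    ("Document A (terms in the title)", 9.2),
    ("Document B (terms mentioned often)", 4.7),
    ("Document C (each term mentioned once, far apart)", 0.8),
]

kept = apply_text_cutoff(results, cutoff=2.0)
for label, score in kept:
    print(f"{score:4.1f}  {label}")
```

With these made-up numbers, Documents A and B are kept and Document C is dropped. Everything hinges on the value chosen for the cutoff.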
This sounds nice and simple, but it is fairly arbitrary: set the score too high and you may miss important documents; set it too low and you allow too many in. A further complication is that there is no magic score that defines pertinence. We have tested lots of combinations of search terms and text cutoff points, and we’ve managed to remove an awful lot of noise from the searches. The cutoff point we’ve used has reduced the number of results for the acute kidney injury search from 2,609 to 440, a drop of about 83%.
Ironically, the CKS guideline on ankles and sprains still remains. Tightening the cutoff enough to remove it also removed other documents from other searches.
We’re not claiming it’s perfect, but it’s pretty good!