This morning, we released the fully automated Q&A system internally. We’ve already asked it a number of questions, and responses are coming back in 10–30 seconds, which is promising.

That said, it hasn’t all been plain sailing; maybe my expectations were a bit too high. Two main issues cropped up:

  1. Format – The answers didn’t look great. They lacked flow and polish. Thankfully, this is a relatively easy fix.
  2. Content – Moving from the LLMs’ web interfaces (e.g., ChatGPT) to their APIs (so we can fetch answers programmatically, without visiting the site) and then ‘stitching’ all the steps together introduced some problems; there’s a sketch of what that looks like just after this list. In previous tests we didn’t have a single bad answer, but this new system delivered a few that just weren’t up to scratch. I don’t mean disastrously wrong – just not good enough.
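
For the technically minded, the sketch below shows roughly what that ‘stitching’ means. It’s a minimal illustration rather than our actual pipeline: it assumes the OpenAI Python SDK, and the model name, prompts, and two-step draft-then-polish structure are all made up for this post.

```python
# Minimal sketch of a two-step Q&A pipeline over the API.
# Illustrative only: model name, prompts, and the two-step structure
# are assumptions for this post, not our actual system.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str) -> str:
    # Step 1: draft an answer to the question.
    draft = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": "Answer the user's question accurately."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Step 2: a second call polishes the draft. Each hand-off like this
    # is a seam where quality can slip.
    polished = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Rewrite the answer so it reads smoothly."},
            {"role": "user", "content": draft},
        ],
    ).choices[0].message.content

    return polished

print(ask("What is our refund policy?"))
```

One relevant difference: the web interface quietly manages context for you, whereas over the API each call sees only what we explicitly pass it, so every join in a chain like this is a place where quality can leak.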

We’re now working through these issues, refining the prompts (the instructions we give to the LLMs), and tightening things up.
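
To give a flavour of what ‘refining the prompts’ involves, here’s an invented example (neither prompt is one we actually use). Spelling out the output format, rather than leaving it to the model’s defaults, is the kind of tightening we expect to fix the format issue.

```python
# Illustrative only: neither prompt is one we actually use.
# "Tightening" a prompt often means stating the desired output
# format explicitly instead of leaving it to the model's defaults.
LOOSE_PROMPT = "Answer the user's question accurately."

TIGHTENED_PROMPT = (
    "Answer the user's question accurately. "
    "Open with a one-sentence summary, write in short flowing paragraphs, "
    "and avoid fragmentary bullet points."
)
```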

It’s a bit disappointing not to get it right first time, but we don’t think the fixes will be too onerous. The next version should be much stronger, and hopefully ready for external testing soon.