I love generating these network maps, and I keep returning to them over the years.

The above is a map based on a sample of articles on Covid-19. Each article is represented by a node (grey circle) and the edges (lines of various colours) represent connections between them. The project was to explore ways of grouping documents with a view to speeding up evidence reviews. In this work the connections were (a) semantic and (b) citation.
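To make the structure concrete, here is a minimal Python sketch of that kind of map: articles as nodes, with two edge types layered on top. The article IDs and links are made up for illustration, and grouping by connected components is only a stand-in assumption, not necessarily how the clusters in the map above were produced.

```python
from collections import defaultdict, deque

# Hypothetical article IDs and links; none of this is the real Covid-19 sample.
articles = ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "c1"]
semantic_edges = [("a1", "a2"), ("a2", "a3"), ("b1", "b2")]
citation_edges = [("a3", "a4"), ("b2", "b3")]


def build_graph(nodes, typed_edges):
    """Return an undirected adjacency map: node -> {neighbour: set of edge types}."""
    graph = {node: defaultdict(set) for node in nodes}
    for edge_type, edges in typed_edges.items():
        for u, v in edges:
            graph[u][v].add(edge_type)
            graph[v][u].add(edge_type)
    return graph


def find_clusters(graph):
    """Group nodes into connected components using breadth-first search."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


graph = build_graph(articles, {"semantic": semantic_edges, "citation": citation_edges})
clusters = find_clusters(graph)
print(clusters)
```

Keeping the edge type on each link means a third kind of connection could be added later without changing the grouping code.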

What is clear is that the articles group around topics. Given the experimental nature of the work, the small sample and the imperfect data, I’m loath to draw any firm conclusions. Still, I am taking it as another endorsement of this approach, one I want to explore next year.

But how might such knowledge be useful? Here are a few ideas, and I’d be delighted to hear any other suggestions:

  • Improve search 1 – if a user clicks on an article in a distinct cluster you can immediately highlight the closest other articles.
  • Improve search 2 – when someone searches you could highlight the distinct clusters and use them as a form of search refinement. So, using the above diagram, a user might have searched for Covid and we could highlight the three clusters.
  • Improve search 3 – a user might select 10 of the articles in a cluster but miss an article – we could flag this up.
  • Better intelligence – we could monitor the clusters and see when new articles join them. We could then alert users who had previously interacted with the cluster.
  • Rapid reviews – we could highlight all the RCTs and/or systematic reviews in a cluster and start to extract value from each trial (e.g. risk of bias, sample size).
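
As an illustration of the "Improve search 3" idea, here is a rough Python sketch that flags articles a user may have missed. The cluster contents, the `flag_missed` helper and the `min_share` threshold are all hypothetical; a real system would need to tune when a selection counts as "most of a cluster".

```python
def flag_missed(clusters, selected, min_share=0.5):
    """For each cluster the user has mostly selected, return the articles left out.

    min_share is an assumed threshold: the fraction of a cluster that must
    already be selected before we suggest the remaining articles.
    """
    selected = set(selected)
    missed = {}
    for index, cluster in enumerate(clusters):
        members = set(cluster)
        picked = members & selected
        if picked and picked != members and len(picked) / len(members) >= min_share:
            missed[index] = sorted(members - picked)
    return missed


# Hypothetical clusters and user selection, for illustration only.
example_clusters = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3"], ["c1"]]
user_selection = ["a1", "a2", "a4", "b1"]
suggestions = flag_missed(example_clusters, user_selection)
print(suggestions)
```

Here the user has picked three of the four articles in the first cluster, so the remaining one is suggested; a single pick from the second cluster falls below the threshold and triggers nothing.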

When we roll it out we will be able to include a third type of connection – clickstream data – which we’ve previously demonstrated to be incredibly powerful. It’s at times like this I wish we had a sizeable R&D budget.