Saturday, March 24, 2007

Gwagle, TRIP and the semantic web

An interesting blog post reports that TRIP and Gwagle are helping to usher in the semantic web by having a narrow focus.

My interpretation of this is that, because people are using clinical terms, there is a significant loss of ambiguity. In Google, if you type weight, it might be about the weight of a whale, yacht, car, person, building - pretty much anything. However, in TRIP or Gwagle, a search for weight is pretty much likely to refer to the weight of a person (or other biological entity) or something like low-molecular weight heparin.

1 comment:

Patrick said...

Thank you for the kind mention Jon.

You wrote,

"In Google, if you type weight, it might be about the weight of a whale, yacht, car, person, building - pretty much anything. However, in TRIP or Gwagle, a search for weight is pretty much likely to refer to the weight of a person (or other biological entity) or something like low-molecular weight heparin"

What you're talking about here is restricting the document set over which the term "weight" is relevant. What I meant is slightly different, but the slightness will prove crucial particularly as the web matures.

We are both surely talking about the relation of search terms and document relevance in IR, but specifically I was talking about the details of that relationship--namely, user context and intent, and how semantics can be represented to a non-trivial extent simply by sticking solely to a particular context and set of users.

Useful semantics are accidental or emergent in this way. In the process of restricting the domain of the document set, you also reduce the number of terms semantically related to "weight" (to keep with your example).

For example, in Google "weight" is simply a token, a dumb term, a string of six characters in a particular sequence. You may intend to search for the molecular weight of heparin when using a search string like "weight heparin" on Google, but you might receive in return a clinical anecdote by a patient retelling his experience of being treated with heparin and describing the events as having great weight. Google does not know the difference.

Obviously Trip doesn't know the difference between the senses of "weight" either. But using Trip rather than Google means you're far more likely to get terms that match and, transitively, relevant results.
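
As a rough illustration of this point, here is a toy sketch in Python. The documents, the domain labels and the relevance flags are all invented for the example, and the matcher is plain token matching, not how Google or Trip actually rank anything. It only shows that the same "dumb" matcher returns a higher share of relevant results once the document set is restricted to one domain.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    domain: str      # hypothetical label: "clinical" or "general"
    relevant: bool   # hand-assigned: does it match the searcher's clinical intent?


# Invented toy corpus; not real search results.
DOCS = [
    Doc("Molecular weight of low-molecular weight heparin", "clinical", True),
    Doc("Body weight and heparin dosing in adults", "clinical", True),
    Doc("The weight of evidence for aspirin in primary prevention", "clinical", False),
    Doc("Average weight of a blue whale", "general", False),
    Doc("Curb weight of a family car", "general", False),
    Doc("Displacement weight of a racing yacht", "general", False),
]


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def token_match(query, text):
    """True if every query token appears in the text; no notion of word sense."""
    return tokens(query) <= tokens(text)


def precision(query, docs):
    """Share of matched documents that are relevant to the intended meaning."""
    matched = [d for d in docs if token_match(query, d.text)]
    if not matched:
        return 0.0
    return sum(d.relevant for d in matched) / len(matched)


# The whole toy "web" versus the clinical subset only.
clinical_only = [d for d in DOCS if d.domain == "clinical"]
general_precision = precision("weight", DOCS)
clinical_precision = precision("weight", clinical_only)
```

In this toy corpus the matcher still returns the figurative "weight of evidence" document from the clinical subset, so it has not learned anything about word senses. The improvement comes entirely from the narrower document set.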

After all, "weight" might have five synonyms and hundreds of holonyms (I'm guessing) in a broad universal sense, but in the medical domain, that might be two synonyms and 40 holonyms.

While the semantics in the Trip database are not completely explicit, terms wielded by users in searches will be more likely to return documents containing the user's intended meaning for the term. And the medical context is of course rather guaranteed.

Now this isn't semantic in the sense that document representations and search queries are expanded by semantic neighbors of the terms, but it's a big start to be sure.
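
For contrast, expanding a query by semantic neighbors might look roughly like the sketch below. The neighbor map is hand-written and hypothetical, and nothing here describes what Trip or Gwagle actually do. Each query term becomes a group made up of the term plus its neighbors, and a document matches if it contains at least one member of every group.

```python
import re

# Hypothetical, hand-written neighbor map, for illustration only.
NEIGHBORS = {
    "weight": {"mass"},
    "heparin": {"enoxaparin"},
}


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand(query, neighbors):
    """Turn each query term into a set holding the term and its neighbors."""
    return [{term} | neighbors.get(term, set()) for term in query.lower().split()]


def matches(groups, text):
    """True if the text contains at least one term from every group."""
    doc_tokens = tokens(text)
    return all(group & doc_tokens for group in groups)


doc = "Molecular mass of enoxaparin"
plain_query = expand("weight heparin", {})            # no expansion
expanded_query = expand("weight heparin", NEIGHBORS)  # with neighbors

plain_hit = matches(plain_query, doc)
expanded_hit = matches(expanded_query, doc)
```

With the plain query the document is missed because it contains neither "weight" nor "heparin"; with the expanded query it is found through "mass" and "enoxaparin".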