The Semantic Web - Fee Fi FOAF, um…

Okay, so this has less than rampant application right now within the mainstream, but I would be remiss not to write something about the Semantic Web. It is a topic that interests me tremendously because of the potential it holds for so many areas of our lives and the way we utilize data, learn, and use the internet. I happen to believe in the Semantic Web’s implications for health care, knowledge exchange, and many other applications. I think it can, could, and hopefully will change the world.

Honestly.

The concept of a Semantic Web is easy enough to explain poorly. It is harder to explain well, particularly without getting into things that might leave behind a non-technical reader.

The easiest representation of the Semantic Web is the following:

The Semantic Web in Picture Form

But this may not mean a heck of a lot to you, and it isn’t the easiest thing to understand.

There is some difference between a URI and a URL, and it has something to do with the fact that a URL *locates* a resource while a URI *identifies* a resource, but I think that is almost academic. For our purposes, a URI is the “Uniform Resource Identifier” and that means that it is *the* identifier for a given resource. In something as robust and untamed as the web, we need to have a “final say-so” when it comes to defining things. There would just be too many interpretations otherwise. Mind you, there can be more than one URI for a given resource, but it is important to agree upon your foundation across the elements of your application, query, or Semantic Effort.

Unicode is a standard for text. It is the encoding standard used in the Semantic Web, and it allows a wide variety of data sources to be used, provided that they stick to the spec. W3C is a massive and amazing organization, and I expect an invite from them any day now that says they want me, need me, gotta have me on one of their boards. Standards are a good idea. Standards allow for coherence and reliability. Just like your development staff should follow standards to ensure the quality of what they produce, the web employs a variety of standards to ensure some measure of control and therefore, progress.

The foundation of the Semantic Web is the ability to locate resources that give us information above and beyond the classic database “datatypes.” A URI such as foaf:knows will tell us what the “knows” relationship means. This is a good start.

The language that runs through the veins of the Semantic Web is RDF. RDF looks a heck of a lot like XML, but with one or two important differences that are not apparent visually:

If I want to feed the Semantic Web a fact, I structure that fact. Let’s say, “Josh knows Susan.” It would look something like this:

<person>
  <name>Josh</name>
  <knows>
    <person>
      <name>Susan</name>
    </person>
  </knows>
</person>

Of course, Josh and Susan have many more properties than just the “knows” property. There are an infinite number of relationships between people, and there are just as many properties of people. The first thing we need to do is define what they are. Now, I can tell you what they are and you can tell me what they are, but none of that arguing will mean anything until we come up with something definitive and point to it as the authority. If the Semantic Web is going to function, it must have a recognized authority that defines the various relationships within it.

The URI is a convenient way of naming authorities and telling people (or machines) where a relationship or property is defined. URI = Uniform Resource Identifier, and it is important that the authoritative URI defines these relationships and properties clearly and uniquely.

This is really cool stuff to me as an ex-Philosophy major, and this is what the Semantic Web is built upon.

A URL (URI for our purposes here) such as http://xmlns.com/foaf/spec/#term_knows tells us what “knows” means, according to xmlns.com (the domain in the URL). At some point, we will need to have a recognized authority for all property and relationship definitions. I am interested to see who gets entrusted with this. Of course, this is all Open Source and you would like to think that people love people and want to better humankind, but there is always the opportunity for businesses (or bad folks) to stick their nose into something that worked quite well without them but, in their minds, is so much better off making money for them. Google did not launch AdWords right away. But this is another topic for another time.

If you go to a browser window and enter http://xmlns.com/foaf/spec/#term_knows you will get a page that people can understand. This is key. Give it a shot. I have gone through the trouble of making a hyperlink so you can check it out for yourself. The reason this is key is that the Semantic Web is not about computers understanding things just to understand them, but about computers being able to act as agents of ours that we set up to go out and do our bidding.
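If you would rather poke at that URI from code than from a browser, here is a minimal sketch using Python's standard library. It only splits the identifier into its parts; it does not fetch anything, and the variable names are my own for illustration.

```python
from urllib.parse import urlparse

# The FOAF term URI quoted in the text above.
uri = "http://xmlns.com/foaf/spec/#term_knows"

# Split the identifier into its components without touching the network.
parts = urlparse(uri)

print(parts.scheme)    # http
print(parts.netloc)    # xmlns.com  (the domain that publishes the definition)
print(parts.path)      # /foaf/spec/
print(parts.fragment)  # term_knows (the specific term on that page)
```

The domain tells you who is vouching for the definition, and the fragment tells you which term on their page you are pointing at.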

Semantic Secret Agent

 

Semantic Agent

“The FOAF:knows”

Resources are important. I am no authority on medicine, but if someone called me and said they had a headache, I might recommend Tylenol. I would be their resource for the information, and they would be in sad shape if they had a brain tumor or something less than benign. Resources are important, and they can function as authorities.

If I were going to create a document that stated who the resource for information regarding headaches should be, I might name my hospital, Beth Israel Deaconess. They would be a better resource than me.

And so we must have a defined resource for everything that we are going to look towards for meaning.

The Semantic Web is an internet where text, data, and information are understandable by machines through the utilization of markup, resources, and ontology.

To clarify: right now I can do a Google search on “Poison” and I will get back a list of everything from cyanide to an ’80s hair band. In order for me to find what I am looking for, I have to read. I understand, as a web surfer, that Google returns a set of possible matches and I get to choose the one that looks the most like what I want.

In the Semantic Web, Poison would be understood in its correct context and have dimension. This is where RDF and XML diverge.

XML is good at organizing data, but it makes no claims as to meaning within a context. We can say that a bus has a route and a route contains streets, and we can represent that quite well with XML, but XML does not tell us that a route is the same as a path, or that a route is something that gets us from one place to another. XML can provide schema, but not ontology. Ontology is, again, the way things relate to each other. It is a term borrowed from philosophy.

RDF adds these ontologies and resources for the definition of relationships. “Triples” can be written in an XML syntax, but each one has a subject, predicate, object structure, and they are what make RDF capable of inference, or drawing out relationships.

Josh = Subject

Knows = Predicate

Susan = Object
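To make the jump from nested XML to a triple a bit more concrete, here is a rough sketch in Python. It assumes the simplified `<person>` markup from earlier in this post (which is not real RDF/XML), and `fact_to_triple` is a hypothetical helper name of mine, not part of any library.

```python
import xml.etree.ElementTree as ET

# The simplified "Josh knows Susan" markup from the post (not real RDF/XML).
xml_fact = """
<person>
  <name>Josh</name>
  <knows>
    <person>
      <name>Susan</name>
    </person>
  </knows>
</person>
"""

def fact_to_triple(xml_text):
    """Flatten the nested markup into one (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text.strip())
    subject = root.findtext("name")            # the outer person's name
    obj = root.findtext("knows/person/name")   # the person nested inside <knows>
    return (subject, "knows", obj)

triple = fact_to_triple(xml_fact)
print(triple)  # ('Josh', 'knows', 'Susan')
```

Once a fact is a flat tuple like this, a machine can collect millions of them and compare them without caring how each one was originally nested.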

The Semantic Web aims to organize data that would otherwise be known as metadata in a way that allows agents (software we send on missions) to discover things for us and make obvious what would otherwise be unknown.

The Semantic Web is about making machine-readable web pages, or really, machine-understandable pages out of HTML and other data sources. Instead of just indexed text, the web becomes organized and meaningful, providing useful data that we can create agents for and utilize without the agonizing process of searching, hunting, and doing “deep diving” of our own.

A Semantic Web would be a Web that understands the things within it and can reveal relationships that we would not otherwise be aware of.

If you lay out data or web pages in a way that allows machines to understand their content and context, you can create an environment where computers can learn to infer (indeed, learn to learn) and do comparative analysis as well as other cool stuff. A computer could infer that if Josh Milane is on the W3C Semantic Web Board, and Susan Jones is on the W3C Semantic Web Board, we share a common interest and may know each other. If Susan’s son has an address in Boston and an employment status of “looking”, the Semantic Web could tell him “Hey, ask your Mom to talk to Josh.” This is a very simple example, but extend it to health care (for example) and you can begin to see dramatic potential.
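Here is a toy version of that board-membership inference, written as a sketch in Python. The predicate names `memberOf` and `mayKnow` are made up for this illustration (they are not FOAF terms), and a real reasoner would work from an ontology rather than a hard-coded rule.

```python
from itertools import combinations

# Facts as (subject, predicate, object) tuples. "memberOf" is a made-up predicate.
triples = {
    ("Josh Milane", "memberOf", "W3C Semantic Web Board"),
    ("Susan Jones", "memberOf", "W3C Semantic Web Board"),
}

def infer_may_know(facts):
    """Rule: two people who are members of the same group may know each other."""
    members = {}
    for subject, predicate, obj in facts:
        if predicate == "memberOf":
            members.setdefault(obj, set()).add(subject)

    inferred = set()
    for people in members.values():
        # Pair up everyone who shares a group; sorting keeps the output stable.
        for a, b in combinations(sorted(people), 2):
            inferred.add((a, "mayKnow", b))
    return inferred

print(infer_may_know(triples))
# {('Josh Milane', 'mayKnow', 'Susan Jones')}
```

The new triple was never stated by anyone. It falls out of the facts plus one rule, which is the whole point.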

We can feed data from clinical trials into the Semantic Web and all kinds of relationships can emerge simply by virtue of the data itself. We might find that everyone who took Medication X along with Medication Y and had diabetes suffered a minor stroke.
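As a sketch of how such a pattern might surface, the Python below checks whether every subject matching a set of conditions also shares an outcome. The patient identifiers and facts are entirely invented for illustration (this is not real trial data), and `everyone_with` is a hypothetical helper.

```python
from collections import defaultdict

# Invented example facts; not real clinical data.
triples = [
    ("patient-1", "took", "Medication X"),
    ("patient-1", "took", "Medication Y"),
    ("patient-1", "has", "diabetes"),
    ("patient-1", "suffered", "minor stroke"),
    ("patient-2", "took", "Medication X"),
    ("patient-2", "took", "Medication Y"),
    ("patient-2", "has", "diabetes"),
    ("patient-2", "suffered", "minor stroke"),
    ("patient-3", "took", "Medication X"),
    ("patient-3", "has", "diabetes"),
]

def everyone_with(facts, conditions, outcome):
    """True if at least one subject meets all conditions and every such subject has the outcome."""
    by_subject = defaultdict(set)
    for subject, predicate, obj in facts:
        by_subject[subject].add((predicate, obj))

    cohort = [s for s, props in by_subject.items() if conditions <= props]
    return bool(cohort) and all(outcome in by_subject[s] for s in cohort)

conditions = {("took", "Medication X"), ("took", "Medication Y"), ("has", "diabetes")}
found = everyone_with(triples, conditions, ("suffered", "minor stroke"))
print(found)  # True
```

With only three made-up patients this proves nothing medically, of course. It just shows the shape of the question an agent could ask across a much larger body of structured data.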

Metadata is just data. If we give it meaning and structure, data becomes meaningful on its face.

We might also use the Semantic Web to help us corral the enormous amount of information that is pouring into areas of research like nanomedicine. Scientists are collecting data regarding mitochondrial receptors such as: the functions that they perform, how they function in relation to each other, how they respond to quantum dots in therapeutic testing environments, their size, the characteristics a drug needs to enter a specific receptor, the location of a specific receptor, relationships that receptors share (such as signal cascades), expression profiles and their relation to disease states, age-related expression, free radical production… and a lot of other things that make my head want to explode.

An obvious advantage of allowing machines to understand and arrange data is that we can establish agents to infer, present us with conclusions, and follow logical models. We can share the world’s knowledge (Josh knows Susan), but we can also draw conclusions and see data that has already been structured for us, thereby learning (for instance) that the ANT receptor, the Mitochondrial Permeability Transition Pore, and Cyclophilin D all shrink when Seramide is introduced. This would become obvious through ontologies.

Sound crazy? A little, maybe, but don’t tell that to the folks at Neurocommons.

 

The main things that make this all possible are syntax, semantics, and ontology. SPARQL is the query language of the Semantic Web and it was designed to look a lot like SQL. While you might not be ready to structure all your data so that it is SPARQL-friendly, you can provide a SPARQL endpoint on your more traditional relational databases and allow machines to query your databases. While your databases might only contain data, we are going to see data be structured within ontology.
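To give a feel for what a SPARQL-style question looks like, here is a small Python sketch of triple pattern matching. It is not SPARQL and it is not how an endpoint is implemented; it only mimics the idea of filling in the blanks of a pattern. The facts, the name "Alex", and the `match` helper are all made up for the example.

```python
# A tiny in-memory "graph" of (subject, predicate, object) facts, invented for this example.
triples = [
    ("Josh", "knows", "Susan"),
    ("Susan", "knows", "Alex"),
    ("Josh", "livesIn", "Boston"),
]

def match(facts, subject=None, predicate=None, obj=None):
    """Return the facts that fit the pattern; None is a wildcard, like a ?variable."""
    return [
        (s, p, o)
        for s, p, o in facts
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Roughly the question "SELECT ?who WHERE { Josh knows ?who }".
who = [o for _, _, o in match(triples, subject="Josh", predicate="knows")]
print(who)  # ['Susan']
```

A real SPARQL engine adds prefixes, joins across several patterns, filters, and a lot more, but the core move is the same: state the shape of the fact you want and let the machine find what fits.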

Ontology is very interesting. OWL is unique in that it is an acronym that takes its parent words out of order. OWL = Web Ontology Language. It is unique in other ways, too (as amazing as that might be).

OWL is based on RDF. It depends on the notion of triples (again: subject, predicate, object, or domain, property, range, where the domain is the resource being described, named by a URI, the property is the relationship, and the range is the set of values that the property may have).

Color may be a predicate/property, and red may be an object/range. We could see that my car and your toenails have a relationship. This may be meaningless, or it may be meaningful. The point is, machines can understand it and we can create agents to look for things that are meaningful. Besides Neurocommons, other folks are taking stabs at this for medicine and biosciences and many other things as well.
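The car-and-toenails relationship can be sketched in a few lines of Python. The facts below (including the blue bike) are invented, and `group_by_value` is a hypothetical helper; the idea is simply to group subjects that share the same value for a property.

```python
from collections import defaultdict

# Invented facts: (subject, property, value).
triples = [
    ("my car", "color", "red"),
    ("your toenails", "color", "red"),
    ("my bike", "color", "blue"),
]

def group_by_value(facts, prop):
    """Map each value of the property to the subjects sharing it, keeping only shared values."""
    groups = defaultdict(list)
    for subject, predicate, value in facts:
        if predicate == prop:
            groups[value].append(subject)
    return {value: subjects for value, subjects in groups.items() if len(subjects) > 1}

print(group_by_value(triples, "color"))
# {'red': ['my car', 'your toenails']}
```

Whether "both are red" is meaningful is a separate question. The machine can find the link; deciding what matters is the agent's job, or ours.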

It is important to understand that while OWL looks like RDF and XML, it is only relevant within a specific context: its own. This allows for highly specialized ontology that describes highly specialized data and niches. It is also a nice example of what I would liken to Object Oriented Logic. Before learning about the Semantic Web, I only knew OWL as a kind of bird. It is all about context.

The scenario I painted earlier, where a computer learns about mitochondrial receptors and draws inferences, is not quite a reality yet. As it stands, the Semantic Web is pretty good at understanding that Josh knows Susan and a bit more than that, but there is a lot of work that remains to be done. Some of that work is technical, and some is plain old labor. We need to implement Semantic hooks and links (RDF and Ontologies) into the web. We can begin using RDF before the encryption and higher levels of the layer cake are figured out. I encourage you to stay tuned to developments within the SemWeb and to read about it, think of ways in which it could benefit you, the world, humankind.

Josh Milane

MIT Technical, Boston


