Freebasing

How can we make computers smarter?

Well, that's not exactly the purpose of the semantic web. Wikipedia explains: "The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content."

If you think of the Internet as a library whose documents you can access using keywords, the semantic web is more like a concordance for that library.

To use the Internet now, you enter a search term and a search engine brings back places (websites) on the Internet where that term shows up. And that's about as far as it goes. The semantic web aims to do more: it wants to give the computer enough information to extract meaning from the data that is effectively hidden in a typical web page. Here is an example provided by http://logicerror.com.

"Here's an example of a document in plain text:

I just got a new pet dog.

As far as your computer is concerned, this is just text. It has no particular meaning to the computer. But now consider this same passage marked up using an XML-based markup language (we'll make one up for this example):

"I just got a new pet dog."

Notice that this has the same content, but that parts of that content are labeled. Each label consists of two "tags": an opening tag (e.g., <sentence>) and a closing tag (e.g., </sentence>). The name of the tag ("sentence") is the label for the content enclosed by the tags. We call this collection of tags and content an "element." Thus, the sentence element in the above document contains the sentence, "I just got a new pet dog." This tells the computer that "I just got a new pet dog" is a "sentence," but -- importantly -- it does not tell the computer what a sentence is. Still, the computer now has some information about the document, and we can put this information to use.

Similarly, the computer now knows that "I" is a "person" (whatever that is) and that "dog" is an "animal."

The idea, in short, is to permit regular human statements to become machine-processable. Here's an example provided by Logic Error: "For example, I could search the Web for all book reviews and create an average rating for each book. Then, I could put that information back on the Web. Another website could take that information (the list of book rating averages) and create a "Top Ten Highest Rated Books" page."

There are companies already hard at work attempting to make the web even more meaningful to computers and humans alike. One of them is Metaweb, a company which recently introduced Freebase, an "open, semantically marked up database of information."

According to its creators, Freebase is an "open shared database of the world's knowledge." While that may sound a bit like Wikipedia, the idea behind Freebase is that it performs its feats not via human sweat and toil, but by linking concepts and relationships into a gigantic network. That is, the computer will ultimately be doing the heavy lifting.

As we are told by ReadWriteWeb: "Any information contained inside the database is accessible and can be retrieved via queries. In addition, the data in Freebase is under a Creative Commons license - meaning that it is readily exportable and usable by others."

Moreover, "when it comes to defining the meanings of things, Freebase is focused on community, with collective editing, attribution, and collaboratively built semantics. This last point is quite crucial - the founders of Freebase believe that meaning has to emerge from the collaboration between users. As such, Freebase is one of the first experiments of web-scale social contracts. The site is really focused on the notion that information is not encumbered by licenses and is free to use."

Since Freebase is, at bottom, a huge database of information gleaned from the Internet, you search it by writing database-style queries. ReadWriteWeb explains: "To query Freebase you use the Metaweb Query Language (MQL), which is based on JSON. The language is meant to be very simple and it is actually very interesting as well. The idea is that you fill out a tree which represents a partial graph with pieces that you know and then the system basically fills in all the slots that you left blank and delivers back all possible subgraphs."
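To make that a little more concrete, here is the kind of minimal query Metaweb's introductory material showed. I'm reproducing the shape from memory, so treat the exact type and property names as illustrative:

    {
      "type": "/music/artist",
      "name": "The Police",
      "album": []
    }

The values you supply ("The Police") are the knowns; the empty slot ("album": []) is the blank you are asking Freebase to fill in. The answer comes back as the same JSON tree, with the album list populated.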

"For example, say you are watching a movie and you can't tell what it is. You know that the movie stars Patrick Swayze and an actress who was also in "Tank Girl." So you create a movie query and express all these facts, using JSON-style syntax. And when you run the query you get back that the actress is Lory Petty and the movie is "Point Break" and you also get links to IMDB. So the query and the results have the same structure and to find matches you simply traverse the set of results that is returned.

"Building on this example, Freebase is really meant for complex inferencing queries, the sorts of questions that Google has no way of answering using its statistical frequency algorithms. For example, what US senators took money from a foreign entity? Turns out that both Barack Obama and Hillary Clinton received donations from UBS AG, based in Switzerland. That is a complex inferencing query that needs to be expressed in a query language before it can be answered, and so questions of this nature are outside of the reach of any search engine -- and Wikipedia too, for that matter." (From ReadWriteWeb)
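Just to show the shape such a question would take, here is a purely hypothetical sketch. Freebase may or may not model campaign donations at all; every type and property name below is invented for illustration:

    {
      "type": "/government/us_senator",
      "name": null,
      "donation_received": [{
        "donor": {
          "name": null,
          "country": "Switzerland"
        }
      }]
    }

The point is that "which senators took money from a Swiss donor?" is just another partial graph to fill in, whereas for a keyword search engine the question is not even expressible.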
