Web 3.0: Moving from Documents to Information

Almost everyone has now heard of Web 2.0…and is probably tired of hearing about it. The phrase gets tossed around a lot and has largely been turned into a marketing gimmick. Many products add the Web 2.0 descriptor simply to appear cutting edge.

For all the abuse of this descriptor, Web 2.0 really did signify a meaningful transition. It wasn’t sudden, but gradual. Web 2.0 represents the shift from static brochure web sites to interactive web sites that facilitate participation & information sharing.

Technologies such as forums, blogs, profiles, comments, wikis and user reviews are all products of the Web 2.0 era. Web 2.0 also birthed many highly visible companies such as MySpace, Facebook, Wikipedia and YouTube. All of these technologies & companies enable people to participate online.

The web is just a bunch of documents

Despite all these innovations, the web remains a very document-centric platform. The web is simply a huge collection of interlinked & interactive documents. These documents are designed for humans and are consequently terribly difficult for computers to understand. For example:

Is Fargo a town or a movie?  Is Paris Hilton a person or a hotel in Paris?

This is only part of the problem though.  The bigger problem is linking this data with other sources to create a web of information.  Consider the following task:

Find movies playing in your area, read the reviews, choose a movie, purchase tickets, get directions and then find a good restaurant along the way.

The combination of these tasks represents a very difficult & involved process.  All of this data exists in individual information silos, but correlating this data to create relevant information remains a task only a human can perform.

Welcome to Web 3.0

If others are to be believed, we now stand on the edge of a new evolution. This next evolution claims to be as meaningful as Web 2.0 and thus merits another symbolic version number. Web 3.0 has been branded the Semantic Web.

The goal of the Semantic Web is to help computers understand the meaning of information. Once computers understand the meaning of information, diverse sets of information (your location + movies in your area + movie genres you enjoy + movie reviewers you trust) can be cross-referenced to create highly relevant information. That’s the dream anyway.

As with Web 2.0, I suspect the transition towards Web 3.0 will be slow & gradual. There will be a steady stream of incremental adopters & technologies that will eventually culminate in a massive evolution for the web.

Or maybe not… There are several problems that I haven’t touched on. The biggest is advertising: how does it get injected into a web of information? We all hate advertising, but advertising fuels the current web. How do content creators benefit from being a source of raw semantic data?

Adding Semantics to Web Documents

Ready for the revolution? Great, let’s get started. Here is a quick example of how to add semantic meaning to content.

<div>Adding Semantics to Web Documents</div>

This content can be made semantically rich by changing it to the following:

<h1>Adding Semantics to Web Documents</h1>

Those <h1> tags tell Google (and other computers) that this is a very important heading.  In other words, <h1> helps Google understand the meaning of the content.
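To make that concrete, here is a minimal Python sketch of how a program might pick out headings. It uses only the standard library's html.parser module. The HeadingFinder class and the find_headings function are names invented for this illustration, and real search engines are certainly far more sophisticated than this.

```python
from html.parser import HTMLParser


class HeadingFinder(HTMLParser):
    """Collects the text of every <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False   # True while we are between <h1> and </h1>
        self.headings = []   # text of each <h1> seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        # Only text inside an <h1> is recorded; everything else is ignored.
        if self.in_h1:
            self.headings[-1] += data


def find_headings(html):
    """Hypothetical helper: return the text of all <h1> elements."""
    finder = HeadingFinder()
    finder.feed(html)
    finder.close()
    return [heading.strip() for heading in finder.headings]


plain = "<div>Adding Semantics to Web Documents</div>"
semantic = "<h1>Adding Semantics to Web Documents</h1>"

print(find_headings(plain))     # []
print(find_headings(semantic))  # ['Adding Semantics to Web Documents']
```

The <div> version gives the program nothing to work with, while the <h1> version hands it the heading directly.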

“That’s it?  I’m already doing this.”

I would certainly hope so. The <h1> tag has been around forever, just as <ul> and <ol> have long described lists of information. A lot of information can be made semantically rich through clean & valid HTML. However, HTML does have its limitations. For example, there isn’t an HTML tag to designate a person, an event or a place.

Microformats and RDF are potential solutions for describing content when HTML isn’t sufficiently semantically rich. Microformats are very easy to implement, while RDF appears to be intended for genetically enhanced super geniuses (which I clearly am not).

Here is an example of a calendar event:

<div>The microformats.org site was launched on 2005-06-20 at the Supernova Conference in San Francisco, CA, USA.</div>

Here is this same example made semantically rich using the hCalendar Microformat:

<div class="vevent"><span class="summary">The microformats.org site was launched</span> on <span class="dtstart">2005-06-20</span> at the Supernova Conference in <span class="location">San Francisco, CA, USA</span>.</div>

The only difference is that predefined class names have been applied to add semantics to specific pieces of information. This extra information enables Yahoo! Upcoming, Google Calendar and other services to easily identify events on your web site.
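As a rough illustration of what a consuming service might do with those class names, here is a minimal Python sketch that reads the hCalendar markup above. This is not a description of how Yahoo! Upcoming or Google Calendar actually work. HCalendarParser, find_events and PROPERTIES are names made up for this example, and the sketch makes simplifying assumptions: it only knows the three properties used here and reads each value from the element's text, whereas the full hCalendar format defines more properties and rules.

```python
from html.parser import HTMLParser

# Assumption: only the three hCalendar properties used in the example.
PROPERTIES = {"summary", "dtstart", "location"}


class HCalendarParser(HTMLParser):
    """Collects one dict per element marked with class="vevent"."""

    def __init__(self):
        super().__init__()
        self.events = []  # one dict of property -> text per event
        self.stack = []   # (tag, role) for each open element

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        inside_event = any(role == "vevent" for _, role in self.stack)
        if "vevent" in classes:
            self.events.append({})
            role = "vevent"
        elif inside_event:
            # Use a known property class if the element carries one.
            role = next(iter(classes & PROPERTIES), None)
        else:
            role = None
        self.stack.append((tag, role))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag. This also discards
        # void elements such as <br> that never receive an end tag.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        # Attribute text to the nearest enclosing property, if any.
        for _, role in reversed(self.stack):
            if role == "vevent":
                return  # plain text inside the event, not a property
            if role is not None:
                event = self.events[-1]
                event[role] = event.get(role, "") + data
                return


def find_events(html):
    """Hypothetical helper: return the events found in an HTML string."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.events


untagged = (
    "<div>The microformats.org site was launched on 2005-06-20 "
    "at the Supernova Conference in San Francisco, CA, USA.</div>"
)
tagged = (
    '<div class="vevent">'
    '<span class="summary">The microformats.org site was launched</span>'
    ' on <span class="dtstart">2005-06-20</span>'
    " at the Supernova Conference in "
    '<span class="location">San Francisco, CA, USA</span>.</div>'
)

print(find_events(untagged))  # no events can be identified
for event in find_events(tagged):
    for name, value in event.items():
        print(name, "=", value)
```

The sentence a human reads is identical in both versions. Only the tagged one lets a program pull out the summary, start date and location without guessing.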

In my next blog post I’ll demonstrate how to configure Sitefinity’s Events module to export hCalendar Events.

Comments  4

  • Anton 08 Feb

    Great post, Gabe! Though, I think that web 3.0 will become the status quo much faster than web 2.0 did.
  • David 08 Feb

    Hi Gabe,

    I guess Web 3.0 has been around a while, with Microformats and semantics. Just like 2.0 and Ajax were around before they were labeled 2.0 and Ajax. Looking forward to the next post, already have a use for it.

    Thanks,
    David
  • Neil 09 Feb

    Hi Gabe,

    Those who have IIS7 can install the iis.net extension Search Engine Optimization Toolkit, which also highlights a missing <h1> as a violation. I was surprised how many existing sites out there fail to implement these semantics when analyzed.

    Regards, Neil
  • Gabe Sumner 09 Feb

    Hey Neil,

    Thanks for the tip!

    I'm personally not surprised that lots of web sites contain bad HTML.  I'm not surprised because most web content is created by people who don't know HTML.  Nor should they need to learn it...

    It's unrealistic for each web content creator to learn HTML.  Content creators shouldn't have to worry about it.  If they are creating bad HTML then it's a failure of their CMS and/or the developers who implemented the CMS. 

    For the <h1> example, the CMS should have a Title field (or widget).  Content editors don't need to worry about the HTML, they merely need to type the title.  The CMS will ensure that the correct HTML (and semantics) get generated.

    In the end, the only way any of these technologies matter is if we can put them in the hands of non-programmers.