Google Sitemaps and You
RSS (Really Simple Syndication)
is the current heavyweight of so-called "disruptive technologies"
(loosely defined as those that have the effect, if not developed
with the intention, of changing the way we use technology in general)
and its use is skyrocketing among content providers looking for
a way to get their content in front of more eyes and ears. But RSS
originally stood for Rich Site Summary, a standard way of cataloging
your site's content for third-party aggregators.
Google Sitemaps have a similar function, in that they are an XML-based
way to describe website content in a standard, predictable way;
but they differ in that Sitemaps are intended for the Googlebot's
eyes only, rather than for any third party. Think of them as an
automated way to make sure Google knows about your site's content
(please note, however, that Google does not guarantee inclusion
of your content based solely on the presence of a Sitemap file).
This sounds like a very specific undertaking, but the importance
of Google to getting your site's content noticed simply cannot
be overstated. And with Google's expanding reach into more and more
areas of Web content presentation, chances are that the information
your Sitemap provides will eventually find
some use you haven't yet thought about. That's what disruptive technology
is all about, and Google has become one of the more innovative champions
of such technological advances.
Where To Start
The first thing you should do as a website developer is create a
Google Account for yourself or your company. This will allow you
to do other things besides access the Sitemaps infrastructure; but
we'll leave that for another day. Create the account on Google's site and
then proceed to the Sitemaps area. Once you've logged
in, you'll see the sparse Sitemaps interface. Don't be fooled, however,
because like the simple interface to its search engine, this one
hides quite a bit of information regarding the creation and use
of Sitemaps, presenting it in digestible bites as you walk through
the process.
There's probably more there than you need to know at this point,
provided you don't have a huge site with a need for multiple Sitemaps
and so on. But if you do have such a site, the information is there
for creating truly complex Sitemaps and Sitemap Indices referencing
many Sitemaps, and you can familiarize yourself with that as needed.
For now, we'll concentrate on what's required to establish a Sitemap
for our site at Cafe ID.
Like creating RSS feeds, creating a Google Sitemap is as simple
as putting together an XML file at the root level of your site that
describes the site according to the instructions that Google has
laid out. You can use any text editor for this purpose, but some
editors do a better job of helping you create properly formatted
XML files. We heartily recommend two that cost money: BBEdit
on Mac OS X and Macromedia's HomeSite on Windows. There are
excellent free alternatives out there, though, and when it comes to
text editors, personal preferences take on an almost religious importance,
so we won't proselytize about that.
The Googlebot recognizes several Sitemap formats, ranging from a
simple list of URLs to Sitemaps already created using something
called the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH),
a format apparently popular with library collections. The OAI protocol
is an advanced XML specification that you don't need to worry about
if you don't already understand it. An intermediate XML format is what
we recommend, over the simple URL list, because of the additional
information you can associate with each constituent URL of your
site.
If you do want to just get started quickly, simply create a text
file that looks like this:
http://www.example.com/catalog?item=1
http://www.example.com/catalog?item=11 ...
Make sure that each URL sits on its own line with no embedded
newline characters, and that the file
uses the UTF-8 text encoding (check your text editor settings).
Also, your Sitemap may not contain more than 50,000 URLs, and all
URLs must be fully formed since they will be used directly during
the Googlebot's crawl.
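The rules above (one URL per line, UTF-8, at most 50,000 fully formed URLs) can be checked mechanically before you upload anything. The following is only a minimal sketch: the function names and the test for "fully formed" (an http or https scheme plus a host, and no whitespace) are our own assumptions, not anything Google specifies.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000  # limit stated in the article


def is_fully_formed(url: str) -> bool:
    """Assumed check: http(s) scheme, a host, and no whitespace anywhere."""
    if any(ch.isspace() for ch in url):
        return False
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def build_text_sitemap(urls: list[str]) -> str:
    """Return the body of a plain-text Sitemap: one URL per line."""
    if len(urls) > MAX_URLS:
        raise ValueError(f"a Sitemap may list at most {MAX_URLS} URLs")
    for url in urls:
        if not is_fully_formed(url):
            raise ValueError(f"not a fully formed URL: {url!r}")
    return "\n".join(urls) + "\n"


def write_text_sitemap(urls: list[str], path: str) -> None:
    """Write the Sitemap to `path` (a hypothetical file name) as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(build_text_sitemap(urls))
```

Calling write_text_sitemap(urls, "sitemap.txt") would then produce a file ready to upload; the file name is only an example.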
Getting Fancy
The more advanced format isn't much more difficult to create and
lets you specify additional information about each URL. The protocol
is described fully at Google and is too detailed to explain here.
Your finished file will look something like this, except (hopefully)
with more URLs specified:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
<url>
<loc>http://www.cafeid.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.cafeid.com/art-over.shtml</loc>
<changefreq>weekly</changefreq>
</url>
</urlset>
Your Sitemap's location dictates what URLs can be included in it.
A Sitemap placed at the root level of your site can specify any
URLs on that site, while a Sitemap placed at www.yoursite.com/images
cannot include URLs under www.yoursite.com/banners, for example.
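To make the location rule concrete, here is a rough sketch of a scope check. It assumes that "in scope" means the same scheme, the same host, and a path that begins with the directory holding the Sitemap file. The function names and the sitemap.xml file name are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def sitemap_scope(sitemap_url: str) -> tuple[str, str, str]:
    """Return (scheme, host, directory) for the Sitemap's location."""
    parts = urlsplit(sitemap_url)
    directory = posixpath.dirname(parts.path)
    if not directory.endswith("/"):
        directory += "/"
    return parts.scheme, parts.netloc, directory


def in_scope(sitemap_url: str, page_url: str) -> bool:
    """True if page_url lies at or below the Sitemap's directory."""
    scheme, host, directory = sitemap_scope(sitemap_url)
    page = urlsplit(page_url)
    page_path = page.path or "/"  # treat a bare host as the root path
    return (page.scheme, page.netloc) == (scheme, host) and page_path.startswith(directory)


# A Sitemap under /images may not list a URL under /banners.
print(in_scope("http://www.yoursite.com/images/sitemap.xml",
               "http://www.yoursite.com/banners/ad.gif"))
```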
You can take as much or as little advantage as you like
of the additional XML tags available in this format. Each
<url> needs to include at least the <loc> specification,
but need not include the other three, and all URLs in a Sitemap
file must be enclosed within the <urlset> tag. We recommend
using at least the <lastmod> tag and the <changefreq>
tag to let the Googlebot know how often it should check your
site for updated content. Be sure to change the date, and maybe
even the time, specified in the <lastmod> tag any time you
actually update your site.
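If you would rather generate the file than type it by hand, the structure described above maps directly onto a few lines of script. This is a sketch only, using Python's standard xml.etree.ElementTree module; the build_sitemap name and the dictionary layout of each entry are our own assumptions, while the namespace is copied from the example file shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the example Sitemap in this article.
NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries: list[dict]) -> bytes:
    """Build a Sitemap from dicts that hold 'loc' plus any optional tags."""
    urlset = ET.Element("urlset", {"xmlns": NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # the only required tag
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:  # optional tags are written only when supplied
                ET.SubElement(url, tag).text = str(entry[tag])
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap = build_sitemap([
    {"loc": "http://www.cafeid.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.cafeid.com/art-over.shtml", "changefreq": "weekly"},
])
print(sitemap.decode("utf-8"))
```

ElementTree escapes special characters in element text as it serializes, so URLs passed in this way should be given in their raw, unencoded form.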
One more caveat is that your URL specifications must be XML-encoded,
similarly to the way they're encoded under RSS. What this means
is spelled out in detail at W3.org, but essentially, what you're doing
is converting a URL like
http://www.yoursite.com/view?widget=3&count>2
to look like this:
http://www.yoursite.com/view?widget=3&amp;count&gt;2
(Note the substitution of the entities &amp; and &gt; for
the "&" and ">" symbols.)
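If you are assembling the XML by hand or with simple string handling, you can let a library do the entity substitution for you. As a small illustration, Python's standard xml.sax.saxutils module provides an escape function that replaces "&", "<" and ">" with their entities. The URL below is the same example used above.

```python
from xml.sax.saxutils import escape, unescape

raw = "http://www.yoursite.com/view?widget=3&count>2"

# escape() replaces "&", "<" and ">" with their XML entities.
encoded = escape(raw)
print(encoded)

# unescape() reverses the substitution, which is handy for checking.
print(unescape(encoded) == raw)
```

Take care not to escape a URL twice; an already encoded "&amp;" would otherwise become "&amp;amp;".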
Done. Now What Do I Do With It?
You're almost home. Upload the Sitemap file you created to your server
and then submit the URL of the file itself through your Google Sitemaps
account. You don't need to use the account, but doing so will allow
you to keep track of what you've uploaded. You're welcome to compress
your Sitemap file using gzip, found typically on Mac OS X, Linux
and BSD (normal PC zipping won't work, although you can certainly
find a third-party gzip program for your Windows box). Click
the "Add Your First Sitemap" link on the main Sitemaps
page after you've logged into your Google Sitemaps account, and
that's all there is to it!
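If you don't have the gzip command handy, which may be the case on a Windows box, a scripting language can produce the same format. The following sketch uses Python's standard gzip module; the gzip_file name and the file names in the usage line are hypothetical.

```python
import gzip
import shutil


def gzip_file(src_path: str, dest_path: str) -> None:
    """Compress src_path into a gzip file at dest_path."""
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)  # stream the bytes through the compressor
```

For example, gzip_file("sitemap.xml", "sitemap.xml.gz") would leave the original file in place and write the compressed copy beside it.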
You can use your Sitemaps account to keep track of and receive diagnostic
information about your Sitemap submissions. You don't need to create
a separate Sitemaps account, however: if you already have a Google account
for receiving Alerts, for accessing the Web Developer APIs and so
on, your existing account will work as a Sitemaps account automatically.
Google has already played a significant role in shifting the paradigm
for discovering the Web from following links to searching,
and the company shows no signs of slowing down.
Subscribing may well be the next paradigm, based on the flexibility
of the protocols that put content syndication in the hands of mere
mortals, and getting your content cataloged in these formats should
be among your first priorities. Web browsers and operating systems
are adjusting quickly to this new paradigm, and you should be too.