We can all agree that search engines like Google have really defined the way that we search for information on the internet. So the question is, why aren't we using Google to search for our geologic datasets? Do we have to do all that formal, structured metadata? What the hell is CSW anyways?
The Node.js application behind metadata.usgin.org acts as a proxy server. You ask it for http://metadata.usgin.org/record/a386a4ba-e892-11e0-9e4a-0024e880c1d2, and it reads the file identifier out of that URL and issues a CSW GetRecordById request to a CSW server. The response to that CSW request is an XML document, and nobody (including Google) really cares about that, so the server formats the record as a very simple HTML page.
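To make the proxy step concrete, here is a rough sketch of how a path like /record/<id> could be turned into a CSW GetRecordById request using the standard key-value-pair encoding from CSW 2.0.2. The endpoint URL and the function names are made up for illustration; the post does not say where the real CSW server lives or how the application is organized.

```javascript
// Hypothetical CSW endpoint: the real server address is not given in the post.
const CSW_ENDPOINT = "http://example.org/geoportal/csw";

// Pull the file identifier out of a path such as /record/<id>.
// Returns null when the path does not look like a record URL.
function fileIdFromPath(pathname) {
  const match = /^\/record\/([^\/?#]+)\/?$/.exec(pathname);
  return match ? decodeURIComponent(match[1]) : null;
}

// Build a CSW 2.0.2 GetRecordById request (KVP encoding) for that identifier.
function getRecordByIdUrl(fileId, endpoint = CSW_ENDPOINT) {
  const url = new URL(endpoint);
  url.search = new URLSearchParams({
    service: "CSW",
    version: "2.0.2",
    request: "GetRecordById",
    id: fileId,
    elementSetName: "full",
  }).toString();
  return url.toString();
}
```

Calling fileIdFromPath on the example URL's path and passing the result to getRecordByIdUrl gives the address the proxy would fetch before reformatting the XML response.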
So, for any record in our CSW catalog, you can make the request to metadata.usgin.org and it will show you something nicer than XML. But still, if you want XML, you can tack ?f=xml onto the end of the URL and you'll get your XML. You can get JSON too: ?f=json.
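The format switch could be handled with something like the following sketch. It reads the f query parameter and maps it to a response format and content type. Falling back to HTML for a missing or unrecognized value is an assumption, as are the function name and the exact content types; the post only says that ?f=xml and ?f=json work.

```javascript
// Formats the proxy can return, with the content type for each.
const CONTENT_TYPES = {
  html: "text/html",
  xml: "application/xml",
  json: "application/json",
};

// Decide the output format from the request URL's ?f= parameter.
// Assumption: anything missing or unrecognized falls back to HTML.
function pickFormat(requestUrl) {
  const parsed = new URL(requestUrl, "http://metadata.usgin.org");
  const requested = (parsed.searchParams.get("f") || "html").toLowerCase();
  const known = Object.prototype.hasOwnProperty.call(CONTENT_TYPES, requested);
  const format = known ? requested : "html";
  return { format: format, contentType: CONTENT_TYPES[format] };
}
```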
But what we're really experimenting with here are Google search results.
For starters, the Node.js application automatically generates a sitemap.xml file for all the metadata records in the CSW catalog. Sitemaps help search-engine robots know what URLs on your site should be indexed. Right now, building the sitemap.xml file is a tightly coupled process: it only works if an ESRI Geoportal is providing the CSW service. When a search-engine crawler bot goes to http://metadata.usgin.org/sitemap.xml (don't go there! you don't want to look at it!), Node.js queries the Geoportal database to find all the metadata file identifiers that are out there, and then automatically generates the sitemap.xml file from that information.
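The generation step itself is simple once you have the identifiers. The sketch below takes a plain array of file identifiers and writes out a sitemap in the sitemaps.org format. How the identifiers come out of the Geoportal database is not shown in the post, so that part is left out, and the function names here are hypothetical.

```javascript
// Escape the characters that are special in XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Turn a list of metadata file identifiers into a sitemap.xml string.
// baseUrl follows the /record/<id> pattern used by the proxy.
function buildSitemap(fileIds, baseUrl = "http://metadata.usgin.org/record/") {
  const entries = fileIds.map(function (id) {
    const loc = escapeXml(baseUrl + encodeURIComponent(id));
    return "  <url><loc>" + loc + "</loc></url>";
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

One thing to keep in mind: the sitemap protocol caps a single file at 50,000 URLs, so a large catalog would need to split the list across several files and a sitemap index.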
Maybe more interestingly, this is an opportunity to experiment with microdata, which is basically some additional information to plug into your HTML pages that helps search engine robots understand better what your page is about. To some extent, the microdata on your page can influence the way that your search results appear in Google. Because this environment gives me really tight and simple control over the HTML content, it is an ideal way to play with microdata and learn how it influences search results.
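As a rough idea of what microdata looks like in practice, the sketch below renders a record as HTML with itemscope, itemtype, and itemprop attributes. The schema.org Dataset type, the record fields (title, description, url), and the function names are all assumptions for the sake of the example; the post does not say which vocabulary or properties the real pages use.

```javascript
// Escape text before placing it in HTML content or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a metadata record as a small HTML fragment annotated with microdata.
// Assumption: record has title, description, and url fields.
function renderRecord(record) {
  return [
    '<div itemscope itemtype="http://schema.org/Dataset">',
    '  <h1 itemprop="name">' + escapeHtml(record.title) + "</h1>",
    '  <p itemprop="description">' + escapeHtml(record.description) + "</p>",
    '  <a itemprop="url" href="' + escapeHtml(record.url) + '">Permanent link</a>',
    "</div>",
  ].join("\n");
}
```

The itemprop attributes are what a crawler reads: they label which piece of the page is the dataset's name, which is its description, and which is its canonical URL.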
What is the end-game? Well, for starters, I would like to be able to search the metadata catalog by going to google.com and searching for things like "site:metadata.usgin.org arizona thermal springs", or "site:metadata.usgin.org long valley caldera". Later, we may be able to implement a Custom Search Engine, or maybe even find out if we can use Google to provide structured responses to search queries...