The ability to 'harvest', or transfer collections of metadata records between catalogs to keep them synchronized, is a desirable operation in a federated catalog system. The principal reasons are performance and reliability. Although the OGC catalog service specification (CSW) includes provisions for federated catalogs to propagate requests to other servers, most implementers we have talked to who have actually built such functionality report that performance is slow and unreliable: if one of the servers to which a request is cascaded is not functioning, it can stall the entire process; the response is only as fast as the slowest server; and the client must determine how to identify duplicate result records. Harvesting and caching metadata records allows individual metadata registries to specialize in particular kinds of content, and to index records and create stored views that optimize performance for the records held in that registry.
Discussions with developers of Stratigraphy.net indicate that there are problems using the CSW metadata harvesting operations in the context of geoscience metadata resources. They recommend use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) for harvesting services. They report that, based on their experience and that of others, a distributed metadata catalog architecture based on a collection of metadata providers and portal servers that harvest and cache metadata records is a more viable design than a real-time distributed query system.
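To illustrate the harvest-and-cache pattern, the following is a minimal sketch of how a portal server might pull records from a metadata provider using the OAI-PMH ListRecords verb, following resumption tokens to page through the full result set. The endpoint URL, the from-date, and the choice of the oai_dc metadata prefix are illustrative assumptions, not details of any particular provider such as Stratigraphy.net.

```python
"""Minimal OAI-PMH harvesting sketch (endpoint and dates are hypothetical)."""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="oai_dc", from_date=None):
    """Yield <record> elements from OAI-PMH ListRecords responses,
    following resumptionTokens until the provider signals completion."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # incremental harvest since this date
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        for record in root.iter(OAI_NS + "record"):
            yield record
        token = root.find(OAI_NS + "ListRecords/" + OAI_NS + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no more pages
        # per the protocol, follow-up requests carry only the verb and the token
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

if __name__ == "__main__":
    # Hypothetical provider endpoint; a real portal would also cache the
    # harvested records locally and index them for its stored views.
    for rec in harvest("https://example.org/oai", from_date="2011-01-01"):
        header = rec.find(OAI_NS + "header")
        print(header.findtext(OAI_NS + "identifier"))
```

Because the portal harvests on its own schedule and serves queries from its local cache, a slow or unavailable provider delays only the next harvest cycle rather than every end-user search, which is the reliability advantage over real-time cascaded queries noted above.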