Harvesting metadata from catalogs

The ability to 'harvest', or transfer, collections of metadata records between catalogs to keep them synchronized is a desirable operation in a federated catalog system. The major reasons are performance and reliability. Although the OGC Catalog Service for the Web (CSW) specification includes provisions for federated catalogs to propagate requests to other servers, most people we have talked to who have actually implemented such functionality report that it is too slow and unreliable. If one of the servers to which a request is cascaded is not functioning, it can freeze the entire process; the response is only as fast as the slowest server; and the client must work out how to identify duplicate result records. Harvesting and caching metadata records allows individual metadata registries to specialize in particular kinds of content, and to index records and create stored views that optimize performance for the records held in that registry.
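As a rough illustration of the harvest-and-cache idea, the sketch below merges records harvested from several catalogs into one local cache and resolves duplicates by keeping the most recently modified copy. The `Record` fields and the `merge_into_cache` function are hypothetical names chosen for this example, and the sketch assumes that each record carries a globally unique identifier and an ISO 8601 datestamp; real catalogs may need a more elaborate duplicate test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str  # assumed globally unique record ID
    datestamp: str   # ISO 8601 date of last modification
    source: str      # catalog the record was harvested from
    title: str


def merge_into_cache(cache: dict[str, Record], harvested: list[Record]) -> int:
    """Add harvested records to the cache, keeping the newest copy of each ID.

    Returns the number of records that were added or updated.
    """
    changed = 0
    for rec in harvested:
        current = cache.get(rec.identifier)
        # ISO 8601 dates in the same format compare correctly as strings.
        if current is None or rec.datestamp > current.datestamp:
            cache[rec.identifier] = rec
            changed += 1
    return changed
```

Because duplicates are resolved once, at harvest time, a search against the cache does not depend on any remote server being available and never has to reconcile duplicate results on the fly.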

Discussions with developers of Stratigraphy.net indicate that there are problems with using the CSW metadata harvesting operations in the context of geoscience metadata resources. They recommend the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) for harvesting services. They report that, based on their experience and that of others, a distributed metadata catalog architecture built on a collection of metadata providers and portal servers that harvest and cache metadata records is a more viable design than a real-time distributed query system.
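To make the harvesting side concrete, here is a minimal sketch of an OAI-PMH harvesting loop using only the Python standard library. It issues a `ListRecords` request and follows `resumptionToken` values until the provider signals that the list is complete. The `harvest` and `fetch_xml` names, the `oai_dc` default, and any base URL passed in are assumptions made for this example; this is not the implementation used by Stratigraphy.net or any particular portal, and it omits error handling and incremental (`from`/`until`) harvesting.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace of OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_xml(url):
    """Default fetcher: HTTP GET returning the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=fetch_xml):
    """Yield (identifier, datestamp) for every record header the provider lists."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        for header in root.iter(OAI_NS + "header"):
            yield (header.findtext(OAI_NS + "identifier"),
                   header.findtext(OAI_NS + "datestamp"))
        token = root.find(".//" + OAI_NS + "resumptionToken")
        # A missing or empty token means the list is complete.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

The `fetch` parameter is there so the loop can be exercised against canned responses; a portal would typically run such a loop on a schedule and write the results into its local cache.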
