Create a GeoNetwork OGC Harvest Node (WMS or WFS GetCapabilities to ISO 19139 metadata) through an xml.harvesting.add request

GeoNetwork offers a harvest service that, among other things, follows an OGC WMS or WFS GetCapabilities URL and transforms the response into an ISO 19139 metadata record. Bear in mind that the resulting ISO metadata record is only as good as the GetCapabilities response, and that the metadata can never be entirely ISO 19139 conformant without dummy values due to limitations of the OGC GetCapabilities schema.

GeoNetwork offers a handy user interface for adding these "GeoNetwork Harvesting Nodes" and also exposes the same functionality through an even handier XML service. See chapter 19.3, "Harvesting Services", of the GeoNetwork opensource V 2.4 The Complete Manual. The following are my notes on creating an OGC harvesting node through GeoNetwork's harvesting service.

I use the "Poster" Firefox add-on to experiment with GeoNetwork XML posts and to read the responses. You have to log in to GeoNetwork as an admin through Firefox for Poster to have the right access privileges.
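
If you would rather script these posts than click through Poster, the following is a minimal Python sketch of the same idea: send an XML body with the application/xml content type and keep the session cookie so that an admin login carries over to later calls. The helper names (build_xml_request, make_session, post_xml) are my own, and the xml.user.login service with admin/admin credentials shown in the trailing comments is an assumption about your installation, so check it against your GeoNetwork version before relying on it.

```python
import http.cookiejar
import urllib.request

# Same base URL as the example URLs in these notes; adjust for your server.
BASE_URL = "http://localhost:8080/geonetwork/srv/en/"


def build_xml_request(service, xml_body, base_url=BASE_URL):
    """Build a POST request that carries an XML body for one GeoNetwork service."""
    return urllib.request.Request(
        base_url + service,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def make_session():
    """Return an opener that stores cookies, so a login persists across calls."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def post_xml(opener, service, xml_body):
    """Post the XML body to the service and return the response text."""
    with opener.open(build_xml_request(service, xml_body)) as response:
        return response.read().decode("utf-8")


# Example use against a running GeoNetwork (not executed here).
# The login service name and the credentials are assumptions:
#
#   session = make_session()
#   post_xml(session, "xml.user.login",
#            "<request><username>admin</username>"
#            "<password>admin</password></request>")
#   print(post_xml(session, "xml.harvesting.get", "<request/>"))
```

Keeping request construction in its own function makes it possible to inspect the URL, headers and body without a server running.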

GeoNetwork offers several XML services to access its exposed functions.

  • xml.info
    • "The xml.info service can be used to query the site about its configuration, services, status and so on. For example, it is used by the harvesting web interface to retrieve information about a remote node."
    • Example URL: http://localhost:8080/geonetwork/srv/en/xml.info
    • Content Type: application/xml
    • Post:
      <request>
      <type>site</type>
      <type>groups</type>
      </request>
    • Returns information about the site and its groups
  • Harvesting Services
    • xml.harvesting.get
      • Example URL: http://localhost:8080/geonetwork/srv/en/xml.harvesting.get
      • Content Type: application/xml
      • Post:
        <request/>
      • Returns information on all harvest nodes - this is very useful for reverse-engineering the XML for an xml.harvesting.add request.
    • xml.harvesting.add
      • "Create a new harvesting node. The node can be of any type supported by GeoNetwork (GeoNetwork node, web folder etc...). When a new node is created, its status is set to inactive. A call to the xml.harvesting.start service is required to start harvesting."
      • Example URL: http://localhost:8080/geonetwork/srv/en/xml.harvesting.add
      • Content Type: application/xml
      • Post:
        <node type="ogcwxs">
            <!--  -->
            <site>
                <!-- Name of Harvest Node  -->
                <name>Test 1</name>
                <!-- Login information -->
                <account>
                    <use>false</use>
                    <username/>
                    <password/>
                </account>
                <!-- OGC WMS or WFS GetCapabilities URL - only used in conjunction with <node type="ogcwxs"> -->
                <url>http://mrdata.usgs.gov/cgi-bin/mapserv?map=pr.map&amp;service=WMS&amp;request=GetCapabilities&amp;version=1.1.1</url>
                <!-- Valid Values: {WMS1.1.1, WFS1.0.0, ... } -->
                <ogctype>WMS1.1.1</ogctype>
                <icon>default.gif</icon>
            </site>
            <!-- When and how often to run the harvest service -->
            <options>
                <!-- run every x minutes -->
                <every>90</every>
                <!-- only run once -->
                <oneRunOnly>true</oneRunOnly>
                <lang>eng</lang>
                <topic>geoscientificInformation</topic>
                <createThumbnails>false</createThumbnails>
                <useLayer>false</useLayer>
                <useLayerMd>false</useLayerMd>
                <datasetCategory>1</datasetCategory>
            </options>
            <!-- Give "All" read access  -->
            <privileges>
                <group id="1">
                    <operation name="view"/>
                    <operation name="dynamic"/>
                </group>
            </privileges>
            <categories>
                <category id="3"/>
            </categories>
        </node>
    • xml.harvesting.remove/start/stop/run
      • "These services are put together because they share a common request interface. Their purpose is
        obviously to remove, start, stop or run a harvesting node. In detail:
        1. start: When created, a node is in the inactive state. This operation makes it active, that is the
        countdown is started and the harvesting will be performed at the timeout.
        2. stop: Makes a node inactive. Inactive nodes are never harvested.
        3. run: Just start the harvester now. Used to test the harvesting."
      • xml.harvesting.run
        • Example URL: http://localhost:8080/geonetwork/srv/en/xml.harvesting.run
        • Content Type: application/xml
        • Post:
          <request>
          <id>866</id>
          </request>
        • The response may be "OK", but there can still be problems with duplicate metadata IDs, as happened when I had two Harvest Nodes harvesting the same WMS service.
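
The xml.harvesting.add and xml.harvesting.run request bodies above can also be generated rather than typed by hand. This is a rough Python sketch using the standard xml.etree.ElementTree module, which escapes the ampersands in the GetCapabilities URL for you. The function names build_ogc_node and build_run_request are hypothetical, and the option, privilege and category values are simply copied from my example node, so treat them as placeholders for your own setup.

```python
import xml.etree.ElementTree as ET


def build_ogc_node(name, capabilities_url, ogctype="WMS1.1.1"):
    """Return the XML body for an xml.harvesting.add request (OGC WxS node)."""
    node = ET.Element("node", {"type": "ogcwxs"})

    site = ET.SubElement(node, "site")
    ET.SubElement(site, "name").text = name
    account = ET.SubElement(site, "account")
    ET.SubElement(account, "use").text = "false"
    ET.SubElement(account, "username")
    ET.SubElement(account, "password")
    # Pass the plain URL; "&" is written out as "&amp;" on serialization.
    ET.SubElement(site, "url").text = capabilities_url
    ET.SubElement(site, "ogctype").text = ogctype
    ET.SubElement(site, "icon").text = "default.gif"

    # Option values copied from the example node in these notes.
    options = ET.SubElement(node, "options")
    for tag, value in [
        ("every", "90"),
        ("oneRunOnly", "true"),
        ("lang", "eng"),
        ("topic", "geoscientificInformation"),
        ("createThumbnails", "false"),
        ("useLayer", "false"),
        ("useLayerMd", "false"),
        ("datasetCategory", "1"),
    ]:
        ET.SubElement(options, tag).text = value

    # Group 1 with "view" and "dynamic", as in the example node.
    privileges = ET.SubElement(node, "privileges")
    group = ET.SubElement(privileges, "group", {"id": "1"})
    ET.SubElement(group, "operation", {"name": "view"})
    ET.SubElement(group, "operation", {"name": "dynamic"})

    categories = ET.SubElement(node, "categories")
    ET.SubElement(categories, "category", {"id": "3"})

    return ET.tostring(node, encoding="unicode")


def build_run_request(node_id):
    """Return the XML body shared by the remove/start/stop/run services."""
    request = ET.Element("request")
    ET.SubElement(request, "id").text = str(node_id)
    return ET.tostring(request, encoding="unicode")
```

Post the string returned by build_ogc_node to xml.harvesting.add, and the string returned by build_run_request to xml.harvesting.run, with the Content Type set to application/xml as in the examples above.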

See Converting FGDC XML metadata records for WMS and WFS services to ISO 19139 via Python and GeoNetwork for a Python automation script.