ETL Debug Blog
Post to this group blog about your Extract-Transform-Load (ETL) efforts.
I am currently using Pylint to check my Python scripts for bugs and formatting errors. I followed Brandon Corfman's nice Komodo Hacks: Integrating Pylint tutorial to set up Pylint and integrate it with the Komodo Edit IDE (see Easy setup of a Python 2.6 development environment on Windows for setting up Python).
Here is my updated Pylint execution macro for Komodo Edit 5, which includes a verbose and an errors-only reporting command.
Converting FGDC XML metadata records for WMS and WFS services to ISO 19139 via Python and GeoNetwork
This post describes a series of Python scripts to convert FGDC XML metadata for WMS and WFS services to ISO 19139 service metadata via GeoNetwork's OGC harvest service. Here are the steps:
- Extract WMS and WFS GetCapabilities URLs from FGDC XML metadata and write them to a file.
- Create a GeoNetwork Harvest Node from the extracted GetCapabilities URLs and let GeoNetwork's OGC WMS/WFS harvester create ISO 19139 metadata from the GetCapabilities response. Note: the resulting ISO 19139 metadata is only as good as the metadata in the GetCapabilities response. In addition, GetCapabilities responses do not include all fields necessary for minimum ISO 19139 metadata.
- Copy some FGDC metadata entries (Title, Abstract, etc.) to newly created ISO 19139 metadata.
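The first step above can be sketched roughly as follows. This is a minimal illustration, not the original script: it assumes the GetCapabilities URLs sit in FGDC `<onlink>` elements and can be recognized by the text "GetCapabilities" in the URL. The function names and the sample record are made up for the example, and it is written for Python 3 rather than the Python 2.6 used in these posts.

```python
import xml.etree.ElementTree as ET


def extract_getcapabilities_urls(fgdc_xml):
    """Return <onlink> URLs that look like WMS/WFS GetCapabilities requests."""
    root = ET.fromstring(fgdc_xml)
    urls = []
    # Assumption: the service URLs are stored in FGDC <onlink> elements.
    for onlink in root.iter("onlink"):
        url = (onlink.text or "").strip()
        if "getcapabilities" in url.lower():
            urls.append(url)
    return urls


def write_urls(urls, path):
    """Write one URL per line so a later step can read the file back."""
    with open(path, "w") as handle:
        handle.write("\n".join(urls) + "\n")


# Hypothetical sample record, trimmed to the elements used here.
SAMPLE = """<metadata>
  <idinfo><citation><citeinfo>
    <title>Example geology service</title>
    <onlink>http://example.org/wms?SERVICE=WMS&amp;REQUEST=GetCapabilities</onlink>
    <onlink>http://example.org/data.zip</onlink>
  </citeinfo></citation></idinfo>
</metadata>"""

found = extract_getcapabilities_urls(SAMPLE)
```

Matching on the URL text is a crude filter. Records that store a bare service endpoint without the request parameter would need a different rule.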
Our goal is to have working WMS and WFS metadata records in our CSW catalog that can be used to add the described services to analytical software, for example through ESRI's CSW Client for ArcGIS Desktop. Following are the reasons why I am currently choosing this convoluted process instead of a direct FGDC to ISO 19139 metadata conversion:
Create a GeoNetwork OGC Harvest Node (WMS or WFS GetCapabilities to ISO 19139 metadata) through xml.harvesting.add request
GeoNetwork offers a harvest service that, among other things, follows an OGC WMS or WFS GetCapabilities URL and transforms the response into an ISO 19139 metadata record. Bear in mind that the resulting ISO metadata record is only as good as the GetCapabilities response and that the metadata can never be entirely ISO 19139 conformant without dummy values due to limitations of the OGC GetCapabilities schema.
GeoNetwork offers a handy user interface to add those "GeoNetwork Harvesting Nodes" and also exposes their functionality through an even handier XML service. See chapter 19.3 on "Harvesting Services" of the GeoNetwork opensource V 2.4 The Complete Manual. Following are my notes on creating an OGC harvesting node through GeoNetwork's harvesting service.
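As a rough idea of what an xml.harvesting.add request body might look like, here is a small Python 3 sketch that builds the node XML. The element names (`node`, `site`, `name`, `url`, `ogctype`) and the `ogcwxs` type value are written from memory of the manual's harvesting chapter and should be treated as assumptions. Check them against the manual before sending anything to a real server. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_ogc_harvest_node(name, capabilities_url, ogctype="WMS1.1.1"):
    """Build a request body for xml.harvesting.add (element names assumed)."""
    # Assumption: an empty id asks GeoNetwork to create a new node and
    # type="ogcwxs" selects the OGC WMS/WFS harvester.
    node = ET.Element("node", {"id": "", "type": "ogcwxs"})
    site = ET.SubElement(node, "site")
    ET.SubElement(site, "name").text = name
    ET.SubElement(site, "url").text = capabilities_url
    ET.SubElement(site, "ogctype").text = ogctype
    return ET.tostring(node, encoding="unicode")


request_body = build_ogc_harvest_node(
    "Example WMS", "http://example.org/wms?SERVICE=WMS&REQUEST=GetCapabilities"
)
```

Building the XML with ElementTree rather than string formatting means the ampersands in the GetCapabilities URL are escaped automatically.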
Following are the authoritative FGDC Content Standard for Digital Geospatial Metadata (CSDGM) XML schema document (XSD) and document type definition (DTD) URLs for FGDC-STD-001-1998. These documents are used to validate an FGDC XML metadata file.
- http://www.fgdc.gov/metadata/fgdc-std-001-1998.dtd - Usage:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE metadata SYSTEM "http://www.fgdc.gov/metadata/fgdc-std-001-1998.dtd">
- http://www.fgdc.gov/metadata/fgdc-std-001-1998.xsd - Usage:
There are also annotated and modified FGDC-STD-001-1998 schemas available for download:
I am working on transforming FGDC XML metadata records to the USGIN 1.1 version of ISO 19139 metadata and am having a hard time finding formal FGDC schemas, namespaces, and schema locations. This is what I have found so far:
This is an example script showing how to do XSLT transformations in Python 2.6 through the Gnome libxml XML C parser and toolkit. The relatively fast C library is available on multiple platforms, but you will also need the Python bindings for libxml2 and libxslt. There is a handy libxml2 and libxslt Python bindings installer for Windows which also includes the C libraries in DLL form.
This is a Python example script for importing GeoNetwork Metadata Exchange Format 1.1 (MEF) archives through GeoNetwork 2.4.2's mef.import service. The mef.import service requires a multipart/form-data POST, which the script sends with a modified MultipartPostHandler.py library (for urllib2) that now supports Unicode. It has been tested on Windows XP with Python 2.6.
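To show what a multipart/form-data POST body consists of, here is a hand-rolled encoder in Python 3 using only the standard library. It is not the modified MultipartPostHandler.py mentioned above, just a sketch of the same idea. The form field names used in the example call (`uuidAction`, `mefFile`) are assumptions about what mef.import expects and should be checked against the GeoNetwork documentation.

```python
import uuid


def encode_multipart(fields, files):
    """Encode form fields and files as a multipart/form-data body.

    fields: dict of name -> text value
    files:  dict of name -> (filename, bytes content)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    delimiter = ("--" + boundary).encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        header = 'Content-Disposition: form-data; name="%s"' % name
        lines.append(header.encode("utf-8"))
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    for name, (filename, content) in files.items():
        lines.append(delimiter)
        header = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
            name, filename)
        lines.append(header.encode("utf-8"))
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    # The closing delimiter carries two extra dashes.
    lines.append(delimiter + b"--")
    lines.append(b"")
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=" + boundary, body


# Field names below are assumptions, not confirmed mef.import parameters.
content_type, body = encode_multipart(
    {"uuidAction": "nothing"},
    {"mefFile": ("record.mef", b"PK\x03\x04")},
)
```

The returned `content_type` goes into the request's Content-Type header and `body` becomes the POST data. Working in bytes throughout is what keeps non-ASCII field values from breaking the encoding.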
This is an example Python script that showcases GeoNetwork authentication, session handling, and CSW transactions.
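The session handling part boils down to keeping the cookie that GeoNetwork returns at login and sending it with every later request. Below is a minimal Python 3 sketch of that pattern. The login service path (`/srv/en/xml.user.login`) and the `username`/`password` parameter names are assumptions based on GeoNetwork 2.x conventions, and the helper names are made up for this example. Nothing is sent over the network here.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(base_url, username, password):
    """Build (but do not send) a login POST; the service path is an assumption."""
    data = urllib.parse.urlencode(
        {"username": username, "password": password}).encode("ascii")
    return urllib.request.Request(
        base_url + "/srv/en/xml.user.login", data=data)


opener, jar = make_session_opener()
login = build_login_request(
    "http://localhost:8080/geonetwork", "admin", "secret")
# opener.open(login) would perform the login. Later CSW transaction requests
# sent through the same opener would then carry the session cookie.
```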
Tested with deegree-csw 2.3pre
Read the XSLT file for more information.
Attached is an example XSLT1 script to transform a WMS GetCapabilities 1.1.1 response to a CSW Insert transaction. The script is based on deegree's wms2iso19119.xsl (http://www.deegree.org/).
Note that currently it only supports WMS version 1.1.1 (<WMT_MS_Capabilities>) responses because it chokes on 1.3.0 (<WMS_Capabilities>) responses.
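Since the stylesheet only handles 1.1.1 responses, it can help to check the root element before running the transformation. Here is a small Python 3 sketch of such a check. The function name is made up for this example, and it only looks at the root element name and its `version` attribute.

```python
import xml.etree.ElementTree as ET


def is_wms_111_capabilities(xml_text):
    """Return True only for a WMS 1.1.1 <WMT_MS_Capabilities> document."""
    root = ET.fromstring(xml_text)
    # Strip a namespace such as {http://www.opengis.net/wms}, which WMS 1.3.0
    # responses put on the root element.
    local_name = root.tag.rsplit("}", 1)[-1]
    return (local_name == "WMT_MS_Capabilities"
            and root.get("version") == "1.1.1")


old_style = '<WMT_MS_Capabilities version="1.1.1"><Service/></WMT_MS_Capabilities>'
new_style = ('<WMS_Capabilities xmlns="http://www.opengis.net/wms" '
             'version="1.3.0"><Service/></WMS_Capabilities>')

supported = is_wms_111_capabilities(old_style)
unsupported = is_wms_111_capabilities(new_style)
```

A caller could skip or log the 1.3.0 responses instead of passing them to the XSLT step.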
Following is a presentation on XML Extract-Transform-Load (ETL) that we gave at the Geoscience Data Preservation Techniques Workshop, Indiana Geological Survey, Bloomington, IN, in July 2009.
Although this presentation was geared towards National Geological and Geophysical Data Preservation Program (NGGDPP) metadata, the same tools can be used for other ETL needs.