Creating a Document Repository in Drupal

We just finished putting together a Drupal-based Document Repository site (http://repository.usgin.org) as a testbed and as an alternative to other repository systems such as DSpace. The goal was to fulfill the following requirements in a development period of about two weeks (200+ man-hours):

Initially, we were greatly inspired by FAO's AgriDrupal effort.
Following are some of the things that we still need to address or are at least dreaming of tackling:

Drupal Modules used in the USGIN Document Repository

Following is a break-down of the Drupal 6 modules used in the USGIN Document Repository site (you may want to familiarize yourself with general Drupal concepts):

Drupal Content Types used in the USGIN Document Repository

Following are the two Drupal 6 content types that define the USGIN Document Repository (you may want to familiarize yourself with general Drupal concepts):

Collection (ct_collection)

The Collection content type acts as an arbitrary logical unit (or container) that groups the Document-like Information Object (DLIO) nodes, which hold the actual data resources. At this time, Collections fulfill three objectives at once; these may need to be separated at a later point.

  1. Used to internally organize DLIOs into access groups for "Curators" (content providers and managers) as part of the workflow.
  2. Used for branding (organization names, logos, icons) and shared metadata (resource contact information) of DLIOs that belong to a collection.
  3. Used as arbitrary bins (a sort of category) for browsing or searching for DLIOs. Think of them as special collections in a library or museum context; hence we call collection managers "curators."

Following are the main fields we are currently using in Collections. The field groups are used in conjunction with the Vertical Tabs module and, at this time, affect the Conditional Fields module.

Document-like Information Object (ct_dlio)

The Document-like Information Object (DLIO, a term coined by FAO's AgriDrupal) content type is a digital resource object that contains a bundle of digital files and/or links to human-consumable online resources (not services for machines). This convenient hybrid object contains both metadata and data, and it allows us to avoid the nightmare of having to create separate metadata entries for each individual file or link in bundled resources. For example, one DLIO node may include multiple representations (Access DB, Shapefile, GeoTIFF, Word report, etc.) of the same information object. To keep it simple, repository users just deal with "submissions," and we avoid complicating things with the DLIO concept.

Following are the main fields we are currently using in DLIOs. The field groups are used in conjunction with the Vertical Tabs module and, at this time, affect the Conditional Fields module.

Creating ISO 19139 metadata through Drupal Views and Views Bonus Pack

Following is how we modified the Views Bonus Pack module for Drupal 6 to generate ISO 19139 XML metadata files for the USGIN Document Repository. This hack can be used to generate other XML metadata (FGDC, CSW records) or most other complex XML with relative ease. Ideally, this hack will evolve into a separate module. If you are interested or have questions, please participate in the "Generate ISO 19139 metadata record XML files" issue thread for Views Bonus Pack.

Overview

The goal is to generate minimum ISO 19139 dataset metadata XML files (that conform to the USGIN profile) from the core and CCK fields used in the repository's Collection (ct_collection) and Document-like Information Object (DLIO, ct_dlio) content types.

Drupal's Views module acts much like a database "view": it extracts, arranges, and manipulates node field content into new representations (pages, blocks, feeds, etc.) and styles (unformatted, HTML lists, tables, etc.). One of these representations is an RSS feed, in practice a formatted XML page that contains a node's title, teaser, etc. The Views Bonus Pack module extends Views with, among others, the Export sub-module, which adds styles to Views' feed representation that generate CSV, DOC, TXT, XLS, and XML output. I added an "ISO 19139" style to the Export module that produces what we needed.

ISO 19139 Export Style

[Figure: Views edit page for ISO 19139 hack]

The trick is to use the label for a field in Views as the key for the field's value. The field values are then retrieved through the label key in the ISO 19139 template file (views-bonus-export-iso.tpl.php). This means that one has to add one field to the view for every element value or attribute value one wants to control in the XML document. This (and the long machine-readable labels) can get cumbersome quite fast and would be handled differently in a custom module.
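
The label-as-key idea can be sketched outside of Drupal as follows. This is only an illustration: the `$header` and `$row` arrays stand in for what Views hands to an export template, and the function name, field IDs, and labels (`iso_title`, `iso_abstract`) are made up for this example, not taken from the actual view.

```php
<?php
// Hypothetical sample data; in a real view these arrays come from Views.
// $header maps a Views field ID to the label typed into the Views UI.
$header = array(
  'title' => 'iso_title',
  'field_abstract_value' => 'iso_abstract',
);
// $row maps the same field IDs to the rendered field values.
$row = array(
  'title' => 'Geologic Map of Arizona',
  'field_abstract_value' => 'Statewide geologic map.',
);

/**
 * Re-key one rendered row by its Views field label (illustrative helper).
 */
function example_row_by_label(array $header, array $row) {
  $values = array();
  foreach ($row as $field => $content) {
    // Fall back to the field ID when no label was set in Views.
    $has_label = isset($header[$field]) && $header[$field] !== '';
    $label = $has_label ? $header[$field] : $field;
    $values[$label] = $content;
  }
  return $values;
}

$values = example_row_by_label($header, $row);
// A template can now address values by label instead of by field ID.
print $values['iso_title'] . "\n";
```

With values keyed this way, the template does not need to know the internal CCK field names, only the labels chosen in the Views UI.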

Note that Views (or CCK?) returns CCK multi-value fields as one value with <span>-separated sub-values (or whatever some template specifies). I delimit multi-value fields in Views with a pipe (|) or semicolon (;) and then strip out any HTML/XML tags in views_bonus_export.theme.inc. The template file views-bonus-export-iso.tpl.php is effectively an XML file with PHP code to populate XML element and attribute values. I also included logic to deal with element nesting dependencies and a bit of validation through conditional statements. Required metadata elements will show up empty if a value is missing. Optional elements are pruned entirely if their required values are missing.
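
As a rough sketch of the splitting and pruning logic described above, the following standalone PHP turns one semicolon-delimited value into repeated ISO 19139 keyword elements and drops the whole optional block when nothing usable remains. The function names are hypothetical and the real template may be organized quite differently; the sketch also assumes the value has already been stripped of tags and entity-encoded.

```php
<?php
/**
 * Split a delimited multi-value string into trimmed, non-empty values.
 */
function example_split_values($content, $delimiter = ';') {
  $values = array();
  foreach (explode($delimiter, $content) as $value) {
    $value = trim($value);
    if ($value !== '') {
      $values[] = $value;
    }
  }
  return $values;
}

/**
 * Build a gmd:descriptiveKeywords block, or '' when there are no keywords.
 *
 * Assumes $content is already tag-free and XML-escaped.
 */
function example_keywords_xml($content) {
  $keywords = example_split_values($content);
  if (empty($keywords)) {
    // Optional element: prune it entirely rather than emit an empty shell.
    return '';
  }
  $xml = "<gmd:descriptiveKeywords>\n  <gmd:MD_Keywords>\n";
  foreach ($keywords as $keyword) {
    $xml .= '    <gmd:keyword><gco:CharacterString>' . $keyword
      . "</gco:CharacterString></gmd:keyword>\n";
  }
  $xml .= "  </gmd:MD_Keywords>\n</gmd:descriptiveKeywords>\n";
  return $xml;
}

// Empty sub-values (note the "; ;") are skipped.
print example_keywords_xml('geology; groundwater; ; Arizona');
```

A required element would skip the `empty()` check and always print its wrapper, which is why required elements show up empty when a value is missing.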

You can find a patch file generated in Eclipse for Views Bonus Pack 6.x-dev (CVS trunk) with the discussed modifications at the bottom of the page.

views_bonus/export/views_bonus_export.theme.inc

Added a new preprocessor function to the existing file:

/**
 * Preprocess ISO 19139 XML output template.
 */
function template_preprocess_views_bonus_export_iso(&$vars) {
  _views_bonus_export_shared_preprocess($vars);

  foreach ($vars['themed_rows'] as $num => $row) {
    foreach ($row as $field => $content) {
      // Add a semicolon delimiter between multiple values separated by DIV and SPAN tags.
      $content = str_replace(array('</div><div', '</span><span'), array('</div>; <div', '</span>; <span'), $content);
      // Strip HTML tags (not supported in ISO 19139 XML).
      $content = strip_tags($content);
      // Prevent double encoding of the ampersand. Look for the entities produced by check_plain().
      $content = preg_replace('/&(?!(amp|quot|#039|lt|gt);)/', '&amp;', $content);
      // Convert any remaining < and > to entities.
      $content = str_replace(
        array('<', '>'),
        array('&lt;', '&gt;'),
        $content);
      $vars['themed_rows'][$num][$field] = trim($content);
    }
  }
}
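
To see what these cleaning steps do to a single value, here is the same sequence pulled out into a standalone function that runs without Drupal. The function name and the sample markup are made up for illustration; the real markup Views or CCK emits depends on the field formatter.

```php
<?php
/**
 * Apply the cleaning steps from the preprocess function to one value.
 */
function example_clean_value($content) {
  // Insert "; " between adjacent DIV or SPAN wrappers of multi-value fields.
  $content = str_replace(array('</div><div', '</span><span'), array('</div>; <div', '</span>; <span'), $content);
  // Remove all markup, leaving only the text and the inserted delimiters.
  $content = strip_tags($content);
  // Encode bare ampersands, but leave existing entities alone.
  $content = preg_replace('/&(?!(amp|quot|#039|lt|gt);)/', '&amp;', $content);
  // Encode any angle brackets that survived strip_tags().
  $content = str_replace(array('<', '>'), array('&lt;', '&gt;'), $content);
  return trim($content);
}

// Hypothetical multi-value field output with two SPAN-wrapped values.
$raw = '<span class="a">Smith & Jones</span><span class="a">USGS</span>';
print example_clean_value($raw) . "\n";
// prints "Smith &amp; Jones; USGS"
```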

views_bonus/export/views_bonus_export.views.inc

Appended a new array to the existing file. I decided to stick with the existing XML icon.

      'views_iso' => array(
        'title' => t('ISO 19139 XML file'),
        'help' => t('Display the view as an ISO 19139 XML file.'),
        'path' => $path,
        'handler' => 'views_bonus_plugin_style_export_iso',
        'parent' => 'views_bonus_export',
        'theme' => 'views_bonus_export_iso',
        'theme file' => 'views_bonus_export.theme.inc',
        'uses row plugin' => FALSE,
        'uses fields' => TRUE,
        'uses options' => TRUE,
        'type' => 'feed',
        'export headers' => array('Content-Type: text/xml'),
        'export feed type' => 'xml',
      ),

views_bonus/export/views_bonus_plugin_style_export_iso.inc

Added a new style for Views. In keeping with Drupal practice, the closing PHP tag is omitted in include files.

<?php
// $Id$
/**
 * @file
 * Plugin include file for export style plugin.
 */

/**
 * Generalized style plugin for export plugins.
 *
 * @ingroup views_style_plugins
 */
class views_bonus_plugin_style_export_iso extends views_bonus_plugin_style_export {
  var $feed_text = 'XML';
  var $feed_file = 'view-%view.xml';

  /**
   * Initialize plugin.
   *
   * Set feed image for shared rendering later.
   */
  function init(&$view, &$display, $options = NULL) {
    // Pass the caller's options through instead of resetting them to NULL.
    parent::init($view, $display, $options);
    $this->feed_image = drupal_get_path('module', 'views_bonus_export') . '/images/xml.png';
  }
}

views_bonus/export/views-bonus-export-iso.tpl.php

Added a new template file for the ISO 19139 XML output - this is where the main action takes place.