Really Generic Functional Requirements for Metadata

We write metadata so that people can search through it to find datasets they're interested in using for some purpose. I've recently started building a map-centric user interface for doing this, and the work has highlighted some simple functional requirements that metadata should meet in order to be helpful for that task.

What I've done here is essentially list the conceptual elements that seem necessary in order to make a functional user interface for searching. This has nothing to do with metadata syntax, just with metadata concepts.


  • Title: duh. This should clearly identify the resource for metadata users. Search results are commonly returned first as a list of titles, so the title must provide good guidance as to what the resource is.
  • Abstract: This is probably one of the most important pieces of information used to distinguish between a dataset that you are interested in and one that you are not. The abstract should do a good job of describing the data resource; otherwise the metadata is useless.
  • Bounding Box: Of all the criteria that may be used to limit search results, in the geosciences location is probably the second most important. Generally we are looking for information on some particular theme at some specific location. We want to SEE the geographic extent of a dataset in order to evaluate whether or not it suits our interests.
  • Keywords: These allow for easy categorization of results, and provide a simple mechanism for further filtering.
  • URLs: You need at least a couple:
    1. URL for human-readable, more detailed metadata: The search interface generally requires less information about a resource than people need to make judgements. In order to keep the interface simple, generally we would like to link to more detailed content instead of trying to squeeze it all onto the screen at once. The document returned by this URL should probably be in HTML format.
    2. URL for machine-readable, more detailed metadata: Maybe we want to standardize the portrayal of the details of each record, and we want to write a parser to do it. The record should probably specify what structure the machine-readable metadata is in, so a machine can determine whether or not it can deal with it.
    3. Special Case - URL for data access: Requires some attributes to qualify what the URL does and how it works... The tricky question is exactly what attributes.
      • The URL to actually access the resource
      • Some sort of definition of the service type through which the resource is accessed. Is it downloadable? Is it a service? What kind of service? What version? What profile? What is the information model/schema? The answers to one or more of these questions will be required in order for the user interface to present the user with appropriate options for data access, or for software to automate connection to the resource.
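
To make the list above a little more concrete, here is a minimal sketch of how these conceptual elements might be held in memory by a search interface. It is not tied to any metadata standard; every class name, field name, and example value below is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BoundingBox:
    # Decimal degrees; field names are illustrative, not from any standard.
    west: float
    south: float
    east: float
    north: float


@dataclass
class AccessLink:
    # The "special case" URL plus the attributes that qualify it.
    url: str
    service_type: str               # e.g. "download" or "WMS" (assumed vocabulary)
    version: Optional[str] = None
    profile: Optional[str] = None
    schema: Optional[str] = None    # information model/schema, if known


@dataclass
class MetadataRecord:
    title: str
    abstract: str
    bbox: BoundingBox
    keywords: List[str] = field(default_factory=list)
    html_url: Optional[str] = None        # human-readable detailed metadata
    machine_url: Optional[str] = None     # machine-readable detailed metadata
    machine_format: Optional[str] = None  # structure of the machine-readable document
    access_links: List[AccessLink] = field(default_factory=list)


# A made-up record, just to show the shape.
record = MetadataRecord(
    title="Example bedrock geology map",
    abstract="A made-up dataset. It exists only to show the fields.",
    bbox=BoundingBox(west=-115.0, south=31.0, east=-109.0, north=37.0),
    keywords=["geology", "bedrock"],
    html_url="http://example.org/metadata/1.html",
    machine_url="http://example.org/metadata/1.xml",
    machine_format="ISO 19139 XML",
    access_links=[
        AccessLink(url="http://example.org/wms", service_type="WMS", version="1.3.0")
    ],
)
```

Keeping the data-access attributes on their own small type means a record can carry several access links, each with its own service type and version, which is what a user interface would need in order to offer different access options.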

Another result of this kind of thinking is that it starts to provide some very basic validation criteria for what metadata is good, and what metadata is bad. In order for a metadata record to provide the above elements, the metadata record must meet these criteria. I've also written out what I think might be a start towards how an automated process may check to see if the criteria are met.
  • Have a title: More than one word
  • Have an abstract: At least more than one sentence, probably more than four.
  • Have a valid bounding box: Lat/Long values are valid. West bounding longitude < east bounding longitude, north bounding latitude > south bounding latitude.
  • Have keywords: Probably at least two.
  • Have valid URLs: Requesting any URL in the document should return an acceptable HTTP status code. 200, 301, 302, 303 and 304 are OK; any 4xx or 5xx is not OK.
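
As a rough sketch of how an automated process might apply the checks above, the function below takes a record as a plain dictionary and returns a list of problems. The dictionary keys, the function names, and the sentence-counting heuristic are all assumptions made for illustration. The bounding box check assumes west is less than east and north is greater than south, so a box that crosses the 180° meridian would need special handling. Note also that `urllib` follows redirects itself, so the default checker will usually report the final status rather than a 301 or 302.

```python
import re
import urllib.error
import urllib.request

OK_STATUSES = {200, 301, 302, 303, 304}


def http_status(url, timeout=10):
    """Return the HTTP status for a HEAD request, or 0 if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # 4xx/5xx (and unhandled 3xx) arrive as exceptions
    except (urllib.error.URLError, ValueError, OSError):
        return 0          # unreachable host, malformed URL, timeout, ...


def validate(record, status_of=http_status):
    """Check a record (a dict with hypothetical keys) and list what is wrong."""
    problems = []

    if len(record.get("title", "").split()) < 2:
        problems.append("title: needs more than one word")

    # Crude sentence count: split on runs of ., ! or ? (abbreviations will miscount).
    abstract = record.get("abstract", "")
    sentences = [s for s in re.split(r"[.!?]+\s*", abstract) if s.strip()]
    if len(sentences) < 2:
        problems.append("abstract: needs more than one sentence")

    bbox = record.get("bbox") or {}
    try:
        west, south, east, north = (
            float(bbox[k]) for k in ("west", "south", "east", "north")
        )
    except (KeyError, TypeError, ValueError):
        problems.append("bbox: missing or non-numeric corner")
    else:
        in_range = (-180 <= west <= 180 and -180 <= east <= 180
                    and -90 <= south <= 90 and -90 <= north <= 90)
        if not in_range:
            problems.append("bbox: coordinate out of range")
        elif not (west < east and south < north):
            problems.append("bbox: corners out of order")

    if len(record.get("keywords", [])) < 2:
        problems.append("keywords: needs at least two")

    for url in record.get("urls", []):
        if status_of(url) not in OK_STATUSES:
            problems.append(f"url: {url} did not return an acceptable status")

    return problems


# Usage with a stand-in status function, so nothing touches the network.
example = {
    "title": "Example bedrock geology",
    "abstract": "A made-up dataset. It exists only for this check.",
    "bbox": {"west": -115.0, "south": 31.0, "east": -109.0, "north": 37.0},
    "keywords": ["geology", "bedrock"],
    "urls": ["http://example.org/metadata/1.html"],
}
print(validate(example, status_of=lambda url: 200))
```

Passing the status check in as a parameter keeps the rules testable offline; a real harvester would probably also want to cache results and throttle its requests.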



I made some edits

I made some minor edits to Ryan's post: details on the title metadata element, and more about the attributes for 'URL for data access'.