
Title: Avenues for developing the UK's National Geospatial Metadata Service

Authors: James K. Batcheller jk.batch@ed.ac.uk, Bruce M. Gittings bruce@ed.ac.uk

Institute of Geography School of GeoSciences University of Edinburgh Drummond Street Edinburgh EH8 9XP Tel: +44 (0) 131 650 2558 FAX: +44 (0) 131 650 2524

Corresponding author

Abstract: The state of public sector geospatial data sharing and exchange in the UK, as facilitated by the gigateway service, is currently at a crossroads. Ambiguities surrounding its purpose, direction, funding and custodianship continue to persist in the face of increasing demands placed upon the service, such as legal requirements (INSPIRE, PSI) and rising user expectations. A well-defined strategy addressing the political, commercial and technological considerations involved in advancing the service is therefore needed if these uncertainties are to be countered and demands met. The current work aims to provide for the technical aspects of such a strategy by considering potential avenues for development. Accordingly, proprietary and open source approaches are examined in the context of facilitating metadata publication (production, integrity, delivery), enhancing the service infrastructure (interoperability, futureproofing) as well as addressing end-user considerations (data visualisation, data access). The resulting roadmap outlines a technical evolution of gigateway, proposing a service better equipped to face the challenges of both the present and the future.

Keywords: gigateway, geospatial metadata, metadata service, SDI.

Introduction
The advent of the World Wide Web (WWW) and the Internet has revolutionised how all kinds of information can be accessed and exchanged - geospatial information no less than other forms. From modest beginnings as point-to-point transfer via FTP1 and email, through the origins of customised interactive web-based mapping, as seen in the postings of Xerox's Palo Alto Research Centre (PARC) in 1993 (Putz, 1994; Harder, 1998), to distributed online metadata services and clearinghouses offering catalogues of records detailing geospatial dataset attributes and how to procure them, and geospatial one-stop shops offering an integrated access point to disparate geospatial data resources, widespread data dissemination is currently driven as never before. Sourcing, accessing and retrieving data for analysis and display have been made easier, with implications for the public, private and academic sectors ranging from the stimulation of intellectual endeavour and improved data management practices to enhanced visibility of potentially marketable geospatial products.

In the public sector, efforts have been given further impetus through the introduction of legislation at both national and international level. In the United States, for instance, President Clinton's Executive Order 12906 (1994)2 demanded the creation of a coordinated National Spatial Data Infrastructure (NSDI) to support public and private sector applications of geospatial data, with the key goals of avoiding wasteful duplication of effort and promoting effective and economical management of resources. More recently, European Union directives such as the sharing of Public Sector Information (PSI, 2003) and the INfrastructure for SPatial InfoRmation in Europe (INSPIRE, 2004) have formalised requirements that member states facilitate location of and access to geospatial assets for the purpose of formulation, implementation, monitoring and evaluation of Community policy-making3.

1 File Transfer Protocol
2 http://govinfo.library.unt.edu/npr/library/direct/orders/20fa.html

Such has been the perceived worth of web-enabling geospatial holdings that the forerunning national initiatives have in recent times been augmented by local, regional and international schemes, as well as those in the private and academic sectors (Guptill, 1999; Tulloch and Robinson, 2000; Higgins et al., 2003). Prime examples include the UK's public sector geospatial metadata portal gigateway, its academic counterpart GoGeo!4, Environmental Systems Research Institute's (ESRI) Geography Network5 and the Federal Geographic Data Committee's (FGDC) National Geospatial Data Clearinghouse6 (precipitated by Clinton's Executive Order).

The benefits of web-enabling data assets are nevertheless not without their own particular problems. Questions as to whether users can effectively find quality, compatible and appropriate data for their needs are balanced by resource, implementation and maintenance issues for data providers. Additional complications arise on consideration of the political issues involved in supporting a geospatial data sharing initiative, particularly in governmental sectors. Concerns as to where service ownership lies, its strategic goals, its sources of revenue, how it is promoted and who constitute the target community are just some of the factors which impact upon a service's performance.

3 http://www.ec-gis.org/inspire/
4 http://www.gogeo.ac.uk/
5 http://www.geographynetwork.com/
6 http://clearinghouse1.fdgc.gov/

Such are the challenges that currently face the UK's national geospatial data sharing initiative gigateway. With the rapid and ongoing evolution of spatially aware software and services offered over the Internet, it can be reasoned that end-user expectations have also evolved, arguably past what the service can currently offer. From a data provider's perspective, active participation is arguably driven more by the desire to be seen to contribute or through some form of compulsion (e.g. contractual obligations, mandates from a higher authority, legislation) than by the recognition of potential benefits that may be accrued. As for the gigateway service itself, it is currently at a crossroads. Ambiguities surrounding its purpose, technological expectations and ongoing source of funding (as currently enshrined within the NIMSA7 agreement), coupled with doubts as to whether the Association for Geographic Information (AGI) shall continue to act as custodian, have led to the national geospatial metadata service facing a somewhat uncertain future.

It is in this light that the timeliness of a re-examination of how public geospatial metadata is published in the UK via the gigateway service is argued. If confidence in the service is to be maintained, particularly amongst those on whom gigateway's ongoing success is dependent (i.e. the contributing community), it is crucial that the initiative be seen to move forward with vision and purpose. A well-defined strategy addressing the political, commercial and technical considerations involved in advancing gigateway is therefore imperative if the investment and goodwill already accrued by the national geospatial metadata service are to be maintained. It is the aim of the current work to provide a basis for the technical aspects of such a strategy by analysing the current service, identifying improvement opportunities and elaborating potential development paths. Each stage of the geospatial metadata lifecycle, from production to publication and beyond, is consequently investigated, with the goal of eliminating, circumventing or diminishing barriers to metadata delivery.

7 The National Interest Mapping Services Agreement: a contract between the Office of the Deputy Prime Minister (ODPM, now Department of Communities and Local Government) and the Ordnance Survey (OS) under which the ODPM funds, or part funds, (mapping) activities that meet established criteria for being in the national interest (NIMSA Review Group Report, 2004).

Background
Metadata

The increased availability of geospatial computing technologies has not only fed the demand for geospatial data with which to perform required analyses (Guptill, 1999; Deng, 2002), but has also resulted in large volumes of such data being produced - not only by GIS professionals and organisations, but also by those not traditionally considered as geodata producers (Schweitzer, 1998; Mathys, 2004). As data are clearly critical to the functioning of GIS, enough so to be referred to as its fuel (Vermeij, 2001; ESRI, 2002), this surfeit could be viewed positively. Nevertheless there are complications. As Tsou (2002) observes, the storage and management of geospatial data are in themselves major challenges. How data are located in what can amount to a needle in a geospatial haystack; whether such geodata, once located, are fit for the desired purpose; whether they are compatible, up-to-date and of sufficient quality: all raise their own particular issues, even without contemplating data accessibility, copyright, licensing, potential procurement costs and training.

Regardless of information medium or application domain, it is clearly important to document data assets so as to facilitate efficient storage and management (Göbel and Lutze, 1998). Geospatial data are documented by metadata, or data that describes data (Hart and Phillips, 2001; Vermeij, 2001; Tsou, 2002; Hobona et al., 2004). Just as geospatial data are abstractions of the real world, for requirements such as analyses and representation, geospatial metadata are similar abstractions of the data itself. Used not only to describe a range of dataset attributes, metadata also assist in the location, evaluation, comparison, access and exploitation of geographical datasets (Luo et al., 2003; OGC, 2005).

The gigateway metadata service

Arising from several predecessors, most notably the National Geospatial Data Framework (NGDF) and askGIraffe, gigateway's raisons d'être remain those of its forerunners: to increase the use of geospatial data; to facilitate development of markets for data and services; and to future-proof investments and enhance decision-making through use of better information (Gigateway, 2003). The service works towards these objectives through the support of a distributed web-based network, focussed on serving discovery metadata: a subset or profile of a more elaborate metadata standard, designed to provide a means of identifying where the data described might be found. Users query metadata through a web-based form on a central portal (see Figure 1) using keywords and geographical extents. Queries are then transmitted to the clients (nodes) of participating organisations, which execute searches on indexed metadata. Results are returned to the central gateway (portal) where they are collated and sent to the user's browser. Retrieved metadata specify where the original data may be located.

Figure 1. The distributed gigateway service architecture. (Diagram labels: Browser; Central Portal; three Z39.50 Search Engines, each with Metadata Indexes; Metadata Management Systems.)
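The query flow described above (a keyword and geographical extent sent from the portal to each node, with results collated centrally) can be sketched as follows. This is a minimal illustration under stated assumptions, not gigateway's implementation: the node names, the record fields and the functions `search_node` and `portal_search` are hypothetical, and an in-memory list stands in for each node's Z39.50 search engine and metadata index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    keywords: frozenset
    bbox: tuple  # (west, south, east, north) in decimal degrees
    node: str


def overlaps(a, b):
    """True when two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_node(index, keyword, bbox):
    """Hypothetical stand-in for one node's search engine: filter its local index."""
    kw = keyword.lower()
    return [r for r in index if kw in r.keywords and overlaps(r.bbox, bbox)]


def portal_search(nodes, keyword, bbox):
    """Central portal: pass the query to every node and collate the results."""
    results = []
    for index in nodes.values():
        results.extend(search_node(index, keyword, bbox))
    return sorted(results, key=lambda r: r.title)


# Invented sample holdings for two participating organisations.
NODES = {
    "node_a": [
        Record("Scottish rivers", frozenset({"hydrology", "rivers"}),
               (-7.7, 54.6, -0.7, 60.9), "node_a"),
    ],
    "node_b": [
        Record("Lothian land use", frozenset({"land use"}),
               (-3.8, 55.8, -2.4, 56.1), "node_b"),
        Record("Welsh rivers", frozenset({"rivers"}),
               (-5.4, 51.3, -2.6, 53.5), "node_b"),
    ],
}

hits = portal_search(NODES, "rivers", (-4.0, 55.0, -2.0, 57.0))
```

In this sketch the portal holds no metadata of its own; each record returned still identifies the node that holds it, mirroring the way retrieved discovery metadata point back to where the data may be found.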

Context and rationale


The NGDF was initially conceived as a fully-fledged NSDI (Davey and Murray, 1996), but budgetary constraints, the lack of integrated GI-centric solutions and the need for progress led to the identification of a National Metadata Service as the priority technical deliverable8. The first tangible service created was askGIraffe in mid 2000, based on a distributed search standard developed by the library community and manifested in the freely available Isite9 package previously deployed successfully in the US by the FGDC. While Isite still forms the core of gigateway, a number of proprietary and open source solutions have since become available, some of which were developed specifically with the geospatial community in mind. Consequently, some of the technical barriers to developing the UK's initiative beyond the basic metadata service currently in operation have been removed.

8 A fully-fledged NSDI is being revisited through the UK GI Strategy developed in 2006.

Despite these circumstances and the resources afforded to the service since its inception, there has been a notable deficit of comprehensive analyses aimed at reviewing the technical options open to what has now evolved into gigateway. The deficit may be considered even more curious in light of the aforementioned PSI and INSPIRE directives, which are predicted to place significant demands on member states, including the provision of metadata services (Rackham, 2004). Furthermore, as one of INSPIRE's goals is the establishment of an EU-wide data framework based upon the SDIs of member states, there is a clear need to consider the technological options relating not only to how gigateway may be moved forward, but also to what can be done to address some of the challenges it currently faces.

The gigateway metadata publication workflow

The provision of geospatial metadata sustains gigateway; the continued success of the service therefore relies on those who use it to publish their metadata records. To safeguard existing contributions and attract new ones, perceived or actual barriers to participation must be addressed. Currently the path from metadata production to publication is characterised by a series of distinct steps punctuated by extensive human intervention (Figure 2). Whilst human input is important in assessing quality, opportunities for increased automation certainly exist, speeding the publication process and hence removing an obstacle to metadata contribution.

9 http://www.awcubed.com/Isite/index.html

Figure 2. The gigateway metadata publication workflow. Metadata are created / updated on creation / update of datasets. Datasets may be internally documented by detailed or discovery metadata, but only discovery metadata are indexed and exposed to the gigateway service. A formal metadata repository may be used but is not currently assumed to be exposed for query directly. (Diagram elements: geospatial dataset; detailed metadata; metadata repository; discovery metadata; local gigateway node; remote gigateway node. Diagram actions: create, update; document, update; store; store subset; document, update subset; retrieve; host; post; index; publish.)


Gigateway's communication infrastructure

The ISO Z39.5010 communication protocol, embodied in Isite, remains central to gigateway. Influenced at the time by its success in the US and low cost of implementation, the choice of Isite was also motivated by the lack of any workable alternative. Subsequent developments have however seen an increase in the number of commercial and non-commercial solutions. These potentially offer the opportunity to reinvigorate the service, thus enhancing the number of metadata records available, as well as providing a future development path. Accordingly, means for advancing the service are examined in the context of metadata publication (production, integrity, delivery), the service infrastructure (interoperability, future-proofing) as well as end-user considerations (data visualisation, data access).

Metadata characteristics
The success of the service (or indeed any service which depends on metadata) relies on three critical aspects: quality, quantity and accessibility. Metadata quality refers not only to whether a metadata record is manifested in a way that is compliant with a specific standard (and hence is exchangeable) but whether it is unambiguously indicative of the dataset it depicts, is complete and up-to-date. Consistent provision of quality, fit for purpose metadata helps to assure user confidence in the service, providing impetus for return visits and in turn enhancing its reputation (Rackham, 2004).

10 ANSI/NISO Z39.50-1995 Information Retrieval (Z39.50): Application Service Definition and Protocol Specification. Also known as the OGC Web Catalog Services protocol Version 1 or ISO 23950.


For a service to be of any utility, the quantity of metadata records offered should meet users' expectations. A paucity of records provides little motivation to use the service, as chances of locating appropriate data will be low.

Metadata records are of minimal utility if they are not accessible, regardless of quality or quantity. Metadata accessibility in this context relates not only to the ability to locate and retrieve the desired items, but also to their being presented in a consistent format and conforming to employed standards. A combination of a well-designed user interface and an effective underlying search engine is necessary to ensure that the user is presented with the best-fit records, ordered appropriately. Metadata that users find complicated or time-consuming to locate, access or understand will do little to popularise the hosting service.

The aforementioned factors are clearly inter-related. A vast quantity of metadata is pointless in the absence of assurance of quality, whilst a restricted set of high quality metadata is of limited value11.

Development approaches
Metadata generation

The perception of metadata generation as being a tedious, expensive or unnecessary drain on time and resources presents a significant obstacle to the production process even where the need for quality metadata is recognised. Streamlining the overall process would serve to alleviate such concerns and help counter the human bottleneck.

11 Gigateway Advisory Group Meeting, 17th November 2004: http://www.gigateway.org.uk/aboutus/aboutus.html


The ability of modern GIS packages to handle metadata has enabled tighter integration of data editing and metadata composition into standard workflows. Once completed, metadata either resides with the data (easing the management and update of both), or it is copied to a central organisational repository or database. Metadata destined for exposure via gigateway should comply with the UK GEMINI12 standard, a profile of the ISO standard 19115/19139 Geographic Information: Metadata and the UK's eGovernment Metadata Standard (eGMS). Currently, metadata stored in most GIS packages would need to be manually copied into an appropriate metadata editor (e.g. the gigateway-sponsored MetaGenie13), and / or manually augmented to achieve compliance. Preparation of discovery metadata thus represents at least a duplication of effort, as record elements existing elsewhere must be re-entered. The consequent requirement to populate even the minimum required fields manually clearly tends toward the tedious.

If geospatial datasets are created, manipulated and documented in proprietary GIS software, then the development environments included within such packages can be leveraged to programmatically populate metadata elements gleaned from the user's computing environment on dataset creation or update. Completed metadata can then be output, validated automatically, complemented with human-mediated quality control measures and exported for eventual publication on an organisational, sectoral or national portal.

12 GEo-spatial Metadata INteroperability Initiative
13 http://www.gigateway.org.uk/metadata/metagenie.html


As the UK's current market-leading GIS software, used extensively across the public, private and academic sectors, ESRI's ArcGIS suite is an obvious candidate for a solution oriented towards the gigateway service. In an approach similar to that outlined by Vermeij (2001), the ArcCatalog component of ArcGIS can be tailored using a custom metadata editing screen. Metadata elements displayed for completion are dictated by entries contained in an XML Stylesheet (XSL) conforming to a detailed metadata standard (ISO, eGMS). Mandatory GEMINI fields can be made compulsory to ensure that metadata later extracted for publication purposes comply with gigateway's discovery format.

Metadata items may be automatically populated through the programmatic interpretation of dataset elements and system variables (inherent metadata), complemented by pre-prepared metadata templates for commonly-used values (author metadata) and completed manually by the metadata creator (descriptive metadata, necessitating human intervention). The conceptual steps are outlined in Figure 3. Completed metadata can then be validated against an appropriate schema that checks compliance and verifies that all mandatory elements are populated. Thus what was once a time-consuming endeavour for the metadata creator can be reduced to a limited authoring step and quality control.


extract inherent metadata from dataset + complement with pre-defined author metadata + complete with descriptive metadata = a minimum set of mandatory fields

Figure 3: Stages of automating metadata production. Elements requiring user input are reduced to those of recyclable author metadata and dataset-specific descriptive metadata.
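The three stages summarised in Figure 3 can be illustrated with a short sketch. It is written in plain Python rather than in a GIS package's own development environment, and the field names, the `MANDATORY` tuple and the helper functions are hypothetical placeholders chosen for illustration; they are not the actual UK GEMINI element list, and a real check would validate against the relevant schema.

```python
import os
from datetime import date

# Illustrative mandatory fields only; NOT the actual UK GEMINI element list.
MANDATORY = ("title", "abstract", "file_name", "file_size_bytes",
             "metadata_date", "originator", "contact_email")


def inherent_metadata(path):
    """Inherent metadata: values read from the dataset file and the system."""
    return {
        "file_name": os.path.basename(path),
        "file_size_bytes": os.path.getsize(path),
        "metadata_date": date.today().isoformat(),
    }


def build_record(path, author_template, descriptive):
    """Merge the three tiers; later tiers override earlier ones."""
    record = {}
    record.update(inherent_metadata(path))   # extracted automatically
    record.update(author_template)           # reusable organisational values
    record.update(descriptive)               # entered by the metadata creator
    return record


def missing_fields(record):
    """Return the mandatory fields that are absent or empty."""
    return [f for f in MANDATORY if record.get(f) in (None, "")]
```

Called with an author template such as `{"originator": "Example Org", "contact_email": "gis@example.org"}` and a descriptive dictionary holding only a title, `missing_fields` would report `abstract` as outstanding, leaving the creator a single field to complete before the record passes the check.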

Metadata integrity

Once preparation of standard compliant metadata is complete, the question of management arises. For a contributing organisation, the importance of aligning the provision of metadata to gigateway with internal metadata services is critical to ensure the initiative's continuing success14. Given the range of contributing organisations, this alignment is not trivial: with their own particular internal procedures, resources and guidelines, it is not surprising that storage techniques diverge from one organisation to the next (Tyler, 2002). Metadata can be stored alongside the data they describe, facilitating easy update; detached from the data within a DBMS in order to take advantage of inherent data management features; as text-based files to enable upkeep via simple text editors, or in any combination of the aforementioned. Additional difficulties appear as metadata are infrequently authored or edited where they are exposed, resulting in multiple metadata instances embodied in one or more standards. Here, metadata must not only be copied to where they are indexed and exposed but also transformed to conform to discovery metadata specifications. No matter the scenario, redundancy results in potential loss of integrity and the formation of discrete information silos which suffer from update latency, requiring cascading updates and related version control management.

14 Gigateway Advisory Group Meeting, 17th November 2004: http://www.gigateway.org.uk/aboutus/aboutus.html

Metadata integrity issues can be addressed by migrating storage in its entirety to the database paradigm. By merging multiple metadata instances into one database repository, or a formal distributed database, potential sources of inconsistency are eliminated, while providing a secure, robust and manageable storage solution. With most GIS vendors offering DBMS-driven solutions, metadata composition could be closely integrated within data editing workflows.

Access to database-held discovery metadata necessary for participation in gigateway can be achieved in two principal ways. Where organisations wish to exercise the full benefits of formal database management (Date, 2003), metadata can be exposed directly, although this will involve the provision of a Z39.50 interface15, with possible performance implications. Exporting database-held metadata as text files which may then be exposed remains more straightforward: once the relevant database record is updated, a new file is exported, indexed and made available to the Z39.50 service as normal.
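The second route, exporting updated database records as files for subsequent indexing, might look like the following sketch. The table layout, column names, XML element names and the `export_records` function are assumptions made for illustration, SQLite stands in for whatever DBMS an organisation uses, and the indexing step performed by the Z39.50 software is not shown.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def export_records(conn, out_dir, since):
    """Write one XML file per record updated after `since` (an ISO date string)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute(
        "SELECT id, title, abstract, updated FROM metadata "
        "WHERE updated > ? ORDER BY id",
        (since,),
    )
    for rec_id, title, abstract, updated in rows:
        root = ET.Element("record", id=str(rec_id))
        ET.SubElement(root, "title").text = title
        ET.SubElement(root, "abstract").text = abstract
        ET.SubElement(root, "updated").text = updated
        path = out_dir / f"record_{rec_id}.xml"
        ET.ElementTree(root).write(str(path), encoding="utf-8",
                                   xml_declaration=True)
        written.append(path)
    return written


# Hypothetical repository with two invented records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata "
             "(id INTEGER PRIMARY KEY, title TEXT, abstract TEXT, updated TEXT)")
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?)", [
    (1, "Roads", "Road centrelines.", "2005-01-10"),
    (2, "Rivers", "River network.", "2006-03-02"),
])

export_dir = tempfile.mkdtemp()
exported = export_records(conn, export_dir, since="2006-01-01")
```

Only the record changed after the given date is written out, so the database remains the single authoritative copy and the exported files are disposable derivatives.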

For organisations wishing to maintain current management practices based on a range of unconnected tools, metadata integrity problems and data silos may be addressed using a system based around formal synchronisation (Figure 4). Developing the approach of Dunfey et al. (In press), a highly formalised procedure, a synchronisation file which acts as a road map for the system, or a synchronisation daemon could provide a means of reconciling otherwise unconnected metadata instances. However, complexities associated with the synchronisation of multiple files would suggest that a way forward based on a centralised DBMS is preferable.

15 For example Compusult's MetaManager Toolkit, ESRI's ArcIMS Metadata Service and Intergraph's GeoConnect Metadata Management Server

Figure 4: Metadata synchronisation. A metadata master copy is updated / created and synchronisation is initiated. Pre-existing metadata is updated or overwritten according to storage strategy employed; new instances are imported or copied. Discovery metadata can be directly copied or exported to the gigateway node; detailed metadata must first be transformed. (Diagram elements: metadata source(s); daemon / synchronisation file; database storage; flat-file storage; gigateway node. Diagram actions: create, update; import, update; export, transform; copy / export, transform; copy, overwrite; copy, transform; query / response.)
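A simple form of the synchronisation step can be sketched by comparing content digests of each metadata instance with the master copy and overwriting those that differ. This is an assumption-laden illustration rather than the procedure of Dunfey et al.: the `synchronise` function is hypothetical, all instances are treated as plain files, and the transformation from detailed to discovery metadata that Figure 4 requires is deliberately omitted.

```python
import hashlib
import shutil
from pathlib import Path


def digest(path):
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def synchronise(master, copies):
    """Overwrite every copy that is missing or differs from the master.

    Returns the list of copies that were created or updated.
    """
    master = Path(master)
    master_digest = digest(master)
    updated = []
    for copy in map(Path, copies):
        if not copy.exists() or digest(copy) != master_digest:
            copy.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(master, copy)
            updated.append(copy)
    return updated
```

Run on a schedule, such a routine plays the role of the daemon in Figure 4, with the list of copy locations standing in for the synchronisation file.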

Node hosting

Organisations can contribute metadata to gigateway by transferring records to an existing node (e.g. gigateway's centrally managed repository) or by exposing them on a node of their own. A distributed service architecture, where organisations are encouraged to host their own node, was always a design goal, intended to foster a sense of proprietorship and participation amongst contributors (Nanson et al., 1995). Despite this encouragement, there remain institutions with significant data holdings that cite internal political problems and technical issues16 as the cause of their inability or unwillingness to host a node. While surmounting these political obstacles may well pose the greater challenge, options do exist to address the technological concerns relating to node installation and maintenance.

Currently, mounting a node involves the installation and configuration of a number of distinct software packages. Perceptions that the process requires a high level of expertise result in setup being left to IT departments, outsourced to consultancies or indefinitely postponed where financial resources are insufficient. Adoption of the solutions proposed here will further exacerbate this. To circumvent these problems, the necessary components can be bundled into an automated installation, empowering non-specialists to set up and configure contributory nodes easily. A barrier to contribution amongst potential contributors can thereby be lowered, allowing previously untapped geospatial resources to be exposed. Nevertheless, important preconditions such as service level agreements and quality guarantees should be enforced to guard against casual participation which could negatively impact upon the gigateway service and users' confidence in it.

This model does not suit all, however: organisations may not have sufficient numbers of metadata records to justify contributing in such a way, they may not have the resources to install and maintain the required hardware and software, or they may simply be unwilling for reasons of cost, effort and so on. While transferring hosting responsibility elsewhere may get round local issues of node maintenance and the related costs, what will result is a further disconnect between the metadata and the data they describe.

16 AGI gigateway Advisory Group Meeting Minutes, 18th May 2005: http://www.gigateway.org.uk/aboutus/aboutus.html

Hosting by proxy

Participating organisations opting not to host their own metadata may expose their holdings from nodes mounted elsewhere, e.g. the central repository managed by the gigateway service. Submission may be by bulk transfer (e-mail, FTP, CD/DVD); those choosing the gigateway repository have the further option of submitting via the MetaGenie online editor. This comprises a web-based form which is completed to describe each dataset, generating records which still need to be manually processed. Regardless of approach, resources are necessary to ensure that the metadata are appropriate for publication.

To counter these manual processing requirements and consequent update latency concerns, an automated metadata harvesting facility could be introduced to the metadata generation-publication workflow, using for instance the library community's Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)17. Standing in contrast to the approach employed by Z39.50 solutions, OAI-PMH retrieves metadata in bulk into a central repository. Conceptually it may be viewed as substituting one node type (Z39.50) for another (OAI-PMH), but as there is little maintenance overhead aside from creating a web accessible folder, the approach may mitigate some concerns associated with its management. Moreover, as the protocol is HTTP-based, no additional configuration and security measures are necessary beyond those required for standard web servers (Amin, 2003). Using this as a method for contribution, participating organisations can at will deposit validated metadata into the web accessible folder, from where they will be automatically harvested; no dialogue need be opened between supplier and host.

17 http://www.openarchives.org/

Update latency concerns meanwhile can be alleviated by scheduling frequent harvests. Furthermore, as long as metadata quality, validity and adherence to GEMINI can be assured, processing resources at the host site are spared and metadata can be exposed immediately.
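From the harvester's side, a scheduled OAI-PMH harvest amounts to issuing `ListRecords` requests (optionally restricted with a `from` date) and following any `resumptionToken` returned. The sketch below builds such a request and parses a response; the base URL, the sample response and the function names are invented for illustration, `oai_dc` is used because it is the format every OAI-PMH repository must support (a GEMINI-specific `metadataPrefix` would be an assumption), and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumption token is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, datestamp), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = [
        (h.findtext("oai:identifier", namespaces=OAI_NS),
         h.findtext("oai:datestamp", namespaces=OAI_NS))
        for h in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)


# Invented response fragment, trimmed to the elements parsed above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header>
      <identifier>oai:example.org:roads</identifier>
      <datestamp>2006-03-02</datestamp>
    </header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai", from_date="2006-01-01")
records, token = parse_list_records(SAMPLE)
```

Passing the date of the previous harvest as `from_date` keeps each scheduled run incremental, which is what allows harvests to be frequent without re-transferring unchanged records.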

Metadata currency and quality

The role of providers does not end with metadata submission: they are responsible for ensuring that their metadata continue to accurately reflect the associated data. Within the GEMINI standard, currency is partially catered for through the 'Date of update of metadata' element. There is an argument that, no matter how frequently a dataset is revised (indeed, if at all), a regularly maintained 'Date of update of metadata' field confers confidence in the currency of the metadata record. For static or infrequently updated datasets, however, an old 'Date of update of metadata' is likely to suggest that the data asset is outmoded and therefore less useful. Given that it would be inappropriate to update the 'Date of update of metadata' field where a review has been undertaken but no actual update has taken place, this highlights the need for a 'Date of metadata reviewed' field, currently absent from GEMINI.

Evidence from observations of the service18 nevertheless suggests that such elements are mostly ignored by producers after publishing their metadata. This could be tackled using a regular automated email notification mechanism based upon the 'Date' element and 'Email address of distributor' metadata fields. A quality stamp19 associated with each metadata item would complement this and enhance user confidence. Providing a system by which metadata can be rated for quality, either independently or by user feedback, would allow records to be evaluated at a glance as well as place an onus on contributors to maintain metadata quality, thus assuring their reputation. Additionally, using the 'Date' elements as criteria for evaluating quality provides impetus to distributors to review and maintain records on a systematic basis.
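The notification mechanism suggested above reduces to selecting records whose metadata date is older than some review period and composing a reminder for the distributor. The following sketch assumes records held as dictionaries with hypothetical keys `metadata_updated` and `distributor_email`, and an arbitrary one-year review period; it stops short of sending mail.

```python
from datetime import date, timedelta


def records_due_for_review(records, today, max_age_days=365):
    """Select records whose metadata date is older than the review period."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records
            if date.fromisoformat(r["metadata_updated"]) < cutoff]


def reminder_message(record):
    """Return (recipient, message body) for one overdue record."""
    body = (f"Please review the metadata for '{record['title']}' "
            f"(last updated {record['metadata_updated']}).")
    return record["distributor_email"], body


# Invented sample records.
RECORDS = [
    {"title": "Roads", "metadata_updated": "2004-11-17",
     "distributor_email": "roads@example.org"},
    {"title": "Rivers", "metadata_updated": "2006-01-15",
     "distributor_email": "rivers@example.org"},
]

due = records_due_for_review(RECORDS, today=date(2006, 6, 1))
reminders = [reminder_message(r) for r in due]
```

The same age calculation could feed a quality rating, so that records reviewed within the period score higher than those left untouched.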

Data access

Complications relating to metadata not accurately reflecting the underlying data extend beyond contributors, affecting the end-users of the service. There is a twofold problem: do the records returned unambiguously represent the data sought, and if so, how are the data obtained? Whilst some standards (e.g. gigateway's Discovery Metadata Specification, the forerunner to GEMINI) attempt to address these concerns by providing a sample field (containing a visual representation of the data) as well as the contact details for the data distributor, neither fully addresses this issue. Evidence from existing records suggests that the sample field is rarely used, arguably as it requires more effort. Similarly, supplier contact details may not guide the user directly to the data even when traditional contact routes (telephone, FAX, postal address) are supplemented by a web URL20. The latter typically signposts the distributor's homepage, where the data must again be located. The degree of separation between data and metadata consequently disrupts workflow efficiency for the prospective user and can result in the ordering of an inappropriate product.

18 AGI gigateway Advisory Group Meeting Minutes, 17th Nov 2004: http://www.gigateway.org.uk/aboutus/aboutus.html
19 Guidelines for creating gigateway approved metadata exist but the proposed quality stamp has yet to be effectively associated with hosted records. Further details may be found in: http://www.gigateway.org.uk/metadata/downloads/Gigateway_metadata_guidelines_ukgemini.pdf

Providing an efficient means of accessing a more current representation of the data prior to procurement will go some way towards alleviating this problem. The UK GEMINI standard presents an improved treatment for visualisation by providing a field [21] for a URL pointing directly to a representation of the data, which is not currently exploited by gigateway. Whether the data are licensed or freely available, presenting a shop window for them via a live preview increases the likelihood that they will be pursued. Complementing this with a facility for immediate download will boost workflow efficiency as well as help realise one of the basic objectives of gigateway: to promote geospatial asset exchange.

The use of the Browse graphic element could be extended to contain a URL pointing to, for instance, an OGC-compliant [22] Web Feature Server (WFS), such that custodians of unlicensed data have an opportunity to deliver the actual data via the same means as a visualisation. For the provider, resources required to administer such a service are offset by the time spared from fulfilling data requests; for the user, waiting times associated with data procurement are significantly reduced.

[20] Uniform Resource Locator

[21] The Browse graphic element

[22] Open Geospatial Consortium
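As a rough illustration of what such a URL might look like, the sketch below assembles a WFS GetFeature request using the standard key-value-pair parameters (`SERVICE`, `VERSION`, `REQUEST`, `TYPENAME`, `MAXFEATURES`). The endpoint, the feature type name and the choice of version 1.0.0 are assumptions made for the example; a real custodian would substitute the details of their own server.

```python
from urllib.parse import urlencode


def wfs_getfeature_url(endpoint, type_name, max_features=None):
    """Build a WFS GetFeature URL that could be stored in a metadata record."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",      # assumed version for illustration
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
    }
    if max_features is not None:
        # Optional cap on the number of features returned.
        params["MAXFEATURES"] = str(max_features)
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical record: the URL is held alongside the descriptive metadata.
record = {
    "title": "Example road network",
    "browse_graphic": wfs_getfeature_url("http://example.org/wfs", "roads", 10),
}
```

Storing the complete request in the record means a client needs no knowledge of the server beyond the URL itself.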

Providing access to commercial data clearly requires the inclusion of a transaction model that takes account both of direct payments and of transactions subject to service-level licensing agreements (Figure 5). Visualisation can be permitted via an OGC-compliant Web Map Service (WMS), which renders a picture of the data rather than delivering the data themselves. Subscribing organisations could download the data following a secure login, enabling retrieval in volumes or units dictated by the licensing model agreed. Individual users can be catered for through solutions provided by Internet Payment Service Providers (IPSPs, e.g. PayPal).

[Figure 5 diagram. Components: user, user / corporate account, transaction processor, metadata, dataset, WFS, WMS. Labelled flows: account creation, license agreement, account verification, visualise, download, procure, approve transaction.]

Figure 5: Adding data visualisation, access and transaction support to gigateway. OGC-compliant Web Feature Services (WFS) provide visualisation and access to free and purchased data. Web Map Services (WMS) provide a means of visualising licensed content without providing access to the underlying data. Purchased vector data can be downloaded as a feature set via WFS; imagery can be downloaded in compressed file format.
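The access rules summarised in the Figure 5 caption can be expressed as a small decision function. The sketch below is one possible reading of the figure, not a specification of it: the `Dataset` and `Account` classes, the `entitlements` set and both function names are hypothetical, and payment handling is reduced to a single approval step.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    name: str
    licensed: bool


@dataclass
class Account:
    holder: str
    verified: bool = False
    entitlements: set = field(default_factory=set)  # names of procured datasets


def service_for(account, dataset):
    """Return the service through which this account may reach the dataset."""
    if not dataset.licensed:
        return "WFS"    # free data: visualise and download
    if account.verified and dataset.name in account.entitlements:
        return "WFS"    # purchased or subscribed data
    return "WMS"        # licensed data: picture only


def approve_transaction(account, dataset):
    """Grant download rights once the account has been verified."""
    if not account.verified:
        return False
    account.entitlements.add(dataset.name)
    return True
```

Keeping the rule in one function means the same check can guard both the preview and the download routes.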

Underlying architecture
While provided in the context of the architecture of gigateway, none of the solutions elaborated herein is considered tightly bound to the current service infrastructure. The reality is that all geospatial data-sharing initiatives based on metadata will encounter similar issues regarding metadata authoring, quality, currency, accessibility and integrity. With this in mind, the beginnings of a more extensive overhaul of gigateway can be contemplated.

Since the rollout of askGIraffe, the UK metadata service has exclusively employed the Z39.50-based Isite. Although this approach is flexible and efficient for near-transparent querying of multiple metadata repositories, some argue that the Z39.50 protocol is functionally limited in a number of ways, particularly when it comes to the representation of results (Tsou, 2002). Troll and Moen (2001) question Z39.50's ongoing utility given its complexity and interoperability handicaps. Different flavours of the protocol provide varying degrees of support for spatial searching, and its ability to scale has also been called into question (Medyckyj-Scott et al., 2001; Amin, 2003). Rocha and Henriques (2004) meanwhile argue that the changing face of geographical information services, with increased demand for mobile solutions, real-time, data-ready applications and the long-term aim of data retrieval in the absence of human mediation, dictates the adoption of a different paradigm.

The emerging OGC Catalogue Service Specification 2.x (OGC, 2005) aims to provide for such a different paradigm. Adhering to the trend in which the development of geographical information technologies continues to be more closely aligned with the mainstream IT industry and interoperability efforts (Higgins et al., 2005), the Specification details an open, standard interface that enables diverse but conformant applications to perform discovery, browse and query operations against distributed and potentially heterogeneous catalog servers [23]. The Specification defines a number of communication protocols (bindings) based on CORBA, HTTP and a new iteration of Z39.50, and adherence to it enables the creation of custom applications through the use of application profiles. Interoperability between different bindings is enabled through the use of a minimal abstract OGC_Common Catalogue Query Language, which provides further support for spatial query constructs including DISJOINT, INTERSECT, WITHIN and OVERLAP (OGC, 2005).

[23] OGC Press release: http://www.opengeospatial.org/press/?page=pressrelease&prid=188
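To give a feel for the spatial query constructs named in the Specification, the sketch below evaluates DISJOINT, INTERSECT, WITHIN and OVERLAP for axis-aligned bounding boxes. It is a deliberate simplification: the operators in the Specification apply to general geometries, whereas the `BBox` type, the function names and the `search` helper here are assumptions made for illustration, using the kind of extent that a discovery metadata record typically carries.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def disjoint(a, b):
    """True when the boxes share no point at all."""
    return (a.max_x < b.min_x or b.max_x < a.min_x or
            a.max_y < b.min_y or b.max_y < a.min_y)


def intersects(a, b):
    """True when the boxes share at least one point."""
    return not disjoint(a, b)


def within(a, b):
    """True when box a lies entirely inside box b."""
    return (b.min_x <= a.min_x and a.max_x <= b.max_x and
            b.min_y <= a.min_y and a.max_y <= b.max_y)


def overlaps(a, b):
    """True when the interiors meet but neither box contains the other."""
    interiors_meet = (a.min_x < b.max_x and b.min_x < a.max_x and
                      a.min_y < b.max_y and b.min_y < a.max_y)
    return interiors_meet and not within(a, b) and not within(b, a)


def search(records, query, predicate):
    """Return the names of records whose extent satisfies the predicate."""
    return [name for name, extent in records if predicate(extent, query)]
```

A catalogue client could then select, say, only those records whose extent falls wholly within the area of interest by passing `within` as the predicate.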

Despite outlining a more sophisticated, yet open, treatment for geospatial resource discovery, the Catalogue Service Specification 2.x remains an abstract specification with few well-tested or mature implementations; the communication protocol predominantly relied upon remains a legacy version of Z39.50. OAI-PMH is a notable exception, but is promoted as a complementary rather than an alternative technology (Breeding, 2002). Commercially developed solutions [24] meanwhile do provide sophisticated alternatives that integrate data storage, querying, middleware, desktop and Web clients into a coherent software stack, but concerns relating to cross-platform support and community acceptance may preclude their adoption.

However, the availability of the OGC Geospatial Portal Reference Architecture (OGC, 2004; Figure 6) provides a new basis for commercial and open source solutions. The architecture offers specifications which allow a core system to be implemented, for example the GeoNetwork Metadata Catalogue Server, a collaborative development effort led by the FAO, UNEP and WFP [25]. Implementing the architecture's portal and catalogue components, GeoNetwork continues to be based on Z39.50, thus offering the potential for incorporation within or replacement of the current gigateway architecture. Whilst not offering a departure from the protocol as espoused by Tsou (2002) or Rocha and Henriques (2004), its open, modular architecture provides scope to replace the communication protocol as laid out in the OGC's Catalogue Service Specification, as well as allowing interoperability with national and international schemes, as propounded by the INSPIRE directive.

[24] For instance ESRI's GIS Portal Toolkit and products from MapInfo and Intergraph

[25] Food and Agriculture Organisation, the United Nations Environment Programme and World Food Programme

[Figure 6 diagram. Components linked via the Internet: Portal Services (viewers, web query interfaces, access management); Portrayal Services (features, coverages, maps); Data Services (content access, data processing); Catalog Services (data discovery, service discovery, data querying).]

Figure 6: The OGC's Portal Reference Architecture (adapted from GeoNetwork homepage, http://193.43.36.138/). GeoNetwork provides for core Portal and Catalog Services, into which existing Portrayal Services (e.g. MapServer, GeoServer) and emerging Data Services may be incorporated.

Further considerations
Prospective efforts to reinvigorate the gigateway service will predictably be fraught with difficulty. Future visions of how the service is manifested aside, questions certainly arise as to the prudence of jeopardising a long history of investment in the current technology, infrastructure and expertise. Even with its oft-perceived limitations, the current infrastructure's track record is proven within the UK context, underpinning what remains a popular and dependable service. Considering the diversity of gigateway's stakeholders and the resistance to change witnessed in some quarters, strong reasoning for any proposed modifications will be necessary. Even if a consensus is forthcoming, damage to gigateway's reputation could prove fatal if an enhanced service proves unreliable or does not live up to user expectations. Of course, initiating and maintaining prospective changes in service paradigm is contingent on whether the necessary financial and human resources are forthcoming.

Some of the development paths elaborated above raise further issues. With respect to coupling automated metadata generation with dataset editing workflows, the lack of open, standard geo-interfaces or Application Programming Interfaces (APIs) across the GIS industry currently precludes the creation of a universal solution, thereby necessitating the development of package-specific strategies.

Automating metadata management and submission processes will serve to reduce the resources necessary for contribution to gigateway, but does underline the need for quality and validation safeguards to ensure that inappropriate records are not exposed on the service. Any implementation of the solutions suggested above should therefore be supplemented with systematic human-mediated quality control performed by appropriately trained users, whether on a spot-check basis or by brute force evaluation of all metadata items processed. Similar deliberations are necessary if there is to be a system supporting the independent accreditation of metadata posted on the service. As for quality benchmarks, steps aimed at converging the current gigateway approved stamp with international accreditation schemes (such as those guided by ISO) should be made to facilitate cross-application compatibility and adoption.
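A validation safeguard of this kind might, in its simplest form, hold back any record that fails basic checks so that a trained reviewer can inspect it. The sketch below assumes a record is a plain dictionary; the field names in `REQUIRED_FIELDS`, the email pattern and the future-date rule are illustrative choices only and are not taken from GEMINI or any other standard.

```python
import re
from datetime import date

# Illustrative field names; not the element list of any actual standard.
REQUIRED_FIELDS = ("title", "abstract", "date", "distributor_email")
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(record, today):
    """Return a list of problems found in a record (empty if none)."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append(f"missing {name}")
    email = record.get("distributor_email")
    if email and not EMAIL_PATTERN.match(email):
        problems.append("malformed distributor_email")
    record_date = record.get("date")
    if isinstance(record_date, date) and record_date > today:
        problems.append("date is in the future")
    return problems


def triage(records, today):
    """Split records into those safe to publish and those needing review."""
    publish, review = [], []
    for record in records:
        (review if validate(record, today) else publish).append(record)
    return publish, review
```

Automated checks of this sort can only catch structural faults; whether a record describes its dataset appropriately remains a matter for human judgement.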

For organisations with few records, or datasets that change infrequently, manually generating, updating and submitting records may well represent the preferred way forward. Similarly, a preference for retaining some manual control over automated processes should not be discounted, particularly among those with well-defined protocols already in place or those reluctant to yield control to what may be perceived as a black box procedure. In any case, the focus should remain on promoting quality contribution to gigateway, not on imposing further layers of complexity on the process where they are neither wanted nor warranted.

DBMS techniques would by their nature provide for better management and integrity of metadata within the gigateway service. While it can be argued that this would add significantly to the complexity of the system, the well-established interfaces to DBMS based on SQL (Structured Query Language) should render such components appropriately modular. Although cost may be raised as a concern, free and open source software (FOSS) such as MySQL and PostgreSQL are viable options.
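The integrity benefit can be illustrated with a few declarative constraints. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in for the MySQL or Postgres systems mentioned; the table and column names are hypothetical and do not reflect gigateway's actual storage.

```python
import sqlite3


def open_catalogue():
    """Create an in-memory catalogue whose schema enforces basic integrity."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled
    conn.executescript("""
        CREATE TABLE distributor (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL UNIQUE
        );
        CREATE TABLE metadata_record (
            id             INTEGER PRIMARY KEY,
            title          TEXT NOT NULL CHECK (length(title) > 0),
            record_date    TEXT NOT NULL,
            distributor_id INTEGER NOT NULL REFERENCES distributor(id)
        );
    """)
    return conn


def try_insert(conn, title, record_date, distributor_id):
    """Insert a record; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO metadata_record (title, record_date, distributor_id) "
                "VALUES (?, ?, ?)",
                (title, record_date, distributor_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the rules live in the schema, every application that writes to the store is held to them, whichever client or language it uses.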

Z39.50 has been criticised for its failings in relation to geographical metadata. However, the advent of geo-centric extensions, together with more recent developments associated with the OGC Catalogue Service Specifications, overcomes some of these concerns. The ability to access metadata and aggregate search results through more modern protocols such as HTTP GET and POST requests integrates metadata access more closely with standard web-based systems. The key, for gigateway, is to provide a transparent transition from the old to the new.
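By way of illustration, a catalogue search issued as an HTTP GET request might be assembled as in the sketch below. The parameter names are assumed to follow the key-value-pair encoding of the HTTP binding as published in the later 2.0.2 revision of the Catalogue Services Specification and should be checked against whichever version a server actually implements; the endpoint and the example constraint are hypothetical.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint, cql_constraint, max_records=10):
    """Build a GetRecords request as a plain HTTP GET URL."""
    params = {
        "service": "CSW",
        "version": "2.0.2",                      # assumed version
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "brief",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": cql_constraint,            # free-text or spatial filter
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and constraint, for illustration only.
url = csw_getrecords_url(
    "http://catalogue.example.org/csw",
    "AnyText like '%roads%'",
)
```

A request of this form can be issued by any web client, which is the closer integration with standard web-based systems that the text has in mind.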

Both proprietary and FOSS solutions have been discussed; each has its place, with its own particular advantages and disadvantages. Proprietary systems can be argued to offer stability and less risk, and to provide buy-in to a ready-made, hopefully well-tested product complete with support. Yet they can prove expensive. FOSS can provide a less expensive alternative, although it is rarely completely free, often requiring specialised expertise whether in-house or outsourced. What is crucial is to ensure the modularity of components linked by standardised interfaces, such that there is no dependence on either proprietary or FOSS components because any of them can be readily replaced.

Additional flexibility can be conferred by providing the aforementioned software complete with its source code, whether crafted in proprietary or open environments. While universally applicable solutions are presented, enabling access to the inner workings of such software will ease integration efforts with incumbent configurations that invariably differ between organisations. Moreover, by providing support for facilities similar to those of the online open source communities (e.g. SourceForge), namely a code repository and a user forum, enthusiastic participants can further develop, discuss and distribute provided solutions in a collaborative setting to the benefit of the wider participating community.

While the current work is presented in the context of gigateway and the GEMINI discovery schema, it is important that any implemented solution not be tightly bound to any one particular standard. The state of flux and delays associated with standard stabilisation efforts (GEMINI itself is yet to be finalised), the need to implement metadata profile extensions and the emergence of new profiles and schemas all make the ability to substitute one standard for another a functional requirement for the adopted solutions.
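One way of keeping a solution loosely coupled to any single standard is to hold records in a neutral internal form and to describe each standard as a separate crosswalk. The sketch below assumes flat, dictionary-based records; the profile names and every element name in `CROSSWALKS` are invented for illustration and are not the element names of GEMINI, ISO 19115 or any other schema.

```python
# Each crosswalk maps a neutral field name to the element name used by
# one profile. All names here are hypothetical.
CROSSWALKS = {
    "profile_a": {"title": "Title", "summary": "Abstract", "date": "Date"},
    "profile_b": {
        "title": "dataset_title",
        "summary": "description",
        "date": "reference_date",
    },
}


def to_neutral(record, profile):
    """Translate a profile-specific record into neutral field names."""
    mapping = CROSSWALKS[profile]
    return {
        neutral: record[element]
        for neutral, element in mapping.items()
        if element in record
    }


def from_neutral(neutral_record, profile):
    """Translate a neutral record into one profile's element names."""
    mapping = CROSSWALKS[profile]
    return {mapping[k]: v for k, v in neutral_record.items() if k in mapping}


def convert(record, source, target):
    """Convert a record between profiles via the neutral form."""
    return from_neutral(to_neutral(record, source), target)
```

Under this arrangement, supporting a new or revised profile means adding one crosswalk entry and leaves the code that stores, validates or serves records unchanged.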

Robust account management, secure access and scalable computing resources are critical, particularly if facilities for data visualisation and download succeed in attracting more users to the service. Usage should therefore be closely monitored to enable a proactive response to potential increases in traffic volumes. With the proposed provision of both free and licensed data, such monitoring systems could be extended to support data procurement analyses and feedback mechanisms, as well as offer a potential test-bed in which the implications of supplying free versus licensed data can be analysed.
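The kind of monitoring envisaged could start from something as simple as the sketch below, which summarises a request log by day and by licence type. The log format, a list of `(day, dataset, licensed)` tuples, and the capacity threshold are assumptions made for the example; a production service would draw on its web server or application logs instead.

```python
from collections import Counter
from datetime import date


def daily_totals(request_log):
    """Count requests per day from (day, dataset, licensed) tuples."""
    return Counter(day for day, _dataset, _licensed in request_log)


def days_over_capacity(request_log, threshold):
    """List the days on which traffic exceeded the given threshold."""
    totals = daily_totals(request_log)
    return sorted(day for day, count in totals.items() if count > threshold)


def free_vs_licensed(request_log):
    """Return (free, licensed) request counts for procurement analysis."""
    counts = Counter(
        "licensed" if licensed else "free"
        for _day, _dataset, licensed in request_log
    )
    return counts["free"], counts["licensed"]
```

Tracking the split between free and licensed requests over time is one way in which the service could act as the test-bed described here.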

The INSPIRE and PSI directives alone will most likely be insufficient to provide the momentum needed to drive the metadata generation and contribution required for tapping underexploited geospatial resources. Marketing campaigns and educational drives (such as workshops and seminars) can continue to complement and reinforce other facets of data sharing initiatives, and help raise awareness of the benefits of, and lower barriers to, participation.

Future prospects
While a range of technical options for advancing the gigateway initiative has been presented, what must be borne in mind is that the current service is simply a means to an end, not the end itself. The overall goal of the service is to facilitate the discovery and exchange of geospatial data, not the metadata used to describe it. Nevertheless, a central objective in any such service should be to make metadata easier to create, maintain, publish and locate. Logical extensions to this are to provide a means of accessing the data such surrogates depict and to provide linkages to other, similar schemes at national and international level.

Considering the diverse nature of stakeholders involved in gigateway, any decision on how to evolve the service will never be based purely on the technological. Indeed, there remains an urgent need to resolve the aforementioned political issues and to garner consensus amongst both those directing the service and those contributing to it, not only regarding a future direction but also regarding where the funding for service upkeep, improvement and potential overhaul will be sourced. Fundamental decisions must be made relating to the overall objectives of the service and how it should be manifested, such as whether it should persist as a metadata service, or whether opportunities presented by promising technologies should be taken to broaden gigateway's scope, as suggested above. Any assessment will clearly be tempered by a number of considerations. Interoperability with other services must remain a critical factor, particularly in light of legislative requirements at national and European level. The need to maintain the service's standing in the face of emerging initiatives more in tune with both contributor and consumer expectations is also crucial to avoid perceptions of complacency and the resulting implications for the numbers contributing to and exploiting gigateway. Whatever path is ultimately taken, the overall objective should not only be the realisation of a service befitting an internationally visible initiative, but one that its users view as being fit for purpose.

References
Amin, S. (2003). The Open Archives Initiative Protocol for Metadata Harvesting: An Introduction. DRTC Workshop on Digital Libraries: Theory and Practice, Bangalore, India: DRTC.

Breeding, M. (2002). The Open Archives Initiative. <http://www.librarytechnology.org/ltg-displaytext.pl?RC=9627> Accessed 04-06-2006.

Date, C. J. (2003). An Introduction to Database Systems, Eighth Edition. Boston, MA: Addison Wesley.

Davey, A. and Murray, K. (1996). Update on the National Geospatial Database Collaboration between Organisations. In AGI 96 Conference Proceedings: Geographic Information Towards the Millenium, Birmingham, UK. AGI.

Deng, Y. (2002). The Metadata Architecture for Data Management in Web-based Choropleth Maps. <http://www.cs.umd.edu/projects/hcil/census/JavaProto/metadata.pdf> Accessed 04-08-2006.

Dunfey, R. I., Gittings, B. M. and Batcheller, J. K. (In press). Towards an Open Architecture Vector GIS. Computers and GeoSciences.

ESRI. (2002). Metadata and GIS. Available from http://www.esri.com/.

Gigateway (2003). Discovery Metadata Specifications. Available from http://www.gigateway.org.uk/.

Göbel, S. and Lutze, K. (1998). Development of meta databases for geospatial data in the WWW. In Proceedings of the 6th international symposium on Advances in geographic information systems, Washington, United States. ACM Press.

Guptill, S. G. (1999). Metadata and data catalogues. In P. Longley, M. F. Goodchild, D. J. Maguire and D. W. Rhind, Geographical Information Systems (pp.677-692). Chichester: Wiley.

Harder, C. (1998). Serving Maps on the Internet: Geographic Information on the World Wide Web. Redlands, CA: ESRI, Inc.

Hart, D. and Phillips, H. (2001). Metadata Primer - A "How To" Guide on Metadata Implementation. <www.lic.wisc.edu/metadata/metaprim.htm> Accessed 04-08-2006.

Higgins, C., Medyckyj-Scott, D. and Reid, J. (2003). A Community Specific SDI - the Case of UK Academia. In Geodaten- und Geodienste-Infrastrukturen - von der Forschung zur praktischen Anwendung, Münster, Germany. University of Münster.

Higgins, C., Robertson, A. and McGarva, G. (2005). Edinburgh University Data Library Geographic Information Standards: Final Report. Available from http://www.edina.ac.uk/.

Hobona, G., James, P. and Fairbairn, D. (2004). Facilitating Data Discovery In Environmental Data Clearinghouses Through Spatial Data Mining. In Proceedings of the GIS Research UK 12th Annual Conference, Norwich, UK. University of East Anglia.

Luo, Y., Wang, X. and Xu, Z. (2003). Extension of Spatial Metadata and Agent-based Spatial Data Navigation Mechanism. In GIS'03: Proceedings of the 11th ACM international symposium on Advances in geographic information systems, New Orleans, LA, USA. ACM.

Mathys, T. (2004). The Go-Geo! Portal Metadata Initiatives. In Proceedings of the GIS Research UK 12th Annual Conference, Norwich, UK. University of East Anglia.

Medyckyj-Scott, D., Chappell, C., Pradhan, A. and O'Hanlon, C. (2001). A geo-spatial data resource discovery tool for UK Further and Higher Education - Project Overview and Recommendations. Available from http://www.edina.ac.uk/.

Nanson, B., Smith, N. and Davey, A. (1995). What is the British National Geospatial Database? In AGI 95 Conference Proceedings: Expanding Your World, Birmingham, UK. AGI.

OGC (2004). Geospatial portal reference architecture: a community guide to implementing standards-based geospatial portals (OGC Draft Report No OGC 04-039). Open Geospatial Consortium.

OGC (2005). OGC Catalogue Services Specification 2.0.1 (OGC implementation specification 04-021r3).

Putz, S. (1994). Interactive information services using world-wide web hypertext. Computer Networks and ISDN Systems, 27, 273-280.

Rackham, L. (2004). An Independent Review of the Sustainability of a UK Metadata Service for Geographically Related Information. Available from http://www.gigateway.org.uk/.

Rocha, J. G. and Henriques, P. R. (2004). Towards XML Web Services based Clearinghouses. In Proceedings 7th Global Spatial Data Infrastructure Conference, Bangalore, India.

Schweitzer, P. N. (1998). GIS and Metadata - Putting Metadata in Plain Language. <http://www.geoplace.com/gw/1998/0998/998abc.asp> Accessed 04-06-2006.

Troll, D. and Moen, B. (2001). Report to the DLF on the Z39.50 Implementers' Group Moving Towards the Future of Z39.50. Issues and Options Based on ZIG Meeting Discussions December 6-7, 2000. Available from http://www.diglib.org/.

Tsou, M.-H. (2002). An Operational Metadata Framework for Searching, Indexing, and Retrieving Distributed Geographic Information Services on the Internet. In Egenhofer, M. and Mark, D., Geographic Information Science (GIScience 2002): Lecture Notes in Computer Science Vol. 2478 (pp.313-332). Berlin: Springer-Verlag.

Tulloch, D. L. and Robinson, M. (2000). A progress report on a U.S. National Survey of Geospatial Framework Data. Journal of Government Information, 27, 285-298.

Tyler, G. T. (2002). Managing Metadata: Developing technical solutions for the askGIraffe geospatial metadata gateway. Unpublished MSc. Thesis, University of Edinburgh, Edinburgh.

Vermeij, B. (2001). Implementing European Metadata Using ArcCatalog: ArcUser July-September 2001. Available from http://www.esri.com/news/arcuser/.
