DiGIR 1
Project Goals
To define a protocol for retrieving
structured data from multiple,
heterogeneous databases
To build a reference implementation of
said protocol
DiGIR 2
Design Goals
To use open protocols and standards, such
as HTTP, XML, and UDDI, to leverage
existing and emerging technologies
To de-couple the protocol, software and
semantics
To automate the establishment of a new
data provider as much as possible
DiGIR 3
High-level Architecture
Protocol
Provider
Portal
Registry
DiGIR 4
Protocol
Defines request and response message
formats for communication between
Provider and Portal
Assumes Providers conform to a known
federation schema
Remains flexible to allow for federation
schema pluggability
DiGIR 5
Provider
Makes structured data
available to portals
Communicates via protocol-compliant
messaging only
Complies with a known
federation schema
Supplies meta-data to
describe data classification
and availability
DiGIR 6
Portal
The entry point for a “user”
Can make requests of any number
of providers
Communicates via protocol-compliant
messaging only
Queries registry for available
providers
Can determine, based on
provider meta-data, whether
a provider should be queried
DiGIR 7
Project Information
The DiGIR project is a collaborative effort
DiGIR is currently established as an open
source project on SourceForge
(http://sourceforge.net).
Further documentation is available on the
SourceForge site.
Please join us in collaborating!
DiGIR 8
Protocol Details
DiGIR 9
Protocol Details
Specified in an XML Schema (.xsd)
Intended to work in conjunction with
federation schemas, also expressed as
XML Schemas
Actual request and response documents are
instance documents conforming to both the
protocol schema and a federation schema
DiGIR 10
<request xmlns="http://www.namespaceTBD.org/digir"
         xmlns:darwin="http://www.namespaceTBD.org/darwin"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.namespaceTBD.org/digir digir.xsd
                             http://www.namespaceTBD.org/darwin darwin.xsd">
  <header>
    <requestType>search</requestType>
  </header>
  <search>
    <dbName>myDiggableBipesDB</dbName>
    <filter>
      <and>
        <in>
          <list xsi:type="darwin:list">
            <darwin:Month>11</darwin:Month>
            <darwin:Month>12</darwin:Month>
          </list>
        </in>
        <equals>
          <darwin:Genus>Bipes</darwin:Genus>
        </equals>
      </and>
    </filter>
    <records start="0" count="50" />
  </search>
</request>
DiGIR 11
Request Explanation
Composed of elements from the protocol
namespace (default) and the schema namespace
<header> contains information about the payload
<search> contains dbName, filter, and record
specification (will also specify result format)
<filter> is effectively an XML representation of a
SQL where clause
This search request is for the first 50 specimen
records that are genus Bipes and were found in
the months of November or December.
DiGIR 12
Filter Building
LOPs (logical operators), can be nested
<and>
<or>
<andNot>
<orNot>
COPs (comparison ops)
<equals>
<lessThan>
<lessThanOrEquals>
<notEquals>
<greaterThan>
<greaterThanOrEquals>
<like>
<in> (multi value)
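As an illustration of how these operators nest, here is a minimal Python sketch that builds a filter element tree with the standard library. The helper names (cop, in_list, lop, build_filter) are hypothetical. The namespaces are the placeholders used in the example request. Treating each LOP as taking exactly two operands follows that single example and is an assumption, and the xsi:type attribute on <list> is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces, copied from the example request in the slides.
DIGIR_NS = "http://www.namespaceTBD.org/digir"
DARWIN_NS = "http://www.namespaceTBD.org/darwin"

LOPS = {"and", "or", "andNot", "orNot"}
COPS = {"equals", "notEquals", "lessThan", "lessThanOrEquals",
        "greaterThan", "greaterThanOrEquals", "like"}


def cop(op, concept, value):
    """Build a single-value comparison, e.g. <equals><darwin:Genus>Bipes</...></equals>."""
    if op not in COPS:
        raise ValueError(f"unknown comparison operator: {op}")
    node = ET.Element(f"{{{DIGIR_NS}}}{op}")
    ET.SubElement(node, f"{{{DARWIN_NS}}}{concept}").text = str(value)
    return node


def in_list(concept, values):
    """Build the multi-value <in> operator around a <list> of concept elements."""
    node = ET.Element(f"{{{DIGIR_NS}}}in")
    lst = ET.SubElement(node, f"{{{DIGIR_NS}}}list")
    for value in values:
        ET.SubElement(lst, f"{{{DARWIN_NS}}}{concept}").text = str(value)
    return node


def lop(op, left, right):
    """Combine two operands (each a LOP or a COP) under a logical operator."""
    if op not in LOPS:
        raise ValueError(f"unknown logical operator: {op}")
    node = ET.Element(f"{{{DIGIR_NS}}}{op}")
    node.extend([left, right])
    return node


def build_filter(condition):
    """Wrap the top-level condition in a <filter> element."""
    flt = ET.Element(f"{{{DIGIR_NS}}}filter")
    flt.append(condition)
    return flt
```

The filter from the example request would then be built as build_filter(lop("and", in_list("Month", [11, 12]), cop("equals", "Genus", "Bipes"))).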
DiGIR 13
What “binds” the schemas?
The protocol schema defines various abstract
types and elements:
<xsd:element name="searchCondition" abstract="true" />
<xsd:element name="alphaSearchCondition" abstract="true"
             substitutionGroup="searchCondition" />
<xsd:complexType name="listType" abstract="true" />
<xsd:complexType name="numericListType" abstract="true" />
DiGIR 14
<xsd:complexType name="list">
  <xsd:complexContent>
    <xsd:extension base="digir:listType">
      <xsd:sequence>
        <xsd:choice>
          <xsd:element ref="ScientificName" maxOccurs="unbounded" />
          <xsd:element ref="Kingdom" maxOccurs="unbounded" />
          <xsd:element ref="Phylum" maxOccurs="unbounded" />
          <xsd:element ref="Class" maxOccurs="unbounded" />
          <xsd:element ref="Order" maxOccurs="unbounded" />
          <xsd:element ref="Family" maxOccurs="unbounded" />
          <xsd:element ref="Genus" maxOccurs="unbounded" />
          <xsd:element ref="Species" maxOccurs="unbounded" />
          <…>
        </xsd:choice>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
DiGIR 15
Why “bind” like this?
To provide data-typing (string, numeric,
etc.) for various concepts within operators
at an abstract level (e.g. LIKE only valid
for string data; IN allows for multiples, but
in a controlled fashion)
To allow for federation schemas to simply
classify data as types without having to
redefine/extend operators
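A rough Python sketch of the data-typing idea: once each concept is classified by type, one rule per operator decides whether a comparison is allowed. The concept-to-type table and the function name are hypothetical. In DiGIR itself this classification would come from the federation schema through the abstract types and be enforced by XML Schema validation, not by code like this.

```python
# Hypothetical classification of concepts by data type; a real federation
# schema would express this by binding concepts to the abstract types.
CONCEPT_TYPES = {
    "Genus": "string",
    "ScientificName": "string",
    "Month": "numeric",
}

# Operators restricted to string data, per the slide (LIKE only).
STRING_ONLY_OPS = {"like"}


def operator_allowed(op, concept):
    """Return True if operator `op` may be applied to `concept`."""
    concept_type = CONCEPT_TYPES[concept]  # KeyError for unknown concepts
    if op in STRING_ONLY_OPS:
        return concept_type == "string"
    return True
```

Because the rule is keyed on the type rather than on the concept, adding a concept to the table needs no change to the operator logic, which is the point of binding the schemas this way.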
DiGIR 16
Request Issues
Do we need another abstract element such as
dateSearchCondition?
What information will be useful in the header?
How should we specify the format of the results?
What standard formats should be offered (e.g.
brief, full)?
Will tblName be part of the meta-data required of
providers?
What concepts of Darwin Core 2 are searchable?
DiGIR 17
Response Prototype
<response xmlns="http://www.namespaceTBD.org/digir"
          xmlns:darwin="http://www.namespaceTBD.org/darwin"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.namespaceTBD.org/digir digir.xsd
                              http://www.namespaceTBD.org/darwin darwin.xsd">
  <header>
    <!-- contents TBD -->
  </header>
  <content>
    <record>
    </record>
  </content>
  <diagnostics>
  </diagnostics>
</response>
DiGIR 18
Response Issues
How do we format and validate the response
content?
What elements are needed for the <header>, if
any?
Do we always have diagnostics, or only if there is
an error?
Should a finite set of diagnostics be created and
maintained in its own XML Schema? Will there
ever be a diagnostic that is specific to a federation
schema?
DiGIR 19
Provider Details
DiGIR 20
Provider Details
Implemented as a web application that answers questions
Interface is not specific to a particular information domain
No state information is recorded
Each request is treated as unique and uninfluenced by previous
requests
Must always generate a valid response
Consists of four key components
Request handler
Filter handler
Result set cache
Response generator
DiGIR 21
Request Handler
Receives XML document
Validates document
Generates internal structures for further
processing
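The three steps above might look roughly like the following Python sketch. It is not the reference implementation. The SearchRequest class and handle_request function are hypothetical names, the namespace is the placeholder from the example request, and the structural checks stand in for real XML Schema validation, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Placeholder protocol namespace from the example request.
NS = {"d": "http://www.namespaceTBD.org/digir"}
REQUEST_TAG = "{http://www.namespaceTBD.org/digir}request"


@dataclass
class SearchRequest:
    """Internal structure handed on to the later processing stages."""
    db_name: str
    condition: ET.Element  # top-level operator inside <filter>
    start: int
    count: int


def handle_request(xml_text):
    # 1. Receive the document; malformed XML raises ET.ParseError.
    root = ET.fromstring(xml_text)

    # 2. Validate (structural checks only, in place of schema validation).
    if root.tag != REQUEST_TAG:
        raise ValueError("root element must be <request>")
    if root.findtext("d:header/d:requestType", namespaces=NS) != "search":
        raise ValueError("this sketch handles search requests only")
    db_name = root.findtext("d:search/d:dbName", namespaces=NS)
    flt = root.find("d:search/d:filter", NS)
    records = root.find("d:search/d:records", NS)
    if db_name is None or flt is None or len(flt) != 1 or records is None:
        raise ValueError("search needs dbName, a one-condition filter, and records")

    # 3. Generate the internal structure.
    return SearchRequest(
        db_name=db_name,
        condition=flt[0],
        start=int(records.attrib["start"]),
        count=int(records.attrib["count"]),
    )
```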
DiGIR 22
Filter Handler
Internal structural representation of filter
(query) structure
Responsible for generating a native query
string for querying the database
Communicates with UDDI to obtain
standard database definition
Custom configured to work with specific
database implementation
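To make the native-query step concrete, here is a small Python sketch that walks a filter element tree and emits a parameterized SQL WHERE clause. Several things are assumptions: the column mapping is invented (a real provider would take it from its custom configuration), "andNot" and "orNot" are read as "AND NOT" and "OR NOT", and each logical operator is taken to have exactly two operands. The UDDI lookup is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from federation-schema concepts to local columns.
COLUMN_MAP = {"Genus": "genus", "Month": "collect_month"}

COPS = {"equals": "=", "notEquals": "<>", "lessThan": "<",
        "lessThanOrEquals": "<=", "greaterThan": ">",
        "greaterThanOrEquals": ">=", "like": "LIKE"}
# Assumed SQL readings of the logical operators.
LOPS = {"and": "AND", "or": "OR", "andNot": "AND NOT", "orNot": "OR NOT"}


def local_name(elem):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return elem.tag.rsplit("}", 1)[-1]


def to_where(node, params):
    """Return SQL text for `node`, appending bound values to `params`."""
    op = local_name(node)
    if op in LOPS:
        left, right = list(node)  # exactly two operands assumed
        return f"({to_where(left, params)} {LOPS[op]} {to_where(right, params)})"
    if op in COPS:
        (concept,) = list(node)  # one concept element per comparison
        params.append(concept.text)
        return f"{COLUMN_MAP[local_name(concept)]} {COPS[op]} ?"
    if op == "in":
        (lst,) = list(node)  # the <list> wrapper
        items = list(lst)
        params.extend(item.text for item in items)
        placeholders = ", ".join("?" for _ in items)
        return f"{COLUMN_MAP[local_name(items[0])]} IN ({placeholders})"
    raise ValueError(f"unsupported operator: {op}")
```

Binding values as parameters rather than splicing them into the string keeps request content out of the SQL text itself.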
DiGIR 23
Result Set Cache
Contains the results of applying a query
Responsible for generating the response
records in the requested format
Closely integrated with the
response generator
DiGIR 24
Response Generator
Generates the response XML document
Serializes the response header information
Serializes diagnostic information
Serializes the requested subset of records
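A minimal Python sketch of these serialization steps, following the shape of the response prototype. The header is left empty because its contents are still TBD. Writing each record as Darwin-namespace concept elements and each diagnostic as a <diagnostic> child are assumptions made for illustration, and generate_response is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces from the response prototype.
DIGIR_NS = "http://www.namespaceTBD.org/digir"
DARWIN_NS = "http://www.namespaceTBD.org/darwin"

# Serialize the protocol namespace as the default one, as in the prototype.
ET.register_namespace("", DIGIR_NS)
ET.register_namespace("darwin", DARWIN_NS)


def generate_response(rows, start, count, diagnostics=()):
    """Serialize rows[start:start+count] plus diagnostics as a response document."""
    root = ET.Element(f"{{{DIGIR_NS}}}response")

    # Header: contents are TBD in the slides, so it stays empty here.
    ET.SubElement(root, f"{{{DIGIR_NS}}}header")

    # Content: only the requested subset of records is written.
    content = ET.SubElement(root, f"{{{DIGIR_NS}}}content")
    for row in rows[start:start + count]:
        record = ET.SubElement(content, f"{{{DIGIR_NS}}}record")
        for concept, value in row.items():
            ET.SubElement(record, f"{{{DARWIN_NS}}}{concept}").text = str(value)

    # Diagnostics: <diagnostic> is a hypothetical child element name.
    diag = ET.SubElement(root, f"{{{DIGIR_NS}}}diagnostics")
    for message in diagnostics:
        ET.SubElement(diag, f"{{{DIGIR_NS}}}diagnostic").text = message

    return ET.tostring(root, encoding="unicode")
```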
DiGIR 25
Provider Configuration
[Diagram; labels: Portal, Profile, Schema, and two DiGIR Provider boxes, each paired with Data]
DiGIR 26
Portal Details
DiGIR 27
Portal Details
Divided into two distinct components: a
presentation layer and PortalServices
The presentation layer supports the UI and
translates requests (HTTP requests from forms or
links) into protocol-compliant XML requests
The presentation layer also handles all display
issues involving the responses, such as format,
sorting, collating, etc.
The presentation layer is envisioned to be an
application server/web server implementation
DiGIR 28
Portal Details
PortalServices handles all external network
activity (UDDI calls, provider calls, etc.)
PortalServices limits provider calls to those
necessary based on provider meta-data
PortalServices threads provider calls for
increased performance (i.e. response time)
PortalServices is envisioned to be a webapp and
supporting classes running within an application
server, such as Tomcat
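The filtering and threading behaviour could be sketched in Python as below. This illustrates the idea and is not the planned servlet implementation. The shape of the provider meta-data (a name plus a set of supported concepts) is an assumption, since the slides leave the meta-data open, and send is a caller-supplied function standing in for the actual network call.

```python
from concurrent.futures import ThreadPoolExecutor


def select_providers(providers, wanted_concepts):
    """Keep only providers whose meta-data covers every requested concept."""
    return [p for p in providers if wanted_concepts <= p["concepts"]]


def query_all(providers, request_xml, send, max_workers=8):
    """Send the request to every provider concurrently.

    Returns a dict mapping provider name to its response, or to the
    exception raised while calling it, so one failing provider does not
    hide the answers from the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {p["name"]: pool.submit(send, p, request_xml) for p in providers}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = exc
    return results
```

With threaded calls, total response time is bounded by the slowest provider instead of the sum of all providers, which is the performance gain the slide refers to.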
DiGIR 29
PortalServices
RegistryAccess
ProviderCache
PortalConfig
PortalServlet
PortalRequestHandler
ProviderFilterer
Marshallers
DiGIR 30
Portal Issues
What information will be stored in UDDI about a
provider?
What information will be known for
communicating with a Provider (e.g. IP address,
port, etc.)?
What meta-data will be provided and what are the
rules for using such data for provider filtering?
What requirements are there for logging and
monitoring?
DiGIR 31