2001 - HOWARD HOW LEUNG LOUIE, A Framework For Trust Management in Mediated Query Systems

A Framework for Trust Management in
Mediated Query Systems
BY
HOWARD HOW LEUNG LOUIE
B.S. (University of California, Davis) 1999

M.S. (University of California, Davis) 2001
THESIS
Submitted in partial satisfaction of the requirements for the degree of
MASTER OF SCIENCE
in
Computer Science
in the
OFFICE OF GRADUATE STUDIES
of the
UNIVERSITY OF CALIFORNIA
DAVIS
Approved:
_________________________
_________________________
_________________________
Committee in Charge
2001
Acknowledgements
The work in this thesis is a result of considerable effort on my part, which would not have
been possible without the support of many people. I thank my parents, Ton Been and Chung Ping
Louie. I also thank my brother, Kenneth, who supported me throughout this writing.
I thank my advisors, Michael Gertz and Premkumar Devanbu, for giving me the
opportunity to work on this project and for teaching and guiding me throughout this thesis.
I also thank Hewlett-Packard and Boeing for their generous financial support. The
summer I spent at Hewlett-Packard Laboratories with Troy Shahoumian, Pankaj Garg, Jerremy
Holland, Vijay Machiraju, Mohamed Dekhil, and Klaus Wurster was a fun and memorable
experience.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Requirements
1.3 Contributions
1.4 Structure of the thesis
2 Background
2.1 Software life cycle management
2.2 Trust management
2.3 Mediated query systems
3 Infrastructure
3.1 Data model
3.2 Trust model
3.2.1 Trust types
3.2.2 Flow of trust metadata
3.3 Trust authorities
3.4 Trust broker
3.4.1 Trust broker schema
3.4.2 Trust broker services
3.5 Mediator
3.6 Client
3.7 Individual component knowledge
4 Formulation of trust in queries
4.1 Overview
4.2 Conceptual model
4.3 Query language extensions
4.4 Pragmatic issues
5 Effect of trust metadata on query processing
5.1 Overview of query processing
5.2 Changes to mediation in query processing
5.3 Integration into mediation
6 Conclusions and future work
References
Table of Figures
Figure 1.1 Mediated query system
Figure 3.1 Infrastructure overview diagram
Figure 3.2 Overlap of trust statements for wrapper DTD
Figure 3.3 Overlap of specifiers for mediator DTD
Figure 3.4 Properties of components known to other components
Figure 5.1 Steps to process a query for MQS without trust extensions
Figure 5.2 Steps to process a query for MQS with trust extensions
1 Introduction
With the advent of the Web, software life cycle management has the potential to improve
by leaps and bounds. The old method of installing software from a CD is already being
phased out in favor of purchasing and installing software directly from the Web. Commercial
products such as Marimba [Mar98] and research prototypes such as the Software Dock [Hal99]
go a step further by managing the life cycle of software, from installation to retirement,
directly via the network.
The impetus for our information systems research stems from the desire to build a
software life cycle management system that is both scalable and secure. Previous research
prototypes such as Software Dock are scalable but have not considered security issues. Our
research aims to address issues of trust in managing and retrieving software life cycle
management data.
1.1 Motivation
We will present a scenario to motivate our research. Consider a user Joe working in an
organization ABC. Joe currently uses the Java Development Kit (JDK) 1.1 [Sun] and would like
to upgrade to JDK 1.2. There are many configurations of JDK 1.2 for as many platforms.
Variations include differences in operating system version, standard or enterprise edition, with or
without advanced cryptography, etc. Joe would like to obtain the correct software configuration
description for his workstation. He submits a query, which states his own desktop configuration,
plus company ABC's trust policies, to a software configuration information system. The
information system then retrieves the requested configuration data, perhaps by pulling data from
one or more sources. The configuration data retrieved both satisfies the trust policies of
organization ABC and solves Joe's upgrade problem.
Our goal is to provide a framework to allow the trust portion of Joe's queries to work.
Joe trusts the information system to respect the trust policies of ABC. He knows that the data he
gets back satisfies the trust constraints, and has been annotated to let him know how the query
result satisfies the trust constraints.
Our research falls into a hybrid of trust management and data quality issues. Trust
management has been described as deciding whether requested actions should be allowed. Data
quality research aims to provide clients of information systems a certain degree of confidence
about their data. Trust management systems such as REFEREE [CFL97] allow for general
assertions regarding the properties of Web sources and are not designed for any specific
information system. REFEREE itself is simply a platform with a language and evaluation
environment for trust policies. Integrating the language and trust policy evaluation environment
into some information system requires more research.
Information systems that allow for requirements on data often do so under the viewpoint
of data quality. Systems such as those described in [NLF99] allow for specifying a certain degree of
quality necessary in the query result. Such systems, however, contain metadata regarding quality
that is centrally administered and somewhat static. Distributing the task of creating metadata
would make such frameworks more scalable, dynamic, and responsive to information resource
changes over time.
The framework envisioned by our research combines decentralized and flexible creation of
metadata assertions with policies for specifying requirements on data, all within the
information systems paradigm.
1.2 Requirements
We outline our requirements for the framework below. The requirements are divided into
four broad categories and we discuss each one in turn.
First and foremost, we must allow decentralized assertions of trust for information
sources. Decentralization allows for scalability and dynamism. Other frameworks, such as
[Kha96], also decentralize their approach to prescribing assertions. If we allow distributed
components to state assertions, we are no longer limited to almost static metadata. At the same
time, the decentralized sources of assertions automatically decouple the producers and consumers
of those trust assertions. Thus, assertions, once created, may be utilized by multiple consumers.
Any individual that desires the advantage of using trust assertions made by others in
collecting data should be provided a conceptual model to formulate trust requirements. The
requirements must be independent of the data content requested. When the trust requirements are
independent of the data, the trust requirements may be specified separately from the data
requirements, which is important from an administrative viewpoint. In our software life cycle
application, not all users of the information system know when or why trust is
granted. A user may simply rely on a central security administrator to provide the trust
requirements. Therefore the data content and trust requirements must be independent of each
other.
The trust model developed must specifically be compatible with the XML data model
[BPS00]. There are two reasons for this. First, we are motivated to address the software life
cycle management problem. Although other languages and schemas have been developed to
describe software configurations [HHW98, HHW98b], a schema for software configurations has
also been defined using the XML data model, in the form of a DTD
[HHW99]. Second, the XML data model is an industry standard for data exchange and data
integration. It is flexible enough to handle data for all kinds of applications from all types of
heterogeneous information sources.
We choose to build our framework for trust on top of the mediated query system (MQS)
[DD99] because of its flexibility, dynamism, and transparency. Therefore, our trust
model must be easily integrated with existing mediation frameworks, and it must not
compromise the flexibility, dynamism, and transparency of MQS. Mediators provide a
value-added service by making the disparate data from information sources more useful as a
whole [Wie92]. Our additions must not limit the current functionality of MQS in any way.
1.3 Contributions
Our contributions include the following:
1 An architecture to enhance MQS with trust extensions.
2 A formalization of the notion of trust assertions, including their types and semantics.
3 A description of a central entity that collects trust assertions and transforms the
assertions into trust metadata.
4 A model to conceptualize trust requirements, including a language to
express the requirements.
5 An outline of mediation extensions that utilize trust metadata for the benefit of
trust-aware data integration.
Our architecture extends, rather than replaces, the MQS architecture. We formalize the notion
of trust authorities, which are the producers of trust assertions, by defining their abilities and
identifying their knowledge. The infrastructure is decentralized because it allows independent
producers of trust assertions. Since the architecture is decentralized, it is also scalable and
dynamic. It is dynamic because producers may join or leave the system at will - they are not
bound to the system. It is scalable because there is no upper limit to the number of trust
authorities.
Borrowing from paradigms found in real-world organizations, trust authorities form trust
assertions. The trust assertions are similar to accreditation for academic institutions, or ratings of
the strength of insurance companies. The trust assertions are a certification of some aspect of the
data from information sources. The semantics for the use of trust assertions are also defined.
Usage of trust assertions is non-destructive, so we may have no limit on the number of trust
assertion consumers.
We design a trust broker to store and convert trust assertions into trust metadata.
Management of the trust metadata is handled by the logically centralized trust broker. The
decentralized producers encourage dynamic trust metadata that is responsive to changing sources.
The conceptual model we provide to clients for specifying trust requirements is
independent of the data content requested. The advantage of the data request and trust
requirements independence, as Section 1.2 points out, is that the trust requirements may be reused
for many queries. Another advantage of the independence is that the query language extensions
are not bound to any particular query language. The query language extensions also allow for
specifying a liberal or a conservative application of trust requirements.
Finally, we outline and give examples of how mediation may be extended to include
trust. The additional considerations from the trust metadata assist in eliminating sources that
are not trusted according to the client’s requirements, and in resolving conflicts according to
those requirements.
1.4 Structure of the thesis
Chapter 2 provides background information on relevant research and technologies.
Chapter 3 discusses the infrastructure that supports the trust model. We detail the functionality of
a trust broker to collect and manage the trust assertions. We show how assertions of trust may be
formulated in a decentralized and scalable manner. Chapter 4 examines the conceptual model of
trust available to clients. This same chapter also specifies the usage of the language used to
represent the criteria of clients regarding trust. Chapter 5 gives an outline of how the mediator
makes use of trust metadata in its mediation. Finally, Chapter 6 gives the conclusions and future
work.
2 Background
Our research draws on many related technologies. The areas of trust management,
software life cycle management, and mediated query systems provide the basis for our work, and
in return we make a contribution related to those fields. Section 2.1 gives the background on
software life cycle management and related works. Section 2.2 covers trust management. This
includes policy languages and paradigms for trust. Section 2.3 explains the principles of mediated
query systems and gives some examples of existing systems.
2.1 Software life cycle management
Software life cycle management is concerned with the management of software, from the
delivery of the software to retiring the software at the client site. Hall's Ph.D. thesis [Hal99] is
the first to formalize the notion of software life cycle management. He provides a framework
within which he divides software life cycle management into identifiable, distinct stages: release,
retire, install, activate, deactivate, reconfigure, update, adapt, and remove. Hall also architected
the Software Dock [HHH97], which takes full advantage of the software life cycle framework.
The Software Dock uses agents to facilitate software life cycle management. Agents
travel to and from release docks (representing software producers) and field docks (representing
software consumers). The agents learn of software releases from the release docks, and make
changes at the consumer side through the field docks as necessary. A wide-area event system
provides communication between docks and agents, providing notification of changes.
Marimba [Mar98] has a product called Castanet, which handles delivery, update,
management, and repair of custom and shrink-wrapped applications over the network. In the
Castanet model, the application server provides all the file and directory artifacts and
registry changes one needs to configure applications. Encryption, user authentication, and
application authentication provide the necessary security measures. The Castanet infrastructure
deployment tools are proprietary and organizations need to buy into Castanet products before
deploying their software.
Microsoft and Marimba together have created the Open Software Description (OSD)
format [HPT97]. Currently a W3C proposed standard, OSD has a vocabulary for
describing relationships between software components with their various versions. OSD is
related to Microsoft's Channel Definition Format (CDF) [Ell97], which is used to "push" software
systems for automatic installation and update.
The Desktop Management Task Force, a personal computer management standards
setting consortium, has created a software management interface called the Desktop Management
Interface (DMI) [DMI98]. The DMI is a common interface to manage applications. The
Management Information Format (MIF), also part of the DMI specification, was created to
describe computer systems. The Common Information Model (CIM) [CIM98] is an
object-oriented model for describing the systems, and replaces the MIF.
2.2 Trust management
Research on trust management has been done in the public key infrastructure (PKI)
space and in the Web space. Regardless of the domain, the central issue of trust remains the
same: Why do we believe/adopt that data/code?
Trust in most systems is assumed to be a property we assign to an entity. This property
asserts that undertaking an operation using the entity will not violate the security and integrity of
the underlying system in any way [CFL97]. The entity that is assigned trust may be, for example,
some data, a process, or a person. Policies for assigning trust may vary. Usually policies are
based on the limitations and intended security of the system. Examples of policies that can be
enforced, depending on the abilities of the underlying system, are 1) trust all, 2) trust only if
certain criteria are met, such as authentication (for example, Microsoft Authenticode [Auth] or
public key infrastructure [PKI00]), and 3) don't trust. Under a don't-trust policy, if the entity is
executable code, the system may in some cases be able to monitor or modify the code [ET99].
PolicyMaker [BFL96, BFL96b], PGP [Zim94], and hierarchical public key
infrastructures (PKI) [Win98] each offer a different approach to trust. Hierarchical PKI assumes
some central omnipotent certificate authority that everyone trusts. PGP assumes an ad hoc
approach in which we trust our trusted friends to vouch for keys. PolicyMaker allows for writing
trust policies, and advocates binding actions to keys (instead of identities to keys); thus anyone
holding the key may perform the corresponding action.
In the Web space, REFEREE [CFL97] allows for specifying trust policies and provides
an environment to safely evaluate compliance of policies and actions with the specified policy.
REFEREE builds on the infrastructure supported by PICS [RM96]. PICS is a W3C
recommendation for labeling anything with a Uniform Resource Identifier (URI) [BF98] on the
Web. For example, Web resources are described using PICS labels. The PICS labels define
properties of the resource (e.g. executable code has been virus checked). The labels are made by
rating services. Users specify trust policies that assert which rating services are trusted and the
requirements imposed on the PICS labels. Users are not necessarily aware of the resources PICS
labels describe, but rely on the rating services with the PICS labels to select trusted resources.
Labels may be collected in a label bureau. PICS labels are machine readable so programs can be
written to automatically categorize, filter, etc. labeled Web resources.
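
The label-based selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not use the PICS label syntax or the REFEREE policy language, and the names `Label` and `trusted_resources`, the rating-service names, and the `virus_checked` property are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """A simplified stand-in for a PICS label (hypothetical structure)."""
    resource: str          # URI of the labeled Web resource
    rating_service: str    # the rating service that issued the label
    properties: frozenset  # asserted properties, e.g. {"virus_checked"}


def trusted_resources(labels, trusted_services, required_properties):
    """Return the resources that carry at least one label, issued by a
    trusted rating service, asserting every required property."""
    required = frozenset(required_properties)
    return {
        label.resource
        for label in labels
        if label.rating_service in trusted_services
        and required <= label.properties
    }


# A small label bureau; all values are made up for illustration.
bureau = [
    Label("http://example.org/a.exe", "RaterX", frozenset({"virus_checked"})),
    Label("http://example.org/b.exe", "RaterY", frozenset({"virus_checked"})),
    Label("http://example.org/c.exe", "RaterX", frozenset()),
]

# Policy: trust only RaterX, and require the virus_checked property.
selected = trusted_resources(bureau, {"RaterX"}, {"virus_checked"})
```

Note that the user's policy names only rating services and required properties; the user never has to know the individual resources in advance.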
Trust Management on the World Wide Web [KR98] outlines the basic elements of trust
on the Web, and the implications of trust management for future Web applications. The authors
define trust management as a "framework for decentralizing security decisions that helps developers and
others in asking 'why' trust is granted rather than immediately focusing on 'how' cryptography
can enforce it" [KR98]. They state the issues in describing principles, principals, policies, and
pragmatics of trust management infrastructures. They also highlight the need for people working
on and using the Web to help turn the Web into a Web of trust.
2.3 Mediated query systems
Our approach to enabling trusted query results builds on the foundation of
mediated query systems (MQS). An MQS allows a single, dynamic schema to be used to access
multiple, dynamic and heterogeneous sources of data. The MQS architecture is divided into three
layers: the application, integration, and data source layers [DD99]. Mediators at the integration
layer provide a single interface for clients at the application layer and perform integration of data
sources with heterogeneous data models [Wie92]. Rules and constraints written by application
domain experts constitute the simple intelligence mediators have for data integration. Clients
formulate queries based on the mediated schema. This unmaterialized schema is composed of
views corresponding to information sources and other mediators. The exported schema of a
mediator may be used as a query interface for other mediators; thus, mediators may be stacked.
To enable mediators to handle heterogeneous data models and schemas, wrappers at each
data source provide a uniform data model to the integration layer. Thus the mediation component
itself (exclusive of the wrappers) deals only with a uniform data model. Figure 1.1 depicts
multiple MQS with clients and multiple data sources. Wrappers are not shown but are assumed
wherever we have sources.
Figure 1.1 Mediated query system (diagram components: Client, Mediator, Source; arrows: query flow and data flow)
Query decomposition and data integration are the two principal tasks for mediators.
During the decomposition of queries, mediators use rules to form subqueries to send to sources.
When the result objects are returned from sources, mediators perform data integration using rules
to create objects for export.
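
The two tasks can be illustrated with a minimal Python sketch. It is not taken from any of the systems cited in this chapter: here a "rule" is reduced to the set of attributes a source exports, and the names `decompose`, `integrate`, `SOURCE_DATA`, and the sample attributes are hypothetical.

```python
def decompose(query, exported):
    """Form one subquery per source that can answer part of the query.
    `exported` maps a source name to the attributes that source exports."""
    subqueries = {}
    for name, attributes in exported.items():
        wanted = [a for a in query["attributes"] if a in attributes]
        if wanted:
            subqueries[name] = {"key": query["key"], "attributes": wanted}
    return subqueries


def integrate(results):
    """Merge per-source result objects into one object for export.
    The first value seen for an attribute is kept; no conflict
    resolution is attempted in this sketch."""
    merged = {}
    for result in results:
        for attribute, value in result.items():
            merged.setdefault(attribute, value)
    return merged


# Hypothetical source contents, keyed by software configuration name.
SOURCE_DATA = {
    "vendor": {"jdk-1.2": {"version": "1.2", "platform": "solaris"}},
    "mirror": {"jdk-1.2": {"size_mb": 20}},
}
exported = {"vendor": {"version", "platform"}, "mirror": {"size_mb"}}

query = {"key": "jdk-1.2", "attributes": ["version", "size_mb"]}
subqueries = decompose(query, exported)

# Each source answers its own subquery.
results = [
    {a: SOURCE_DATA[name][sq["key"]][a] for a in sq["attributes"]}
    for name, sq in subqueries.items()
]
answer = integrate(results)
```

The client sees only `answer`; which source supplied which attribute stays hidden behind the mediator.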
Rules can be specified in a number of languages [BRU96], including Mediator
Specification Language (MSL) [GPQ95], datalog [Ull97], and Object Query Language (OQL)
[Cat96]. In general, rules are really queries. The body of the rule selects from the source and the
head of the rule forms objects, which the mediator exports. Datalog is a Prolog-like, logical,
rule-based pattern matching language. MSL is a variant of datalog that allows for querying
unstructured as well as structured data. In contrast, datalog can only be used to query structured
data. OQL is an object-oriented version of the Structured Query Language (SQL).
Rules are used in object fusion. Object fusion entails constructing a result object (e.g.
Object Exchange Model (OEM) [PGU95] object or XML document) from data gathered from
querying multiple sources [PAG96]. Sometimes the source data is inconsistent or has redundant
data. For inconsistencies, conflict resolution is necessary. For example, rules may provide a
priority that favors objects with the most recent date subobject. This type of conflict resolution
always favors the most recent date. Also, to eliminate retrieving the same data twice, or even to
avoid inconsistencies, the rule may say to retrieve a subobject from a secondary source only if a
primary source does not already have the same subobject.
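
The two fusion rules just described can be sketched as follows. This is a hedged Python illustration, not MSL or any cited system's rule syntax; the function names `resolve_by_recency` and `prefer_primary` and the sample objects are hypothetical.

```python
from datetime import date


def resolve_by_recency(candidates):
    """Among conflicting objects, keep the one whose 'date' subobject
    is the most recent."""
    return max(candidates, key=lambda obj: obj["date"])


def prefer_primary(primary, secondary):
    """Take every subobject the primary source has; use the secondary
    source only for subobjects missing from the primary."""
    fused = dict(secondary)
    fused.update(primary)  # primary values overwrite secondary ones
    return fused


a = {"version": "1.2.1", "date": date(2000, 5, 1)}
b = {"version": "1.2.2", "date": date(2001, 1, 15)}
winner = resolve_by_recency([a, b])

fused = prefer_primary(
    primary={"version": "1.2.2"},
    secondary={"version": "1.2.1", "license": "binary"},
)
```

In a real mediator the second rule would also avoid sending the subquery to the secondary source at all; the sketch only shows which value ends up in the result object.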
Some examples of MQS include TSIMMIS [CGH94], HERMES [Sub], Information
Manifold [LRO96], and InfoSleuth [BBB97]. TSIMMIS provides for the rapid, declarative
generation of mediators and wrappers that integrate diverse and dynamic data from multiple
heterogeneous sources. Some of the chief contributions of the TSIMMIS project include the
Object Exchange Model (OEM) [PGU95], the Mediator Specification Language [PGU95], and
wrapper- and mediator-generators [GPQ95]. HERMES provides a general, declarative language
for creating extensible mediators. Such mediators allow for incremental integration of new
systems into existing mediator systems. InfoSleuth is an agent-based information retrieval and
processing system. Information Manifold uses descriptions of source content and capability to
efficiently prune the set of available sources and thus allow for scaling up to hundreds of
information sources.
3 Infrastructure
In this chapter, we present an infrastructure that allows for the management of trust and
its application in the mediation of data. After a brief introduction to the components in our
infrastructure, we present our data model in Section 3.1. Section 3.2 states the trust model used
throughout the rest of the thesis, including notions of trust types and the flow of trust metadata.
Section 3.3 discusses trust authorities and trust statements. Section 3.4 details the trust broker,
which includes its schema for storing trust metadata and services for manipulating trust metadata.
Section 3.5 examines the schema extensions to the mediator. Section 3.6 discusses how clients
specify trust requirements for queries and their associated semantics. Finally, Section
3.7 examines the knowledge each component requires in order for this framework to fit
together.
Our infrastructure builds on a mediated query system (MQS) infrastructure [DD99]. As
discussed in the background section, the MQS infrastructure provides mediators through which
clients submit queries and the mediator integrates a query result for the client from various
sources. We will introduce trust extensions to the mediator, and introduce the notions of trust
authorities (TA) and trust broker (TB). Trust authorities validate sources according to their own
special standards and publish trust metadata on those sources. The trust authorities are known to
clients through mediators. Mediators are known only to the client. The trust broker provides the
separation of concerns for managing trust metadata. The trust broker provides a direct trust
metadata collection and dissemination service for those mediators with trust extensions.
3.1 Data model
Our research focuses on extending the MQS where the MQS is based on the XML data
model. Many sources with heterogeneous data models and schemas are integrated into the MQS.
Wrappers convert data from each source into a common XML data model. The wrappers have
different schemas that each conform to a document type definition (DTD) [BPS00]. The
mediator supports a DTD and exports valid XML data to the client. A DTD can be used to
specify a grammar or a schema. XML standards such as SAX, document object model (DOM)
[ABC98], and InfoSet [Cow00] allow the mediator to process the disparate data from sources
before returning the query result to the client.
Other data models we have considered for the mediator include the relational model. It is
possible to develop this theory for the relational model and for the object-oriented model. Since
the XML data model can be used to represent relational and object-oriented data, we
concentrate only on the XML data model.
We assume that the mediator and wrappers in the MQS are all required to support a DTD.
The DTD may be different for each wrapper. If a wrapper's DTD differs from the mediator's,
we assume the mediator has methods available to transform XML data from the wrapper's
schema to the internal schema of the mediator. XSL [ABC00] programs are one example of how
to translate from the wrapper DTD to the mediator DTD. Henceforth, we assume MQS
wrappers and sources are treated as a single component, and each wrapper exports all of its
source's schema and data. We assume that different mediated systems with the same data model
and different mediated schemas will nevertheless use identical wrappers if the source happens to
be shared.
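
As a rough illustration of translating from a wrapper schema to the mediator schema, the sketch below renames elements with Python's standard `xml.etree.ElementTree` module. It is a stand-in for an XSL program, not an XSL implementation, and the element names and the `TAG_MAP` mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from wrapper element names to mediator element names.
TAG_MAP = {"pkg": "software", "ver": "version", "os": "platform"}


def to_mediator_schema(wrapper_xml):
    """Rename the elements of a wrapper document to the mediator's
    vocabulary. Elements without a mapping keep their original name."""
    root = ET.fromstring(wrapper_xml)
    for element in root.iter():  # includes the root element itself
        element.tag = TAG_MAP.get(element.tag, element.tag)
    return root


wrapper_doc = "<pkg><ver>1.2</ver><os>solaris</os></pkg>"
mediated = to_mediator_schema(wrapper_doc)
mediated_xml = ET.tostring(mediated, encoding="unicode")
```

A real translation would usually restructure content as well as rename it, which is where a transformation language such as XSL becomes the better tool.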
3.2 Trust model
Determining whether data from Web-based sources may be trusted or not can be a
difficult problem. We seek to provide assertions about trust for data. These assertions, in the
form of trust statements, may be processed by MQS components in order to provide a trusted
query result. To provide more flexibility in trust statements, we allow trust statements to be
classified by trust types. The notion of a type for trust statements allows different applications to
define their own meanings and intentions for trust. For example, in a mission-critical application,
the trust type may state that published trust statements have legal ramifications.
Trust statements form the basis of our trust model. A trust statement is a 4-tuple
<source, trust authority, trust type, qualifier>. The semantics of a trust statement are
that the trust authority asserts that the data at source satisfying qualifier are trusted with
respect to trust type.
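
A trust statement can be pictured as a small record. The Python sketch below is one possible representation, under the assumption (ours, for illustration only) that the qualifier is a predicate over data items; the class name `TrustStatement`, the method `covers`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TrustStatement:
    """The 4-tuple <source, trust authority, trust type, qualifier>."""
    source: str
    trust_authority: str
    trust_type: str
    # Assumption: the qualifier is modeled as a predicate over data items.
    qualifier: Callable[[dict], bool]

    def covers(self, source, item):
        """True if this statement asserts trust in `item` from `source`."""
        return source == self.source and self.qualifier(item)


# The authority TA-1 asserts that JDK data at src1 is trusted as "Audited".
stmt = TrustStatement(
    source="src1",
    trust_authority="TA-1",
    trust_type="Audited",
    qualifier=lambda item: item.get("product") == "JDK",
)
```

The statement says nothing about data outside its qualifier or about other sources, so `covers` returns false in both of those cases.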
3.2.1 Trust types
There can be many reasons why a source is trusted. These reasons are expressed via the
trust type. Trust types allow a trust statement to specify how or why a source is trusted. A trust
type is a pair <TY, TY-URI>, where TY is a single English word, and TY-URI is a Uniform
Resource Identifier (URI) [BF98] specifying where the definition of the trust type TY may be
found. By having an English word denote the trust type, the intuitive meaning of the trust type is
immediately available. Also, by having a document written in a natural language detailing the
specific meaning of the trust type, the exact definition of the trust type can be examined at
any time. By having multiple trust types available within the MQS, more flexibility is allowed for
trust statements. This allows the most appropriate trust type to be utilized by a component
wishing to publish a trust statement.
A set of trust types and their definitions is created for each application by consensus
among trust authorities. Trust statements are based only on trust types from the set. New trust
types may be added and old trust types may be deleted if they are no longer appropriate. When a
trust type is deleted, trust statements based on that trust type are also deleted. Adding and removing
from the set of trust types is done by a centralized component, the trust broker. Adding and
removing trust statements is also guarded by the trust broker. No other component may directly
add or remove trust statements.
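The bookkeeping described above can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions: the class TrustBrokerSketch, its method names, and the example URIs are hypothetical, and a real trust broker would persist its data and authenticate callers.

```python
class TrustBrokerSketch:
    """Hypothetical in-memory stand-in for the trust broker's bookkeeping."""

    def __init__(self):
        self.trust_types = {}    # TY -> TY-URI
        self.statements = set()  # (source, TA, TY, qualifier) tuples

    def add_trust_type(self, ty, ty_uri):
        self.trust_types[ty] = ty_uri

    def add_statement(self, source, ta, ty, qualifier):
        # Trust statements are based only on trust types from the agreed set.
        if ty not in self.trust_types:
            raise ValueError(f"unknown trust type: {ty}")
        self.statements.add((source, ta, ty, qualifier))

    def delete_trust_type(self, ty):
        # Deleting a trust type also deletes the statements based on it.
        self.trust_types.pop(ty, None)
        self.statements = {s for s in self.statements if s[2] != ty}


broker = TrustBrokerSketch()
broker.add_trust_type("Approved", "http://example.org/types/approved")
broker.add_trust_type("Audited", "http://example.org/types/audited")
broker.add_statement("src1", "TA1", "Approved", "/")
broker.add_statement("src1", "TA1", "Audited", "/")
broker.delete_trust_type("Audited")
print(sorted(broker.statements))
```

Routing every change through one object mirrors the rule that no other component may add or remove trust statements directly.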
The sample trust types below cover some data quality [NLF99] issues. Trust in these
examples is based on the quality of data. The Insured100K example below shows that a
monetary value can be placed on the trust a trust authority has for a source's data. Since the trust
type can be arbitrarily defined, the basis and definition of the trust type may provide for
guarantees, insurance, and other methods of demonstrating trust.
Approved The source has high reliability and excellent reputation. Reputation is based on
professional experience. Reliability is based on experimental methods.
Insured100K The trust authority will guarantee that all data from a source is 99.99 percent
reliable. The trust authority will insure for any losses due to the use of any information from this
source for up to 100K US dollars.
Audited The source's data has been regularly audited and verified for timeliness. The trust
authority regularly verifies that 95 percent of data is less than one week old.
3.2.2 Flow of trust metadata
A layout of the flow and availability of trust metadata is crucial to understanding the
behavior and properties of the system. Mediators notify the trust broker of new sources added to
their MQS. The trust broker notifies trust authorities of the new source. Trust authorities evaluate
the source according to their proprietary standards and, if the source passes, publish a trust statement
for the source to the subscribed trust broker. This is done via a push model from trust authorities
to the trust broker. Mediators with trust extensions retrieve a set of trust authorities from their
trust broker and make the trust authorities available to their clients. Application-level software
allows the user to submit queries using a subset of all the available trust authorities and a
subset of all available trust types. It also allows the user to submit additional requirements for
the trust authorities and trust types. The additional requirements allow the user to state the trust
relationships among trust authorities and trust types.

Figure 3.1 Infrastructure overview diagram
Solid arrows indicate flow of data. Dashed arrows indicate flow of trust metadata.
The wide arrow indicates the trust authority evaluates the source.
[Figure: the components shown are Client, Mediator, Trust Broker, Wrapper, Trust Authority, and Source.]
3.3 Trust authorities
We desire a component called trust authority within the MQS with trust extensions to
validate sources. We assign to trust authorities the responsibility of assigning trust properties to
sources.
Trust authorities correspond to well-known entities such as the W3C, the Department of
Defense, or Microsoft. Each trust authority has a software agent on the Web with a unique URI
representing the trust authority. When we use the term trust authority, we mean both the
software agent and the real-world entity it represents. Trust authorities are always known to the
trust broker. Mediators know of trust authorities after the mediator queries the trust broker for
available trust authorities. Clients know of trust authorities after they query the mediator for
available trust authorities.
Trust authorities make assertions about sources with respect to trust by publishing trust
statements. Trust authorities receive notification of new sources and the available trust types
from the trust broker. The source notification information is actually a pointer to the wrapper
corresponding to the source. Hence trust authorities know wrappers. When a trust authority
receives notification of a new source from the trust broker, the trust authority evaluates the
source. For each available trust type, the trust authority determines whether it should publish a
trust statement using that trust type, thereby selecting a subset of the available trust types. The
resulting trust statements are then pushed to the trust broker.
Trust authorities may not always want to certify that a whole source is trusted. Rather,
they may prefer to specify trust on a finer granularity. Elements in a document provide the ideal
granularity for association with trust statements. Elements are able to represent an entire object.
For example, the root element of a software configuration document represents all possible
configurations of a software product. The Properties element and its subelements describe all properties
of the software. The Licensing element contains all information regarding licensing of the
software. We can point to arbitrary elements in a document by using XPath [CD99] expressions.
The semantics are that the trust statement is only valid for the elements designated by the XPath
expression. The trust statement also applies recursively to any subelements, but does not extend
to elements referenced in attributes, although that may be potentially extended with further
research. Hence, XML documents are modeled as trees, not as graphs.
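The recursive coverage of a qualifier can be illustrated with Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath and is used here purely as a stand-in. The sample document, the element names Name, Version, and Fee, and the helper covered_tags are hypothetical; only the Properties and Licensing elements come from the example in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical software configuration document.
doc = ET.fromstring(
    "<Software>"
    "<Properties><Name>Editor</Name><Version>1.0</Version></Properties>"
    "<Licensing><Fee>10</Fee></Licensing>"
    "</Software>"
)


def covered_tags(root, qualifier):
    """Tags of the elements selected by the qualifier plus all their subelements."""
    covered = []
    for selected in root.findall(qualifier):
        # iter() walks the selected element and its descendants in document order.
        covered.extend(elem.tag for elem in selected.iter())
    return covered


print(covered_tags(doc, "./Properties"))  # ['Properties', 'Name', 'Version']
print(covered_tags(doc, "./Licensing"))   # ['Licensing', 'Fee']
```

A trust statement qualified by the Properties path therefore says nothing about the Licensing subtree, and vice versa.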
Trust authorities may publish trust statements for arbitrary elements from the wrapper's
DTD. However, the elements specified by XPath expressions from different trust authorities may
be intersecting. Figures 3.2A and 3.2B display two ways XPath expressions may specify
overlapping document regions. We do not need to consider Figure 3.2A in our scenarios because
we only consider XML document trees and not graphs. For intersections of the type shown in
Figure 3.2B, the semantics are that the trust statement TS2 for the subtree associated with “XPath
expression 2” co-exists with TS1. Neither TS1 nor TS2 overrides the other.
The set of trust statements is allowed to change over time. Trust authorities have
primitive operations to publish or revoke trust statements. Generally, updating a trust statement
requires a revoke operation followed by a publish operation. The operations are pushed to the trust
broker for execution.
If two trust statements point to the same element, for the same source and trust type from
a single trust authority, then the more recently published trust statement replaces the prior trust
statement. Otherwise, the later trust statement is added to the current set of trust statements
without replacing any of them. Once a trust statement is added to the current set, there is no longer
any notion of chronological ordering of trust statements. Only during the operation of adding a
trust statement to the current set do we consider the yet-to-be-added trust
statement “more recent” and all the trust statements in the current set “less recent.”
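The publish, revoke, and update operations can be sketched over a plain Python set. This is a simplification under stated assumptions: two qualifiers are treated as pointing to the same element only when their strings are equal, and the function names publish, revoke, and update are hypothetical.

```python
def publish(statements, stmt):
    # A statement for the same source, authority, type, and element replaces the
    # earlier one; with string-equal qualifiers this is plain set insertion.
    statements.discard(stmt)
    statements.add(stmt)


def revoke(statements, stmt):
    statements.discard(stmt)


def update(statements, old, new):
    # Updating is a revoke operation followed by a publish operation.
    revoke(statements, old)
    publish(statements, new)


current = set()
publish(current, ("src1", "TA1", "Approved", "/Software"))
publish(current, ("src1", "TA1", "Approved", "/Software"))            # replaced, no duplicate
publish(current, ("src1", "TA1", "Approved", "/Software/Licensing"))  # other element: added
update(current,
       ("src1", "TA1", "Approved", "/Software"),
       ("src1", "TA1", "Audited", "/Software"))
print(len(current))  # 2
```

Because a set is unordered, the sketch also reflects that no chronological ordering survives once a statement has been added.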
Figure 3.2 Overlap of trust statements for wrapper DTD
[Figure: panel A is labeled with XPath expression 1 and XPath expression 2; panel B is labeled with XPath expression 1 (TS1) and XPath expression 2 (TS2).]
From Section 3.2 we have the structure of the trust statement as <source, trust
authority, trust type, qualifier>. source corresponds to the wrapper URI of an information
source. Trust types have been discussed in detail in Section 3.2.1. The qualifier is the XPath
expression selecting a set of elements from the wrapper’s DTD. It is beneficial for qualifier to
be based on the wrapper DTD and not on the source schema because sources are heterogeneous
and may have different data models. We avoid the problem of qualifiers for many different data
models by using a uniform data model exported by wrappers. Complete coverage of the DTD is
not necessary. Qualifiers may simply specify the elements of the wrappers that are trusted,
without covering the whole DTD.

No negative trust statements are allowed. Accordingly, trust types are always defined
in a positive fashion, i.e., there are no trust types that are used for defamatory or distrustful
assertions. The presence or absence of trust statements is the only representation of trust. When a
trust statement exists for a source, it means the associated trust authority trusts that source. When
a trust statement does not exist for a source, it means the trust authority does not know or does not
trust that source, but either way the source is not trusted. Conflicting statements are not possible.
Two trust statements with the same source and qualifier are either duplicates or use different
trust types. Since no negative trust statements are allowed, no two trust statements will conflict with
each other.
Our semantics are flexible because trust authorities may publish and rescind trust
statements to fit their trust assertions. Future research may add the benefit of timestamps and
expiration to explore the different semantics allowed by the chronological ordering of trust
statements.
3.4 Trust broker
While the mediator is a well understood component for the retrieval and integration of
diverse data, the trust broker will assist the mediator in selectively deciding how data from
various sources will be retrieved and relatively ordered in the presentation to the client, based on
client specifications. The trust broker is a logically centralized component, separated from the
mediator, dealing only with trust metadata.
The goal of the trust broker is to provide the mediator with the most up-to-date, accurate,
and complete information service regarding the trust metadata associated with sources. The
responsibilities of the trust broker, in pursuit of that goal, are (1) to provide value-added
client-parameterized processing on trust metadata with respect to sources provided by the mediator and
(2) to obtain the most up-to-date, accurate, and complete trust metadata regarding sources.
3.4.1 Trust broker schema
The trust broker's schema is very structured and rigid. Object-oriented and XML schemas
are unnecessary. Thus, we choose to use the simplest data model for the trust broker’s schema,
the relational model. In this section we present the structure of the trust broker schema. The
schema may be specified as a set of relations. The trust broker schema is application-independent
and is therefore constant across all applications of this framework.
The trust broker is the manager of trust types. It records new trust types, and deletes old
ones. Thus the trust broker has knowledge of all the trust types in the MQS with trust extensions.
The trust types are communicated from the trust broker to mediators or trust authorities as
necessary. The structure required to hold information about trust types is AvailableTY (TY, TY-URI).
The trust broker is the point of contact for trust authorities. New trust authorities are
added to the MQS when the trust broker provides a handle to the trust authorities that allows the
trust authorities to push trust statements to the trust broker. The list of trust authorities is
provided when requested by mediators. The structure to record information about trust
authorities is AvailableTA (TA, TA-URI).
Lastly, the trust broker stores trust statements. Trust statements are pushed to the trust
broker from trust authorities. The trust broker signs the trust statements before storing them. The
trust statements are processed and sent to mediators when requested by mediators. Add and
delete operations on trust statements are supported. Trust statements are valid until their
requested revocation by the authoring trust authority. Revocation of a trust statement results in its
deletion.
Without the prerequisite trust types and trust authorities, there are no trust statements.
Therefore certainly trust types and trust authorities are less dynamic than trust statements. In a
closed environment where the number of trust authorities is limited, there may be limited or no
dynamism in trust authority participation. That is, no new trust authorities join the MQS and no
old trust authorities leave the MQS. In an open Web-based environment, trust authorities may be
added or removed quite often. Certainly, the trust broker’s database must be populated with trust
types, trust authorities, and trust statements before it becomes useful.
The structure TAstatement (source, TA, TY, qualifier) is for storing trust statements,
and we discuss each of the structure’s attributes below:
Source (“source”) This attribute contains a Web URI that uniquely corresponds to an actual information
source such as a Web source. No two different sources will have the same source identifier and a
single source will have exactly one source identifier.
Trust authority (“TA”) This name identifier uniquely corresponds to an actual trust authority.
Since the name is unique for each trust authority, there is no confusion in identification. The URI
pointing to a description of this trust authority can be found in AvailableTA under the trust
authority name, when the URI is provided.
Trust type (“TY”) This attribute is the unique name of the trust type. The trust type is defined
individually for each application and is well known within it. The trust type may be defined at
the location specified by the TY-URI in order to clarify its definition.
Qualifier (“qualifier”) This attribute contains an XPath expression that determines which
elements of the source's associated wrapper schema are qualified for this trust statement.
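The three relations can be tried out with Python's built-in sqlite3 module. This is a sketch only: the column types, the primary keys, and the foreign-key references are assumptions that the text does not prescribe, and the column names TY_URI and TA_URI use an underscore because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AvailableTY (TY TEXT PRIMARY KEY, TY_URI TEXT);
CREATE TABLE AvailableTA (TA TEXT PRIMARY KEY, TA_URI TEXT);
CREATE TABLE TAstatement (
    source    TEXT NOT NULL,                             -- URI identifying the source
    TA        TEXT NOT NULL REFERENCES AvailableTA(TA),  -- trust authority name
    TY        TEXT NOT NULL REFERENCES AvailableTY(TY),  -- trust type name
    qualifier TEXT NOT NULL,                             -- XPath expression
    PRIMARY KEY (source, TA, TY, qualifier)              -- assumed: no duplicates
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['AvailableTA', 'AvailableTY', 'TAstatement']
```

Declaring all four attributes as the key of TAstatement is one way to encode that a republished identical statement replaces the earlier one rather than duplicating it.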
3.4.2 Trust broker services
The trust broker provides services for both the mediator and trust authorities. For trust
authorities, the trust broker becomes a trust statement collection center. The trust broker handles
all trust statements from all trust authorities, adding and removing trust statements from
TAstatement as requested by trust authorities.
The trust broker provides notice of new sources in the MQS to trust authorities. When
the trust broker is informed of a new source, it pushes the alert of the new source as <source,
set of trust types> to all trust authorities the trust broker knows. set of trust types is the
set of all trust types the trust broker knows and that exist within the MQS. The trust broker in
return receives new trust statements from trust authorities concerning source. These trust
statements are added to TAstatement. The trust broker may receive zero or more trust
statements from zero to all trust authorities as a result of pushing <source, set of trust
types> to trust authorities. There is no upper limit to how many trust statements the trust broker
may receive for each <source, set of trust types> pushed out since trust authorities are
allowed to send more than one trust statement for the same source. Multiple trust statements
from the same trust authority for the same source may simply have variations in the trust type or
qualifier.
The requests the trust broker receives from trust authorities may be either to add a trust
statement or to remove a trust statement from the TAstatement relation. In a delete request,
wildcards are allowed in place of any of the trust statement attributes except the TA attribute.
Trust authorities may only request trust statements to be deleted if they were the original author
of the trust statement. The template for the SQL-like query the trust broker executes to add trust
statements is:
insert into TAstatement (source, TA, TY, qualifier)
values (<source>, <trust authority>, <trust type>, <qualifier>)
An example SQL-like query the trust broker executes to remove trust statements is:
delete from TAstatement where TY = <trust type> and TA = <trust authority>
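A runnable approximation of the two templates, again using Python's sqlite3 module, is sketched below. The function names add_statement and delete_statements are hypothetical, and representing a wildcard by None is a convention of this sketch. The TA attribute is always bound to the requesting trust authority, which is one simple way to ensure that an authority can delete only its own statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TAstatement (source TEXT, TA TEXT, TY TEXT, qualifier TEXT)")


def add_statement(source, ta, ty, qualifier):
    conn.execute(
        "INSERT INTO TAstatement (source, TA, TY, qualifier) VALUES (?, ?, ?, ?)",
        (source, ta, ty, qualifier))


def delete_statements(requesting_ta, source=None, ty=None, qualifier=None):
    """Delete statements authored by requesting_ta; None acts as a wildcard."""
    # TA is always bound to the requester, so no wildcard is possible there.
    clauses, params = ["TA = ?"], [requesting_ta]
    for column, value in (("source", source), ("TY", ty), ("qualifier", qualifier)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    cur = conn.execute(
        "DELETE FROM TAstatement WHERE " + " AND ".join(clauses), params)
    return cur.rowcount


add_statement("src1", "TA1", "Approved", "/Software")
add_statement("src2", "TA1", "Approved", "/Software")
add_statement("src1", "TA2", "Approved", "/Software")
removed = delete_statements("TA1", ty="Approved")
remaining = conn.execute("SELECT source, TA FROM TAstatement").fetchall()
print(removed, remaining)  # 2 [('src1', 'TA2')]
```

Column names in the WHERE clause come from a fixed list inside the function, and all values are passed as bound parameters rather than spliced into the SQL text.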
The trust broker services requests from mediators for trust metadata. Trust metadata are
trust statements processed for use by mediators. Through the trust broker's interface to mediators,
the trust broker services the following requests:
1 What are the available trust types for which there exist trust statements?
2 For a given trust type, what are the available trust authorities that have issued
trust statements using the trust type?
3 Given a particular trust type and a source, what trust statements apply to the
source, if any?
4 Given a trust requirement, if more than one trust statement applies to the
source, which trust statement or statements satisfy the trust requirement?
5 If two sources provide conflicting data, then given the available trust
statements, which source's data should be chosen?
The trust broker does not need to add special functionality to answer question number 1
and question number 2. All the data necessary to answer questions 1 and 2 are in relation
TAstatement. The mediator must simply formulate an appropriate query against TAstatement
to answer questions 1 and 2. In Section 3.5, we will provide details of these query formulations
which will be presented as integration mapping rules from the mediator to the trust broker.
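To make questions 1 and 2 concrete, the sketch below runs two plausible queries against a small TAstatement relation using Python's sqlite3 module. The sample rows are invented, and the queries are one possible formulation, not the integration mapping rules of this thesis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TAstatement (source TEXT, TA TEXT, TY TEXT, qualifier TEXT)")
conn.executemany("INSERT INTO TAstatement VALUES (?, ?, ?, ?)", [
    ("src1", "TA1", "Approved", "/Software"),
    ("src1", "TA2", "Audited", "/Software"),
    ("src2", "TA2", "Approved", "/Software/Licensing"),
])

# Question 1: trust types for which trust statements exist.
q1 = [r[0] for r in conn.execute(
    "SELECT DISTINCT TY FROM TAstatement ORDER BY TY")]

# Question 2: trust authorities that have issued statements using a given trust type.
q2 = [r[0] for r in conn.execute(
    "SELECT DISTINCT TA FROM TAstatement WHERE TY = ? ORDER BY TA", ("Approved",))]

print(q1)  # ['Approved', 'Audited']
print(q2)  # ['TA1', 'TA2']
```

Both answers are simple projections with duplicate elimination, which is why no special functionality is needed at the trust broker for them.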
Through the trust broker's interface for mediators, the mediator may submit data
integration problems dealing with trust, such as those posed by questions 3, 4, and 5.
However, answering questions 3, 4, and 5 requires algorithmic processing by the trust broker.
Answering those questions also requires additional input from the
client and is therefore discussed in Chapters 4 and 5.
3.5 Mediator
The mediator has schema extensions for trust metadata. Like a normal mediator schema,
the trust schema is also not materialized. Instead, rules provide a mapping from the mediator's
trust metadata schema directly to the trust broker's schema. We express our rules in an SQL-like
language. The mediator supports two virtual relations in its trust metadata schema. The
structure, integration rules, and semantics of the virtual relations are as follows:
Structure: MAvailableTY (TY, TY-URI)
Integration rule: select TY, TY-URI from AvailableTY
Semantics: MAvailableTY simply provides a set of all the trust types available to the
client for querying. AvailableTY is the relation from the trust broker. A
sample query against MAvailableTY from a client would be "select TY
from MAvailableTY".
Structure: MAvailableTA (TA, TA-URI)
Integration rule: select TA, TA-URI from AvailableTA
Semantics: MAvailableTA provides a set of all trust authorities. A sample query
against MAvailableTA from the client would be "select TA from
MAvailableTA".
Since the mediator only uses rules to map from its trust metadata schema to the trust
broker schema, the mediator does not materialize any trust metadata on its site (except
temporarily during query processing). Thus, there is no maintenance of trust metadata required at
the mediator. The mediator also uses the trust broker’s services to assist in data integration.
Chapter 5 will be devoted to showing how mediators use trust metadata from the trust broker.
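A non-materialized relation defined by a rule behaves much like an SQL view. The sketch below uses Python's sqlite3 module to mimic the two virtual relations. It collapses the mediator and the trust broker into one in-memory database purely for illustration, the sample rows are invented, and TY_URI and TA_URI stand in for the hyphenated attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trust broker relations (stored).
CREATE TABLE AvailableTY (TY TEXT, TY_URI TEXT);
CREATE TABLE AvailableTA (TA TEXT, TA_URI TEXT);
-- Mediator relations: defined by a rule, never stored.
CREATE VIEW MAvailableTY AS SELECT TY, TY_URI FROM AvailableTY;
CREATE VIEW MAvailableTA AS SELECT TA, TA_URI FROM AvailableTA;
INSERT INTO AvailableTY VALUES ('Approved', 'http://example.org/types/approved');
INSERT INTO AvailableTA VALUES ('TA1', 'http://example.org/ta1');
""")

# The sample client queries from the text, run against the virtual relations.
trust_types = conn.execute("SELECT TY FROM MAvailableTY").fetchall()
trust_authorities = conn.execute("SELECT TA FROM MAvailableTA").fetchall()
print(trust_types, trust_authorities)  # [('Approved',)] [('TA1',)]
```

Since a view holds no rows of its own, any change to the trust broker's relations is visible through it immediately, which matches the claim that the mediator has no trust metadata to maintain.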
3.6 Client
The abstract notion of a client includes any individual or organization that requests the
services of the mediator. When we specifically refer to an individual we will use the term end-user
client; otherwise we make no distinction between individuals and organizations and simply
refer to any entity that uses the services of the mediator as a client.
In the fully closed trusted world, one need not be concerned about the “qualifications” of
sources providing data. But in the open, dynamic web, clients of information systems need
reassurance that answers to queries can be trusted. In our framework, clients specify a trust
requirement T plus a query to the MQS. Clients know how to query the DTD of the mediator, but
at what level of granularity are the trust requirements? Should there be one or many trust
requirements?
In some applications it may be desirable to specify different trust requirements for
different portions of the same DTD. For example, the section of a DTD that deals with software
licensing issues may have a different trust requirement than the rest of the DTD. Therefore, we
allow for associating trust requirements with particular substructures of the DTD.
Just as trust authorities specify trust statements at the granularity of elements, clients can
also specify trust requirements at the granularity of elements. XPath expressions can again be
used to point to a set of elements. Overall, clients should be able to specify multiple trust
requirements for the same query. The question now is: what constraints are there on the trust
requirements with the XPath expressions?
In general, the subdocuments specified by XPath expressions may be intersecting. Figure
3.3A and Figure 3.3B display two ways XPath expressions normally may specify overlapping
document regions. Similar to the case for trust statements, we do not need to consider Figure
3.3A in our scenarios because we only consider XML document trees and not graphs. For
intersections of the type shown in Figure 3.3B, the semantics are that the trust requirement T2 for
the subtree associated with “XPath expression 2” overrides T1. T1 only applies recursively down
the XML tree until another trust requirement replaces T1. These semantics effectively partition the
XML document tree into disjoint sections according to the trust requirements.
To specify that T2 overrides T1 is the most flexible semantics. Any other semantics that
involves meshing T1 with T2 would subject T2 to unnecessary constraints, e.g. T2 and T1 must be
mutually consistent. Our semantics allow the client the flexibility to specify T2 in any way
desired, regardless of T1. We make no limiting assumptions and therefore clients may restrict or
free T2 at will.
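The override semantics amount to each element taking the trust requirement of its nearest ancestor (or itself) that carries one. The Python sketch below illustrates this with the standard xml.etree.ElementTree module, which supports only a subset of XPath. The sample document, the labels T1 and T2 used as plain strings, and the function partition are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Software><Properties><Name/></Properties>"
    "<Licensing><Fee/></Licensing></Software>")


def partition(root, requirements):
    """Map each element tag to the trust requirement in effect for it.

    `requirements` maps an ElementTree path (relative to root) to a label.
    A requirement applies down the tree until a deeper one overrides it.
    """
    assigned = {}
    for path, label in requirements.items():
        for elem in root.findall(path):
            assigned[elem] = label

    effective = {}

    def walk(elem, inherited):
        current = assigned.get(elem, inherited)
        effective[elem.tag] = current  # tags are unique in this sample document
        for child in elem:
            walk(child, current)

    walk(root, None)
    return effective


result = partition(doc, {".": "T1", "./Licensing": "T2"})
print(result)
```

Here T1 covers Software, Properties, and Name, while T2 covers Licensing and Fee, so the document tree is split into two disjoint sections.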
Figure 3.3 Overlap of specifiers for mediator DTD
[Figure: panel A is labeled with XPath expression 1 and XPath expression 2; panel B is labeled with XPath expression 1 (T1) and XPath expression 2 (T2).]
3.7 Individual component knowledge
We present the knowledge that components have of other components. Components
include mediators, wrappers, sources, clients, trust authorities, and trust brokers. We focus on the
knowledge of properties of components. These properties include trustworthiness, schemas,
interfaces, etc. Although trustworthiness is more abstract than concrete, it is still useful to know
that, e.g., clients attribute a measure of trustworthiness to trust authorities. The knowledge
discussed here may be acquired during design-time or after deployment of the MQS
infrastructure.
In keeping with the original MQS framework, sources know nothing about the MQS that
they participate in. They also know nothing of trust or of the trust extensions to the MQS
framework. Wrappers know only of the source they cover. They understand how to access a
source but know nothing of the rest of the MQS, of trust, or of the trust extensions to the MQS.
Mediators know of trust and use trust metadata to assist in data integration. Mediators access
sources through wrappers so mediators know the wrapper's DTD. Mediators also need to access
trust metadata so they know the trust broker's schema and interface. Clients are aware of trust.
They have their own personal trust level for trust authorities. They are also aware of the data
schema and schema extensions for trust of mediators. Clients do know that their own personal
preferences for trusting trust authorities will affect the query outcome from mediators.
The trust broker knows of trust authorities since the trust broker is the trust authority's
point of contact. The trust broker has no knowledge of mediators; it simply
provides its services to any entity that requests them.
Trust authorities know of trust but are oblivious to its uses for data integration. Trust
authorities do not know of mediators, but they do know the interface (not the schema) of the trust
broker. Trust authorities also know the trustworthiness of sources based on their own standard of
trust (e.g., quality). Trust authorities also know the wrapper DTD associated with sources since they
must specify qualifiers in terms of the DTD.
Figure 3.4 presents a diagram of the knowledge each component in our architecture has of
other components. An arrow from component x to component y indicates component x needs and
has knowledge of the component properties listed next to the arrow.

Figure 3.4 Properties of components known to other components
Non-italicized properties indicate knowledge acquired before deployment. Italicized properties
are discovered by the individual components after the infrastructure has been deployed.
[Figure: the components shown are Client, Mediator, Trust Broker, Trust Authority, Wrapper, and Source. The property labels on the arrows are: Mediator DTD; Trust authorities; Trust types; Trust authority “trustworthiness”; Trust broker schema; Trust broker interface; Trust authority interface; DTD; Source schema, data model, and query language; Source qualification.]
4 Formulation of trust in queries

Chapter 4 details the conceptual model for trust requirements and the language-based
representation of the model. In Section 4.1, we give an overview of and motivate the conceptual
model. Section 4.2 explains the model and its semantics. Section 4.3 details the query language
extensions that allow for expressing instances of the model. Finally, Section 4.4 lists pragmatic
issues that affect an implementation of the query language extensions.
4.1 Overview
Mediated query systems allow clients to submit queries against mediators. The
execution of a query produces data that comprise the query result. We seek to allow clients
to specify required properties of query result data. The properties are based on notions of trust
for data. Client specification of trust properties for data requires formulation of conditions on
trust. Our goal is to allow clients to express their trust requirements within queries against
mediators and have the MQS deliver data that is either all trusted or partially trusted as
determined by the trust requirement.
The trust requirement formulated by clients may apply to a portion or all of the query
result. An issue is the level of granularity clients should be allowed to specify via trust
requirements. If the mediator supports an XML schema or DTD, should trust be applied to
document substructures or only complete documents? We seek to motivate and determine an
appropriate level of granularity in the following sections.
In particular, we seek to provide a model for clients to abstract their trust requirements.
This conceptual model should allow clients to specify, at a transparent level, what they trust
and whom they trust more or less. At the same time, clients should not have to
know anything about sources. Clients with knowledge of the conceptual model may create an
instance of the model that represents their trust requirements with respect to applications at the
client side. The model instance is interpreted by MQS components in interaction with the trust
broker in order to construct appropriately trusted query results.
In order to communicate the trust requirement, clients need to use a language. Since
clients utilize query languages it would be natural to integrate an extension into a query language.
The extension would allow for declaring the trust requirements. When the mediator receives the
query plus trust requirement extensions, the mediator may simply parse, split and pass on the trust
requirements to the trust broker.
4.2 Conceptual model
Since trust types and trust authorities are factors that determine trust, trust requirements
should specify the effect on trust of each such factor. In fact, the most basic domains in our
conceptual model are the set of all trust authorities (TA) and the set of all trust types (TY).
Clients query the mediator's extended schema for trust to get the list of TA and TY. This has
been detailed in Section 3.5. These two sets provide the basis for a higher abstraction, <TA,TY>
pairs. <TA, TY> pairs, in turn, are one of the basic components in the trust preference. We give
the definition of trust preference as follows:
Definition 4.1 (Trust Preference) Given a set of trust authorities (TA), and a set of trust
types (TY), a trust preference is a partial ordering of <TA, TY>. <TA, TY> pairs selected to
participate in the partial ordering are trusted. The relation “≻” indicates that one <TA, TY> pair
is trusted more than the other. A sequence of <TA, TY> pairs connected by “≻” is a trust
expression. Trust expressions can be connected together by “;” (AND) to form a trust preference.
The partial ordering corresponds to a set of disconnected graphs (Hasse Diagrams), and
the consistency of the partial ordering must be maintained by having no cycles. Each graph in the
set of disconnected graphs corresponds to a trust expression. The nodes of the graph correspond
to <TA, TY> pairs and a directed arrow in the graph represents that one <TA, TY> pair is trusted
more than the other.
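The consistency condition can be checked mechanically by looking for a cycle in the directed graph of "trusted more than" edges. The Python sketch below uses a standard depth-first search with three node states; the function name is_consistent and the representation of <TA, TY> pairs as tuples are assumptions made for illustration.

```python
def is_consistent(orderings):
    """Return True if the 'trusted more than' edges contain no cycle.

    `orderings` is a list of (more, less) pairs, each a (TA, TY) tuple.
    """
    graph = {}
    for more, less in orderings:
        graph.setdefault(more, set()).add(less)
        graph.setdefault(less, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return False  # edge back into the current path: a cycle
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    return all(visit(n) for n in graph if colour[n] == WHITE)


chain = [(("x", "a"), ("y", "a")), (("y", "a"), ("z", "a"))]
print(is_consistent(chain))                                 # True
print(is_consistent(chain + [(("z", "a"), ("x", "a"))]))    # False
```

The second call fails because adding an edge from <z, a> back to <x, a> closes a cycle, so no pair in it could be said to be trusted more than the others.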

We introduce the usage of trust preferences by the examples here. With the exception of
the most primitive trust preference (our first example below), there are many ways of expressing
semantically equivalent trust preferences. The most succinct representation is presented first,
along with its meaning, then less succinct representations are given. In the following examples,
we assume the set of all trust authorities is {x, y, z} and the set of all trust types is {a, b}. First,
we demonstrate simple preferences and then work our way to more difficult ones.
Example 1 <x, a> This trust preference pair means that sources qualified by trust authority x for
trust type a are trusted. There is no other representation for this trust preference.
Example 2 <x, *> This pair indicates that sources qualified by trust authority x for all trust types
are trusted. An alternate expression with the same meaning is <x, a>; <x, b>. That is, the “*” is
expanded to include all trust types. The “;” symbol means that <x, a> and <x, b> are trusted.
Hence sources qualified by either <x, a> or <x, b> are trusted.
Example 3 <*, a> Sources qualified by any trust authority for trust type a are trusted. An
alternate expression is <x, a>; <y, a>; <z, a>.
Example 4 <*, *> We expand <*, *> by taking the cross product of TA and TY. Hence, sources
that are qualified by any TA for any TY are trusted.
Example 5 <x, a> ≻ <y, a> Sources qualified by either trust authority x or y for trust type a
are trusted. Furthermore, the operator “≻” indicates that <x, a> is “trusted more than” <y, a>.
This implicitly imposes a partial ordering on sources qualified by either x or y for a. Effectively,
sources qualified by x for a are “trusted more than” sources qualified by y for a.
Example 6 <*, a> ≻ <y, a> Sources qualified by any trust authority for trust type a are trusted.
Furthermore, with regard to trust type a, a partial ordering is imposed on trust authorities where
all other trust authorities are trusted more than trust authority y. The ordering interpretation is more
easily understood when we expand the example by substituting all available trust authorities for
the wildcard "*":
<x, a> ≻ <y, a>;
<y, a> ≻ <y, a>;
<z, a> ≻ <y, a>;
The implied trust preference <y, a> ≻ <y, a> is simply a reflexive expression, which
we eliminate. The result is that all trust authorities (except for y) for trust type a are trusted more
than y for trust type a. Implicitly, all sources qualified by any trust authority (except for y) for
trust type a are trusted more than sources qualified by y for trust type a. The binary operator “;”
corresponds to logical AND semantics for two operand expressions, and so no order is implied
for the two operands.
For the similar example <x, a> ≻ <*, a>, the same reasoning of expanding "*" to
include all trust authorities means sources qualified by x for a are trusted more than sources
qualified by any other trust authority for trust type a. Both types of sources are trusted when
qualified either way.
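Wildcard expansion with elimination of reflexive expressions can be sketched in Python as follows. The sets of trust authorities and trust types are those assumed for the running examples; the function names expand_pair and expand_ordering and the tuple representation of <TA, TY> pairs are hypothetical.

```python
from itertools import product

TAS = ["x", "y", "z"]  # trust authorities assumed in the running examples
TYS = ["a", "b"]       # trust types assumed in the running examples


def expand_pair(pair):
    """Replace each "*" in a (TA, TY) pair by every available value."""
    ta, ty = pair
    tas = TAS if ta == "*" else [ta]
    tys = TYS if ty == "*" else [ty]
    return list(product(tas, tys))


def expand_ordering(more, less):
    """Expand 'more is trusted more than less' and drop reflexive expressions."""
    return [(m, l) for m in expand_pair(more) for l in expand_pair(less) if m != l]


print(expand_ordering(("*", "a"), ("y", "a")))
# [(('x', 'a'), ('y', 'a')), (('z', 'a'), ('y', 'a'))]
print(expand_ordering(("x", "a"), ("*", "a")))
# [(('x', 'a'), ('y', 'a')), (('x', 'a'), ('z', 'a'))]
```

The first call reproduces Example 6 after the reflexive expression for y has been removed; the second reproduces the similar example in which x is trusted more than every other trust authority for trust type a.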
Example 7 <x, *> ≻ <y, a> Sources qualified by trust authority x for any trust type or qualified
by trust authority y for trust type a are trusted. Furthermore, sources qualified by x for any trust
type are trusted more than sources qualified by y for a.
Similarly, <x, a> ≻ <y, *> means sources qualified by x for a are trusted more than
sources qualified by y for any trust type. Of course, both types of sources are trusted as long as
they are qualified by x for a or y for any trust type.
Example 8 <*, *>  <y, a> In this case, expanding <*, *> involves taking the cross product of
TA and TY. Expanding <*, *>  <y, a> we get:

<x, a>  <y, a>

<x, b>  <y, a>
<y, a>  <y, a>
<y, b>  <y, a>
<z, a>  <y, a>
<z, b>  <y, a>
After eliminating <y, a>  <y, a>, the expansion of the example leaves us with five
orderings of <TA, TY> pairs, and the meaning of each of those orderings is easy to
understand as a simple variation of Example 5. Also, the mere existence of the <*, *> pair in
the expression indicates that sources qualified by any <TA, TY> pair are trusted.
Example 9 <x, a>  <*, *> This example is very similar to the previous Example 8. However, in
this case, sources qualified by <x, a> are more trusted than sources qualified by any other <TA,
TY> pair. The <*, *> indicates that all sources qualified by any <TA, TY> pair are trusted.
The expansion of <*, *> gives us six expressions. One of the expressions is reflexive so
we eliminate it. The other five expressions indicate that <x, a> is “trusted more than” each one
of <y, a>, <z, a>, <y, b>, <z, b>, and <x, b>.
Example 10 <x, a>  <y, *>  <z, b> Sources qualified by <x, a> or <y, *> or <z, b> are
trusted. Furthermore, sources qualified by x for a are trusted more than sources qualified by y for
any trust type. Sources qualified by y for any trust type are trusted more than sources qualified by
z for b. By the property of transitivity, sources qualified by <x, a> are also more trusted than
sources qualified by <z, b>.
Example 11 <x, a>  <*, b>; <z, a>. First we expand <x, a>  <*, b>:
<x, a>  <x, b>;

<x, a>  <y, b>;
<x, a>  <z, b>
Our final trust preference is then
<x, a>  <x, b>; <x, a>  <y, b>; <x, a>  <z, b>; <z , a>.
The first three expressions in the trust preference are simply variations of Example 5.
Each of their individual meanings is clear. The last expression is also clear from Example 1. The
meaning of the entire expanded trust preference is that sources qualified by trust authority x for
trust type a are trusted more than sources qualified by any trust authority for trust type b.
All sources qualified by <x, a>, <x, b>, <y, b>, <z, b> or <z, a> are trusted. Although
sources qualified by z for a are trusted, they are not ordered with the other trusted sources (unless
the sources happen to be qualified by <x, a>, <x, b>, <y, b> or <z, b> also, which places the
sources in an ordering).
Example 12 <x, a>  <*, b>; <z, *> This example is almost exactly like Example 11.
Expansion of <x, a>  <*, b> is covered in Example 11. We also expand <z, *> to obtain:
<z, a>;
<z, b>
<z, b> is redundant in the trust preference since the expansion of <x, a>  <*, b>
already includes that <z, b> is trusted. (<x, a>  <z, b> is an expression expanded from <x, a>
 <*, b>). Therefore, after expanding <z, *>, we have a trust preference equal to Example 11.
Example 13 <*, *>  <*, *> This example is semantically illegal, because we can expand the
example to include <x, a>  <y, a>; <y, a>  <x, a> which is an inconsistency. As a matter
of fact, any expression (with the wild-card "*") that can expand to new expressions must be given
careful consideration, since inconsistencies may be formed inadvertently.
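One way to detect the kind of inconsistency shown in Example 13 is to treat the expanded orderings as a directed graph and look for a cycle. The sketch below assumes orderings are given as (more trusted, less trusted) tuples; the function name is_consistent is hypothetical, and the depth-first search is a standard technique chosen here for illustration, not a procedure taken from the model.

```python
def is_consistent(orderings):
    """Return True if the 'trusted more than' relation contains no cycle."""
    graph = {}
    for more, less in orderings:
        graph.setdefault(more, set()).add(less)
        graph.setdefault(less, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: a cycle exists
                return False
            if color[nxt] == WHITE and not visit(nxt):
                return False
        color[node] = BLACK
        return True

    return all(visit(node) for node in graph if color[node] == WHITE)


if __name__ == "__main__":
    xa, ya, zb = ("x", "a"), ("y", "a"), ("z", "b")
    print(is_consistent([(xa, ya), (ya, zb)]))  # a simple chain
    print(is_consistent([(xa, ya), (ya, xa)]))  # the pair of orderings from Example 13
```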
Trust annotations
Up to this point, every example describes conditions on data, and
implicitly on the sources that may provide the data. However, in many circumstances, the query
result may be incomplete if data must always satisfy the trust requirement. Thus, it may not
always be desirable to leave out data that does not satisfy the trust requirement. When we
include such data, we need to let the client know the data is "untrusted".
To account for data that does not satisfy the trust requirement, and to provide an
added advantage to clients, we propose to annotate result objects. Clients want to know the
trust properties of the objects that are part of the query result. Result objects include
substructures of documents and even entire documents. Annotating query results gives clients
feedback on how the result objects are trusted, including whether the objects are untrusted. We give
the definition of trust annotation below:
Definition 4.2 (Trust annotation) Given an XML element E, a trust annotation is an
attribute-value pair for the XML element. The attribute label is trust and the value is a set of <TA, TY>
pairs. Each <TA, TY> pair in the set qualifies the source using a qualifier that designates an
element that is E or an ancestor of E, according to the DTD of the wrapper that
E is obtained from. The <TA, TY> pairs in the set are
separated by the symbol “;”. The symbol “;” indicates that the two adjacent <TA, TY> pairs are
not ordered, but both qualify the same source the element is obtained from, and both have
qualifiers which designate E or an ancestor of E.
XML constructs provide more than one opportunity to place annotations. For example,
we may place annotations as comments, processing instructions, CDATA, attributes of element
tags, or element tags themselves. We annotate every element object, and only attributes are
associated with element tags. Therefore, attributes are the best method to represent annotation
information. For example, <SoftwareVersion trust = “(x, a)”>. Here, SoftwareVersion is
the element tag, and trust is the attribute that contains the annotation.
We annotate an object using the <TA, TY> pairs that are closest to the top of the trust
preference expressed by the client. The imprecise notion of being “closest to the top of the trust
preference” is made precise by defining the following three functions and a rule. Let s be a
variable representing a source and TP be a variable representing a trust preference. Let x, y be
variables representing trust authorities and a, b be variables representing trust types. Let f, g be
variables representing qualifiers and E be a variable representing an element from source s. Let
DE be the DTD for the wrapper schema containing element E.
We define functions ancestor-self (E, f, DE), bucket(s) and LMax as follows:
• ancestor-self (E, f, DE) means that f is a qualifier designating an element
that is E or an ancestor of E in the XML tree that corresponds to the DTD
DE.
• bucket(s) = { <x, a, f> | trust statement <s, x, a, f> has been published}
• LMax(bucket(s), TP, E) = { <x, a> | <x, a, f> ∈ bucket(s) ∧ ancestor-self (E,
f, DE) ∧ ¬∃<y, b, g> ∈ bucket(s) ∧ ancestor-self (E, g, DE) : (<y, b>  <x, a>)
∈ TP}
We give a natural language explanation of LMax to clarify its meaning. Given a bucket
for a source s, a trust preference TP and an element E, LMax returns the set of <TA, TY> pairs
that 1) are mentioned in the trust preference and 2) have trust statements issued by the trust
authority TA for the trust type TY, where 3) the source s is qualified by the qualifier associated with
the trust statement and the qualifier references element E or an ancestor of E, and 4) according to
the trust preference, no other <TA, TY> pair that also satisfies requirements 1, 2, and 3 is
trusted more than the <TA, TY> pairs returned by LMax.
Finally, the precise rule for annotating an XML element is:
Annotate element E with { <x, a> | ∃sk : E ∈ source sk ∧ <x, a> ∈
LMax(bucket(sk), TP, E) }
In other words, the set of <TA, TY> pairs used for annotating the element is the very
reason for the element’s inclusion in the query result. We delimit (without order) multiple pairs in
the annotation by the symbol “;”.
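The functions and the annotation rule above can be sketched as follows. This is a minimal illustration under several assumptions: the DTD is given as a child-to-parent map, a trust preference is split into the set of pairs it mentions and a set of (more trusted, less trusted) orderings already closed under transitivity, and only buckets of sources that provide the element are passed in. The sample tree, bucket, and all Python names are hypothetical.

```python
def ancestor_self(element, qualifier, parent):
    """True if `qualifier` designates `element` or one of its ancestors."""
    node = element
    while node is not None:
        if node == qualifier:
            return True
        node = parent.get(node)
    return False


def lmax(bucket, mentioned, order, element, parent):
    """Pairs that apply to `element` and have no more-trusted applicable pair."""
    applicable = {
        (ta, ty)
        for ta, ty, qualifier in bucket
        if (ta, ty) in mentioned and ancestor_self(element, qualifier, parent)
    }
    return {p for p in applicable if not any((q, p) in order for q in applicable)}


def annotate(buckets_by_source, mentioned, order, element, parent):
    """Value of the trust attribute for `element`, or "untrusted" if no pair applies."""
    pairs = set()
    for bucket in buckets_by_source.values():
        pairs |= lmax(bucket, mentioned, order, element, parent)
    return "; ".join(f"({ta}, {ty})" for ta, ty in sorted(pairs)) or "untrusted"


if __name__ == "__main__":
    parent = {"A": None, "C": "A", "D": "A", "E": "C"}  # hypothetical DTD tree
    bucket_s1 = {("x", "a", "A"), ("z", "b", "D"), ("z", "a", "A")}
    xa, yb, zb = ("x", "a"), ("y", "b"), ("z", "b")
    mentioned = {xa, yb, zb}
    order = {(xa, yb), (yb, zb), (xa, zb)}
    print(annotate({"s1": bucket_s1}, mentioned, order, "D", parent))
```

In this hypothetical case both <x, a> and <z, b> apply to element D, and only <x, a> survives because it is trusted more than <z, b>.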
End-users normally do not directly view XML documents. Rather, the application
software offers a user-friendly presentation interface. Thus, we may annotate all elements and let
the application software interpret or filter the annotations. At the system level, we are not
concerned with the possibility of too many annotations. For this reason, we will annotate each
and every query result.
Properties of trusted and untrusted data
In some security applications, clients may also want the option to either retrieve only
trusted data, or to retrieve trusted and untrusted data. For example, the military only wants
software configurations from trusted sources, but in contrast, a user at home may only care to get
all possible information on configurations (subject to the user’s trust preferences).
Here we discuss the properties of query results that contain only trusted data and query
results that contain a mix of trusted and untrusted data. When only trusted data is requested, the
data must satisfy certain properties. Given that the client has specified some kind of a trust
preference, the properties are as follows:
1. The data must come from only those sources that are qualified by some <TA, TY> pair in the
trust preference, and the qualifier must specifically designate the element (or its ancestor) that
the data is from.
2. If two sources provide conflicting data, then if both sources are qualified by different <TA,
TY> pairs for their respective data, and the <TA, TY> pairs are ordered in the trust
preference, then the data must be selected from the source that is qualified by the one <TA,
TY> pair that is “trusted more than” the other <TA, TY> pair. This required property of data
enables the use of trust preferences in conflict resolution. Although we do not resolve all data
conflicts, at least this may assist in some cases.
When untrusted data is also added to the query result, the trusted portion of the query
result must satisfy the same conditions that query results containing only trusted data must
satisfy. The untrusted data must satisfy the following properties:
1. The untrusted data is not available from a source qualified by any of the <TA, TY> pairs in
the trust preference or the qualifier did not designate the element containing the untrusted
data or its ancestors. Only the other sources have the data.
2. The untrusted data must be annotated trust = “untrusted.”
In Section 4.3 we will detail query language constructs that allow for specifying whether
only trusted data is desired or a mix of trusted and untrusted data is desired.
4.3 Query language extensions
Here we focus on embedding the model in a query language. We assume the client uses
some query language for XML data to query the mediator's schema, which is represented in the form
of a DTD. The DTD is just a schema for data, and contains nothing regarding trust or trust
requirements. The client gathers the prerequisite information to formulate trust requirements by
querying the mediator's schema extensions, which were discussed in Section 3.5. We
assume the query language supports a condition clause that specifies the pattern of the data to
select. XML-QL [DFF98] is an example of such a query language.
In the case of query languages for the XML data model, in general user queries to
mediators take the prototypical form
where <pattern>
construct <template>
pattern may contain variables which bind to attribute or text values. The attribute or text
values may belong to some objects or elements. The variables may then be used in template in
order to construct new data (perhaps according to some schema). To the prototypical query
language for XML data we add a constraint for trust.
Our modifications to the query language introduce new keywords related to specifying
trust requirements. Each combination of keywords has its own semantics and influences the
integration of data using trust metadata.
In order to include trust requirements, the minimum keyword that must be added to a
query is trust <criteria>. Thus, the following is a query that expresses some trust criteria on the
data:
where <pattern>
construct <template>
[trust <criteria>]
The optional keyword trust indicates that the specified trust requirements should be
respected in integrating the query result. The parameter <criteria> is a language-based
representation of multiple trust preferences. The language used to specify criteria is given as a
context-free BNF grammar:
criteria :- condition
| specifier for condition
| criteria and specifier for condition
condition :- [ONLY] [OPT | PES] statement
statement :- clause | statement ; clause
clause :- pair | clause > pair
pair :- (TA,TY)
specifier :- pointer to Element
specifier is an XPath expression that points to a set of XML elements in the DTD. TA
and TY are names corresponding to trust authorities and trust types. pair simply corresponds to
a <TA, TY> pair in the conceptual model. clause allows for specifying the partial ordering of
pairs. The “>” symbol is essentially the “” operator in the conceptual model. statement
allows for multiple clauses. Each clause is separated by the ";" delimiter.
The following optional keywords provide some additional modifications for utilizing
trust metadata. Adding the optional keyword only indicates that only trusted data are to be
integrated into the query result. Thus, any data returned will be from a source qualified by a <TA,
TY> pair listed in statement. Omitting only means both trusted and untrusted data will be
included in the query result.
Often, there is a need to combine data from two or more sources into a single object. For
example, XML elements from two or more sources taken to form a single XML document is a
merge operation. When the object is constructed from two or more sources, it is not immediately
obvious how the new, composed object should be annotated. Should we randomly choose one of
the sources, then annotate the object as if it is from that source? Or, should we perform some
computation on the respective sources' trust metadata to derive some annotation?
In addressing the issues of annotating merged objects, we want to provide some
flexibility to accommodate applications where the importance of trust may be either critical or
simply informational. Hence, we do not pre-declare rigid rules for annotating merged objects.
Instead, we allow the client to provide some input into the annotating process. We include the
option for controlling annotations in the query language.
The optional use of either of the keywords OPT or PES corresponds to, respectively, an
optimistic (function OPTAnnotate) or pessimistic (function PESAnnotate) annotating of merged
objects. OPT specifies to annotate optimistically, so that data merged from multiple sources is
annotated the same as the most trusted of the unmerged data. PES specifies to annotate
pessimistically, so that data merged from multiple sources is annotated the same as the least
trusted of the unmerged data. If neither is specified, then PES is assumed by default. Below, we give
the exact semantics of the two keywords in terms of set-oriented operations.
First we must review some functions already defined in Section 4.2.
• ancestor-self (E, f, DE) means that f is a qualifier designating an element
that is E or an ancestor of E in the XML tree that corresponds to the DTD
DE.
• bucket(s) = { <x, a, f> | trust statement <s, x, a, f> has been published}
• LMax(bucket(s), TP, E) = { <x, a> | <x, a, f> ∈ bucket(s) ∧ ancestor-self (E,
f, DE) ∧ ¬∃<y, b, g> ∈ bucket(s) ∧ ancestor-self (E, g, DE) : (<y, b>  <x, a>)
∈ TP}
We may now define OPT annotating (OPTAnnotate) and the PES annotating
(PESAnnotate). Let s1 and s2 be variables representing sources.
OPTAnnotate(s1,s2) = LMax(bucket(s1) ∪ bucket(s2), TP, E)
PESAnnotate(s1,s2) = LMax(bucket(s1) ∩ bucket(s2), TP, E)
Here TP is the applicable trust preference and E is the merged element.
If the result of an OPTAnnotate or PESAnnotate operation is the empty set, then we
annotate the merged element as being untrusted.
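The set-oriented semantics of the two keywords can be illustrated with a short sketch. It simplifies the definitions above: each bucket is assumed to be already reduced to the set of <TA, TY> pairs whose qualifiers designate the merged element or one of its ancestors, and the trust preference is assumed to be a set of (more trusted, less trusted) orderings. The function names mirror OPTAnnotate and PESAnnotate but are otherwise hypothetical.

```python
def lmax(pairs, order):
    """Pairs for which no other pair in `pairs` is trusted more."""
    return {p for p in pairs if not any((q, p) in order for q in pairs)}


def opt_annotate(bucket1, bucket2, order):
    """Optimistic annotation: LMax over the union of the two buckets."""
    return lmax(bucket1 | bucket2, order) or "untrusted"


def pes_annotate(bucket1, bucket2, order):
    """Pessimistic annotation: LMax over the intersection of the two buckets."""
    return lmax(bucket1 & bucket2, order) or "untrusted"


if __name__ == "__main__":
    xa, yb = ("x", "a"), ("y", "b")
    order = {(xa, yb)}            # <x, a> is trusted more than <y, b>
    b1, b2 = {xa, yb}, {yb}       # hypothetical buckets of two merged sources
    print(opt_annotate(b1, b2, order))
    print(pes_annotate(b1, b2, order))
    print(pes_annotate(b1, set(), order))
```

With these hypothetical buckets, the optimistic annotation keeps <x, a>, the pessimistic one keeps only the shared <y, b>, and merging with a source that has no qualifying pair is pessimistically untrusted.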
The following examples illustrate expressing trust requirements using the grammar. Just
like the conceptual model examples in Section 4.2, we assume the set of all trust authorities is {x,
y, z} and the set of all trust types is {a, b}. We do not show the where <pattern>
construct <template> portion of the query because it is irrelevant to the trust requirement
portion. Therefore we only show the trust requirement portion of the example queries. The trust
preference in some of these examples is selected from the conceptual model examples shown
earlier in Section 4.2.
Example 14
trust (x, a)
This example corresponds to the formulation <x, a> in the conceptual model. The trust
requirement applies to the entire result document, including all its elements. The meaning of <x,
a> is that sources qualified by trust authority x for trust type a are trusted. According to this
trust requirement, data from such sources are trusted. Since only is left out of the condition,
untrusted data are annotated trust = “untrusted” and integrated into the query result also. Any
necessary annotating of data merged from multiple sources will be done pessimistically by
default.
Example 15
trust only (*, a)
The meaning of (*, a) corresponds to the meaning of <*, a>, which is that sources
qualified by any trust authority for trust type a are trusted. Additionally, only data from trusted
sources will be integrated into the query result. Data from any other source will not be in the
query result and pessimistic annotating is the default.
Example 16
trust OPT (x, *)
The trust preference for this example comes from Example 2. The meaning is that
sources qualified by trust authority x for all trust types are trusted. OPT specifies to use optimistic
annotations. Omitting only means that both trusted and untrusted data are in the query result.
Example 17
trust only PES (*, *)
The trust preference for this example comes from Example 4. The (*, *) indicates that
sources qualified by any trust authority for any trust type are trusted. only indicates only trusted data are in
the query result, and PES indicates pessimistic annotating of data merged from multiple sources.
Our grammar allows for specifying trust requirements at the granularity of the XML
element. To add multiple specifiers with their associated elements, we need only list them
appropriately with “and” delimiters. Two specifiers may not reference the same element. We give
some examples below. We assume {r, s, t} represent sample specifiers (that point to elements of
the mediator DTD), the set of all trust authorities is {x, y, z}, and the set of all trust types is {a,
b}.
Example 18
trust (x, a)
and r for (y, b)
(x, a) is the XML document-level trust preference. (y, b) is the trust preference for only
the XML elements specified by the XPath expression r. The trust preference (x, a) does not apply
to the elements specified by r. Untrusted data are annotated trust = “untrusted” and integrated
into the query result throughout the entire document.
Example 19
trust (x, a)
and r for (y, b)
and s for (z, b) > (y, b)
(x, a) is the default document-level trust preference. (y, b) is the trust preference
for only the substructures specified by r. (z, b) > (y, b) is the trust preference for only the
substructures specified by s. Untrusted data are annotated and integrated into the query result.
Example 20
trust (x, a)
and r for ONLY (y, b)
and s for PES (z, b) > (y, b)
Same meaning as Example 19 except for two constraints:
1 Only data from sources qualified by trust authority y for trust type b will be integrated into the
elements specified by r. Also, pessimistic annotating is implied for the elements specified by r.
2 Pessimistic annotating of data merged from multiple sources applies for the elements specified
by s. As usual, untrusted data are allowed to be integrated into the elements specified by s
because only is omitted and PES does not rule out such data.
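The grammar and Examples 14 through 20 can be read as a small parser. The sketch below is illustrative only and makes several assumptions: it parses the text that follows the trust keyword, treats keywords case-insensitively (the examples write only while the grammar writes ONLY), assumes specifiers contain neither " and " nor " for ", and does not expand wild-cards. The names parse_condition and parse_criteria are hypothetical.

```python
import re

# A (TA,TY) pair such as "(x, a)" or "(*, b)".
PAIR = re.compile(r"\(\s*([^\s,()]+)\s*,\s*([^\s,()]+)\s*\)")


def parse_condition(text):
    """Parse '[ONLY] [OPT | PES] statement' into a dictionary."""
    only, mode = False, "PES"  # PES is the stated default
    rest = text.strip()
    match = re.match(r"only\b", rest, re.IGNORECASE)
    if match:
        only, rest = True, rest[match.end():].strip()
    match = re.match(r"(opt|pes)\b", rest, re.IGNORECASE)
    if match:
        mode, rest = match.group(1).upper(), rest[match.end():].strip()
    clauses = []
    for clause in rest.split(";"):
        pairs = []
        for item in clause.split(">"):
            found = PAIR.fullmatch(item.strip())
            if not found:
                raise ValueError(f"expected a (TA,TY) pair, got {item!r}")
            pairs.append(found.groups())
        clauses.append(pairs)  # each clause lists pairs from most to least trusted
    return {"only": only, "mode": mode, "clauses": clauses}


def parse_criteria(text):
    """Map each specifier (None for the document level) to its parsed condition."""
    result = {}
    for index, part in enumerate(re.split(r"\s+and\s+", text.strip())):
        if " for " in part:
            specifier, condition = (piece.strip() for piece in part.split(" for ", 1))
        elif index == 0:
            specifier, condition = None, part
        else:
            raise ValueError(f"missing specifier in {part!r}")
        if specifier in result:
            raise ValueError(f"specifier {specifier!r} is used twice")
        result[specifier] = parse_condition(condition)
    return result


if __name__ == "__main__":
    example_20 = "(x, a)\nand r for ONLY (y, b)\nand s for PES (z, b) > (y, b)"
    for specifier, condition in parse_criteria(example_20).items():
        print(specifier, condition)
```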
We considered the possibility that the flexibility to associate trust preferences with more
granular portions of the DTD may be enabled at the application level. That is, if we do not add
the specifier for granularity on the DTD, then can the client still specify trust preferences for
portions of the DTD? The answer is yes, but in a limited way. The basic method is that the client
may submit the query multiple times, each time with a new trust preference and with a different
structural projection, but the same selection constraints. Using this alternative, the client has less
input into the mediation process, but it reduces complications at the mediator.
Trust broker algorithm
Below we present an algorithm for the trust broker, which enables the trust broker to
provide a service to mediators. The algorithm accepts as input a list of trust preferences. The
algorithm outputs a list of sets, each set containing one or more partial orderings. The nodes of
each partial ordering correspond to a set, with each set containing (source, qualifier) pairs. The
trust broker executes the algorithm in response to requests from the mediator, which provides the
list of trust preferences taken from the client's query. Each condition of the client's query is a
trust preference, and the mediator simply puts all the conditions from the client's query in a list
before calling the trust broker.
Algorithm 4.2 Compute-poset-source-qualifier algorithm

INPUT: A list of trust preferences
OUTPUT: A list of sets, each set containing one or more partial orderings. The nodes of each partial ordering are
labeled by <TA, TY> pairs. Each node maps to a set of (source, qualifier) pairs.
We define some notation for use later on:
TS is a variable representing a trust statement. TS.<attribute> means the attribute value of the trust statement. For
example, TS.<TA, TY> means the value of the <TA, TY> pair of the trust statement. Let f be a variable representing a
qualifier and TP be a variable representing a trust preference.
• bucket (<ta, ty>) = { <s, f> | trust statement <s, ta, ty, f> has been published }
• MoreTrusted (<ta, ty>i, <ta, ty>j, TP) is a predicate where (<ta, ty>i, <ta, ty>j) ∈
MoreTrusted if (<ta, ty>j  <ta, ty>i) ∈ TP. MoreTrusted captures all the orderings of <TA,
TY> pairs provided by a trust preference, including pairs obtained by transitivity.
First, verify each trust preference is consistent by itself. Reject input and return error message if any of the trust
preferences are inconsistent.
For each trust preference, the trust broker does the following:
For each expression in the trust preference, the trust broker does the following: (Recall that the operator ";"
separates expressions in a trust preference).
Step 1: Create a new graph isomorphic to the graph of the expression. The nodes of the new graph
correspond to and are labeled by a <TA, TY> pair (call it "p"), and each node maps to a set
bucket (p). The arrows of the new graph are represented by the function MoreTrusted.
Step 2: For each TS ∈ TAstatement, visit every node of the graph, and if TS.<ta, ty> = the label on
the node then add TS.<s, f> to the bucket with the label TS.<ta, ty>.
End For
End For
Step 3: For each trust preference, consolidate all the graphs corresponding to the many expressions of the trust
preference into a set of one or more unified graphs for the trust preference. This is easily done by creating a set of new
graphs, where the combined nodes of the set of new graphs are a union of all the nodes of the other graphs that
correspond to expressions. Also preserve the ordering relationships among <TA, TY> pairs from the old graphs in the
set of unified graphs. If a node <ta, ty>k in a unified graph is present in more than one graph corresponding to an
expression, then take a union of all sets of (source, qualifier) pairs in all <ta, ty>k buckets to get a combined set of
(source, qualifier) pairs for the node <ta, ty>k in the new, unified graph.
Return the list of sets, each set being a set of unified graphs, as the output of the algorithm. Each unified graph may be
represented by the bucket (<TA, TY>) function and the MoreTrusted (<TA, TY>, <TA, TY>, TP) predicate. The
domain and range of bucket and MoreTrusted determines the nodes and edges of the graph.
___________________________________
Below, we give an example for Algorithm 4.2. We demonstrate the algorithm for only a
single trust preference, because demonstrating the entire loop for many trust preferences is
redundant. Normally, the algorithm accepts a list of trust preferences and outputs a list of sets,
each set containing one or more partial orderings. For this example, we output only a single
partial ordering.
Assume that all published trust assertions are as follows:
<s2, y, a, E>
<s1, x, a, A>
<s2, x, b, B>
<s3, y, b, C>
<s1, z, b, D>
<s3, x, a, B>
<s1, z, a, A>
<s3, y, a, C>
For this example (for simplicity), we assume qualifiers are based on a DTD common to
all wrappers. The DTD is as follows:
<A>

<D></D>

<C>
<E></E>
</C>
</A>
Example 21
INPUT: trust preference: <x, a>  <y, b>  <z, b> ; <y, b>  <x, b>; <y, a>
OUTPUT: A set of partial orderings. Each element of a partial ordering is a set of (source,
qualifier) pairs, labeled by a <TA, TY> pair.
Step 1 For the expression <x, a>  <y, b>  <z, b>, we create a graph:
<x, a> → {}
<y, b> → {}
<z, b> → {}
Step 2 Process all trust statements and add to buckets as necessary
<x, a> → {(s1, A), (s3, B)}
<y, b> → {(s3, C)}
<z, b> → {(s1, D)}

The remaining two expressions <y, b>  <x, b> and <y, a> are processed the same
way, resulting in three graphs. Observe that the trust statement <s1, z, a, A> is ignored since <z,
a> is not part of the trust preference.
Graph for the first expression:
<x, a> → {(s1, A), (s3, B)}
<y, b> → {(s3, C)}
<z, b> → {(s1, D)}
Graph for the second expression:
<y, b> → {(s3, C)}
<x, b> → {(s2, B)}
Graph for the third expression:
<y, a> → {(s2, E), (s3, C)}
Step 3 Consolidate the graphs corresponding to expressions into graphs corresponding to the trust
preference
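Unified graph 1 (listed from most trusted to least trusted):
<x, a> → {(s1, A), (s3, B)}
<y, b> → {(s3, C)}
<z, b> → {(s1, D)} and <x, b> → {(s2, B)} (both directly below <y, b>, not ordered with respect to each other)
Unified graph 2 (a single node, not ordered with respect to the others):
<y, a> → {(s2, E), (s3, C)}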
Observe that specifiers are not needed as input for this example nor for Algorithm 4.2 in
general. The trust broker only provides (not uses) trust metadata about sources with respect to
trust preferences. The mediator will use specifiers to discern how to apply the trust metadata.
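The three steps of Algorithm 4.2, applied to Example 21, can be sketched as follows. This is a simplified illustration, not the algorithm's definitive form: a trust preference is assumed to be a list of expressions, each a chain of <TA, TY> pairs from most to least trusted; only direct orderings are stored, without the transitive ones; unified graphs are computed as connected components; and the consistency check is omitted. The function name compute_poset is hypothetical.

```python
# Published trust statements of Example 21: (source, TA, TY, qualifier).
STATEMENTS = [
    ("s2", "y", "a", "E"), ("s1", "x", "a", "A"), ("s2", "x", "b", "B"),
    ("s3", "y", "b", "C"), ("s1", "z", "b", "D"), ("s3", "x", "a", "B"),
    ("s1", "z", "a", "A"), ("s3", "y", "a", "C"),
]


def compute_poset(preference, statements):
    """Return (buckets, edges, components) for one trust preference."""
    graphs = []
    for chain in preference:
        # Step 1: one node per <TA, TY> pair, edges between neighbours in the chain.
        nodes = {pair: set() for pair in chain}
        edges = set(zip(chain, chain[1:]))
        # Step 2: add (source, qualifier) pairs to the matching buckets.
        for source, ta, ty, qualifier in statements:
            if (ta, ty) in nodes:
                nodes[(ta, ty)].add((source, qualifier))
        graphs.append((nodes, edges))

    # Step 3: consolidate nodes and orderings across all expressions.
    buckets, all_edges = {}, set()
    for nodes, edges in graphs:
        for pair, members in nodes.items():
            buckets.setdefault(pair, set()).update(members)
        all_edges |= edges

    # Group connected nodes into unified graphs.
    components = []
    for pair in buckets:
        linked = [c for c in components
                  if any((pair, o) in all_edges or (o, pair) in all_edges for o in c)]
        merged = {pair}.union(*linked)
        components = [c for c in components if c not in linked] + [merged]
    return buckets, all_edges, components


if __name__ == "__main__":
    xa, yb, zb, xb, ya = ("x", "a"), ("y", "b"), ("z", "b"), ("x", "b"), ("y", "a")
    buckets, edges, components = compute_poset([[xa, yb, zb], [yb, xb], [ya]], STATEMENTS)
    for pair, members in buckets.items():
        print(pair, "->", sorted(members))
    print(len(components), "unified graphs")
```

The statement <s1, z, a, A> contributes nothing, since <z, a> is not part of the trust preference, and the sketch ends with two unified graphs, one of which contains only <y, a>.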
4.4 Pragmatic issues
In this section, we discuss the administration of trust preferences at client sites, and the
practical representation of <TA,TY> pairs. We also discuss some optimizations and practicalities
of implementing the trust broker and Algorithm 4.2.

The trust preferences of clients do not change often. Thus, a trust preference may be
kept persistent at some location such as a Web source. A URI that points to the trust preference
may be substituted in the trust preference statement. Besides the time saved by not having
to figure out and re-enter a potentially complex trust preference, the client organization might
want all the individual users to utilize the same trust preference, perhaps as part of an overall
security policy. Thus, the trust preference may be centrally administered and its complexities
hidden away from individual users.
Conceptually, we represented trust authorities as simply TA, trust types as TY. In an
actual implementation, the <TA, TY> domain may be represented by the name and URI of trust
authorities, plus the name and URI of trust types. Since there exists a 1-to-1 mapping from trust
authorities and trust types to URI, the URI unique to each TA and TY is a sufficient
representation to differentiate between them.
Below, we give some ideas for implementing Algorithm 4.2, as well as for enhancing the
performance and scalability of the trust broker.
bucket(<TA,TY>) can be pre-calculated by the trust broker for every <TA,TY> pair. By
sorting all trust statements according to the <TA,TY> pair, the trust statements will be implicitly
grouped by <TA,TY> pairs. If the trust broker maintains an index into the different groups, then
the trust broker can look up the trust statements for each <TA,TY> group via the index. The
<TA,TY> groups are kept updated when newly added trust statements are placed with their
respective <TA,TY> groups. MoreTrusted (<TA,TY>,<TA,TY>, TP) can easily be implemented
by a ternary relation that stores the two ordered <TA,TY> pairs and their associated trust
preference. The relation can be easily populated during evaluation of parse trees corresponding
to trust preferences.
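As a rough illustration of keeping bucket(<TA,TY>) pre-calculated, the sketch below groups trust statements by their <TA,TY> pair as they are published. Note that it swaps in a hash map for the sorted list with an index described above; the class name TrustStatementIndex and its methods are hypothetical.

```python
from collections import defaultdict


class TrustStatementIndex:
    """Groups published trust statements by their <TA,TY> pair."""

    def __init__(self):
        self._groups = defaultdict(set)

    def publish(self, source, ta, ty, qualifier):
        """Place a newly published trust statement in its <TA,TY> group."""
        self._groups[(ta, ty)].add((source, qualifier))

    def rescind(self, source, ta, ty, qualifier):
        """Remove a trust statement from its group if it is present."""
        self._groups.get((ta, ty), set()).discard((source, qualifier))

    def bucket(self, ta, ty):
        """Return the (source, qualifier) pairs published for <TA,TY>."""
        return frozenset(self._groups.get((ta, ty), ()))


if __name__ == "__main__":
    index = TrustStatementIndex()
    index.publish("s1", "x", "a", "A")
    index.publish("s3", "x", "a", "B")
    print(sorted(index.bucket("x", "a")))
```

A lookup then costs one hash access per <TA,TY> pair instead of a scan over all trust statements.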
For performance purposes, the trust broker can also cache the sets of partial orderings
associated with their respective trust preferences. Since clients often use the same trust
preferences for many different queries, caching provides performance benefits. Caching of trust
preferences is not easy to accomplish at the mediator, since the set of all trust statements may
change at any time, thus rendering out of date any metadata not accessible by the trust broker.
Although the trust broker is a logically centralized component, we can enhance the
scalability of its service to mediators by replicating the trust broker. For example, a hierarchy of
replicated trust brokers may be used. In this case, trust authorities will publish new trust
statements and rescind trust statements through the root trust broker. The root trust broker of the
hierarchy will inform the other trust brokers of trust statement changes, and mediators can be
serviced more quickly by the replicated trust brokers below the root of the hierarchy.
5 Effect of trust metadata on query processing

The mediator and trust broker collaborate to resolve queries respecting the trust
requirements of the client. The mediator performs the mediation with the trust metadata input
from the trust broker. In this chapter, we will outline the changes necessary to mediation in order
for the mediator to be able to use the trust metadata from the trust broker, thus fulfilling the trust
requirements. Section 5.1 gives an outline of the steps involved in accepting and processing a
query, both with and without trust extensions. Section 5.2 discusses the actions added to
mediation by the trust extensions. These actions are prune, optimize, resolve, and annotate. We give an
overview of and give examples for each of these actions. Section 5.3 outlines the changes to
mediation necessary to accommodate the actions.
5.1 Overview of query processing
Below, we outline the steps involved for a mediator to accept a query, process and return
a query result. We outline the steps for handling queries both with and without trust extensions.
Figure 5.1 outlines the steps for the MQS without trust extensions. Below Figure 5.1 we provide
the narrative for each of the steps. Likewise, Figure 5.2 outlines the steps for the MQS with trust
extensions, and the narratives for each of the steps are given below Figure 5.2. The extra steps
added for handling trust are italicized.
Figure 5.1 Steps to process a query for MQS without trust extensions
The numbers shown in counter-clockwise increasing order indicate the order of events. The
events are listed below the two figures.
[Figure 5.1: a diagram of the Mediator with numbered arrows 1 through 6; each arrow indicates submitting or receiving a query or query result.]
1. Mediator receives query q

2. Mediator formulates query plan
(Decomposition) q = {q1, q2, … qm},
where each qi is a subquery to a source (wrapper) si.
3. Mediator submits qi to source si, for i = 1 to m
4. Mediator receives query results from sources (wrappers)
5. Mediator processes query result to send back to client (data integration including conflict resolution)
6. Mediator returns query result to client.
Figure 5.2 Steps to process a query for MQS with trust extensions
The numbers shown in counter-clockwise increasing order indicate the order of events. The
events are listed below the two figures.
[Figure 5.2: a diagram of the Mediator and the Trust broker with numbered arrows 1, 2, 2.1, 2.2, 2.3, 3, 4, 5, 5.1, and 6; each arrow indicates submitting or receiving a query or query result.]
1. Mediator receives query q with a set of (specifier, trust preference) pairs from client.
2. Mediator formulates query plan
(Decomposition) q = {q1, q2, … qm},
where each qi is a subquery to a source (wrapper) si.
2.1 Mediator submits list of trust preferences extracted from q to trust broker
2.2 Mediator receives list of sets of graphs from trust broker
2.3 Mediator changes query plan based on input from trust broker.
3. Mediator submits qi to source si, for i = 1 to n, n ≤ m
4. Mediator receives query results from sources (wrappers)
5. Mediator processes query result to send back to client (data integration including conflict resolution)
5.1 Mediator annotates query result as necessary
6. Mediator returns query result to client.
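The step list above can be read as a simple pipeline. The following Python sketch is only an illustration of the control flow: every callable passed to process_query (decompose, broker, revise_plan, execute, integrate, annotate) is a hypothetical stand-in for a mediator component and is not defined in this thesis.

```python
from typing import Any, Callable, List


def process_query(
    query: Any,
    trust_prefs: List[Any],
    decompose: Callable[[Any], List[Any]],
    broker: Callable[[List[Any]], Any],
    revise_plan: Callable[[List[Any], Any], List[Any]],
    execute: Callable[[Any], Any],
    integrate: Callable[[List[Any], Any], Any],
    annotate: Callable[[Any, Any], Any],
) -> Any:
    """Control flow of steps 1 to 6; all callables are hypothetical stand-ins."""
    plan = decompose(query)                    # step 2: q = {q1, ..., qm}
    graphs = broker(trust_prefs)               # steps 2.1 and 2.2
    plan = revise_plan(plan, graphs)           # step 2.3: may drop subqueries (n <= m)
    results = [execute(sub) for sub in plan]   # steps 3 and 4
    merged = integrate(results, graphs)        # step 5: includes conflict resolution
    return annotate(merged, graphs)            # steps 5.1 and 6


# Toy stubs, only to show the order in which the stages run.
demo = process_query(
    "q",
    ["TP1"],
    decompose=lambda q: ["q1", "q2", "q3"],
    broker=lambda prefs: {"TP1": "graphs"},
    revise_plan=lambda plan, graphs: plan[:2],
    execute=lambda sub: sub.upper(),
    integrate=lambda results, graphs: "+".join(results),
    annotate=lambda merged, graphs: (merged, "annotated"),
)
```

Passing the stages in as callables keeps the sketch independent of how a particular mediator implements decomposition or integration.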
The changes to mediation that we propose may very well result in an empty query result.
When a client uses the keyword only in its query, sources may be eliminated that contain much
pertinent, although untrusted, data. For example, if all potential sources need to be merged
(perhaps as a join) with a source offering only untrusted data, and we eliminate that one source,
then we lose the potential query result.
5.2 Changes to mediation in query processing
As part of mediation, there are actions the mediator must take that utilize trust metadata.
The four actions are: 1 prune, 2 optimize, 3 resolve, and 4 annotate. The mediator enacts each of
these steps at the appropriate stage of query processing. Assuming the mediator already has a
query plan, we briefly describe the four actions and when each is appropriate below:
1 Prune
In Section 4.3 we detailed the properties of trusted and untrusted data, and of the sources
that provide such data. Pruning attempts to modify queries for untrusted data into queries for only
trusted data, and if unsuccessful to eliminate the queries against untrusted sources. If all such
queries are modified or eliminated, then this effectively removes all untrusted data from the query
result.
2 Optimize
When multiple sources provide similar data, we can avoid retrieving redundant (and
possibly conflicting) data by ordering the execution of queries to sources. Sources subsequent to
the initially queried source are only queried for data not already obtained from the prior queries.
By ordering the execution sequence to sources, we help to optimize the overall query plan.
3 Resolve
If the data returned from multiple sources conflict, then the mediator consults
the list of sets of graphs returned from the trust broker to attempt to select one of the
conflicting values. If unsuccessful, the mediator falls back on its “original” conflict
resolution rules to resolve the remaining conflicts.
4 Annotate
Annotate any untrusted objects with trust = "untrusted". Annotate objects composed
from multiple sources according to the rules of either optimistic or pessimistic merging. Also
annotate trusted objects as specified by LMax in Section 4.3.
Outline of pruning, optimize, resolve, and annotate.
Here we outline the steps required for the actions prune, optimize, resolve, and annotate.
It is important to note that we only provide the outline, and not detail, for the various actions. For
each of prune, optimize, resolve, and annotate, we must first determine which trust preference to
use. The answer depends on where the mediator plans to put the retrieved data once the source
provides the data. First we must recall that (specifier, trust preference) pairs partition the
mediator’s DTD tree, so that each partition is associated with its own trust preference.
The target element of the mediator DTD that will directly contain the data is where we
“put the retrieved data.” The partition that the target element is in determines the trust preference
we use. Because each trust preference is paired with a specifier, and the specifiers partition the
nodes of the DTD tree into disjoint sections, exactly one specifier's partition contains the
target element.
The specifier maps to a set of graphs which are isomorphic to the set of graphs that
represent the trust preference for the specifier. Recall that the mediator submitted a list of trust
preferences to the trust broker in Step 2.1 and in return received a list of sets, each set containing
one or more graphs. The mediator simply matches up its submitted list of trust preferences to the
list of sets to obtain the set of graphs that applies for each specifier. For example, assume there
are three (specifier, trust preference) pairs {(<A>, TP1), (<B>, TP2), (<E>, TP3)} declared for the
mediator DTD from Example 21, restated below:
<A>
<B>
<D></D>
</B>
<C>
<E></E>
</C>
</A>
Any subqueries for data intended to be placed as text for the elements <A> and <C> must
satisfy the trust preference TP1. Subqueries for data intended to be placed as text for elements
<B> and <D> must satisfy the trust preference TP2, and subqueries for data intended to be placed
as text for element <E> must satisfy trust preference TP3.
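The lookup of the applicable trust preference can be sketched in Python as a walk from the target element toward the root until an element with a declared specifier is reached. This is a minimal sketch under two assumptions: the mediator DTD tree is A containing B and C, with D under B and E under C, and a partition consists of a specifier element plus all descendants not claimed by a deeper specifier. The names MEDIATOR_PARENT, SPECIFIERS, and trust_preference_for are illustrative only.

```python
from typing import Dict, Optional

# Assumed mediator DTD tree, stored as child -> parent.
MEDIATOR_PARENT: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "A", "D": "B", "E": "C",
}

# The three (specifier, trust preference) pairs of the example.
SPECIFIERS: Dict[str, str] = {"A": "TP1", "B": "TP2", "E": "TP3"}


def trust_preference_for(
    target: str,
    parent: Dict[str, Optional[str]] = MEDIATOR_PARENT,
    specifiers: Dict[str, str] = SPECIFIERS,
) -> str:
    """Return the trust preference of the partition that contains `target`."""
    node: Optional[str] = target
    while node is not None:
        if node in specifiers:       # nearest specifier at or above the target
            return specifiers[node]
        node = parent[node]
    raise LookupError(f"no specifier covers element {target!r}")
```

With these inputs, A and C resolve to TP1, B and D to TP2, and E to TP3, which matches the assignment described in the text.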
Observe that although the mediator DTD and wrapper DTD provide for similar data, their
structures may be very different. Since specifiers are declared against the mediator DTD and
qualifiers are declared against the wrapper DTD, and integration rules exist to map to and from
the mediator DTD and wrapper DTD, there is no requirement that structurally the mediator DTD
and wrapper DTD must be the same.
We allow that the mediator can always extend qualifiers to reference the element closest
to the root of the DTD without compromising the meaning of the trust statement. For example, a
qualifier that references <C> below can be extended to reference <B>, since the element <B> is
purely structural. However, the qualifier may not be extended to include <A>, since <A> contains
the text “abc”.
<A> abc
<B>
<C>
xyz
</C>
</B>
</A>
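A small Python sketch of this extension rule follows. It assumes the element nesting A, then B, then C, and that the mediator knows which elements carry text of their own; the has_text map and the function name extend_qualifier are hypothetical.

```python
from typing import Dict, Optional


def extend_qualifier(
    element: str,
    parent: Dict[str, Optional[str]],
    has_text: Dict[str, bool],
) -> str:
    """Move a qualifier upward while the parent element is purely structural."""
    node = element
    while True:
        up = parent.get(node)
        if up is None or has_text[up]:
            return node              # stop at the root or below a text-bearing parent
        node = up


# Assumed nesting: A contains B, B contains C; A and C carry text, B does not.
PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}
HAS_TEXT: Dict[str, bool] = {"A": True, "B": False, "C": True}

extended = extend_qualifier("C", PARENT, HAS_TEXT)
```

Here extended is "B": the qualifier moves past the structural element but stops before A, whose own text would otherwise fall under the trust statement.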
For the rest of the chapter, the examples demonstrating pruning, query optimization,
conflict resolution, and annotation will be based on the trust preference and set of partial
orderings described at the end of Section 4.3, Example 21. For convenience, we restate the trust
preference and set of partial orderings below:
trust preference: <x, a> ≻ <y, b> ≻ <z, b>; <y, b> ≻ <x, b>; <y, a>
set of partial orderings:
<x, a> → {(s1, A), (s3, B)}
<y, a> → {(s2, E), (s3, C)}
<y, b> → {(s3, C)}
<z, b> → {(s1, D)}
<x, b> → {(s2, B)}

Pruning
Pruning of sources is required when the client’s trust criteria use the keyword only, so
that only trusted data may make up the query result. Pruning is part of Step 2.3. At this step the
goal is to modify any subquery qi in q’ = {q1, q2, … , qm} that returns
untrusted data. If the modified subquery no longer provides useful data, then we must
eliminate the subquery. As part of the pruning process, we process all the subqueries q1 … qm,
one at a time, in a loop. For each qi ∈ q’, we prune as follows:
First, the mediator determines which trust preference applies for qi. Associated with this
trust preference is a set of partial orderings on sets of (source, qualifier) pairs returned from the
trust broker. Assume qi references some element Ei in the wrapper DTD Di associated with some
source si. The mediator tries to find at least one pair of (source, qualifier) that covers the element
Ei. A (source, qualifier) pair (s, f) covers an element Ei if:
1 The element Ei is from the wrapper DTD Di associated with the source
si where si=s and
2 The qualifier f references an ancestor node of or the node
corresponding to Ei in the wrapper schema’s DTD tree. This DTD tree is
constructed from the DTD Di that both the qualifier and element are
presumed to be from.
If there does not exist at least one such (source, qualifier) pair that covers Ei, then we
must modify qi. The modification reduces the elements referenced in the subquery to only
reference elements that are covered. For example, using the wrapper DTD and set of partial
orderings for Example 21, if a subquery is for <A> from s2, then we must modify the subquery to
retrieve only <B> from s2.
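The cover test and the reduction of a subquery to covered elements can be sketched in Python as follows. The wrapper DTD of Example 21 is not restated in this chapter, so the tree WRAPPER_PARENT below (A containing B and C, with D and E under B) is an assumption, as are the single trusted pair and the function names covers and prune.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (source, qualifier)

# Assumed wrapper DTD tree, stored as child -> parent.
WRAPPER_PARENT: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "A", "D": "B", "E": "B",
}


def covers(pair: Pair, source: str, element: str,
           parent: Dict[str, Optional[str]]) -> bool:
    """Condition 1: same source. Condition 2: qualifier is the element or an ancestor."""
    pair_source, qualifier = pair
    if pair_source != source:
        return False
    node: Optional[str] = element
    while node is not None:
        if node == qualifier:
            return True
        node = parent[node]
    return False


def prune(source: str, element: str, pairs: Set[Pair],
          parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the topmost covered elements at or below `element`.

    An empty list means the subquery has to be eliminated.
    """
    if any(covers(p, source, element, parent) for p in pairs):
        return [element]
    kept: List[str] = []
    for child in sorted(c for c, par in parent.items() if par == element):
        kept.extend(prune(source, child, pairs, parent))
    return kept


# Hypothetical set of (source, qualifier) pairs taken from the buckets.
TRUSTED: Set[Pair] = {("s2", "B")}
reduced = prune("s2", "A", TRUSTED, WRAPPER_PARENT)
```

Under these assumptions reduced is ["B"], so a subquery for <A> from s2 is narrowed to <B>, while a subquery for <C> from s2 yields an empty list and is dropped.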
Unfortunately, refining a subquery for untrusted data into a
subquery for trusted data is not easy and is generally not practical. Because
qualifiers qualify the components of wrappers’ schemas by the names of elements only, and not
by the values of data under those elements, most attempts at refinement would result in loss
of data. The subquery result would be incomplete due to the removal of surrounding, ancestor
(untrusted) elements.
For example, if only <B> is qualified below, then modification eliminates <A>, which
makes security.jar useless. security.jar is meaningless if we ignore its
context, <A> JDK 1.2 </A>.
<A> JDK 1.2
<B>
security.jar
</B>
</A>
Under a trust preference where only trusted data is integrated into the query result,
queries for elements not covered by the necessary (source, qualifier) pairs are either modified or
removed from the query plan. Under such a trust policy, it is possible that no data is returned to
the client. Pruning is the only action we propose that may result in empty query results. If a client
receives an empty result the client may choose to remove the only restriction for comparison to a
result that includes untrusted data.
Example 23 For this example, we examine the set of partial orderings and wrapper DTD of
Example 21. <A> from s3 is not trusted, and a subquery for <A> needs to be modified to retrieve
either <B> or <C> from s3, since there is no (source, qualifier) pair that covers <A> from s3 in
any of the buckets associated with the partial orderings. Subqueries for <C> from s2 would
likewise be modified to retrieve only <E>, since <C> from s2 is not trusted.
Query Optimization
Query optimization using trust metadata depends on being able to order multiple sources
that provide similar data. [PAG96] already discusses how ordering the execution of queries to
sources eliminates retrieving redundant and potentially conflicting data.

In the case that mediators already have a partial ordering for sources that provide similar
data, then they can choose to use either trust metadata to order sources first, or use their own
method of ordering first. The advantage of the former is that the client's trust criteria is a priority
for providing input into the process. The latter no longer needs the trust metadata for ordering
sources since we assume mediators already can always order sources.
Essentially, the goal of the optimize algorithm is to determine, for every two sources si
and sj, an ordering si > sj or sj > si, where “>” represents the relation “query before”. For example,
si >sj means to query si before querying sj. If both si > sj and sj > si are evident, then si and sj are
unordered. Further research can enhance this semantics by examining the exact “significance” of
si > sj versus the “significance” of sj > si, to select the most significant one.
The optimization algorithm uses the list of sets of graphs returned from the trust broker.
We assume that the mediator has already identified a set of subqueries that will provide similar
data. For simplicity, we assume each subquery is associated with a (source, element) pair. That
is, a subquery queries a source for exactly one element (and its children). Extending the
algorithm to include subqueries for multiple elements is a simple detail. We describe the optimize
algorithm in the following paragraphs.
Outline of Query Optimization Algorithm
First, we determine which set of graphs to use based on the target element of the mediator
DTD that will contain the presented data. Second, for each (source, element) pair, we mark every
node of the set of graphs where the bucket of the node contains a (source, qualifier) pair that
covers the element of the (source, element) pair. We use the marked function, which takes a
node as an argument, to represent the set of sources that form the markings for that node. Third,
we find the set of "highest trust" nodes allowed for each (source, element) pair. Let m, n be
variables representing nodes. The set of “highest trust” nodes for a (source, element) pair p is as
follows:
“highest trust” nodes NHT(p) = { n | p ∈ marked(n) ∧
¬∃ m, m is an ancestor of n and p ∈ marked(m) }
Fourth, eliminate the marking of any p from every node that is not in NHT(p). Finally, extrapolate
all possible orderings between every pair of sources from the marked graph.
An ordering si > sj exists between si and sj if:
∃ ni, nj, si ∈ marked(ni) and sj ∈ marked(nj), ni is an ancestor of nj, and
¬∃ nk, nl, si ∈ marked(nk) and sj ∈ marked(nl)
and nl is an ancestor of nk.
Also, an ordering si > sj exists between si and sj if:
∃ ni, si ∈ marked(ni), and sj ∉ marked(n) for every node n.
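The outline above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: the graph is stored as a map from each node to the nodes directly preferred over it, the wrapper DTD tree (A containing B and C, with D and E under B) is an assumption because Example 21's wrapper DTD is not restated here, and the names query_before, GRAPH_PARENTS, BUCKETS, and WRAPPER_PARENT are illustrative.

```python
from itertools import permutations
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (source, element) for subqueries, (source, qualifier) in buckets

# Graph for: <x, a> > <y, b> > <z, b>; <y, b> > <x, b>; <y, a> (node -> direct parents).
GRAPH_PARENTS: Dict[str, List[str]] = {
    "<x, a>": [], "<y, b>": ["<x, a>"], "<z, b>": ["<y, b>"],
    "<x, b>": ["<y, b>"], "<y, a>": [],
}
BUCKETS: Dict[str, Set[Pair]] = {
    "<x, a>": {("s1", "A"), ("s3", "B")}, "<y, a>": {("s2", "E"), ("s3", "C")},
    "<y, b>": {("s3", "C")}, "<z, b>": {("s1", "D")}, "<x, b>": {("s2", "B")},
}
# Assumed wrapper DTD tree, child -> parent.
WRAPPER_PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B"}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All graph nodes preferred over `node`, directly or transitively."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def covers(qualifier: str, element: str, dtd_parent: Dict[str, Optional[str]]) -> bool:
    """True if `qualifier` names `element` or one of its ancestors in the DTD tree."""
    node: Optional[str] = element
    while node is not None:
        if node == qualifier:
            return True
        node = dtd_parent[node]
    return False


def query_before(subqueries: List[Pair], buckets: Dict[str, Set[Pair]],
                 parents: Dict[str, List[str]],
                 dtd_parent: Dict[str, Optional[str]]) -> Set[Tuple[str, str]]:
    """Return all (si, sj) such that si should be queried before sj."""
    # Second step: mark each node whose bucket covers a (source, element) pair.
    marked = {n: {p for p in subqueries
                  if any(s == p[0] and covers(f, p[1], dtd_parent) for s, f in bucket)}
              for n, bucket in buckets.items()}
    # Third and fourth steps: keep a pair only in its "highest trust" nodes.
    highest = {n: {p for p in ps
                   if not any(p in marked.get(m, set()) for m in ancestors(n, parents))}
               for n, ps in marked.items()}
    sources_at = {n: {s for s, _ in ps} for n, ps in highest.items()}

    def above(si: str, sj: str) -> bool:
        return any(si in sources_at[ni] and sj in sources_at[nj]
                   and ni in ancestors(nj, parents)
                   for ni in sources_at for nj in sources_at)

    ranked: Set[str] = set().union(*sources_at.values())
    order: Set[Tuple[str, str]] = set()
    for (si, _), (sj, _) in permutations(subqueries, 2):
        if si == sj:
            continue
        if (above(si, sj) and not above(sj, si)) or (si in ranked and sj not in ranked):
            order.add((si, sj))
    return order
```

For subqueries for B from s2 and from s3, the sketch returns {("s3", "s2")}, and a source that appears in no bucket is always ordered after a source that does.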
Example 24 For this example, we examine the set of partial orderings and wrapper DTD of
Example 21. If one subquery is for <B> from s2 and another subquery is for <B> from s3, then s3
> s2. We have s3 > s2 because (s3, B) is in the bucket of a node labeled <x, a> that is an ancestor of
the node labeled <x, b>, whose bucket contains (s2, B). The mediator queries s3 for <D> first, and
the objects not already retrieved from s3 will be retrieved from s2. If one subquery is for <B>
from s1 and another subquery is for <B> from s3, then neither s1 > s3 nor s3 > s1 holds, since s1
and s3 are not comparable. If one subquery is for <C> from s1 and another subquery is for <C>
from s3, then s1 > s3 since <x, a> ≻ <y, b>. The fact that (s3, C) is in the bucket of <y, a> is
irrelevant since <y, a> and <x, a> are not ordered. If there is a source s4, <B> from s2 would be
queried before <B> from s4, since s4 is not mentioned in the set of partial orderings.
Conflict resolution
Conflict resolution is an action required by the mediator when multiple sources provide
different, conflicting, data values for the same object. Usually some common name, key, or
identifier (such as object ID) informs the mediator that multiple, disparate data are really
describing the same object. Conflict resolution is performed regardless of whether the client
specifies the keyword only or not.

When data conflicts occur, the mediator must attempt to resolve the conflict. There are
two approaches we may take with regard to using trust metadata for conflict resolution. We may
have the mediator try to resolve the conflict first, in which case the trust metadata is not needed
for conflict resolution since the mediator may always resolve conflicts. Using mediation rules
that include some priority order either for sources or data, such as always choose the most recent
data or always choose the source that has participated in the MQS for a greater period of time,
mediators can already always resolve data conflicts.
The alternative is to first use the trust metadata for conflict resolution, and if
unsuccessful, then to use the mediator's conflict resolution. The advantage of this alternative
method is to allow for input from the client's trust preferences to have priority over other conflict
resolution schemes. Using trust metadata to resolve data conflicts is not always successful,
however, as we shall see shortly with our approach for conflict resolution outlined below.
The difference between the algorithms for resolve and optimize is that resolve tries to
distill the subset of “most trusted” data out of the set of all available (conflicting) data, and
optimize tries to find as many orderings as possible among the set of all sources (that purport to
provide similar data).
Conflict resolution in this trust framework is a function of where the individual
conflicting data are coming from and where the finished data is going to. First, the mediator
looks up which trust preference will be used to resolve the conflict. The mediator knows the trust
preference maps to a set of graphs, each graph representing a partial ordering of sets of (source,
qualifier) pairs.
Having chosen the correct set of graphs for conflict resolution, the mediator commences
as follows:
Outline of Conflict Resolution Algorithm
For each conflicting datum, the mediator visits every node (in the set of graphs)
and marks any node whose bucket contains a (source, qualifier) pair that covers the element E
that the conflicting datum comes from.
The marking the mediator makes on the node may simply be a representation of the datum.
A node may have more than one marking; together the markings form a set. Next, the mediator
examines the graph to determine if there is some "most trusted" (although still conflicting) data.
If the set of "most trusted" data is of size one, its single member is integrated into the final
result. Otherwise the mediator may use its own conflict resolution methods to resolve
conflicts among the remaining data. The "most trusted" criterion is defined as follows:
Let n and m be variables representing nodes in the set of graphs. Let d be a variable
representing some data in conflict with other data. Let marked(node) represent the set of
markings at a node. Then the set of "most trusted" data dMT is:
“most trusted” data set dMT = { d | ∃n, d ∈ marked (n) ∧ ¬∃m, m is an
ancestor of n and marked (m) ≠ ∅}
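The set definition can be transcribed into Python directly. The sketch below assumes the markings have already been computed and are supplied as a map from node label to the set of data marked there; the string labels for data and the names most_trusted and GRAPH_PARENTS are illustrative. It follows the set definition literally, so data marked at a node with no marked ancestor is kept even when that node is unordered relative to the other marked nodes.

```python
from typing import Dict, List, Set

# Graph for: <x, a> > <y, b> > <z, b>; <y, b> > <x, b>; <y, a> (node -> direct parents).
GRAPH_PARENTS: Dict[str, List[str]] = {
    "<x, a>": [], "<y, b>": ["<x, a>"], "<z, b>": ["<y, b>"],
    "<x, b>": ["<y, b>"], "<y, a>": [],
}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All graph nodes preferred over `node`, directly or transitively."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def most_trusted(marked: Dict[str, Set[str]], parents: Dict[str, List[str]]) -> Set[str]:
    """dMT: data marked at a node none of whose ancestors carries any marking."""
    result: Set[str] = set()
    for node, data in marked.items():
        if data and not any(marked.get(m) for m in ancestors(node, parents)):
            result |= data
    return result


# Hypothetical markings: B from s3 is marked at <x, a>, B from s2 at <x, b>.
winner = most_trusted({"<x, a>": {"B from s3"}, "<x, b>": {"B from s2"}}, GRAPH_PARENTS)
```

If the returned set has exactly one member, the conflict is resolved; otherwise the mediator falls back on its own conflict resolution rules.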
Example 25 For this example, we examine the set of partial orderings and wrapper DTD of
Example 21. If <B> from s2 conflicts with <B> from s3, then dMT = {<B> from s3}, because (s3,
B) is in the bucket of a node labeled <x, a> that is an ancestor of the node labeled <x, b>, whose
bucket contains (s2, B). If <B> from s1 conflicts with <B> from s3, then dMT = {<B> from s1, <B>
from s3} since neither has ancestors where marked ≠ ∅. If <C> from s1 conflicts with <C> from
s3, then dMT = {<C> from s1} since <x, a> ≻ <y, b>. The fact that (s3, C) is in the bucket of <y, a>
is irrelevant since <y, a> and <x, a> are not ordered.
Annotations
Recall that in Sections 4.2 and 4.3 we only gave the specification for annotating objects
from single sources and objects composed of data from multiple sources. Here we show how
annotations are done, using the list of sets of graphs returned from the trust broker. We provide a
classification for the three types of annotation below.
First, if the keyword only is left out of the trust criteria, then data from sources that would
otherwise be eliminated are annotated trust = "untrusted". This has been
discussed in Section 4.2. Second, trusted data are annotated according to the specification
described in Section 4.2. For convenience, we restate the specifications for annotations below:
• ancestor-self(E, f, DE) holds if f is a qualifier designating an element that is E
or an ancestor of E in the XML tree that corresponds to the DTD DE.
• bucket(s) = { <x, a, f> | trust statement <s, x, a, f> has been published }
• LMax(bucket(s), TP, E) = { <x, a> | <x, a, f> ∈ bucket(s) ∧
ancestor-self(E, f, DE) ∧ ¬∃ <y, b, g> ∈ bucket(s) ∧ ancestor-self(E, g, DE) :
(<y, b> ≻ <x, a>) ∈ TP }
The precise rule for annotating an XML element is:
Annotate element E with { <x, a> | ∃ sk, E ∈ source sk ∧
<x, a> ∈ LMax(bucket(sk), TP, E) }
Here we outline operationally how trusted data are annotated. We start with the list of
sets of graphs provided by the trust broker. Depending on where the trusted data is to be placed,
we select one specifier that provides the set of graphs we need. We assume the nodes of the
graphs have their buckets populated according to Algorithm 4.2.

Outline of algorithm for annotating trusted data
First, for all buckets, eliminate all (source, qualifier) pairs that cannot possibly cover the
trusted data. The only (source, qualifier) pairs left in any bucket are (source, qualifier) pairs that
may cover the trusted data. Second, eliminate every (source, qualifier) pair (s, f)i from each
<ta, ty>i bucket, (s, f)i ∈ <ta, ty>i bucket, for which there is a <ta, ty>j bucket with
<ta, ty>j ≻ <ta, ty>i and a (source, qualifier) pair (s, f)j ∈ <ta, ty>j bucket. This second
elimination step applies regardless of the TA or TY values in the <TA, TY> pairs of bucket labels.
(Note that we need not verify that “<ta, ty>i ≻ <ta, ty>j” is not true, since we already know that
the trust preference upon which the set of graphs is based has no cycles of two or more nodes.)
The remaining <TA, TY> pairs that are labels on nodes with non-empty buckets form the set of
<TA, TY> pairs used in the annotation. In short, the annotation consists of the most trusted
<TA, TY> pairs whose buckets contain (source, qualifier) pairs that cover the data.
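The two elimination steps can be sketched in Python as follows. The sketch treats ≻ as transitive (a label eliminates every label below it in the graph), and it assumes a wrapper DTD tree (A containing B and C, with D and E under B), since Example 21's wrapper DTD is not restated here. The names annotation_for, GRAPH_PARENTS, BUCKETS, and WRAPPER_PARENT are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

Pair = Tuple[str, str]  # (source, qualifier)

# Graph for: <x, a> > <y, b> > <z, b>; <y, b> > <x, b>; <y, a> (node -> direct parents).
GRAPH_PARENTS: Dict[str, List[str]] = {
    "<x, a>": [], "<y, b>": ["<x, a>"], "<z, b>": ["<y, b>"],
    "<x, b>": ["<y, b>"], "<y, a>": [],
}
BUCKETS: Dict[str, Set[Pair]] = {
    "<x, a>": {("s1", "A"), ("s3", "B")}, "<y, a>": {("s2", "E"), ("s3", "C")},
    "<y, b>": {("s3", "C")}, "<z, b>": {("s1", "D")}, "<x, b>": {("s2", "B")},
}
# Assumed wrapper DTD tree, child -> parent.
WRAPPER_PARENT: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "A", "D": "B", "E": "B"}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All graph nodes preferred over `node`, directly or transitively."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def covers(qualifier: str, element: str, dtd_parent: Dict[str, Optional[str]]) -> bool:
    """True if `qualifier` names `element` or one of its ancestors in the DTD tree."""
    node: Optional[str] = element
    while node is not None:
        if node == qualifier:
            return True
        node = dtd_parent[node]
    return False


def annotation_for(source: str, element: str, buckets: Dict[str, Set[Pair]],
                   parents: Dict[str, List[str]],
                   dtd_parent: Dict[str, Optional[str]]) -> Set[str]:
    """Labels used to annotate `element` from `source`; empty means untrusted."""
    # First step: keep only labels whose bucket still holds a covering pair.
    covering = {label for label, bucket in buckets.items()
                if any(s == source and covers(f, element, dtd_parent) for s, f in bucket)}
    # Second step: drop a label when a more trusted label also covers the data.
    return {label for label in covering if not (ancestors(label, parents) & covering)}
```

With these assumed inputs, E from s2 is annotated with both <y, a> and <x, b>, and D from s1 with <x, a> only.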
Example 26 Using the set of partial orderings and DTD of Example 21, <E> from s2 is annotated
“<y, a>; <x, b>” since both (s2, E) and (s2, B) exist in the buckets of the trusted pairs <y, a> and
<x, b>, respectively, and <y, a> is not ordered with respect to <x, b>. <D> from s1 is annotated
“<x, a>” (<z, b> is omitted from the annotation) since <x, a> ≻ <z, b>.
Third, objects composed of data from multiple sources are annotated according to
optimistic or pessimistic merging. This has been discussed in Section 4.3. We restate the
specification for PESAnnotate and OPTAnnotate below:
OPTAnnotate(s1,s2) = LMax{Bucket(s1) ∪ Bucket(s2)}
PESAnnotate(s1,s2) = LMax{Bucket(s1) ∩ Bucket(s2)}
Here we outline operationally how objects composed of data from multiple sources are
annotated. For each individual datum, the steps are the same as before, but for clarity we restate the
steps as follows:
We start with the list of sets of graphs provided by the trust broker. Depending on where
the composed object is to be placed, we select one specifier that provides the set of graphs we
need. We assume the nodes of the graphs have their buckets populated according to Algorithm
4.2.
First, for all buckets, eliminate all (source, qualifier) pairs that cannot possibly cover the
trusted data. The only (source, qualifier) pairs left in any bucket are (source, qualifier) pairs that
may cover the trusted data. Second, eliminate every (source, qualifier) pair (s, f)i from each
<ta, ty>i bucket, (s, f)i ∈ <ta, ty>i bucket, for which there is a <ta, ty>j bucket with
<ta, ty>j ≻ <ta, ty>i and a (source, qualifier) pair (s, f)j ∈ <ta, ty>j bucket.
Finally, after all individual data are processed, the annotation is the union of all
remaining <TA, TY> pairs that are labels on nodes with non-empty buckets for optimistic
annotation, or the intersection of all remaining <TA, TY> pairs that are labels on nodes with
non-empty buckets for pessimistic annotation.
Example 27 For this example, we use the set of partial orderings and wrapper DTD of Example
21. If <B> from s1 and s3 are combined into one object, the single object would be annotated “<x,
a>” optimistically, and annotated “<x, a>” pessimistically. If <B> from s1 and <C> from s3 are
combined into one object, then the new object is annotated “<x, a>; <y, b>” optimistically and
the annotation is trust = “untrusted” pessimistically.
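The final merging step can be sketched in Python as a union or an intersection over the per-datum annotation sets. The sketch assumes those sets have already been computed and are passed in; the function name merge_annotations and the use of the marker "untrusted" for an empty result are illustrative choices.

```python
from typing import FrozenSet, Iterable, Set

UNTRUSTED: FrozenSet[str] = frozenset({"untrusted"})


def merge_annotations(per_datum: Iterable[Set[str]], optimistic: bool) -> FrozenSet[str]:
    """Union for optimistic merging, intersection for pessimistic merging."""
    sets = [frozenset(annotation) for annotation in per_datum]
    if not sets:
        raise ValueError("at least one annotation set is required")
    merged = frozenset().union(*sets) if optimistic else frozenset.intersection(*sets)
    # An empty result means no <TA, TY> pair survives, so the object is untrusted.
    return merged if merged else UNTRUSTED


# Hypothetical per-datum annotations for two pieces of data merged into one object.
optimistic_result = merge_annotations([{"<x, a>"}, {"<y, b>"}], optimistic=True)
pessimistic_result = merge_annotations([{"<x, a>"}, {"<y, b>"}], optimistic=False)
```

Here optimistic_result holds both pairs, while pessimistic_result is the untrusted marker because the two annotation sets share no pair.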
5.3 Integration into mediation
In this section, we briefly discuss some of the issues that may arise in adding prune,
optimize, resolve, and annotate to a MQS with trust extensions. Our discussion highlights the
changes necessary to the stages of mediation for each of prune, optimize, resolve, and annotate.
We do not discuss the details, but instead give an overview which demonstrates the possibilities.
First, for clarity of discussion, we categorize handling of queries by the mediator into
four stages. These four stages are query decomposition, query optimization, query plan
execution, and data integration. Query decomposition is determining the subqueries to sources
that are necessary in order to compose a query result for the client query. Query optimization
makes the query plan more efficient, for example by eliminating redundant subqueries. Query
plan execution does the actual work of sending the queries to sources. Data integration processes
the data returned from sources. For example, data integration may include conflict resolution,
merging of data into objects, and discarding of redundant data.
In order to compose a result for a client query, mediators use rules that map from a query
on the mediated schema to subqueries on wrapper schemas. We call a rule that is formulated to
obtain a specific query result an expanded rule. We assume the mediator dynamically generates
expanded rules for each client query. These expanded rules represent a decomposition of the
client query. In some cases, the rules may be based on some predefined generic template rules.
For example, the expanded rule Rule 1 of Example 28 may be generated for the pattern based on
mediator DTD and client query of Example 28. We omit the template for generating Rule 1 as it
is not relevant for our discussion.
Example 28
Pattern based on mediator DTD:
<X></X>
<Y></Y>
Client query, where a is a variable and xyz is text that is composed of x and yz:
WHERE <X>a</X> <Y>xyz</Y>

CONSTRUCT <X>a</X>
TRUST only (ta2, ty2) > (ta1, ty1)
Rule 1:
<X>a</X> <Y>xyz</Y> :- <B>a<D>xyz</D></B> @s1,
AND decomp(xyz,c,d),
AND <E>c</E> @s2,
AND <E>d</E> @s3
Rule 1 is an expanded rule that states how to query the sources in order to construct a
query result for the client query. The variable a represents text for the element <X>. The text for
element <X> is obtained from the text of element <B> at source s1. The text from element <B> at
s1 is integrated only if 1) xyz is the text of element <D>, 2) the value of variable c is the text
of <E> at s2, and 3) the value of variable d is the text of <E> at s3. c and d are the decomposition
of xyz, for example c = x and d = yz.
Pruning primarily affects query decomposition and query optimization. The expanded
rule is modified to eliminate queries on untrusted data. We will modify Rule 1 to demonstrate
pruning. Assuming there are no trust statements with (source, qualifier) pairs that cover <B> at
s1, the subquery <B>a<D> xyz </D></B>@s1 needs to be removed. If another source exists that is
qualified for <B> by <ta2, ty2> or <ta1, ty1>, then a new subquery may be substituted in.
Otherwise, there is no text data for <X> in the result document.
Obviously, query optimization using trust metadata affects the query optimization stage.
Assuming there is a source s4 that provides data similar to the data provided by s1, normally
(without query optimization) we modify Rule 1 to become Rule 2 as follows:
Rule 2:
<X>a</X> <Y>xyz</Y> :- <B>a<D>xyz</D></B> @s1,
AND <B>a<D>xyz</D></B> @s4,
AND <E>c</E> @s2,
AND <E>d</E> @s3
However, if 1) we assume that only the trust statement (s4, ta2, ty2, //B) has been
published, or 2) we assume that both trust statements (s4, ta2, ty2, //B) and (s1, ta1, ty1, //B) are
published, then since (ta2, ty2) ≻ (ta1, ty1), we modify Rule 2 to Rule 2’ so that s1 is only queried
for data not already available from s4.
Rule 2’:
<X>a</X> <Y>xyz</Y> :- <B>a<D>xyz</D></B> in temp_store(a) @s4,
AND <B>a<D>xyz</D></B> not temp_store(a) @s1,
AND <E>c</E> @s2,
AND <E>d</E> @s3
Rule 2’ stores all a from s4 in the temporary storage structure temp_store, and then
retrieves a from s1 only if a is not already available from temp_store. Notice that we swapped the
subquery for s4 with the subquery for s1 so that s4 is queried first.
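The effect of the temporary store can be sketched in Python. The sketch assumes sources are queried in the order produced by the optimize step and that a source can be asked to skip values already held; fetch_in_order, fake_query, and the sample data are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple


def fetch_in_order(
    ordered_sources: Iterable[str],
    query: Callable[[str, FrozenSet[str]], Iterable[str]],
) -> List[Tuple[str, str]]:
    """Query sources in trust order; later sources only add values not yet stored."""
    temp_store: List[Tuple[str, str]] = []   # (value, source that supplied it)
    seen: Set[str] = set()
    for source in ordered_sources:
        for value in query(source, frozenset(seen)):
            if value not in seen:            # guard in case a source ignores the skip set
                seen.add(value)
                temp_store.append((value, source))
    return temp_store


# Hypothetical source contents, standing in for the values of variable a.
DATA: Dict[str, List[str]] = {"s4": ["JDK 1.2", "JDK 1.3"], "s1": ["JDK 1.3", "JDK 1.1"]}


def fake_query(source: str, skip: FrozenSet[str]) -> List[str]:
    """Stand-in for a subquery that excludes values already in the temporary store."""
    return [value for value in DATA[source] if value not in skip]


fetched = fetch_in_order(["s4", "s1"], fake_query)
```

Because s4 is queried first, the value that both sources offer is taken from s4, and s1 contributes only the value that s4 lacks.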
Conflict resolution occurs in the data integration stage. We give an example of using
conflict resolution to resolve data conflicts. Assume that, in addition to Rule 1, the mediator has
generated a second rule, Rule 3. Rule 3 retrieves the text for elements <B> from source s4 as
follows:
Rule 3:
<X>a</X><Y>xyz</Y> :- <B>a<D>xyz</D></B> @s4.
If we assume X is the element tag SoftwareTitle and Y is the element tag
LicensingNumber, then if Rule 1 returns “55555” for a and Rule 3 returns “88888” for a, then we
need to resolve this conflict. Assuming only the trust statement (s4, ta2, ty2, //B) has been
published, we resolve in favor of “88888” since s4 has been qualified for <B> and s1 has not been
qualified at all. If we assume an additional trust statement (s1, ta1, ty1, //B) has been published, we
still resolve in favor of “88888” since <ta2, ty2> ≻ <ta1, ty1>.
Annotations are done in the data integration stage. In order to annotate data returned from
sources, subqueries that request the data need to be associated with the annotations. One method
of association is to modify the expanded rules. Modification simply entails adding an annotate
function associated with each subquery. The annotate function maps a subset of the wrapper
DTD elements to sets of <TA,TY> pairs.
The expanded rules may be modified before the query plan execution stage. Since query
results from sources contain element tags from the wrapper DTD, the mediator is able to
distinguish and annotate the returned values from sources. For example, assuming the only trust
statements published are (s2, ta3, ty3, //C) and (s3, ta2, ty2, //E), we show what the modified Rule 1
looks like, as Rule 4 below:

Rule 4:
<X>a</X><Y>xyz</Y>:-
annotate (“untrusted”), <B>a<D>xyz</D></B>@s1
AND decomp (xyz, c, d)
AND annotate (“<ta3, ty3>”), <E>c </E>@s2
AND annotate (“<ta2, ty2>”), <E>d</E>@s3
According to Rule 4, the elements returned from s1 for element <X> of the mediator DTD
are annotated trust = “untrusted”, and xyz as the text of element <Y> will be annotated trust =
“untrusted” due to the default pessimistic annotation of objects composed of data from more than
one source.
6 Conclusions and future work

In this thesis, we have investigated the issues and solutions regarding trust and its
application as metadata. We have offered solutions for trust management that are built on
mediated query systems (MQS). Our contributions are centered on the framework of trust
management to support the trust extensions to a MQS. The framework includes a conceptual
model and language for expressing trust, an architecture that includes trust authorities and the
trust broker for asserting and collecting trust statements, respectively, and methods for utilizing
trust metadata in mediation.
We have developed a conceptual model and language for clients to specify their trust
preferences. The advantage of our conceptual model and language is that clients do not need to
know anything about sources. Clients only need to know of trust authorities. The trust types
allow for expressing why trust authorities trust sources. By decoupling the data request and trust
requirements, we have simplified administration of trust preferences. Organizations can decide
on trust policies applied uniformly throughout, and the trust preferences can be kept persistent
across many queries.
As part of our architecture, we have defined trust authorities that issue trust statements
and the trust broker to transform trust statements into trust metadata. We described the algorithm
for the trust broker to create trust metadata. Mediators use trust metadata to eliminate untrusted
sources, optimize handling of queries, resolve data conflicts, and annotate query results. Clients
have the advantage of annotations in the query result to understand how trust applies to data.
As the Web grows, many new sources providing data are added. Users often have no
reference for trust in the vastness of the Web. Certainly beyond the immediate domain of
communicating software configuration information, many other applications abound where trust
is necessary for data. As no one organization or person can know all sources on the Web, having
other experts such as trust authorities to decide what to trust is becoming more and more
necessary.
Future work
We have only outlined the steps and specifications for pruning, query optimization,
conflict resolution, and annotating. Further research needs to be done to integrate these actions
into mediation. In order to do this, we need to describe a full framework for how these actions
are integrated.
In particular, annotations have many cases with details to consider. We have only barely
addressed generally how annotations may be done. We need to research how to annotate
elements added to results that are for structural use only. Also if we reorder elements or split up
elements for composition into new elements, how do we annotate the new elements? More
research needs to be done to address the many issues of annotations.
Research into securing the trust broker and trust statements can alleviate concerns about
tampering. Using technology such as digital signatures, we can ensure the authenticity, integrity,
and non-repudiation of trust statements. Perhaps we can even eliminate the need to trust the trust
broker by addressing these security issues.
Timestamps for trust statements can add a new factor for trust. The benefits of
chronologically ordered trust statements must be investigated. Perhaps trust statements can
expire, in which case the trust broker must ensure that all the trust metadata are still valid. Other
applications involving trust, such as PKI, already use certificates that bind identities to public
keys and that may expire. The benefit of expiring trust metadata is perhaps some kind of
guarantee to the client.
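As a rough illustration of how expiration might be handled, the following Python sketch filters
trust statements by validity period before any trust metadata would be derived from them. The
TrustStatement record and its field names (authority, source, issued, expires) are assumptions
made for this example only; they are not part of the trust broker schema defined in this thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TrustStatement:
    # Hypothetical record layout, not the trust broker schema of this thesis.
    authority: str                 # trust authority that issued the statement
    source: str                    # data source the statement is about
    issued: datetime               # timestamp assigned by the authority
    expires: Optional[datetime]    # None means the statement never expires


def valid_statements(statements, now):
    """Keep only the statements that are in force at time `now`."""
    return [s for s in statements
            if s.issued <= now and (s.expires is None or now < s.expires)]


def latest_per_pair(statements):
    """For each (authority, source) pair, keep the most recently issued statement."""
    latest = {}
    for s in statements:
        key = (s.authority, s.source)
        if key not in latest or s.issued > latest[key].issued:
            latest[key] = s
    return latest
```

A trust broker built along these lines could rerun valid_statements whenever it regenerates
trust metadata, so that metadata are never derived from an expired statement. Whether a newer
statement should override an older one, as latest_per_pair assumes, is one of the open questions
noted above.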
For greater flexibility, trust authorities may defer trust to other trust authorities. Since
clients may not know of all trust authorities, they may prefer to delegate trust to the few trust
authorities they know. A chain of trust delegations, such as described in [RS97] for PKI, may be
constructed. Clients could authorize the maximum length of the chain of delegations.
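A minimal sketch of a client-bounded delegation chain is given below in Python. It uses a plain
breadth-first traversal, which is a choice made for this illustration and is not the metric
discussed in [RS97]. The function name, the dictionary representation of delegations, and the
max_chain_length parameter are all hypothetical.

```python
from collections import deque


def reachable_authorities(known, delegations, max_chain_length):
    """Map each reachable trust authority to its shortest delegation chain length.

    known: authorities the client trusts directly (chain length 0).
    delegations: dict mapping an authority to the authorities it defers to.
    max_chain_length: largest number of delegation steps the client authorizes.
    """
    depth = {a: 0 for a in known}
    queue = deque(depth)
    while queue:
        current = queue.popleft()
        if depth[current] == max_chain_length:
            continue  # the chain is already at the authorized limit
        for nxt in delegations.get(current, ()):
            if nxt not in depth:  # first visit is the shortest chain (BFS)
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return depth
```

With max_chain_length set to 0 the client accepts only the trust authorities it already knows,
which corresponds to the current model without delegation. Cycles among authorities are harmless
because each authority is visited at most once.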

Trust authorities may publish trust statements for sources based on the type of data from
sources, and not just based on the element in the wrapper DTD. Likewise, clients may specify
different trust preferences for different types of data. The multiple trust preferences can still be
independent of the requested data and yet are applied based on the type of data involved in
composing the query result. Overall, these extensions add semantic qualifiers (and specifiers) to
our current structural-only approach.
We hope that the infrastructure, trust model, conceptual model, and language recorded in
this thesis open more possibilities for new applications. The one application that remains our
original motivation, software life cycle management, now has an important part developed. One
can envision applications in which mediators provide the trusted data needed for configuring
applications.
In the near future, our research will involve some implementation and optimization work
for mediation. We hope that our contribution sparks new ideas for research in both mediated
query systems and software life cycle management.

References
[ABC00] S. Adler, A. Berglund, J. Caruso, S. Deach, P. Grosso, E. Gutentag, A. Milowski, S.
Parnell, J. Richman, S. Zilles. Extensible Stylesheet Language (XSL) Version 1.0. W3C
Recommendation. November 21, 2000. http://www.w3.org/Style/XSL.
[ABC98] V. Apparao, S. Byrne, M. Champion, et al. Document Object Model (DOM) Level 1
Specification. October 1, 1998. http://www.w3.org/TR/REC-DOM-LEVEL-1/.
[Auth] Microsoft Authenticode.
http://msdn.microsoft.com/workshop/security/authcode/authenticode.asp
[BBB97] R.J. Bayardo Jr. et al. InfoSleuth: Agent-Based Semantic Integration of Information in
Open and Dynamic Environments. Proceedings of ACM SIGMOD International Conference on
Management of Data, ACM Press, SIGMOD Record 26(2), pp. 195-206. 1997.
[BF98] T. Berners-Lee, R. Fielding, L. Masinter. Uniform Resource Identifiers (URI): Generic
Syntax. RFC 2396. 1998. http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt.
[BFL96] M. Blaze, J. Feigenbaum, J. Lacy. Decentralized Trust Management. Proceedings of the
1996 IEEE Symposium on Security and Privacy, IEEE Computer Society Press, pp. 164-173.
1996.
[BFL96b] M. Blaze, J. Feigenbaum, J. Lacy. The PolicyMaker Approach to Trust Management.
DIMACS Workshop on Trust Management in Networks. 1996.
[BPS00] T. Bray, J. Paoli, C.M. Sperberg-McQueen, E. Maler. Extensible Markup Language
(XML) 1.0 (Second Edition). W3C Recommendation. October 6, 2000.
http://www.w3.org/TR/2000/REC-xml-20001006.
[BRU96] P. Buneman, L. Raschid, J. Ullman. Mediator Languages – a Proposal for a Standard,
Report of an I3/POB working group held at the University of Maryland. 1996.
ftp://ftp.umiacs.eumd.edu/pub/ONRrept/medmodel96.ps.
[Cat96] R.G.G. Cattell, et al. The Object Database Standard ODMG 3.0. Morgan Kaufmann
Publishers. January 2, 2000.
[CGH94] S. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. D.
Ullman, J. Widom. The TSIMMIS Project: Integration of Heterogeneous Information Sources.
Proceedings of the 10th Meeting of the Information Processing Society of Japan, pp. 7-18. 1994.
[CFL97] Y. Chu, J. Feigenbaum, B. LaMacchia, P. Resnick, M. Strauss. REFEREE: Trust
Management for Web Applications. Proceedings of the 6th International World Wide Web
Conference, WWW6/Computer Networks 29(8-13), pp. 953-964. April 1997.
[CD99] J. Clark, S. DeRose. XML Path Language (XPath) Version 1.0. W3C Recommendation.
November 16, 1999. http://www.w3.org/TR/xpath.html.
[Cow00] J. Cowan. XML Information Set. W3C Working Draft. July 26, 2000.
http://www.w3.org/TR/xml-infoset.
[CIM98] Desktop Management Task Force. Common Information Model Specification Version
2.0. March 3, 1998. http://www.dmtf.org/spec/cims.html.
[DMI98] Desktop Management Task Force. Desktop Management Interface Specifications
Version 2.0s. June 24, 1998. http://www.dmtf.org/spec/spec.html.
[DFF98] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Suciu. XML-QL: A Query
Language for XML. Submission to the W3C. August 1998.
http://www.w3.org/TR/NOTE-xml-ql/.
[DD99] R. Domenig, K. R. Dittrich. An Overview and Classification of Mediated Query
Systems. SIGMOD Record 28(3), pp. 63-72. 1999.
[Ell97] C. Ellerman. Channel Definition Format. Microsoft Corporation. 1997.
http://www.w3.org/TR/NOTE-CDFsubmit.html.
[ET99] D. Evans, A. Twyman. Policy-Directed Code Safety. IEEE Symposium on Security and
Privacy, IEEE Computer Society, pp. 32-45. 1999.
[GPQ95] H. Garcia-Molina, Y. Papakonstantinou, D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman,
V. Vassalos, J. Widom. The TSIMMIS Approach to Mediation: Data Models and Languages. The
Second International Workshop on Next Generation Information Technologies and Systems
(NGITS). 1995.
[Hal99] R. S. Hall. Agent-based Software Configuration and Deployment. Ph.D. dissertation,
University of Colorado. 1999. http://www.cs.colorado.edu/users/rickhall/.
[HHH97] R. S. Hall, D. Heimbigner, A. van der Hoek, A. L. Wolf. An Architecture for Post-
Development Configuration Management in a Wide-Area Network. Proceedings of the
International Conference on Distributed Configurable Systems, IEEE Computer Society, pp. 269-
278. 1997.
[HHW98] R. S. Hall, D. Heimbigner, A. L. Wolf. Evaluating Software Deployment Languages
and Schema. Proceedings of the 1998 International Conference on Software Maintenance, IEEE
Computer Society. 1998.
[HHW98b] R. S. Hall, D. Heimbigner, A. L. Wolf. Requirements for Software Deployment
Languages and Schema. Proceedings of the 1998 International Workshop on Software
Configuration Management. 1998.
[HHW99] R. S. Hall, D. Heimbigner, A. L. Wolf. Specifying the Deployable Software Description
Format in XML. SERL Technical Report CU-SERL-207-99. March 1999.
http://www.cs.colorado.edu/serl/cm/dock.html#Publications
[HPT97] A. van Hoff, H. Partovi, and T. Thai. The Open Software Description Format (OSD).
Microsoft Corp. and Marimba, Inc. 1997. http://www.w3.org/TR/NOTE-osd.html.
[Kha96] R. Khare. Using PICS Labels for Trust Management. DIMACS Workshop on Trust
Management in Networks. 1996.
[KR98] R. Khare, A. Rifkin. Trust Management on the World Wide Web. WWW7 / Computer
Networks 30(1-7), pp. 651-653. 1998.
[LRO96] A. Y. Levy, A. Rajaraman, J. J. Ordille. The World Wide Web as a Collection of
Views: Query Processing in the Information Manifold. Workshop on Materialized Views:
Techniques and Applications (VIEW 1996), pp. 43-55. 1996.
[Mar98] Marimba, Inc. Castanet Product Family. 1998.
http://www.marimba.com/datasheets/castanet-3_0-ds.html.
[NLF99] F. Naumann, U. Leser, J. C. Freytag. Quality-driven Integration of Heterogeneous
Information Systems. Proc. of the 25th VLDB Conf., Morgan Kaufmann, pp. 447-458. 1999.
[PAG96] Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. Object Fusion in Mediator
Systems. Proceedings of the 22nd International Conference on Very Large Databases (VLDB),
Morgan Kaufmann, pp. 413-424. 1996.
[PGU95] Y. Papakonstantinou, H. Garcia-Molina, J. Ullman. MedMaker: A Mediation System
based on Declarative Specifications. Proceedings of the 12th International Conference on Data
Engineering (ICDE), IEEE Computer Society, pp. 132-141. 1996.
[PGW95] Y. Papakonstantinou, H. Garcia-Molina, J. Ullman, J. Widom. Object Exchange
across Heterogeneous Information Sources. Proceedings of the 11th International Conference on
Data Engineering (ICDE), IEEE Computer Society, pp. 251-60. 1995.
[PKI00] PKIX Working Group. Internet X.509 Public Key Infrastructure. Work in Progress.
November 2000. http://www.ietf.org/internet-drafts/draft-ietf-pkix-roadmap-06.txt.
[RS97] M. K. Reiter, S. G. Stubblebine. Toward Acceptable Metrics of Authentication. Proc. of
the 1997 IEEE Symposium on Security and Privacy, IEEE Computer Society, pp. 10-20. 1997.
[RM96] P. Resnick, J. Miller. PICS: Internet Access Controls without Censorship.
Communications of the ACM, vol. 39, pp. 87-93. 1996.
[Sub] V. S. Subrahmanian, S. Adali, A. Brink, R. Emery, J. Lu, A. Rajput, T. J. Rogers, R. Ross,
C. Ward. HERMES: A Heterogeneous Reasoning and Mediator System.
http://www.cs.umd.edu/projects/hermes/overview/paper.
[Sun] Sun Microsystems. Java Development Kit 1.2. http://java.sun.com/j2se/1.3.
[Ull97] J. D. Ullman. Information Integration Using Logical Views. Proceedings of the 6th
International Conference on Database Theory (ICDT), pp. 19-40. LNCS 1186. 1997.
[Wie92] G. Wiederhold. Mediators in the Architecture of Future Information Systems. IEEE
Computer, 25(3). March 1992.
[Win98] Microsoft Corp. An Introduction to the Microsoft Windows 2000 Public Key
Infrastructure. July 15, 1999.
http://www.microsoft.com/windows2000/library/howitworks/security/pkintro.asp.
[Zim94] P. Zimmermann. PGP User’s Guide. 1994. http://www.pgpi.org/doc/guide/.

