Information Governance and the Data Lake

Building trust in the data you use


ISTC Conference (Bangalore)

Ross Davis
IBM Analytics Platform Practice Lead for Asia Pacific
(abstracted from an original presentation by David Stevens)

2 2015 IBM Corporation


Agenda

Why is Information Governance Important?
People define & govern
It's all about Process
Technology supports and enforces the process
Kick Starting Governance
References you can use

WHY IS INFORMATION GOVERNANCE IMPORTANT?

Data is Becoming the New Natural Resource for Creating Digital Value and Beating the Competition

Yet only 15% of organizations have the capability to leverage data and advanced analytics across their organization.
(Source: 2015 HBR Insight Economy Study)

Data Confidence Is A Top Concern for Executives

Only 11% are confident in their data.
68% say low confidence is a major inhibitor to future IT projects.
48% spend too much time finding data.
Only 26% of time is spent actually analyzing data.
12% of project time is spent defending data.
33% say too much time is spent defending data.

Why Information Governance?
The goal is to create an information supply chain that balances business agility with the appropriate risk mitigation across all your data.

[Figure: Structured and unstructured data sources flow through Information Governance (people, process and technology) to data consumers.]


Considerations in Building a Data Lake

The Data Lake needs governance to ensure that information is protected and
managed efficiently.
The first step in creating the Data Lake is to establish the information integration and governance components: the staging areas for integration, the catalog and the common data standards.
The build out of the Data Lake then proceeds iteratively based on the following
processes:
Governance of a data lake subject area.
Managing an information source.
Managing an information view.
Enabling analytics.
Maintaining the Data Lake infrastructure.
[Figure: Information integration and governance components: information broker, code hub, staging areas, operational governance hub, offline archive, monitor, workflow and guards.]


The Data Lake subsystems

[Figure: The Data Lake subsystems: (1) Enterprise IT Data Exchange, (2) Catalog, (3) Data Lake Repositories, (4) Self-Service Access and (5) the Information Management and Governance Fabric.]

The key is a governed, managed Data Lake.

Information Governance defined
Information Governance is the orchestration of people, process, and technology to
enable an organization to leverage data as an enterprise asset.

Information governance practices provide a holistic approach to managing, improving and using information to help you gain insight and build confidence in business decisions.
The goal is to bring together data from diverse sources for diverse targets; manage its quality and maintain it throughout its lifecycle; protect data and maintain privacy requirements; and facilitate information-based collaboration across business and technical teams.
These broad capabilities help organizations increase the value of data for information-intensive projects, including big data and analytics, application consolidation and retirement, security and compliance, master data management and many more.
Successful Information Governance requires a combination of people, process and
technology. Focusing on just one or two of these aspects leaves companies open to
potential problems.

Governance ensures proper management and use of information

[Figure: Information Governance spans five areas, each answering a key question:
Standards (Are systems built to appropriate standards?): information requirements, identification, architecture and dependencies.
Protection (Is data properly protected from loss or inappropriate use?): information usage and privacy.
Information Lifecycle (Is data kept for the appropriate length of time?): information retention and disposal.
Quality (Is data quality sufficient for use?): information quality values and information supply chain integrity.
Compliance (Are people/systems operating properly?): policy administration, implementation, enforcement and monitoring.]

What organizations expect information governance will deliver

An understanding of the information they have,
Confidence to share and reuse information,
Protection from unauthorised use of information,
Monitoring of activity around the information,
Implementation of key business processes that manage information,
Tracking the provenance of information and
Management of the growth and distribution of their information.

[Figure: Self-service data, data preparation, collaboration, governance and data integration, all underpinned by metadata management.]


What is changing around Information Governance?

What is changing?
Role of the Chief Data Officer
The hopes and dreams around big data / data lake
Regulation is pushing responsibility for governance on business executives
The result
Business is taking control of information governance
Business wants more access to data
Automation of governance actions is key

A growing demand around information

Business Teams want
Open access to more information
More powerful analysis and visualization tools

IT Teams are
Concerned about cost.
Concerned about governance and regulatory requirements.


Information Governance Goal
Supporting the Use of Authoritative Sources

Make it Visible: definitions of authoritative sources and where they are used.
Make it Clear: definitions of policies, rules and processes around the use of authoritative sources.
Make it Unavoidable: control points in sprint and design review processes; enforcement in build, automated test cases and configuration management.
Make it Easy to do the right thing: pre-built data structures implementing standards; data refinery services for common re-engineering tasks; easy access to compliant test data.


Information Governance Supporting a Data Lake Architecture

[Figure: Information governance across the data lake architecture. Systems of engagement, systems of record and enterprise IT transactional systems ingest data through the enterprise service bus into the data reservoir repositories (NoSQL document store, data warehouse, exploration, archive and content). The catalog and its interfaces (including an object store) describe the stored data, and consumers of insight (business analysts, data scientists, line-of-business applications, real-time analytics, reporting, deep analytics and modeling, and simple ad hoc discovery and analysis) reach it through self-service. Information Management & Governance spans the whole data lake: ingest, store and catalog data; provide self-service; govern and manage.]

Considerations for a well-managed and governed data lake
1. Data-centric security.
2. Effective interchange of data and insight with other systems.
3. No direct access to repositories.
4. Catalog of data, ownership, meaning and permitted usage.
5. Curation of all data to define meaning and classifications.
6. Business-led information governance and management.
7. Multiple repositories organized based on source and usage; hosted on appropriate data platforms for the workload.
8. Active monitoring and management of data.
9. Access to raw data to develop new production analytics.
10. Moderated, view-based self-service access to data and analytics for line of business.

Information Governance is People, Process and Technology

Successful Information Governance is implemented with a combination of:
Skilled people, correct roles and organization.
Processes that create a pragmatic, targeted and agile work environment.
Technology that automates the classification, enforcement, validation and correction of data.

[Figure: Information Governance rests on people (business and data management), process and technology.]

Core objectives of Information Governance:
Establish guidelines for information management decision-making
Ensure information is consistently defined and well understood
Increase trust in data as a shared asset
Protect data and comply with regulatory requirements

Why Information Governance?
Because today
[Figure: A data engineer, a citizen analyst, the Chief Data Officer (CDO), an application developer and a data scientist try to reach data held in separate Sales, CRM, Supply Chain, Finance and Operations systems through IT, and hear responses such as "That will take 2 weeks", "I can only give you JSON", "Can't find it", "Who said you could have this?" and "Did you file a TPS Report?"]


But, tomorrow
with Information Governance and a Data Lake

[Figure: The same users (data engineer, citizen analyst, CDO, application developer and data scientist) reach Sales, CRM, Supply Chain, Finance and Operations data through the Data Lake and the Information Governance Catalog, with IT in support.]


Think Time

Successful Information Governance is implemented with a combination of which three things?


PEOPLE DEFINE AND GOVERN

Key governance roles and responsibilities

Governance team
Information Governor leads the governance program

Information Owner allows data to be added to the Data Lake

Information Auditor ensures the Data Lake is properly governed

Information Curator describes the data in the Data Lake

Information Steward fixes problems in data

Information Governance roles and responsibilities

Roles and responsibilities:
DG Executive Sponsor(s): strategy, through charter and budget.
Steering Committee(s): control, through policy and leadership.
DG Program Office: coordination, through monitoring and oversight.
Working groups of business process owners, data stewards and data custodians: operations, through standards and processes.


Roles within the Data Lake
Governor: appoint an individual to coordinate the definition of policies related to information governance and their implementation.
Information Steward: appoint an individual to coordinate the manual activity necessary to monitor and verify that an information collection is meeting agreed quality levels. Create user interfaces and access rights to involve this individual in information quality processes such as the exception management process.
Quality Analyst: appoint an individual to monitor and analyze the state of the information flowing through the information supply chain.
Integration Developer: appoint an individual responsible for maintaining the data movement functionality in, around and out of the data lake.
Infrastructure Operator: appoint an individual responsible for starting, maintaining, and monitoring the systems that support the information supply chain.
Data Scientist: appoint an individual to analyze the information that the organization is collecting in order to understand patterns of success.
Business Analyst: appoint an individual to analyze the way people are working, understand where the processes can be improved, and define new procedures, rules, and requirements for the IT systems.
Information Owner: appoint an individual to be the owner of the information collection, who is responsible and accountable for ensuring it is capable of supporting the organization's activities.
Auditor: appoint an individual or team of individuals to review key aspects of how the organization is actually operating and compare it with agreed processes.
Information Worker: appoint individuals who are responsible for the manual steps in the core business activity. Create user interfaces and access rights to provide these individuals access to the information supply chain through the information processes.
Information Curator: appoint individuals who are responsible for maintaining the catalog of information in the data lake.


Data Lake Information Users
[Figure: Who is working with information in the supply chain: information curator, information user, information worker, information auditor, solution architect, enterprise architect, information governor, information owner, information steward, business analyst, data scientist, quality analyst, infrastructure operator and integration developer.]


Information Governance operating model

Who should the Information Governance organization report to?

[Figure: Six options for where the Information Governance organization can report: as part of Business Operations (under the COO, alongside Finance, Supply Chain and Sales); as part of Business Transformation (under the CIO, alongside the IT department); as part of the Business Process organization (under the CPO); as part of the Finance team (under the CFO, alongside planning/budgeting, finance and GRC); as distributed data governance across the divisions or business units of a corporate holding company; or as outsourced data management, with an outsourcing partner providing a managed service for data governance. Some options show data governance led by a senior executive or CDO.]


IT'S ALL ABOUT PROCESS

Governance is about process

Establishes program authority, scope and program management function.
Formalizes information accountability within each information domain.
Defines and operates a common process for raising and resolving conflicts.
Processes provide lateral communication across the organization.
Sample process categories:
organizational responsibilities,
conflict resolution,
quality,
architecture,
retention,
disposal,
metadata & classification


Three interlocking lifecycles of information governance
[Figure: Three interlocking lifecycles (policy development, metadata and operations), with the roles involved: auditor, CDO, curator, data scientist, data engineer, citizen analyst and developer.]


Metadata Management Component
Metadata is defined as data about data, but it can also be explained as another layer of
information created to help people use raw data as information.

Business Metadata: definitions, terminology, glossaries, governance rules and algorithms, expressed in business language.
Audience: business users.

Technical Metadata: source and target systems; their tables and fields, structures and attributes; derivations and dependencies.
Audience: users of specific tools such as BI, ETL, profiling and data modeling tools.

Operational Metadata: information about application runs, such as their frequency, record counts, component-by-component analysis, data flow and other statistics.
Audience: operations and management.

[Figure: Examples of each metadata type: information governance policies and rules, business categories and business terms (business); data elements, data quality rules, design lineage, logic and external integration (technical); execution, runtime statistics and lineage (operational).]

Operational metadata is generated by the integration logic (the ETL tool) and is typically collected automatically. It is a key component in documenting the current status of the Data Lake.

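To make "generated by the integration logic and collected automatically" concrete, here is a minimal Python sketch that captures run statistics as a side effect of running a simple filtering step. It assumes a hypothetical job that rejects invalid records; `RunRecord` and `run_with_operational_metadata` are illustrative names, not part of Information Server or any other IBM API, and a real ETL tool would collect these statistics itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Tuple


@dataclass
class RunRecord:
    """Hypothetical operational metadata captured for one run of one job."""
    job_name: str
    started: datetime
    finished: datetime
    records_read: int
    records_written: int
    records_rejected: int


def run_with_operational_metadata(
    job_name: str,
    records: Iterable[dict],
    is_valid: Callable[[dict], bool],
) -> Tuple[List[dict], RunRecord]:
    """Run a simple filtering step and record statistics about the run."""
    started = datetime.now(timezone.utc)
    written: List[dict] = []
    read = rejected = 0
    for record in records:
        read += 1
        if is_valid(record):
            written.append(record)
        else:
            rejected += 1
    finished = datetime.now(timezone.utc)
    run = RunRecord(job_name, started, finished, read, len(written), rejected)
    return written, run


if __name__ == "__main__":
    rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}, {"id": 3, "amount": 7}]
    loaded, run = run_with_operational_metadata(
        "load_payments", rows, lambda r: r["amount"] is not None
    )
    print(run.records_read, run.records_written, run.records_rejected)  # 3 2 1
```

In practice each `RunRecord` would be stored in the metadata repository, so operations staff can see run frequency, volumes and rejects over time without anyone recording them by hand.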
Differing user perspectives of data
[Figure: Users work with the data stores through the Information Governance Catalog. The perspectives shown include: provisioning sandboxes; searching for, locating and downloading data and related artifacts; defining governance policies, rules and classifications; monitoring compliance; adding insight into data sources through automated analysis; developing data management models and implementations; and stewardship of metadata about stores, models and definitions. The users shown are the CDO, auditor, data scientist, citizen analyst, curator, developer and data engineer.]

Information Governance Best Practices

Maintaining executive buy-in for an information governance program is critical:


Understanding the motivations of executive sponsors and other key members of your
management team is a good place to begin:
What is important to them?
What are their favorite topics or projects?
What has worked in the past?

Choose a small number of people and establish the organization for the information
governance program

Identify goals for improvements in information access and consistency that can be
measured.

Tie the identified pain points and goals directly to the highest-priority content in your systems.
Remember: not all data holds the same value or level of risk to the enterprise.


Information Governance Best Practices (continued)

Start small: measuring the success of an information governance program may benefit from choosing a single process to measure, for example the quality of email addresses.

Choose a controlled project based on an acknowledged business problem that information problems are causing; avoid scoping the problem too widely.

Think agile: make a concerted effort to avoid the term "governance", as it can denote lockdown and control and turn off business users. A term such as "Business Ready Data" may land better.

Publicize early wins to help deliver the message and prove out a governance program's benefits.

Strive to hold people accountable.

Challenge of metrics: while ROI/TCO models are needed, balance that effort with pragmatic baseline measurements in order to deliver results quickly.
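As an illustration of starting small with a single, pragmatic baseline measure, the following Python sketch computes the share of records holding a plausibly valid email address. The regular expression is a deliberately simple assumption, not a full RFC 5322 validator, and the input list stands in for a hypothetical customer email column.

```python
import re
from typing import Iterable, Optional

# Deliberately simple pattern: local part, "@", domain, dot, alphabetic TLD.
# Real email validation is far more involved; this is only a baseline check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def email_validity_rate(emails: Iterable[Optional[str]]) -> float:
    """Return the fraction of entries that are present and match the pattern."""
    total = 0
    valid = 0
    for email in emails:
        total += 1
        if email and EMAIL_PATTERN.match(email.strip()):
            valid += 1
    return valid / total if total else 0.0


if __name__ == "__main__":
    sample = ["ana@example.com", "bob@example", None, "  chen@example.org ", "not an email"]
    print(f"{email_validity_rate(sample):.0%}")  # 40%
```

Re-running the same measure after each load gives a baseline and a trend, which is usually enough to publicize an early win before any ROI/TCO model exists.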

TECHNOLOGY SUPPORTS AND ENFORCES THE PROCESS

Information Governance Catalog (IGC)

What does IGC provide?


IBM InfoSphere Information Governance Catalog (IGC) is used to hold the set of business terms to be used for an organization's Data Lake.
The Data Lake Catalog is the repository in which the IGC business terms are physically stored. It is a run-time repository in which business users, physical databases, reports and data flows can all be mapped to these business terms.
It provides a tool with which to establish and manage governance principles, policies and rules.
It's the Holy Grail of the Data Lake.
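The core idea of a catalog that maps business terms to physical assets, and back, can be sketched in a few lines of Python. This is not the IGC API: the `Catalog` class, the term names and the dotted asset identifiers are hypothetical, and a real catalog also holds stewards, lineage, policies and rules.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Catalog:
    """Hypothetical in-memory glossary linking business terms to physical assets."""
    definitions: Dict[str, str] = field(default_factory=dict)
    term_to_assets: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))
    asset_to_terms: Dict[str, Set[str]] = field(default_factory=lambda: defaultdict(set))

    def define_term(self, term: str, definition: str) -> None:
        """Add or update a governed business term."""
        self.definitions[term] = definition

    def assign(self, term: str, asset: str) -> None:
        """Link a defined business term to a physical asset such as a column."""
        if term not in self.definitions:
            raise KeyError(f"Undefined business term: {term}")
        self.term_to_assets[term].add(asset)
        self.asset_to_terms[asset].add(term)

    def assets_for(self, term: str) -> Set[str]:
        """Where is this business concept physically stored?"""
        return set(self.term_to_assets.get(term, set()))

    def terms_for(self, asset: str) -> Set[str]:
        """What does this physical asset mean in business terms?"""
        return set(self.asset_to_terms.get(asset, set()))


if __name__ == "__main__":
    catalog = Catalog()
    catalog.define_term("Customer Email", "Primary email address used to contact a customer.")
    catalog.assign("Customer Email", "crm.public.customer.email")
    catalog.assign("Customer Email", "lake.raw.web_signups.email_addr")
    print(sorted(catalog.assets_for("Customer Email")))
    print(catalog.terms_for("crm.public.customer.email"))  # {'Customer Email'}
```

Refusing to map an asset to an undefined term is one way to keep the vocabulary governed: terms are agreed first, then attached to data.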

Information Governance Catalog
Understand and Collaborate

Most comprehensive governance solution for end-to-end visibility to your metadata

Understand
Leverage a comprehensive & rich catalog of information assets
Provides business context for IT assets
Dramatically increases business confidence in information assets
Govern
Collaboratively establish a governed business vocabulary
Create stewards, assign responsibilities
Rely on end-to-end lineage across information assets
Links business terms & information governance rules to information assets & operational rules
Open interface to create, import and manage extensions for Data Sources and Data Flows

Structuring governance definitions: data, data repositories, types of processing, and the people consuming the data

[Figure: A governance policy is implemented by governance rules, and each governance rule is implemented by one or more governance rule implementations, which are deployed to the IT landscape and executed by engines against information assets. Governance classifications, which govern how rules are actioned, classify the physical data descriptions of those information assets.]

The data lake's catalog contains the definitions of the policies, rules, data classifications, data repositories, data types, and processing descriptions.

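The relationships above (policies implemented by rules, rules implemented by rule implementations, classifications attached to assets) can be modelled as plain data structures. The Python sketch below is a hypothetical model, not the catalog's actual schema; every class and example name in it is illustrative. It shows how, once an asset is classified, the applicable policies, rules and rule implementations can be looked up.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(frozen=True)
class RuleImplementation:
    """Executable form of a rule, deployed to some engine in the IT landscape."""
    name: str
    engine: str


@dataclass
class GovernanceRule:
    """A rule that is actioned for information carrying a given classification."""
    name: str
    classification: str
    implementations: List[RuleImplementation] = field(default_factory=list)


@dataclass
class GovernancePolicy:
    """A policy is implemented by one or more governance rules."""
    name: str
    rules: List[GovernanceRule] = field(default_factory=list)


@dataclass
class InformationAsset:
    """A physical data description with the classifications assigned to it."""
    name: str
    classifications: Set[str] = field(default_factory=set)


def implementations_for(
    asset: InformationAsset, policies: List[GovernancePolicy]
) -> List[Tuple[str, str, RuleImplementation]]:
    """List (policy, rule, implementation) triples that apply to the asset."""
    result = []
    for policy in policies:
        for rule in policy.rules:
            if rule.classification in asset.classifications:
                for impl in rule.implementations:
                    result.append((policy.name, rule.name, impl))
    return result


if __name__ == "__main__":
    mask = RuleImplementation("mask_personal_columns", engine="etl")
    privacy = GovernancePolicy(
        "Protect personal information",
        [GovernanceRule("Mask personal data in sandboxes", "Personal", [mask])],
    )
    customer = InformationAsset("crm.customer", {"Personal"})
    for policy_name, rule_name, impl in implementations_for(customer, [privacy]):
        print(f"{policy_name} -> {rule_name} -> {impl.name} on {impl.engine}")
```

Keying rules on classifications rather than on individual assets means a newly classified asset picks up the right rules automatically, which is why classification sits at the center of the model.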
Governance rules
Defined for each classification, for each situation

[Figure: Example masking points along the data flow: sensitive information is masked at one point and personal information at two others.]
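A rule "defined for each classification, for each situation" can be pictured as a lookup keyed by the zone the data is moving into. In the Python sketch below, the zone names, the classification labels and the choice of replacing masked values with a fixed token are assumptions made for illustration; production masking would typically use a dedicated masking tool.

```python
from typing import Dict, Optional, Set

MASK = "****"

# Hypothetical rules: for each target zone (situation), the classifications
# whose values must be masked before data is delivered there.
MASKING_RULES: Dict[str, Set[str]] = {
    "landing": set(),                          # governed raw zone: no masking
    "sandbox": {"Personal"},                   # analysts' sandboxes
    "reporting": {"Personal", "Sensitive"},    # broadly shared reports
}


def apply_masking(
    record: Dict[str, str], column_classes: Dict[str, str], zone: str
) -> Dict[str, str]:
    """Return a copy of `record` with the masking rules for `zone` applied."""
    to_mask: Optional[Set[str]] = MASKING_RULES.get(zone)
    if to_mask is None:
        # No rule is defined for this situation: refuse rather than risk a leak.
        raise ValueError(f"No governance rules defined for zone {zone!r}")
    return {
        column: MASK if column_classes.get(column) in to_mask else value
        for column, value in record.items()
    }


if __name__ == "__main__":
    customer = {"name": "Ana Silva", "card": "4111111111111111", "city": "Pune"}
    classes = {"name": "Personal", "card": "Sensitive"}
    print(apply_masking(customer, classes, "sandbox"))
    print(apply_masking(customer, classes, "reporting"))
```

Failing closed on an unknown zone reflects the idea that every situation needs an explicit rule; unclassified columns such as `city` pass through unchanged.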

Governance Classification schemes

Classification is at the heart of information governance. It characterizes the type, value and cost of information, or the mechanisms that manage it. The design of the classification schemes is key to controlling the cost and effectiveness of the information governance program.
Business Classifications
Business classifications characterize information from a business perspective. This
captures its value, how it is used, and the impact to the business if it is misused.
Role Classifications
Role classifications characterize the relationship that an individual has to a particular kind
of data.
Resource Classifications
Resource classifications characterize the capability of the IT infrastructure that supports the
management of information. A resource's capability is partly due to its innate functions and
partly controlled by the way it has been configured.
Activity Classifications
Activity classifications help to characterize procedures, actions and automated processes.
Semantic Mapping
Semantic mapping identifies the meaning of an information element. The classification
scheme is a glossary of concepts from relevant subject areas. These glossaries are
industry specific and they are shipped with our industry models. The semantic
classifications are defined at two levels:
Subject area mapping
Business term mapping
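To show how two of these schemes can work together, the Python sketch below checks whether a repository may host a data set by comparing the data's business classification with the repository's resource classification, expressed as a set of configured capabilities. The classification names, capability labels and the `can_host` and `missing_capabilities` functions are hypothetical examples, not a shipped classification scheme.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical business classifications and the resource capabilities a
# repository must have (its resource classification) before it may host them.
REQUIRED_CAPABILITIES: Dict[str, FrozenSet[str]] = {
    "Public": frozenset(),
    "Internal": frozenset({"access-control"}),
    "Confidential": frozenset({"access-control", "encryption-at-rest"}),
    "Restricted": frozenset({"access-control", "encryption-at-rest", "activity-monitoring"}),
}


def can_host(business_classification: str, repository_capabilities: Set[str]) -> bool:
    """True if the repository offers every capability the classification requires."""
    required = REQUIRED_CAPABILITIES.get(business_classification)
    if required is None:
        return False  # unknown classification: fail closed
    return required <= repository_capabilities


def missing_capabilities(business_classification: str, repository_capabilities: Set[str]) -> Set[str]:
    """Capabilities that would have to be configured; raises KeyError if unknown."""
    required = REQUIRED_CAPABILITIES[business_classification]
    return set(required - repository_capabilities)


if __name__ == "__main__":
    warehouse = {"access-control", "encryption-at-rest"}
    print(can_host("Confidential", warehouse))  # True
    print(can_host("Restricted", warehouse))    # False
    print(missing_capabilities("Restricted", warehouse))  # {'activity-monitoring'}
```

Because a resource's capability is partly controlled by how it has been configured, the second function tells an administrator which configuration changes would make a repository eligible.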

Security is part of governance

The data lake's security is assured through this combination of business processes and technical mechanisms:
Administration: data curation for security and protection; data access approval by the subject area owner.
Runtime: well-defined access points; data-centric security; isolated repositories; access monitoring and logging.
Review: security analytics and investigation; security audit and review.
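Two of these mechanisms, data access approval by the subject area owner and access monitoring and logging at a well-defined access point, can be illustrated with a small Python sketch. The `AccessBroker` class, subject area names and user names are hypothetical; a real deployment would rely on the platform's own security and activity-monitoring tools rather than application code like this.

```python
import logging
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

audit_log = logging.getLogger("datalake.audit")


@dataclass
class AccessBroker:
    """Hypothetical single access point enforcing owner-approved access."""
    owners: Dict[str, str]  # subject area -> owner
    grants: Set[Tuple[str, str]] = field(default_factory=set)  # (user, subject area)

    def approve(self, approver: str, user: str, subject_area: str) -> None:
        """Administration: only the subject area's owner may grant access to it."""
        if self.owners.get(subject_area) != approver:
            raise PermissionError(f"{approver} does not own {subject_area}")
        self.grants.add((user, subject_area))
        audit_log.info("GRANT %s -> %s by %s", user, subject_area, approver)

    def check(self, user: str, subject_area: str) -> bool:
        """Runtime: every decision is logged so it can be audited and reviewed."""
        allowed = (user, subject_area) in self.grants
        audit_log.info("ACCESS %s %s -> %s", user, subject_area, "ALLOW" if allowed else "DENY")
        return allowed


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    broker = AccessBroker(owners={"customer": "priya"})
    print(broker.check("omar", "customer"))   # False
    broker.approve("priya", "omar", "customer")
    print(broker.check("omar", "customer"))   # True
```

Routing every request through one broker is what makes the audit log complete; the review phase can then analyze the logged decisions instead of reconstructing access after the fact.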

The Information Governance Ecosystem

[Figure: The Information Governance ecosystem: policy and standards, information curation, information refineries, exception management, and reporting and audit.]

Information Governance is built on:
metadata management,
curation of information resources,
policy-aware runtimes such as MDM, Information Server, Guardium and Optim, and
workflow and reporting.

Think Time

Besides people, process and technology, what else is an important part of governance?

KICK STARTING GOVERNANCE

Kick starting the Governance Program with predefined content

Information Governance Catalog Content Package: a common, consumable business vocabulary and data governance hub for clear communication throughout the enterprise, to accelerate data governance implementation and deployment.
Packaged Policies and Rules provide a starting framework for governance requirements.
A common approach facilitates discussion and streamlines information governance development around governance requirements.
InfoSphere Metadata Asset Manager (IMAM): a web-based tool which allows the import and management of metadata from design tools, business intelligence tools, databases and files into the metadata repository of InfoSphere Information Server.
Data Lake Industry Models are being developed to provide an industry-specific common business language, used by all users to describe the various components of the Data Lake and to underpin the Data Lake.

Governance definitions: example

http://www.ibm.com/developerworks/data/library/techarticle/dm-1412infosphere-governance/index.html


Leveraging IBM Data Lake Industry Models
The business vocabulary terms in IGC can be used to enforce common business meaning throughout the Data Lake landscape.
The output of the various logical models can be used to define the technical structure of the assets in the lake that need to be created, where a predefined schema is required (e.g. schema at write).

[Figure: The business vocabulary, the atomic and dimensional logical models, and the Hadoop, RDBMS and dimensional physical models mapped onto the data lake: systems of engagement, systems of record and enterprise IT feed internal sources, third-party feeds and APIs, and new sources through service interfaces, the information service bus and ingestion into repositories such as operational history, historical, harvested and published data, the information warehouse, reporting data marts, deep data and sandboxes for data scientists, with the catalog, monitor, workflow and execution engines providing integration and governance. Legend: use of the business vocabulary to understand business meaning by users; mappings to inform common business meaning using the business vocabulary in IGC; generation of technical structure using the ER data models in an ER tool (e.g. IDA).]

InfoSphere Metadata Asset Manager

InfoSphere Metadata Asset Manager (IMAM) is a web-based tool which allows the import and
management of metadata from design tools, business intelligence tools, databases and files into the
metadata repository of InfoSphere Information Server. Assets may be previewed prior to sharing them with
the metadata repository, allowing for the comparison and understanding of the changes and impact.

The following Asset types may be imported:


Implemented Data Resources
Physical data model assets
Logical data model assets
Business intelligence reports


InfoSphere Information Integration & Governance Platform

Gartner definition:
What it is: software products to enable the construction and implementation of data access and delivery infrastructure for a variety of usage scenarios.
How it works: core functionality includes connectivity to data sources/targets, data transformation and delivery, data quality, and metadata support.

[Figure: IBM's Information Integration and Governance portfolio: Information Integration (Information Server for Data Integration, plus Data Replication); Data Quality (Information Server for Data Quality); Master Data Management (Master Data Management and Reference Data Management); Data Lifecycle Management (Optim Archiving, Optim TDM, Optim Privacy); Data Privacy & Security (Guardium DAM, Redaction, Encryption); and Information Governance: Metadata and Policy Management (Information Governance Catalog). Available on-premises, as-a-service (IBM DataWorks) and in the cloud (SoftLayer).]


The Data Lake: Versatile IBM Technology

[Figure: IBM products mapped onto the data lake architecture, including InfoSphere BigInsights, PureData for Analytics, InfoSphere MDM, DB2 BLU, Enterprise Content Manager, InfoSphere Streams and ODM for the repositories and real-time analytics; Information Governance Catalog, industry data models, Cloudant and DataClick for the catalog and self-service interfaces; Information Server (DataStage, data replication, federation) and Guardium/Optim for information management and governance; and Cognos, SPSS and Watson Analytics for the consumers of insight.]

IBM products support the entire Data Lake model: cloud, on premises or hybrid.

IBM roadmap for Information Governance programs

Think Time

What do we have to help kick-start governance?


Questions?

ADDITIONAL REFERENCE MATERIAL

IBM Connections

https://w3-connections.ibm.com/wikis/home?lang=en-us#!/wiki/Wc244bb451a83_40fc_85f0_edf7421445af/page/Information%20Governance


IBM Information Lifecycle Governance

IBM's ILG solution portfolio enables customers to deal with the growing volume of information:
Effectively retain and archive information
Efficiently meet eDiscovery obligations
Manage data more efficiently
Improve data quality
Defensibly dispose of information to lower both cost and risk.


Performing Information Governance

Experienced data management professionals can use it to confirm the activities, tasks, and best practices for performing information governance. College students can use this book as a textbook in an upper-level information management college curriculum. The intended audience includes the following:
Chief information officers
Chief data officers
Business and technical data stewards
Data quality analysts and auditors
Metadata management professionals
Master data management professionals

Chief Data Officer

public.dhe.ibm.com/common/ssi/ecm/ci/en/ciw03090usen/CIW03090USEN.PDF


Governing and managing Big Data for Analytics and Decision Makers

An introduction to the Data Lake solution

http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html?Open

Designing and Operating a Data Lake / Reservoir

Description of the behavior and processes that make up a data lake / reservoir.
Blog: 5 things to know about a data lake / reservoir
https://www.ibm.com/developerworks/community/blogs/5things/entry/5_things_to_know_about_data_reservoir?lang=en
Redbook:
http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open


Redbook: Information Governance Principles and Practices for a Big Data Landscape

http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248165.html


Ethics for Big Data and Analytics
Context: For what purpose was the data originally surrendered? For what purpose is the data now being used? How far removed from the original context is its new use?
Consent & Choice: What are the choices given to an affected party? Do they know they are making a choice? Do they really understand what they are agreeing to? Do they really have an opportunity to decline? What alternatives are offered?
Reasonable: Is the depth and breadth of the data used and the relationships derived reasonable for the application it is used for?
Substantiated: Are the sources of data used appropriate, authoritative, complete and timely for the application?
Owned: Who owns the resulting insight? What are their responsibilities towards it in terms of its protection and the obligation to act?
Fair: How equitable are the results of the application to all parties? Is everyone properly compensated?
Considered: What are the consequences of the data collection and analysis?
Access: What access to data is given to the data subject?
Accountable: How are mistakes and unintended consequences detected and repaired? Can the interested parties check the results that affect them?

http://www.ibmbigdatahub.com/whitepaper/ethics-big-data-and-analytics
https://w3-connections.ibm.com/wikis/home?lang=en-us#!/wiki/Wc244bb451a83_40fc_85f0_edf7421445af/page/Ethics%20for%20Big%20Data%20and%20Analytics

Common Information Models for an Open, Analytical and Agile World
To drive maximum value from complex IT projects, IT professionals need a deep understanding of the information their projects will use. Too often, however, IT treats information as an afterthought: the poor stepchild behind applications and infrastructure. That needs to change. This book will help you change it.
Using a complete case study, the authors explain what CIMs are, how to build them, and how to maintain them. You learn how to clarify the structure, meaning, and intent of any information you may exchange, and then use your CIM to improve integration, collaboration, and agility.
In today's mobile, cloud, and analytics environments, your information is more valuable than ever. To build systems that make the most of it, start right here.
