12% of project time is spent defending data
33% say too much time is spent defending data
6 2015 IBM Corporation
Why Information Governance?
The goal is to create an Information Supply Chain that balances
Business Agility with appropriate Risk Mitigation across all your data
[Figure: Structured and Unstructured Data Sources feed Information Governance (People, Process, Technology), which serves Data Consumers.]
The Data Lake needs governance to ensure that information is protected and
managed efficiently.
The first step in creating the Data Lake is to establish the information integration
and governance components, the staging areas for integration, the catalog, the
common data standards.
The build out of the Data Lake then proceeds iteratively based on the following
processes:
Governance of a data lake subject area.
Managing an information source.
Managing an information view.
Enabling analytics.
Maintaining the Data Lake infrastructure.
[Figure: Data Lake components. Labels: Information Broker, Staging Areas, Code Hub, Operational Governance Hub, Offline Archive, Monitor, Workflow, Guards, Catalog, Enterprise IT Data Exchange, Data Lake Repositories, Self-Service Access. Governance questions shown: Are systems built to appropriate standards? Are people/systems operating properly? Supporting layer: Information Governance, Metadata Management.]
What is changing?
Role of the Chief Data Officer
The hopes and dreams around big data / data lake
Regulation is pushing responsibility for governance on business executives
The result
Business is taking control of information governance
Business wants more access to data
Automation of governance actions is key
IT Teams are
Concerned about cost.
Concerned about governance and regulatory requirements.
[Figure: Data Lake in context. Labels: Store Data, Ingest Data, Data In, Catalog Interfaces, Object Store, Line of Business Applications, Real Time Analytics, Engagement Systems, Exploration, Content, Reporting, Service Bus, Archive, System of Record Applications, Enterprise Consumers of Insight.]
[Figure callouts:]
1 Multiple repositories organized based on source and usage; hosted on appropriate data platforms for the workload.
2 Effective interchange of data and insight with other systems.
3 No direct access to repositories.
4 Catalog of data, ownership, meaning and permitted usage.
7 Active monitoring and management of data.
8 Data-centric security.
10 Moderated, view-based self-service access to data and analytics for line of business.
Information Governance is People, Process and Technology
IT
Governance team
Information Governor leads the governance program
Roles: Responsibilities
DG Executive Sponsor(s): Strategy, thru Charter and Budget
Steering Committee(s): Control, thru Policy and Leadership
DG Program Office: Coordination, thru Monitoring and Oversight
Business Process Owners, Data Stewards, Data Custodians, Working Groups: Operations, thru Standards and Processes
Integration Developer; appoint an individual to maintain the data movement functionality into, around, and out of the data lake.
Infrastructure Operator; appoint an individual responsible for starting, maintaining, and monitoring the systems that
support the information supply chain.
Data Scientist; appoint an individual to analyze the information that the organization is collecting in order to understand
patterns of success.
Business Analyst; appoint an individual to analyze the way people are working, understand where the processes can be
improved, and define new procedures, rules, and requirements for the IT systems.
Information Owner; appoint an individual to be the owner of the information collection who is responsible and accountable
for ensuring it is capable of supporting the organization's activities.
Auditor; appoint an individual or team of individuals to review key aspects of how the organization is actually operating and
compare it with agreed processes.
Information Worker; appoint individuals who are responsible for the manual steps in the core business activity. Create user
interfaces and access rights to provide these individuals access to the information supply chain through the information
processes.
Information Curator; appoint individuals who are responsible for maintaining the catalog of information in the data lake.
[Figure: organization charts showing where Data Governance can sit. Panels: Part of Business Operations (COO; Finance, Sales, Supply Chain), Part of Business Transformation (CIO; IT Department), Part of Business Process Org (CPO). Further panels show Data Governance under a Senior Executive (CDO, CFO), at a Corporate Holding Company above divisions and business units, and with an Outsourcing Partner as a Managed Service for Data Governance. Other labels: Planning / Budgeting, Finance, GRC.]
[Figure: roles (Curator, Data Scientist, Data Engineer, Citizen Analyst, Developer) shown around Metadata, with Policy, Operations Policy and Development Policy.]
Business Metadata
Definitions, Terminology, Glossaries, Governance Rules, Algorithms (using business language)
Audience: Business users
Technical Metadata
Source and Target systems; their Tables and Fields, structures and attributes; Derivations and Dependencies
Audience: Specific Tool Users (BI, ETL, Profiling, Data Modeling)
[Figure labels: Information Governance Policy, Business Rule, Business Category, Business Term, Lineage, Data Quality Rule, Data Element, External Assets, Integration Logic, Design Lineage]
Operational Metadata is generated by the Integration Logic (ETL tool) and typically collected automatically.
It is a key component in documenting the current status of the Data Lake.
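As a rough illustration of operational metadata being collected automatically, the sketch below wraps a data movement job and records run statistics every time it executes. The names used here (`OperationalRecord`, `run_with_metadata`, the job name) are hypothetical and do not come from any ETL product; a real ETL tool would emit comparable records itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional


@dataclass
class OperationalRecord:
    """One automatically captured record about a job run (hypothetical shape)."""
    job_name: str
    started_at: datetime
    finished_at: datetime
    rows_read: int
    rows_written: int
    status: str


def run_with_metadata(
    job_name: str,
    rows: Iterable[dict],
    transform: Callable[[dict], Optional[dict]],
    log: List[OperationalRecord],
) -> List[dict]:
    """Run a transform over rows and append a run record to log, even on failure."""
    started = datetime.now(timezone.utc)
    rows_read = 0
    written: List[dict] = []
    status = "succeeded"
    try:
        for row in rows:
            rows_read += 1
            result = transform(row)
            if result is not None:  # None means the row was rejected
                written.append(result)
    except Exception:
        status = "failed"
        raise
    finally:
        # The record is written whether the job succeeded or failed.
        log.append(
            OperationalRecord(
                job_name=job_name,
                started_at=started,
                finished_at=datetime.now(timezone.utc),
                rows_read=rows_read,
                rows_written=len(written),
                status=status,
            )
        )
    return written
```

Because the record is appended in the `finally` clause, the log reflects failed runs as well as successful ones, which is what makes it usable for documenting current status.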
Differing user perspectives of data
[Figure: the CDO, Auditor, Data Scientist, Citizen Analyst, Curator, Developer and Data Engineer interact with the Information Governance Catalog and the Data Stores. Activities shown:]
Provision sandboxes.
Search for, locate and download data and related artifacts.
Define governance policies, rules and classifications.
Monitor compliance.
Add additional insight into data sources through automated analysis.
Develop data management models and implementations.
Stewardship of metadata about stores, models and definitions.
Information Governance Best Practices
Choose a small number of people and establish the organization for the information
governance program
Identify goals for improvements in information access and consistency that can be
measured.
Tie the identified pain points and goals directly to the highest-priority content in their
systems
Remember - not all data holds the same value or level of risk to the enterprise
Start small - measuring the success of an information governance program may benefit
from choosing a single data element to measure, e.g. email address
Think Agile: make a concerted effort to avoid the term "governance", as it can
denote lockdown and control, serving as a turn-off for business users
Business Ready Data
Publicize early wins to help deliver the message and prove out a governance program's
benefits
Challenge of Metrics: While ROI/TCO models are needed, balance that effort with
pragmatic baseline efforts in order to deliver results quickly
Understand
Leverage a comprehensive & rich catalog of
information assets
Provides business context for IT assets
Dramatically increases business confidence in information
assets
Govern
Collaboratively establish a governed business vocabulary
Create stewards, assign responsibilities
Rely on end-to-end lineage across information assets
Links business terms & information governance rules
to information assets & operational rules
Open interface to create, import and manage extensions
for Data Sources and Data Flows
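End-to-end lineage can be pictured as a graph of data flows between information assets. The sketch below walks such a graph upstream to find everything a given asset depends on. The asset names and the `DATA_FLOWS` layout are made up for illustration and do not reflect how any catalog product stores lineage.

```python
from typing import Dict, List, Set

# target asset -> assets it is directly derived from (all names are hypothetical)
DATA_FLOWS: Dict[str, List[str]] = {
    "sales_report": ["sales_mart"],
    "sales_mart": ["orders_staging", "customers_staging"],
    "orders_staging": ["orders_db"],
    "customers_staging": ["crm_db"],
}


def upstream_sources(asset: str, flows: Dict[str, List[str]]) -> Set[str]:
    """Return every asset that feeds, directly or indirectly, into the given asset."""
    seen: Set[str] = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in flows.get(current, []):
            if parent not in seen:  # guards against revisiting shared or cyclic flows
                seen.add(parent)
                stack.append(parent)
    return seen
```

Calling `upstream_sources("sales_report", DATA_FLOWS)` returns the staging tables and source databases behind the report, which is the kind of question a steward asks when a source system changes.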
[Figure: relationships among Policy, Classification, Governance Rule, Governance Rule Implementation, Engine, Information Asset (physical data) and the IT Landscape, with links labelled: governs, actioned by, classified by, describes, implemented by, executes, deployed to, executed by.]
The data lake's catalog contains the definitions of the policies, rules,
data classifications, data repositories, data types, and processing
descriptions.
Governance rules
Defined for each classification for each situation
The data lake's security is assured with this combination of business processes and
technical mechanisms.
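One way to picture governance rules "defined for each classification for each situation" is a lookup keyed by both values. The classifications, situations and actions below are invented for illustration and are not taken from the slides; falling back to the most restrictive action for unknown combinations is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

# (classification, situation) -> action. All entries are illustrative assumptions.
GOVERNANCE_RULES: Dict[Tuple[str, str], str] = {
    ("sensitive_personal", "ingest"): "mask",
    ("sensitive_personal", "sandbox_copy"): "deny",
    ("business_confidential", "ingest"): "allow",
    ("business_confidential", "sandbox_copy"): "mask",
    ("public", "ingest"): "allow",
    ("public", "sandbox_copy"): "allow",
}

# Assumption: anything without an explicit rule is refused.
DEFAULT_ACTION = "deny"


def action_for(classification: str, situation: str) -> str:
    """Look up the action for a classification in a given situation."""
    return GOVERNANCE_RULES.get((classification, situation), DEFAULT_ACTION)


def apply_rule(value: str, classification: str, situation: str) -> Optional[str]:
    """Return the value as the rule permits: unchanged, masked, or None if denied."""
    action = action_for(classification, situation)
    if action == "allow":
        return value
    if action == "mask":
        return "*" * len(value)
    return None
```

Keeping the rules in a table rather than in code paths means the business side can review and change them without touching the enforcement logic.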
Besides people, process and technology, what else is an important part of governance?
http://www.ibm.com/developerworks/data/library/techarticle/dm-1412infosphere-governance/index.html
Business Vocabulary
[Figure: Data Lake interaction with Enterprise IT. Labels: Systems of Engagement, Service Interfaces, Information Service Bus, Enterprise Service Calls, System of Record Applications, Data Scientists, Sand Boxes, Operational History, Historical Data, Publishing Feeds, Data Out Feeds, New Sources, Harvested Data, Information Warehouse, Third Party Feeds, Enterprise IT Interaction, Repositories.]
InfoSphere Metadata Asset Manager (IMAM) is a web-based tool which allows the import and
management of metadata from design tools, business intelligence tools, databases and files into the
metadata repository of InfoSphere Information Server. Assets may be previewed prior to sharing them with
the metadata repository, allowing for the comparison and understanding of the changes and impact.
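The preview step described above amounts to comparing incoming metadata with what the repository already holds before committing it. The sketch below is a generic illustration of that comparison only; it does not use or represent the IMAM interface, and the asset names and the dictionary layout (asset name mapped to column types) are assumptions.

```python
from typing import Dict, List

# asset name -> {column name: column type}; a simplified, hypothetical layout
Assets = Dict[str, Dict[str, str]]


def preview_import(repository: Assets, staged: Assets) -> Dict[str, List[str]]:
    """Classify staged assets as added, changed, or unchanged relative to the repository."""
    added = sorted(name for name in staged if name not in repository)
    changed = sorted(
        name for name in staged
        if name in repository and staged[name] != repository[name]
    )
    unchanged = sorted(
        name for name in staged
        if name in repository and staged[name] == repository[name]
    )
    return {"added": added, "changed": changed, "unchanged": unchanged}
```

Reviewing the `changed` list before sharing is what lets a curator judge the impact of an import on assets other users already rely on.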
Gartner definition:
What It Is: Software products to enable the construction and implementation of data access
Includes:
Connectivity to data sources/targets
Data transformation and delivery
Data quality
Metadata support
Information Integration and Governance
Information Integration, Data Quality, Master Data Management, Data Lifecycle Management, Privacy & Security
Information Governance: Metadata and Policy Management (Information Governance Catalog)
On-Premises, As-A-Service (IBM DataWorks) and Cloud (SoftLayer)
IBM products support the entire Data Lake model
[Figure: IBM products mapped onto the Data Lake model (Catalog Interfaces, Object Store, Line of Business Applications, Real Time Analytics, Engagement Systems, Exploration, Content, Reporting, Service Bus, Archive, System of Record Applications, Enterprise Consumers of Insight, Information Management & Governance) for the Business Analyst and Data Scientist. Products shown: InfoSphere BigInsights, PureData for Analytics, InfoSphere MDM, Cloudant, Data Click, DB2 BLU, Enterprise Content Manager, InfoSphere Streams, ODM, Information Governance Catalog, Industry Data Models, Information Server, DataStage, Data Replicate, Federation, Guardium/Optim, Cognos, SPSS, Watson Analytics.]
Cloud, On Premises or Hybrid
IBM roadmap for Information Governance programs
Questions?
https://w3-connections.ibm.com/wikis/home?lang=en-us#!/wiki/Wc244bb451a83_40fc_85f0_edf7421445af/page/Information%20Governance
public.dhe.ibm.com/common/ssi/ecm/ci/en/ciw03090usen/CIW03090USEN.PDF
http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html?Open
Designing and Operating a Data Lake / Reservoir
http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg248165.html
Reasonable: is the depth and breadth of the data used, and of the
relationships derived, reasonable for the application it is used for?