
RF

Corsello Research Foundation

Data Modeling Interviews


Lines of Questioning

Basics
Data modeling is about defining standard structures for data
Many data sets may share a common structure
Each thing in the real world should have only one data structure
Each data structure may appear in multiple data models

Data models come in 2 primary flavors


Domain model
Models all entities specific to a domain
Aligns to task automation and workflows

Entity model
Models entities regardless of domain


Software
Software works with or on data
Software is actually a form of data as well
Software should be keyed to a data model

Software may be built for dynamic data models


Allows for mapping to a specific implementation of a data model
Results in general purpose software

Generally, lower performance


Lower specialization, Higher generality

Software may be keyed to a specific data model


Allows for high-performance, specialized tooling
Allows for integration with workflows specific to the domain
Lower generality

Neither approach is better, just different
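
The trade-off between the two kinds of software can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sample` entity, its fields, and the 1000 m threshold are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Any


# Approach 1: software built for a dynamic data model.
# The model is passed in as data (field name -> expected type), so the same
# function works for any entity, at the cost of per-record lookups and checks.
def validate_dynamic(model: dict[str, type], record: dict[str, Any]) -> list[str]:
    """Return the problems found when checking a record against a model."""
    problems = []
    for field_name, expected in model.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected):
            problems.append(f"wrong type for {field_name}")
    return problems


# Approach 2: software keyed to one specific data model.
# The structure is fixed in code, which allows domain-specific behavior.
@dataclass
class Sample:  # hypothetical entity
    sample_id: str
    depth_m: float

    def is_deep(self) -> bool:
        # Domain-specific rule; the 1000 m threshold is an assumed value.
        return self.depth_m > 1000.0


# The model itself is just data here (hypothetical field list).
sample_model = {"sample_id": str, "depth_m": float}

print(validate_dynamic(sample_model, {"sample_id": "S-1", "depth_m": 1250.0}))
print(validate_dynamic(sample_model, {"sample_id": "S-2"}))
print(Sample("S-1", 1250.0).is_deep())
```

The generic validator can be pointed at any model without code changes, while the keyed class can carry rules such as `is_deep` that only make sense in one domain.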


Data Stores
A collection of data based upon a single data model in a coherent repository is a data store
A relational database is a form of repository for data stores
A single RDBMS instance may contain multiple data stores

Data stores may be abstracted by software in numerous ways to enable access


Web-based services (SOAP/REST/JSON/RSS)
Database API (e.g. ODBC/JDBC)

Remote Service (non-web, like CORBA/DCOM/IIOP)


Native API (e.g. code library/dll/jar)
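
One way to picture this abstraction is a single access interface with interchangeable back ends. The sketch below is a minimal illustration, not a prescribed design: the `SampleStore` interface, the `sample` table, and its columns are hypothetical. It pairs a native in-process store with a database-API store built on Python's standard `sqlite3` module.

```python
import sqlite3
from typing import Optional, Protocol


class SampleStore(Protocol):
    """Hypothetical access interface; callers depend on this, not on the storage."""

    def put(self, sample_id: str, depth_m: float) -> None: ...
    def get(self, sample_id: str) -> Optional[float]: ...


class InMemoryStore:
    """Native-API style store: a plain in-process dictionary."""

    def __init__(self) -> None:
        self._rows: dict[str, float] = {}

    def put(self, sample_id: str, depth_m: float) -> None:
        self._rows[sample_id] = depth_m

    def get(self, sample_id: str) -> Optional[float]:
        return self._rows.get(sample_id)


class SqliteStore:
    """Database-API style store backed by a relational table."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sample "
            "(sample_id TEXT PRIMARY KEY, depth_m REAL)"
        )

    def put(self, sample_id: str, depth_m: float) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO sample (sample_id, depth_m) VALUES (?, ?)",
            (sample_id, depth_m),
        )
        self._conn.commit()

    def get(self, sample_id: str) -> Optional[float]:
        row = self._conn.execute(
            "SELECT depth_m FROM sample WHERE sample_id = ?", (sample_id,)
        ).fetchone()
        return row[0] if row else None


def record_and_read(store: SampleStore) -> Optional[float]:
    """Client code: identical regardless of which store sits behind the interface."""
    store.put("S-1", 1250.0)
    return store.get("S-1")


for store in (InMemoryStore(), SqliteStore()):
    print(type(store).__name__, record_and_read(store))
```

A web-based or remote service would be a third implementation of the same interface; the client function would not need to change.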

Data Models
Data models serve several purposes

Standard data models enable standard data formats, which enable sharing
Standard data models enable standard software implementations, which enable application integration
Data models provide standard vocabularies for communicating
Data models provide references for standardizing workflows
A workflow will require and produce data from the model
Better enables defining standard entry and exit criteria

Data format standards are not data models


A standard XML schema is a standard encoding of data that implies structure, but is not itself a data model
A data model is more abstract; it does not constrain implementation, encoding or use

Data models are only part of the bigger picture of standardization of practice
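
The difference between a data model and a data format standard can be illustrated by encoding one model two ways. In this hedged sketch the `Sample` entity and both encodings (JSON keys, XML element and attribute names) are hypothetical choices; the point is only that the model stays the same while the format changes.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class Sample:  # hypothetical entity: the model says what a sample *is*
    sample_id: str
    depth_m: float


def to_json(sample: Sample) -> str:
    """One possible encoding of the model."""
    return json.dumps(asdict(sample), sort_keys=True)


def to_xml(sample: Sample) -> str:
    """A second encoding; it makes different structural choices (id as attribute)."""
    root = ET.Element("sample", attrib={"id": sample.sample_id})
    ET.SubElement(root, "depth", attrib={"unit": "m"}).text = str(sample.depth_m)
    return ET.tostring(root, encoding="unicode")


s = Sample("S-1", 1250.0)
print(to_json(s))
print(to_xml(s))
```

Both outputs carry the same information, yet neither string is the model: each is one format that happens to comply with it.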


Parts and Pieces


The goal is consistency, repeatability, measurability and reuse (sharing)

This goal requires multiple facets:


Standard data models
Standard methodologies
Technical models, algorithms and approaches

Standard business processes


Delineation of responsibility
Processes and procedures
Workflow models

In short, standards
Standards do not require agreement, only acceptance
Standards do not need to fit everyone's needs, only the cross-section of needs
Standards should be composable to get more detail; that's how to support everyone (a web of standards)


People
All activities are performed for and/or by people
A task is automated to remove a person from needing to perform the task; however, the result of the task will flow to a person
People will appreciate the results of standardization, if done well, but:
There is a fear that automation is meant to put them out of work
There is a dislike for being required to do things in a different manner than they are used to (xenophobia)
People want results; standardization is not quick


Coping with change


To enable standardization to work well, expect long timelines

Expect people to not support the timelines

Deliver results in the interim, without the promise of the standardization
The grand vision of the resulting utopia from standardization should be avoided
There is no silver bullet, only hard work and good intent
Don't hide the goals, but emphasize the short-term goals
Don't let the short-term goals undermine the grand vision

The long-term goals are the most important to maintain relevance


The short-term goals are the most important to maintain support


Interviewing (finally)
When holding a data modeling / business process session, remember it is a collaborative interview
Get relevant people involved:
Average user in the domain
Hotshot or Hero in the domain
Trouble child or Technophobe in the domain
Minimal managers in general meetings
Meet with management in a separate meeting both before and after for differing views

Get a cross-section of what the domain is


The Session(s)
Ask questions to spur discussion
The people are a cross-section of the domain to ensure active discussion

The facilitator / modeler does not actually create the model; the audience does
Maintain enough control and direction to stay on topic
Some discussions need to go off-topic to get to a point

The modeler guides the model development based upon their knowledge of modeling practices, not the domain
The modeler should understand the domain well enough to know what is on or off track

The outcome of the meetings is a high-level abstract data model and process model
One is of little use without the other in a specific domain

Entity data model sessions

Should result in a domain map indicating what domains this entity model is relevant to
Map should directly intersect the audience


Questions
There are no fixed questions to ask
It is imperative to teach data model basics in most cases

The line of questioning should be exploratory
Try to answer:

What does your domain do (and not do)?
Establishes the boundary of the domain

Who does your domain contain (and not contain)?
Establishes a list of organizations of responsibility and the regulatory environment
Establishes a relative size for the domain

Who do you serve and interact with?
Establishes a list of consumers of what the domain produces
Establishes a list of suppliers the domain consumes from

How does your domain accomplish this?
Establishes a list of processes / practices


Modeling
Continue elaborating the previous questions

Extract from the answers


What do you use (tools, data, techniques)?
Where do you use X (for each data entity X)?
What is the same/different about each data entity?

Establish a baseline of entities


Forms the core data model
Extract fields/attributes

Extract metadata (descriptions)


Extract relations/multiplicities
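
The baseline of entities, with their attributes, metadata and relations, can be recorded in a small structure while the session is still running. The following is a sketch under assumptions: the class names and the `Well`, `Sample` and `Laboratory` entities are invented for illustration and do not appear in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    type_name: str
    description: str       # metadata captured from the interview
    required: bool = True


@dataclass
class Relation:
    target: str            # name of the related entity
    multiplicity: str      # e.g. "1", "0..1", "0..*", "1..*"
    description: str = ""


@dataclass
class Entity:
    name: str
    description: str
    attributes: list[Attribute] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)


def unresolved_targets(entities: list[Entity]) -> set[str]:
    """Relation targets that no entity in the baseline defines yet."""
    known = {e.name for e in entities}
    return {r.target for e in entities for r in e.relations} - known


# Hypothetical baseline extracted from interview answers.
baseline = [
    Entity(
        "Well",
        "A drilled hole that samples are taken from",
        attributes=[Attribute("well_id", "string", "Identifier assigned at drilling")],
        relations=[Relation("Sample", "0..*", "Samples taken from this well")],
    ),
    Entity(
        "Sample",
        "Material collected at a known depth",
        attributes=[Attribute("depth_m", "number", "Depth in meters", required=False)],
        relations=[Relation("Laboratory", "0..1", "Lab that analyzed the sample")],
    ),
]

print(unresolved_targets(baseline))
```

Any name reported by `unresolved_targets` is an entity the audience has mentioned but not yet described, which makes it a natural follow-up question.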

Build a Model
Still during the meeting

Depict graphically:
Data entities
Entity relations
Process uses / domain mappings

Probe users for issues with the model


What's missing
What is not always true with the model
What is domain specific about the model

What cannot be lived without


What is too costly to require or is inherently optional


Build the Real Model


After the meeting is over

Decompose the model into a logical data model representation (e.g. in UML)
Partition the model
Find natural break points in the entities
Isolate each entity

Resolve dependencies into a parent and child


Extends the relational concept in that the parent data model owns the link to the child; the child is not required to know about the link

Address partition consistency issues


Define any mandatory constraints in the model

Expect that implementations will not be fully able to enforce constraints


Expect implementations to be fully distributed, loosely coupled and inter-organizational
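
The parent-owned link and the expectation of imperfect enforcement can be sketched together. This is a minimal illustration, not a reference design: the `Well` and `Sample` partitions and both mandatory constraints (at least one sample, every sample has a depth) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """Child partition: it has no field referring back to any parent."""
    sample_id: str
    depth_m: Optional[float] = None


@dataclass
class Well:
    """Parent partition: it owns the links to its children, by identifier only."""
    well_id: str
    sample_ids: list[str] = field(default_factory=list)


def check_well(well: Well, samples: dict[str, Sample]) -> list[str]:
    """Report constraint violations instead of assuming they were enforced."""
    issues = []
    if not well.sample_ids:
        issues.append(f"{well.well_id}: at least one sample is required")
    for sid in well.sample_ids:
        sample = samples.get(sid)
        if sample is None:
            # The child may live in another organization's store and be unreachable.
            issues.append(f"{well.well_id}: sample {sid} could not be resolved")
        elif sample.depth_m is None:
            issues.append(f"{well.well_id}: sample {sid} has no depth")
    return issues


# Hypothetical data: S-2 is incomplete and S-3 lives somewhere we cannot reach.
samples = {"S-1": Sample("S-1", 1250.0), "S-2": Sample("S-2")}
well = Well("W-7", ["S-1", "S-2", "S-3"])

for issue in check_well(well, samples):
    print(issue)
```

Because the link is held only by the parent, a `Sample` can be stored, shared, or reused by other parents without modification, and a dangling link shows up as a reported issue rather than a failure to load.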


Review and Splanations


Provide the real model to the community
Expect concerns and issues
No word generally means nobody understands, or nobody cares
Expect most issues will be addressed not by changing the model, but by explaining the concepts of the model

Educate, explain and provide examples


Most users will want to directly relate a model to an implementation of the model
It is extremely hard to convey the difference
It is critical to maintain a complete separation of the model from its implementation
If (when) example implementations are shown, they should test the boundary of what is compliant with the model


Questions