Data Objects and Attribute Types

Data sets are made up of data objects. A data object represents an entity—in a sales database, the objects may be
customers, store items, and sales; in a medical database, the objects may be patients; in a university database, the
objects may be students, professors, and courses. Data objects are typically described by attributes. Data objects can
also be referred to as samples, examples, instances, data points, or objects. If the data objects are stored in a
database, they are data tuples. That is, the rows of a database correspond to the data objects, and the columns
correspond to the attributes.
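The row-and-column view can be sketched in a few lines of Python. This is only an illustration: the customer fields and values below are hypothetical, not taken from any real database.

```python
from collections import namedtuple

# Hypothetical customer attributes, chosen only for illustration.
Customer = namedtuple("Customer", ["customer_id", "name", "address"])

# Each tuple is one data object (a row); each field is one attribute (a column).
customers = [
    Customer(1, "Asha", "12 Lake Road"),
    Customer(2, "Ravi", "7 Hill Street"),
]

# The full set of attribute values describing the first object.
first_vector = tuple(customers[0])

# All observed values of the single attribute `name` (one column).
names = [c.name for c in customers]
```

Reading across a tuple gives one data object; reading down a field gives the observations of one attribute.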

What Is an Attribute?

An attribute is a data field, representing a characteristic or feature of a data object. The term dimension is
commonly used in data warehousing. Attributes describing a customer object can include, for example, customer
ID, name, and address. Observed values for a given attribute are known as observations. A set of attributes used to
describe a given object is called an attribute vector (or feature vector). The distribution of data involving one
attribute (or variable) is called univariate. A bivariate distribution involves two attributes, and so on.
The type of an attribute is determined by the set of possible values—nominal, binary, ordinal, or numeric—the
attribute can have. In the following subsections, we introduce each type.

Nominal Attributes

Nominal means “relating to names.” The values of a nominal attribute are symbols or names of things. Each value
represents some kind of category, code, or state, and so nominal attributes are also referred to as categorical. The
values do not have any meaningful order. In computer science, the values are also known as enumerations.
Example: Suppose that hair color and marital status are two attributes describing person objects. In our application,
possible values for hair color are black, brown, blond, red, auburn, gray, and white. The attribute marital status can
take on the values single, married, divorced, and widowed. Both hair color and marital status are nominal attributes.
Another example of a nominal attribute is occupation, with the values teacher, dentist, programmer, farmer, and so
on.
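Since nominal values are also known as enumerations, one possible way to model them is with Python's enum module. The sketch below assumes the marital status values from the example; the sample list of statuses is made up. It shows that counting and equality make sense for nominal values, while ordering does not.

```python
from collections import Counter
from enum import Enum


class MaritalStatus(Enum):
    SINGLE = "single"
    MARRIED = "married"
    DIVORCED = "divorced"
    WIDOWED = "widowed"


# Hypothetical observations for three person objects.
statuses = [MaritalStatus.SINGLE, MaritalStatus.MARRIED, MaritalStatus.MARRIED]

# The most frequent value (the mode) is meaningful for a nominal attribute.
mode = Counter(statuses).most_common(1)[0][0]

# Ordering is not meaningful: plain Enum members do not support "<".
try:
    MaritalStatus.SINGLE < MaritalStatus.MARRIED
    ordering_supported = True
except TypeError:
    ordering_supported = False
```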

Binary Attributes

A binary attribute is a nominal attribute with only two categories or states: 0 or 1, where 0 typically means that
the attribute is absent, and 1 means that it is present. Binary attributes are referred to as Boolean if the two states
correspond to true and false.

Example - Binary attributes. Given the attribute smoker describing a patient object, 1 indicates that the patient
smokes, while 0 indicates that the patient does not. Similarly, suppose the patient undergoes a medical test that has
two possible outcomes. The attribute medical test is binary, where a value of 1 means the result of the test for the
patient is positive, while 0 means the result is negative.

A binary attribute is symmetric if both of its states are equally valuable and carry the same weight; that is, there is
no preference on which outcome should be coded as 0 or 1. One such example could be the attribute gender having
the states male and female.

A binary attribute is asymmetric if the outcomes of the states are not equally important, such as the positive and
negative outcomes of a medical test for HIV. By convention, we code the most important outcome, which is
usually the rarest one, by 1 (e.g., HIV positive) and the other by 0 (e.g., HIV negative).

Ordinal Attributes

An ordinal attribute is an attribute with possible values that have a meaningful order or ranking among them, but
the magnitude between successive values is not known.

Example - Ordinal attributes. Suppose that drink size corresponds to the size of drinks available at a fast-food
restaurant. This ordinal attribute has three possible values: small, medium, and large. The values have a
meaningful sequence (which corresponds to increasing drink size), but the values alone do not tell us how much
bigger a large is than a medium.
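A minimal sketch of working with an ordinal attribute, assuming the three drink sizes above and a made-up list of orders. The ranks 0, 1, 2 are an assumption used only to sort; the gaps between them carry no meaning.

```python
# Rank positions encode order only, not magnitude.
SIZE_ORDER = ["small", "medium", "large"]
RANK = {size: position for position, size in enumerate(SIZE_ORDER)}

# Hypothetical drink orders.
orders = ["large", "small", "medium", "small", "large"]

# Sorting and the median are meaningful because the values are ordered.
sorted_orders = sorted(orders, key=RANK.__getitem__)
median_size = sorted_orders[len(sorted_orders) // 2]

# Comparisons by rank are meaningful; arithmetic on ranks (e.g. a mean) is not.
large_is_bigger = RANK["large"] > RANK["medium"]
```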

Numeric Attributes

A numeric attribute is quantitative; that is, it is a measurable quantity, represented in integer or real values.
Numeric attributes can be interval-scaled or ratio-scaled.

Interval-Scaled Attributes

Interval-scaled attributes are measured on a scale of equal-size units. The values of interval-scaled attributes
have order and can be positive, 0, or negative. Thus, in addition to providing a ranking of values, such attributes
allow us to compare and quantify the difference between values.

Example - Interval-scaled attributes. A temperature attribute is interval-scaled. Suppose that we have the
outdoor temperature value for a number of different days, where each day is an object. By ordering the values, we
obtain a ranking of the objects with respect to temperature. In addition, we can quantify the difference between
values. For example, a temperature of 20°C is five degrees higher than a temperature of 15°C. Calendar dates are
another example. For instance, the years 2002 and 2010 are eight years apart.

Ratio-Scaled Attributes

A ratio-scaled attribute is a numeric attribute with an inherent zero-point. That is, if a measurement is ratio-
scaled, we can speak of a value as being a multiple (or ratio) of another value. In addition, the values are ordered,
and we can also compute the difference between values, as well as the mean, median, and mode.

Example - Ratio-scaled attributes. Unlike temperatures in Celsius and Fahrenheit, the Kelvin (K) temperature
scale has what is considered a true zero-point (0 K = −273.15°C): It is the point at which the particles that comprise
matter have zero kinetic energy. Other examples of ratio-scaled attributes include count attributes such as years of
experience (e.g., the objects are employees) and number of words (e.g., the objects are documents). Additional
examples include attributes that measure weight and height.
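The difference between interval-scaled and ratio-scaled values can be checked with a little arithmetic. The sketch below uses the standard temperature conversion formulas; the two sample temperatures are arbitrary.

```python
def celsius_to_kelvin(c):
    return c + 273.15


def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32


a_c, b_c = 10.0, 20.0  # two arbitrary temperatures in degrees Celsius

# Differences are meaningful on an interval scale: 10 degrees C apart
# is also 10 kelvins apart.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are not: "twice as hot" in Celsius is not twice in Fahrenheit.
ratio_c = b_c / a_c
ratio_f = celsius_to_fahrenheit(b_c) / celsius_to_fahrenheit(a_c)

# Only the Kelvin ratio, measured from a true zero-point, is meaningful.
ratio_k = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)
```

The Celsius ratio is 2 while the Fahrenheit ratio for the same two temperatures is 1.36, which is why ratios on interval scales should not be interpreted.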

Discrete versus Continuous Attributes

A discrete attribute has a finite or countably infinite set of values, which may or may not be represented as
integers. The attributes hair color, smoker, medical test, and drink size each have a finite number of values, and so
are discrete.
An attribute is countably infinite if the set of possible values is infinite but the values can be put in a one-to-one
correspondence with natural numbers. For example, the attribute customer ID is countably infinite. The number of
customers can grow to infinity, but in reality, the actual set of values is countable. Zip codes are another example.
If an attribute is not discrete, it is continuous. In practice, real values are represented using a finite number of
digits. Continuous attributes are typically represented as floating-point variables.

OLAP OPERATIONS
CUBE

A data cube is a multidimensional extension of two-dimensional tables.

A three-dimensional cube can be pictured as a set of identically structured 2-D tables stacked upon one another.
A cube contains measures (facts) and dimensions.

A cube is a structure that allows fast analysis of data. It overcomes the limitations of arranging data in a
relational database.

Data cubes are used to represent data that is too complex to be described by a table of columns and rows.

Most decision support systems use data in the form of a data cube, which can be 2-D, 3-D, or higher-dimensional.
Each dimension represents an attribute in the database, so a data cube typically has three or more dimensions
when representing or interpreting data.

Each dimension of a data cube represents a characteristic of the database, and the data inside the cube can be
analysed along every possible dimension to reveal trends. These cubes are used by analysis systems, reporting
systems, and data mining systems. Multidimensional databases use cubes to represent data.
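A small cube can be sketched as a Python dictionary whose keys are the dimension values and whose values are the measure. Everything here is hypothetical: the three dimensions (product, quarter, city), the units-sold measure, and the helper total_by are assumptions made for illustration.

```python
# Hypothetical sales facts: (product, quarter, city) -> units sold.
DIMENSIONS = ("product", "quarter", "city")
cube = {
    ("pen", "Q1", "Delhi"): 120,
    ("pen", "Q2", "Delhi"): 150,
    ("pen", "Q1", "Mumbai"): 90,
    ("book", "Q1", "Delhi"): 40,
    ("book", "Q2", "Mumbai"): 60,
}


def total_by(cube, dimension):
    """Sum the measure over all other dimensions, keeping only `dimension`."""
    axis = DIMENSIONS.index(dimension)
    totals = {}
    for key, units in cube.items():
        totals[key[axis]] = totals.get(key[axis], 0) + units
    return totals


by_product = total_by(cube, "product")
by_quarter = total_by(cube, "quarter")
```

The same facts can be analysed along any dimension without restructuring the data, which is the property the cube is meant to provide.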

These data cubes are classified into:

Multidimensional OLAP (MOLAP)

Relational OLAP (ROLAP)

Fig 1.0 Example of a Cube

OLAP

OLAP stands for online analytical processing, a category of computer processing that enables users to easily
select, extract, and view data from different points of view. OLAP allows users to analyze information from
multiple database systems at the same time.

Data is stored in multidimensional databases, which are used for the data warehouse. OLAP cubes are created
from an existing database, whereas a relational database is accessed through queries.

OLAP is used for data mining, and all OLAP products are designed for multi-user environments.

A few OLAP servers are:

Oracle Express Server

Hyperion

Nigel Pendse has suggested that an alternative and perhaps more descriptive term to describe the
concept of OLAP is Fast Analysis of Shared Multidimensional Information (FASMI).

The first product that performed OLAP queries was Express, which was released in 1970 (and
acquired by Oracle in 1995 from Information Resources). However, the term did not appear until
1993 when it was coined by Ted Codd, who has been described as "the father of the relational
database".

The user-initiated process of navigating by calling for page displays interactively, through the
specification of slices via rotations and drill down/up, is sometimes called "slice and dice".

OLAP OPERATIONS

To perform operations on an OLAP cube, we need dimension tables and fact tables.

If the dimensions have a hierarchical structure, that structure is called a dimensional schema.

For example, the dimensions product, time, and student have the following hierarchies, listed from the most
general level down:

PRODUCT       TIME          STUDENT
ALL           All - Year    All
CATEGORY      Semester      University
PRODUCT       Quarter       College
              Month         Autonomous
              Day           Affiliated
                            Location
                            College code

The operations of OLAP are as follows.

ROLL UP / CONSOLIDATION

• Multidimensional databases have hierarchies with respect to their dimensions.
• The roll-up operation aggregates data along one or more dimensions.
• Consolidation means rolling up, or adding up, data relationships with respect to one or more dimensions.
• Such a hierarchy can be a total order, for example street < city < state < country.
• Here the roll-up operation aggregates the data from the city level to the country level of the location
hierarchy, as shown in the following diagram.
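The city-to-country roll-up can be sketched as follows. The cities, countries, and sales figures are invented for illustration, and roll_up is a hypothetical helper, not a function of any OLAP product.

```python
# Hypothetical location hierarchy: each city maps to its parent country.
CITY_TO_COUNTRY = {"Delhi": "India", "Mumbai": "India", "Toronto": "Canada"}

# Hypothetical measure at the city level.
sales_by_city = {"Delhi": 300, "Mumbai": 200, "Toronto": 150}


def roll_up(sales, parent_of):
    """Aggregate a measure one level up the hierarchy by summing children."""
    rolled = {}
    for child, amount in sales.items():
        parent = parent_of[child]
        rolled[parent] = rolled.get(parent, 0) + amount
    return rolled


sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
```

Rolling up reduces the level of detail: three city totals become two country totals, and the grand total is unchanged.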

DRILL DOWN

SLICE

DICE

PIVOT

For the OLAP operations, see also https://www.tutorialride.com/data-mining/olap.htm
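As a rough sketch of three of the operations named above, slice, dice, and pivot can be written against a dictionary-based cube. The usual definitions are assumed here: a slice fixes one dimension to a single value, a dice keeps a sub-cube by restricting several dimensions, and a pivot rotates the axes of a view. The cube contents and function names are hypothetical.

```python
# Hypothetical cube: (quarter, city, product) -> units sold.
cube = {
    ("Q1", "Delhi", "pen"): 120,
    ("Q1", "Mumbai", "pen"): 90,
    ("Q2", "Delhi", "pen"): 150,
    ("Q1", "Delhi", "book"): 40,
    ("Q2", "Mumbai", "book"): 60,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop that dimension."""
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cube.items() if key[axis] == value}


def dice_cube(cube, allowed):
    """Keep cells whose coordinates lie in the allowed set for each axis."""
    return {key: units for key, units in cube.items()
            if all(key[axis] in values for axis, values in allowed.items())}


def pivot(view):
    """Swap the two axes of a 2-D view."""
    return {(b, a): units for (a, b), units in view.items()}


q1_view = slice_cube(cube, axis=0, value="Q1")          # city x product for Q1
delhi_cube = dice_cube(cube, {0: {"Q1", "Q2"}, 1: {"Delhi"}})
product_by_city = pivot(q1_view)                        # product x city
```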

THREE TIER DATA WAREHOUSE ARCHITECTURE

A data warehouse is usually constructed by integrating multiple heterogeneous sources such as
relational databases and online transaction records. Data cleaning and data integration techniques
are applied to ensure consistency in naming conventions, encoding structures, attribute measures,
and so on.

The most common architecture adopted for a data warehouse (DW) is a three-tier architecture, with
the following tiers:

Bottom tier (extraction and transformation)

Middle tier

Top tier

BOTTOM TIER

The bottom tier consists of the data warehouse server, which is a relational database system. The
data warehouse server fetches only the relevant information based on the data mining request.

Back-end tools are used to store or feed data into the bottom tier. The functions performed by the
back-end tools are extract, clean, transform, load, and refresh.

Extraction is the process of obtaining and refining the data collected from different sources, such
as the internal database of the organization, external databases from various departments of the
institute, other leading educational libraries in the city, etc.

Two methods can be used for extracting data from the sources:

bulk extraction and

change-based extraction.

The entire process of extracting data from multiple sources, transforming it into a single
standard format, and finally loading it into the warehouse is referred to as the extraction,
transformation, and loading (ETL) process.

The bottom tier contains the operational databases, the external sources, and the data warehouse.

Only relevant information is extracted, based on the data mining knowledge base. The extracted
information is subject oriented, integrated from multiple sources, time variant, and nonvolatile.
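The ETL steps, together with the two extraction methods, can be sketched in plain Python. This is a toy illustration under stated assumptions: the two source systems, their field names, the gender codes, and the "updated" date field are all hypothetical, and a real warehouse would use dedicated back-end tools rather than lists.

```python
# Hypothetical source systems with different naming and encoding conventions.
SOURCE_A = [
    {"cust_id": 1, "gender": "M", "amount": "100.50", "updated": "2024-01-05"},
    {"cust_id": 2, "gender": "F", "amount": "20.00", "updated": "2024-02-10"},
]
SOURCE_B = [
    {"CustomerID": 3, "sex": "female", "amt": 75, "updated": "2024-02-12"},
]
GENDER_CODES = {"M": "male", "F": "female", "male": "male", "female": "female"}


def bulk_extract(rows):
    """Bulk extraction: take every row from the source."""
    return list(rows)


def change_based_extract(rows, last_load):
    """Change-based extraction: take only rows modified after the last load."""
    # ISO dates (YYYY-MM-DD) compare correctly as strings.
    return [row for row in rows if row["updated"] > last_load]


def transform_a(row):
    """Map source A's names and encodings to the single standard format."""
    return {"customer_id": row["cust_id"],
            "gender": GENDER_CODES[row["gender"]],
            "amount": float(row["amount"])}


def transform_b(row):
    """Map source B's names and encodings to the same standard format."""
    return {"customer_id": row["CustomerID"],
            "gender": GENDER_CODES[row["sex"]],
            "amount": float(row["amt"])}


warehouse = []
warehouse.extend(transform_a(r) for r in bulk_extract(SOURCE_A))  # load
warehouse.extend(transform_b(r) for r in bulk_extract(SOURCE_B))  # load

# A later refresh would pick up only what changed since the last load.
changed = change_based_extract(SOURCE_A, last_load="2024-02-01")
```

After the transform step, every row uses the same attribute names and encodings regardless of which source it came from.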

Data Marts

Data marts are subsets of the DW, where each data mart is confined to a specific subject. Data marts
can be categorized into two types:

Dependent: data comes directly from an enterprise data warehouse

Independent: data comes from multiple operational data sources

Metadata Repository

The metadata repository helps us identify what is available in the data warehouse: the structure of
the DW, data set names, definitions, the algorithms used for cleaning, the sources of the extracted
data, and the sequence of the extracted data.

Monitoring and Administration

• Data refreshment
• Data source synchronization
• Disaster recovery
• Managing data growth and database performance
• Controlling the number and range of queries
• Limiting the size of the data warehouse

MIDDLE TIER

The middle tier presents users with a multidimensional view of the data from the data warehouse or
data marts. It is typically implemented using one of two models:

1. ROLAP (Relational OLAP) model - presents data in relational tables.

2. MOLAP (Multidimensional OLAP) model - presents data in array-based structures that map directly
to the data cube.

The middle tier has an OLAP server, which can be either ROLAP or MOLAP.
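The difference between the two models can be illustrated with the same facts stored both ways. This is a simplified sketch, not how any particular OLAP server is implemented; the products, quarters, and unit counts are hypothetical.

```python
# ROLAP-style storage: one relational row per fact.
rows = [
    ("pen", "Q1", 120),
    ("pen", "Q2", 150),
    ("book", "Q1", 40),
    ("book", "Q2", 60),
]

# MOLAP-style storage: an array whose indices are the dimension positions.
products = ["pen", "book"]
quarters = ["Q1", "Q2"]
array = [[0] * len(quarters) for _ in products]
for product, quarter, units in rows:
    array[products.index(product)][quarters.index(quarter)] = units

# ROLAP lookup: filter the rows, much like a SQL WHERE clause.
rolap_value = next(u for p, q, u in rows if p == "book" and q == "Q2")

# MOLAP lookup: index directly into the cube cell.
molap_value = array[products.index("book")][quarters.index("Q2")]
```

Both lookups return the same measure; the models differ in how the cell is located, by scanning rows or by direct array indexing.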

TOP TIER

The top tier is the presentation layer, which has reporting, analysis, and data mining tools. It acts
as the user interface between the data warehouse and the end user for querying, analyzing, and
report generation.

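OLTP Vs OLAP

OLAP                                            | OLTP
------------------------------------------------+------------------------------------------------
On-line Analytical Processing                   | On-line Transaction Processing
------------------------------------------------+------------------------------------------------
Very low volume of transactions; queries are    | Large volume of short online transactions such
often complex and involve aggregation           | as insert, delete, update
------------------------------------------------+------------------------------------------------
Response time is the key factor for an OLAP     | Emphasis is on very fast query processing,
system. These systems are widely used in data   | maintaining data integrity, and ensuring a high
mining techniques                               | number of transactions per second
------------------------------------------------+------------------------------------------------
OLAP data come from various OLTP systems and    | OLTP data are the original source of data,
are called consolidation data                   | called operational data
------------------------------------------------+------------------------------------------------
Data reveals a multidimensional view of         | Data reveals a snapshot of ongoing business
various kinds of business activities            | processes
------------------------------------------------+------------------------------------------------
Processing speed depends on the volume of data  | Processing speed is very fast
involved. Batch data refreshes and complex      |
queries may take hours                          |
------------------------------------------------+------------------------------------------------
Database design is de-normalized with fewer     | Database design is highly normalized with many
tables, using star and snowflake schemas        | tables
------------------------------------------------+------------------------------------------------
The purpose of OLAP is to help with planning,   | The purpose of OLTP is to control and run the
problem solving, and decision support           | fundamental business tasks
------------------------------------------------+------------------------------------------------
Space requirement can be larger due to the      | Space requirement can be very small if
existence of aggregation structures and         | historical data is archived
historic data                                   |
------------------------------------------------+------------------------------------------------
Data recovery / backup is done simply by        | Backup is taken at regular intervals, as
reloading the OLTP data as the data recovery    | operational data is critical to run the
process                                         | business. If this data is lost, it results in
                                                | monetary loss and legal liability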
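The contrast in workload can be sketched with Python's built-in sqlite3 module. This is only an illustration: the sales table and its rows are hypothetical, and in practice the analytical query would run against a separate warehouse rather than the operational table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
)

# OLTP-style workload: many short transactions, each touching one row.
for city, amount in [("Delhi", 100.0), ("Mumbai", 50.0), ("Delhi", 25.0)]:
    with conn:  # each block commits as its own transaction
        conn.execute(
            "INSERT INTO sales (city, amount) VALUES (?, ?)", (city, amount)
        )
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (60.0, 2))

# OLAP-style workload: a single query that aggregates over many rows.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
conn.close()
```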

CHARACTERISTICS OF DATA WAREHOUSE

A data warehouse is designed with four characteristics:

1. Time variant
2. Non-volatile
3. Integrated
4. Subject oriented

TIME VARIANT

A data warehouse is a time-variant database, which supports business management in analysing the
business and comparing it across different time periods such as year, quarter, month, week, and
date.

ATTRIBUTES OF TIME
• DAY_NAME
• DAY_NUMBER_IN_WEEK
• DAY_NUMBER_IN_MONTH
• DAY_NUMBER_IN_YEAR
• WEEK_NUMBER_IN_MONTH
• WEEK_NUMBER_IN_YEAR
• MONTH_NUMBER
• MONTH_YEAR
• QUARTER_YEAR
• QUARTER_NUMBER
• YEAR
• SESSION
• WEEKEND_INDICATOR_FLAG
• WEEKDAY_INDICATOR_FLAG
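Several of these attributes can be derived from a calendar date, as the sketch below shows using Python's datetime module. The conventions are assumptions: Monday is taken as day 1 of the week and Saturday and Sunday as the weekend. Only a subset of the attributes is computed, and time_attributes is a hypothetical helper.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_attributes(d):
    """Derive a subset of the time attributes for one calendar date."""
    day_number_in_week = d.isoweekday()  # assumption: Monday = 1 ... Sunday = 7
    is_weekend = day_number_in_week >= 6  # assumption: Saturday and Sunday
    return {
        "DAY_NAME": DAY_NAMES[d.weekday()],
        "DAY_NUMBER_IN_WEEK": day_number_in_week,
        "DAY_NUMBER_IN_MONTH": d.day,
        "DAY_NUMBER_IN_YEAR": d.timetuple().tm_yday,
        "MONTH_NUMBER": d.month,
        "QUARTER_NUMBER": (d.month - 1) // 3 + 1,
        "YEAR": d.year,
        "WEEKEND_INDICATOR_FLAG": is_weekend,
        "WEEKDAY_INDICATOR_FLAG": not is_weekend,
    }


row = time_attributes(date(2024, 3, 16))
```

Storing these values once per date lets the business compare periods by year, quarter, month, or day without recomputing them in every query.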

NON VOLATILE

A data warehouse is a non-volatile database: once data has been entered into it, the data does not
reflect changes that take place in the operational database. Hence the data in a data warehouse is
static.

• It generates artificial keys, or surrogate keys, to store the history.
• A surrogate key is a generated series of numbers.
• It requires more disk space.
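One way to picture surrogate keys and history is an append-only table in which every load adds a new row with the next number in a series. The sketch below is an assumption-laden illustration: the customer dimension, its fields, and load_customer are hypothetical.

```python
from itertools import count

_next_key = count(1)   # generates the series 1, 2, 3, ...
customer_dim = []      # append-only: existing rows are never updated


def load_customer(customer_id, city, load_date):
    """Append a new version of the customer under a fresh surrogate key."""
    row = {
        "surrogate_key": next(_next_key),
        "customer_id": customer_id,   # key from the operational system
        "city": city,
        "load_date": load_date,
    }
    customer_dim.append(row)
    return row["surrogate_key"]


load_customer("C100", "Delhi", "2023-01-01")
# The customer later moves; the old row is kept and a new row is added.
load_customer("C100", "Mumbai", "2024-01-01")

history = [(r["surrogate_key"], r["city"])
           for r in customer_dim if r["customer_id"] == "C100"]
```

Keeping both rows preserves the history, and it is also why this approach requires more disk space.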

INTEGRATED DATABASE

A DWH is an integrated database, which allows you to collect data from multiple database sources and
integrate it.

SUBJECT ORIENTED

A data warehouse is a subject-oriented database, which supports the business needs of individual,
department-specific users.
Example: Sales, HR, Accounts, Marketing, etc.
