
By V.K. Saini, MBA(IS), Semester 3
2010 (Fall)

Assignment (Set-1)
Subject code: MB0050
Research Methodology

Q 1. Give examples of specific situations that would call for the following types of
research, explaining why – a) Exploratory research b) Descriptive research c) Diagnostic
research d) Evaluation research.

Ans.: Research may be classified broadly according to its major intent or its methods. According to intent, research may be classified as:

Basic (aka fundamental or pure) research is driven by a scientist's curiosity or interest in a scientific question.
The main motivation is to expand man's knowledge, not to create or invent something. There is no obvious
commercial value to the discoveries that result from basic research.

For example, basic science investigations probe for answers to questions such as:

• How did the universe begin?

• What are protons, neutrons, and electrons composed of?

• How do slime molds reproduce?

• What is the specific genetic code of the fruit fly?

Most scientists believe that a basic, fundamental understanding of all branches of science is needed in order
for progress to take place. In other words, basic research lays down the foundation for the applied science
that follows. If basic work is done first, then applied spin-offs often eventually result from this research. As Dr.
George Smoot of LBNL says, "People cannot foresee the future well enough to predict what's going to
develop from basic research. If we only did applied research, we would still be making better spears."

Applied research is designed to solve practical problems of the modern world, rather than to acquire
knowledge for knowledge's sake. One might say that the goal of the applied scientist is to improve the
human condition.

For example, applied researchers may investigate ways to:

• Improve agricultural crop production

• Treat or cure a specific disease

• Improve the energy efficiency of homes, offices, or modes of transportation

Some scientists feel that the time has come for a shift in emphasis away from purely basic research and
toward applied science. This trend, they feel, is necessitated by the problems resulting from global
overpopulation, pollution, and the overuse of the earth's natural resources.

Exploratory research provides insights into and comprehension of an issue or situation. It should draw
definitive conclusions only with extreme caution. Exploratory research is a type of research conducted
because a problem has not been clearly defined. Exploratory research helps determine the best research
design, data collection method and selection of subjects. Given its fundamental nature, exploratory research
often concludes that a perceived problem does not actually exist.


Exploratory research often relies on secondary research such as reviewing available literature and/or data,
or qualitative approaches such as informal discussions with consumers, employees, management or
competitors, and more formal approaches through in-depth interviews, focus groups, projective methods,
case studies or pilot studies. The Internet allows for research methods that are more interactive in nature:
E.g., RSS feeds efficiently supply researchers with up-to-date information; major search engine search
results may be sent by email to researchers by services such as Google Alerts; comprehensive search
results are tracked over lengthy periods of time by services such as Google Trends; and Web sites may be
created to attract worldwide feedback on any subject.

The results of exploratory research are not usually useful for decision-making by themselves, but they can
provide significant insight into a given situation. Although the results of qualitative research can give some
indication as to the "why", "how" and "when" something occurs, it cannot tell us "how often" or "how many."

Exploratory research is not typically generalizable to the population at large.

A defining characteristic of causal research is the random assignment of participants to the conditions of the experiment, e.g., an experimental and a control condition. Such assignment results in the groups being comparable at the beginning of the experiment, so any difference between the groups at the end of the experiment is attributable to the manipulated variable. Observational research, by contrast, typically looks for differences among intact, pre-defined groups. A common example compares smokers and non-smokers with regard to health problems. Causal conclusions cannot be drawn from such a study because of other possible differences between the groups; e.g., smokers may drink more alcohol than non-smokers. Other unknown differences could exist as well. Hence, we may see a relation between smoking and health, but a conclusion that smoking is a cause would not be warranted in this situation.
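To make the random-assignment idea concrete, here is a minimal Python sketch (the participant IDs and group sizes are hypothetical, not taken from the text) that splits a participant list into an experimental and a control condition at random:

import random

def randomly_assign(participants, seed=42):
    """Randomly split participants into experimental and control conditions."""
    rng = random.Random(seed)      # fixed seed so the illustration is reproducible
    shuffled = participants[:]     # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs; in a real study these would come from the sampling frame.
groups = randomly_assign([f"P{i:02d}" for i in range(1, 21)])
print(groups["experimental"])
print(groups["control"])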

Descriptive research, also known as statistical research, describes data and characteristics about the
population or phenomenon being studied. Descriptive research answers the questions who, what, where,
when and how.

Although the data description is factual, accurate and systematic, the research cannot describe what caused
a situation. Thus, descriptive research cannot be used to create a causal relationship, where one variable
affects another. In other words, descriptive research can be said to have a low requirement for internal
validity.

The description is used for frequencies, averages and other statistical calculations. Often the best approach,
prior to writing descriptive research, is to conduct a survey investigation. Qualitative research often has the
aim of description, and researchers may follow up with examinations of why the observations exist and what the implications of the findings are.

In short, descriptive research deals with everything that can be counted and studied, but there are always restrictions to that. Your research must have an impact on the lives of the people around you. For example, finding the most frequent disease that affects the children of a town lets readers of the research know what to do to prevent that disease; thus, more people will live a healthy life.

Diagnostic study: This is similar to a descriptive study but with a different focus. It is directed towards discovering what is happening and what can be done about it. It aims at identifying the causes of a problem and the possible solutions for it. It may also be concerned with discovering and testing whether certain variables are associated. This type of research requires prior knowledge of the problem, its thorough formulation, a clear-cut definition of the given population, adequate methods for collecting accurate information, precise measurement of variables, statistical analysis and tests of significance.

Evaluation Studies: This is a type of applied research. It is carried out to assess the effectiveness of social or economic programmes that have been implemented, or to assess the impact of development on the project area. It is thus directed to assess or appraise the quality and quantity of an activity and its performance, and to specify its attributes and the conditions required for its success. It is concerned with causal relationships and is more actively guided by hypotheses. It is also concerned with change over time.

Action research is a reflective process of progressive problem solving led by individuals working with others
in teams or as part of a "community of practice" to improve the way they address issues and solve problems.
Action research can also be undertaken by larger organizations or institutions, assisted or guided by
professional researchers, with the aim of improving their strategies, practices, and knowledge of the
environments within which they practice. As designers and stakeholders, researchers work with others to
propose a new course of action to help their community improve its work practices (Center for Collaborative
Action Research). Kurt Lewin, then a professor at MIT, first coined the term “action research” in about 1944,
and it appears in his 1946 paper “Action Research and Minority Problems”. In that paper, he described action
research as “a comparative research on the conditions and effects of various forms of social action and
research leading to social action” that uses “a spiral of steps, each of which is composed of a circle of
planning, action, and fact-finding about the result of the action”.

Action research is an interactive inquiry process that balances problem solving actions implemented in a
collaborative context with data-driven collaborative analysis or research to understand underlying causes
enabling future predictions about personal and organizational change (Reason & Bradbury, 2001). After six
decades of action research development, many methodologies have evolved that adjust the balance to focus
more on the actions taken or more on the research that results from the reflective understanding of the
actions. This tension exists between

• those that are more driven by the researcher’s agenda to those more driven by participants;
• those that are motivated primarily by instrumental goal attainment to those motivated primarily by the aim of personal, organizational, or societal transformation; and
• 1st-, to 2nd-, to 3rd-person research, that is, my research on my own action, aimed primarily at personal change; our research on our group (family/team), aimed primarily at improving the group; and ‘scholarly’ research aimed primarily at theoretical generalization and/or large-scale change.

Action research challenges traditional social science, by moving beyond reflective knowledge created by
outside experts sampling variables to an active moment-to-moment theorizing, data collecting, and inquiring
occurring in the midst of emergent structure. “Knowledge is always gained through action and for action.
From this starting point, to question the validity of social knowledge is to question, not how to develop a
reflective science about action, but how to develop genuinely well-informed action — how to conduct an
action science” (Tolbert 2001).

Q 2.In the context of hypothesis testing, briefly explain the difference between a) Null and
alternative hypothesis b) Type 1 and type 2 error c) Two tailed and one tailed test d)
Parametric and non-parametric tests.

Ans.: Some basic concepts in the context of testing of hypotheses are explained below –

1) Null Hypotheses and Alternative Hypotheses: In the context of statistical analysis, we often talk about
null and alternative hypotheses. If we are to compare the superiority of method A with that of method B
and we proceed on the assumption that both methods are equally good, then this assumption is termed
as a null hypothesis. On the other hand, if we think that method A is superior, then it is known as an
alternative hypothesis.

These are symbolically represented as:


Null hypothesis = H0 and Alternative hypothesis = Ha


Suppose we want to test the hypothesis that the population mean is equal to the hypothesized mean µH0 = 100. Then we would say that the null hypothesis is that the population mean is equal to the hypothesized mean 100, and symbolically we can express it as H0: µ = µH0 = 100.

If our sample results do not support this null hypothesis, we should conclude that something else is true. What we conclude on rejecting the null hypothesis is known as the alternative hypothesis. If we accept H0, then we are rejecting Ha, and if we reject H0, then we are accepting Ha. For H0: µ = µH0 = 100, we may consider three possible alternative hypotheses as follows:

Alternative hypothesis    To be read as follows
Ha: µ ≠ µH0               The population mean is not equal to 100 (i.e., it may be more or less than 100)
Ha: µ > µH0               The population mean is greater than 100
Ha: µ < µH0               The population mean is less than 100

The null hypotheses and the alternative hypotheses are chosen before the sample is drawn (the researcher
must avoid the error of deriving hypotheses from the data he collects and testing the hypotheses from the
same data). In the choice of null hypothesis, the following considerations are usually kept in view:

a. The alternative hypothesis is usually the one which is to be proved, and the null hypothesis is the one which is to be disproved. Thus a null hypothesis represents the hypothesis we are trying to reject, while the alternative hypothesis represents all other possibilities.
b. If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken as the null hypothesis, because then the probability of rejecting it when it is true is α (the level of significance), which is chosen very small.
c. The null hypothesis should always be a specific hypothesis, i.e., it should not state an approximate value.
Generally, in hypothesis testing, we proceed on the basis of the null hypothesis, keeping the alternative
hypothesis in view. Why so? The answer is that on the assumption that the null hypothesis is true, one can
assign the probabilities to different possible sample results, but this cannot be done if we proceed with
alternative hypotheses. Hence the use of null hypotheses (at times also known as statistical hypotheses) is
quite frequent.

2) The Level of Significance: This is a very important concept in the context of hypothesis testing. It is
always some percentage (usually 5%), which should be chosen with great care, thought and reason. In
case we take the significance level at 5%, then this implies that H0 will be rejected when the sampling
result (i.e., observed evidence) has a less than 0.05 probability of occurring if H0 is true. In other words,
the 5% level of significance means that the researcher is willing to take as much as a 5% risk of rejecting the null hypothesis when it (H0) happens to be true. Thus the significance level is the maximum value of the
probability of rejecting H0 when it is true and is usually determined in advance before testing the
hypothesis.
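As a rough illustration of points 1 and 2, the following Python sketch (assuming SciPy is available; the sample values are invented purely for illustration) tests H0: µ = 100 against the two-sided alternative at the 5% level of significance:

from scipy import stats

sample = [102, 98, 107, 95, 101, 104, 99, 110, 96, 103]  # hypothetical observations
alpha = 0.05                                             # 5% level of significance
t_stat, p_value = stats.ttest_1samp(sample, popmean=100) # H0: mu = 100, Ha: mu != 100 (two-sided)

if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject H0 in favour of Ha")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject H0")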

3) Decision Rule or Test of Hypotheses: Given a null hypothesis H0 and an alternative hypothesis Ha, we make a rule, which is known as a decision rule, according to which we accept H0 (i.e., reject Ha) or reject
H0 (i.e., accept Ha). For instance, if H0 is that a certain lot is good (there are very few defective items in
it), against Ha, that the lot is not good (there are many defective items in it), then we must decide the
number of items to be tested and the criterion for accepting or rejecting the hypothesis. We might test 10
items in the lot and plan our decision saying that if there are none or only 1 defective item among the 10,
we will accept H0; otherwise we will reject H0 (or accept Ha). This sort of basis is known as a decision
rule.
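The lot-acceptance rule above can be quantified with the binomial distribution. The sketch below (Python with SciPy; the assumed true defect rate is hypothetical, not from the text) computes the probability that the rule "accept if at most 1 defective in 10 tested" leads to accepting the lot:

from scipy.stats import binom

n_tested = 10
accept_if_at_most = 1          # decision rule: accept H0 if 0 or 1 defectives are found
true_defect_rate = 0.10        # hypothetical true proportion of defectives in the lot

# P(accept H0) = P(number of defectives <= 1) under the assumed defect rate
p_accept = binom.cdf(accept_if_at_most, n_tested, true_defect_rate)
print(f"Probability of accepting the lot: {p_accept:.3f}")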

4) Type I and Type II Errors: In the context of testing hypotheses, there are basically two types of errors that we can make. We may reject H0 when H0 is true, and we may accept H0 when it is not true. The former is known as a Type I error and the latter as a Type II error. In other words, a Type I error means rejection of a hypothesis which should have been accepted, and a Type II error means acceptance of a hypothesis which should have been rejected. Type I error is denoted by α (alpha), also called the level of significance of the test; Type II error is denoted by β (beta).

Decision

             Accept H0                 Reject H0
H0 (true)    Correct decision          Type I error (α error)
H0 (false)   Type II error (β error)   Correct decision

The probability of Type I error is usually determined in advance and is understood as the level of significance
of testing the hypotheses. If type I error is fixed at 5%, it means there are about 5 chances in 100 that we will
reject H0 when H0 is true. We can control type I error just by fixing it at a lower level. For instance, if we fix it
at 1%, we will say that the maximum probability of committing type I error would only be 0.01.
But with a fixed sample size n, when we try to reduce Type I error, the probability of committing Type II error increases. Both types of errors cannot be reduced simultaneously, since there is a trade-off between them. In business situations, decision makers decide the appropriate level of Type I error by examining the costs or penalties attached to both types of errors. If a Type I error involves the time and trouble of reworking a batch of chemicals that should have been accepted, whereas a Type II error means taking the chance that an entire group of users of this chemical compound will be poisoned, then one should prefer a Type I error to a Type II error, and consequently set a relatively high level for Type I error in one’s testing technique for the given hypothesis. Hence, in testing hypotheses, one must make all possible efforts to strike an adequate balance between Type I and Type II errors.
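A small simulation can make the α/β trade-off tangible. This Python sketch (NumPy and SciPy assumed; the population parameters are invented for illustration) estimates the Type I error rate when H0 is true and the power (1 − β) when it is false, for a one-sample t-test at the 5% level:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, sims = 0.05, 30, 10_000

def reject_rate(true_mean):
    """Share of simulated samples in which H0: mu = 100 is rejected at level alpha."""
    rejections = 0
    for _ in range(sims):
        sample = rng.normal(loc=true_mean, scale=10, size=n)
        _, p = stats.ttest_1samp(sample, popmean=100)
        rejections += p < alpha
    return rejections / sims

print("Type I error rate (H0 true, mu = 100):", reject_rate(100))  # should be close to alpha
print("Power = 1 - beta (H0 false, mu = 105):", reject_rate(105))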

5) Two-Tailed Test and One-Tailed Test: In the context of hypothesis testing, these two terms are quite important and must be clearly understood. A two-tailed test rejects the null hypothesis if, say, the sample mean is significantly higher or lower than the hypothesized value of the population mean. Such a test is appropriate when we have H0: µ = µH0 and Ha: µ ≠ µH0, which may mean µ > µH0 or µ < µH0. If the significance level is 5% and a two-tailed test is applied, the probability of the rejection region will be 0.05 (equally split between the two tails of the curve as 0.025 each) and that of the acceptance region will be 0.95. If we take µH0 = 100 and our sample mean deviates significantly from 100, we shall reject the null hypothesis; otherwise we shall accept it. But there are situations when only a one-tailed test is considered appropriate. A one-tailed test would be used when we are to test, say, whether the population mean is either lower than or higher than some hypothesized value.
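The practical difference shows up in the p-value. A minimal sketch (assuming a reasonably recent SciPy that supports the "alternative" argument; the data are invented) compares a two-tailed test and an upper one-tailed test of the same sample against µ = 100:

from scipy import stats

sample = [104, 101, 108, 99, 106, 103, 107, 100, 105, 102]  # hypothetical data

# Two-tailed test: Ha: mu != 100 (rejection region split across both tails)
t2, p_two = stats.ttest_1samp(sample, popmean=100, alternative='two-sided')
# One-tailed test: Ha: mu > 100 (rejection region only in the upper tail)
t1, p_one = stats.ttest_1samp(sample, popmean=100, alternative='greater')

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")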
6) Parametric and Non-Parametric Tests: Parametric statistics is the branch of statistics that assumes data come from a type of probability distribution and makes inferences about the parameters of that distribution. Most well-known elementary statistical methods are parametric.
Generally speaking, parametric methods make more assumptions than non-parametric methods. If those
extra assumptions are correct, parametric methods can produce more accurate and precise estimates. They
are said to have more statistical power. However, if those assumptions are incorrect, parametric methods
can be very misleading. For that reason they are often not considered robust. On the other hand,
parametric formulae are often simpler to write down and faster to compute. In some, but definitely not all
cases, their simplicity makes up for their non-robustness, especially if care is taken to examine diagnostic
statistics.

Because parametric statistics require a probability distribution, they are not distribution-free.
Non-parametric models differ from parametric models in that the model structure is not specified a priori but
is instead determined from data. The term nonparametric is not meant to imply that such models completely
lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
Examples of non-parametric methods include:
• Kernel density estimation, which provides better estimates of the density than histograms.
• Nonparametric regression and semiparametric regression methods, which have been developed based on kernels, splines, and wavelets.
• Data Envelopment Analysis, which provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.
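To illustrate the parametric/non-parametric distinction, this sketch (SciPy assumed; the two groups of measurements are invented) runs a parametric independent-samples t-test and its common non-parametric counterpart, the Mann-Whitney U test, on the same data:

from scipy import stats

group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.4, 16.0, 12.8]  # hypothetical measurements
group_b = [11.0, 12.5, 11.8, 13.1, 12.2, 10.9, 12.7, 11.5]

# Parametric: independent-samples t-test (assumes roughly normal populations)
t_stat, p_param = stats.ttest_ind(group_a, group_b)

# Non-parametric counterpart: Mann-Whitney U test (no normality assumption)
u_stat, p_nonparam = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

print(f"t-test p = {p_param:.4f}, Mann-Whitney p = {p_nonparam:.4f}")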

Q 3. Explain the difference between a causal relationship and correlation, with an example
of each. What are the possible reasons for a correlation between two variables?

Ans.: Correlation: In marketing terms, correlation is about knowing what the consumer wants and providing it. Marketing research looks at trends in sales and studies all of the variables, i.e. price, color, availability and styles, and the best way to give customers what they want. If you can give customers what they want, they will buy, and they will let friends and family know where they got it. Making them happy makes the money.

Causal relationship: Relationship Marketing was first defined as a form of marketing developed from direct response marketing campaigns, which emphasizes customer retention and satisfaction rather than a dominant focus on sales transactions.

As a practice, Relationship Marketing differs from other forms of marketing in that it recognizes the long term
value of customer relationships and extends communication beyond intrusive advertising and sales
promotional messages.

With the growth of the internet and mobile platforms, Relationship Marketing has continued to evolve and
move forward as technology opens more collaborative and social communication channels. This includes
tools for managing relationships with customers that goes beyond simple demographic and customer service
data. Relationship Marketing extends to include Inbound Marketing efforts (a combination of search
optimization and Strategic Content), PR, Social Media and Application Development.

Just like Customer Relationship Management (CRM), Relationship Marketing is a broadly recognized, widely implemented strategy for managing and nurturing a company’s interactions with clients and sales prospects. It also involves using technology to organize and synchronize business processes (principally sales and marketing activities) and, most importantly, to automate those marketing and communication activities as concrete marketing sequences that can run on autopilot. The overall goals are to find, attract, and win new clients, nurture and retain those the company already has, entice former clients back into the fold, and reduce the costs of marketing and client service. [1] Once simply a label for a category of software tools, today it generally denotes a company-wide business strategy embracing all client-facing departments and even beyond. When an implementation is effective, people, processes, and technology work in synergy to increase profitability and reduce operational costs.

Reasons for a correlation between two variables: chance association (the relationship is due to chance) or causative association (one variable causes the other).

The information given by a correlation coefficient is not enough to define the dependence structure between random variables. The correlation coefficient completely defines the dependence structure only in very particular cases, for example when the distribution is a multivariate normal distribution. In the case of elliptical distributions it characterizes the (hyper-)ellipses of equal density; however, it does not completely characterize the dependence structure (for example, a multivariate t-distribution's degrees of freedom determine the level of tail dependence).

Distance correlation and Brownian covariance / Brownian correlation [8][9] were introduced to address the
deficiency of Pearson's correlation that it can be zero for dependent random variables; zero distance
correlation and zero Brownian correlation imply independence.

The correlation ratio is able to detect almost any functional dependency, while the entropy-based mutual information/total correlation is capable of detecting even more general dependencies. The latter are sometimes referred to as multi-moment correlation measures, in comparison to those that consider only second-moment (pairwise or quadratic) dependence. The polychoric correlation is another correlation applied to ordinal data that aims to estimate the correlation between theorised latent variables. One way to capture a more complete view of the dependence structure is to consider a copula between the variables.
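A small numerical illustration of correlation without causation (Python with NumPy; the variable names such as "ice-cream sales" are purely illustrative, not from the text): two series driven by a common third variable show a strong Pearson correlation even though neither causes the other.

import numpy as np

rng = np.random.default_rng(1)
n = 200

# A lurking third variable (think of a hypothetical "temperature") drives both series,
# so x and y are correlated even though neither causes the other.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(scale=0.5, size=n)   # e.g., ice-cream sales (illustrative)
y = 1.5 * z + rng.normal(scale=0.5, size=n)   # e.g., drowning incidents (illustrative)

r = np.corrcoef(x, y)[0, 1]                   # Pearson correlation coefficient
print(f"Pearson r = {r:.2f}")                 # strongly positive, yet not causal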

Q 4. Briefly explain any two factors that affect the choice of a sampling technique. What are
the characteristics of a good sample?


Ans.: The difference between non-probability and probability sampling is that non-probability sampling
does not involve random selection and probability sampling does. Does that mean that non-probability
samples aren't representative of the population? Not necessarily. But it does mean that non-probability
samples cannot depend upon the rationale of probability theory. At least with a probabilistic sample, we know
the odds or probability that we have represented the population well. We are able to estimate confidence
intervals for the statistic. With non-probability samples, we may or may not represent the population well, and
it will often be hard for us to know how well we've done so. In general, researchers prefer probabilistic or random sampling methods over non-probabilistic ones, and consider them to be more accurate and rigorous.
However, in applied social research there may be circumstances where it is not feasible, practical or
theoretically sensible to do random sampling. Here, we consider a wide range of non-probabilistic
alternatives.
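For instance, with a probability sample one can attach a margin of error to an estimate. A minimal sketch (plain Python; the sample size and counts are invented) computes a normal-approximation 95% confidence interval for a sample proportion:

import math

n = 400           # hypothetical probability-sample size
successes = 240   # respondents who, say, read a newspaper daily (illustrative)
p_hat = successes / n

# Normal-approximation 95% confidence interval for the population proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = 1.96 * se
print(f"p = {p_hat:.2f}, 95% CI = ({p_hat - margin:.3f}, {p_hat + margin:.3f})")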

We can divide non-probability sampling methods into two broad types:

Accidental or purposive.

Most sampling methods are purposive in nature because we usually approach the sampling problem
with a specific plan in mind. The most important distinctions among these types of sampling methods are the
ones between the different types of purposive sampling approaches.

Accidental, Haphazard or Convenience Sampling

One of the most common methods of sampling goes under the various titles listed here. I would
include in this category the traditional "man on the street" (of course, now it's probably the "person on the
street") interviews conducted frequently by television news programs to get a quick (although non
representative) reading of public opinion. I would also argue that the typical use of college students in much
psychological research is primarily a matter of convenience. (You don't really believe that psychologists use
college students because they believe they're representative of the population at large, do you?). In clinical
practice, we might use clients who are available to us as our sample. In many research contexts, we sample
simply by asking for volunteers. Clearly, the problem with all of these types of samples is that we have no
evidence that they are representative of the populations we're interested in generalizing to -- and in many
cases we would clearly suspect that they are not.

Purposive Sampling

In purposive sampling, we sample with a purpose in mind. We usually would have one or more
specific predefined groups we are seeking. For instance, have you ever run into people in a mall or on the
street who are carrying a clipboard and who are stopping various people and asking if they could interview
them? Most likely they are conducting a purposive sample (and most likely they are engaged in market
research). They might be looking for Caucasian females between 30-40 years old. They size up the people
passing by and anyone who looks to be in that category they stop to ask if they will participate. One of the
first things they're likely to do is verify that the respondent does in fact meet the criteria for being in the
sample. Purposive sampling can be very useful for situations where you need to reach a targeted sample
quickly and where sampling for proportionality is not the primary concern. With a purposive sample, you are
likely to get the opinions of your target population, but you are also likely to overweight subgroups in your
population that are more readily accessible.

All of the methods that follow can be considered subcategories of purposive sampling methods. We
might sample for specific groups or types of people as in modal instance, expert, or quota sampling. We
might sample for diversity as in heterogeneity sampling. Or, we might capitalize on informal social networks
to identify specific respondents who are hard to locate otherwise, as in snowball sampling. In all of these
methods we know what we want -- we are sampling with a purpose.

• Modal Instance Sampling


In statistics, the mode is the most frequently occurring value in a distribution. In sampling, when we do a
modal instance sample, we are sampling the most frequent case, or the "typical" case. In a lot of informal
public opinion polls, for instance, they interview a "typical" voter. There are a number of problems with this
sampling approach. First, how do we know what the "typical" or "modal" case is? We could say that the
modal voter is a person who is of average age, educational level, and income in the population. But it's not clear that using the averages of these is the fairest (consider the skewed distribution of income, for instance).
And, how do you know that those three variables -- age, education, income -- are the only or even the most
relevant for classifying the typical voter? What if religion or ethnicity is an important discriminator? Clearly,
modal instance sampling is only sensible for informal sampling contexts.

• Expert Sampling
Expert sampling involves the assembling of a sample of persons with known or demonstrable experience
and expertise in some area. Often, we convene such a sample under the auspices of a "panel of experts."
There are actually two reasons you might do expert sampling. First, because it would be the best way to elicit
the views of persons who have specific expertise. In this case, expert sampling is essentially just a specific
sub case of purposive sampling. But the other reason you might use expert sampling is to provide evidence
for the validity of another sampling approach you've chosen. For instance, let's say you do modal instance
sampling and are concerned that the criteria you used for defining the modal instance are subject to criticism.
You might convene an expert panel consisting of persons with acknowledged experience and insight into
that field or topic and ask them to examine your modal definitions and comment on their appropriateness and
validity. The advantage of doing this is that you aren't out on your own trying to defend your decisions -- you
have some acknowledged experts to back you. The disadvantage is that even the experts can be, and often
are, wrong.

• Quota Sampling
In quota sampling, you select people non-randomly according to some fixed quota. There are two types of quota sampling: proportional and non-proportional. In proportional quota sampling you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you've already got the 40 women for your sample but not the 60 men, you will continue to sample men, but even if legitimate women respondents come along, you will not sample them because you have already "met your quota." The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.?
Non-proportional quota sampling is a bit less restrictive. In this method, you specify the minimum number
of sampled units you want in each category. Here, you're not concerned with having numbers that match the
proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk
about even small groups in the population. This method is the non-probabilistic analogue of stratified random
sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.
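As a toy illustration of the proportional quota procedure described above (pure Python; the respondent stream and the 40/60 quota are hypothetical), sampling simply continues until each gender quota is filled:

import random

random.seed(7)
# Hypothetical stream of passers-by; quota: 40 women and 60 men out of 100.
stream = [{"id": i, "gender": random.choice(["F", "M"])} for i in range(1, 501)]
quota = {"F": 40, "M": 60}
sample = []

for person in stream:
    g = person["gender"]
    if quota[g] > 0:                 # only take respondents whose quota is still open
        sample.append(person)
        quota[g] -= 1
    if all(v == 0 for v in quota.values()):
        break                        # stop once every quota is met

print(len(sample), "sampled,", sum(1 for p in sample if p["gender"] == "F"), "women")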

• Heterogeneity Sampling
We sample for heterogeneity when we want to include all opinions or views, and we aren't concerned about
representing these views proportionately. Another term for this is sampling for diversity. In many
brainstorming or nominal group processes (including concept mapping), we would use some form of
heterogeneity sampling because our primary interest is in getting a broad spectrum of ideas, not identifying the
"average" or "modal instance" ones. In effect, what we would like to be sampling is not people, but ideas. We
imagine that there is a universe of all possible ideas relevant to some topic and that we want to sample this
population, not the population of people who have the ideas. Clearly, in order to get all of the ideas, and
especially the "outlier" or unusual ones, we have to include a broad and diverse range of participants.
Heterogeneity sampling is, in this sense, almost the opposite of modal instance sampling.

• Snowball Sampling
In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your study.
You then ask them to recommend others who they may know who also meet the criteria. Although this
method would hardly lead to representative samples, there are times when it may be the best method
available. Snowball sampling is especially useful when you are trying to reach populations that are
inaccessible or hard to find. For instance, if you are studying the homeless, you are not likely to be able to
find good lists of homeless people within a specific geographical area. However, if you go to that area and
identify one or two, you may find that they know very well whom the other homeless people in their vicinity
are and how you can find them.

Characteristics of a good sample: The decision process is a complicated one. The researcher has to first identify the limiting factor or factors and must judiciously balance the conflicting factors. The various criteria governing the choice of the sampling technique are:


1. Purpose of the Survey: What does the researcher aim at? If he intends to generalize the
findings based on the sample survey to the population, then an appropriate probability sampling
method must be selected. The choice of a particular type of probability sampling depends on the
geographical area of the survey and the size and the nature of the population under study.
2. Measurability: The application of statistical inference theory requires computation of the
sampling error from the sample itself. Only probability samples allow such computation. Hence,
where the research objective requires statistical inference, the sample should be drawn by
applying simple random sampling method or stratified random sampling method, depending on
whether the population is homogenous or heterogeneous.

3. Degree of Precision: Should the results of the survey be very precise, or could even rough
results serve the purpose? The desired level of precision is one of the criteria for sampling
method selection. Where a high degree of precision of results is desired, probability sampling
should be used. Where even crude results would serve the purpose (E.g., marketing surveys,
readership surveys etc), any convenient non-random sampling like quota sampling would be
enough.

4. Information about the Population: How much information is available about the population to be
studied? Where no list of population and no information about its nature are available, it is
difficult to apply a probability sampling method. Then an exploratory study with non-probability
sampling may be done to gain a better idea of the population. After gaining sufficient knowledge
about the population through the exploratory study, an appropriate probability sampling design
may be adopted.

5. The Nature of the Population: In terms of the variables to be studied, is the population homogenous or heterogeneous? In the case of a homogenous population, even simple random sampling will give a representative sample. If the population is heterogeneous, stratified random sampling is appropriate (see the sketch after this list).

6. Geographical Area of the Study and the Size of the Population: If the area covered by a
survey is very large and the size of the population is quite large, multi-stage cluster sampling
would be appropriate. But if the area and the size of the population are small, single stage
probability sampling methods could be used.

7. Financial Resources: If the available finance is limited, it may become necessary to choose a
less costly sampling plan like multistage cluster sampling, or even quota sampling as a
compromise. However, if the objectives of the study and the desired level of precision cannot be
attained within the stipulated budget, there is no alternative but to give up the proposed survey.
Where the finance is not a constraint, a researcher can choose the most appropriate method of
sampling that fits the research objective and the nature of population.

8. Time Limitation: The time limit within which the research project should be completed restricts
the choice of a sampling method. Then, as a compromise, it may become necessary to choose
less time consuming methods like simple random sampling, instead of stratified
sampling/sampling with probability proportional to size; or multi-stage cluster sampling, instead
of single-stage sampling of elements. Of course, the precision has to be sacrificed to some
extent.

9. Economy: It should be another criterion in choosing the sampling method. It means achieving the desired level of precision at minimum cost. A sample is economical if the precision per unit cost is high, or the cost per unit of variance is low.

The above criteria frequently conflict with each other, and the researcher must balance and blend them to obtain a good sampling plan. The chosen plan thus represents an adaptation of the sampling theory to the available facilities and resources. That is, it represents a compromise between idealism and feasibility. One should use simple, workable methods instead of unduly elaborate and complicated techniques.
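As referenced under criterion 5 above, here is a minimal sketch of proportional stratified random sampling (plain Python; the strata and their sizes are hypothetical): each stratum contributes to the sample in proportion to its share of the population.

import random

random.seed(3)

# Hypothetical heterogeneous population grouped into strata (e.g., income bands).
population = {
    "low":    [f"L{i}" for i in range(600)],
    "middle": [f"M{i}" for i in range(300)],
    "high":   [f"H{i}" for i in range(100)],
}
total = sum(len(units) for units in population.values())
sample_size = 100

# Proportional allocation: each stratum contributes in proportion to its size.
sample = []
for stratum, units in population.items():
    k = round(sample_size * len(units) / total)
    sample.extend(random.sample(units, k))

print(len(sample), "units drawn across", len(population), "strata")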


Q 5. Select any topic for research and explain how you will use both secondary and primary
sources to gather the required information.

Ans.: Primary Sources of Data


Primary sources are original sources from which the researcher directly collects data that has not been
previously collected, e.g., collection of data directly by the researcher on brand awareness, brand preference, brand loyalty and other aspects of consumer behavior, from a sample of consumers by
interviewing them. Primary data is first hand information collected through various methods such as surveys,
experiments and observation, for the purposes of the project immediately at hand.

The advantages of primary data are –


1 It is unique to a particular research study
2 It is recent information, unlike published information that is already available

The disadvantages are –


1 It is expensive to collect, compared to gathering information from available sources
2 Data collection is a time consuming process
3 It requires trained interviewers and investigators

Secondary Sources of Data
These are sources containing data, which has been collected and compiled for another purpose. Secondary
sources may be internal sources, such as annual reports, financial statements, sales reports, inventory
records, minutes of meetings and other information that is available within the firm, in the form of a marketing
information system. They may also be external sources, such as government agencies (e.g. census reports,
reports of government departments), published sources (annual reports of currency and finance published by
the Reserve Bank of India, publications of international organizations such as the UN, World Bank and
International Monetary Fund, trade and financial journals, etc.), trade associations (e.g. Chambers of
Commerce) and commercial services (outside suppliers of information).

Methods of Data Collection:


The researcher directly collects primary data from its original sources. In this case, the researcher can collect
the required data precisely according to his research needs and he can collect them when he wants and in
the form that he needs it. But the collection of primary data is costly and time consuming. Yet, for several
types of social science research, required data is not available from secondary sources and it has to be
directly gathered from the primary sources.
Primary data has to be gathered in cases where the available data is inappropriate, inadequate or obsolete.
It includes: socio economic surveys, social anthropological studies of rural communities and tribal
communities, sociological studies of social problems and social institutions, marketing research, leadership
studies, opinion polls, attitudinal surveys, radio listening and T.V. viewing surveys, knowledge-awareness
practice (KAP) studies, farm management studies, business management studies etc.
There are various methods of primary data collection, including surveys, audits and panels, observation and
experiments.

Survey Research
A survey is a fact-finding study. It is a method of research involving collection of data directly from a
population or a sample at a particular time. A survey has certain characteristics:
1 It is always conducted in a natural setting. It is a field study.
2 It seeks responses directly from the respondents.
3 It can cover a very large population.
4 It may include an extensive study or an intensive study
5 It covers a definite geographical area.

A survey involves the following steps -


1 Selection of a problem and its formulation
2 Preparation of the research design
3 Operation concepts and construction of measuring indexes and scales
4 Sampling
5 Construction of tools for data collection
6 Field work and collection of data

7 Processing of data and tabulation
8 Analysis of data
9 Reporting

There are four basic survey methods, which include:


1 Personal interview
2 Telephone interview
3 Mail survey
4 Fax survey
Personal Interview
Personal interviewing is one of the prominent methods of data collection. It may be defined as a two-way
systematic conversation between an investigator and an informant, initiated for obtaining information relevant
to a specific study. It involves not only conversation, but also learning from the respondent’s gestures, facial
expressions and pauses, and his environment.

Interviewing may be used either as a main method or as a supplementary one in studies of persons.
Interviewing is the only suitable method for gathering information from illiterate or less educated
respondents. It is useful for collecting a wide range of data, from factual demographic data to highly personal
and intimate information relating to a person’s opinions, attitudes, values, beliefs, experiences and future
intentions. Interviewing is appropriate when qualitative information is required, or probing is necessary to
draw out the respondent fully. Where the area covered for the survey is compact, or when a sufficient
number of qualified interviewers are available, personal interview is feasible.

Interview is often superior to other data-gathering methods. People are usually more willing to talk than to
write. Once rapport is established, even confidential information may be obtained. It permits probing into the
context and reasons for answers to questions.

Interview can add flesh to statistical information. It enables the investigator to grasp the behavioral context of
the data furnished by the respondents. It permits the investigator to seek clarifications and brings to the
forefront those questions, which for some reason or the other the respondents do not want to answer.
Interviewing as a method of data collection has certain characteristics. They are:

1. The participants – the interviewer and the respondent – are strangers; hence, the
investigator has to get himself/herself introduced to the respondent in an appropriate manner.
2. The relationship between the participants is a transitory one. It has a fixed beginning
and termination points. The interview proper is a fleeting, momentary experience for them.
3. The interview is not a mere casual conversational exchange, but a conversation with
a specific purpose, viz., obtaining information relevant to a study.
4. The interview is a mode of obtaining verbal answers to questions put verbally.
5. The interaction between the interviewer and the respondent need not necessarily be
on a face-to-face basis, because the interview can also be conducted over the telephone.
6. Although the interview is usually a conversation between two persons, it need not be
limited to a single respondent. It can also be conducted with a group of persons, such as family
members, or a group of children, or a group of customers, depending on the requirements of the
study.
7. The interview is an interactive process. The interaction between the interviewer and
the respondent depends upon how they perceive each other.
8. The respondent reacts to the interviewer’s appearance, behavior, gestures, facial
expression and intonation, his perception of the thrust of the questions and his own personal needs.
As far as possible, the interviewer should try to be closer to the social-economic level of the
respondents.
9. The investigator records information furnished by the respondent in the interview.
This poses a problem of seeing that recording does not interfere with the tempo of conversation.
10. Interviewing is not a standardized process like that of a chemical technician; it is
rather a flexible, psychological process.

Telephone Interviewing

Telephone interviewing is a non-personal method of data collection. It may be used as a major method or as a supplementary method. It will be useful in the following situations:


1. When the universe is composed of those persons whose names are listed in telephone directories, e.g. business houses, business executives, doctors and other professionals.
2. When the study requires responses to five or six simple questions, e.g. a radio or television program survey.
3. When the survey must be conducted in a very short period of time, provided the units of study are listed in the telephone directory.
4. When the subject is interesting or important to respondents, e.g. a survey relating to trade conducted by a trade association or a chamber of commerce, or a survey relating to a profession conducted by the concerned professional association.
5. When the respondents are widely scattered and when there are many call-backs to make.

Group Interviews

A group interview may be defined as a method of collecting primary data in which a number of individuals with a common interest interact with each other. Unlike a personal interview, the flow of information is multi-dimensional. The group may consist of about six to eight individuals with a common
interest. The interviewer acts as the discussion leader. Free discussion is encouraged on some aspect of the
subject under study. The discussion leader stimulates the group members to interact with each other. The
desired information may be obtained through self-administered questionnaire or interview, with the
discussion serving as a guide to ensure consideration of the areas of concern. In particular, the interviewers
look for evidence of common elements of attitudes, beliefs, intentions and opinions among individuals in the
group. At the same time, he must be aware that a single comment by a member can provide important
insight. Samples for group interviews can be obtained through schools, clubs and other organized groups.

Mail Survey

The mail survey is another method of collecting primary data. This method involves sending
questionnaires to the respondents with a request to complete them and return them by post. This can be
used in the case of educated respondents only. The mail questionnaires should be simple so that the
respondents can easily understand the questions and answer them. It should preferably contain mostly
closed-ended and multiple choice questions, so that it could be completed within a few minutes. The
distinctive feature of the mail survey is that the questionnaire is self-administered by the respondents
themselves and the responses are recorded by them and not by the investigator, as in the case of personal
interview method. It does not involve face-to-face conversation between the investigator and the respondent.
Communication is carried out only in writing and this requires more cooperation from the respondents than
verbal communication. The researcher should prepare a mailing list of the selected respondents, by
collecting the addresses from the telephone directory of the association or organization to which they belong.
The following procedures should be followed:
• A covering letter should accompany a copy of the questionnaire. It must explain to the respondent the purpose of the study and the importance of his cooperation to the success of the project.
• Anonymity must be assured.
• The sponsor's identity may be revealed. However, when such information may bias the result, it is not desirable to reveal it. In this case, a disguised organization name may be used.
• A self-addressed stamped envelope should be enclosed with the covering letter.

After a few days from the date of mailing the questionnaires to the respondents, the researcher can expect the return of completed ones. The progress of returns may be watched and, at the appropriate stage, follow-up efforts can be made.

The response rate in mail surveys is generally very low in developing countries like India. Certain techniques
have to be adopted to increase the response rate. They are:

1. Quality printing: The questionnaire may be neatly printed on quality light-colored paper, so as to attract the attention of the respondent.

2. Covering letter: The covering letter should be couched in a pleasant style, so as to attract and hold the interest of the respondent. It must anticipate objections and answer them briefly. It is desirable to address the respondent by name.


3. Advance information: Advance information can be provided to potential respondents by a telephone call, or advance notice in the newsletter of the concerned organization, or by a letter. Such preliminary contact with potential respondents is more successful than follow-up efforts.

4. Incentives: Money, stamps for collection and other incentives are also used to induce respondents to complete and return the mail questionnaire.

5. Follow-up contacts: In the case of respondents belonging to an organization, they may be approached through someone in that organization who is known to the researcher.
6. Larger sample size: A larger sample may be drawn than the estimated sample size. For example, if the required sample size is 1000, a sample of 1500 may be drawn. This may help the researcher to secure an effective sample size closer to the required size.
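A quick worked illustration of technique 6 above (plain Python; the assumed response rate is illustrative, not from the text): the mail-out size needed to end up with roughly the required number of completed questionnaires.

import math

required_n = 1000
expected_response_rate = 0.65          # hypothetical; mail surveys often return far less
mail_out = math.ceil(required_n / expected_response_rate)
print(f"Mail out about {mail_out} questionnaires to expect ~{required_n} returns")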
Q 6. Case Study: You are engaged to carry out a market survey on behalf of a leading
Newspaper that is keen to increase its circulation in Bangalore City, in order to ascertain
reader habits and interests. Develop a title for the study; define the research problem and
the objectives or questions to be answered by the study.

Ans.: Title: "Newspaper Reading Choices: Reader Habits and Interests in Bangalore City"

Research problem: A research problem is the situation that causes the researcher to feel apprehensive,
confused and ill at ease. It is the demarcation of a problem area within a certain context involving the WHO
or WHAT, the WHERE, the WHEN and the WHY of the problem situation.

There are many problem situations that may give rise to research. Three sources usually contribute to problem identification. One's own experience or the experience of others may be a source of research problems. A second source could be scientific literature: you may read about certain findings and notice that a certain field was not covered, which could lead to a research problem. Theories could be a third source, since shortcomings in theories could be researched.

Research can thus be aimed at clarifying or substantiating an existing theory, at clarifying contradictory
findings, at correcting a faulty methodology, at correcting the inadequate or unsuitable use of statistical
techniques, at reconciling conflicting opinions, or at solving existing practical problems.

Types of questions to be asked: For more than 35 years, the news about newspapers and young readers
has been mostly bad for the newspaper industry. Long before any competition from cable television or
Nintendo, American newspaper publishers were worrying about declining readership among the young.

As early as 1960, at least 20 years prior to Music Television (MTV) or the Internet, media research scholars1
began to focus their studies on young adult readers' decreasing interest in newspaper content. The concern
over a declining youth market preceded and perhaps foreshadowed today's fretting over market penetration.
Even where circulation has grown or stayed stable, there is rising concern over penetration, defined as the
percentage of occupied households in a geographic market that are served by a newspaper.2 Simply put,
population growth is occurring more rapidly than newspaper readership in most communities.

This study looks at trends in newspaper readership among the 18-to-34 age group and examines some of
the choices young adults make when reading newspapers.

One of the underlying concerns behind the decline in youth newspaper reading is the question of how young
people view the newspaper. A number of studies explored how young readers evaluate and use newspaper
content.

Comparing reader content preferences over a 10-year period, Gerald Stone and Timothy Boudreau found
differences between readers ages 18-34 and those 35-plus.16 Younger readers showed increased interest in national news, weather, sports, and classified advertisements over the decade between 1984 and 1994,
while older readers ranked weather, editorials, and food advertisements higher. Interest in international news
and letters to the editor was less among younger readers, while older readers showed less interest in reports
of births, obituaries, and marriages.

David Atkin explored the influence of telecommunication technology on newspaper readership among
students in undergraduate media courses.17 He reported that computer-related technologies, including
electronic mail and computer networks, were unrelated to newspaper readership. The study found that
newspaper subscribers preferred print formats over electronic. In a study of younger, school-age children,
Brian Brooks and James Kropp found that electronic newspapers could persuade children to become news
consumers, but that young readers would choose an electronic newspaper over a printed one.18

In an exploration of leisure reading among college students, Leo Jeffres and Atkin assessed dimensions of
interest in newspapers, magazines, and books,19 exploring the influence of media use, non-media leisure,
and academic major on newspaper content preferences. The study discovered that overall newspaper
readership was positively related to students' focus on entertainment, job / travel information, and public
affairs. However, the students' preference for reading as a leisure-time activity was related only to a public
affairs focus. Content preferences for newspapers and other print media were related. The researchers
found no significant differences in readership among various academic majors, or by gender, though there
was a slight correlation between age and the public affairs readership index, with older readers more
interested in news about public affairs.

Methodology

Sample

Participants in this study (N=267) were students enrolled in 100- and 200-level English courses at a
midwestern public university. Courses that comprise the framework for this sample were selected because
they could fulfill basic studies requirements for all majors. A basic studies course is one that is listed within
the core curriculum required for all students. The researcher obtained permission from seven professors to
distribute questionnaires in the eight classes during regularly scheduled class periods. The students'
participation was voluntary; two students declined. The goal of this sampling procedure was to reach a cross-
section of students representing various fields of study. In all, 53 majors were represented.

Of the 267 students who participated in the study, 65 (24.3 percent) were male and 177 (66.3 percent) were
female. A total of 25 participants chose not to divulge their genders. Ages ranged from 17 to 56, with a mean
age of 23.6 years. This mean does not include the 32 respondents who declined to give their ages. A total of
157 participants (58.8 percent) said they were of the Caucasian race, 59 (22.1 percent) African American, 10
(3.8 percent) Asian, five (1.9 percent) African/Native American, two (.8 percent) Hispanic, two (.8 percent)
Native American, and one (.4 percent) Arabic. Most (214) of the students were enrolled full time, whereas a
few (28) were part-time students. The class rank breakdown was: freshmen, 45 (16.9 percent); sophomores,
15 (5.6 percent); juniors, 33 (12.4 percent); seniors, 133 (49.8 percent); and graduate students, 16 (6
percent).

Procedure

After two pre-tests and revisions, questionnaires were distributed and collected by the investigator. In each of
the eight classes, the researcher introduced herself to the students as a journalism professor who was
conducting a study on students' use of newspapers and other media. Each questionnaire included a cover
letter with the researcher's name, address, and phone number. The researcher provided pencils and was
available to answer questions if anyone needed further assistance. The average time spent on the
questionnaires was 20 minutes, with some individual students taking as long as an hour. Approximately six
students asked to take the questionnaires home to finish. They returned the questionnaires to the
researcher's mailbox within a couple of days.


Assignment (Set-2)
Subject code: MB0050
Research Methodology

Q 1. Discuss the relative advantages and disadvantages of the different methods of distributing
questionnaires to the respondents of a study.

Ans.: There are some alternative methods of distributing questionnaires to the respondents. They are:

1) Personal delivery,
2) Attaching the questionnaire to a product,
3) Advertising the questionnaire in a newspaper or magazine, and
4) News-stand inserts.

Personal delivery: The researcher or his assistant may deliver the questionnaires to the potential
respondents, with a request to complete them at their convenience. After a day or two, the completed
questionnaires can be collected from them. Often referred to as the self-administered questionnaire method,
it combines the advantages of the personal interview and the mail survey. Alternatively, the questionnaires
may be delivered in person and the respondents may return the completed questionnaires through mail.

Attaching questionnaire to a product: A firm test marketing a product may attach a questionnaire to a
product and request the buyer to complete it and mail it back to the firm. A gift or a discount coupon usually
rewards the respondent.

Advertising the questionnaire: The questionnaire with the instructions for completion may be advertised on
a page of a magazine or in a section of newspapers. The potential respondent completes it, tears it out and
mails it to the advertiser. For example, the committee of Banks Customer Services used this method for
collecting information from the customers of commercial banks in India. This method may be useful for large-
scale studies on topics of common interest.

Newsstand inserts: This method involves inserting the covering letter, questionnaire and self-addressed
reply-paid envelope into a random sample of newsstand copies of a newspaper or magazine.

Advantages and Disadvantages:

The advantages of the questionnaire method, particularly when administered repeatedly to a panel of respondents, are:


 This method facilitates collection of more accurate data for longitudinal studies than any other method,
because under this method the event or action is reported soon after its occurrence.

 This method makes it possible to have before and after designs made for field-based studies. For example,
the effect of public relations or advertising campaigns or welfare measures can be measured by collecting
data before, during and after the campaign.

 The panel method offers a good way of studying trends in events, behavior or attitudes. For example, a
panel enables a market researcher to study how brand preferences change from month to month; it enables
an economics researcher to study how employment, income and expenditure of agricultural laborers change
from month to month; a political scientist can study the shifts in inclinations of voters and the causative
influential factors during an election. It is also possible to find out how the composition of the various
economic and social strata of society changes through time, and so on.

 A panel study also provides evidence on the causal relationship between variables. For example, a cross-
sectional study of employees may show an association between their attitude to their jobs and their positions
in the organization, but it does not indicate which comes first - favorable attitude or promotion. A panel
study can provide data for finding an answer to this question.

 It facilitates depth interviewing, because panel members become well acquainted with the field workers and
will be willing to allow probing interviews.

The major limitations or problems of this method are:

 This method is very expensive. The selection of panel members, the payment of premiums, periodic
training of investigators and supervisors, and the costs involved in replacing dropouts all add to the
expenditure.

 It is often difficult to set up a representative panel and to keep it representative. Many persons may be
unwilling to participate in a panel study. In the course of the study, there may be frequent dropouts. Persons
with similar characteristics may replace the dropouts. However, there is no guarantee that the emerging
panel would be representative.

 A real danger with the panel method is “panel conditioning” i.e., the risk that repeated interviews may
sensitize the panel members and they become untypical, as a result of being on the panel. For example, the
members of a panel study of political opinions may try to appear consistent in the views they express on
consecutive occasions. In such cases, the panel becomes untypical of the population it was selected to
represent. One possible safeguard to panel conditioning is to give members of a panel only a limited panel
life and then to replace them with persons taken randomly from a reserve list.

Q 2. In processing data, what is the difference between measures of central tendency and
measures of dispersion? What is the most important measure of central tendency and
dispersion?

Ans.: Measures of Central tendency:

Arithmetic Mean: The arithmetic mean is the most common measure of central tendency. It is simply the sum
of the numbers divided by the number of numbers. The symbol m is used for the mean of a population, and
the symbol M is used for the mean of a sample. The formula for m is shown below:

m = ΣX / N

where ΣX is the sum of all the numbers in the sample and N is the number of numbers in the sample. As an
example, the mean of the numbers 1, 2, 3, 6, 8 is (1 + 2 + 3 + 6 + 8) / 5 = 20 / 5 = 4, regardless of whether the
numbers constitute the entire population or just a sample from the population.

The table, Number of touchdown passes, shows the number of touchdown (TD) passes thrown by each of
the 31 teams in the National Football League in the 2000 season. The mean number of touchdown passes
thrown is 20.4516, as shown below:

m = ΣX / N = 634 / 31 = 20.4516

37 33 33 32 29 28 28 23

22 22 22 21 21 21 20 20

19 19 18 18 18 18 16 15

14 14 14 12 12 9 6

Table 1: Number of touchdown passes

Although the arithmetic mean is not the only "mean" (there is also a geometric mean), it is by far the most
commonly used. Therefore, if the term "mean" is used without specifying whether it is the arithmetic mean,
the geometric mean, or some other mean, it is assumed to refer to the arithmetic mean.
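The mean computed above can be checked with a few lines of code. The following is a minimal sketch in Python using the touchdown-pass counts from Table 1; the variable name td_passes is introduced here purely for illustration.

# Touchdown-pass counts for the 31 NFL teams (Table 1 above)
td_passes = [37, 33, 33, 32, 29, 28, 28, 23,
             22, 22, 22, 21, 21, 21, 20, 20,
             19, 19, 18, 18, 18, 18, 16, 15,
             14, 14, 14, 12, 12, 9, 6]

# m = (sum of all values) / (number of values)
mean = sum(td_passes) / len(td_passes)
print(round(mean, 4))   # 20.4516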

Median

The median is also a frequently used measure of central tendency. The median is the midpoint of a
distribution: the same number of scores is above the median as below it. For the data in the table, Number of
touchdown passes, there are 31 scores. The 16th highest score (which equals 20) is the median because
there are 15 scores below the 16th score and 15 scores above the 16th score. The median can also be
thought of as the 50th percentile.

Let's return to the made-up example of the quiz on which you scored a 3, discussed previously in the module
Introduction to Central Tendency and shown in Table 2.

Student        Dataset 1   Dataset 2   Dataset 3

You                3           3           3

John's             3           4           2

Maria's            3           4           2

Shareecia's        3           4           2

Luther's           3           5           1

Table 2: Three possible datasets for the 5-point make-up quiz

For Dataset 1, the median is three, the same as your score. For Dataset 2, the median is 4. Therefore, your
score is below the median. This means you are in the lower half of the class. Finally for Dataset 3, the
median is 2. For this dataset, your score is above the median and therefore in the upper half of the
distribution.

Computation of the Median: When there is an odd number of numbers, the median is simply the middle
number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median
is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4 + 7) / 2 = 5.5.
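A quick way to check these two rules is with Python's standard statistics module; a minimal sketch, using the same numbers as above:

import statistics

print(statistics.median([2, 4, 7]))        # odd count: middle value -> 4
print(statistics.median([2, 4, 7, 12]))    # even count: mean of the two middle values -> 5.5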

Mode

The mode is the most frequently occurring value. For the data in the table, Number of touchdown passes, the
mode is 18 since more teams (4) had 18 touchdown passes than any other number of touchdown passes.
With continuous data such as response time measured to many decimals, the frequency of each value is one
since no two scores will be exactly the same (see discussion of continuous variables). Therefore the mode of
continuous data is normally computed from a grouped frequency distribution. The Grouped frequency
distribution table shows a grouped frequency distribution for the target response time data. Since the interval
with the highest frequency is 600-700, the mode is the middle of that interval (650).

Range Frequency

500-600 3

600-700 6

700-800 5

800-900 5

900-1000 0

1000-1100 1

Table 3: Grouped frequency distribution
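Both kinds of mode described above can be illustrated with a short Python sketch. For the raw touchdown-pass counts the mode is the most frequent value; for the grouped response-time data it is taken as the midpoint of the interval with the highest frequency. The variable names (td_passes, intervals) are illustrative only.

import statistics

td_passes = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20,
             19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]
print(statistics.mode(td_passes))          # 18 (occurs 4 times, more than any other value)

# Grouped frequency distribution from Table 3: (lower bound, upper bound, frequency)
intervals = [(500, 600, 3), (600, 700, 6), (700, 800, 5),
             (800, 900, 5), (900, 1000, 0), (1000, 1100, 1)]
low, high, _ = max(intervals, key=lambda row: row[2])
print((low + high) / 2)                    # 650.0, the midpoint of the 600-700 interval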

Measures of Dispersion: A measure of statistical dispersion is a real number that is zero if all the data are
identical, and increases as the data becomes more diverse. It cannot be less than zero.


Most measures of dispersion have the same scale as the quantity being measured. In other words, if the
measurements have units, such as metres or seconds, the measure of dispersion has the same units. Such
measures of dispersion include:

• Standard deviation
• Interquartile range
• Range
• Mean difference
• Median absolute deviation
• Average absolute deviation (or simply called average deviation)
• Distance standard deviation

These are frequently used (together with scale factors) as estimators of scale parameters, in which capacity
they are called estimates of scale.

All the above measures of statistical dispersion have the useful property that they are location-invariant, as
well as linear in scale. So if a random variable X has a dispersion of SX then a linear transformation
Y = aX + b for real a and b should have dispersion SY = |a|SX.
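As a quick illustration of this location-and-scale behaviour, the sketch below (Python, standard library only; the data values are arbitrary) shows that adding a constant b leaves the standard deviation unchanged, while multiplying by a scales it by |a|:

import math
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]
a, b = -3, 10
y = [a * value + b for value in x]          # the linear transformation Y = aX + b

sx = statistics.pstdev(x)                   # population standard deviation of X
sy = statistics.pstdev(y)                   # population standard deviation of Y
print(sx, sy)                               # 2.0 and 6.0
print(math.isclose(sy, abs(a) * sx))        # True: SY = |a| * SX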

Other measures of dispersion are dimensionless (scale-free). In other words, they have no units even if the
variable itself has units. These include:

• Coefficient of variation
• Quartile coefficient of dispersion
• Relative mean difference, equal to twice the Gini coefficient

There are other measures of dispersion:

• Variance (the square of the standard deviation) — location-invariant but not linear in scale.
• Variance-to-mean ratio — mostly used for count data when the term coefficient of dispersion is used
and when this ratio is dimensionless, as count data are themselves dimensionless: otherwise this is
not scale-free.

Some measures of dispersion have specialized purposes, among them the Allan variance and the Hadamard
variance.

For categorical variables, it is less common to measure dispersion by a single number. See qualitative
variation. One measure that does so is the discrete entropy.

Sources of statistical dispersion

In the physical sciences, such variability may result only from random measurement errors: instrument
measurements are often not perfectly precise, i.e., reproducible. One may assume that the quantity being
measured is unchanging and stable, and that the variation between measurements is due to observational
error.

In the biological sciences, this assumption is false: the variation observed might be intrinsic to the
phenomenon: distinct members of a population differ greatly. This is also seen in the arena of manufactured
products; even there, the meticulous scientist finds variation. The simple model of a stable quantity is
preferred when it is tenable. Each phenomenon must be examined to see if it warrants such a simplification.

Q 3. What are the characteristics of a good research design? Explain how the research
design for exploratory studies is different from the research design for descriptive and
diagnostic studies.


Ans.: Good research design: Much contemporary social research is devoted to examining whether a
program, treatment, or manipulation causes some outcome or result. For example, we might wish to know
whether a new educational program causes subsequent achievement score gains, whether a special work
release program for prisoners causes lower recidivism rates, whether a novel drug causes a reduction in
symptoms, and so on. Cook and Campbell (1979) argue that three conditions must be met before we can
infer that such a cause-effect relation exists:

1. Covariation. Changes in the presumed cause must be related to changes in the presumed effect.
Thus, if we introduce, remove, or change the level of a treatment or program, we should observe
some change in the outcome measures.

2. Temporal Precedence. The presumed cause must occur prior to the presumed effect.

3. No Plausible Alternative Explanations. The presumed cause must be the only reasonable
explanation for changes in the outcome measures. If there are other factors, which could be
responsible for changes in the outcome measures, we cannot be confident that the presumed cause-
effect relationship is correct.

In most social research the third condition is the most difficult to meet. Any number of factors other than the
treatment or program could cause changes in outcome measures. Campbell and Stanley (1966) and later,
Cook and Campbell (1979) list a number of common plausible alternative explanations (or, threats to internal
validity). For example, it may be that some historical event which occurs at the same time that the program
or treatment is instituted was responsible for the change in the outcome measures; or, changes in record
keeping or measurement systems which occur at the same time as the program might be falsely attributed to
the program. The reader is referred to standard research methods texts for more detailed discussions of
threats to validity.

This paper is primarily heuristic in purpose. Standard social science methodology textbooks (Cook and
Campbell 1979; Judd and Kenny, 1981) typically present an array of research designs and the alternative
explanations, which these designs rule out or minimize. This tends to foster a "cookbook" approach to
research design - an emphasis on the selection of an available design rather than on the construction of an
appropriate research strategy. While standard designs may sometimes fit real-life situations, it will often be
necessary to "tailor" a research design to minimize specific threats to validity. Furthermore, even if standard
textbook designs are used, an understanding of the logic of design construction in general will improve the
comprehension of these standard approaches. This paper takes a structural approach to research design.
While this is by no means the only strategy for constructing research designs, it helps to clarify some of the
basic principles of design logic.

Minimizing Threats to Validity

Good research designs minimize the plausible alternative explanations for the hypothesized cause-effect
relationship. But such explanations may be ruled out or minimized in a number of ways other than by design.
The discussion, which follows, outlines five ways to minimize threats to validity, one of which is by research
design:

1. By Argument. The most straightforward way to rule out a potential threat to validity is to simply
argue that the threat in question is not a reasonable one. Such an argument may be made either a
priori or a posteriori, although the former will usually be more convincing than the latter. For
example, depending on the situation, one might argue that an instrumentation threat is not likely
because the same test is used for pre and post test measurements and did not involve observers
who might improve, or other such factors. In most cases, ruling out a potential threat to validity by
argument alone will be weaker than the other approaches listed below. As a result, the most
plausible threats in a study should not, except in unusual cases, be ruled out by argument only.

2. By Measurement or Observation. In some cases it will be possible to rule out a threat by
measuring it and demonstrating that either it does not occur at all or occurs so minimally as to not be
a strong alternative explanation for the cause-effect relationship. Consider, for example, a study of
the effects of an advertising campaign on subsequent sales of a particular product. In such a study,
history (i.e., the occurrence of other events which might lead to an increased desire to purchase the
product) would be a plausible alternative explanation. For example, a change in the local economy,
the removal of a competing product from the market, or similar events could cause an increase in
product sales. One might attempt to minimize such threats by measuring local economic indicators
and the availability and sales of competing products. If there is no change in these measures
coincident with the onset of the advertising campaign, these threats would be considerably
minimized. Similarly, if one is studying the effects of special mathematics training on math
achievement scores of children, it might be useful to observe everyday classroom behavior in order
to verify that students were not receiving any additional math training to that provided in the study.

3. By Design. Here, the major emphasis is on ruling out alternative explanations by adding treatment
or control groups, waves of measurement, and the like. This topic will be discussed in more detail
below.

4. By Analysis. There are a number of ways to rule out alternative explanations using statistical
analysis. One interesting example is provided by Jurs and Glass (1971). They suggest that one
could study the plausibility of an attrition or mortality threat by conducting a two-way analysis of
variance. One factor in this study would be the original treatment group designations (i.e., program
vs. comparison group), while the other factor would be attrition (i.e., dropout vs. non-dropout group).
The dependent measure could be the pretest or other available pre-program measures. A main
effect on the attrition factor would be indicative of a threat to external validity or generalizability, while
an interaction between group and attrition factors would point to a possible threat to internal validity.
Where both effects occur, it is reasonable to infer that there is a threat to both internal and external
validity. (A brief illustrative sketch of this kind of analysis appears at the end of this subsection.)

The plausibility of alternative explanations might also be minimized using covariance analysis. For
example, in a study of the effects of "workfare" programs on social welfare caseloads, one plausible
alternative explanation might be the status of local economic conditions. Here, it might be possible to
construct a measure of economic conditions and include that measure as a covariate in the
statistical analysis. One must be careful when using covariance adjustments of this type -- "perfect"
covariates do not exist in most social research and the use of imperfect covariates will not
completely adjust for potential alternative explanations. Nevertheless causal assertions are likely to
be strengthened by demonstrating that treatment effects occur even after adjusting on a number of
good covariates.

5. By Preventive Action. When potential threats are anticipated some type of preventive action can
often rule them out. For example, if the program is a desirable one, it is likely that the comparison
group would feel jealous or demoralized. Several actions can be taken to minimize the effects of
these attitudes including offering the program to the comparison group upon completion of the study
or using program and comparison groups which have little opportunity for contact and
communication. In addition, auditing methods and quality control can be used to track potential
experimental dropouts or to insure the standardization of measurement.

The five categories listed above should not be considered mutually exclusive. The inclusion of
measurements designed to minimize threats to validity will obviously be related to the design structure and is
likely to be a factor in the analysis. A good research plan should, where possible, make use of multiple
methods for reducing threats. In general, reducing a particular threat by design or preventive action will
probably be stronger than by using one of the other three approaches. The choice of which strategy to use
for any particular threat is complex and depends at least on the cost of the strategy and on the potential
seriousness of the threat.
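Point 4 (By Analysis) above mentions the Jurs and Glass suggestion of a two-way analysis of variance on a pre-program measure, with treatment-group assignment and attrition status as the two factors. The sketch below is one hedged way such an analysis might be set up in Python; it assumes the pandas and statsmodels libraries are available, and the column names and synthetic data are purely illustrative.

import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic example: one row per participant, with a pretest score,
# the original group assignment, and whether the person later dropped out.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "pretest": rng.normal(50, 10, n),
    "group": rng.choice(["program", "comparison"], n),
    "attrition": rng.choice(["dropout", "stayed"], n),
})

# Two-way ANOVA: main effects of group and attrition, plus their interaction.
model = ols("pretest ~ C(group) * C(attrition)", data=df).fit()
print(anova_lm(model, typ=2))

# Reading the table, following the logic described above:
#  - a significant main effect of attrition suggests a threat to external validity;
#  - a significant group x attrition interaction suggests a threat to internal validity.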

Design Construction

Basic Design Elements. Most research designs can be constructed from four basic elements:

1. Time. A causal relationship, by its very nature, implies that some time has elapsed between the
occurrence of the cause and the consequent effect. While for some phenomena the elapsed time
might be measured in microseconds and therefore might be unnoticeable to a casual observer, we
normally assume that the cause and effect in social science arenas do not occur simultaneously. In
design notation we indicate this temporal element horizontally - whatever symbol is used to indicate
the presumed cause would be placed to the left of the symbol indicating measurement of the effect.
Thus, as we read from left to right in design notation we are reading across time. Complex designs
might involve a lengthy sequence of observations and programs or treatments across time.
2. Program(s) or Treatment(s). The presumed cause may be a program or treatment under the
explicit control of the researcher or the occurrence of some natural event or program not explicitly
controlled. In design notation we usually depict a presumed cause with the symbol "X". When
multiple programs or treatments are being studied using the same design, we can keep the
programs distinct by using subscripts such as "X1" or "X2". For a comparison group (i.e., one which
does not receive the program under study) no "X" is used.
3. Observation(s) or Measure(s). Measurements are typically depicted in design notation with the
symbol "O". If the same measurement or observation is taken at every point in time in a design, then
this "O" will be sufficient. Similarly, if the same set of measures is given at every point in time in this
study, the "O" can be used to depict the entire set of measures. However, if different measures are
given at different times it is useful to subscript the "O" to indicate which measurement is being given
at which point in time.
4. Groups or Individuals. The final design element consists of the intact groups or the individuals who
participate in various conditions. Typically, there will be one or more program and comparison
groups. In design notation, each group is indicated on a separate line. Furthermore, the manner in
which groups are assigned to the conditions can be indicated by an appropriate symbol at the
beginning of each line. Here, "R" represents a group that was randomly assigned, "N" depicts a group
that was nonrandomly assigned (i.e., a nonequivalent group or cohort), and "C" indicates that the group
was assigned using a cutoff score on a measurement. (A brief example of this notation follows.)
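Putting these four elements together, a simple pretest-posttest randomized experiment would be written in this notation roughly as follows (each line is one group, read from left to right across time):

R   O   X   O
R   O       O

Here the first line is a randomly assigned program group that is measured, receives the treatment X, and is measured again; the second line is a randomly assigned comparison group measured at the same two points in time but receiving no treatment.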

Q 4. How is the Case Study method useful in Business Research? Give two specific
examples of how the case study method can be applied to business research.

Ans.: While case study writing may seem easy at first glance, developing an effective case study (also
called a success story) is an art. Like other marketing communication skills, learning how to write a case
study takes time. What's more, writing case studies without careful planning usually results in suboptimal
results.
Savvy case study writers increase their chances of success by following these ten proven techniques for
writing an effective case study:


Involve the customer throughout the process. Involving the customer throughout the case study
development process helps ensure customer cooperation and approval, and results in an improved case
study. Obtain customer permission before writing the document, solicit input during the development, and
secure approval after drafting the document.
• Write all customer quotes for their review. Rather than asking the customer to draft their quotes,
writing them for their review usually results in more compelling material.

Case Study Writing Ideas


• Establish a document template. A template serves as a roadmap for the case study process, and
ensures that the document looks, feels, and reads consistently. Visually, the template helps build the
brand; procedurally, it simplifies the actual writing. Before beginning work, define 3-5 specific
elements to include in every case study, formalize those elements, and stick to them.
• Start with a bang. Use action verbs and emphasize benefits in the case study title and subtitle.
Include a short (less than 20-word) customer quote in larger text. Then, summarize the key points of
the case study in 2-3 succinct bullet points. The goal should be to tease the reader into wanting to
read more.
• Organize according to problem, solution, and benefits. Regardless of length, the time-tested,
most effective organization for a case study follows the problem-solution-benefits flow. First,
describe the business and/or technical problem or issue; next, describe the solution to this problem
or resolution of this issue; finally, describe how the customer benefited from the particular solution
(more on this below). This natural story-telling sequence resonates with readers.
• Use the general-to-specific-to-general approach. In the problem section, begin with a general
discussion of the issue that faces the relevant industry. Then, describe the specific problem or issue
that the customer faced. In the solution section, use the opposite sequence. First, describe how the
solution solved this specific problem; then indicate how it can also help resolve this issue more
broadly within the industry. Beginning more generally draws the reader into the story; offering a
specific example demonstrates, in a concrete way, how the solution resolves a commonly faced
issue; and concluding more generally allows the reader to understand how the solution can also
address their problem.
• Quantify benefits when possible. No single element in a case study is more compelling than the
ability to tie quantitative benefits to the solution. For example, “Using Solution X saved Customer Y
over $ZZZ,ZZZ after just 6 months of implementation;” or, “Thanks to Solution X, employees at
Customer Y have realized a ZZ% increase in productivity as measured by standard performance
indicators.” Quantifying benefits can be challenging, but not impossible. The key is to present
imaginative ideas to the customer for ways to quantify the benefits, and remain flexible during this
discussion. If benefits cannot be quantified, attempt to develop a range of qualitative benefits; the
latter can be quite compelling to readers as well.
• Use photos. Ask the customer if they can provide shots of personnel, ideally using the solution. The
shots need not be professionally done; in fact, “homegrown” digital photos sometimes lead to
surprisingly good results and often appear more genuine. Photos further personalize the story and
help form a connection to readers.
• Reward the customer. After receiving final customer approval and finalizing the case study, provide
a pdf, as well as printed copies, to the customer. Another idea is to frame a copy of the completed
case study and present it to the customer in appreciation for their efforts and cooperation.

Writing a case study is not easy. Even with the best plan, a case study is doomed to failure if the writer lacks
the exceptional writing skills, technical savvy, and marketing experience that these documents require. In
many cases, a talented writer can mean the difference between an ineffective case study and one that
provides the greatest benefit. If a qualified internal writer is unavailable, consider outsourcing the task to
professionals who specialize in case study writing.

Q 5. What are the differences between observation and interviewing as methods of data
collection? Give two specific examples of situations where either observation or
interviewing would be more appropriate.


Ans.: Observation means viewing or seeing. Observation may be defined as a systematic viewing of a
specific phenomenon in its proper setting for the specific purpose of gathering data for a particular study.
Observation is a classical method of scientific study.
The prerequisites of observation consist of:

• Observations must be done under conditions which will permit accurate results. The observer must
be at a vantage point to see clearly the objects to be observed. The distance and the light must be
satisfactory. The mechanical devices used must be in good working condition and operated by
skilled persons.

• Observation must cover a sufficient number of representative samples of the cases.

• Recording should be accurate and complete.

• The accuracy and completeness of recorded results must be checked. A certain number of cases
can be observed again by another observer/another set of mechanical devices as the case may
be. If it is feasible two separate observers and set of instruments may be used in all or some of the
original observations. The results could then be compared to determine their accuracy and
completeness.

Advantages of observation

o The main virtue of observation is its directness: it makes it possible to study behavior as it
occurs. The researcher need not ask people about their behavior and interactions; he can
simply watch what they do and say.

o Data collected by observation may describe the observed phenomena as they occur in their
natural settings. Other methods introduce elements of artificiality into the researched
situation; for instance, in an interview the respondent may not behave in a natural way. There is
no such artificiality in observational studies, especially when the observed persons are not
aware of their being observed.

o Observation is more suitable for studying subjects who are unable to articulate
meaningfully, e.g. studies of children, tribal people, animals, birds, etc.

o Observations improve the opportunities for analyzing the contextual background of
behavior. Furthermore, verbal reports can be validated and compared with behavior through
observation. The validity of what men of position and authority say can be verified by
observing what they actually do.

o Observations make it possible to capture the whole event as it occurs. For example, only
observation can provide an insight into all the aspects of the process of negotiation
between union and management representatives.

o Observation is less demanding of the subjects and has less biasing effect on their conduct
than questioning.

o It is easier to conduct disguised observation studies than disguised questioning.

o Mechanical devices may be used for recording data in order to secure more accurate data
and also to make continuous observations over longer periods.

Interviews are a crucial part of the recruitment process for all organisations. Their purpose is to give the
interviewer(s) a chance to assess your suitability for the role and for you to demonstrate your abilities and
personality. As this is a two-way process, it is also a good opportunity for you to ask questions and to make
sure the organisation and position are right for you.
Interview format

Interviews take many different forms. It is a good idea to ask the organisation in advance what format the
interview will take.

• Competency/criteria based interviews - These are structured to reflect the competencies or
qualities that an employer is seeking for a particular job, which will usually have been detailed in the
job specification or advert. The interviewer is looking for evidence of your skills and may ask such
things as: ‘Give an example of a time you worked as part of a team to achieve a common goal.’

The organisation determines the selection criteria based on the roles they are recruiting for and
then, in an interview, examines whether or not you have evidence of possessing these.

• Technical interviews - If you have applied for a job or course that requires technical knowledge, it is
likely that you will be asked technical questions or have a separate technical interview. Questions may
focus on your final year project or on real or hypothetical technical problems. You should be
prepared to prove yourself, but also to admit to what you do not know and stress that you are keen
to learn. Do not worry if you do not know the exact answer - interviewers are interested in your
thought process and logic.
• Academic interviews - These are used for further study or research positions. Questions are likely
to center on your academic history to date.
• Structured interviews - The interviewer has a set list of questions, and asks all the candidates the
same questions.
• Formal/informal interviews - Some interviews may be very formal, while others will feel more like
an informal chat about you and your interests. Be aware that you are still being assessed, however
informal the discussion may seem.
• Portfolio based interviews - If the role is within the arts, media or communications industries, you
may be asked to bring a portfolio of your work to the interview, and to have an in-depth discussion
about the pieces you have chosen to include.
• Senior/case study interviews - These range from straightforward scenario questions (e.g. ‘What
would you do in a situation where…?’) to the detailed analysis of a hypothetical business problem.
You will be evaluated on your analysis of the problem, how you identify the key issues, how you
pursue a particular line of thinking and whether you can develop and present an appropriate
framework for organising your thoughts.

Specific types of interview

The Screening Interview

Companies use screening tools to ensure that candidates meet minimum qualification requirements.
Computer programs are among the tools used to weed out unqualified candidates. (This is why you need a
digital resume that is screening-friendly. See our resume center for help.) Sometimes human professionals
are the gatekeepers. Screening interviewers often have honed skills to determine whether there is anything
that might disqualify you for the position. Remember, they do not need to know whether you are the best fit
for the position, only whether you are not a match. For this reason, screeners tend to dig for dirt. Screeners
will hone in on gaps in your employment history or pieces of information that look inconsistent. They also will
want to know from the outset whether you will be too expensive for the company.

Some tips for maintaining confidence during screening interviews:

• Highlight your accomplishments and qualifications.


• Get into the straightforward groove. Personality is not as important to the screener as verifying your
qualifications. Answer questions directly and succinctly. Save your winning personality for the person
making hiring decisions!
• Be tactful about addressing income requirements. Give a range, and try to avoid giving specifics by
replying, "I would be willing to consider your best offer."


• If the interview is conducted by phone, it is helpful to have note cards with your vital information
sitting next to the phone. That way, whether the interviewer catches you sleeping or vacuuming the
floor, you will be able to switch gears quickly.

The Informational Interview

On the opposite end of the stress spectrum from screening interviews is the informational interview. A
meeting that you initiate, the informational interview is underutilized by job-seekers who might otherwise
consider themselves savvy to the merits of networking. Job seekers ostensibly secure informational
meetings in order to seek the advice of someone in their current or desired field as well as to gain further
references to people who can lend insight. Employers that like to stay apprised of available talent, even when
they do not have current job openings, are often open to informational interviews, especially if they like to
share their knowledge, feel flattered by your interest, or esteem the mutual friend that connected you to
them. During an informational interview, the jobseeker and employer exchange information and get to know
one another better without reference to a specific job opening.

This takes off some of the performance pressure, but be intentional nonetheless:

• Come prepared with thoughtful questions about the field and the company.
• Gain references to other people and make sure that the interviewer would be comfortable if you
contact other people and use his or her name.
• Give the interviewer your card, contact information and resume.
• Write a thank you note to the interviewer.

The Directive Style

In this style of interview, the interviewer has a clear agenda that he or she follows unflinchingly. Sometimes
companies use this rigid format to ensure parity between interviews; when interviewers ask each candidate
the same series of questions, they can more readily compare the results. Directive interviewers rely upon
their own questions and methods to tease from you what they wish to know. You might feel like you are
being steam-rolled, or you might find the conversation develops naturally. Their style does not necessarily
mean that they have dominance issues, although you should keep an eye open for these if the interviewer
would be your supervisor.

Either way, remember:

• Flex with the interviewer, following his or her lead.


• Do not relinquish complete control of the interview. If the interviewer does not ask you for information
that you think is important to proving your superiority as a candidate, politely interject it.

The Meandering Style

This interview type, usually used by inexperienced interviewers, relies on you to lead the discussion. It might
begin with a statement like "tell me about yourself," which you can use to your advantage. The interviewer
might ask you another broad, open-ended question before falling into silence. This interview style allows you
tactfully to guide the discussion in a way that best serves you.

The following strategies, which are helpful for any interview, are particularly important when interviewers use
a non-directive approach:

• Come to the interview prepared with highlights and anecdotes of your skills, qualities and
experiences. Do not rely on the interviewer to spark your memory; jot down some notes that you can
reference throughout the interview.
• Remain alert to the interviewer. Even if you feel like you can take the driver's seat and go in any
direction you wish, remain respectful of the interviewer's role. If he or she becomes more directive
during the interview, adjust.


• Ask well-placed questions. Although the open format allows you significantly to shape the interview,
running with your own agenda and dominating the conversation means that you run the risk of
missing important information about the company and its needs.

Q 6. Case Study: You are engaged to carry out a market survey on behalf of a leading
Newspaper that is keen to increase its circulation in Bangalore City, in order to ascertain
reader habits and interests. What type of research report would be most appropriate?
Develop an outline of the research report with the main sections.

Ans.: There are four major interlinking processes in the presentation of a literature review:

1. Critiquing rather than merely listing each item: a good literature review is led by your own critical
thought processes - it is not simply a catalogue of what has been written.

Once you have established which authors and ideas are linked, take each group in turn and really
think about what you want to achieve in presenting them this way. This is your opportunity for
showing that you did not take all your reading at face value, but that you have the knowledge and
skills to interpret the authors' meanings and intentions in relation to each other, particularly if there
are conflicting views or incompatible findings in a particular area.

Rest assured that developing a sense of critical judgment in the literature surrounding a topic is a
gradual process of gaining familiarity with the concepts, language, terminology and conventions in
the field. In the early stages of your research you cannot be expected to have a fully developed
appreciation of the implications of all findings.

As you get used to reading at this level of intensity within your field you will find it easier and more
purposeful to ask questions as you read:

o What is this all about?


o Who is saying it and what authorities do they have?
o Why is it significant?
o What is its context?
o How was it reached?
o How valid is it?
o How reliable is the evidence?
o What has been gained?
o What do other authors say?
o How does it contribute?
o So what?

2. Structuring the fragments into a coherent body: through your reading and discussions with your
supervisor during the searching and organising phases of the cycle, you will eventually reach a final
decision as to your own topic and research design.

As you begin to group together the items you read, the direction of your literature review will emerge
with greater clarity. This is a good time to finalise your concept map, grouping linked items, ideas
and authors into firm categories as they relate more obviously to your own study.

Now you can plan the structure of your written literature review, with your own intentions and
conceptual framework in mind. Knowing what you want to convey will help you decide the most
appropriate structure.

A review can take many forms; for example:


o An historical survey of theory and research in your field


o A synthesis of several paradigms
o A process of narrowing down to your own topic

It is likely that your literature review will contain elements of all of these.

As with all academic writing, a literature review needs:

o An introduction
o A body
o A conclusion

The introduction sets the scene and lays out the various elements that are to be explored.

The body takes each element in turn, usually as a series of headed sections and subsections. The
first paragraph or two of each section mentions the major authors in association with their main ideas
and areas of debate. The section then expands on these ideas and authors, showing how each
relates to the others, and how the debate informs your understanding of the topic. A short conclusion
at the end of each section presents a synthesis of these linked ideas.

The final conclusion of the literature review ties together the main points from each of your sections
and this is then used to build the framework for your own study. Later, when you come to write the
discussion chapter of your thesis, you should be able to relate your findings in one-to-one
correspondence with many of the concepts or questions that were firmed up in the conclusion of
your literature review.

3. Controlling the 'voice' of your citations in the text (by selective use of direct quoting, paraphrasing
and summarizing)

You can treat published literature like any other data, but the difference is that it is not data you
generated yourself.

When you report on your own findings, you are likely to present the results with reference to their
source, for example:

o 'Table 2 shows that sixteen of the twenty subjects responded positively.'

When using published data, you would say:

o 'Positive responses were recorded for 80 per cent of the subjects (see table 2).'
o 'From the results shown in table 2, it appears that the majority of subjects responded
positively.'

In these examples your source of information is table 2. Had you found the same results on page 17
of a text by Smith published in 1988, you would naturally substitute the name, date and page number
for 'table 2'. In each case it would be your voice introducing a fact or statement that had been
generated somewhere else.

You could see this process as building a wall: you select and place the 'bricks' and your 'voice'
provides the ‘mortar’, which determines how strong the wall will be. In turn, this is significant in the
assessment of the merit and rigor of your work.

There are three ways to combine an idea and its source with your own voice:

o Direct quote
o Paraphrase
o Summary

In each method, the author's name and publication details must be associated with the words in the
text, using an approved referencing system. If you don't do this you would be in severe breach of
academic convention, and might be penalized. Your field of study has its own referencing
conventions, which you should investigate before writing up your results.

Direct quoting repeats exact wording and thus directly represents the author:

o 'Rain is likely when the sky becomes overcast' (Smith 1988, page 27).

If the quotation is run in with your text, single quotation marks are used to enclose it, and it must be
an identical copy of the original in every respect.

Overuse or simple 'listing' of quotes can substantially weaken your own argument by silencing your
critical view or voice.

Paraphrasing is repeating an idea in your own words, with no loss of the author's intended meaning:

o As Smith (1988) pointed out in the late eighties, rain may well be indicated by the presence
of cloud in the sky.

Paraphrasing allows you to organize the ideas expressed by the authors without being rigidly
constrained by the grammar, tense and vocabulary of the original. You retain a degree of flexibility
as to whose voice comes through most strongly.

Summarizing means to shorten or crystallize a detailed piece of writing by restating the main points
in your own words and in the order in which you found them. The original writing is 'described' as if
from the outside, and it is your own voice that is predominant:

o Referring to the possible effects of cloudy weather, Smith (1988) predicted the likelihood of
rain.
o Smith (1988) claims that some degree of precipitation could be expected as the result of
clouds in the sky: he has clearly discounted the findings of Jones (1986).

4. Using appropriate language

Your writing style represents you as a researcher, and reflects how you are dealing with the
subtleties and complexities inherent in the literature.
Once you have established a good structure with appropriate headings for your literature review, and
once you are confident in controlling the voice in your citations, you should find that your writing
becomes more lucid and fluent because you know what you want to say and how to say it.
The good use of language depends on the quality of the thinking behind the writing, and on the
context of the writing. You need to conform to discipline-specific requirements. However, there may
still be some points of grammar and vocabulary you would like to improve. If you have doubts about
your confidence to use the English language well, you can help yourself in several ways:

o Ask for feedback on your writing from friends, colleagues and academics
o Look for specific language information in reference materials
o Access programs or self-paced learning resources which may be available on your campus

Grammar tips - practical and helpful

The following guidance on tenses and other language tips may be useful.
Which tense should I use?
Use present tense:
o For generalizations and claims:


 The sky is blue.
o To convey ideas, especially theories, which exist for the reader at the time of reading:
 I think therefore I am.
o For authors' statements of a theoretical nature, which can then be compared on equal terms
with others:
 Smith (1988) suggests that...
o In referring to components of your own document:
 Table 2 shows...
Use present perfect tense for:
o Recent events or actions that are still linked in an unresolved way to the present:
 Several studies have attempted to...
Use simple past tense for:
o Completed events or actions:
 Smith (1988) discovered that...
Use past perfect tense for:
o Events which occurred before a specified past time:
 Prior to these findings, it had been thought that...
Use modals (may, might, could, would, should) to:
o Convey degrees of doubt
 This may indicate that ... this would imply that...

Other language tips
o Convey your meaning in the simplest possible way. Don't try to use an intellectual tone for
the sake of it, and do not rely on your reader to read your mind!
o Keep sentences short and simple when you wish to emphasise a point.
o Use compound (joined simple) sentences to write about two or more ideas which may be
linked with 'and', 'but', 'because', 'whereas' etc.
o Use complex sentences when you are dealing with embedded ideas or those that show the
interaction of two or more complex elements.
o Verbs are more dynamic than nouns, and nouns carry information more densely than verbs.
o Select active or passive verbs according to whether you are highlighting the 'doer' or the
'done to' of the action.
o Keep punctuation to a minimum. Use it to separate the elements of complex sentences in
order to keep subject, verb and object in clear view.
o Avoid densely packed strings of words, particularly nouns.

The total process


The story of a research study
Introduction
I looked at the situation and found that I had a question to ask about it. I wanted to investigate something in
particular.

Review of literature
So I read everything I could find on the topic - what was already known and said and what had previously
been found. I established exactly where my investigation would fit into the big picture, and began to realise at
this stage how my study would be different from anything done previously.

Methodology
I decided on the number and description of my subjects, and with my research question clearly in mind,
designed my own investigation process, using certain known research methods (and perhaps some that are
not so common). I began with the broad decision about which research paradigm I would work within (that is,
qualitative/quantitative, critical/interpretive/ empiricist). Then I devised my research instrument to get the best
out of what I was investigating. I knew I would have to analyse the raw data, so I made sure that the
instrument and my proposed method(s) of analysis were compatible right from the start. Then I carried out
the research study and recorded all the data in a methodical way according to my intended methods of
analysis. As part of the analysis, I reduced the data (by means of my preferred form of classification) to
manageable thematic representation (tables, graphs, categories, etc). It was then that I began to realise what
I had found.

Findings/results
What had I found? What did the tables/graphs/categories etc. have to say that could be pinned down? It was
easy enough for me to see the salient points at a glance from these records, but in writing my report, I also
spelled out what I had found truly significant to make sure my readers did not miss it. For each display of
results, I wrote a corresponding summary of important observations relating only elements within my own set
of results and comparing only like with like. I was careful not to let my own interpretations intrude or voice my
excitement just yet. I wanted to state the facts - just the facts. I dealt correctly with all inferential statistical
procedures, applying tests of significance where appropriate to ensure both reliability and validity. I knew that
I wanted my results to be as watertight and squeaky clean as possible. They would carry a great deal more
credibility, strength and thereby academic 'clout' if I took no shortcuts and remained both rigorous and
scholarly.

Discussion
Now I was free to let the world know the significance of my findings. What did I find in the results that
answered my original research question? Why was I so sure I had some answers? What about the
unexplained or unexpected findings? Had I interpreted the results correctly? Could there have been any
other factors involved? Were my findings supported or contested by the results of similar studies? Where did
that leave mine in terms of contribution to my field? Can I actually generalise from my findings in a
breakthrough of some kind, or do I simply see myself as reinforcing existing knowledge? And so what, after
all? There were some obvious limitations to my study, which, even so, I'll defend to the hilt. But I won't
become over-apologetic about the things left undone, or the abandoned analyses, the fascinating byways
sadly left behind. I have my memories...

Conclusion
We'll take a long hard look at this study from a broad perspective. How does it rate? How did I end up
answering the question I first thought of? The conclusion needs to be a few clear, succinct sentences. That
way, I'll know that I know what I'm talking about. I'll wrap up with whatever generalizations I can make, and
whatever implications have arisen in my mind as a result of doing this thing at all. The more you find out, the
more questions arise. How I wonder what you are ... how I speculate. OK, so where do we all go from here?

Three stages of research


1. Reading
2. Research design and implementation
3. Writing up the research report or thesis

Use an active, cyclical writing process: draft, check, reflect, revise, redraft.

Establishing good practice


1. Keep your research question always in mind.
2. Read widely to establish a context for your research.
3. Read widely to collect information, which may relate to your topic, particularly to your hypothesis or
research question.
4. Be systematic with your reading, note-taking and referencing records.
5. Train yourself to select what you do need and reject what you don't need.
6. Keep a research journal to reflect on your processes, decisions, state of mind, changes of mind,
reactions to experimental outcomes etc.
7. Discuss your ideas with your supervisor and interested others.
8. Keep a systematic log of technical records of your experimental and other research data,
remembering to date each entry, and noting any discrepancies or unexpected occurrences at the
time you notice them.
9. Design your research approaches in detail in the early stages so that you have frameworks to fit
findings into straightaway.
10. Know how you will analyse data so that your formats correspond from the start.

Keep going back to the whole picture. Be thoughtful and think ahead about the way you will consider
and store new information as it comes to light.


Assignment (Set-1)
Subject code: MB0051
Legal Aspects of Business

Q.1 Explain the concept and limitations of the theory of comparative costs.

Ans. Theory of comparative costs

In economics, the law of comparative advantage refers to the ability of a party (an individual, a firm, or a
country) to produce a particular good or service at a lower marginal cost and opportunity cost than another
party. It can be contrasted with absolute advantage which refers to the ability of a party to produce a
particular good at a lower absolute cost than another.
Comparative advantage explains how trade can create value for both parties even when one can produce all
goods with fewer resources than the other. The net benefits of such an outcome are called gains from trade.

Origins of the theory


David Ricardo explained comparative advantage in his 1817 book On the Principles of Political Economy and
Taxation in an example involving England and Portugal. In Portugal it is possible to produce both wine and
cloth with less labor than it would take to produce the same quantities in England. However the relative costs
of producing those two goods are different in the two countries. In England it is very hard to produce wine,
and only moderately difficult to produce cloth. In Portugal both are easy to produce. Therefore while it is
cheaper to produce cloth in Portugal than England, it is cheaper still for Portugal to produce excess wine,
and trade that for English cloth. Conversely, England benefits from this trade because its cost for producing
cloth has not changed but it can now get wine at a lower price, closer to the cost of cloth. The conclusion
drawn is that each country can gain by specializing in the good where it has comparative advantage, and
trading that good for the other.

Example 1
Two men live alone on an isolated island. To survive they must undertake a few basic economic activities
like water carrying, fishing, cooking and shelter construction and maintenance. The first man is young,
strong, and educated. He is also faster, better, and more productive at everything. He has an absolute
advantage in all activities. The second man is old, weak, and uneducated. He has an absolute disadvantage
in all economic activities. In some activities the difference between the two is great; in others it is small.

Despite the fact that the younger man has absolute advantage in all activities, it is not in the interest of either
of them to work in isolation since they both can benefit from specialization and exchange. If the two men
divide the work according to comparative advantage then the young man will specialize in tasks at which he
is most productive, while the older man will concentrate on tasks where his productivity is only a little less
than that of the young man. Such an arrangement will increase total production for a given amount of labor
supplied by both men and it will benefit both of them.

Example 2
Suppose there are two countries of equal size, Northland and Southland, that both produce and consume
two goods, Food and Clothes. The productive capacities and efficiencies of the countries are such that if
both countries devoted all their resources to Food production, output would be as follows:

• Northland: 100 tonnes


• Southland: 400 tonnes

If all the resources of the countries were allocated to the production of Clothes, output would be:
• Northland: 100 tonnes
• Southland: 200 tonnes

Assume that each country has constant opportunity costs of production between the two products and that
both economies have full employment at all times. All factors of production are mobile within each country
between the clothing and food industries, but are immobile between the countries. The price mechanism is
assumed to work so as to provide perfect competition. Southland has an absolute advantage over Northland
in the production of both Food and Clothes. There seems to be no mutual benefit in trade between the
economies, as Southland is more efficient at producing both products. The opportunity costs, however, show
otherwise. Northland's opportunity cost of producing one tonne of Food is one tonne of Clothes, and vice
versa. Southland's opportunity cost of one tonne of Food is 0.5 tonne of Clothes, and its opportunity cost of
one tonne of Clothes is 2 tonnes of Food.
Southland has a comparative advantage in food production, because of its lower opportunity cost of
production with respect to Northland. Northland has a comparative advantage over Southland in the
production of clothes, the opportunity cost of which is higher in Southland with respect to Food than in
Northland.
To show that these different opportunity costs lead to mutual benefit if the countries specialize and trade,
first consider the case where each country produces and consumes only domestically. The volumes are:

Production and consumption before trade


              Food    Clothes
Northland       50         50
Southland      200        100
TOTAL          250        150

This example includes no formulation of the preferences of consumers in the two economies which would
allow the determination of the international exchange rate of Clothes and Food. Given the production
capabilities of each country, in order for trade to be worthwhile Northland requires a price of at least one
tonne of Food in exchange for one tonne of Clothes; and Southland requires at least one tonne of Clothes for
two tonnes of Food. The exchange price will be somewhere between the two.

The remainder of the example works with an international trading price of one tonne of Food for 2/3 tonne of
Clothes. If both specialize in the goods in which they have comparative advantage, their outputs will be:

Production after trade

              Food    Clothes
Northland        0        100
Southland      300         50
TOTAL          300        150

World production of food increased. Clothing production remained the same. Using the exchange rate of one
tonne of Food for 2/3 tonne of Clothes, Northland and Southland are able to trade to yield the following level
of consumption:

Consumption after trade


              Food    Clothes
Northland       75         50
Southland      225        100
World total    300        150
Northland traded 50 tonnes of Clothing for 75 tonnes of Food. Both benefited, and now consume at points
outside their production possibility frontiers.
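
The arithmetic behind these tables can be checked with a short script. The following Python sketch is purely
illustrative and is not part of the original example; it assumes the Northland/Southland capacities given
above, an equal initial split of resources between the two goods, and the assumed trading price of one tonne
of Food for 2/3 tonne of Clothes.

```python
# Illustrative check of the Northland/Southland example (figures taken from the text above).

# Maximum output if a country devotes all its resources to one good (tonnes).
capacity = {
    "Northland": {"food": 100, "clothes": 100},
    "Southland": {"food": 400, "clothes": 200},
}

# Opportunity cost of one tonne of food, in tonnes of clothes forgone.
for country, cap in capacity.items():
    print(country, "gives up", cap["clothes"] / cap["food"], "tonne(s) of clothes per tonne of food")
# Northland: 1.0, Southland: 0.5 -> Southland has the comparative advantage in food.

# Before trade: each country splits its resources equally between the two goods.
before = {c: {g: cap[g] / 2 for g in cap} for c, cap in capacity.items()}

# After specialization (as in the worked example): Northland makes only clothes,
# Southland puts three quarters of its resources into food and one quarter into clothes.
after = {
    "Northland": {"food": 0, "clothes": 100},
    "Southland": {"food": 300, "clothes": 50},
}

# Trade at the assumed price of 1 tonne of food for 2/3 tonne of clothes.
clothes_exported_by_northland = 50
food_received_by_northland = clothes_exported_by_northland * 3 / 2   # 75 tonnes

consumption = {
    "Northland": {"food": after["Northland"]["food"] + food_received_by_northland,
                  "clothes": after["Northland"]["clothes"] - clothes_exported_by_northland},
    "Southland": {"food": after["Southland"]["food"] - food_received_by_northland,
                  "clothes": after["Southland"]["clothes"] + clothes_exported_by_northland},
}

print("Consumption before trade:", before)
print("Consumption after trade: ", consumption)
# Both countries consume at least as much of each good as before, and 50 tonnes
# more food is available in total, matching the tables above.
```
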
Example 3
The economist Paul Samuelson provided another well known example in his Economics. Suppose that in a
particular city the best lawyer also happens to be the best secretary in town. However, if this lawyer focused
on the task of being an attorney and, instead of pursuing both occupations at once, employed a secretary,
the output of both the lawyer and the secretary would increase. The example given by Greg Mankiw in his
Economics textbook is almost identical, although instead of a lawyer and a secretary it uses Tiger Woods,
who is supposed to be both the best golf player and the fastest lawn mower in town.

Limitations:

• Two countries, two goods - the theory is no different for larger numbers of countries and goods, but the
principles are clearer and the argument easier to follow in this simpler case.
• Equal size economies - again, this is a simplification to produce a clearer example.
• Full employment - if one or other of the economies has less than full employment of factors of production,
then this excess capacity must usually be used up before the comparative advantage reasoning can be
applied.
• Constant opportunity costs - with a more realistic treatment of opportunity costs the reasoning is broadly
the same, but specialization of production can only be taken to the point at which the opportunity costs in the
two countries become equal. This does not invalidate the principles of comparative advantage, but it does
limit the magnitude of the benefit.
• Perfect mobility of factors of production within countries - this is necessary to allow production to be
switched without cost. In real economies this cost will be incurred: capital will be tied up in plant (sewing
machines are not sowing machines) and labour will need to be retrained and relocated. This is why it is
sometimes argued that 'nascent industries' should be protected from fully liberalised international trade
during the period in which a high cost of entry into the market (capital equipment, training) is being paid for.
• Immobility of factors of production between countries - why are there different rates of productivity?
The modern version of comparative advantage (developed in the early twentieth century by the Swedish
economists Eli Heckscher and Bertil Ohlin) attributes these differences to differences in nations' factor
endowments. A nation will have comparative advantage in producing the good that uses intensively the
factor it produces abundantly. For example: suppose the US has a relative abundance of capital and India
has a relative abundance of labor. Suppose further that cars are capital intensive to produce, while cloth is
labor intensive. Then the US will have a comparative advantage in making cars, and India will have a
comparative advantage in making cloth. If there is international factor mobility this can change nations'
relative factor abundance. The principle of comparative advantage still applies, but who has the advantage in
what can change.
• Negligible transport costs - transport costs are assumed not to be a cause for concern when countries
decide to trade; they are ignored and not factored in.
• Half the resources in each country are assumed to be used to produce each good. This takes place
before specialization.
• Perfect competition - this is a standard assumption that allows perfectly efficient allocation of
productive resources in an idealized free market.


Q.2 What are the different market entry strategies for a company which is interested to
enter International markets? Discuss briefly.

Ans: Definition
A market entry strategy is the plan for finding the best method of delivering your goods to your market
and of distributing them there. This applies to domestic as well as international sales.

The transactions associated with exporting are generally more complicated than those relating to the
domestic market. Remember you are dealing with different cultures underpinned by different legal systems.
Language barriers may also cause misunderstanding.

The market entry strategies into international markets can be listed as follows:

1. Direct Sales

This means total distribution and pricing control for the exporter; high profit potential due to elimination of any
middlemen. On the other hand, your company provides all services, including advertising, marketing,
customer service, translation, required labeling; you must become an expert in that market; credit risks are,
on average, the highest of any strategy; potential sales volume is low.

2. Agent or Representative

An agent or representative is an individual or company legally authorized to act on your behalf in your target
market. If you perform due diligence and find the right agent, your export products are represented by an
expert in the local market with established customer contacts; sales potential increases. However, the
exporter must grant exclusive agreements regarding geographic regions or product lines; there is no control
over prices, and the profit rate is lowered due to sales commissions.

3. Distributor

The exporter essentially deals with a single customer who takes ownership of the product, in exchange
assuming total responsibility for promotion, marketing, delivery, returns and customer relations. Sales
volume potential increases, credit risk decreases. But the relationship is harder to legally terminate than with
an agent or representative.

4. Licensing

When working on a licensing basis, the credit risk is low; there is a minimal level of commitment and risk for
the licensor company since the overseas licensee is responsible for all production, marketing, distribution,
credit and collections. Against this model, we have increased risks of loss of intellectual property. Despite
potential high sales volume, profit is limited to a small percentage on each sale.

5. Joint Venture

The production cost per unit can be significantly lowered by moving selected manufacturing overseas; higher
sales volume, market penetration and profit potential than any other strategy. However this results in a high
level of commitment, investment, resource allocation and risk. This high risk, high commitment, but
potentially high reward strategy is for exporters already experienced in the target market who are prepared to
walk the last meters to take maximum advantage of that market's potential. In some countries, a joint venture
is the only legal way for a foreign company to set up operations.

6. Franchising

This approach can result in wide and quick market coverage, some protection from copying, and reasonable
profitability. You must also consider the high cost of studying laws and regulations in different countries, the
cost of frequent visits to support franchisees, and the potential to lose the contract to a major franchisee.
Franchising your idea, product and style of presentation to foreign franchisees carries with it a moderate
degree of risk.

7. Export Merchant

You will have all financial and legal matters handled in your country, which helps reduce the risk level. But
you will never learn about exporting; you will have no input into marketing decisions; and there will be scant
feedback on product performance for future Research & Development.

8. Subsidiary or subcontracting

This requires either setting up your own facility or subcontracting the manufacturing of your products to an
assembly operator. It offers greater control over operations, lower transportation costs, low tariffs or duties
(as with imports), lower production costs and maybe foreign government investment incentives (e.g. tax
holidays). On the other hand, a subsidiary requires greater investment than joint ventures and licensed
manufacturing, a substantial commitment of time, and exposure to local market risks, among other things.

There are five points to be considered by firms for entry into new markets:

a. Technical innovation strategy - perceived and demonstrable superior products

b. Product adaptation strategy - modifications to existing products

c. Availability and security strategy - overcome transport risks by countering perceived risks

d. Low price strategy - penetration price and,

e. Total adaptation and conformity strategy - foreign producer gives a straight copy.

In marketing products from less developed countries to developed countries point “c” poses major problems.
Buyers in the interested foreign country are usually very careful as they perceive transport, currency, quality
and quantity problems.

Risks involved

The risks involved in a market entry strategy range from Systematic Credit Risk (as distinct from Systemic
Risk), to Exchange Risk (also known as Currency Risk), Liquidity Risk, Country or Sovereign (or
Geographical) Risk and, in some products, particularly agricultural commodities, Weather Risk.

Q. 3 (a) What are the benefits of MNCs?

Ans: MNCs

A multinational corporation (MNC), also called a transnational corporation (TNC), or multinational enterprise
(MNE), is a corporation or an enterprise that manages production or delivers services in more than one
country. It can also be referred to as an international corporation. The International Labour Organization
(ILO) has defined an MNC as a corporation that has its management headquarters in one
country, known as the home country, and operates in several other countries, known as host countries.

The Dutch East India Company was the first multinational corporation in the world and the first company to
issue stock. It was also arguably the world's first mega corporation, possessing quasi-governmental powers,
including the ability to wage war, negotiate treaties, coin money, and establish colonies.[3]

The first modern multinational corporation is generally thought to be the East India Company.[4] Many
corporations have offices, branches or manufacturing plants in different countries from where their original
and main headquarters is located.

Benefits:

Multinational companies (MNCs) bring benefits to the government, the economy and the people of the host
country, and to the company itself. Cole (1996) stated that the size of multinational organizations is
enormous; many of them have total sales well in excess of the GNP of many of the world's nations. Cole also
stated that World Bank comparisons between multinational companies and national GNPs show, for
example, that large oil firms such as Exxon and Shell are larger in economic terms than nations such as
South Africa, Australia and Argentina, and are substantially greater than nations such as Greece, Bulgaria
and Egypt.

Other large multinational companies include General Motors, British Petroleum, Ford and International
Business Machine (IBM). Some of the benefits of multinational companies are:

1. There is usually huge capital investment in major economic activities

2. The country enjoys varieties of products, services and facilities, brought to their door steps

3. There is creation of more jobs for the populace

4. The nation's pool of skills are best utilized and put to use effectively and efficiently

5. There is advancement in technology as these companies bring in state-of-the-art technology for their
businesses

6. The demand for training and retraining and advancement in the people's education becomes absolutely
necessary. This will in turn help strengthen the economy of the nation

7. The living standard of the people is boosted

8. Friendliness between and among nations in trade, i.e. it strengthens international relations

9. The balance of payments of trading nations is improved

Cole (1996) stated that the sheer size (and wealth) of multinationals means that they can have a significant
effect on the host country. To Cole, most of the effects are beneficial and include some or all of the above.
The Electronic Library of Scientific Literature (1996) explained the benefits of MNCs under a theory known
as 'The Theory of Externalities'. The theory considers the benefits of MNCs from the point of view of those
who maintain the importance of Foreign Direct Investment (FDI) as part of the engine necessary for growth.
Davies (1989) also offered some theories on the benefits of multinationals; Davies (1989:260) tagged this
'Economic Theory and the Multinational' and took a comprehensive and critical look at the benefits of MNCs.

Further benefits that follow from these theories include:

1. There is significant injection into the local economy in respect to investment

2. Best utilization of the country's natural resources

3. They help in strengthening domestic competition

4. They are good source of technological expertise

5. Expansion of market in the host country

Q.3 ( b) Give a short note on OPEC.

Ans : OPEC :

The Organization of the Petroleum Exporting Countries (OPEC, pronounced /ˈoʊpɛk/ OH-pek) is a cartel of
twelve developing countries made up of Algeria, Angola, Ecuador, Iran, Iraq, Kuwait, Libya, Nigeria, Qatar,
Saudi Arabia, the United Arab Emirates, and Venezuela. OPEC has maintained its headquarters in Vienna
since 1965, and hosts regular meetings among the oil ministers of its Member Countries. Indonesia withdrew
in 2008 after it became a net importer of oil, but stated it would likely return if it became a net exporter again.


According to its statutes, one of the principal goals is the determination of the best means for safeguarding
the cartel's interests, individually and collectively. It also pursues ways and means of ensuring the
stabilization of prices in international oil markets with a view to eliminating harmful and unnecessary
fluctuations; giving due regard at all times to the interests of the producing nations and to the necessity of
securing a steady income to the producing countries; an efficient and regular supply of petroleum to
consuming nations, and a fair return on their capital to those investing in the petroleum industry.

OPEC's influence on the market has been widely criticized, since it became effective in determining
production and prices. Arab members of OPEC alarmed the developed world when they used the “oil
weapon” during the Yom Kippur War by implementing oil embargoes and initiating the 1973 oil crisis.
Although largely political explanations for the timing and extent of the OPEC price increases are also valid,
from OPEC’s point of view, these changes were triggered largely by previous unilateral changes in the world
financial system and the ensuing period of high inflation in both the developed and developing world. This
explanation encompasses OPEC actions both before and after the outbreak of hostilities in October 1973,
and concludes that “OPEC countries were only 'staying even' by dramatically raising the dollar price of oil.”

OPEC's ability to control the price of oil has diminished somewhat since then, due to the subsequent
discovery and development of large oil reserves in Alaska, the North Sea, Canada, the Gulf of Mexico, the
opening up of Russia, and market modernization. OPEC nations still account for two-thirds of the world's oil
reserves, and, as of April 2009, 33.3% of the world's oil production, affording them considerable control over
the global market. The next largest group of producers, members of the OECD and the Post-Soviet states
produced only 23.8% and 14.8%, respectively, of the world's total oil production. As early as 2003, concerns
that OPEC members had little excess pumping capacity sparked speculation that their influence on crude oil
prices would begin to slip.

Q.4. a. How will the socio-cultural environment of a country have an impact on a multinational business?
Explain with an example.

Ans: Social and cultural environment

The socio-cultural environment of every nation is unique. Therefore, it is very essential for marketers to
consider the differences existing between the cultures in the home country and the host country.

The experience faced by The Coca Cola Company during the launch of its soft drink product in China is often
cited as an example for emphasizing the impact of cultural differences on global marketing. The company,
during the product’s launch in China, spelled Coca Cola as ‘Ke-Kou-ke-la’ in Chinese. Later, the company
searched from around 40,000 Chinese characters and came up with the word ‘ko-kou-ko-le,’ which when
translated in Chinese meant ‘happiness in the mouth.’

Equally important to the international manager are sociocultural elements. These include the attitudes,
values, norms, beliefs, behaviors, and demographic trends of the host country. Learning these things
frequently requires a good deal of self-awareness in order to recognize and control culturally specific
behaviors in one's self and in others. International managers must know how to relate to and motivate
foreign workers, since motivational techniques differ among countries. They must also understand how work
roles and attitudes differ. For instance, the boundaries and responsibilities of occupations sometimes have
subtle differences across cultures, even if they have equivalent names and educational requirements.
Managers must be attuned to such cultural nuances in order to function effectively. Moreover, managers
must keep perspective on cultural differences once they are identified and not subscribe to the fallacy that all
people in a foreign culture think and act alike.

The Dutch social scientist Geert Hofstede divided sociocultural elements into four categories: (1) power
distance, (2) uncertainty avoidance, (3) individualism-collectivism, and (4) masculinity-femininity.
International managers must understand all four elements in order to succeed.

Power distance is a cultural dimension that involves the degree to which individuals in a society accept
differences in the distribution of power as reasonable and normal. Uncertainty avoidance involves the extent
to which members of a society feel uncomfortable with and try to avoid situations that they see as
unstructured, unclear, or unpredictable. Individualism-collectivism involves the degree to which individuals
concern themselves with their own interests and those of their immediate families as opposed to the interests
of a larger group. Finally, masculinity-femininity is the extent to which a society emphasizes traditional male
values, e.g., assertiveness, competitiveness, and material success, rather than traditional female values,
such as passivity, cooperation, and feelings. All of these dimensions can have a significant impact on a
manager's success in an international business environment.

The inability to understand the concepts Hofstede outlined can hinder managers' capacity to manage—and
their companies' chances of surviving in the international arena.

The social dimension or environment of a nation determines the value system of the society which, in turn,
affects the functioning of the business. Sociological factors such as cost structure, customs and
conventions, cultural heritage, views toward wealth and income, scientific methods, respect for seniority,
mobility of labour etc. have a far-reaching impact on the business. These factors determine the work culture,
the mobility of labour, work groups etc. For instance, the nature of goods and services to be produced
depends upon the demand of the people, which in turn is affected by their attitudes, customs, cultural values,
fashion etc. The socio-cultural environment determines the code of conduct the business should follow.
Social groups such as trade unions or consumer forums will intervene if the business follows unethical
practices. For instance, if the firm is not paying fair wages to its employees, or is indulging in black marketing
or adulteration, consumer forums and various government agencies will take action against the business.

Q. 4(b). Discuss the origin of WTO and its principles.

Ans: WTO

The WTO is the successor to a previous trade agreement called the General Agreement on Tariffs
and Trade (GATT), which was created in 1948. The WTO has a larger membership than GATT, and covers
more subjects. Nevertheless, it was GATT that established, multilaterally, the principles underlying this
trading system. The WTO is both an institution and a set of rules, called the “WTO law”. Each of the almost
150 WTO members is required to implement these rules, and to provide other members with the specific
trade benefits to which they have committed themselves.

The main body of WTO law is composed of over sixty individual agreements and decisions. All of
these are overseen by councils and committees at the WTO’s headquarters in Geneva; the WTO doesn’t
have any local or regional offices. Large-scale negotiations, like the Doha Round, require their own special
negotiating forum. At least once every two years, WTO members meet at the ministerial level. For the rest of
the time, national delegates, who are usually diplomats and national trade officials, conduct the day-to-day
work.

All this amounts to a heavy burden for many small and poor WTO members. To help lighten the load
and ensure effective participation, technical assistance is available from the WTO and other international
agencies, including training courses for national trade officials. The assistance available, however, is
insufficient for a country like Cambodia to contribute actively in every area of the WTO. Cambodia will need
to prioritize its objectives in WTO membership, and the issues it raises before the organization. Occasionally
Cambodia may join a group, with other countries leading the negotiations. The group of least-developed
country WTO members works together when they have similar objectives. One recent example involved
seeking to make the WTO rules on special and differential treatment for developing countries more
concrete. Additionally, a much larger group (the “G90”) of least-
developed and other relatively poor WTO members have worked together in the Doha Round negotiations,
particularly on agriculture.

PRIMARY WTO PRINCIPLES

A small number of relatively simple principles underlie the rules of the WTO as they affect Cambodia and all
other members:


1. LAWS AND REGULATIONS MUST BE TRANSPARENT

Transparency is the primary principle of the WTO. Nothing is more important to business people than
knowing and having confidence in the regulatory environment in which they operate, at home and overseas.
WTO agreements usually have some form of transparency requirement included that requires governments
and other authorities to publish all laws, regulations, and practices that can impact trade or investment.

2. NON-DISCRIMINATION

A second key principle of the WTO rulebook is non-discrimination. The principle applies at two levels.
At the first level, non-discrimination means that Cambodian goods cannot be discriminated against
in export markets with respect to the same goods arriving from competing countries. At the second
level, once they enter those export markets, Cambodian goods cannot be treated differently than
the same goods produced locally.

3. PROGRESSIVE TRADE LIBERALIZATION

A third principle is progressive trade liberalization through negotiation. The WTO is not a free-trade
agreement. As the following chapters will outline, there is scope for the legal protection of markets from
import competition. However, the underlying goal of the WTO is to create trade and investment through
increasingly open markets. Governments are free to open their markets independently of the WTO. After
accession, Cambodia can liberalize further to the extent, and at the speed, the government thinks is
appropriate.

4. SPECIAL AND DIFFERENTIAL TREATMENT

A fourth principle is of “special and differential treatment” for developing countries. In practice, this
permits easier conditions for poorer countries. This can mean not applying certain provisions of new
agreements to developing countries. It can also mean providing poorer nations with more time to implement
such provisions than for developed countries. This is an important aspect of the Doha Round.

Q. 5 (a). Explain the merits and demerits of BoP theory?

Ans: BOP Theory

A balance of payments (BOP) sheet is an accounting record of all monetary transactions between a country
and the rest of the world. These transactions include payments for the country's exports and imports of
goods, services, and financial capital, as well as financial transfers. The BOP summarises international
transactions for a specific period, usually a year, and is prepared in a single currency, typically the domestic
currency for the country concerned. Sources of funds for a nation, such as exports or the receipts of loans
and investments, are recorded as positive or surplus items. Uses of funds, such as for imports or to invest in
foreign countries, are recorded as a negative or deficit item.

When all components of the BOP sheet are included it must balance – that is, it must sum to zero – there
can be no overall surplus or deficit. For example, if a country is importing more than it exports, its trade
balance will be in deficit, but the shortfall will have to be counter balanced in other ways – such as by funds
earned from its foreign investments, by running down reserves or by receiving loans from other countries.
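
As a minimal numerical illustration of this balancing identity, the Python sketch below uses hypothetical
figures (they are not drawn from any actual country's accounts) to show a current account deficit being offset
by the capital and financial account and by a drawdown of reserves, so that the overall balance sums to zero.

```python
# Hypothetical BOP entries for one year, in billions of the domestic currency.
# Sources of funds are recorded as positive items; uses of funds as negative items.
current_account = -50            # imports exceed exports and other current receipts: a deficit
capital_financial_account = 40   # net loans and investment received from the rest of the world
reserve_drawdown = 10            # official reserves run down to cover the remaining shortfall

overall_balance = current_account + capital_financial_account + reserve_drawdown
assert overall_balance == 0      # with every component included, the BOP must sum to zero
print("Overall balance:", overall_balance)
```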

While the overall BOP sheet will always balance when all types of payments are included, imbalances are
possible on individual elements of the BOP, such as the current account. This can result in surplus countries
accumulating hoards of wealth, while deficit nations become increasingly indebted. Historically there have
been different approaches to the question of how to correct imbalances and debate on whether they are
something governments should be concerned about. With record imbalances held up as one of the
contributing factors to the financial crisis of 2007–2010, plans to address global imbalances are now high on
the agenda of policy makers for 2010.

Q. 5 (b). Distinguish between fixed and flexible exchange rates.

Ans: Exchange rates allow trade between currencies. Exchange rates determine the value of money
when exchanged. A fixed exchange rate means the amount of currency received is set in advance. A floating
exchange rate means that the rate is moving and the currency received depends on the time of the
exchange.

Until 1971, governments with the major currencies in the world maintained fixed exchange rates. The rates
were originally based upon the price of gold, and then the value of the US dollar. Fixed exchange rates
allowed for stability. Everyone knew the cost of money. There was no uncertainty in the foreign trade of
goods. After 1971, governments with major currencies, such as the United States and European countries,
could no longer control the exchange rate and the rate was allowed to float. In many developing countries
governments continued to use a fixed exchange rate for their currency.

Fixed exchange rate


Fixed exchange rates are set by governments. A fixed exchange rate is based upon the government's view of
the value of its currency as well as the monetary policy. It has advantages. Stability is one. Another is
predictability. Businesses and individuals can plan their activities with the certainty of the value of money. A
businessman shipping goods overseas knows the value in advance. A tourist travelling in other countries can
budget knowing what his money will buy.

Floating exchange rate


The floating exchange rate, in its true form, allows the marketplace to set the rate. The forces of supply and
demand determine the value of a currency. For example, when the US dollar is considered strong, it will take
more euros, the currency of most European countries, to buy a dollar. When the US dollar is considered
weak or in decline, the number of euros needed to buy it will fall.

In reality, floating rates do not change solely with the forces of the marketplace. Governments constantly try
to influence the floating rate by taking action in the marketplace. Government action cannot fix the rate, but it
can affect the rate through intervention. Such intervention involves either the buying or selling of currency,
depending on which way the government wants the rate to go. Some governments, like China, have a
modified fixed rate. They set a rate and then allow the rate to float within certain defined limits. Such limits
are usually very small. These small allowed changes mean that the rate will always come back to the set
figure after going up or down. For China, it is a way to put a small amount of free market in the currency
while maintaining government control.
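
The "modified fixed rate" described above can be pictured as a simple clamping rule: the market may move
the rate, but only within a narrow band around the government's set figure. The Python sketch below is
purely illustrative; the central rate and band width are hypothetical values, not the actual parameters used by
China or any other country.

```python
# Illustrative managed-float rule: the market-determined rate is allowed to move,
# but only within +/- `band` (a small fraction) of a government-set central rate.
# The central rate (6.80 local units per US dollar) and the 0.5% band are hypothetical.

def managed_float(market_rate: float, central_rate: float, band: float = 0.005) -> float:
    """Clamp the market rate to the permitted band around the central rate."""
    lower = central_rate * (1 - band)
    upper = central_rate * (1 + band)
    return min(max(market_rate, lower), upper)

central = 6.80
for market in (6.75, 6.79, 6.82, 6.90):
    print(f"market {market:.2f} -> published {managed_float(market, central):.4f}")
# Rates inside the band pass through unchanged; rates outside are pulled back to the
# band edge, so the published rate always stays close to the set figure.
```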

Q. 6. Discuss the need for HRM Strategies and International employee relations strategies
in International business.

Ans: The environments within which international business is carried out in the first decade of the new
millennium are increasingly competitive.

The technological environment is such that technological supremacy is fleeting and, since it does not last
long, cannot be considered a strong advantage for a company.

The economic environment is affected by too many uncontrollable factors, which means a stable economic
situation is less certain. The economy can be affected negatively by things over which large companies and
federal governments have no control.

The political environment responds to the socio-cultural environment, which in many countries is undergoing
the stresses of large immigration movements and cultural and religious frictions. Very few regions of the
world are free of conflict, so no place has a distinctively advantageous political environment.


The geographic environment, long affected by rampant pollution, deforestation, greenhouse gases from
autos and factories, acid rain from coal-fired generators, declining water reserves etc., has seen a bit of
Mother Nature fighting back in 2003-2005 with some spectacular events such as a massive tidal wave,
numerous destructive tornadoes, larger and more frequent hurricanes, volcanoes, mudslides, sandstorms,
drought, crop failures and so on. As a consequence of the changes to and changes by the geographic
environment, almost every place on the planet has had to endure weather that has negatively affected
business and agricultural productivity.

The one area in which companies can become more competitive is having the best people and having those
people serve their customers in the best way.

Therefore one of the key things for companies in the "new new" economy is to focus on the people in the
company, and the customers they serve - ergo, Human Resource Management has become a "big issue" for
international business.

Although Dilbert has many jokes about Catbert, the "Evil H.R. Director", the fact is that employee morale is
increasingly important, especially in international business, since companies are more and more challenged
to cut expenses, and the #1 expense cut is staff cuts - meaning more productivity out of fewer people.

The way to get more productivity is partly by enhancing morale.

An expatriate manager (expat) is simply defined as a citizen of one country working abroad in another
country. Another slang expression often used is "the expat community", describing a group of foreign country
nationals, most often educated executives with good jobs, benefits and privileges, who can sometimes be
seen by the local population as behaving in a way that is "elite".

Types of Staffing policies

Ethnocentric Staffing Policy : when the company sends people from the country of the home company,
overseas.

Polycentric Staffing Policy: when the company allows local staff to rise to the executive level and be
managers.

Geocentric Staffing Policy: when the company uses staff in foreign operations - no matter what country
they come from.

Global HR Challenges

Things that make it difficult for companies to manage Human Resources situations in other countries.

o Compensation varies

o Labour Laws

o Social-Cultural Environment

Compensation varies :

Software development in India costs $15-$20/ hour including the cost of the hardware, software and a
satellite link. Compare this with $60-$80/ hour in the US

- still true in 2009?

from http://home.alltel.net/bsundquist1/gcib.html

Foundry workers (casting metal things) in India earn $1 for working an 8-hour day


Mexican auto industry wages and benefits average $4.00/hr, vs. $30/hr in the US

Despite seemingly low wages and wretched conditions, Mexico is losing garment assembly jobs to Central
America, call centers to Argentina, data processing to India, and electronics manufacture to China

Joel Millman, David Luhnow, "Decade After NAFTA, Prospects for Mexico Seem to be Dimming", Wall Street
Journal, 2003

Japanese companies can hire 3 Chinese software engineers for the price of one in Japan (Thomas L.
Friedman, "Doing our homework", Pittsburgh Post Gazette, 6/25/04).

Vietnamese Nike workers earn $1.60/ day, while three simple meals cost $2.00

Labour Laws, Rules and Regulations :

Minimum wage in Mexico is $0.50/hour. In one US auto plant in Mexico, the workers went on strike. The
Mexican police shot several and put the strikers back to work - and cut their wages 45%.

Nobody has ever been shot by police for being on strike in Canada !!!

Social-Cultural Environment

o Language issues

o Religious practices

Canada

- Christmas & New Years & Canada Day

+ Chinese New Year

+ Ramadan

+ Jewish High Holidays

o Gender issues

o Vacations and holidays

Social-Cultural Environment:

affects Canadian managers operating Canadian companies overseas

affects Canadian managers operating in large "multi-cultural" cities in Canada

affects Foreign Company managers operating Foreign Companies in Canada

Canadian managers operating Canadian companies overseas

Canadian managers of "european background"

Canadian managers whose background matches the region in which the company operates, eg. a Canadian
IT company using Chinese-Canadian managers in China


Canadian managers operating in large "multi-cultural" cities in Canada

Canadian managers of "european background"

trying to deal with employees of non-Northern European background - "Managing Diversity"

eg. Canadian Bank Vice Presidents dealing with branch managers from a "blended" community

Canadian managers operating in large "multi-cultural" cities in Canada

Canadian managers whose background matches the workers in the company, eg. a Canadian ISP company
using Canadian-Desi managers for call centre employees from India, Pakistan, Bangladesh and Sri Lanka

Foreign Company managers operating Foreign Companies in Canada

Foreign Company managers using English to manage Canadians of European Background

Foreign Company managers communicating and managing employees in Canada, who do not have English
as a first language

eg. Japanese auto executives managing employees of South-Asian heritage

Social-Cultural Environment - "Managing Diversity"

"Programs or corporate environments that value multiculturalism must answer hard questions about
managing diversity."

Promoting Diversity

- equal treatment ?

- or differential treatment?

Antidiscrimination laws in Canada and other OECD countries require that employers do not treat applicants
for jobs, and employees, differently.

Treating people "equally" can be both a positive and negative for ethnic minorities and those who laud and
"celebrate diversity".

For example - If we treated people "equally", we'd have just one written drivers test - in English

from http://www.referenceforbusiness.com/encyclopedia/Mor-Off/Multicultural-Workforce.html#MANAGING_DIVERSITY


"On the other hand, treating people differently often creates resentment and erodes morale with perceptions
of preferential treatment."

Some employees resent other employees who get special consideration for holidays, or prayer times, or
special food considerations.

"Other questions to be answered are: Will the company emphasize commonalities or differences in
facilitating a multicultural environment? Should the successful diverse workplace recognize differentiated
applicants as equals or some as unequals? How does the company achieve candor in breaking down
stereotypes and insensitivity towards women and minority groups?"

How do you make decisions about managing situations where it might be considered "favouritism" to make
allowances or considerations for a special category of "diversity"?

Assignment (Set-2)
Subject code: MB0051
Legal Aspects of Business

Q.1 Discuss the issues involved in international product policy and International branding
with a few examples.

Ans: International product policy

When going international, product decisions are critical for the firm’s marketing activity, as they define its
business, customers and competitors, as well as the other marketing policies, such as pricing, distribution
and promotion.


Improper product policy decisions are very easily made with negative consequences for the company as the
following examples illustrate:

• Ikea, the Swedish furniture chain, insists that all its stores carry the basic product line with little or no
adaptation to local tastes. When it entered the USA market with the basic product line, it did not
understand the reluctance of USA customers to buy beds. Eventually the firm discovered that the
Ikea beds were a different size than USA beds, so the bed linen the consumers already had did not
fit them; they would have had to buy bed linen specially from Ikea to fit the beds. Ikea remedied the
situation by ordering larger beds and bed linen from its suppliers.

• When Ford introduced the Pinto model in Brazil, it was unaware of the fact that 'pinto' in Brazilian
slang means small male genitals. Not surprisingly, sales were low. When the company found out
why sales of the Pinto model were so low, it changed the name to Corcel (which means horse).

These examples show how easily companies, even experienced ones, commit international "blunders", and
they emphasize once again the importance of product policy at the international level. The main product
policy decisions that a company faces when going abroad comprise aspects such as:

1) What is the degree of adaptation /standardization of the company products on each foreign
market?

2) What are the products that the company is going to sell abroad (product portfolio decisions)?

3) What products have to be developed for what markets?

4) What is the branding strategy abroad?

We will start our discussion about product policy by first looking at what a product is and how it can be
defined.

Product Issues in International Marketing

Products and Services. Some marketing scholars and professionals tend to draw a strong distinction
between conventional products and services, emphasizing service characteristics such as heterogeneity
(variation in standards among providers, frequently even among different locations of the same firm),
inseparability from consumption, intangibility, and, in some cases, perishability—the idea that a service
cannot generally be created during times of slack and be “stored” for use later. However, almost all
products have at least some service component—e.g., a warranty, documentation, and distribution—and this
service component is an integral part of the product and its positioning. Thus, it may be more useful to look
at the product-service continuum as one between very low and very high levels of tangibility of the service.
Income tax preparation, for example, is almost entirely intangible—the client may receive a few printouts, but
most of the value is in the service. On the other hand, a customer who picks up rocks for construction from a
landowner gets a tangible product with very little value added for service. Firms that offer highly tangible
products often seek to add an intangible component to improve perception. Conversely, adding a tangible
element to a service—e.g., a binder with information—may address many consumers’ psychological need to
get something to show for their money.

On the topic of services, cultural issues may be even more prominent than they are for tangible goods. There
are large variations in willingness to pay for quality, and often very large differences in expectations. In some
countries, it may be more difficult to entice employees to embrace a firm’s customer service philosophy.
Labor regulations in some countries make it difficult to terminate employees whose treatment of customers is
substandard. Speed of service is typically important in the U.S. and western countries but personal
interaction may seem more important in other countries.


Product Need Satisfaction. We often take for granted the “obvious” need that products seem to fill in our
own culture; however, functions served may be very different in others—for example, while cars have a large
transportation role in the U.S., they are impractical to drive in Japan, and thus cars there serve more of a role
of being a status symbol or providing for individual indulgence. In the U.S., fast food and instant drinks such
as Tang are intended for convenience; elsewhere, they may represent more of a treat. Thus, it is important
to examine through marketing research consumers’ true motives, desires, and expectations in buying a
product.

Approaches to Product Introduction. Firms face a choice of alternatives in marketing their products
across markets. An extreme strategy involves customization, whereby the firm introduces a unique product
in each country, usually with the belief tastes differ so much between countries that it is necessary more or
less to start from “scratch” in creating a product for each market. On the other extreme, standardization
involves making one global product in the belief the same product can be sold across markets without
significant modification—e.g., Intel microprocessors are the same regardless of the country in which they are
sold. Finally, in most cases firms will resort to some kind of adaptation, whereby a common product is
modified to some extent when moved between some markets—e.g., in the United States, where fuel is
relatively less expensive, many cars have larger engines than their comparable models in Europe and Asia;
however, much of the design is similar or identical, so some economies are achieved. Similarly, while
Kentucky Fried Chicken serves much the same chicken with the eleven herbs and spices in Japan, a lesser
amount of sugar is used in the potato salad, and fries are substituted for mashed potatoes.

There are certain benefits to standardization. Firms that produce a global product can obtain economies of
scale in manufacturing, and higher quantities produced also lead to a faster advancement along the
experience curve. Further, it is more feasible to establish a global brand as less confusion will occur when
consumers travel across countries and see the same product. On the down side, there may be significant
differences in desires between cultures and physical environments—e.g., software sold in the U.S. and
Europe will often utter a “beep” to alert the user when a mistake has been made; however, in Asia, where
office workers are often seated closely together, this could cause embarrassment.

Adaptations come in several forms. Mandatory adaptations involve changes that have to be made before
the product can be used—e.g., appliances made for the U.S. and Europe must run on different voltages, and
a major problem was experienced in the European Union when hoses for restaurant frying machines could
not simultaneously meet the legal requirements of different countries. “Discretionary” changes are changes
that do not have to be made before a product can be introduced (e.g., there is nothing to prevent an
American firm from introducing an overly sweet soft drink into the Japanese market), although products may
face poor sales if such changes are not made. Discretionary changes may also involve cultural adaptations
—e.g., in Sesame Street, the Big Bird became the Big Camel in Saudi Arabia.

Another distinction involves physical product vs. communication adaptations. In order for gasoline to be
effective in high altitude regions, its octane must be higher, but it can be promoted much the same way. On
the other hand, while the same bicycle might be sold in China and the U.S., it might be positioned as a
serious means of transportation in the former and as a recreational tool in the latter. In some cases,
products may not need to be adapted in either way (e.g., industrial equipment), while in other cases, it might
have to be adapted in both (e.g., greeting cards, where the both occasions, language, and motivations for
sending differ). Finally, a market may exist abroad for a product which has no analogue at home—e.g.,
hand-powered washing machines.

Branding. While Americans seem to be comfortable with category specific brands, this is not the case for
Asian consumers. American firms observed that their products would be closely examined by Japanese
consumers who could not find a major brand name on the packages, which was required as a sign of quality.
Note that Japanese keiretsus span and use their brand name across multiple industries—e.g., Mitsubishi,
among other things, sells food, automobiles, electronics, and heavy construction equipment.


The International Product Life Cycle (PLC). Consumers in different countries differ in the speed with which
they adopt new products, in part for economic reasons (fewer Malaysian than American consumers can
afford to buy VCRs) and in part because of attitudes toward new products (pharmaceuticals upset the power
afforded to traditional faith healers, for example). Thus, it may be possible, when one market has been
saturated, to continue growth in another market—e.g., while somewhere between one third and one half of
American homes now contain a computer, the corresponding figures for even Europe and Japan are much
lower and thus, many computer manufacturers see greater growth potential there. Note that expensive
capital equipment may also cycle between countries—e.g., airlines in economically developed countries will
often buy the newest and most desired aircraft and sell off older ones to their counterparts in developing
countries. While in developed countries, “three part” canning machines that solder on the bottom with lead
are unacceptable for health reasons, they have found a market in developing countries.

Diffusion of innovation. Good new innovations often do not spread as quickly as one might expect—e.g.,
although the technology for microwave ovens has existed since the 1950s, they really did not take off in the
United States until the late seventies or early eighties, and their penetration is much lower in most other
countries. The typewriter, telephone answering machines, and cellular phones also existed for a long time
before they were widely adopted.

Certain characteristics of products make them more or less likely to spread. One factor is relative
advantage. While a computer offers a huge advantage over a typewriter, for example, the added gain from
having an electric typewriter over a manual one was much smaller. Another issue is compatibility, both in the
social and physical sense. A major problem with the personal computer was that it could not read the
manual files that firms had maintained, and birth control programs are resisted in many countries due to
conflicts with religious values. Complexity refers to how difficult a new product is to use—e.g., some people
have resisted getting computers because learning to use them takes time. Trialability refers to the extent to
which one can examine the merits of a new product without having to commit a huge financial or personal
investment—e.g., it is relatively easy to try a restaurant with a new ethnic cuisine, but investing in a global
positioning navigation system is riskier since this has to be bought and installed in one’s car before the
consumer can determine whether it is worthwhile in practice. Finally, observability refers to the extent to
which consumers can readily see others using the product—e.g., people who do not have ATM cards or
cellular phones can easily see the convenience that other people experience using them; on the other hand,
VCRs are mostly used in people’s homes, and thus only an owner’s close friends would be likely to see it.

At the societal level, several factors influence the spread of an innovation. Not surprisingly,
cosmopolitanism, the extent to which a country is connected to other cultures, is useful. Innovations are
more likely to spread where there is a higher percentage of women in the work force; these women both
have more economic power and are able to see other people use the products and/or discuss them.
Modernity refers to the extent to which a culture values “progress.” In the U.S., “new and improved” is
considered highly attractive; in more traditional countries, the potential for disruption causes new products to
be seen with more skepticism. Although U.S. consumers appear to adopt new products more quickly than
those of other countries, the U.S. actually scores lower on homophily, the extent to which consumers are
relatively similar to each other, and on physical distance, where consumers who are more spread out are less likely to
interact with other users of the product. Japan, which ranks second only to the U.S., on the other hand,
scores very well on these latter two factors.

Branding strategies at international level

For a company that goes international, branding is important, as it is more difficult than branding in the
domestic market. Branding is usually rooted in the culture of a country and brand names designed for one
country can have different meanings in other languages or no meaning at all. A brand is a name, a sign, a
symbol, a logo, a term or a combination of these used by a firm to differentiate its offerings from those of the
competitors. In most product categories, companies do not compete with products, but with brands, with the
way the augmented products are differentiated and positioned as compared to other brands. All brands are
products or services in that they serve a functional purpose, but not all products or services are brands. A

Page 48
By-V.K. Saini MBA(IS) Semester- 3
2010(Fall)

product is a physical entity, but is not always a brand, as brands are created by marketers. A brand is a
product or a service that besides the functional benefits provides also some added value, such as11:

 f amiliarity, as brands identify products,

reliability and risk reduction, as brands in most instances offer a quality guarantee,

a
 ssociation with the kind of people who are known users of the brand, such as young and
glamorous or rich and snobbish.

For many firms the brands they own are their most valuable assets. Associated to the brand is the brand
equity that refers to brand name awareness, perceived quality or any association made by the customer with
the brand name. A brand can be an asset (for Coca Cola the brand is an asset) or a liability (for Nestle the
brand was a liability when the boycott for the infant milk formula was launched internationally).

How a company chooses a brand is an elaborated process. In France there is a company that specializes in
finding international brands names. Jeannet and Hennessey present the steps undertaken by this
company12:

1. The company brings citizens of many countries together and asks them to state names in their particular
language that they think would be suitable to the product to be named. Speakers of different languages can
immediately react in case names that sound unpleasant in their language or have unwanted connotations
appear.

2. The thousands of names that are accumulated in few such sessions, are than reduced to five hundred by
the company.

3. The client company is asked to choose fifty names from the five hundred.

4. The fifty chosen names are than searched to determine which ones have not been registered in any of the
countries under consideration.

5. From the usually ten names that still remain in the process after this phase, the company together with the
client will make the final decision.

When choosing a name for products to be marketed internationally, a company may consider different naming strategies, such as those exemplified in box no. 9.1.

There are a number of branding strategies that a company may use at international level:

1. According to whether or not a brand is used, there are:

- non-branded products, which have the advantage of lower production and marketing costs but the disadvantage of having no market identity and competing severely on price,

- branded products, which can benefit greatly from their brands if brand awareness is high and the image is positive. Sometimes the brand can be considered the most valuable asset of the company.

For instance, Coca-Cola's brand equity was valued at over $35 billion according to one source. The fact that brands are assets for companies is illustrated by their market value. In 1988 Nestle bought the UK chocolate maker Rowntree for $4.5 billion, five times its book value, largely because of its ownership of well-known brands such as After Eight, Kit Kat and Rolo. Similarly, Philip Morris bought Kraft for $12.9 billion, a price four times its book value13.
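As a rough illustration of how such figures imply a premium for intangibles such as brands, the sketch below simply subtracts the implied book value from the price paid, using only the numbers quoted above; it is an indicative calculation, not an actual brand valuation.

# Hypothetical sketch: premium paid over book value in the acquisitions cited above.
# "Brand premium" here is simply price paid minus implied book value.

def implied_premium(price_paid_bn, price_to_book_ratio):
    book_value_bn = price_paid_bn / price_to_book_ratio
    return price_paid_bn - book_value_bn

# Nestle / Rowntree: $4.5 billion at five times book value
rowntree_premium = implied_premium(4.5, 5)      # 4.5 - 0.9 = 3.6

# Philip Morris / Kraft: $12.9 billion at four times book value
kraft_premium = implied_premium(12.9, 4)        # 12.9 - 3.225 = 9.675

print(f"Rowntree premium over book value: ${rowntree_premium:.2f} billion")
print(f"Kraft premium over book value:    ${kraft_premium:.3f} billion")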

2. According to the number of products that have the same name, there are:

- individual brands, when each of the company’s products has its own name, usually with no association with the company name. Individual brands are used when the company addresses different market segments. In the cigarette industry, for example, one producer owns the Camel, Winston and Winchester brands, each addressing a different market segment,

- family/umbrella/corporate brands, when all of the company’s products, or a group of them, carry the same name; this is family or umbrella branding. When this name is the corporate name, we have corporate branding. Corporate brands include Shell, Levi’s, Sony, Kodak, Daewoo, Virgin, etc.

Sometimes companies combine a specific individual name with the name of the corporation, for instance Chocapic from Nestle, or Toyota Lexus.

3. According to the number of brands commercialised in one market, the company may have:

- a single brand, which is usually the case when there is high market homogeneity. The main disadvantage of having just one brand in a country is the limited shelf space obtained at retailer level, resulting in lower exposure for the company. The advantage is that brand confusion for the customer is eliminated and more focused and efficient marketing is possible,

- multiple brands, when a company has several brands in one market. This strategy is used when the market is segmented and consumers have various needs. The Coca-Cola company has multiple brands on the Romanian market, among them Coca-Cola, Sprite, Cappy and Fanta. The advantage of this strategy is that the company gains more shelf space (if the consumer does not buy Coca-Cola but buys Fanta, the money still goes to the Coca-Cola company). The disadvantages are higher marketing costs, since different marketing plans and programmes are designed for each brand, and the loss of economies of scale.

4. According to the owner of the brand, there are:

- manufacturers’ brands, like most brands we know: Levi’s, Coca-Cola, Nike, Adidas, etc.,

- private brands, which are retailers’ brands or store brands. Retailers buy products and resell them under their own name. Private brands have recently become very popular: they offer retailers higher margins than manufacturers’ brands, they obtain extensive and better shelf space, they benefit from heavy in-store promotion, and they are usually low-price/good-quality products. In the UK they account for one third of supermarket sales, and their share is increasing in continental Europe, too.

5. According to the geographical spread of the brand, there are:

- global brands, defined by Chee and Harris14 as brands that are marketed with the same positioning and marketing approach in every part of the world. Other authors consider that it is not so easy to define a global brand. However, using global brands (at least the same name everywhere) offers the company some advantages:

 - obtaining economies of scale,

 - building brand awareness more easily, as global brands are more visible than local brands,

 - capitalizing on the media overlap that exists in many regions (for instance, Germany with Austria),

 - increasing the prestige of the company, as a global brand signals to consumers that the company has the resources to compete globally and the will power and commitment to support the brand worldwide.

- local brands, which are more appropriate in certain conditions, such as the following:

 - when there are legal constraints. A few years ago in India, Pepsi was called Lehar because the legislation required all brand names to be local.

 - when the brand name is already used for another product in that country, so another brand name has to be chosen. Budweiser is an American brand of beer, but in Europe a Czech brewer owned the name, so the US company called its beer Bud in Europe.

 - when there are cultural barriers and the global name is either difficult to pronounce or has an undesirable association. A New Zealand dairy company renamed the powdered milk it sold in Malaysia from Anchor (its domestic name) to Fern (a local name) because Anchor was a beer brand heavily advertised in Malaysia. The company considered that consumers would not buy a product intended for children if its name were associated with an alcoholic beverage, especially since a large proportion of the population in Malaysia is Muslim.

Many international companies have used local brands by adapting their domestic brands to the environment of the new foreign markets. Procter and Gamble, for instance, adapted the name of its household cleaner Mr. Clean to European markets by translating it: the brand became Monsieur Propre in France and Meister Proper in Germany15. General Motors also adapted its brand for Europe, even though it was selling the same product: the automobile became Opel in Germany and Vauxhall in the U.K.16

Brand name selection procedures for international markets are therefore important, as the company has to choose whether to adapt or standardize its brand name. A key issue in international marketing is whether to use global or local brands. The decision should be taken according to what each market dictates. In countries where patriotism is high and consumers have a strong buy-local attitude, local brands are recommended. Local brands are also advisable where global brands are not known and where local brands have strong brand equity. When a brand is strong, the company should go global with it. In short, a company should use global brands wherever possible and national/local brands where necessary.

Q.2 a. Why do you think International quality standards are essential in International
business?

Ans: QUALITY CONTROL


Quality control is a process within an organization designed to ensure a set level of quality for the products or services offered by a company. It includes the actions necessary to verify and control the quality of the products and services produced. The overall goal is to meet the customer's requirements and to deliver satisfying, fiscally sound, and dependable output. Most companies provide a service or a product, and quality control determines whether the output being provided is of consistently high quality. Quality is important to companies for liability purposes, name recognition or branding, and maintaining a position against the competition in the marketplace.

This process can be implemented within a company in many ways. Some organizations establish a quality assurance department and test products before they are delivered to the shelves. When quality assurance is used, a set of requirements is determined and the quality assurance team not only verifies that the product meets all of the requirements but also performs fault testing. Companies with a customer service department often implement quality controls by recording phone conversations, sending out customer surveys, and requiring employees to follow a specific set of guidelines when speaking to customers over the phone. Implementing a quality control department or strategy allows a company to find faults or problems with products or services before they reach the customer.
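As a minimal illustration of the idea of verifying output against a set of requirements described above, the sketch below checks each produced unit against a specification before shipment; the product, attributes and tolerances are invented purely for illustration.

# Hypothetical sketch of a quality-control check: each unit is measured and
# compared against the requirement (specification) before it is shipped.

SPEC = {"weight_g": (495.0, 505.0), "length_mm": (99.5, 100.5)}  # allowed ranges

def passes_quality_control(unit):
    """Return True only if every measured attribute falls inside its spec range."""
    return all(lo <= unit[attr] <= hi for attr, (lo, hi) in SPEC.items())

batch = [
    {"weight_g": 501.2, "length_mm": 100.1},   # within spec
    {"weight_g": 507.9, "length_mm": 100.0},   # overweight -> rejected
]

rejects = [u for u in batch if not passes_quality_control(u)]
print(f"{len(rejects)} of {len(batch)} units failed the quality check")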

Sending out products that have defects or providing poor service to customers is costly and all too common. A good strategy and sound quality-control techniques can help eliminate the issues that give a company a bad name, because quality control monitors overall quality by comparing the product or service against the requirements. Making sure that products or services meet or exceed the requirements set forth allows a business to be more successful and to improve as an organization.

Quality control concerns not only products and services but also how well the organization works as a whole, both internally and in the marketplace. A strategy to manage and improve quality within an organization can help a company become and remain a success. Quality is an ongoing effort that must be consistent and improve every day. Every organization or business can benefit from using quality control for its products or services, within the internal organization, and in its interactions in the marketplace.

To be competitive on both a national and a global basis, organizations must adopt a forward-thinking
approach in developing their management strategies. In this article, we will review ISO 9000 and ISO 14000
and suggest how these standards may be used to move an organization toward that paradigm and thus
enable it to compete more effectively in today's global marketplace.

Many of our current quality management and environmental management systems are reactive—that is, they
have been developed in response to federal, state, or local regulations. We need to ask ourselves, is this a
competitive way to work? When we are in this reactive mode, are we really listening to our customers? Are
we able to seek out innovative means of getting the job done?

International standards force companies to look at their processes in a new light and to take a more active
approach to management. For example, if a company wishes to pursue the new environmental standard,
ISO 14000, its environmental management system's pollution control policy will have to be revamped to
focus on prevention rather than command-and-control. As the company moves in that direction it will truly
become more competitive, and will do so on a global basis.

Q.2 b. Give a note on Robotics and flexible manufacturing.

Ans: The most powerful long-term technological trend impinging on the factory of the future is the move toward computer integrated manufacturing. Behind this trend lies the unique capability of the computer to automate, optimize, and integrate the operations of the total system of manufacturing. The vitality of this trend is attested to by technological forecasts made over the past 10 years. The rapidity of the development is due not only to this technological vitality, but also to powerful long-term economic and social forces impinging on manufacturing. As a result, many industrialized nations are pursuing large national programs of research, development, and implementation of computer integrated manufacturing to hasten the technological evolution of computer integrated automatic factories. Programs receiving major emphasis include the development and application of integrated manufacturing software systems, group technology and cellular manufacturing, computer control of manufacturing processes and equipment, computer-controlled robots, flexible manufacturing systems, and prototype computer automated factories. This evolution poses some significant challenges to American industry.

The Role of Robotics in Flexible Manufacturing

When most engineers think about “flexibility,” they imagine robots. Because of programmable controls, end-
of-arm tooling and machine vision systems, the devices can perform a wide variety of repeatable tasks.
“Robotics is a key component of flexible manufacturing,” claims Ted Wodoslawsky, vice president of
marketing at ABB Robotics Inc. (Auburn Hills, MI). “Any applications that involve high-mix, high-volume
assembly require flexible automation. Manufacturers need the ability to run different products on the same
line. That’s much more difficult to do with hard automation.” The automotive industry is still considered to be
the role model for robotic flexibility. However, Wodoslawsky says many of the lessons learned by
automakers and suppliers can easily be applied to other industries and processes. “Automotive
manufacturers are faced with producing a greater mix of vehicles in a shrinking number of plants,” adds
Walter Saxe, automotive business development manager at Applied Robotics Inc. (Glenville, NY). “This
practice is driving the need for higher payloads, faster tool changeover and greater control of data to achieve
maximum flexibility and exacting production details. This in turn is challenging the makers of robots and tools
to stay ahead of the ever-increasing market needs by advancing technologies before they are needed.” For
instance, state-of-the-art robots feature force control, which offers an extra degree of flexibility for critical
applications such as powertrain assembly. Other new tools and features that make robots more suitable for
flexible production applications include open architecture that allows easy integration with commonly used
PLC platforms and offline simulation from desktop computers. “[Manufacturing engineers should ensure
their] controls platform has the ability to manage, manipulate and store all the data that is required with
flexible implementation schemes,” says David Huffstetler, market manager at Staubli Robotics (Duncan, SC).
“It can become a critical issue in places where you least expect it to happen.”

It is true that flexible manufacturing cuts the number of employees needed for production. Quicker equipment changeover between production jobs has a direct bearing on improved capital utilization and also reduces the cost per production job, owing to the decrease in man-hours needed for set-up of equipment. Automated control of the manufacturing process yields consistent and higher-quality output. Fewer man-hours are needed for overall production, which reduces the cost of products. There are also significant savings from reduced indirect labour costs, production errors, repairs, and product rejects.
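A back-of-the-envelope sketch of how quicker changeover translates into cost savings is given below; all figures (set-up hours, number of jobs, labour rate) are hypothetical assumptions, used only to illustrate the reasoning above.

# Hypothetical sketch: cost saving from faster equipment changeover.

setup_hours_before = 6.0     # man-hours per changeover, conventional line (assumed)
setup_hours_after  = 1.5     # man-hours per changeover, flexible cell (assumed)
jobs_per_year      = 800     # production jobs (changeovers) per year (assumed)
labour_rate        = 30.0    # cost per man-hour, in currency units (assumed)

hours_saved   = (setup_hours_before - setup_hours_after) * jobs_per_year
annual_saving = hours_saved * labour_rate

print(f"Man-hours saved per year: {hours_saved:.0f}")
print(f"Direct set-up cost saved per year: {annual_saving:,.0f}")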

Q.3 (a). What is transfer pricing?

Ans: Transfer pricing refers to the setting, analysis, documentation, and adjustment of charges made
between related parties for goods, services, or use of property (including intangible property). Transfer prices
among components of an enterprise may be used to reflect allocation of resources among such components,
or for other purposes. OECD Transfer Pricing Guidelines state, “Transfer prices are significant for both
taxpayers and tax administrations because they determine in large part the income and expenses, and
therefore taxable profits, of associated enterprises in different tax jurisdictions.”

Many governments have adopted transfer pricing rules that apply in determining or adjusting income taxes of
domestic and multinational taxpayers. The OECD has adopted guidelines followed, in whole or in part, by
many of its member countries in adopting rules. United States and Canadian rules are similar in many
respects to OECD guidelines, with certain points of material difference. A few countries follow rules that are
materially different overall.

The rules of nearly all countries permit related parties to set prices in any manner, but permit the tax
authorities to adjust those prices where the prices charged are outside an arm's length range. Rules are
generally provided for determining what constitutes such arm's length prices, and how any analysis should
proceed. Prices actually charged are compared to prices or measures of profitability for unrelated
transactions and parties. The rules generally require that market level, functions, risks, and terms of sale of
unrelated party transactions or activities be reasonably comparable to such items with respect to the related
party transactions or profitability being tested.

Most systems allow use of multiple methods, where appropriate and supported by reliable data, to test
related party prices. Among the commonly used methods are comparable uncontrolled prices, cost plus,
resale price or markup, and profitability based methods. Many systems differentiate methods of testing
goods from those for services or use of property due to inherent differences in business aspects of such
broad types of transactions. Some systems provide mechanisms for sharing or allocation of costs of
acquiring assets (including intangible assets) among related parties in a manner designed to reduce tax
controversy.
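As a simplified, hypothetical illustration of two of the methods named above (cost plus and resale price), the sketch below uses invented figures; the markup and margin percentages stand in for arm's length benchmarks that, in a real analysis, would have to be derived from comparable uncontrolled transactions.

# Hypothetical sketch of two common transfer-pricing methods.
# The 25% markup and 40% resale margin are assumed arm's length benchmarks.

def cost_plus_price(production_cost, arms_length_markup=0.25):
    """Cost plus: transfer price = cost plus an arm's length markup on cost."""
    return production_cost * (1 + arms_length_markup)

def resale_price_method(final_resale_price, arms_length_margin=0.40):
    """Resale price: transfer price = resale price less an arm's length gross margin."""
    return final_resale_price * (1 - arms_length_margin)

print(cost_plus_price(100.0))        # 125.0 charged by the manufacturing affiliate
print(resale_price_method(200.0))    # 120.0 paid by the distribution affiliate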

Most tax treaties and many tax systems provide mechanisms for resolving disputes among taxpayers and
governments in a manner designed to reduce the potential for double taxation. Many systems also permit
advance agreement between taxpayers and one or more governments regarding mechanisms for setting
related party prices.

Many systems impose penalties where the tax authority has adjusted related party prices. Some tax systems
provide that taxpayers may avoid such penalties by preparing documentation in advance regarding prices
charged between the taxpayer and related parties. Some systems require that such documentation be
prepared in advance in all cases.

Q.3(b). Write a short note on Bills of Exchange and Letters of credit.

Ans: A bill of exchange or "draft" is a written order by the drawer to the drawee to pay money to the payee.
A common type of bill of exchange is the cheque (check in American English), defined as a bill of exchange
drawn on a banker and payable on demand. Bills of exchange are used primarily in international trade, and
are written orders by one person to his bank to pay the bearer a specific sum on a specific date. Prior to the
advent of paper currency, bills of exchange were a common means of exchange. They are not used as often
today.

A bill of exchange is an unconditional order in writing addressed by one person to another, signed by the
person giving it, requiring the person to whom it is addressed to pay on demand or at fixed or determinable
future time a sum certain in money to order or to bearer. (Sec.126)

It is essentially an order made by one person to another to pay money to a third person. A bill of exchange
requires in its inception three parties—the drawer, the drawee, and the payee.

In brief, a "bill of exchange" or a "Hundi" is a kind of legal negotiable instrument used to settle a payment at a
future date. It is drawn by a drawer on a drawee wherein drawee accepts the payment liability at a date
stated in the instrument. The Drawer of the Bill of Exchange draw the bill on the drawee and send it to him
for his acceptance. Once accepted by the drawee, it becomes a legitimate negotiable instrument in the
financial market and a debt against the drawee. The drawer may, on acceptance, have the Bill of Exchange
discounted from his bank for immediate payment to have his working capital funds. On due date, the bill is
again presented to the drawee for the payment accepted by him, as stated therein the bill.

A Letter of Credit (LC) is a declaration of financial soundness and commitment, issued by a bank for its client, for the amount stated in the LC document, to the other party (beneficiary) named therein. LCs may or may not be endorsable. In case of default of payment by the party under the obligation to pay, the LC-issuing bank undertakes to honour the payment, with or without conditions. Normally there are sight LCs or DA LCs, and in both cases the LC contains a set of conditions. Bills of exchange may be drawn under the overall limit of the LC amount for payment later on.

Q.4. Discuss the modern theory of international trade along with its criticisms.

Ans: Modern Trade Theories:

The orthodox neo-classical trade theories are the basis of the trade regime advocated by the WTO and GATT. However, comparative advantage and specialization do not explain many real-world trade patterns. Moreover, the assumptions of these theories are very simplistic compared with real-world competitive conditions and other economic factors such as returns to scale, the effect of income levels on demand, the size of the firms involved in trade, and so on. These limitations of orthodox neo-classical trade theory gave birth to modern trade theories and trade policies, as well as to government assistance to industry in international trade.
The assumptions of Modern Trade Theories

The modern trade theories relax the assumptions of the orthodox trade theories. The assumptions of modern
trade theory are:

Non-identical preferences among consumers

Economies of scale rather than constant returns to scale

Imperfect competition and other market structures

The existence of externalities, as opposed to the absence of externalities assumed by orthodox trade theory

The State of Modern Trade theories

At present, modern trade theories are not fully consistent with one another. Nevertheless, they are important building blocks towards a consistent modern trade theory. For example, the Linder hypothesis about preferences, models with economies of scale, and strategic trade policy, even though not mutually consistent, are important building blocks towards a consistent modern trade theory that can assist firms and governments in devising policies and practices in the area of international trade.

A form of globalization and global trading where all nations prosper and develop fairly and equitably is
probably what most people would like to see.

It is common to hear today’s world economic system described as “free trade” or “globalization”. Some describe the historical events leading up to today’s global free trade and the existing system as “inevitable”; the UK’s former Prime Minister, Margaret Thatcher, was famous for her TINA (“there is no alternative”) acronym. Yet the modern world system has hardly been inevitable. Instead, various factors such as political decisions, military might, wars, imperial processes and social changes throughout the last few decades and centuries have pulled the world system in various directions. Today’s world economic system is a result of such processes. Power is always a factor.

Capitalism has been successful in nurturing technological innovation, in promoting initiative, and in creating
wealth (and increasing poverty). Many economists are agreed that in general capitalism can be a powerful
engine for development. But, political interests and specific forms of capitalism can have different results.
The monopoly capitalism of the colonial era for example was very destructive. Likewise, there is growing
criticism of the current model of corporate-led neoliberalism and its version of globalization and capitalism
that has resulted. This criticism comes from many areas including many, many NGOs, developing nation
governments and ordinary citizens.

In March 2003, the IMF itself admitted in a paper that globalization may actually increase the risk of financial crisis in the developing world. “Globalization has heightened these risks since cross-country financial linkages amplify the effects of various shocks and transmit them more quickly across national borders,” the IMF notes, adding that “The evidence presented in this paper suggests that financial integration should be approached cautiously, with good institutions and macroeconomic frameworks viewed as important.” In addition, the IMF admits that it is hard to provide a clear road-map on how this should be achieved, and that instead it should be done on a case-by-case basis. This sounds like a move slightly away from the “one size fits all” style of prescription for which the IMF has long been criticized.

In critical respects I would argue that the problem with economic globalization is that it has not gone far
enough. Major barriers to trade remain in key sectors of export interest to developing countries such as
agriculture and textiles and clothing, and trade remedy actions (antidumping, countervail, and safeguards)
have proliferated (often directed at developing countries), in many cases replacing prior tariffs. Indeed, tariffs
facing developing country exports to high-income countries are, on average, four times those facing
industrial country exports for manufactured goods and much higher again for agricultural products.
Agricultural subsidies in developed countries further restrict effective market access by developing
countries.73 Economic estimates have found that the costs of protection inflicted on developing countries by
developed countries negate most or all of the entire value of foreign aid in recent years.

Q. 5 (a). Make a note of the functions and achievements of UNCTAD.

Ans: UNCTAD:-
UNCTAD was created in 1964 as an expression of the belief that a cooperative effort
of the international community was required to bring about changes in the world economic order that would
allow developing countries to participate more fully in a prospering world economy. UNCTAD was the
product of efforts aimed at countering self-perpetuating asymmetries and inequities in the world economy,
strengthening multilateral institutions and disciplines, and promoting sustained and balanced growth and
development. The creation of UNCTAD marked the commitment of Member States "to lay the foundations of
a better world economic order" through the recognition that "international trade is an important instrument for
economic development".

Despite profound economic and political transformations in the world in the last thirty years, the
essence of UNCTAD's development mission has not changed. Its thrust continues to be to enlarge
opportunities in particular for developing countries to create their own wealth and income and to assist them
to take full advantage of new opportunities.

Functions:

The themes addressed by UNCTAD over the years have included:

- expanding and diversifying the exports of goods and services of developing countries, which are their main
sources of external finance for development;

- encouraging developed countries to adopt supportive policies, particularly by opening their markets and
adjusting their productive structures;

- strengthening international commodity markets on which most developing countries depend for export
earnings and enhancing such earnings through their increased participation in the processing,
marketing and distribution of commodities, and the reduction of that dependence through the
diversification of their economies;

- expanding the export capacity of developing countries by mobilizing domestic and external resources,
including development assistance and foreign investment;

- strengthening technical capabilities and promoting appropriate national policies;

- alleviating the impact of debt on the economies of developing countries and reducing their debt burden;

- supporting the expansion of trade and economic cooperation among developing countries as a mutually
beneficial complement to their traditional economic linkages with developed countries; and

- special measures in support of the world's poorest and most vulnerable countries.

UNCTAD's early years coincided with economic growth particularly in developed countries,
worsening terms of trade for developing countries' exports, especially for commodities, and an increasing
income gap between developed and developing countries. The situation became even more difficult through
the 1980s which came to be known as "the lost decade for development". One consequence was that the
multilateral economic negotiations between developed and developing countries became deadlocked in most
forums. As a result, a perceptible loss of confidence occurred in UNCTAD's role as a facilitator of consensus
and conciliator of divergent views. Multilateralism as a method of dealing with international trade and
development problems was eroded and several countries opted for bilateral approaches.

But the profound changes that took place in the world in the late 1980s forced a reassessment of
international economic cooperation. A fresh consensus emerged in the early 1990s on the need for new
actions to support the international trade and economic development of developing countries. UNCTAD, and
in particular UNCTAD VIII, added impetus to the forging of the development consensus for the 1990s and of
a new partnership for development as envisaged in the Declaration on International Economic Cooperation,
in particular the revitalization of Economic Growth and Development of the Developing Countries, adopted by
the General Assembly at its eighteenth special session held in April-May 1990.

Major achievements

The functions of UNCTAD comprise four building blocks:

(i) policy analysis;

(ii) intergovernmental deliberation, consensus-building and negotiations;

(iii) monitoring, implementation and follow-up; and

(iv) technical cooperation.

UNCTAD VIII added a new dimension, namely the exchange of experiences among Member States so as to
enable them to draw appropriate lessons for the formulation and implementation of policies at the national
and international levels. These functions are interrelated and call for constant cross-fertilization between the
relevant activities. Thus, UNCTAD is at once a negotiating instrument, a deliberative forum, a generator of
new ideas and concepts, and a provider of technical assistance. As a result of this multifaceted mandate,
UNCTAD was entrusted with a wide spectrum of activities cutting across several dimensions of development.

Its achievements have therefore been of different kinds and of varying impact. Among the most
significant achievements reported to the Inspector by the UNCTAD secretariat could be included:

- the agreement on the Generalized System of Preferences (GSP) (1971), under which over $70 billion worth
of developing countries' exports receive preferential treatment in most developed country markets
every year;

- the setting up of the Global System of Trade Preferences among Developing Countries (1989);

- the adoption of the Set of Multilaterally Agreed Principles for the Control of Restrictive Business Practices
(1980);

- negotiations of International Commodity Agreements, including those for cocoa, sugar, natural rubber, jute
and jute products, tropical timber, tin, olive oil and wheat;

- the establishment of transparent market mechanisms in the form of intergovernmental commodity expert
and study groups, involving consumers and producers, including those for iron ore, tungsten, copper
and nickel;

- the negotiation of the Common Fund for Commodities (1989), set up to provide financial backing for the
operation of international stocks and for research and development projects in the field of
commodities, and which did not fulfil many expectations of the developing countries;

- the adoption of the resolution on the retroactive adjustment of terms of Official Development Assistance
(ODA) debt of low-income developing countries under which more than fifty of the poorer developing
countries have benefited from debt relief of over $6.5 billion;

- the establishment of guidelines for international action in the area of debt rescheduling (1980);

- the Agreement on a Special New Programme of Action for the Least Developed Countries (1981);

- the Programme of Action for the Least Developed Countries for the 1990s (1990);

- the negotiation of conventions in the area of maritime transport: United Nations Convention on a Code of Conduct for Liner Conferences (1974), United Nations Convention on International Carriage of
Goods by Sea (1978), United Nations Convention on International Multimodal Transport of Goods
(1980), United Nations Convention on Conditions for Registration of Ships (1986), United Nations
Convention on Maritime Liens and Mortgages (1993).

In addition, UNCTAD made some contributions on matters for implementation in other fora, such as:

- the agreement on ODA targets, including the 0.7 per cent of GNP target for aid to developing countries in general and the 0.20 per cent target for LDCs;

- the improvement of the IMF's compensatory financial facility for export earnings shortfalls of developing
countries;

- the creation of the Special Drawing Rights (SDRs) by the IMF;

- the reduction of commercial bank debt for the highly indebted countries promoted by the World Bank;

- the principle of "enabling clause" for preferential treatment of developing countries which were later
reflected in GATT legal instruments, e.g., Part IV of GATT on trade and development.

UNCTAD has also made a valuable contribution at the practical level, especially in the formulation of
national policies, instruments, rules and regulations, as well as in the development of national institutions,
infrastructure and human resources, in practically all its fields of activity. These achievements, usually
involving an important technical cooperation component, have proved their value and have been much
appreciated by the Governments concerned. Special mention should be made of UNCTAD's computerized
systems in the area of customs (ASYCUDA) and debt management (DMFAS) which are considered among
the best products on the market.

Furthermore, UNCTAD supported the Uruguay Round negotiations by assisting developing countries in understanding the implications for their economies of the discussions on various issues or sectors and in defining their positions for the negotiations. For this purpose, UNCTAD prepared special studies on specific issues and provided relevant trade information and advice at regional and national level within its technical assistance programme. Through its three annual flagship publications, namely the Trade and Development Report, the World Investment Report and the Least Developed Countries Report, the UNCTAD secretariat has made a significant contribution to international understanding of major economic and development issues.

Q. 5 (b). Give reasons for the slow growth towards achieving international accounting
standards.

Ans:
The rapid growth of international trade, the internationalization of firms, the development of new communication technologies, and the emergence of international competitive forces are perturbing the financial environment to a great extent. In this global business scenario, the business community is badly in need of a common accounting language spoken across the globe. A financial reporting system of global standard is a pre-requisite for attracting foreign as well as present and prospective domestic investors, and it should be achieved through the harmonization of accounting standards.

Accounting Standards are policy documents (authoritative statements of best accounting practice) issued by recognized expert accountancy bodies relating to various aspects of the measurement, treatment and disclosure of accounting transactions and events. They relate to the codification of Generally Accepted Accounting Principles (GAAP) and are norms of accounting policies and practices, issued by way of codes or guidelines, that direct how the items which make up the financial statements should be dealt with in the accounts and presented in the annual accounts. The aim of setting standards is to bring about uniformity in financial reporting and to ensure consistency and comparability in the data published by enterprises.

Accounting standards prevalent all across the world:

* Accounting standards are being established both at national and international levels. But the variety of accounting standards and principles among the nations of the world has been a persistent problem for the globalizing business environment.
* There are several standard setting bodies and organizations that are now actively involved in the process
of harmonization of accounting practices. The most remarkable phenomenon in the sphere of promoting
global harmonization process in accounting is the emergence of international accounting standards.
* In India the Accounting Standards Board (ASB) was constituted by the Institute of Chartered Accountants
of India (ICAI) on 21st April 1977 with the function of formulating accounting standards.
* Accounting standards vary from one country to another. There are various factors that are responsible for
this. Some of the important factors are

- legal structure
- sources of corporate finance
- maturity of the accounting profession
- degree of conformity of financial accounts
- government participation in accounting, and
- degree of exposure to international markets.

* Diversity in accounting standards not only means additional costs of financial reporting but can also cause difficulties for multinational groups in the manner in which they undertake transactions. It is quite possible for a transaction to give rise to a profit under the accounting standards of one country whereas it may require a deferral under the standards of another.

Issues in adopting global accounting standards:


There seems to be a reluctance to adopt the International Accounting Standards Committee (IASC) norms in the US.

This is definitely a problem. The US is the largest market and it is important for IASC standards to be
harmonized with those prevailing there. The US lobby is strong, and they have formed the G4 nations, with
the UK, Canada, and Australia (with New Zealand) as the other members. IASC merely enjoys observer
status in the meetings of the G4, and cannot vote. Even when the standards are only slightly different, the
US accounting body treats them as a big difference, the idea being to show that their standards are the best.
We have to work towards bringing about greater acceptance of the IASC standards.

How real is the threat from G4?


G4 has evolved into a standard-setting body and has recently issued its first standard, on the pooling-of-interests method. (Mergers can be either in the nature of a purchase or in the form of a pooling of interests, as in HLL-BBLIL.) It is also expected to publish new or revised papers on reporting financial performance, business combinations, joint ventures, leases, and contributions. So far, the FASB (the US standard-setting body) has been the world's standard setter because of mandatory compliance with US GAAP for listing on the New York Stock Exchange (NYSE). The US Congress had, however, to step in and overrule the FASB standard on stock options.

The current status of IAS (Indian Accounting Standards):


In India, Statements on Accounting Standards are issued by the Institute of Chartered Accountants of India (ICAI) to establish standards that have to be complied with so that financial statements are prepared in accordance with generally accepted accounting standards in India (Indian GAAP). From 1973 to 2000 the IASC issued 32 accounting standards. In principle, any country that is interested in and confident about these standards may adopt them; in practice, however, many countries have not adopted them in presenting accounting information. An analysis of the time gap for the Indianisation of International Accounting Standards shows that the average gap is 6.13 years, that is, it takes about 6.13 years to adopt one international accounting standard in India. This points to weak research and development work in the accounting field.

A significant criticism of IAS:


* The standards are too broad-based and general to ensure that similar accounting methods are applied in similar circumstances. An instance is the accounting for expenses incurred under a Voluntary Retirement Scheme (VRS), in which the methods used range from pay-as-you-go to amortization of the present value of future pension payments over the period of benefit.
* It may be noted that in several important areas, when the Indian standards are implemented, the accounting treatment could lead to differences on restatement of accounts in accordance with US GAAP. Some of these areas are:

- Consolidated financial statements

- Accounting for taxes on income


- Financial Instruments
- Intangible Assets

Restatement to US GAAP :
A restatement of financial statements prepared under India GAAP to U.S. GAAP requires careful planning in
the following areas:
- Involvement of personnel within the accounts function and the time frame within which the task is to be
completed.
- Identification of significant accounting policies that would need to be disclosed under U.S. GAAP and the
differences that exist between India GAAP and U.S. GAAP
- The extent of training required within the organisation to create an awareness of the requirements under
U.S. GAAP
- Subsidiaries and associate companies and restatement of their accounts in conformity with U.S. GAAP
- Adjustment entries that are required for conversion of India GAAP accounts.
- Reconciliation of differences arising on restatement to U.S. GAAP in respect of income for the periods
under review and for the statement of Shareholder's equity.

The timetable for restatement of the financial statements to US GAAP would depend upon the size of the company, the nature of its operations, and the number of subsidiaries and associates. The process of conversion would normally take up to 16 weeks in a large company in the initial year. It is thus necessary to streamline the accounting systems to provide for restatement to U.S. GAAP on a continuing basis. At first sight the restatement of financial statements in accordance with U.S. GAAP appears to be formidable. However, as the Indian accounting standards are built on the foundation of international accounting standards, on which a truly global GAAP might be built, there is no cause for concern.

Another reason for the prevailing divergent accounting practices is that the Accounting Standards, the provisions of the Income Tax Act 1961 and the Indian Companies Act 1956 do not go together.

(a) Company law and Accounting Standards:


In India, though accounting standard setting is presently done by the ICAI, one can discern a tentative and half-hearted foray by company legislation into the making of accounting rules of measurement and reporting. This action by itself is not the sore point; what is disturbing is the failure to keep pace with changes while simultaneously not allowing scope for someone else to do so. A study of the requirements of company law regarding financial statements reveals several lacunae, such as earnings per share, information about future cash flows, consolidation, mergers, acquisitions, etc.

(b) Income Tax Act and Accounting Standards:


The Income Tax Act does not recognize the accounting standards for most of the items while computing
income under the head "Profits & Gains of Business or Profession". Section 145(2) of the I.T. Act has
empowered the Central Government to prescribe accounting standards. The standards prescribed so far
constitute a rehash of the related accounting standards prescribed by ICAI for corporate accounting. On a
close scrutiny of these standards one is left wondering about the purpose and value of this effort. Examples
are the application of prudence, substance over form, adherence to the going-concern principle, etc.

(c) Other regulations and accounting standards:


In respect of banks, financial institutions, and finance companies, the Reserve Bank of India (RBI) pronounces policies on, among other things, revenue recognition, provisioning and asset classification. Similarly, the Foreign Exchange Dealers' Association of India (FEDAI) provides guidelines regarding accounting for foreign exchange transactions. Since the Securities & Exchange Board of India (SEBI) is an important regulatory body, it would also like to have its own accounting standards and, in fact, it has started the process by notifying a cash flow reporting format. It is also in the process of issuing a standard on the accounting policies for mutual funds. It appears as if several authorities in our country are keen to have a say in the matter of framing accounting rules of measurement and reporting. This tentative and half-hearted legal and regulatory intervention in accounting has come in the way of the development of a robust, continuously evolving and dynamic accounting theory and standards.

India is slowly entering the arena of accounting standards. But the progress of formulation of accounting
standards has been very slow compared with the developments at international levels. Differences are still
there but they are narrowing. It is expected that the pace of progress in the sphere of harmonization will
accelerate further in the coming years.

Q. 6 (a). Give a note on the Japanese approach to HRM.

Ans: When it comes to Human Resource Management, much ado has been made in recent years over the comparison between East and West. That comparison usually boils down to Japanese HRM practices that are slow to change in the face of increasing global competitiveness versus their more adaptable U.S. counterparts. Optimists even proclaim that Japan is finally showing signs of "catching up." However, experts say this comparison is too often oversimplified.

Take, for example, a study led by Markus Pudelko of the University of Edinburgh Management School last
year. It confirms that the "seniority principle" for which Japan's traditional HRM model is famed is waning
more than any other principle and will likely continue to do so. It even echoes a 2002 study conducted at the
University of Melbourne, which also notes that while surveyed firms were undergoing changes in HRM only
one in four could be considered transformative, according to the Australian Human Resource Institute.

While the Melbourne study raises questions about what specific changes lie ahead for Japanese firms adopting more performance-based promotion and compensation systems, Pudelko's study suggested that whatever changes are made in this area, they should be suited to the Japanese context. Others agree, not only regarding promotion and compensation, but also regarding broader aspects of Japanese HRM. Needless to say, they also raise questions about which specific HR practices in Japan need changing.

While it's generally agreed that more emphasis on performance than seniority and a more equitable
assessment of employees could go a long way to improve morale and competitiveness, little has been
offered in the realm of revamping Japanese HRM.

"The perception that Japanese companies have to become more like U.S. companies to survive in this global
environment isn't borne out by fact," Sanford M. Jacoby, author and professor at UCLA Anderson School of
Management told Veritude newsletter in 2005. A survey of Japanese and U.S. firms he coauthored found
when it comes to HR, while some Japanese companies are more likely to hire mid-career staff, have boards
made up primarily of outsiders and use financial incentives like their U.S. counterparts, they are still in the
minority.

In fact, Jacoby's study shows that some U.S. companies are looking a bit more like Japanese firms, though
they also are a minority. Some, especially those more insulated from financial markets, are viewing HR as an
essential "resource-based" asset for business strategy and pay close attention to human assets or
intellectual capital. They invest heavily in training and retention and their HR executives play a major role in
grooming executives for senior posts - much as firms traditionally have done in Japan. The implication,
counter to conventional wisdom abroad, is that this is a good thing.

According to Jacoby, it is wise for HR executives to further this by positioning themselves much as CFOs have in the past two decades. In that respect Japanese HRM, which continues to play a central role ranging from performance assessment and overall training to strategizing and senior promotions (even before Enron and the Sarbanes-Oxley act helped cast doubt on outside executive hires as a panacea), may have as much to teach as to learn.

What seems to be missing from the typical Japan-U.S. comparison is recognition that HRM and other
business models naturally change over time and each has its pluses and minuses, Jacoby says. Just as the
shareholder-value model gave rise to the excesses that led to Enron and WorldCom, he notes, Japanese
firms such as Canon and Toyota have done well by the traditional Japanese model.

As Japanese companies turn more toward other Asian nations for trade, especially burgeoning China, pressure to adopt Western HR models may decrease, weakening the trend of succumbing to such pressure. And as Japan's government continues to spearhead regional HR development through Official Development Assistance (ODA) and the Employment and Human Resources Development Organization of Japan (EHDO), Japan may in the long run do as much exporting of HR-management models as importing. It's a possibility that extends well beyond Japanese governmental aid.

In the private sector, just last year Japanese HR giant Recruit Co., Ltd. tied up with its Chinese counterpart
51jobs.com to collaborate on the development of 51job's products and services in China. The deal allows
Recruit to buy up to a 40 percent stake in the firm and it will be sharing its management experience as well
as technical expertise, according to ChinaTechNews.com. It would seem prudent for those working in
Japan's HR industry to learn, as well as look, before they leap - especially if they are eyeing future prospects
elsewhere in the region.

Q. 6 (b). Explain briefly the Purchasing power parity theory.

Ans: Purchasing power parity (PPP) is a theory of long-term equilibrium exchange rates based on relative
price levels of two countries. The idea originated with the School of Salamanca in the 16th century and was
developed in its modern form by Gustav Cassel in 1918.[2] The concept is founded on the law of one price;
the idea that in absence of transaction costs, identical goods will have the same price in different markets.

In its "absolute" version, the purchasing power of different currencies is equalized for a given basket of
goods. In the "relative" version, the difference in the rate of change in prices at home and abroad—the
difference in the inflation rates—is equal to the percentage depreciation or appreciation of the exchange rate.

The best-known and most-used purchasing power parity exchange rate is the Geary-Khamis dollar (the
"international dollar").

PPP exchange rate (the "real exchange rate") fluctuations are mostly due to different rates of inflation
between the two economies. Aside from this volatility, consistent deviations of the market and PPP exchange
rates are observed, for example (market exchange rate) prices of non-traded goods and services are usually
lower where incomes are lower. (A U.S. dollar exchanged and spent in India will buy more haircuts than a
dollar spent in the United States). Basically, PPP deduces exchange rates between currencies by finding
goods available for purchase in both currencies and comparing the total cost for those goods in each
currency.

There can be marked differences between PPP and market exchange rates. For example, the World Bank's World Development Indicators 2005 estimated that in 2003, one Geary-Khamis dollar was equivalent to about 1.8 Chinese yuan by purchasing power parity[5], considerably different from the nominal exchange rate. This discrepancy has large implications; for instance, GDP per capita in the People's Republic of China is about US$1,800, while on a PPP basis it is about US$7,204. This is frequently used to assert that China is the world's second-largest economy, but such a calculation would only be valid under the PPP theory. At the other extreme, Denmark's nominal GDP per capita is around US$62,100, but its PPP figure is only US$37,304.

Types of PPP

There are two types of PPP. They are:

Absolute Purchasing Power Parity, which is based on the maintenance of equal prices for the same basket of goods in the two countries concerned.
Relative PPP, which relates the exchange rate to inflation: the rate of appreciation or depreciation of a currency is given by the difference between the inflation rates of the two countries.

Calculation of PPP

Purchasing Power Parity is calculated by comparing the price of an identical good in both countries. The Economist magazine's Big Mac ("hamburger") index presents such a comparison in a jovial manner every year. The calculation is not free of problems, however, because consumers in every country consume different baskets of products. Another index is the iPod index: since the iPod is considered a standard consumer product these days, PPP can be calculated by comparing its price across countries.
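A minimal sketch of the calculation described above is given below, assuming a single identical good (or basket) priced in each currency; the prices and inflation rates are invented for illustration.

# Sketch of absolute and relative PPP.
# Absolute PPP: implied exchange rate = home-currency price / foreign-currency price
# for an identical good or basket (the logic behind the hamburger index).
# Relative PPP: % change in the exchange rate ~= inflation(home) - inflation(abroad).

price_home    = 150.0   # price of the basket in home currency (assumed)
price_foreign = 3.0     # price of the same basket in foreign currency (assumed)

ppp_rate = price_home / price_foreign          # home-currency units per foreign unit
print(f"Implied PPP exchange rate: {ppp_rate:.2f} home units per foreign unit")

inflation_home, inflation_foreign = 0.08, 0.03  # assumed annual inflation rates
expected_depreciation = inflation_home - inflation_foreign
print(f"Relative PPP: home currency expected to depreciate ~{expected_depreciation:.0%} per year")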

The PPP is unable to display the right picture of the standard of living. There are certain difficulties
since the PPP number vary with specific amount of goods. PPP is very often utilize
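The absolute-PPP calculation described above can be sketched in a few lines; the prices and market rate below are
made-up example figures, not data from the text:

# Illustrative absolute-PPP calculation from the price of a single identical good.
# All figures are assumptions for the sake of the example, not real market data.

def implied_ppp_rate(price_in_b: float, price_in_a: float) -> float:
    """Units of currency B per unit of currency A implied by equal purchasing power."""
    return price_in_b / price_in_a

def market_vs_ppp_gap(market_rate: float, ppp_rate: float) -> float:
    """Positive => the good is cheaper in country B than the market rate suggests."""
    return (market_rate - ppp_rate) / ppp_rate

if __name__ == "__main__":
    price_usd = 4.00                # price of the identical good in country A (assumed)
    price_inr = 120.00              # price of the same good in country B (assumed)
    market_rate_inr_per_usd = 45.0  # assumed nominal market exchange rate

    ppp = implied_ppp_rate(price_inr, price_usd)           # 30.0 INR per USD
    gap = market_vs_ppp_gap(market_rate_inr_per_usd, ppp)  # 0.5, i.e. 50%

    print(f"Implied PPP rate : {ppp:.1f} INR/USD")
    print(f"Market rate sits {gap:.0%} above the PPP rate")

For these assumed figures the implied PPP rate is 30 rupees per dollar, while the assumed market rate of 45 lies 50
percent above it; this is the kind of gap that the haircut example above illustrates.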

Assignment (Set-1)
Subject code: MI0033

Software Engineering

Q. 1 : Discuss the Objectives & Principles Behind Software Testing.

Ans. Software Testing Fundamentals:

Testing presents an interesting anomaly for the software engineer. During earlier software engineering
activities, the engineer attempts to build software from an abstract concept to a tangible product. Now comes
testing. The engineer creates a series of test cases that are intended to “demolish” the software that has been
built. Testing is the one step in the software process that could be viewed (psychologically, at least) as
destructive rather than constructive.

(i) Testing Objectives :

In an excellent book on software testing, Glen Myers states a number of rules that can serve well as
testing objectives:

1. Testing is a process of executing a program with the intent of finding an error.

2. A good test case is one that has a high probability of finding an as-yet-undiscovered error.

3. A successful test is one that uncovers an as-yet-undiscovered error.

These objectives imply a dramatic change in viewpoint. They move counter to the commonly held view that a
successful test is one in which no errors are found. Our objective is to design tests that systematically
uncover different classes of errors, and to do so with a minimum amount of time and effort. If testing is
conducted successfully (according to the objectives stated previously), it will uncover errors in the software.
As a secondary benefit, testing demonstrates that software functions appear to be working according to
specification, and that behavioral and performance requirements appear to have been met. In addition, data
collected as testing is conducted provide a good indication of software reliability, and some indication of
software quality as a whole. But testing cannot show the absence of errors and defects; it can show only that
software errors and defects are present. It is important to keep this (rather gloomy) statement in mind as
testing is being conducted.
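As a small, hypothetical illustration of these objectives (the function and tests below are mine, not from the text), a
good test case deliberately targets boundary and invalid inputs, because those have a high probability of exposing an
as-yet-undiscovered error:

import unittest

def classify_triangle(a: int, b: int, c: int) -> str:
    """Hypothetical unit under test: classify a triangle by its side lengths."""
    if a <= 0 or b <= 0 or c <= 0:
        return "invalid"
    if a + b <= c or a + c <= b or b + c <= a:
        return "invalid"  # violates the triangle inequality
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

class TriangleTests(unittest.TestCase):
    def test_degenerate_boundary(self):
        # Boundary case: 1 + 2 == 3, a likely hiding place for an off-by-one defect.
        self.assertEqual(classify_triangle(1, 2, 3), "invalid")

    def test_zero_and_negative_sides(self):
        self.assertEqual(classify_triangle(0, 1, 1), "invalid")
        self.assertEqual(classify_triangle(-1, 2, 2), "invalid")

    def test_normal_classes(self):
        self.assertEqual(classify_triangle(3, 3, 3), "equilateral")
        self.assertEqual(classify_triangle(3, 3, 4), "isosceles")
        self.assertEqual(classify_triangle(3, 4, 5), "scalene")

if __name__ == "__main__":
    unittest.main()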

(ii) Testing Principles :

Before applying methods to design effective test cases, a software engineer must understand the basic
principles that guide software testing. Davis [DAV95] suggests a set of testing principles that have
been adapted for use in this book :

- All tests should be traceable to customer requirements. As we have seen, the objective of
software testing is to uncover errors. It follows that the most severe defects (from the customer’s
point of view) are those that cause the program to fail to meet its requirements.

- Tests should be planned long before testing begins. Test planning can begin as soon as the
requirements model is complete. Detailed definition of test cases can begin as soon as the design
model has been solidified. Therefore, all tests can be planned and designed before any code has
been generated.

- The Pareto principle applies to software testing. Stated simply, the Pareto principle implies that
80 percent of all errors uncovered during testing will most likely be traceable to 20 percent of all
program components. The problem, of course, is to isolate these suspect components and to
thoroughly test them.

- Testing should begin “in the small” and progress toward testing “in the large”. The first
tests planned and executed generally focus on individual components. As testing progresses,
focus shifts in an attempt to find errors in integrated clusters of components and ultimately in the
entire system.

- Exhaustive testing is not possible. The number of path permutations for even a moderately
sized program is exceptionally large. For this reason, it is impossible to execute every combination
of paths during testing. It is possible, however, to adequately cover program logic and to ensure
that all conditions in the component-level design have been exercised.

- To be most effective, testing should be conducted by an independent third party. By most
effective, we mean testing that has the highest probability of finding errors (the primary objective of
testing). For reasons that have been introduced earlier in this unit, the software engineer who
created the system is not the best person to conduct all tests for the software.

(iii) Testability :

In ideal circumstances, a software engineer designs a computer program, a system, or a product with
“testability” in mind. This enables the individuals charged with testing to design effective test cases
more easily. But what is testability ? James Bach describes testability in the following manner.
Software testability is simply how easily [a computer program] can be tested. Since testing is so
profoundly difficult, it pays to know what can be done to streamline it. Sometimes programmers are
willing to do things that will help the testing process and a checklist of possible design points, features,
etc., can be useful in negotiating with them. There are certainly metrics that could be used to measure
testability in most of its aspects. Sometimes, testability is used to mean how adequately a particular set
of tests will cover the product. It’s also used by the military to mean how easily a tool can be checked
and repaired in the field. Those two meanings are not the same as software testability. A checklist of
characteristics that lead to testable software includes operability, observability, controllability,
decomposability, simplicity, stability, and understandability.

Q. 2. Discuss the CMM 5 Levels for Software Process.

Ans. The Software Process :

In recent years, there has been a significant emphasis on “process maturity”. The Software Engineering
Institute (SEI) has developed a comprehensive model predicated on a set of software engineering
capabilities that should be present as organizations reach different levels of process maturity. To determine
an organization’s current state of process maturity, the SEI uses an assessment that results in a five point
grading scheme. The grading scheme determines compliance with a capability maturity model (CMM)
[PAU93] that defines key activities required at different levels of process maturity. The SEI approach
provides a measure of the global effectiveness of a company’s software engineering practices, and
establishes five process maturity levels that are defined in the following manner :

Level 1 : Initial – The Software process is characterized as ad hoc and occasionally even chaotic. Few
processes are defined, and success depends on individual effort.

Level 2 : Repeatable – Basic project management processes are established to track cost, schedule, and
functionality. The necessary process discipline is in place to repeat earlier successes on projects with similar
applications.

Level 3 : Defined – The software process for both management and engineering activities is documented,
standardized, and integrated into an organization-wide software process. All projects use a documented and
approved version of the organization's process for developing and supporting software. This level includes all
characteristics defined for level 2.

Level 4 : Managed – Detailed measures of the software process and product quality are collected. Both the
software process and products are quantitatively understood and controlled using detailed measures. This
level includes all characteristics defined for
level 3.

Level 5 : Optimizing – Continuous process improvement is enabled by quantitative feedback from the
process and from testing innovative ideas and technologies. This level includes all characteristics defined for
level 4. The five levels defined by the SEI were derived as a consequence of evaluating responses to the SEI
assessment questionnaire that is based on the CMM. The results of the questionnaire are distilled to a single
numerical grade that provides an indication of an organization’s process maturity.

The SEI has associated key process areas (KPAs) with each of the maturity levels. The KPAs describe those
software engineering functions (e.g., software project planning, requirements management) that must be
present to satisfy good practice at a particular level. Each KPA is described by identifying the following
characteristics :

- Goals – the overall objectives that the KPA must achieve.

- Commitments – requirements (imposed on the organization) that must be met to achieve the goals,
or provide proof of intent to comply with the goals.

- Abilities – those things that must be in place (organizationally and technically) to enable the organization
to meet the commitments.

- Activities – the specific tasks required to achieve the KPA function.

- Methods for monitoring implementation – the manner in which the activities are monitored as
they are put into place.

- Methods for verifying implementation – the manner in which proper practice for the KPA can be
verified.

Q.3. Discuss the Water Fall Model for Software Development.

Ans. The Linear Sequential Model :

Sometimes called the classic life cycle or the waterfall model, the linear sequential model suggests a
systematic, sequential approach to software development that begins at the system level and progresses
through analysis, design, coding, testing, and support.
Modeled after a conventional engineering cycle, the linear sequential model encompasses the following
activities :

System / information engineering and modeling – because software is always part of a larger system (or
business), work begins by establishing requirements for all system elements and then allocating some
subset of these requirements to software. This system view is essential when software must interact with
other elements such as hardware, people, and databases. System engineering and analysis encompass
requirements gathering at the system level, with a small amount of top level design and analysis. Information
engineering encompasses requirements gathering at the strategic business level and at the business area
level.

Software requirements analysis : The requirements gathering process is intensified and focused
specifically on software. To understand the nature of the program(s) to be built, the software engineer
(“analyst”) must understand the information domain for the software, as well as required function, behavior,
performance, and interface. Requirements for both the system and the software are documented and
reviewed with the customer.

Design – Software design is actually a multistep process that focuses on four distinct attributes of a program
: data structure, software architecture, interface representations, and procedural (algorithmic) detail. The
design process translates requirements into a representation of the software that can be assessed for quality
before coding begins. Like requirements, the design is documented and becomes part of the software
configuration.

Code generation – The design must be translated into a machine-readable form. The code generation step
performs this task. If design is performed in a detailed manner, code generation can be accomplished
mechanistically.

Test – Once the code has been generated, program testing begins. The testing process focuses on the
logical internals of the software, ensuring that all statements have been tested, and on the functional
externals; that is, conducting tests to uncover errors and ensure that defined input will produce actual results
that agree with the required results.

Support – Software will undoubtedly undergo change after it is delivered to the customer (a possible
exception is embedded software). Change will occur because errors have been encountered, because the
software must be adapted to accommodate changes in its external environment (e.g. a change required
because of a new operating system or peripheral device), or because the customer requires functional or
performance enhancements. Software support / maintenance reapplies each of the preceding phases to an
existing program rather than a new one. The linear sequential model is the oldest and the most widely used
paradigm for software engineering. However, criticism of the paradigm has caused even active supporters to
question its efficacy [HAN95]. Among the problems that are sometimes encountered when the linear
sequential model is applied are :

1. Real projects rarely follow the sequential flow that the model proposes. Although the linear model
can accommodate iteration, it does so indirectly. As a result, changes can cause confusion as the
project team proceeds.

2. It is often difficult for the customer to state all requirements explicitly. The linear sequential model
requires this and has difficulty accommodating the natural uncertainty that exists at the beginning of
many projects.

3. The customer must have patience. A working version of the program(s) will not be available until late
in the project time-span. A major blunder, if undetected until the working program is reviewed, can
be disastrous.

In an interesting analysis of actual projects, Bradac [BRA94] found that the linear nature of the classic life
cycle leads to “blocking states” in which some project team members must wait for other members of the
team to complete dependent tasks. In fact, the time spent waiting can exceed the time spent on productive
work ! The blocking state tends to be more prevalent at the beginning and end of a linear sequential process.
Each of these problems is real. However, the classic life cycle paradigm has a definite and important place in
software engineering work. It provides a template into which methods for analysis, design, coding, testing,
and support can be placed. The classic life cycle remains a widely used procedural model for software
engineering. While it does have weaknesses, it is significantly better than a haphazard approach to software
development.

Q. 4 . Explain the different types of Software Measurement Techniques.

Ans. Software Measurement Techniques :


Measurements in the physical world can be categorized in two ways : direct measures (e.g. the length of a
bolt) and indirect measures (e.g. the “quality” of bolts produced, measured by counting rejects). Software
metrics can be categorized similarly. Direct measures of the software engineering process include cost and
effort applied. Direct measures of the product include lines of code (LOC) produced, execution speed,
memory size, and defects reported over some set period of time. Indirect measures of the product include
functionality, quality, complexity, efficiency, reliability, maintainability, and many other “- abilities”.

1. Size Oriented Metrics :

Size-oriented software metrics are derived by normalizing quality and / or productivity measures by
considering the size of the software that has been produced. If a software organization maintains simple
records, a table of size-oriented measures can be created. The table lists each software development project
that has been completed over the past few years and corresponding measures for that project. For project
alpha, for example, 12,100 lines of code were developed with 24 person-months of effort at a cost of
$168,000. It should be noted that the effort
and cost recorded in the table represent all software engineering activities (analysis, design, code, and test),
not just coding. Further information for project alpha indicates that 365 pages of documentation were
developed, 134 errors were recorded before the software was released, and 29 defects were encountered
after release to the customer within the first year of operation. Three people worked on the development of
software for project alpha.
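Normalizing these project-alpha figures by size gives the usual size-oriented metrics; the small sketch below simply
performs that arithmetic (the derived values are computed, not quoted from any table):

# Size-oriented metrics for "project alpha", normalized by KLOC.
loc = 12_100               # lines of code delivered
effort_pm = 24             # person-months (all software engineering activities)
cost_usd = 168_000         # total cost
doc_pages = 365            # pages of documentation
errors_pre_release = 134   # errors recorded before release
defects_post_release = 29  # defects reported in the first year of operation

kloc = loc / 1000
metrics = {
    "errors per KLOC":      errors_pre_release / kloc,
    "defects per KLOC":     defects_post_release / kloc,
    "cost (USD) per LOC":   cost_usd / loc,
    "doc pages per KLOC":   doc_pages / kloc,
    "LOC per person-month": loc / effort_pm,
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

For these figures the table would show roughly 11.1 errors per KLOC, 2.4 defects per KLOC, about $13.9 per line of
code and around 504 LOC per person-month.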

2. Function Oriented Metrics :

Function-oriented software metrics use a measure of the functionality delivered by the application as a
normalization value. Since ‘functionality’ cannot be measured directly, it must be derived indirectly using
other direct measures. Function-oriented metrics were first proposed by Albrecht [ALB79], who suggested a
measure called the function point. Function points are derived using an empirical relationship based on
countable (direct) measures of software’s information domain and assessments of software complexity.
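A brief sketch of the usual function-point arithmetic follows; the domain counts, weights and adjustment ratings are
illustrative assumptions, but the overall form, FP = count-total x (0.65 + 0.01 x sum of the fourteen adjustment
values), is Albrecht's:

# Illustrative function-point computation (Albrecht-style).
# Counts, weights and ratings below are assumed example values.

domain_values = {
    # name: (count, weight), using "average" complexity weights
    "external inputs":          (32, 4),
    "external outputs":         (60, 5),
    "external inquiries":       (24, 4),
    "internal logical files":   (8, 10),
    "external interface files": (2, 7),
}

count_total = sum(count * weight for count, weight in domain_values.values())

# Fourteen value-adjustment factors, each rated 0 (no influence) to 5 (essential).
adjustment_ratings = [3, 4, 2, 5, 3, 3, 4, 2, 4, 3, 3, 2, 4, 5]

fp = count_total * (0.65 + 0.01 * sum(adjustment_ratings))
print(f"Count total: {count_total}, function points: {fp:.1f}")  # 618 and 692.2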

3. Extended Function Point Metrics :

The function point measure was originally designed to be applied to business information systems
applications. To accommodate these applications, the data dimension (the information domain values
discussed previously) was emphasized to the exclusion of the functional and behavioral (control)
dimensions. For this reason, the function point measure was inadequate for many engineering and
embedded systems (which emphasize function and control). A number of extensions to the basic function
point measure have been proposed to remedy this situation.

Q. 5. Explain the COCOMO Model & Software Estimation Technique.

Ans. Software Estimation Technique :

Software cost and effort estimation will never be an exact science. Too many variables – human, technical,
environmental, political – can affect the ultimate cost of software and effort applied to develop it. However,
software project estimation can be transformed from a black art to a series of systematic steps that provide
estimates with acceptable risk.

To achieve reliable cost and effort estimates, a number of options arise :

1. Delay estimation until late in the project (obviously, we can achieve 100% accurate estimates after the
project is complete !).

2. Base estimates on similar projects that have already been completed.

3. Use relatively simple decomposition techniques to generate project cost and effort estimates.

4. Use one or more empirical models for software cost and effort estimation.

Unfortunately, the first option, however attractive, is not practical. Cost estimates must be provided “up
front”. However, we should recognize that the longer we wait, the more we know, and the more we know, the
less likely we are to make serious errors in our estimates.

The second option can work reasonably well, if the current project is quite similar to past efforts and other
project influences (e.g. the customer, business conditions, the SEE, deadlines) are equivalent. Unfortunately,
past experience has not always been a good indicator of future results.

The COCOMO Model :

In his classic book on “software engineering economics”, Barry Boehm [BOE81] introduced a hierarchy of
software estimation models bearing the name COCOMO, for Constructive Cost Model. The original
COCOMO model became one of the most widely used and discussed software cost estimation models in the
industry. It has evolved into a more comprehensive estimation model, called COCOMO II [BOE96, BOE00].
Like its predecessor, COCOMO II is actually a hierarchy of estimation models that address the following
areas :

Application composition model : Used during the early stages of software engineering, when prototyping
of user interfaces, consideration of software and system interaction, assessment of performance, and
evaluation of technology maturity are paramount.

Early design stage model : Used once requirements have been stabilized and basic software architecture
has been established.

Post-architecture-stage model : Used during the construction of the software.
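COCOMO II itself is only described qualitatively above; as a purely illustrative sketch of how a COCOMO-style model
turns size into effort and duration, the original basic COCOMO equations can be coded in a few lines (the 32 KLOC
project size is an assumption; the coefficients are the published basic-COCOMO values):

# Basic COCOMO (Boehm, 1981) effort and duration estimate - an illustrative
# sketch only, not the COCOMO II models described above.

COEFFICIENTS = {
    # mode: (a, b, c, d)
    "organic":       (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded":      (3.6, 1.20, 2.5, 0.32),
}

def basic_cocomo(kloc, mode="organic"):
    a, b, c, d = COEFFICIENTS[mode]
    effort = a * kloc ** b         # person-months
    duration = c * effort ** d     # calendar months
    return effort, duration, effort / duration   # average staffing

effort, duration, staff = basic_cocomo(32.0, "organic")   # assumed 32 KLOC project
print(f"Effort   : {effort:.1f} person-months")    # about 91 person-months
print(f"Duration : {duration:.1f} months")         # about 14 months
print(f"Avg staff: {staff:.1f} people")            # roughly 6 to 7 people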

Q. 6 : Write a note on myths of Software.


Ans. Most knowledgeable professionals recognize myths for what they are – misleading attitudes that
have caused serious problems for managers and technical people alike. However, old attitudes and habits
are difficult to modify, and remnants of software myths are still believed.

Primarily, there are three types of software myths, all the three are stated below :

1. Management Myths – Managers with software responsibility, like managers in most disciplines, are often
under pressure to maintain budgets, keep schedules from slipping, and improve quality. Like a drowning
person who grasps at a straw, a software manager often grasps at belief in a software myth, if that belief will
lessen the pressure (even temporarily).

Myth – We already have a book that’s full of standards and procedures for building software; won’t that
provide my people with everything they need to know ?

Reality – The book of standards may very well exist, but is it used ? Are software practitioners aware of its
existence ? Does it reflect modern software engineering practice ? Is it complete? Is it streamlined to
improve time to delivery while still maintaining a focus on quality ? In many cases, the answer to all of these
questions is “no”.

Myth – My people have state-of-the-art software development tools; after all, we buy them the newest
computers.

Reality – It takes much more than the latest model mainframe, workstation, or PC to do high-quality software
development. Computer-aided software engineering (CASE) tools are more important than hardware for
achieving good quality and productivity, yet the majority of software developers still do not use them
effectively.

Myth – If we get behind schedule, we can add more programmers and catch up (sometimes called the
Mongolian horde concept).

Reality – Software development is not a mechanistic process like manufacturing. In the words of Brooks
[BRO75]: “adding people to a late software project makes it later”. At first, this statement may seem
counterintuitive. However, as new people are added, people who were working must spend time educating
the newcomers, thereby reducing the amount of time spent on productive development effort. People can be
added but only in a planned and well-coordinated manner.

Myth – If I decide to outsource the software project to a third party, I can just relax and let that firm build it.

Reality – If an organization does not understand how to manage and control software projects internally, it
will invariably struggle when it outsources software projects.

2. Customer Myths – A customer who requests computer software may be a person at the next desk, a
technical group down the hall, the marketing / sales department, or an outside company that has requested
software under contract. In many cases, the customer believes myths about software because software
managers and practitioners do little to correct misinformation. Myths lead to false expectations (by the
customer) and ultimately, dissatisfaction with the developer.

Myth – A general statement of objectives is sufficient to begin writing programs – we can fill in the details
later.

Reality – A poor up-front definition is the major cause of failed software efforts. A formal and detailed
description of the information domain, function, behavior, performance, interfaces, design constraints, and
validation criteria is essential. These characteristics can be determined only after thorough communication
between customer and developer.

Myth – Project requirements continually change, but change can be easily accommodated because software
is flexible.

Reality – It is true that software requirements change, but the impact of change varies with the time at which
it is introduced. If serious attention is given to up-front definition, early requests for change can be
accommodated easily. The customer can review requirements and recommend modifications with relatively
little impact on cost. When changes are requested during software design, the cost impact grows rapidly.
Resources have been committed and a design framework has been established. Change can cause
upheaval that requires additional resources and major design modification, that is, additional cost. Changes
in function, performance, interface, or other characteristics during implementation (code and test) have a
severe impact on cost. Change, when requested after software is in production, can be over an order of
magnitude more expensive than the same change requested earlier.

3. Practitioner’s Myths – Myths that are still believed by software practitioners have been fostered by 50
years of programming culture. During the early days of software, programming was viewed as an art form.
Old ways and attitudes die hard.

Myth – Once we write the program and get it to work, our job is done.

Reality – Someone once said that “the sooner you begin ‘writing code’, the longer it’ll take you to get done”.
Industry data ([LIE80], [JON91], [PUT97]) indicates that between 60 and 80 percent of all effort expended on
software will be expended after it is delivered to the customer for the first time.

Myth – Until I get the program “running” I have no way of assessing its quality.

Reality – One of the most effective software quality assurance mechanisms can be applied from the
inception of a project – the formal technical review. Software reviews are a “quality filter” that have been
found to be more effective than testing for finding certain classes of software defects.

Myth – The only deliverable work product for a successful project is the working program.

Reality – A working program is only one part of a software configuration that includes many elements.
Documentation provides a foundation for successful engineering and, more importantly, guidance for
software support.

Myth – Software engineering will make us create voluminous and unnecessary documentation and will
invariably slow us down.

Reality – Software engineering is not about creating documents. It is about creating quality. Better quality
leads to reduced rework. And reduced rework results in faster delivery times. Many software professionals
recognize the fallacy of the myths just described. Regrettably, habitual attitudes and methods foster poor
management and technical practices, even when reality dictates a better approach. Recognition of software
realities is the first step towards formulation of practical solutions for software engineering.

Software Myths :

Myth is defined as a "widely held but false notion" by the Oxford dictionary, so, as in other fields, the
software arena also has some myths to demystify. Pressman insists, "Software myths - beliefs about software
and the process used to build it - can be traced to the earliest days of computing. Myths have a number of
attributes that have made them insidious." So software myths prevail, and though they are not clearly visible,
they have the potential to harm all the parties involved in the software development process, mainly the
developer team.

Tom DeMarco observes, “In the absence of meaningful standards, a new industry like software comes to
depend instead on folklore." The statement points out that the software industry gathered pace only a few
decades ago, so it has not yet matured to a formidable level and there are no strict standards in software
development. Because there is no single best method of software development, the gap is filled by the
ubiquitous software myths.

Primarily, there are three types of software myths, all the three are stated below:

1. Management Myth

2. Customer Myth

3. Practitioner/Developer Myth

Before defining the above three myths one by one, let us scrutinize why these myths occur in the first place.
The root of the problem usually lies in requirements analysis: the development team often understands the
problem differently from what it really is, and the result is a solution to a misunderstood problem. Problem
understanding, i.e. requirements analysis, must therefore be done properly between the developer team and
the clients, to avoid problems in later stages, where misunderstandings have devastating effects.

1. Management Myths: Managers with software responsibility, like managers in most disciplines, are often
under pressure to maintain budgets, keep schedules from slipping, and improve quality. Like a drowning
person who grasps at a straw, a software manager often grasps at belief in a software myth, if those beliefs
will lessen the pressure (even temporarily). Some common managerial myths stated by Roger Pressman
include:

I. We have standards and procedures for building software, so developers have everything they need to
know.

II. We have state-of-the-art software development tools; after all, we buy the latest computers.

III. If we're behind schedule, we can add more programmers to catch up.

IV. A good manager can manage any project.

The managers completely ignore the fact that they are working on something intangible but very important to
the clients, which invites more trouble than solution. A software project manager must therefore have worked
closely with the software development process, learning its minute details and the tips and tricks of the trade.
The realities behind these myths are self-evident, given how complex the software development process is.

2. Customer Myths: A customer who requests computer software may be a person at the next desk, a
technical group down the hall, the marketing/sales department, or an outside company that has requested
software under contract. In many cases, the customer believes myths about software because software
managers and practitioners do little to correct misinformation. Myths lead to false expectations (by the
customer) and, ultimately, dissatisfaction with the developer. Commonly held myths by the clients are:

I. A general statement of objectives is sufficient to begin writing programs - we can fill in the details later.
II. Requirement changes are easy to accommodate because software is flexible.
III. I know what my problem is; therefore I know how to solve it.

These myths persist primarily because clients do not have first-hand experience of software development
and think that it is an easy process.

3. Practitioner/Developer Myths: Myths that are still believed by software practitioners have been fostered by
over 50 years of programming culture. During the early days of software, programming was viewed as an art
form. Old ways and attitudes die hard. A common malpractice is that developers think they know everything
and neglect the peculiarity of each problem.

I. If I miss something now, I can fix it later.

II. Once the program is written and running, my job is done.

III. Until a program is running, there's no way of assessing its quality.

IV. The only deliverable for a software project is a working program.

Every developer should try to get all requirements in relevant detail to effectively design and code the
system.

Some misplaced assumptions that intensify the myths are listed below:

1. All requirements can be pre-specified

2. Users are experts at specification of their needs

3. Users and developers are both good at visualization

4. The project team is capable of unambiguous communication

On the whole, realities are always different from the myths. The myths must therefore be demystified, and
work should be based on systematic, scientific and logical grounds rather than on irrational myths. A
systemic view must be taken when judging the success of any software project: it is not only a matter of the
hard skills of the developer team; their soft skills also matter in coming up with an efficient system.

Assignment (Set-2)
Subject code: MI0033
Software Engineering

Q 1. Quality and reliability are related concepts but are fundamentally different in a number
of ways. Discuss them.
Ans: Software quality is defined as conformance to explicitly stated functional and non-functional
requirements, explicitly documented development standards, and implicit characteristics that are expected of
all professionally developed software.

This definition emphasizes upon three important points:

• Software requirements are the foundation from which quality is measured. Lack of conformance is lack of
quality.
• Specified standards define a set of development criteria that guide the manner in which software is
engineered. If the criteria are not followed, lack of quality will almost surely result.

• A set of implicit requirements often goes unmentioned (ease of use, good maintainability etc.)

DeMarco defines product quality as a function of how much it changes the world for the better.

So, there are many different ways to look at quality.

Quality Assurance

Goal of quality assurance is to provide the management with the necessary data to be informed about
product quality. It consists of auditing and reporting functions of management.

Cost of quality

Does quality assurance add any value? If we try to prevent problems, we will obviously have to incur cost.
This prevention cost includes:

• Quality planning

• Formal technical reviews

• Test equipment

• Training

The cost of appraisal includes activities undertaken to gain insight into the product's condition. Comparing
these prevention and appraisal costs with the cost of defect removal once the product has been shipped to
the customer shows that, in most cases, investing in quality assurance is profitable.

Software Reliability:

“Probability of failure free operation of a computer program in a specified environment for a specified time”.
For example, a program X can be estimated to have a reliability of 0.96 over 8 elapsed hours.

Software reliability can be measured, directed, and estimated using historical and development data. The
key to this measurement is the meaning of the term failure. Failure is defined as non-conformance to software
requirements. It can be graded in many different ways as shown below:

• From annoying to catastrophic

• Time to fix from minutes to months

• Ripples from fixing

It is also pertinent to understand the difference between hardware and software reliability. Hardware
reliability is predicated on failure due to wear rather than failure due to design. In the case of software, there is
no wear and tear. The reliability of software is determined by Mean time between failure (MTBF). MTBF is
calculated as:

MTBF = MTTF + MTTR

Where MTTF is the Mean Time to Failure and MTTR is the Mean time required to Repair.

Arguably MTBF is far better than defects/kloc as each error does not have the same failure rate and the user
is concerned with failure and not with total error count.

A related issue is the notion of availability. It is defined as the probability that a program is operating
according to requirements at a given point in time. It can be calculated as

Availability = (MTTF/MTBF) x 100

and clearly depends upon MTTR.
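A small worked sketch of the measures just defined (the MTTF and MTTR figures are assumed, purely for illustration):

# Reliability and availability arithmetic using the definitions above.
# The MTTF/MTTR figures are assumed example values.

mttf_hours = 950.0   # mean time to failure (assumed)
mttr_hours = 50.0    # mean time to repair (assumed)

mtbf_hours = mttf_hours + mttr_hours                 # MTBF = MTTF + MTTR
availability_pct = (mttf_hours / mtbf_hours) * 100   # Availability = (MTTF / MTBF) x 100

print(f"MTBF         : {mtbf_hours:.0f} hours")      # 1000 hours
print(f"Availability : {availability_pct:.1f} %")    # 95.0 %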

Q.2. Explain Version Control & Change Control.


Ans. Version Control:
Code evolves. As a project moves from first-cut prototype to deliverable, it goes through multiple cycles in
which you explore new ground, debug, and then stabilize what you've accomplished. And this evolution
doesn't stop when you first deliver for production. Most projects will need to be maintained and enhanced
past the 1.0 stage, and will be released multiple times. Tracking all that detail is just the sort of thing
computers are good at and humans are not.
Why Version Control?
Code evolution raises several practical problems that can be major sources of friction and drudgery — thus a
serious drain on productivity. Every moment spent on these problems is a moment not spent on getting the
design and function of your project right.
Perhaps the most important problem is reversion. If you make a change, and discover it's not viable, how
can you revert to a code version that is known good? If reversion is difficult or unreliable, it's hard to risk
making changes at all (you could trash the whole project, or make many hours of painful work for yourself).
Almost as important is change tracking. You know your code has changed; do you know why? It's easy to
forget the reasons for changes and step on them later. If you have collaborators on a project, how do you
know what they have changed while you weren't looking, and who was responsible for each change?
Amazingly often, it is useful to ask what you have changed since the last known-good version, even if you
have no collaborators. This often uncovers unwanted changes, such as forgotten debugging code. I now do
this routinely before checking in a set of changes.
-- Henry Spencer
Another issue is bug tracking. It's quite common to get new bug reports for a particular version after the code
has mutated away from it considerably. Sometimes you can recognize immediately that the bug has already
been stomped, but often you can't. Suppose it doesn't reproduce under the new version. How do you get
back the state of the code for the old version in order to reproduce and understand it?

To address these problems, you need procedures for keeping a history of your project, and annotating it with
comments that explain the history. If your project has more than one developer, you also need mechanisms
for making sure developers don't overwrite each others' versions.
Version Control by Hand
The most primitive (but still very common) method is all hand-hacking. You snapshot the project periodically
by manually copying everything in it to a backup. You include history comments in source files. You make
verbal or email arrangements with other developers to keep their hands off certain files while you hack them.
As with most hand-hacking, this method does not scale well. It restricts the granularity of change tracking,
and tends to lose metadata details such as the order of changes, who did them, and why. Reverting just a
part of a large change can be tedious and time consuming, and often developers are forced to back up
farther than they'd like after trying something that doesn't work.

Automated Version Control


To avoid these problems, you can use a version-control system (VCS), a suite of programs that automates
away most of the drudgery involved in keeping an annotated history of your project and avoiding modification
conflicts.
Most VCSs share the same basic logic. To use one, you start by registering a collection of source files —
that is, telling your VCS to start archive files describing their change histories. Thereafter, when you want to
edit one of these files, you have to check out the file — assert an exclusive lock on it. When you're done, you
check in the file, adding your changes to the archive, releasing the lock, and entering a change comment
explaining what you did.
Most of the rest of what a VCS does is convenience: labeling, and reporting features surrounding these basic
operations, and tools which allow you to view differences between versions, or to group a given set of
versions of files as a named release that can be examined or reverted to at any time without losing later
changes.
Another problem is that some kinds of natural operations tend to confuse VCSs. Renaming files is a
notorious trouble spot; it's not easy to automatically ensure that a file's version history will be carried along
with it when it is renamed. Renaming problems are particularly difficult to resolve when the VCS supports
branching.
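The check-out/check-in cycle with exclusive locks described above can be sketched as a toy in-memory model; the
class and method names here are mine and do not correspond to any real VCS interface:

# Toy model of a lock-based version-control workflow: register a file,
# check it out (taking an exclusive lock), check it in with a change comment.

class ToyVCS:
    def __init__(self):
        self.history = {}   # filename -> list of (content, comment) revisions
        self.locks = {}     # filename -> user currently holding the lock

    def register(self, filename, content):
        self.history[filename] = [(content, "initial revision")]

    def checkout(self, filename, user):
        if filename in self.locks:
            raise RuntimeError(f"{filename} is locked by {self.locks[filename]}")
        self.locks[filename] = user
        return self.history[filename][-1][0]     # latest content

    def checkin(self, filename, user, new_content, comment):
        if self.locks.get(filename) != user:
            raise RuntimeError("check the file out before checking it in")
        self.history[filename].append((new_content, comment))
        del self.locks[filename]                  # release the exclusive lock

    def log(self, filename):
        return [comment for _, comment in self.history[filename]]

vcs = ToyVCS()
vcs.register("hello.c", "int main(void) { return 0; }")
text = vcs.checkout("hello.c", user="alice")
vcs.checkin("hello.c", "alice", text + " /* tidy */", "tidy up main()")
print(vcs.log("hello.c"))   # ['initial revision', 'tidy up main()']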

Change Control:
Change control within Quality management systems (QMS) and Information Technology (IT) systems is a
formal process used to ensure that changes to a product or system are introduced in a controlled and
coordinated manner. It reduces the possibility that unnecessary changes will be introduced to a system
without forethought, introducing faults into the system or undoing changes made by other users of software.
The goals of a change control procedure usually include minimal disruption to services, reduction in back-out
activities, and cost-effective utilization of resources involved in implementing change.
Change control is currently used in a wide variety of products and systems. For Information Technology (IT)
systems it is a major aspect of the broader discipline of change management. Typical examples from the
computer and network environments are patches to software products, installation of new operating systems,
upgrades to network routing tables, or changes to the electrical power systems supporting such
infrastructure.
Certain experts describe change control as a set of six steps:
Record / Classify
Assess
Plan
Build / Test
Implement
Close / Gain Acceptance
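A minimal sketch of a change request walking through those six steps in order; the class below is hypothetical and
only enforces the sequence described above:

# Toy change-request object that moves through the six change-control steps in order.

CHANGE_CONTROL_STEPS = [
    "Record / Classify",
    "Assess",
    "Plan",
    "Build / Test",
    "Implement",
    "Close / Gain Acceptance",
]

class ChangeRequest:
    def __init__(self, title):
        self.title = title
        self._next = 0                     # index of the next step to perform

    @property
    def status(self):
        return "new" if self._next == 0 else CHANGE_CONTROL_STEPS[self._next - 1]

    def advance(self, notes=""):
        if self._next >= len(CHANGE_CONTROL_STEPS):
            raise RuntimeError("change request already closed")
        step = CHANGE_CONTROL_STEPS[self._next]
        self._next += 1
        print(f"[{self.title}] {step}: {notes}")

cr = ChangeRequest("Upgrade network routing tables")
cr.advance("logged and classified as a standard change")
cr.advance("impact and risk assessed")
cr.advance("change window planned")
cr.advance("built and tested in a staging environment")
cr.advance("implemented in production")
cr.advance("verified, accepted and closed")
print(cr.status)   # Close / Gain Acceptance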

Q. 3. Discuss the SCM Process.

Ans: In software engineering, software configuration management (SCM) is the task of tracking and
controlling changes in the software. Configuration management practices include revision control and the
establishment of baselines.

SCM concerns itself with answering the question "Somebody did something, how can one reproduce it?"
Often the problem involves not reproducing "it" identically, but with controlled, incremental changes.
Answering the question thus becomes a matter of comparing different results and of analysing their
differences. Traditional configuration management typically focused on controlled creation of relatively simple
products. Now, implementers of SCM face the challenge of dealing with relatively minor increments under
their own control, in the context of the complex system being developed.
The goals of SCM are generally:

Configuration identification - Identifying configurations, configuration items and baselines.

Configuration control - Implementing a controlled change process. This is usually achieved by setting up a
change control board whose primary function is to approve or reject all change requests that are sent against
any baseline.

Configuration status accounting - Recording and reporting all the necessary information on the status of the
development process.

Configuration auditing - Ensuring that configurations contain all their intended parts and are sound with
respect to their specifying documents, including requirements, architectural specifications and user manuals.

Build management - Managing the process and tools used for builds.

Process management - Ensuring adherence to the organization's development process.

Environment management - Managing the software and hardware that host the system.

Teamwork - Facilitate team interactions related to the process.

Defect tracking - Making sure every defect has traceability back to the source.

Effective Configuration Management can be defined as stabilising the evolution of software products and
process at key points in the life cycle. The focus of CM includes:

Identification of Artefacts

Early identification and change control of artefacts and work products is integral to the project. The
configuration manager needs to fully identify and control changes to all the elements that are required to
recreate and maintain the software product.

Version Control

The primary goal of version control is to identify and manage project elements as they change over time.
The Configuration Manager should establish a version control library to maintain all lifecycle entities. This
library will ensure that changes (deltas) are controlled at their lowest atomic level eg documents, source files,
scripts and kits etc.

Development Streaming (Branching)

To provide some level of stability and allow fluidity of parallel development (streaming) it is quite normal for
project development to be split into branches (development groups).

The CM manager has to identify what branches will be required and ensure they are appropriately set up (eg
security etc).

Baselining

Baselining provides the division with a concise picture of the project artifacts and relationships at a particular
instance in time. It provides an official foundation on which subsequent work can be based, and to which only
authorized changes can be made.

Through baselining (i.e. labelling, tagging) all constituent project components are aligned, uniquely
identifiable and reproducible at both the atomic level (eg file) and at the higher kit levels.

Reasons for baselining include:

A baseline supports ease of roll back

A baseline improves the CM manager's ability to create change reports etc

A baseline supports creation of new parallel branches (e.g. dev branches)

A baseline supports troubleshooting and element comparison

A baseline provides a stable bill of material for the build system

Build Management

The fundamental objective of the build management process is to deliver a disciplined and automated build
process.

Activities to consider:

Create automated build scripts (i.e. fetching from repository)

Enforce baselining before all formal builds (support bill of materials/traceability)

Set up stable build machines

Packaging

Typically the packaging process will be synonymous with [or tightly coupled to] the build process, i.e. the
build process will do packaging automatically after the build is complete.

Primary objectives of packaging are:

Manageable (i.e. often a single zipped up file or exe)

Reusable (i.e. Try to avoid need for rebuild)

Secure (i.e. Packages should be free from malicious or accidental modification)

Deployment

The configuration manager will typically be involved in the deployment process. Primary considerations
include:

Ensuring deployment is automated (reducing the possibility of manual errors).

Promoting best-practice concepts like promotion-based releases (as opposed to environmental rebuilds).

Ensuring releases are authorised and appropriate windows are selected for deployment.

Providing a streamlined rollback mechanism in case of problems.

Change Request Management

Change Request management can be described as management of change/enhancement requests.

Typically the Configuration Manager should set up a repository to manage these requests and support
activities like status tracking, assignment etc.

Issue Tracking

Issue tracking is the formal tracking of problems/defects in your systems or environments.

Typically the Configuration Manager should set up a repository to track these problems as they occur, and
track their status to eventual closure.

Q 4. Explain
i. Software doesn’t Wear Out.
ii. Software is engineered & not manufactured.

Ans: Software:

Software comprises: (1) instructions that, when executed, provide the desired features and functions; (2)
data structures that enable the programs to manipulate information; and (3) documents that describe the
operation and use of the programs.

Software can be categorized in two types

Generic software:

Generic software is developed for a broad category of customers whose environment is well understood and
common to all. This type of software is sold in the open market, where it faces several competitors.

Customized software:

This type of software is developed keeping in mind the needs of a particular customer, e.g. a hospital
management system. Such customers have their own unique domain, environment and requirements.

The following are key characteristics of software:

(I) Software doesn't wear out:

Hardware can wear out, whereas software cannot. For hardware, the relationship between failure rate and
time is often called the "bathtub curve": hardware exhibits relatively high failure rates early in its life (these
failures are often attributable to design or manufacturing defects); defects are then corrected and the failure
rate drops to a steady-state level (ideally, quite low) for some period of time. As time passes, however, the
failure rate rises again as hardware components suffer from the cumulative effects of dust, vibration, abuse,
temperature extremes, and other environmental maladies. Stated simply, the hardware begins to wear out.

Software is not susceptible to the environmental maladies that cause hardware to wear out. In theory,
therefore, the failure rate curve for software should take the form of an "idealized curve": undiscovered
defects cause high failure rates early in the life of a program; these are then corrected (ideally, without
introducing other errors) and the curve flattens. The implication is clear: software doesn't wear out. But it
does deteriorate!

(II) Software is not manufactured in the classical sense, but it is developed or engineered:

Software and hardware both start from a design model, but they differ in how that design is realized:
hardware passes through a manufacturing phase, whereas software is realized through development and
coding. Hence it is said that software is not manufactured but developed or engineered, and the cost
structures of the two activities also differ.

Although some similarities exist between software development and hardware manufacture, the two
activities are fundamentally different. In both activities, high quality is achieved through good design, but the
manufacturing phase for hardware can introduce quality problems that are nonexistent (or easily corrected)
for software. Both activities are dependent on people, but the relationship between people applied and
work accomplished is entirely different. Both activities require the construction of a "product", but the
approaches are different. Software costs are concentrated in engineering. This means that software projects
cannot be managed as if they were manufacturing projects.

Q. 5. Explain the Advantages of Prototype Model, & Spiral Model in Contrast to Water Fall
model.

Ans: Many life cycle models have been proposed so far. Each of them has some advantages as well as
some disadvantages. A few important and commonly used life cycle models are as follows:

• Classical Waterfall Model

• Iterative Waterfall Model

• Prototyping Model

• Evolutionary Model

• Spiral Model

Classical Waterfall Model

The classical waterfall model is intuitively the most obvious way to develop software. Though the classical
waterfall model is elegant and intuitively obvious, we will see that it is not a practical model in the sense that
it cannot be used in actual software development projects. Thus, we can consider this model to be a
theoretical way of developing software. But all other life cycle models are essentially derived from the
classical waterfall model. So, in order to be able to appreciate other life cycle models, we must first learn the
classical waterfall model.

Classical waterfall model divides the life cycle into the following phases as shown below:

Feasibility study

Requirements analysis and specification

Design

Coding

Testing

Maintenance

Feasibility Study

The main aim of feasibility study is to determine whether it would be financially and technically feasible to
develop the product

• At first project managers or team leaders try to have a rough understanding of what is required to be
done by visiting the client side. They study different input data to the system and output data to be
produced by the system. They study what kind of processing is needed to be done on these data
and they look at the various constraints on the behaviour of the system.

• After they have an overall understanding of the problem, they investigate the different solutions that are
possible. Then they examine each of the solutions in terms of what kinds of resources are required,
what would be the cost of development and what would be the development time for each solution.

• Based on this analysis, they pick the best solution and determine whether the solution is feasible
financially and technically. They check whether the customer budget would meet the cost of the
product and whether they have sufficient technical expertise in the area of development.

A walkthrough of a feasibility study undertaken by an organization would give one a feel of the activities and
issues involved in the feasibility study phase of a typical software project.

Requirements Analysis and Specification

The aim of the requirements analysis and specification phase is to understand the exact requirements of the
customer and to document them properly. This phase consists of two distinct activities, namely

• Requirements gathering and analysis, and

• Requirements specification

The goal of the requirements gathering activity is to collect all relevant information from the customer
regarding the product to be developed with a view to clearly understand the customer requirements and
weed out the incompleteness and inconsistencies in these requirements.

The requirements analysis activity is begun by collecting all relevant data regarding the product to be
developed from the users of the product and from the customer through interviews and discussions. For
example, to perform the requirements analysis of a business accounting software required by an
organization, the analyst might interview all the accountants of the organization to ascertain their
requirements. The data collected from such a group of users usually contain several contradictions and
ambiguities, since each user typically has only a partial and incomplete view of the system. Therefore it is
necessary to identify all ambiguities and contradictions in the requirements and resolve them through further
discussions with the customer. After all ambiguities, inconsistencies, and incompleteness have been
resolved and all the requirements properly understood, the requirements specification activity can start.
During this activity, the user requirements are systematically organized into a Software Requirements
Specification (SRS) document.

The customer requirements identified during the requirements gathering and analysis activity are
organized into an SRS document. The important components of this document are functional requirements,
the non-functional requirements, and the goals of implementation.

Design

The goal of the design phase is to transform the requirements specified in the SRS document into a
structure that is suitable for implementation in some programming language. In technical terms, during the
design phase the software architecture is derived from the SRS document. Two distinctly different
approaches are available: the traditional design approach and the object-oriented design approach.

Traditional design approach: Traditional design consists of two different activities; first a structured
analysis of the requirements specification is carried out where the detailed structure of the problem is
examined. This is followed by a structured design activity. During structured design, the results of structured
analysis are transformed into the software design.

Object-oriented design approach: In this technique, various objects that occur in the problem
domain and the solution domain are first identified, and the different relationships that exist among these
objects are identified. The object structure is further refined to obtain the detailed design.

Coding and Unit Testing

The purpose of the coding and unit testing phase (sometimes called the implementation phase) of software
development is to translate the software design into source code. Each component of the design is
implemented as a program module. The end-product of this phase is a set of program modules that have
been individually tested.

During this phase, each module is unit tested to determine the correct working of all the individual modules. It
involves testing each module in isolation as this is the most efficient way to debug the errors identified at this
stage.

Integration and System Testing

Integration of different modules is undertaken once they have been coded and unit tested. During the
integration and system testing phase, the modules are integrated in a planned manner.

The different modules making up a software product are almost never integrated in one shot. Integration is
normally carried out incrementally over a number of steps. During each integration step, the partially
integrated system is tested and a set of previously planned modules are added to it. Finally, when all the
modules have been successfully integrated and tested, system testing is carried out. The goal of system
testing is to ensure that the developed system conforms to the requirements laid out in the SRS document.
System testing usually consists of three different kinds of testing activities:

• α – testing: It is the system testing performed by the development team.

• β – testing: It is the system testing performed by a friendly set of customers.

• Acceptance testing: It is the system testing performed by the customer himself after product delivery to
determine whether to accept or reject the delivered product.

System testing is normally carried out in a planned manner according to the system test plan document.
The system test plan identifies all testing-related activities that must be performed, specifies the schedule of
testing, and allocates resources. It also lists all the test cases and the expected outputs for each test case.

Maintenance


Maintenance of a typical software product requires much more effort than the effort needed to develop the
product itself. Many studies carried out in the past confirm this and indicate that the ratio of development
effort to maintenance effort for a typical software product is roughly 40:60. Maintenance
involves performing any one or more of the following three kinds of activities:

• Correcting errors that were not discovered during the product development phase. This is called
corrective maintenance.

• Improving the implementation of the system, and enhancing the functionalities of the system according
to the customer’s requirements. This is called perfective maintenance.

• Porting the software to work in a new environment. For example, porting may be required to get the
software to work on a new computer platform or with a new operating system. This is called adaptive
maintenance.

Shortcomings of the Classical Waterfall Model

The classical waterfall model is an idealistic one since it assumes that no development error is ever
committed by the engineers during any of the life cycle phases. However, in practical development
environments, the engineers do commit a large number of errors in almost every phase of the life cycle. The
source of the defects can be many: oversight, wrong assumptions, use of inappropriate technology,
communication gap among the project engineers, etc. These defects usually get detected much later in the
life cycle. For example, a design defect might go unnoticed till we reach the coding or testing phase. Once a
defect is detected, the engineers need to go back to the phase where the defect had occurred and redo
some of the work done during that phase and the subsequent phases to correct the defect and its effect on
the later phases. Therefore, in any practical software development work, it is not possible to strictly follow the
classical waterfall model.

Prototyping Model

A prototype is a toy implementation of the system. A prototype usually exhibits limited functional capabilities,
low reliability, and inefficient performance compared to the actual software. A prototype is usually built using
several shortcuts. The shortcuts might involve using inefficient, inaccurate, or dummy functions. The shortcut
implementation of a function, for example, may produce the desired results by using a table look-up instead
of performing the actual computations. A prototype usually turns out to be a very crude version of the actual
system.

The Need for a Prototype

There are several uses of a prototype. An important purpose is to illustrate to the customer the input data formats,
messages, reports, and the interactive dialogues. This is a valuable mechanism for gaining a better
understanding of the customer's needs, for example:

• how the screens might look

• how the user interface would behave

• how the system would produce outputs, etc.

This is similar to what the architectural designers of a building do; they show a prototype of the
building to their customer. The customer can evaluate the prototype and indicate the changes he
would need in the actual product. A similar thing happens in the case of a software product and its
prototyping model.


Spiral Model

The diagrammatic representation of the Spiral model of software development appears like a spiral with many
loops. The exact number of loops in the spiral is not fixed. Each loop of the spiral represents a phase of the
software process. For example, the innermost loop might be concerned with feasibility study; the next loop
with requirements specification; the next one with design, and so on. Each phase in this model is split into
four sectors (or quadrants).

First quadrant (Objective Setting):

• During the first quadrant, we need to identify the objectives of the phase.

• Examine the risks associated with these objectives

Second quadrant (Risk Assessment and Reduction):

• A detailed analysis is carried out for each identified project risk.

• Steps are taken to reduce the risks. For example, if there is a risk that the requirements are
inappropriate, a prototype system may be developed

Third quadrant (Development and Validation):

• Develop and validate the next level of the product after resolving the identified risks.

Fourth quadrant (Review and Planning):

• Review the results achieved so far with the customer and plan the next iteration around the spiral.

• With each iteration around the spiral, progressively a more complete version of the software gets built.

The spiral model is thus well suited to the development of technically challenging software products that are
prone to several kinds of risks. However, this model is much more complex than the other models, which is
probably a factor deterring its use in ordinary projects.

Comparison of Different Life Cycle Models

The classical waterfall model can be considered as the basic model and all other life cycle models as
embellishments of this model. However, the classical waterfall model cannot be used in practical
development projects, since this model supports no mechanism to handle the errors committed during any of
the phases.

This problem is overcome in the iterative waterfall model. The iterative waterfall model is probably the most
widely used software development model evolved so far. This model is simple to understand and use.
However, this model is suitable only for well-understood problems; it is not suitable for very large projects
and for projects that are subject to many risks.

The prototyping model is suitable for projects for which either the user requirements or the underlying
technical aspects are not well understood. This model is especially popular for development of the user-
interface part of the projects.

The evolutionary approach is suitable for large problems which can be decomposed into a set of modules for
incremental development and delivery. This model is also widely used for object-oriented development
projects. Of course, this model can only be used if the incremental delivery of the system is acceptable to the
customer.

The spiral model is called a meta-model since it encompasses all other life cycle models. Risk handling is
inherently built into this model. The spiral model is suitable for development of technically challenging
software products that are prone to several kinds of risks. However, this model is much more complex than
the other models. This is probably a factor deterring its use in ordinary projects.

The different software life cycle models can be compared from the viewpoint of the customer. Initially,
customer confidence in the development team is usually high irrespective of the development model
followed. During the long development process, customer confidence normally drops, as no working product
is immediately visible. Developers answer customer queries using technical jargon, and delays are
announced. This gives rise to customer resentment. On the other hand, an evolutionary approach lets the
customer experiment with a working product much earlier than the monolithic approaches. Another important
advantage of the incremental model is that it reduces the customer’s trauma of getting used to an entirely
new system. The gradual introduction of the product via incremental phases provides time to the customer to
adjust to the new product. Also, from the customer’s financial viewpoint, incremental development does not
require a large upfront capital outlay. The customer can order the incremental versions as and when he can
afford them.

Q. 6. Write a Note on Spiral Model.

Ans: SPIRAL Model:

While the waterfall methodology offers an orderly structure for software development, demands for reduced
time-to-market make its strictly serial steps inappropriate. The next evolutionary step from the waterfall is where the
various steps are staged for multiple deliveries or handoffs. The ultimate evolution from the waterfall is the
spiral, taking advantage of the fact that development projects work best when they are both incremental and
iterative, where the team is able to start small and benefit from enlightened trial and error along the way. The
spiral methodology reflects the relationship of tasks with rapid prototyping, increased parallelism, and
concurrency in design and build activities. The spiral method should still be planned methodically, with tasks
and deliverables identified for each step in the spiral.


The Spiral Model is a more recent approach to IT project system development and was originally devised by Barry
W. Boehm in his 1985 article "A Spiral Model of Software Development and Enhancement".

This model of development unites the features of the prototyping model with an iterative approach to system
development, combining elements of design and prototyping-in-stages. It is an effort to combine the
advantages of the top-down and bottom-up concepts, and is highly preferred for large, expensive, volatile, and
complex projects.

The term "spiral" is used to describe the process that is followed in this model, as the development
of the system takes place, the mechanisms go back several times over to earlier sequences, over
and over again, circulating like a spiral.

The spiral model represents the evolutionary approach of IT project system development and carries the
same activities over a number of cycles in order to elucidate system requirements and its solutions.

Similar to the waterfall model, the spiral model has sequential cycles/stages, with each stage having to be
completed before moving on to the next.

The prime difference between the waterfall model and the spiral model is that the project system
development cycle moves towards eventual completion in both the models but in the spiral model the cycles
go back several times over to earlier stages in a repetitive sequence.

Progress Cycles


The progress cycle of this model is divided into four quadrants, each with a different purpose:

Quadrant I (top left): Determining Objectives
Quadrant II (top right): Evaluating Alternatives and Identifying Risks
Quadrant III (bottom right): Developing and Verifying the Next-Level Product
Quadrant IV (bottom left): Planning the Next Phase

First Quadrant: the top left quadrant determines and identifies the project objectives, alternatives, and
constraints of the project. Similar to the system conception stage in the Waterfall Model, here objectives are
determined while identifying possible obstacles and weighing alternative approaches.

Second Quadrant: the top right quadrant evaluates the different alternatives and analyses the project risks
associated with each of them, eventually resolving those risks. Probable alternatives are inspected
and the associated risks are recognized. Resolutions of the project risks are evaluated, and prototyping is used
wherever necessary.

Third Quadrant: the bottom right quadrant develops and verifies the system; this quadrant corresponds to the
waterfall model, with the detailed requirements determined for the project.

Fourth Quadrant: the bottom left quadrant plans the next phase of the development process, providing an
opportunity to analyze the results and the feedback received.

Each cycle begins with a design for that phase and terminates with the client reviewing the progress, often
through a prototype.

The major advantage of the spiral model over the waterfall model is that it builds the setting of project
objectives, project risk management and project planning into the overall development cycle. Another
significant advantage is that the user can be given some of the functionality before the entire system is
completed.

The spiral model addresses the complexity of predetermined system performance by providing an iterative
approach to system development, repeating the same activities in order to clarify the problem and provide an
accurate classification of the requirements within the bounds of multiple constraints.


Assignment (Set-1)
Subject code: MI0034
Database Management Systems
Q.1 : Differentiate between Traditional File System & Modern Database System ? Describe
the properties of Database & the advantage of Database ?

Ans. Differentiate between Traditional File System & Modern Database System ?

Traditional File System vs. Modern Database Management System

1. Traditional File System: the system that was followed before the advent of the DBMS, i.e. the older way.
   Modern DBMS: the modern approach, which has replaced the older concept of the file system.

2. Traditional File System: in traditional file processing, data definition is part of the application program and works with only a specific application.
   Modern DBMS: data definition is part of the DBMS; the data is independent of any single application and can be used by any application.

3. Traditional File System: file systems are design driven; they require design/coding changes when a new kind of data occurs. E.g.: in a traditional employee master file having Emp_name, Emp_id, Emp_addr, Emp_design, Emp_dept and Emp_sal, if we want to insert one more column 'Emp_Mob number' then it requires a complete restructuring of the file or a redesign of the application code, even though basically all the data except that in the one column is the same.
   Modern DBMS: one extra column (attribute) can be added without any difficulty; only minor coding changes in the application program may be required.

4. Traditional File System: keeps redundant (duplicate) information in many locations, which might result in loss of data consistency. E.g.: employee names might exist in separate files like the Payroll Master File and also in the Employee Benefit Master File. Now if an employee changes his or her last name, the name might be changed in the Payroll Master File but not in the Employee Benefit Master File, resulting in a loss of data consistency.
   Modern DBMS: redundancy is eliminated to the maximum extent if the database is properly defined.

5. Traditional File System: data is scattered in various files, and each of these files may be in a different format, making it difficult to write new application programs to retrieve the appropriate data.
   Modern DBMS: this problem is completely solved here.

6. Traditional File System: security features have to be coded in the application program itself.
   Modern DBMS: coding for security requirements is not required, as most of them are taken care of by the DBMS.

• Hence, a database management system is the software that manages a database, and is responsible
for its storage, security, integrity, concurrency, recovery and access.

• The DBMS has a data dictionary, referred to as system catalog, which stores data about everything it
holds, such as names, structure, locations and types. This data is also referred to as Meta data.

Describe the properties of Database & the advantage of Database ?

Properties of Database :

The following are the important properties of Database :

1. A database is a logical collection of data having some implicit meaning. If the data are not related,
then it is not a proper database. E.g.: a student studying in Class II obtained 5th rank.

Stud_name Class Rank obtained

Vijetha Class II 5th

2. A database consists of both data as well as the description of the database structure and
constraints.

E.g.

Field Name Type Description

Stud_name Character It is the student’s name

Class Alpha numeric It is the class of the student

3. A database can be of any size and of varying complexity. If we consider the above example of an
employee database, the name and address of the employee may consist of very few records, each
with a simple structure.

E.g.

Emp_name   Emp_id   Emp_addr                                           Emp_desig           Emp_Sal

Prasad     100      "Shubhodaya", Near Katariguppe Big Bazaar,          Project Leader      40000
                    BSK II stage, Bangalore
Usha       101      #165, 4th main, Chamrajpet, Bangalore               Software engineer   10000
Nupur      102      #12, Manipal Towers, Bangalore                      Lecturer            30000
Peter      103      Syndicate house, Manipal                            IT executive        15000

Like this there may be ‘n’ number of records.

4. The DBMS is considered as general-purpose software system that facilitates the process of defining,
constructing and manipulating database for various applications.

5. A database provides insulation between programs and data (data abstraction). Data abstraction hides
the details of how the data is physically stored, so that programs can use the data regardless of how
the underlying structure changes.

6. The data in the database is used by a variety of users for a variety of purposes. For example, in a
hospital database management system, the view of the patient database used by one kind of user differs
from the view used by the doctor. The data appear to be stored separately for these different users, but in
fact they are stored in a single database. This property is nothing but multiple views of the database.

7. A multiuser DBMS must allow the data to be shared by multiple users simultaneously. For this
purpose the DBMS includes concurrency control software to ensure that updates made to the
database by several users at the same time are applied correctly. This property supports
multiuser transaction processing.

Advantages of Database (DBMS) :

1. Redundancy is reduced.

2. Data located on a server can be shared by clients.

3. Integrity (accuracy) can be maintained.

4. Security features protect the Data from unauthorized access.

5. Modern DBMS support internet based application.

6. In DBMS the application program and structure of data are independent.

7. Consistency of Data is maintained.

8. DBMS supports multiple views. As a DBMS has many users, each of them might use it for a
different purpose and may need to view and manipulate only a portion of the database,
depending on the requirement.


Q. 2 : What is the disadvantages of sequential file organization ? How do you overcome it?
What are the advantages & disadvantages of Dynamic Hashing?

Ans. One disadvantage of sequential file organization is that we must use linear search or binary search
to locate the desired record and that results in more i/o operations. In this there are a number of unnecessary
comparisons. In hashing technique or direct file organization, the key value is converted into an address by
performing some arithmetic manipulation on the key value, which provides very fast access to records.

Key value  →  Hash function  →  Address

Let us consider a hash function h that maps the key value k to the value h(k). The VALUE h(k) is used as an
address.

The basic terms associated with the hashing techniques are :

1) Hash table : It is simply an array that is having address of records.

2) Hash function : It is the transformation of a key into the corresponding location or address in the
hash table (it can be defined as a function that takes key as input and transforms it into a hash table
index).

3) Hash key : let 'R' be a record; its key hashes into a value called the hash key.

Internal Hashing :

For internal files, the hash table is an array of records, with indices in the range 0 to M-1. Let us consider a
hash function H(K) = K mod M, which produces a remainder between 0 and M-1 depending on the value of
the key. This value is then used as the record address. The problem with most hash functions is that they do
not guarantee that distinct values will hash to distinct addresses; a collision occurs when two non-identical
keys are hashed into the same location.

For example: let us assume that there are two non-identical keys k1 = 342 and k2 = 352 and we have some
mechanism to convert key values to addresses. Then a simple hashing function is:

h(k) = k mod 10

Here h(k) produces a bucket address.

To insert a record with key value k, we must compute its hash first. E.g.: h(k1) = 342 mod 10 gives 2 as the
hash value, so the record with key value 342 is placed at location 2. Another record with 352 as its key value
produces the same hash address, i.e. h(k1) = h(k2). When we try to place this record at the location where
the record with key k1 is already stored, a collision occurs. The process of finding another position is called
collision resolution. There are numerous methods for collision resolution.

1) Open addressing : With open addressing we resolve the hash clash by inserting the record in the
next available free or empty location in the table.

2) Chaining : Various overflow locations are kept; a pointer field is added to each record and the
pointer is set to the address of the overflow location, as shown in the sketch below.
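
To make the chaining idea concrete, the following short Python sketch (an illustration added here, not part of the syllabus text) implements h(k) = k mod M and keeps colliding records on a per-bucket list; the class and method names (ChainedHashTable, insert, search) are assumed only for this example.

# Minimal sketch of hashing with chaining (illustrative only).
# h(k) = k mod M maps a key to a bucket; colliding records are
# kept in a list ("chain") attached to that bucket.

class ChainedHashTable:
    def __init__(self, m=10):
        self.m = m                          # number of buckets
        self.buckets = [[] for _ in range(m)]

    def _h(self, key):
        return key % self.m                 # hash function: key mod M

    def insert(self, key, record):
        self.buckets[self._h(key)].append((key, record))

    def search(self, key):
        # scan only the chain of the bucket the key hashes to
        for k, record in self.buckets[self._h(key)]:
            if k == key:
                return record
        return None

if __name__ == "__main__":
    t = ChainedHashTable(m=10)
    t.insert(342, "record with key 342")    # h(342) = 2
    t.insert(352, "record with key 352")    # h(352) = 2 -> collision, same chain
    print(t.search(352))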


External Hashing for Disk Files :

Handling Overflow for Buckets by Chaining :

Hashing for disk files is called external hashing. Disk storage is divided into buckets, each of which holds
multiple records. A bucket is either one disk block or a cluster of continuous blocks.

The hashing function maps a key into a relative bucket number. A table maintained in the file header
converts the bucket number into the corresponding disk block address.

The collision problem is less severe with buckets, because many records will fit in the same bucket. When a
bucket is filled to capacity and we try to insert a new record into it, a collision is caused. However, we can
maintain a pointer in each bucket to the overflow records.


The hashing scheme described is called static hashing, because a fixed number of buckets 'M' is allocated.
This can be a serious drawback for dynamic files. Suppose M is the number of buckets and m is the maximum
number of records that can fit in one bucket; then at most m*M records will fit in the allocated space. If the
number of records is substantially fewer than m*M, a lot of space is wasted; if it grows beyond m*M,
numerous collisions will occur and retrieval will be slowed down.

What are the advantages & disadvantages of Dynamic Hashing ?

Advantages of Dynamic Hashing :

1. The main advantage is that splitting causes minor reorganization, since only the records in one
bucket are redistributed to the two new buckets.

2. The space overhead of the directory table is negligible.

3. The main advantage of extendable hashing is that performance does not degrade as the file grows.
The main space saving of hashing is that no buckets need to be reserved for future growth; rather
buckets can be allocated dynamically.

Disadvantages of Dynamic Hashing :

1. The index (directory) tables grow rapidly and may become too large to fit in main memory. When part
of the index table is stored on secondary storage, it requires extra accesses.

2. The directory must be searched before accessing the bucket, resulting in two-block access instead
of one in static hashing.

3. A disadvantage of extendable hashing is that it involves an additional level of indirection.

Q.3. What is relationship type ? Explain the difference among a relationship instance,
relationship type & a relation set ?

Ans. In the real world, items have relationships to one another. E.g. : A book is published by a particular
publisher. The association or relationship that exists between the entities relates data items to each other in
a meaningful way. A relationship is an association between entities. A collection of relationships of the same
type is called a relationship set.
A relationship type R among n entity types E1, E2, ……, En is a set of associations among entities from these
types. Mathematically, R is a set of relationship instances ri.

E.g. : Consider a relationship type WORKS_FOR between two entity types – employee and department,
which associates each employee with the department the employee works for. Each relationship instance in
WORKS_FOR associates one employee entity and one department entity, where each relationship instance
is ri which connects employee and department entities that participate in ri.

Employees e1, e3 and e6 work for department d1, e2 and e4 work for d2, and e5 and e7 work for d3.
Relationship type R is the set of all these relationship instances.


Degree of relationship type :

The degree of a relationship type is the number of entity sets that participate in the relationship. A unary
relationship exists when an association is maintained within a single entity.

A binary relationship exists when two entities are associated.

A ternary relationship exists when three entities are associated.

Role Name and Recursive Relationship :

Each entity type that participates in a relationship type plays a particular role in the relationship. The role name
signifies the role that a participating entity from the entity type plays in each relationship instance. E.g.: in the
WORKS_FOR relationship type, the employee plays the role of employee or worker and the department
plays the role of department or employer. However, in some cases the same entity type participates more
than once in a relationship type in different roles. Such relationship types are called recursive.


E.g. : employee entity type participates twice in SUPERVISION once in the role of supervisor and once in the
role of supervisee.

Q. 4 : What is SQL ? Discuss.

Ans. Structured Query Language (SQL) is a specialized language for updating, deleting, and requesting
information from databases. SQL is an ANSI and ISO standard, and is the de facto standard database query
language. A variety of established database products support SQL, including products from Oracle and
Microsoft SQL Server. It is widely used in both industry and academia, often for enormous, complex
databases.
In a distributed database system, a program often referred to as the database's "back end" runs constantly
on a server, interpreting data files on the server as a standard relational database. Programs on client
computers allow users to manipulate that data, using tables, columns, rows, and fields. To do this, client
programs send SQL statements to the server. The server then processes these statements and returns
replies to the client program.
Examples
To illustrate, consider a simple SQL command, SELECT. SELECT retrieves a set of data from the database
according to some criteria, using the syntax :

SELECT list_of_column_names FROM list_of_relation_names WHERE conditional_expression_that_identifies_specific_rows

• The list_of_relation_names may be one or more comma-separated table names or an expression
operating on whole tables.

• The conditional_expression will contain assertions about the values of individual columns within
individual rows within a table, and only those rows meeting the assertions will be selected.
Conditional expressions within SQL are very similar to conditional expressions found in most
programming languages.

For example, to retrieve from a table called Customers all columns (designated by the asterisk) with a value
of Smith for the column Last_Name, a client program would prepare and send this SQL statement to the
server back end :

SELECT * FROM Customers WHERE Last_Name='Smith';


The server back end may then reply with data such as this :

+---------+-----------+------------+
| Cust_No | Last_Name | First_Name |
+---------+-----------+------------+
| 1001    | Smith     | John       |
| 2039    | Smith     | David      |
| 2098    | Smith     | Matthew    |
+---------+-----------+------------+
3 rows in set (0.05 sec)
Following is an SQL command that displays only two columns, column_name_1 and column_name_3, from
the table myTable :
SELECT column_name_1, column_name_3 from myTable
Below is a SELECT statement displaying all the columns of the table myTable2 for each row whose
column_name_3 value includes the string "brain" :
SELECT * from myTable2 where column_name_3 like '%brain%'
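
The SELECT examples above can be tried directly with Python's standard sqlite3 module. The sketch below is only an illustration: the Customers table, its columns and the sample rows are assumptions re-created for the demo, not data taken from the text.

# Illustrative sketch: running SELECT statements against an
# in-memory SQLite database (table and rows invented for the demo).
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customers (Cust_No INTEGER, Last_Name TEXT, First_Name TEXT)")
cur.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [(1001, "Smith", "John"), (2039, "Smith", "David"),
     (2098, "Smith", "Matthew"), (3001, "Jones", "Mary")],
)

# All columns for customers whose last name is Smith
for row in cur.execute("SELECT * FROM Customers WHERE Last_Name = 'Smith'"):
    print(row)

# Only two columns, with a LIKE pattern on the last name
for row in cur.execute(
    "SELECT Cust_No, First_Name FROM Customers WHERE Last_Name LIKE '%mit%'"
):
    print(row)

conn.close()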

Q. 5 : What is Normalization ? Discuss various types of Normal Forms ?


Ans. Normalization is the process of building database structures to store data, because any application
ultimately depends on its data structures. If the data structures are poorly designed, the application will start
from a poor foundation. This will require a lot more work to create a useful and efficient application.
Normalization is the formal process for deciding which attributes should be grouped together in a relation.
Normalization serves as a tool for validating and improving the logical design, so that the logical design
avoids unnecessary duplication of data, i.e. it eliminates redundancy and promotes integrity. In the
normalization process we analyze and decompose the complex relations into smaller, simpler and well-
structured relations.

Discuss various types of Normal Forms ?

1. Normal forms based on Primary Keys :

A relation schema R is in first normal form if every attribute of R takes only single, atomic values. Equivalently,
the intersection of each row and column contains one and only one value. To transform an un-
normalized table (a table that contains one or more repeating groups) to first normal form, we identify and
remove the repeating groups within the table.

E.g.

Dept.

D.Name D.No. D.location

R&D 5 (England, London, Delhi)

HRD 4 Bangalore

Figure A


Consider the figure: each department can have a number of locations. This relation is not in first normal form
because D.location is not an atomic attribute; the domain of D.location contains multiple values.

There is a technique to achieve first normal form: remove the attribute D.location that violates first normal
form and place it in a separate relation Dept_location.

Ex. : the original relation is decomposed into

Dept (D.No, D.Name) with rows (5, R&D) and (4, HRD), and

Dept_location (D.No, D.location), which holds one row per department location.

Functional dependency : The concept of functional dependency was introduced by Prof. Codd in 1970 during
the emergence of definitions for the three normal forms. A functional dependency is the constraint between
the two sets of attributes in a relation from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y in R (X->Y)
if and only if each value of X is associated with exactly one value of Y. X is called the determinant set and Y is
the dependent attribute.

For e.g. : Consider the example of STUDENT_COURSE database.

STUDENT_COURSE

Sid   Sname   Address   Cid   Course   Max marks   Marks obtained (%)

001 Nupur Lucknow MB010 Database Concepts 100 83

001 Nupur Lucknow MB011 C++ 100 90

002 Priya Chennai MB010 Database Concepts 100 85

002 Priya Chennai MB011 C++ 100 75

002 Priya Chennai MQ040 Computer Networks 75 65

003 Pal Bengal MB009 Unix 100 70

004 Prasad Bangalore MC011 System Software 100 85

In the STUDENT_COURSE database, the student id (Sid) does not uniquely identify a tuple and therefore it
cannot be a primary key. Similarly, the course id (Cid) cannot be a primary key. But the combination (Sid, Cid)
uniquely identifies a row in STUDENT_COURSE. Therefore (Sid, Cid) is the primary key, which uniquely
retrieves Sname, Address, Course and Marks, which are dependent on the primary key.
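
A functional dependency X -> Y can also be checked mechanically: it holds in a relation if no two tuples agree on X but differ on Y. The following Python sketch (an added illustration; the helper name holds and the sample rows are assumptions) applies this test to a few STUDENT_COURSE-style tuples.

# Sketch: check whether a functional dependency X -> Y holds in a relation,
# i.e. whether every value of X is associated with exactly one value of Y.

def holds(rows, x_attrs, y_attrs):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in x_attrs)
        y = tuple(row[a] for a in y_attrs)
        if x in seen and seen[x] != y:
            return False                    # same X, different Y -> FD violated
        seen[x] = y
    return True

student_course = [
    {"Sid": "001", "Sname": "Nupur", "Cid": "MB010", "Marks": 83},
    {"Sid": "001", "Sname": "Nupur", "Cid": "MB011", "Marks": 90},
    {"Sid": "002", "Sname": "Priya", "Cid": "MB010", "Marks": 85},
]

print(holds(student_course, ["Sid"], ["Sname"]))          # True:  Sid -> Sname
print(holds(student_course, ["Sid"], ["Marks"]))          # False: Sid alone is not a key
print(holds(student_course, ["Sid", "Cid"], ["Marks"]))   # True:  (Sid, Cid) -> Marks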


2. Second Normal Form (2NF) :

A second normal form is based on the concept of full functional dependency. A relation is in second normal
form if every non-prime attribute A in R is fully functionally dependent on the Primary Key of R.

Figure : Normalizing EMP_PROJ into 2NF (and further into 3NF) relations.


A Partial functional dependency is a functional dependency in which one or more non-key attributes are
functionally dependent on part of the primary key. It creates a redundancy in that relation, which results in
anomalies when the table is updated.

3. Third Normal Form (3NF) :

This is based on the concept of transitive dependency. We should design the relational schema in such a way
that there are no transitive dependencies, because they lead to update anomalies. A functional dependency
X -> Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a
candidate key nor a subset of any key of R, and both X -> Z and Z -> Y hold. For example, the dependency
SSN -> Dmgr is transitive through Dnum in the Emp_dept relation, because SSN -> Dnum and Dnum -> Dmgr
hold and Dnum is neither a key nor a subset (part) of the key.

According to Codd's definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute is
transitively dependent on the primary key. The Emp_dept relation is not in 3NF; we can normalize it by
decomposing it into E1 and E2.

Note : Transitivity is the mathematical property stating that if a relation holds between the first value and the
second value, and between the second value and the third value, then it holds between the first and the third
value.

Example 2 :

Consider a relation schema LOTS which describes parcels of land for sale in the various counties of a state.
Suppose there are two candidate keys: Property_ID and {County_name, Lot#}; that is, lot numbers are
unique only within each county, but Property_ID numbers are unique across counties for the entire state.

Based on the two candidate keys Property_ID and {County_name, Lot#} we know that the functional
dependencies FD1 and FD2 hold. Suppose the following two additional functional dependencies hold in
LOTS:

FD3 : County_name -> Tax_rate
FD4 : Area -> Price

Here, FD3 says that the tax rate is fixed for a given county, and FD4 says that the price of a lot is determined
by its area. The LOTS relation schema violates 2NF, because Tax_rate is partially dependent upon the
candidate key {County_name, Lot#}. Due to this, we decompose the LOTS relation into two relations – LOTS1
and LOTS2.


LOTS1 violates 3NF, because Price is transitively dependent on the candidate key of LOTS1 via the attribute
Area. Hence we decompose LOTS1 into LOTS1A and LOTS1B.

Alternatively, a relation schema R is in 3NF if every non-prime attribute of R satisfies both of the following
conditions:

1. It is fully functionally dependent on every key of R.

2. It is non-transitively dependent on every key of R.

Q. 6 : What do you mean by Shared Lock & Exclusive Lock ? Describe briefly two phase
locking protocol ?

Ans. Shared Lock :


It is used for read-only operations, i.e. for operations that do not change or update the data, e.g. a SELECT
statement.
Shared locks allow concurrent transaction to read (SELECT) a data. No other transactions can modify the
data while shared locks exist. Shared locks are released as soon as the data has been read.
Exclusive Locks :
Exclusive locks are used for data modification operations, such as UPDATE, DELETE and INSERT. It
ensures that multiple updates cannot be made to the same resource simultaneously. No other transaction
can read or modify data when locked by an exclusive lock.
Exclusive locks are held until transaction commits or rolls back since those are used for write operations.
There are three locking operations : read_lock(X), write_lock(X), and unlock(X). A lock associated with an
item X, LOCK(X), now has three possible states : “read locked”, “write-locked”, or “unlocked”. A read-locked
item is also called share-locked, because other transactions are allowed to read the item, whereas a write-
locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.
Each record on the lock table will have four fields : <data item name, LOCK, no_of_reads,
locking_transaction(s)>, the value (state) of LOCK is either read-locked or write-locked.
read_lock(X):
B:  if LOCK(X) = "unlocked"
        then begin LOCK(X) ← "read-locked";
                   no_of_reads(X) ← 1
             end
    else if LOCK(X) = "read-locked"
        then no_of_reads(X) ← no_of_reads(X) + 1
    else begin
            wait (until LOCK(X) = "unlocked" and the lock manager wakes up the transaction);
            go to B
         end;

write_lock(X):
B:  if LOCK(X) = "unlocked"
        then LOCK(X) ← "write-locked"
    else begin
            wait (until LOCK(X) = "unlocked" and the lock manager wakes up the transaction);
            go to B
         end;

unlock(X):
    if LOCK(X) = "write-locked"
        then begin LOCK(X) ← "unlocked";
                   wake up one of the waiting transactions, if any
             end
    else if LOCK(X) = "read-locked"
        then begin
                no_of_reads(X) ← no_of_reads(X) - 1;
                if no_of_reads(X) = 0
                    then begin LOCK(X) ← "unlocked";
                               wake up one of the waiting transactions, if any
                         end
             end;

The Two Phase Locking Protocol


The two-phase locking protocol is a way of scheduling lock and unlock operations so that transactions can
access shared resources as if they were their own, while guaranteeing that the resulting schedules are
serializable. This process consists of two phases.

1. Growing Phase : In this phase the transaction may acquire lock, but may not release any locks.
Therefore this phase is also called as resource acquisition activity.

2. Shrinking Phase : In this phase the transaction may release locks, but may not acquire any new
locks. This includes modifying the data and releasing the locks; these two activities are grouped
together to form the second phase.

In the beginning, the transaction is in the growing phase: whenever a lock is needed, the transaction acquires
it. As soon as it releases its first lock, the transaction enters the shrinking phase and can no longer issue new
lock requests.
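
As an added illustration of the growing/shrinking rule (not part of the original text), the Python sketch below rejects any lock request made after the transaction has released its first lock; the class and method names (TwoPhaseTransaction, lock, unlock, commit) are assumptions made for the example.

# Minimal sketch of the two-phase locking rule inside a single transaction:
# locks may be acquired only while the transaction is still in its growing
# phase; releasing the first lock moves it to the shrinking phase.

class TwoPhaseTransaction:
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False              # False = still in the growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire lock on {item!r} "
                               "after entering the shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.locks.discard(item)
        self.shrinking = True               # first release ends the growing phase

    def commit(self):
        for item in list(self.locks):       # release everything still held
            self.locks.discard(item)
        self.shrinking = True

t = TwoPhaseTransaction("T1")
t.lock("X")
t.lock("Y")          # allowed: still growing
t.unlock("X")        # enters the shrinking phase
try:
    t.lock("Z")      # violates two-phase locking
except RuntimeError as e:
    print(e)
t.commit()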

Strict two-phase locking :

In the two-phase locking protocol, cascading rollbacks are not avoided. In order to avoid them, a slight
modification is made to two-phase locking, called strict two-phase locking: all the locks acquired by the
transaction are held until the transaction commits.

Deadlock & starvation : In a deadlock state there exists a set of transactions in which every transaction in the
set is waiting for another transaction in the set.

Suppose there exists a set of waiting transactions {T1, T2, T3, ……, Tn} such that T1 is waiting for a data item
held by T2, T2 is waiting for T3, and so on, and Tn is waiting for T1. In this state none of the transactions can
make progress.


Assignment (Set-2)
Subject code: MI0034
Database Management Systems

Q. 1. Define Data Model & discuss the categories of Data Models? What is the difference
between logical data Independence & Physical Data Independence?

Ans: DATA MODEL:

The product of the database design process which aims to identify and organize the required data logically
and physically. A data model says what information is to be contained in a database, how the information will
be used, and how the items in the database will be related to each other. For example, a data model might
specify that a customer is represented by a customer name and credit card number and a product as a
product code and price, and that there is a one-to-many relation between a customer and a product.

It can be difficult to change a database layout once code has been written and data inserted. A well thought-out
data model reduces the need for such changes. Data modelling enhances application maintainability, and
future systems may re-use parts of existing models, which should lower development costs. A data modelling
language is a mathematical formalism with a notation for describing data structures and a set of operations
used to manipulate and validate that data. One of the most widely used methods for developing data models
is the entity-relationship model. The relational model is the most widely used type of data model. Another
example is NIAM.

Categories of Data Models :

1. Conceptual (high-level, semantic ) data models:

A conceptual schema or conceptual data model is a map of concepts and their relationships. This
describes the semantics of an organization and represents a series of assertions about its nature.
Specifically, it describes the things of significance to an organization (entity classes), about which it is
inclined to collect information, and characteristics of (attributes) and associations between pairs of those
things of significance (relationships).

Because a conceptual schema represents the semantics of an organization, and not a database design, it
may exist on various levels of abstraction. The original ANSI four-schema architecture began with the set of
external schemas that each represent one person's view of the world around him or her. These are
consolidated into a single conceptual schema that is the superset of all of those external views. A data model
can be as concrete as each person's perspective, but this tends to make it inflexible. If that person's world
changes, the model must change. Conceptual data models take a more abstract perspective, identifying the
fundamental things, of which the things an individual deals with
are just examples.

A conceptual data model identifies the highest-level relationships between the different entities. Features of
the conceptual data model include:

• Includes the important entities and the relationships among them.

• No attribute is specified.
• No primary key is specified.

The figure is an example of a conceptual data model.

From the figure above, we can see that the only information shown via the conceptual data model is the
entities that describe the data and the relationships between those entities. No other information is shown
through the conceptual data model.

2. Physical (low -level, internal) data models

Features of physical data model include:

• Specification of all tables and columns.


• Foreign keys are used to identify relationships between tables.
• Denormalization may occur based on user requirements.
• Physical considerations may cause the physical data model to be quite different from the logical data
model.

At this level, the data modeler will specify how the logical data model will be realized in the database
schema.

The steps for physical data model design are as follows:

1. Convert entities into tables.


2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.

3. Logical Data Model

Features of logical data model include:


• Includes all entities and relationships among them.


• All attributes for each entity are specified.
• The primary key for each entity specified.
• Foreign keys (keys identifying the relationship between different entities) are specified.
• Normalization occurs at this level.

At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to
how they will be physically implemented in the database.

In data warehousing, it is common for the conceptual data model and the logical data model to be combined
into a single step (deliverable).

The steps for designing the logical data model are as follows:

1. Identify all entities.


2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.

Differences between a logical and physical data model

The difference between a logical and a physical data model is hard to grasp at first, but once you see the
difference it seems obvious. A logical data model describes your model entities and how they relate to each
other. A physical data model describes each entity in detail, including information about how you would
implement the model using a particular (database) product.

In a logical model describing a person in a family tree, each person node would have attributes such as
name(s), date of birth, place of birth, etc. The logical diagram would also show some kind of unique attribute
or combination of attributes called a primary key that describes exactly one entry (a row in SQL) within this
entity.


The physical model for the person would contain implementation details. These details are things like data
types, indexes, constraints, etc.
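
To make the logical-versus-physical distinction concrete, the hedged sketch below shows one possible physical realization of a "Person" entity as SQLite DDL issued from Python; the table name, column names, types and index are assumptions chosen for the example, not a prescribed design.

# Sketch: one possible *physical* realization of the logical "Person" entity.
# The logical model only says a Person has a name, a date of birth and a
# place of birth; the physical model adds data types, a primary key and an index.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        person_id      INTEGER PRIMARY KEY,   -- surrogate key (a physical design choice)
        name           TEXT NOT NULL,
        date_of_birth  TEXT,                  -- stored as ISO-8601 text in SQLite
        place_of_birth TEXT
    )
""")
conn.execute("CREATE INDEX idx_person_name ON Person(name)")  # physical tuning detail
conn.execute("INSERT INTO Person (name, date_of_birth, place_of_birth) "
             "VALUES ('Ada', '1815-12-10', 'London')")
print(conn.execute("SELECT * FROM Person").fetchall())
conn.close()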

The logical and physical models serve two different, but related, purposes. A logical model is a way to draw
your mental roadmap from a problem specification to an entity-based storage system. The user (problem
owner) must understand and approve the logical model.

Q. 2. What is a B+Trees? Describe the structure of both internal and leaf nodes of a
B+Tree?

Ans. B+ Trees :

In computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic amortized time. The B-tree is a generalization of a binary
search tree in that a node can have more than two children. (Comer, p. 123) Unlike self-balancing binary
search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly
used in databases and filesystems.

In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined
range. When data is inserted or removed from a node, its number of child nodes changes. In order to
maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is
permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may
waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child
nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred
to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

Each internal node of a B-tree will contain a number of keys. Usually, the number of keys is chosen to vary
between d and 2d. In practice, the keys take up the most space in a node. The factor of 2 will guarantee that
nodes can be split or combined. If an internal node has 2d keys, then adding a key to that node can be
accomplished by splitting the 2d key node into two d key nodes and adding the key to the parent node. Each
split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each
have d keys, then a key may be deleted from the internal node by combining with its neighbor. Deleting the
key would make the internal node have d − 1 keys; joining the neighbor would add d keys plus one more key
brought down from the neighbor's parent. The result is an entirely full node of 2d keys.

The number of branches (or child nodes) from a node will be one more than the number of keys stored in the
node. In a 2-3 B-tree, the internal nodes will store either one key (with two child nodes) or two keys (with
three child nodes). A B-tree is sometimes described with the parameters (d + 1) — (2d + 1) or simply with the
highest branching order, (2d + 1).

A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will increase
slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all
leaf nodes being one more node further away from the root.

B-trees have substantial advantages over alternative implementations when node access times far exceed
access times within nodes. This usually occurs when the nodes are in secondary storage such as disk
drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases
and the number of expensive node accesses is reduced. In addition, rebalancing the tree occurs less often.
The maximum number of child nodes depends on the information that must be stored for each child node
and the size of a full disk block or an analogous size in secondary storage. While 2-3 B-trees are easier to
explain, practical B-trees using secondary storage want a large number of child nodes to improve
performance.

Variants


The term B-tree may refer to a specific design or it may refer to a general class of designs. In the narrow
sense, a B-tree stores keys in its internal nodes but need not store those keys in the records at the leaves.
The general class includes variations such as the B+-tree and the B*-tree.

• In the B+-tree, copies of the keys are stored in the internal nodes; the keys and records are stored in
the leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential
access (Comer, p. 129). A structural sketch of these two kinds of nodes is given after this list.
• The B*-tree balances more neighboring internal nodes to keep the internal nodes more densely
packed (Comer, p. 129). For example, a non-root node of a B-tree must be only half full, but a non-
root node of a B*-tree must be two-thirds full.
• Counted B-trees store, with each pointer within the tree, the number of nodes in the subtree below
that pointer. This allows rapid searches for the Nth record in key order, or counting the number of
records between any two records, and various other related operations.
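
The following Python sketch is a structural illustration only (no insertion or node splitting; the class names and the tiny hand-built tree are assumptions). It shows the two kinds of B+-tree nodes described above: internal nodes holding only keys and child pointers, and leaf nodes holding keys with their records plus a pointer to the next leaf for sequential access.

# Structural sketch of B+-tree nodes (search only, no insertion/splitting).
import bisect

class InternalNode:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys only; no records stored here
        self.children = children    # len(children) == len(keys) + 1

class LeafNode:
    def __init__(self, keys, records, next_leaf=None):
        self.keys = keys            # keys in sorted order
        self.records = records      # records live only in the leaves
        self.next = next_leaf       # pointer to next leaf for sequential access

def search(node, key):
    """Descend from the root to the leaf that may contain `key`."""
    while isinstance(node, InternalNode):
        i = bisect.bisect_right(node.keys, key)   # choose the child subtree
        node = node.children[i]
    for k, r in zip(node.keys, node.records):
        if k == key:
            return r
    return None

# Tiny hand-built tree: leaves (5, 10) and (20, 30) under one internal node.
leaf2 = LeafNode([20, 30], ["rec20", "rec30"])
leaf1 = LeafNode([5, 10], ["rec5", "rec10"], next_leaf=leaf2)
root = InternalNode([20], [leaf1, leaf2])
print(search(root, 10), search(root, 30), search(root, 7))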

Q. 3. Describe Projection operation, Set theoretic operation & Join operation?

Projection Operator
Projection is also a Unary operator.
The Projection operator is pi (π).
Projection limits the attributes that will be returned from the original relation.
The general syntax is: π attributes (R)
Where attributes is the list of attributes to be displayed and R is the relation.
The resulting relation will have the same number of tuples as the original relation (unless there are duplicate
tuples produced).
The degree of the resulting relation may be equal to or less than that of the original relation.

Projection Examples
Assume the EMP relation given in the Aggregate Function Examples below is used.
Project only the names and departments of the employees:
π name, dept (EMP)
Results:
Name Dept
Smith CS
Jones Econ
Green Econ
Brown CS
Smith Fin

Combining Selection and Projection


The selection and projection operators can be combined to perform both operations.


Show the names of all employees working in the CS department:

π name (σ Dept = 'CS' (EMP))

Results: Name
Smith
Brown

Show the name and rank of those employees who are neither in the CS department nor Adjuncts (this assumes EMP also carries a Rank attribute):
π name, rank (σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP))
Result: Name Rank
Green Assistant
Smith Associate
Exercises
Evaluate the following expressions:
π name, rank (σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP))
π fname, age (σ Age > 22 (R ∪ S))

For this expression, use R and S from the Set Theoretic Operations section below.

σ office > 300 (π name, rank (EMP))

Aggregate Functions
We can also apply Aggregate functions to attributes and tuples:
SUM
MINIMUM
MAXIMUM
AVERAGE, MEAN, MEDIAN
COUNT
Aggregate functions are sometimes written using the Projection operator or the script F character (ℱ), as in
the Elmasri/Navathe book.
Aggregate Function Examples
Assume the relation EMP has the following tuples:
Name Office Dept Salary
Smith 400 CS 45000
Jones 220 Econ 35000
Green 160 Econ 50000
Brown 420 CS 65000
Smith 500 Fin 60000

Find the minimum Salary: ℱ MIN(salary) (EMP)


Results:MIN(salary)
35000

Find the average Salary: ℱ AVG(salary) (EMP)


Results:AVG(salary)
51000

Count the number of employees in the CS department: ℱ COUNT(name) (σ Dept = 'CS' (EMP))
Results:COUNT(name)
2

Find the total payroll for the Economics department: ℱ SUM(salary) (σ Dept = 'Econ' (EMP))
Results:SUM(salary)
85000
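
The aggregate results above can be sanity-checked in plain Python; the sketch below simply re-computes them over the EMP tuples shown (the tuples are copied from the relation above, everything else is illustrative).

# Re-computing the aggregate examples over the EMP tuples shown above.
emp = [
    ("Smith", 400, "CS",   45000),
    ("Jones", 220, "Econ", 35000),
    ("Green", 160, "Econ", 50000),
    ("Brown", 420, "CS",   65000),
    ("Smith", 500, "Fin",  60000),
]

salaries = [salary for _, _, _, salary in emp]
print("MIN(salary) =", min(salaries))                                         # 35000
print("AVG(salary) =", sum(salaries) / len(salaries))                         # 51000.0
print("COUNT(name) in CS =", sum(1 for _, _, dept, _ in emp if dept == "CS")) # 2
print("SUM(salary) in Econ =",
      sum(salary for _, _, dept, salary in emp if dept == "Econ"))            # 85000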

Set Theoretic Operations


Consider the following relations R and S
R First Last Age
Bill Smith 22
Sally Green 28
Mary Keen 23
Tony Jones 32

S First Last Age


Forrest Gump 36
Sally Green 28
DonJuan DeMarco 27

Union: R ∪ S
Result: Relation with tuples from R and S with duplicates removed.

Difference: R - S
Result: Relation with tuples from R but not from S.

Intersection: R ∩ S
Result: Relation with tuples that appear in both R and S.

R ∪ S    First Last Age


Bill Smith 22
Sally Green 28
Mary Keen 23
Tony Jones 32
Forrest Gump 36
DonJuan DeMarco 27

R - S First Last Age


Bill Smith 22
Mary Keen 23
Tony Jones 32

R ∩ S    First Last Age


Sally Green 28

Join Operation
Join operations bring together two relations and combine their attributes and tuples in a specific fashion.
The generic join operator (called the Theta Join) is ⋈, written with the join condition as a subscript.
It takes as arguments the attributes from the two relations that are to be joined.
For example, assume we have the EMP relation as above and a separate DEPART relation with (Dept,
MainOffice, Phone):

EMP ⋈ EMP.Dept = DEPART.Dept DEPART

The join condition can use any comparison operator (=, <, ≤, >, ≥, ≠).
When the join condition operator is =, we call this an Equijoin.
Note that the attributes in common are repeated.
Join Examples
Assume we have the EMP relation from above and the following DEPART relation:

Dept MainOffice Phone
CS   404 555-1212
Econ 200 555-1234
Fin  501 555-4321
Hist 100 555-9876

Find all information on every employee including their department info:

EMP ⋈ emp.Dept = depart.Dept DEPART
Results:Name Office EMP.Dept Salary DEPART.Dept MainOffice Phone
Smith 400 CS 45000 CS 404 555-1212
Jones 220 Econ 35000 Econ 200 555-1234
Green 160 Econ 50000 Econ 200 555-1234
Brown 420 CS 65000 CS 404 555-1212
Smith 500 Fin 60000 Fin 501 555-4321

Find all information on every employee including their department info where the employee works in an office
numbered less than the department main office:
EMP ⋈ (emp.office < depart.mainoffice) ∧ (emp.dept = depart.dept) DEPART
Results:Name Office EMP.Dept Salary DEPART.Dept MainOffice Phone
Smith 400 CS 45000 CS 404 555-1212
Green 160 Econ 50000 Econ 200 555-1234
Smith 500 Fin 60000 Fin 501 555-4321

Natural Join
Notice in the generic (Theta) join operation, any attributes in common (such as dept above) are repeated.
The Natural Join operation removes these duplicate attributes.
The natural join operator is: *
We can also assume using * that the join condition will be = on the two attributes in common.

Example: EMP * DEPART


Results:Name Office Dept Salary MainOffice Phone
Smith 400 CS 45000 404 555-1212
Jones 220 Econ 35000 200 555-1234
Green 160 Econ 50000 200 555-1234
Brown 420 CS 65000 404 555-1212
Smith 500 Fin 60000 501 555-4321

Outer Join
In the Join operations so far, only those tuples from both relations that satisfy the join condition are included
in the output relation.
The Outer join includes other tuples as well according to a few rules.
Three types of outer joins:
Left Outer Join includes all tuples in the left hand relation and includes only those matching tuples from the
right hand relation.
Right Outer Join includes all tuples in the right hand relation and includes only those matching tuples from
the left hand relation.
Full Outer Join includes all tuples in the left hand relation and from the right hand relation.

Examples:
Assume we have two relations, PEOPLE and MENU:

PEOPLE: Name Age Food
Alice 21 Hamburger
Bill 24 Pizza
Carl 23 Beer
Dina 19 Shrimp
MENU: Food Day
Pizza Monday
Hamburger Tuesday
Chicken Wednesday
Pasta Thursday
Tacos Friday

PEOPLE ⟕ people.food = menu.food MENU   (Left Outer Join)


Name Age people.Food menu.Food Day
Alice 21 Hamburger Hamburger Tuesday
Bill 24 Pizza Pizza Monday
Carl 23 Beer NULL NULL
Dina 19 Shrimp NULL NULL

PEOPLE ⟖ people.food = menu.food MENU   (Right Outer Join)


Name Age people.Food menu.Food Day
Bill 24 Pizza Pizza Monday
Alice 21 Hamburger Hamburger Tuesday
NULL NULL NULL Chicken Wednesday


NULL NULL NULL Pasta Thursday
NULL NULL NULL Tacos Friday

Full Outer Join: PEOPLE ⟗(people.Food = menu.Food) MENU

Name Age people.Food menu.Food Day
Alice 21 Hamburger Hamburger Tuesday
Bill 24 Pizza Pizza Monday
Carl 23 Beer NULL NULL
Dina 19 Shrimp NULL NULL
NULL NULL NULL Chicken Wednesday
NULL NULL NULL Pasta Thursday
NULL NULL NULL Tacos Friday
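
As an added illustration (not part of the course text), the three outer joins can be sketched in Python over the PEOPLE and MENU tuples above; the helper names and the use of None for NULL are assumptions.

Python (illustrative sketch):
PEOPLE = [("Alice", 21, "Hamburger"), ("Bill", 24, "Pizza"),
          ("Carl", 23, "Beer"), ("Dina", 19, "Shrimp")]
MENU = [("Pizza", "Monday"), ("Hamburger", "Tuesday"), ("Chicken", "Wednesday"),
        ("Pasta", "Thursday"), ("Tacos", "Friday")]

def left_outer(people, menu):
    # Every PEOPLE tuple appears; unmatched ones get NULLs for the MENU columns.
    days = dict(menu)
    return [(n, a, f, f if f in days else None, days.get(f))
            for (n, a, f) in people]

def right_outer(people, menu):
    # Every MENU tuple appears; unmatched ones get NULLs for the PEOPLE columns.
    rows = []
    for food, day in menu:
        matches = [(n, a, f) for (n, a, f) in people if f == food]
        rows += [(n, a, f, food, day) for (n, a, f) in matches] or [(None, None, None, food, day)]
    return rows

def full_outer(people, menu):
    # Union of the two: all PEOPLE tuples and all MENU tuples appear.
    matched = {f for (_, _, f) in people}
    return left_outer(people, menu) + [r for r in right_outer(people, menu) if r[3] not in matched]

for row in full_outer(PEOPLE, MENU):
    print(row)   # 7 rows, matching the full outer join table above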

Q. 4. Discuss Multi Table Queries?

Ans: Multiple Table Queries :

Most of the queries you create in Microsoft Access will more than likely need to include the data from more
than one table and you will have to join the tables in the query. The capability to join tables is the power of
the relational database.

As you know, in order to join database tables, they must have a field in common. The fields on which you join
tables must be of the same or compatible data types and they must contain the same kind of data; however,
they do not have to have the same field name (although they probably will). Occasionally, the two database
tables that you want to bring the data from may not have a field in common and you will have to add another
table to the query with the sole purpose of joining the tables.

Different types of query joins will return different sets of results. When creating new queries, it is prudent to
test them on a set of records for which you know what the result should be. That’s a good way to be sure that
you have the correct join and are getting accurate results. Just because a query runs and doesn't give you
an error doesn't mean that the resulting data set is what you intended to return.

Failure to join tables in a database query will result in a cross or Cartesian product, in which every record in
one table is paired with every record in the second table - probably not very meaningful data. A Cartesian
product is defined as all possible combinations of rows in all tables, so be sure you have joins in place before
trying to return data: a Cartesian product on tables with many records, and/or on many tables, could take
several hours to complete.
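
A quick back-of-the-envelope check (an added illustration with made-up row counts) shows why an accidental Cartesian product can be so expensive:

Python (illustrative sketch):
# With no join condition, every row of one table pairs with every row of the
# other, so the result size is the product of the row counts.
customers, orders = 10_000, 250_000
print(customers * orders)   # 2,500,000,000 rows from just two medium-sized tables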

There are inner joins and outer joins - each with variations on the theme.
Inner Join

A join of two tables that returns records for which there is a matching value in the field on which the tables
are joined.

The most common type of join is the inner join, or equi-join. It joins records in two tables when the values in
the fields on which they are joined are equal. For example, if you had the following Customers and Orders
tables and did an equi-join on (or, as is sometimes said, over) the CustomerID fields, you would see the set
of records that have the same CustomerID in both tables. With the following data, that would be a total of 7
records. Customers listed in the Customers table who had not placed an order would not be included in the
result. There has to be the same value in the CustomerID field in both tables.


An Inner Join of the Customer and Order Data

If, in the query result, you eliminated redundant columns - that is, displayed the CustomerID column only
once in the result - this would be called a natural join.

An inner join returns the intersection of two tables. Following is a graphic of joining these tables. The
Customers table contains data in areas 1 and 2. The Orders table contains data in areas 2 and 3. An inner
join returns only the data in area 2.

Outer Joins

A join between two tables that returns all the records from one table and, from the second table, only those
records in which there is a matching value in the field on which the tables are joined.

An outer join returns all the records from one table and only the records from the second table where the
value in the field on which the tables are joined matches a value in the first table. Outer joins are referred to
as left outer joins and right outer joins. The left and right concept comes from the fact that, in a traditional
database diagram, the table on the one side of a 1:N relationship was drawn on the left.

Using our Customers and Orders tables again, if you performed a left outer join, the result would include a
listing of all Customers and, for those that had placed orders, the data on those orders. You would get a total
of 11 records from this data, which is a very different result from the 7 records provided by the inner join.
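
The same contrast can be reproduced with Python's built-in sqlite3 module. The sketch below is an added illustration: the table and column names follow the text, but the rows are hypothetical sample data chosen so that the inner join returns 7 records and the left outer join returns 11, matching the counts quoted above.

Python (illustrative sketch):
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
con.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Amount REAL)")
con.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, "Arun"), (2, "Bela"), (3, "Chand"), (4, "Dev"),
                 (5, "Esha"), (6, "Farid"), (7, "Gita")])
con.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                [(101, 1, 250.0), (102, 1, 99.0), (103, 2, 500.0),
                 (104, 2, 75.0), (105, 2, 120.0), (106, 3, 310.0), (107, 3, 45.0)])
con.commit()

# Inner join (equi-join): only customers that have at least one matching order.
inner = con.execute("""SELECT c.Name, o.OrderID
                       FROM Customers c INNER JOIN Orders o
                       ON c.CustomerID = o.CustomerID""").fetchall()

# Left outer join: every customer, with NULLs where no matching order exists.
outer = con.execute("""SELECT c.Name, o.OrderID
                       FROM Customers c LEFT OUTER JOIN Orders o
                       ON c.CustomerID = o.CustomerID""").fetchall()

print(len(inner), "records from the inner join")        # 7
print(len(outer), "records from the left outer join")   # 11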


An Outer Join of the Customers and Orders table

In the diagram below, a left outer join on the Customers table will return the data in areas 1 and 2. By the
way, this type of diagram is called a Venn diagram.

Not All Data Can Be Edited

Earlier, it was mentioned that the results of a query represent “live” data, meaning that a change to that data
is actually a change to the data in the base table. However, you will find that you cannot edit all data that is
returned by a query. You can edit values in all fields from a query based on a single table or on two tables
with a one-to-one relationship. But you can’t edit all fields in a query based on tables with a one-to-many
relationship nor from crosstab queries or
those with totals.

In general, you can edit:

- all fields in a single table query
- all fields in tables with a one-to-one relationship
- all fields in the table on the many side of a one-to-many relationship
- non-key fields in the table on the one side of a one-to-many relationship

You can't edit:

- fields in the primary key in the table on the one side of a one-to-many relationship
- fields returned by a crosstab query
- values in queries in which aggregate operations are performed
- calculated fields

There are ways to work around some of these editing limitations but the precise technique will depend on the
RDBMS you’re using.

Q.5. Discuss Transaction Processing Concept? Describe properties of Transactions?


Ans: In computer science, transaction processing is information processing that is divided into individual,
indivisible operations, called transactions. Each transaction must succeed or fail as a complete unit; it cannot
remain in an intermediate state.

Description

Transaction processing is designed to maintain a computer system (typically a database or some modern
filesystems) in a known, consistent state, by ensuring that any operations carried out on the system that are
interdependent are either all completed successfully or all canceled successfully.

For example, consider a typical banking transaction that involves moving $700 from a customer's savings
account to a customer's checking account. This transaction is a single operation in the eyes of the bank, but
it involves at least two separate operations in computer terms: debiting the savings account by $700, and
crediting the checking account by $700. If the debit operation succeeds but the credit does not (or vice
versa), the books of the bank will not balance at the end of the day. There must therefore be a way to ensure
that either both operations succeed or both fail, so that there is never any inconsistency in the bank's
database as a whole. Transaction processing is designed to provide this.

Transaction processing allows multiple individual operations to be linked together automatically as a single,
indivisible transaction. The transaction-processing system ensures that either all operations in a transaction
are completed without error, or none of them are. If some of the operations are completed but errors occur
when the others are attempted, the transaction-processing system “rolls back” all of the operations of the
transaction (including the successful ones), thereby erasing all traces of the transaction and restoring the
system to the consistent, known state that it was in before processing of the transaction began. If all
operations of a transaction are completed successfully, the transaction is committed by the system, and all
changes to the database are made permanent; the transaction cannot be rolled back once this is done.
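
The all-or-nothing behaviour of the banking example can be sketched with Python's built-in sqlite3 module, which supports commit and rollback. This is only an added illustration: the account names and the $700 amount follow the example above, while the starting balances are assumptions.

Python (illustrative sketch):
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
con.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("savings", 1000.0), ("checking", 200.0)])
con.commit()

def transfer(amount):
    try:
        # The first UPDATE implicitly begins a transaction.
        con.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'savings'", (amount,))
        con.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'checking'", (amount,))
        con.commit()      # both the debit and the credit become permanent together ...
    except Exception:
        con.rollback()    # ... or, if anything fails in between, neither of them does
        raise

transfer(700.0)
print(dict(con.execute("SELECT name, balance FROM accounts")))
# {'savings': 300.0, 'checking': 900.0} - the books still balance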

Transaction processing guards against hardware and software errors that might leave a transaction partially
completed, with the system left in an unknown, inconsistent state. If the computer system crashes in the
middle of a transaction, the transaction processing system guarantees that all operations in any uncommitted
(i.e., not completely processed) transactions are cancelled.

Transactions are processed in a strict chronological order. If transaction n+1 intends to touch the same
portion of the database as transaction n, transaction n+1 does not begin until transaction n is committed.
Before any transaction is committed, all other transactions affecting the same part of the system must also
be committed; there can be no “holes” in the sequence of preceding transactions.

Methodology

The basic principles of all transaction-processing systems are the same. However, the terminology may vary
from one transaction-processing system to another, and the terms used below are not necessarily universal.

Rollback

Transaction-processing systems ensure database integrity by recording intermediate states of the database
as it is modified, then using these records to restore the database to a known state if a transaction cannot be
committed. For example, copies of information on the database prior to its modification by a transaction are
set aside by the system before the transaction can make any modifications (this is sometimes called a before
image). If any part of the transaction fails before it is committed, these copies are used to restore the
database to the state it was in before the transaction began.

Rollforward

It is also possible to keep a separate journal of all modifications to a database (sometimes called after
images); this is not required for rollback of failed transactions, but it is useful for updating the database in the
event of a database failure, so some transaction-processing systems provide it. If the database fails entirely,
it must be restored from the most recent back-up. The back-up will not reflect transactions committed since
the back-up was made. However, once the database is restored, the journal of after images can be applied
to the database (rollforward) to bring the database up to date. Any transactions in progress at the time of the
failure can then be rolled back. The result is a database in a consistent, known state that includes the results
of all transactions committed up to the moment of failure.

Deadlocks

In some cases, two transactions may, in the course of their processing, attempt to access the same portion
of a database at the same time, in a way that prevents them from proceeding. For example, transaction A
may access portion X of the database, and transaction B may access portion Y of the database. If, at that
point, transaction A then tries to access portion Y of the database while transaction B tries to access portion
X, a deadlock occurs, and neither transaction can move forward. Transaction-processing systems are
designed to detect these deadlocks when they occur. Typically both transactions will be cancelled and rolled
back, and then they will be started again in a different order, automatically, so that the deadlock doesn't
occur again. Or sometimes, just one of the deadlocked transactions will be cancelled, rolled back, and
automatically re-started after a short delay.

Deadlocks can also occur between three or more transactions. The more transactions involved, the more
difficult they are to detect, to the point that transaction processing systems find there is a practical limit to the
deadlocks they can detect.
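
Deadlock detection is commonly described in terms of a wait-for graph: a cycle in the graph means the transactions on that cycle are deadlocked and one of them must be rolled back. The following is a small added sketch of that idea, with a hypothetical two-transaction deadlock like the A/B example above.

Python (illustrative sketch):
def find_cycle(wait_for):
    # An edge T1 -> T2 means "transaction T1 is waiting for a lock held by T2".
    # Return one cycle of deadlocked transactions, or None if there is none.
    def visit(node, path):
        if node in path:
            return path[path.index(node):]        # found a cycle
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

wait_for = {"A": ["B"], "B": ["A"]}   # A waits for B, and B waits for A
print(find_cycle(wait_for))           # ['A', 'B'] -> deadlock; roll one back and retry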

Compensating transaction

In systems where commit and rollback mechanisms are not available or undesirable, a Compensating
transaction is often used to undo failed transactions and restore the system to a previous state.

Transaction Properties

You can control the behavior of the transactions in your OpenAccess ORM applications by setting various
transaction properties. The properties are always set for a specific transaction, and they are valid until the
IObjectScope instance is disposed of. Transaction properties can be changed only if the transaction is not
active.

The previous sections showed some transaction properties. In the following code example, the RetainValues
property is set to true:

C#
// prepare transaction properties
scope.TransactionProperties.RetainValues = true;
scope.Transaction.Begin();
// ...
Console.WriteLine( "RetainValues is "
    + scope.Transaction.Properties.RetainValues );
scope.Transaction.Commit();
// Properties are still valid
scope.Transaction.Begin();
// ...
VB.NET
' prepare transaction properties
scope.TransactionProperties.RetainValues = True
scope.Transaction.Begin()
'...
ConsoleWriteLine("RetainValues is " + scope.Transaction.Properties.RetainValues)
scope.Transaction.Commit()
' Properties are still valid
scope.Transaction.Begin()


Following is a list of the transaction properties, their allowed and default values, and a brief description.

RetainValues

This property controls whether persistent class instances retain their values after commit of the transaction
and if read access is allowed. By default it is set to true. However, regardless of this setting, objects are
refreshed from the data store the next time the object is accessed within an active transaction.
RestoreValues

This property controls whether the values of objects are restored to their original values when a transaction
(or one particular nesting level) is rolled back. By default it is set to false.
No Automatic Refreshes

As described earlier in this chapter, OpenAccess ORM uses optimistic concurrency control by default (refer
to Concurrency Control Algorithms for more information about the various concurrency control mechanisms).
This means that OpenAccess ORM does not validate read (but unmodified) objects at commit time, and
therefore it is possible that if an object is read inside a transaction, it might be changed in the database, while
the transaction is running.

So, in order to avoid long-living stale objects, OpenAccess ORM will refresh such objects if they are
accessed in a subsequent transaction. This happens on the first access to such objects. Thus, only short-
living stale objects are possible, at the cost of an SQL call for refreshing the object in a subsequent
transaction.

It is possible to have more control over this refreshing behavior, by disabling the automatic refresh function,
which can be done as shown below:
scope.TransactionProperties.RefreshReadObjectsInNewTransaction = false;

The advantage of using this is that objects can keep their data for a long time, without the need for executing
an SQL Statement again.
However, if you enable "no automatic refreshes" then you are responsible for avoiding stale data, i.e. you will
need to call Refresh() or Evict() at appropriate times.

Therefore, please use this with care, since read (but not modified) objects will not be refreshed
automatically in new transactions of the same ObjectScope, i.e., if an object is fetched from the database in
the first transaction and is subsequently never explicitly refreshed, evicted or modified in subsequent
transactions, it will still have the values from the first transaction.

Concurrency

This property determines the concurrency settings of a transaction. The default is
TransactionMode.OPTIMISTIC | TransactionMode.NO_LOST_UPDATES.

AutomaticBegin

This property allows the user to specify that every Commit()/Rollback() of a transaction will start a new
transaction immediately. Therefore, the user does not need to call Begin(), and one can work with just
Commit() and Rollback() calls. In other words, there is always a started transaction, regardless of whether a
Commit() fails (the next transaction will still be started).

This is especially useful for multithreaded applications, i.e., working with multiple threads in one object scope
by setting the <option.Multithreaded> to "true", since this allows an automatic synchronized Commit() +
Begin().
FailFast

This property determines whether a transaction commit or flush, should fail at the first failure. When this
property is set to true (default value), the transaction will fail on the occurrence of the first
OptimisticVerificationException. When this property is set to false the IObjectScope will collect all the
failures, i.e., the commit or flush will continue and collect all the OptimisticVerificationExceptions that occur. It
might be time consuming to collect all the failures, therefore this property is set to true by default.


This property should be set to false only when the information about failing objects is necessary, since it
might be very time consuming to collect all the failures.

Q. 6. Describe the advantage of Distributed database? What is Client/server Model?


Discuss briefly the security and Internet violation?

Ans: A distributed database is a database that is under the control of a central database management
system (DBMS) in which storage devices are not all attached to a common CPU. It may be stored in multiple
computers located in the same physical location, or may be dispersed over a network of interconnected
computers.

Collections of data (e.g. in a database) can be distributed across multiple physical locations. A distributed
database can reside on network servers on the Internet, on corporate intranets or extranets, or on other
company networks. Replication and distribution of databases improve database performance at end-user
worksites.

Besides distributed database replication and fragmentation, there are many other distributed database
design technologies, for example local autonomy, and synchronous and asynchronous distributed database
technologies. The implementation of these technologies can and does depend on the needs of the business and
the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is
willing to spend on ensuring data security, consistency and integrity.

Advantages of distributed databases

- Management of distributed data with different levels of transparency.
- Increased reliability and availability.
- Easier expansion.
- Reflects organizational structure — database fragments are located in the departments they relate to.
- Local autonomy — a department can control the data about them (as they are the ones familiar with it).
- Protection of valuable data — if there were ever a catastrophic event such as a fire, all of the data would not
  be in one place, but distributed in multiple locations.
- Improved performance — data is located near the site of greatest demand, and the database systems
  themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on
  one module of the database won't affect other modules of the database in a distributed database.)
- Economics — it costs less to create a network of smaller computers with the power of a single large
  computer.
- Modularity — systems can be modified, added and removed from the distributed database without affecting
  other modules (systems).
- Reliable transactions - due to replication of the database.
- Hardware, Operating System, Network, Fragmentation, DBMS, Replication and Location Independence.
- Continuous operation.
- Distributed query processing.
- Distributed transaction management.

Single site failure does not affect the performance of the system. All transactions follow the A.C.I.D. properties: a -
atomicity, the transaction takes place as a whole or not at all; c - consistency, maps one consistent DB state to
another; i - isolation, each transaction sees a consistent DB; d - durability, the results of a transaction must
survive system failures. The Merge Replication Method is used to consolidate the data between databases.

Client–server model
The client–server model of computing is a distributed application structure that partitions tasks or workloads
between the providers of a resource or service, called servers, and service requesters, called clients.[1]
Often clients and servers communicate over a computer network on separate hardware, but both client and
server may reside in the same system. A server machine is a host that is running one or more server
programs which share their resources with clients. A client does not share any of its resources, but requests
a server's content or service function. Clients therefore initiate communication sessions with servers which
await incoming requests.


The client–server characteristic describes the relationship of cooperating programs in an application. The
server component provides a function or service to one or many clients, which initiate requests for such
services.

Functions such as email exchange, web access and database access, are built on the client–server model.
Users accessing banking services from their computer use a web browser client to send a request to a web
server at a bank. That program may in turn forward the request to its own database client program that
sends a request to a database server at another bank computer to retrieve the account information. The
balance is returned to the bank database client, which in turn serves it back to the web browser client
displaying the results to the user. The client–server model has become one of the central ideas of network
computing. Many business applications being written today use the client–server model. So do the Internet's
main application protocols, such as HTTP, SMTP, Telnet, and DNS.
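
The request/response pattern can be sketched with Python's standard socket module. This is only an added illustration: both ends run in one process for brevity, and the port number and message format are assumptions.

Python (illustrative sketch):
import socket, threading, time

def server():
    with socket.socket() as srv:                 # the server waits for requests
        srv.bind(("127.0.0.1", 5050))
        srv.listen(1)
        conn, _ = srv.accept()                   # a client has initiated a session
        with conn:
            request = conn.recv(1024)            # read the client's request
            conn.sendall(b"reply to: " + request)  # serve content back to the client

threading.Thread(target=server, daemon=True).start()
time.sleep(0.5)                                  # crude: give the server thread time to start

with socket.socket() as cli:                     # the client initiates the session
    cli.connect(("127.0.0.1", 5050))
    cli.sendall(b"GET balance")                  # send the request ...
    print(cli.recv(1024).decode())               # ... and display the server's reply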

The interaction between client and server is often described using sequence diagrams. Sequence diagrams
are standardized in the Unified Modeling Language.

Specific types of clients include web browsers, email clients, and online chat clients.

Specific types of servers include web servers, ftp servers, application servers, database servers, name
servers, mail servers, file servers, print servers, and terminal servers. Most web services are also types of
servers.

Security

A condition that results from the establishment and maintenance of protective measures that ensure a state
of inviolability from hostile acts or influences.

INTERNET VIOLATIONS

Internet crime is among the newest and most constantly evolving areas of American law. Although the
Internet itself is more than three decades old, greater public usage began in the late 1980s with widespread
adoption only following in the 1990s. During that decade the Net was transformed from its modest military
and academic roots into a global economic tool, used daily by over 100 million Americans and generating
upwards of $100 billion in domestic revenue annually. But as many aspects of business, social, political, and
cultural life moved online, so did crime, creating new challenges for lawmakers and law enforcement.

Crime on the Net takes both old and new forms. The medium has facilitated such traditional offenses as
fraud and child pornography. But it has also given rise to unique technological crimes, such as
electronic intrusion in the form of hacking and computer viruses. High-speed Internet accounts helped fuel a
proliferation of copyright infringement in software, music, and movie piracy. National security is
also threatened by the Internet's potential usefulness for terrorism. Taken together, these crimes have
earned a new name: when FBI Director Louis J. Freeh addressed the U. S. Senate in 2000, he used the
widely-accepted term "cybercrime."

Example: internet violation in school (cause/event and consequence)

- Hacking into school servers or other computers: suspension; administrative discretion and possible legal
  actions (action level III)
- Using e-mail or Web sites to intimidate students (cyber-bullying): detention/suspension; immediately sent to
  an administrator; considered harassment under the district Rights & Responsibilities Handbook
- Downloading illegal music/media files from the Internet: possible civil legal actions; data files wiped
- Using inappropriate instant messaging/chats during class: loss of computer for one or more classes;
  teacher/team discretion
- Using or carrying the computer in an unsafe manner: loss of computer; note to parents
- Plagiarizing information by using the Internet: failed assignment; loss of points and possible legal actions
- Accessing pornographic/hate speech groups (others?) websites: administrative discretion and possible legal
  actions; loss of computer privileges
- Playing games on laptops or PDAs during class: ContentBarrier installed at parent's expense
- Physically damaging a computer through misuse or neglect (throwing, dropping, snagging): loss of computer;
  administrative discretion/restitution
- Posing as another person; misrepresenting self on the web, using another's identity (ID theft): suspension
- Manipulating or changing settings without authorization: administrative restrictions

Assignment (Set-1)
Subject code: MI0035
Computer Networks

Q. 1 : Explain all design issues for several layers in Computer Networks. What is connection-oriented and
connectionless service ?

Ans. Design issues for the layers :


The various key design issues are present in several layers in computer networks. The important design issues
are :
1. Addressing – A mechanism for identifying senders and receivers is needed on the network. There are
   multiple processes running on one machine, so some means is needed for a process on one machine to
   specify with whom it wants to communicate.
2. Error Control – There may be erroneous transmission due to several problems during communication,
   caused by problems in the communication circuits, the physical medium, thermal noise and interference.
   Many error-detecting and error-correcting codes are known, but both ends of the connection must agree
   on which one is being used. In addition, the receiver must have some mechanism for telling the sender
   which messages have been received correctly and which have not.
3. Flow Control – If there is a fast sender at one end sending data to a slow receiver, then there must be a
   flow control mechanism to prevent the loss of data at the slow receiver. There are several mechanisms
   used for flow control, such as increasing the buffer size at the receiver, slowing down the fast sender, and
   so on. Some processes will not be in a position to accept arbitrarily long messages, so there must also be
   some mechanism for disassembling, transmitting and then reassembling messages.
4. Multiplexing / de-multiplexing – If data from several sources has to be transmitted over the transmission
   medium, it is inconvenient or expensive to set up a separate connection for each pair of communicating
   processes. So multiplexing is needed in the physical layer at the sender end and de-multiplexing is needed
   at the receiver end.
5. Routing – When data has to be transmitted from source to destination, there may be multiple paths
   between them. An optimized (shortest) route must be chosen. This decision is made on the basis of
   several routing algorithms, which choose an optimized route to the destination.

Connection Oriented and Connectionless Services :


Layers can offer two types of services namely connection oriented service and connectionless service.
Connection Oriented Service – The service user first establishes a connection, uses the connection and then
releases the connection. Once the connection is established between source and destination, the path is fixed.
Data transmission takes place through this established path. The order of the messages sent will be the same
at the receiver end. Services are reliable and there is no loss of data. Most of the time, a reliable service
provides an acknowledgement for each message; this acknowledgement is an overhead and adds delay.
Connectionless Services – In this type of service, no connection is established between source and
destination. Here there is no fixed path. Therefore, the messages must carry the full destination address and
each of these messages is sent independently of the others. Messages sent will not be delivered at the
destination in the same order. Thus, grouping and ordering are required at the receiver end, and the services
are not reliable. There is no acknowledgement confirmation from the receiver. Unreliable connectionless
service is often called datagram service, which does not return an acknowledgement to the sender. In some
cases, the overhead of establishing a connection to send one short message is not justified, but reliability is
still required; the acknowledged datagram service can be used for these applications.
Another service is the request-reply service. In this type of service, the sender transmits a single datagram
containing a request from the client side; the server's reply at the other end contains the answer. Request-
reply is commonly used to implement communication in the client-server model.

Q.2 : Discuss OSI Reference model.


Ans. The OSI Reference Model :

The OSI model is based on a proposal developed by the International Standards Organization as a first step
towards international standardization of the protocols used in the various layers. The model is called the ISO
OSI (International Standards Organization – Open Systems Interconnection) Reference Model because it deals
with connecting open systems – that is, systems that follow the standard are open for communication with
other systems, irrespective of manufacturer.
Its main objectives were to:
- Allow manufacturers of different systems to interconnect equipment through standard interfaces.
- Allow software and hardware to integrate well and be portable on different systems.

The OSI model has seven layers. The principles that were applied to arrive at the seven layers are as follows:
1. Each layer should perform a well-defined function.
2. The function of each layer should be chosen with an eye toward defining internationally standardized
   protocols.
3. The layer boundaries should be chosen to minimize the information flow across the interfaces.

The set of rules for communication between entities in a layer is called the protocol for that layer.

Q. 3 : Describe different types of Data Transmission Modes.

Ans. Data Transmission Modes :


The transmission of binary data across a link can be accomplished in either parallel or serial mode. In
parallel mode, multiple bits are sent with each clock tick. In serial mode, 1 bit is sent with each clock tick.
While there is one way to send parallel data, there are three subclasses of serial transmission :
asynchronous, synchronous, and isochronous.


Serial and Parallel

Serial Transmission :

In serial transmission one bit follows another, so we need only one communication channel rather than n to
transmit data between two communicating devices.

The advantage of serial over parallel transmission is that with only one communication channel, serial
transmission reduces the cost of transmission over parallel by roughly a factor of n.

Since communication within devices is parallel, conversion devices are required at the interface between the
sender and the line (parallel-to-serial) and between the line and the receiver (serial-to-parallel). Serial
transmission occurs in one of three ways : asynchronous, synchronous, and isochronous.

Parallel Transmission :

Binary data, consisting of 1s and 0s, may be organized into groups of n bits each. Computers produce and
consume data in groups of bits much as we conceive of and use spoken language in the form of words
rather than letters. By grouping, we can send data n bits at a time instead of 1. This is called parallel
transmission.

The mechanism for parallel transmission is a simple one : Use n wires to send n bits at one time. That way
each bit has its own wire, and all n bits of one group can be transmitted with each clock tick from one device
to another.

The advantage of parallel transmission is speed. All else being equal, parallel transmission can increase the
transfer speed by a factor of n over serial transmission.

But there is a significant disadvantage : cost. Parallel transmission requires n communication lines just to
transmit the data stream. Because this is expensive, parallel transmission is usually limited to short
distances.

Simplex, Half-duplex and Full-duplex :


There are three modes of data transmission that correspond to the three types of circuits available. These
are :

a) Simplex
b) Half-duplex
c) Full-duplex

Simplex :

Simplex communications imply a simple method of communicating, which they are. In simplex
communication mode, there is a one-way communication transmission. Television transmission is a good
example of simplex communications. The main transmitter sends out a signal (broadcast), but it does not
expect a reply as the receiving units cannot issue a reply back to the transmitter. Other examples are a data
collection terminal on a factory floor (send only) and a line printer (receive only). Another example of simplex
communication is a keyboard attached to a computer, because the keyboard can only send data to the computer.

At first thought it might appear adequate for many types of application in which flow of information is
unidirectional. However, in almost all data processing applications, communication in both directions is
required. Even for a “one-way” flow of information from a terminal to computer, the system will be designed
to allow the computer to signal the terminal that data has been received. Without this capability, the remote
user might enter data and never know that it was not received by the other terminal. Hence, simplex circuits
are seldom used because a return path is generally needed to send acknowledgement, control or error
signals.

Half-duplex :

In half-duplex mode, both units communicate over the same medium, but only one unit can send at a time.
While one is in send mode, the other unit is in receiving mode. It is like two polite people talking to each other
– one talks, the other listens, but neither one talks at the same time. Thus, a half-duplex line can alternately
send and receive data. It requires two wires. This is the most common type of transmission for voice
communications because only one person is supposed to speak at a time. It is also used to connect a
terminal with a computer. The terminal might transmit data and then the computer responds with an
acknowledgement. The transmission of data to and from a hard disk is also done in half-duplex mode.

Full-duplex :


In a half-duplex system, the line must be “turned around” each time the direction is reversed. This involves a
special switching circuit and requires a small amount of time (approximately 150 milliseconds). With high
speed capabilities of the computer, this turn-around time is unacceptable in many instances. Also, some
applications require simultaneous transmission in both directions. In such cases, a full-duplex system is used
that allows information to flow simultaneously in both directions on the transmission path. Use of a full-duplex
line improves efficiency as the line turn-around time required in a half-duplex arrangement is eliminated. It
requires four wires.

Synchronous and Asynchronous Transmission :

Synchronous Transmission :

In synchronous transmission, the bit stream is combined into longer “frames”, which may contain multiple
bytes. Each byte, however, is introduced onto the transmission link without a gap between it and the next
one. It is left to the receiver to separate the bit stream into bytes for decoding purpose. In other words, data
are transmitted as an unbroken string of 1s and 0s, and the receiver separates that string into the bytes, or
characters, it needs to reconstruct the information.

Without gaps and start and stop bits, there is no built-in mechanism to help the receiving device adjust its
bit synchronization midstream. Timing becomes very important, therefore, because the accuracy of the
received information is completely dependent on the ability of the receiving device to keep an accurate count
of the bits as they come in.

The advantage of synchronous transmission is speed. With no extra bits or gaps to introduce at the sending
end and remove at the receiving end, and, by extension, with fewer bits to move across the link,
synchronous transmission is faster than asynchronous transmission of data from one computer to another.
Byte synchronization is accomplished in the data link layer.

Asynchronous Transmission :

Asynchronous transmission is so named because the timing of a signal is unimportant. Instead, information
is received and translated by agreed-upon patterns. As long as those patterns are followed, the receiving
device can retrieve the information without regard to the rhythm in which it is sent. Patterns are based on
grouping the bit stream into bytes. Each group, usually 8 bits, is sent along the link as a unit. The sending
system handles each group independently, relaying it to the link whenever ready, without regard to a timer.

Without synchronization, the receiver cannot use timing to predict when the next group will arrive. To alert
the receiver to the arrival of a new group, therefore, an extra bit is added to the beginning of each byte. This
bit, usually a 0, is called the start bit. To let the receiver know that the byte is finished, 1 or more additional
bits are appended to the end of the byte. These bits, usually 1s, are called stop bits.

By this method, each byte is increased in size to at least 10 bits, of which 8 bits is information and 2 bits or
more are signals to the receiver. In addition, the transmission of each byte may then be followed by a gap of
varying duration. This gap can be represented either by an idle channel or by a stream of additional stop bits.

The start and stop bits and the gap alert the receiver to the beginning and end of each byte and also allow it to
synchronize with the data stream. This mechanism is called asynchronous because, at the byte level, the
sender and receiver do not have to be synchronized. But within each byte, the receiver must still be
synchronized with the incoming bit stream.

That is, some synchronization is required, but only for the duration of a single byte. The receiving device
resynchronizes at the onset of each new byte. When the receiver detects a start bit, it sets a timer and
begins counting bits as they come in. After n bits, the receiver looks for a stop bit. As soon as it detects the
stop bit, it waits until it detects the next start bit.
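
The framing described above can be sketched in a few lines of Python. This is an added illustration of one common convention (a 0 start bit, 8 data bits sent least-significant-bit first, and a single 1 stop bit); real links may add parity bits, longer stop bits or idle gaps.

Python (illustrative sketch):
def frame_byte(b):
    # one 0 start bit, 8 data bits (LSB first), one 1 stop bit
    data_bits = [(b >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]

def deframe(bits):
    assert bits[0] == 0 and bits[-1] == 1, "bad start/stop bit"
    return sum(bit << i for i, bit in enumerate(bits[1:9]))

framed = frame_byte(ord("A"))
print(framed)                 # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1] - 10 bits carrying 8 data bits
print(chr(deframe(framed)))   # A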

Isochronous Transmission :

In real-time audio and video, in which uneven delays between frames are not acceptable, synchronous
transmission fails. For example, TV images are broadcast at the rate of 30 images per second; they must be
viewed at the same rate. If each image is sent by using one or more frames, there should be no delays
between frames. For this type of application, synchronization between characters is not enough; the entire
stream of bits must be synchronized. The isochronous transmission guarantees that the data arrive at a fixed
rate.


Q. 4 : Define Switching. What is the difference between Circuit Switching and Packet
Switching ?
Ans. Switching :
A network is a set of connected devices. Whenever we have multiple devices, we have the problem of how
to connect them to make one-to-one communication possible. One of the better solutions is switching. A
switched network consists of a series of interlinked nodes, called switches. Switches are devices capable of
creating temporary connections between two or more devices linked to the switch. In a switched network,
some of these nodes are connected to the end systems (computers or telephones); others are used only for
routing. Switched networks are traditionally divided into circuit-switched and packet-switched networks.

Difference between Circuit Switching and Packet Switching

Item                                  Circuit Switching      Packet Switching
What is sent                          Voice                  Message (divided into packets)
Call setup                            Required               Not required
Dedicated physical path               Yes                    No
Each packet follows the same route    Yes                    No
Packets arrive in order               Yes                    No
Is a switch crash fatal               Yes                    No
Bandwidth available                   Fixed                  Dynamic
Time of possible congestion           At setup time          On every packet
Store-and-forward                     No                     Yes

Q.5 : Classify Guided Medium (wired). Compare Fiber Optics and Copper Wire.


Ans. Guided Transmission Medium (wired) :

Guided media, which are those that provide a conduit from one device to another, include twisted-pair cable,
coaxial cable, and fiber-optic cable. A signal traveling along any of these media is directed and contained by
the physical limits of the medium. Twisted-pair and coaxial cable use metallic (copper) conductors that
accept and transport signals in the form of electric current. Optical fiber is a cable that accepts and
transports signals in the form of light.

1. Twisted Pair :

A twisted pair consists of two insulted copper wires, typically about 1 mm thick. The wires are twisted
together in a helical form, just like a DNA molecule.

Twisting is done because two parallel wires constitute a fine antenna. When the wires are twisted, the waves
from different twists cancel out, so the wire radiates less effectively.

The most common application of the twisted pair is the telephone system. All the telephones are connected
to the telco office by twisted pairs. These run several kilometers without amplification, but for long distances,
repeaters are needed. If many wires are coming from one building or apartment, they are bundled together
and encased in a protective sheath.

Twisted pairs can be used for transmitting either analog or digital signals. The bandwidth depends on the
thickness of the wire and the distance traveled, but several megabits/sec can be achieved for a few
kilometers. Due to their adequate performance and low cost, twisted pairs are widely used and are likely to
remain so for years to come.

Twisted pair cabling comes in several varieties, two of which are important for computer networks :

1. Category 3 twisted pairs consist of two insulated wires gently twisted together. Four such pairs are
   typically grouped in a plastic sheath to protect the wires and keep them together. They are capable of
   handling signals with a bandwidth of 16 MHz. This scheme allowed up to four regular telephones or
   two multi-line telephones in each office to connect to the telephone company equipment in the wiring
   closet.

2. Category 5 twisted pairs are similar to category 3 pairs, but with more twists per centimeter, which
   results in less crosstalk and a better-quality signal over longer distances, making them more suitable
   for high-speed computer communication. They are capable of handling signals with a bandwidth of
   100 MHz.

2. Coaxial Cable :

The coaxial cable consists of a stiff copper wire as the core, surrounded by an insulating material. The
insulator is encased by a cylindrical conductor, often as a closely-woven braided mesh. The outer conductor
is covered in a protective plastic sheath.

The construction and shielding of the coaxial cable give it a good combination of high bandwidth and
excellent noise immunity. The bandwidth possible depends on the cable quality, length, and signal-to-noise
ratio of the data signal, but coaxial cables have a bandwidth close to 1 GHz.

Coaxial cable was widely used within the telephone system for long-distance lines but has now been replaced
by fiber optics. Coax is still widely used for cable television and metropolitan area networks.

3. Optical Fiber :


A fiber-optic cable is made of glass or plastic and transmits signals in the form of light. To understand optical
fiber, we need to look at the components of an optical transmission system.

An optical transmission system has three key components :

1. The light source

2. The transmission medium

3. The detector

A pulse of light indicates a 1 bit and the absence of light indicates a 0 bit. The transmission medium is an
ultra-thin fiber of glass. The detector generates an electrical pulse when light falls on it. By attaching a light
source to one end of an optical fiber and a detector to other, we have unidirectional data transmission
system that accepts an electrical signal, converts and transmits it by light pulses, and then reconverts the
output to an electrical signal at the receiving end.

When light passes from one medium to another, for example, from fused silica to air, the ray is refracted
(bend) at the silica/air boundary.

Here we see a light ray incident on the boundary at an angle s1, emerging at an angle a1. The amount of
refraction depends on the properties of the two media. For angles of incidence above a certain critical value,
the light is refracted back into the silica; none of it escapes into the air. Thus, a light ray incident at or above
the critical angle is trapped inside the fiber, and can propagate for many kilometers with virtually no loss.

A light ray can thus be trapped by total internal reflection in the medium. In this way many different rays will be
bouncing around at different angles; each ray is said to have a different mode, so a fiber having this property is
called a multimode fiber.

If the fiber’s diameter is reduced to a few wavelengths of light, the fiber acts like a wave guide, and the light
can propagate only in a straight line, without bouncing, yielding a single-mode fiber.

Fiber optic cables are similar to coax. At the center is the glass core through which the light propagates. In
multimode fibers, the core is typically 50 microns in diameter, about the thickness of human hair. In single-
mode fibers, the core is 8 to 10 microns.

The core is surrounded by a glass cladding with a lower index of refraction than the core, to keep all the light
in the core. Next comes a thin plastic jacket to protect the cladding. Fibers are typically grouped in bundles,
protected by an outer sheath.


Comparison of Fiber Optics and Copper Wire :

Fiber has many advantages over copper wire as a transmission media. These are :

• It can handle much higher bandwidths than copper. Due to the low attenuation, repeaters are
needed only about every 30 km on long lines, versus about every 5 km for copper.

• Fiber is not affected by power surges, electromagnetic interference, or power failures. Nor is it
affected by corrosive chemicals in the air, making it ideal for harsh factory environments.

• Fiber is lighter than copper. One thousand twisted copper pairs 1 km long weigh 8000 kg, but two
fibers have more capacity and weigh only 100 kg, which greatly reduces the need for expensive
mechanical support systems that must be maintained.

• Fibers do not leak light and are quite difficult to tap. This gives them excellent security against
potential wire-tappers.

• For new routes, fiber is the first choice because of its lower installation cost.

Q. 6 : What are different types of Satellites ?


Ans. Classification of Satellites :

Four different types of satellite orbits can be identified depending on the shape and diameter of the orbit :

- GEO (Geostationary Orbit)


- LEO (Low Earth Orbit)
- MEO (Medium Earth Orbit) or ICO (Intermediate Circular Orbit)
- HEO (Highly Elliptical Orbit) elliptical orbits


(Van Allen belts: regions of ionized particles at 2000 – 6000 km and 15000 – 30000 km above the earth's surface.)

1. GEO (Geostationary Orbit) :


Altitude :

Ca. 36000 km. above earth surface.

Coverage :

Ideally suited for continuous, regional coverage using a single satellite. Can also be used equally
effectively for global coverage using a minimum of three satellites.

Visibility :

Mobile to satellite visibility decreases with increased latitude of the user. Poor Visibility in built-up,
urban regions.

2. LEO (Low Earth Orbit) :

Altitude :

Ca. 500 – 1500 km.

Coverage :

Multi-satellite constellations of upwards of 30-50 satellites are required for global, continuous
coverage. Single satellites can be used in store and forward mode for localized coverage but only
appear for short periods of time.

Visibility :

Satellite diversity, by which more than one satellite is visible at any given time, can be
used to optimize the link. This can be achieved by either selecting the optimum link or combining the
reception of two or more links. The higher the guaranteed minimum elevation angle to the user, the
more satellites are needed in the constellation.


3. MEO (Medium Earth Orbit) :

Altitude :

Ca. 6000 – 20000 km.

Coverage :

Multi-satellite constellations of between 10 and 20 satellites are required for global coverage.

Visibility :

Good to excellent global visibility, augmented by the use of satellite diversity techniques.

4. HEO (Highly Elliptical Orbit) :

Altitude :

Apogee : 40 000 – 50 000 km., Perigee : 1000-20 000 km.

Coverage :

Three or four satellites are needed to provide continuous coverage to a region.

Visibility :

Particularly designed to provide high guaranteed elevation angle to satellite for Northern and
Southern temperate latitudes.


Assignment (Set-2)
Subject code: MI0035
Computer Networks

Q.1 Write down the features of Fast Ethernet and Gigabit Ethernet.

Ans: Fast Ethernet

In computer networking, Fast Ethernet is a collective term for a number of Ethernet standards that carry
traffic at the nominal rate of 100 Mbit/s, against the original Ethernet speed of 10 Mbit/s. Of the fast Ethernet
standards 100BASE-TX is by far the most common and is supported by the vast majority of Ethernet
hardware currently produced. Fast Ethernet was introduced in 1995[1] and remained the fastest version of
Ethernet for three years before being superseded by gigabit Ethernet.[2]
A fast Ethernet adapter can be logically divided into a Media Access Controller (MAC) which deals with the
higher level issues of medium availability and a Physical Layer Interface (PHY). The MAC may be linked to
the PHY by a 4 bit 25 MHz synchronous parallel interface known as a Media Independent Interface (MII) or a
2 bit 50 MHz variant Reduced Media Independent Interface (RMII). Repeaters (hubs) are also allowed and
connect to multiple PHYs for their different interfaces.

The MII may (rarely) be an external connection but is usually a connection between ICs in a network adapter
or even within a single IC. The specs are written based on the assumption that the interface between MAC
and PHY will be a MII but they do not require it.

The MII fixes the theoretical maximum data bit rate for all versions of fast Ethernet to 100 Mbit/s. The data
signaling rate actually observed on real networks is less than the theoretical maximum, due to the necessary
header and trailer (addressing and error-detection bits) on every frame, the occasional "lost frame" due to
noise, and the time spent waiting after each sent frame for other devices on the network to finish transmitting.

100BASE-TX is the predominant form of Fast Ethernet, and runs over two wire-pairs inside a category 5 or
above cable (a typical category 5 cable contains 4 pairs and can therefore support two 100BASE-TX links).
Like 10BASE-T, the proper pairs are the orange and green pairs (canonical second and third pairs) in
TIA/EIA-568-B's termination standards, T568A or T568B. These pairs use pins 1, 2, 3 and 6.

In T568A and T568B, wires are in the order 1, 2, 3, 6, 4, 5, 7, 8 on the modular jack at each end. The color-
order would be green/white, green, orange/white, blue, blue/white, orange, brown/white, brown for T568A,
and orange/white, orange, green/white, blue, blue/white, green, brown/white, brown for T568B.

Each network segment can have a maximum distance of 100 metres (328 ft). In its typical configuration,
100BASE-TX uses one pair of twisted wires in each direction, providing 100 Mbit/s of throughput in each
direction (full-duplex). See IEEE 802.3 for more details.

Gigabit Ethernet

Gigabit Ethernet (GbE or 1 GigE) is a term describing various technologies for transmitting Ethernet frames
at a rate of a gigabit per second, as defined by the IEEE 802.3-2008 standard. Half-duplex gigabit links
connected through hubs are allowed by the specification but in the marketplace full-duplex with switches are
normal.

Intel PRO/1000 GT PCI network interface card

IEEE 802.3ab, ratified in 1999, defines gigabit Ethernet transmission over unshielded twisted pair (UTP)
category 5, 5e, or 6 cabling and became known as 1000BASE-T. With the ratification of 802.3ab, gigabit
Ethernet became a desktop technology as organizations could use their existing copper cabling
infrastructure.

IEEE 802.3ah, ratified in 2004 added two more Gigabit fiber standards, 1000BASE-LX10 (which was already
widely implemented as vendor specific extension) and 1000BASE-BX10. This was part of a larger group of
protocols known as Ethernet in the First Mile.

Initially, gigabit Ethernet was deployed in high-capacity backbone network links (for instance, on a high-
capacity campus network). In 2000, Apple's Power Mac G4 and PowerBook G4 were the first mass
produced personal computers featuring the 1000BASE-T connection.[1] It quickly became a built-in feature in
many other computers. As of 2009 Gigabit NICs (1000BASE-T) are included in almost all desktop and server
computer systems.

Higher bandwidth 10 Gigabit Ethernet standards have since become available as the IEEE ratified a fiber-
based standard in 2002, and a twisted pair standard in 2006. As of 2009 10Gb Ethernet is replacing 1Gb as
the backbone network and has begun to migrate down to high-end server systems.

Varieties

There are five different physical layer standards for gigabit Ethernet using optical fiber (1000BASE-X),
twisted pair cable (1000BASE-T), or balanced copper cable (1000BASE-CX).

The IEEE 802.3z standard includes 1000BASE-SX for transmission over multi-mode fiber, 1000BASE-LX for
transmission over single-mode fiber, and the nearly obsolete 1000BASE-CX for transmission over balanced
copper cabling. These standards use 8b/10b encoding, which inflates the line rate by 25%, from 1,000–1,250
Mbit/s to ensure a DC balanced signal. The symbols are then sent using NRZ.

IEEE 802.3ab, which defines the widely used 1000BASE-T interface type, uses a different encoding scheme
in order to keep the symbol rate as low as possible, allowing transmission over twisted pair.

Ethernet in the First Mile later added 1000BASE-LX10 and -BX10.

Q.2 Differentiate the working between pure ALOHA and slotted ALOHA.

Ans: ALOHA :

ALOHA is a medium access protocol that was originally designed for ground based radio broadcasting
however it is applicable to any system in which uncoordinated users are competing for the use of a shared
channel. Pure ALOHA and slotted ALOHA are the two versions of ALOHA.

Pure ALOHA uses a very simple idea, which is to let users transmit whenever they have data to send. Pure
ALOHA relies on feedback: the sender listens to the channel and finds out whether its frame was destroyed by a
collision. Feedback is immediate in LANs, but there is a delay of about 270 msec in satellite transmission, so an
acknowledgment is required if listening to the channel is not possible for some reason. Pure ALOHA can provide
a channel utilization of about 18 percent, which is not appealing, but it gives the advantage of being able to
transmit at any time.

Slotted ALOHA divides time into discrete intervals and each interval corresponds to a frame of data. It
requires users to agree on slot boundaries. It does not allow a station to transmit at any time; instead the
station has to wait for the beginning of the next slot. This roughly doubles the achievable channel utilization.
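
The difference shows up directly in the standard throughput formulas, S = G·e^(-2G) for pure ALOHA and S = G·e^(-G) for slotted ALOHA (S = throughput, G = offered load in frames per frame time). The short added sketch below simply evaluates them at their respective optimum loads, which is where the 18 percent figure quoted above comes from.

Python (illustrative sketch):
import math

def pure_aloha(G):      # S = G * e^(-2G), maximised at G = 0.5
    return G * math.exp(-2 * G)

def slotted_aloha(G):   # S = G * e^(-G), maximised at G = 1.0
    return G * math.exp(-G)

print("pure ALOHA peak throughput   :", round(pure_aloha(0.5), 3))    # ~0.184 (18.4%)
print("slotted ALOHA peak throughput:", round(slotted_aloha(1.0), 3)) # ~0.368 (36.8%)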

Q.3 Write down distance vector algorithm. Explain path vector protocol.

Ans: Distance Vector Algorithms


Routing is the task of finding a path from a sender to a desired destination. In the IP "Catenet model" this
reduces primarily to a matter of finding gateways between networks. As long as a message remains on a
single network or subnet, any routing problems are solved by technology that is specific to the network. For
example, the Ethernet and the ARPANET each define a way in which any sender can talk to any specified
destination within that one network. IP routing comes in primarily when messages must go from a sender on
one such network to a destination on a different one. In that case, the message must pass through gateways
connecting the networks. If the networks are not adjacent, the message may pass through several
intervening networks, and the gateways connecting them. Once the message gets to a gateway that is on
the same network as the destination, that network's own technology is used to get to the destination.

Throughout this section, the term "network" is used generically to cover a single broadcast network (e.g., an
Ethernet), a point to point line, or the ARPANET. The critical point is that a network is treated as a single
entity by IP. Either no routing is necessary (as with a point to point line), or that routing is done in a manner
that is transparent to IP, allowing IP to treat the entire network as a single fully-connected system (as with an
Ethernet or the ARPANET). Note that the term "network" is used in a somewhat different way in discussions
of IP addressing. A single IP network number may be assigned to a collection of networks, with "subnet"
addressing being used to describe the individual networks. In effect, we are using the term "network" here to
refer to subnets in cases where subnet addressing is in use.

A number of different approaches for finding routes between networks are possible. One useful way of
categorizing these approaches is on the basis of the type of information the gateways need to exchange in
order to be able to find routes. Distance vector algorithms are based on the exchange of only a small amount
of information. Each entity (gateway or host) that participates in the routing protocol is assumed to keep
information about all of the destinations within the system. Generally, information about all entities connected
to one network is summarized by a single entry, which describes the route to all destinations on that network.
This summarization is possible because as far as IP is concerned, routing within a network is invisible. Each
entry in this routing database includes the next gateway to which datagrams destined for the entity should be
sent. In addition, it includes a "metric" measuring the total distance to the entity. Distance is a somewhat
generalized concept, which may cover the time delay in getting messages to the entity, the dollar cost of
sending messages to it, etc. Distance vector algorithms get their name from the fact that it is possible to
compute optimal routes when the only information exchanged is the list of these distances. Furthermore,
information is only exchanged among entities that are adjacent, that is, entities that share a common
network.

Although routing is most commonly based on information about networks, it is sometimes necessary to keep
track of the routes to individual hosts. The RIP protocol makes no formal distinction between networks and
hosts. It simply describes exchange of information about destinations, which may be either networks or
hosts. (Note however, that it is possible for an implementor to choose not to support host routes. See section
3.2.) In fact, the mathematical developments are most conveniently thought of in terms of routes from one
host or gateway to another. When discussing the algorithm in abstract terms, it is best to think of a routing
entry for a network as an abbreviation for routing entries for all of the entities connected to that network. This
sort of abbreviation makes sense only because we think of networks as having no internal structure that is
visible at the IP level. Thus, we will generally assign the same distance to every entity in a given network.

We said above that each entity keeps a routing database with one entry for every possible destination in the
system. An actual implementation is likely to need to keep the following information about each destination:

• address: in IP implementations of these algorithms, this will be the IP address of the host or network.
• gateway: the first gateway along the route to the destination.
• interface: the physical network which must be used to reach the first gateway.
• metric: a number, indicating the distance to the destination.
• timer: the amount of time since the entry was last updated.

In addition, various flags and other internal information will probably be included. This database is initialized
with a description of the entities that are directly connected to the system. It is updated according to
information received in messages from neighboring gateways.
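
Purely as an illustration (the field names mirror the list above; this is not part of any protocol specification text), such a per-destination entry might be represented in Python as:

```python
from dataclasses import dataclass

@dataclass
class RouteEntry:
    address: str    # IP address of the destination host or network
    gateway: str    # first gateway along the route to the destination
    interface: str  # physical network used to reach that first gateway
    metric: int     # distance to the destination (sum of per-hop costs)
    timer: float    # seconds since the entry was last updated

# The table is initialized with the directly connected networks, e.g. a
# directly attached network reachable at cost 1 with no intermediate gateway:
table = {
    "192.0.2.0": RouteEntry("192.0.2.0", gateway="", interface="eth0",
                            metric=1, timer=0.0),
}
```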

The most important information exchanged by the hosts and gateways is that carried in update messages.
Each entity that participates in the routing scheme sends update messages that describe the routing
database as it currently exists in that entity. It is possible to maintain optimal routes for the entire system by
using only information obtained from neighboring entities. The algorithm used for that will be described in the
next section.

As we mentioned above, the purpose of routing is to find a way to get datagrams to their ultimate
destinations. Distance vector algorithms are based on a table giving the best route to every destination in the
system. Of course, in order to define which route is best, we have to have some way of measuring
goodness. This is referred to as the "metric".

In simple networks, it is common to use a metric that simply counts how many gateways a message must go
through. In more complex networks, a metric is chosen to represent the total amount of delay that the
message suffers, the cost of sending it, or some other quantity which may be minimized. The main
requirement is that it must be possible to represent the metric as a sum of "costs" for individual hops.

Formally, if it is possible to get from entity i to entity j directly (i.e., without passing through another gateway
between), then a cost, d(i,j), is associated with the hop between i and j. In the normal case where all entities
on a given network are considered to be the same, d(i,j) is the same for all destinations on a given network,
and represents the cost of using that network. To get the metric of a complete route, one just adds up the
costs of the individual hops that make up the route. For the purposes of this memo, we assume that the
costs are positive integers.

Let D(i,j) represent the metric of the best route from entity i to entity j. It should be defined for every pair of
entities. d(i,j) represents the costs of the individual steps. Formally, let d(i,j) represent the cost of going
directly from entity i to entity j. It is infinite if i and j are not immediate neighbors. (Note that d(i,i) is infinite.
That is, we don't consider there to be a direct connection from a node to itself.) Since costs are additive, it is
easy to show that the best metric must be described by

D(i,i) = 0                                      for all i
D(i,j) = min over k of [ d(i,k) + D(k,j) ]      otherwise

and that the best routes start by going from i to those neighbors k for which d(i,k) + D(k,j) has the minimum
value. (These things can be shown by induction on the number of steps in the routes.) Note that we can limit
the second equation to k's that are immediate neighbors of i. For the others, d(i,k) is infinite, so the term
involving them can never be the minimum.

It turns out that one can compute the metric by a simple algorithm based on this. Entity i gets its neighbors k
to send it their estimates of their distances to the destination j. When i gets the estimates from k, it adds d(i,k)
to each of the numbers. This is simply the cost of traversing the network between i and k. Now and then i
compares the values from all of its neighbors and picks the smallest.

A proof is given in [2] that this algorithm will converge to the correct estimates of D(i,j) in finite time in the
absence of topology changes. The authors make very few assumptions about the order in which the entities
send each other their information, or when the min is recomputed. Basically, entities just can't stop sending
updates or recomputing metrics, and the networks can't delay messages forever. (Crash of a routing entity is
a topology change.) Also, their proof does not make any assumptions about the initial estimates of D(i,j),
except that they must be non-negative. The fact that these fairly weak assumptions are good enough is
important. Because we don't have to make assumptions about when updates are sent, it is safe to run the
algorithm asynchronously. That is, each entity can send updates according to its own clock. Updates can be
dropped by the network, as long as they don't all get dropped. Because we don't have to make assumptions
about the starting condition, the algorithm can handle changes. When the system changes, the routing
algorithm starts moving to a new equilibrium, using the old one as its starting point. It is important that the
algorithm will converge in finite time no matter what the starting point. Otherwise certain kinds of changes
might lead to non-convergent behavior.

The statement of the algorithm given above (and the proof) assumes that each entity keeps copies of the
estimates that come from each of its neighbors, and now and then does a min over all of the neighbors. In
fact real implementations don't necessarily do that. They simply remember the best metric seen so far, and
the identity of the neighbor that sent it. They replace this information whenever they see a better (smaller)
metric. This allows them to compute the minimum incrementally, without having to store data from all of the
neighbors.

There is one other difference between the algorithm as described in texts and those used in real protocols
such as RIP: the description above would have each entity include an entry for itself, showing a distance of
zero. In fact this is not generally done. Recall that all entities on a network are normally summarized by a
single entry for the network. Consider the situation of a host or gateway G that is connected to network A. C
represents the cost of using network A (usually a metric of one). (Recall that we are assuming that the
internal structure of a network is not visible to IP, and thus the cost of going between any two entities on it is
the same.) In principle, G should get a message from every other entity H on network A, showing a cost of 0
to get from that entity to itself. G would then compute C + 0 as the distance to H. Rather than having G look
at all of these identical messages, it simply starts out by making an entry for network A in its table, and
assigning it a metric of C. This entry for network A should be thought of as summarizing the entries for all
other entities on network A. The only entity on A that can't be summarized by that common entry is G itself,
since the cost of going from G to G is 0, not C. But since we never need those 0 entries, we can safely get
along with just the single entry for network A. Note one other implication of this strategy: because we don't
need to use the 0 entries for anything, hosts that do not function as gateways don't need to send any update
messages. Clearly hosts that don't function as gateways (i.e., hosts that are connected to only one network)
can have no useful information to contribute other than their own entry D(i,i) = 0. As they have only the one
interface, it is easy to see that a route to any other network through them will simply go in that interface and
then come right back out it. Thus the cost of such a route will be greater than the best cost by at least C.
Since we don't need the 0 entries, non-gateways need not participate in the routing protocol at all.

Let us summarize what a host or gateway G does. For each destination in the system, G will keep a current
estimate of the metric for that destination (i.e., the total cost of getting to it) and the identity of the neighboring
gateway on whose data that metric is based. If the destination is on a network that is directly connected to G,
then G simply uses an entry that shows the cost of using the network, and the fact that no gateway is needed
to get to the destination. It is easy to show that once the computation has converged to the correct metrics,
the neighbor that is recorded by this technique is in fact the first gateway on the path to the destination. (If
there are several equally good paths, it is the first gateway on one of them.) This combination of destination,
metric, and gateway is typically referred to as a route to the destination with that metric, using that gateway.

The method so far only has a way to lower the metric, as the existing metric is kept until a smaller one shows
up. It is possible that the initial estimate might be too low. Thus, there must be a way to increase the metric.
It turns out to be sufficient to use the following rule: suppose the current route to a destination has metric D
and uses gateway G. If a new set of information arrived from some source other than G, only update the
route if the new metric is better than D. But if a new set of information arrives from G itself, always update D
to the new value. It is easy to show that with this rule, the incremental update process produces the same
routes as a calculation that remembers the latest information from all the neighbors and does an explicit
minimum. (Note that the discussion so far assumes that the network configuration is static. It does not allow
for the possibility that a system might fail.)

To summarize, here is the basic distance vector algorithm as it has been developed so far. (Note that this is
not a statement of the RIP protocol. There are several refinements still to be added.) The following procedure
is carried out by every entity that participates in the routing protocol. This must include all of the gateways in
the system. Hosts that are not gateways may participate as well.

• Keep a table with an entry for every possible destination in the system. The entry contains the distance D to the destination, and the first gateway G on the route to that network. Conceptually, there should be an entry for the entity itself, with metric 0, but this is not actually included.

• Periodically, send a routing update to every neighbor. The update is a set of messages that contain all of the information from the routing table. It contains an entry for each destination, with the distance shown to that destination.

• When a routing update arrives from a neighbor G', add the cost associated with the network that is shared with G'. (This should be the network over which the update arrived.) Call the resulting distance D'. Compare the resulting distances with the current routing table entries. If the new distance D' for N is smaller than the existing value D, adopt the new route. That is, change the table entry for N to have metric D' and gateway G'. If G' is the gateway from which the existing route came, i.e., G' = G, then use the new metric even if it is larger than the old one.
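
To make these rules concrete, here is a minimal Python sketch, an illustration rather than the RIP protocol itself, of how a node might process one routing update from a neighbor; cost_to_neighbor is assumed to be the cost of the shared network over which the update arrived:

```python
def process_update(table, neighbor, cost_to_neighbor, update):
    """Apply one distance vector update.

    table:  dict destination -> (metric, gateway) for this node
    update: dict destination -> metric as advertised by `neighbor`
    """
    for dest, advertised_metric in update.items():
        new_metric = advertised_metric + cost_to_neighbor
        current = table.get(dest)
        if current is None:
            # previously unknown destination: adopt the route
            table[dest] = (new_metric, neighbor)
        else:
            metric, gateway = current
            # adopt a strictly better route, or always believe the gateway
            # the existing route came from (even if its metric got worse)
            if new_metric < metric or gateway == neighbor:
                table[dest] = (new_metric, neighbor)
```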

A path vector protocol is a computer network routing protocol which maintains the path information that gets
updated dynamically. Updates which have looped through the network and returned to the same node are
easily detected and discarded. This algorithm is sometimes used in Bellman–Ford routing algorithms to avoid
"Count to Infinity" problems.

It is different from distance vector routing and link state routing. Each entry in the routing table contains
the destination network, the next router, and the path to reach the destination.

Path Vector Messages in BGP: The autonomous system boundary routers (ASBR), which participate in path
vector routing, advertise the reachability of networks. Each router that receives a path vector message must
verify that the advertised path is according to its policy. If the messages comply with the policy, the ASBR
modifies its routing table and the message before sending it to the next neighbor. In the modified message it
sends its own AS number and replaces the next router entry with its own identification.

BGP is an example of a path vector protocol. In BGP the routing table maintains the autonomous systems
that are traversed in order to reach the destination system. Exterior Gateway Protocol (EGP) does not use
path vectors.

Path vector protocols are a class of distance vector protocols, in contrast to link state protocols.

Q.4 State the working principle of TCP segment header and UDP header.

Ans: Transmission Control Protocol:

Transmission Control Protocol, or TCP as it is commonly referred to, is a transport-layer protocol that runs on
top of IP. TCP is a connection-oriented, end-to-end reliable protocol designed to fit into a layered hierarchy of
protocols which support multi-network applications. The TCP provides for reliable inter-process
communication between pairs of processes in host computers attached to distinct but interconnected
computer communication networks. Very few assumptions are made as to the reliability of the
communication protocols below the TCP layer. TCP assumes it can obtain a simple, potentially unreliable
datagram service from the lower level protocols. In principle, the TCP should be able to operate above a
wide spectrum of communication systems ranging from hard-wired connections to packet-switched or circuit-
switched networks.

TCP was specifically designed to be a reliable end-to-end byte stream transmission protocol over an
unreliable network. The IP layer does not provide any guarantees that datagrams will be delivered with any
degree of reliability. Hence it is up to the upper-layer protocol to provide this reliability. The key functionality
associated with TCP is basic data transfer.

Basic Data Transfer. From an application perspective, TCP transfers a contiguous stream of bytes through
the network. The application does not have to bother with chopping the data into basic blocks or datagrams.
TCP does this by grouping the bytes in TCP segments, which are passed to IP for transmission to the
destination.

Reliability. TCP assigns a sequence number to each byte transmitted and expects a positive
acknowledgment (ACK) from the receiving TCP. If the ACK is not received within a timeout interval, the data
are retransmitted. Since the data are transmitted in blocks (TCP segments), only the sequence number of
the first data byte in the segment is sent to the destination host.

Flow Control. The receiving TCP, when sending an ACK back to the sender, also indicates to the sender the
number of bytes it can receive beyond the last received TCP segment, without causing overrun and overflow
in its internal buffers. This is sent in the ACK in the form of the highest sequence number it can receive
without problems.

Multiplexing. Multiplexing is achieved through the concept of ports. A port is a 16-bit number used by the
host-to-host protocol to identify to which higher-level protocol or application process it must deliver incoming
messages. Two types of ports exist: (1) Well-known: these ports belong to standard application servers
such as telnet, ftp, and http. The well-known ports are controlled and assigned by the Internet Assigned
Numbers Authority (IANA). Well-known ports range from 1 to 1023. (2) Ephemeral: a client can negotiate the
use of a port dynamically; such ports are called ephemeral. These ports are maintained for the
duration of the session and then released. Ephemeral ports range from 1024 to 65535. Multiple applications
can use ports as a means of multiplexing when communicating with other nodes.

Connections. The reliability and flow control mechanisms require that TCP initializes and maintains certain
status information for each data stream. The combination of this status, including sockets, sequence
numbers, and window sizes, is called a logical connection. Each connection is uniquely identified by the pair
of sockets used by the sending and receiving processes.

TCP entities exchange data in the form of segments. A segment consists of a fixed 20-byte header (plus an
optional options part) followed by zero or more data bytes.

The fields in the TCP header are described as follows:

Source Port and Destination Port: These fields identify the local endpoints of a connection. Each TCP entity
decides how to allocate its own ports. A number of well-known ports are reserved for specific applications
(e.g., FTP).

Sequence and Acknowledgment Number: The sequence number identifies the position of the segment's data
in the byte stream. The ACK number specifies the next byte expected, not the last byte correctly received.

TCP Header Length: Indicates how many 32-bit words are contained in the TCP header. This is required
because of the Options field, which is of variable length.

Reserved: For future use.

The six 1-bit flags are as follows:

URG— Set to 1 if the Urgent pointer is in use.

ACK— Set to 1 to indicate that the Acknowledgment number is valid.

PSH— Indicates PUSHed data. The receiver is requested to deliver the data to the application and not buffer
it until a full buffer has been received.

RST— Used to reset a connection.

SYN— Used to establish connections.

FIN— Used to release a connection.

Window Size: This field tells how many bytes may be sent starting at the byte acknowledged. Flow control in
TCP is handled using a variable-size sliding window.

Checksum: Provided for reliability. It checksums the header and the data (and the pseudoheader when
applicable). While computing the checksum, the Checksum field itself is replaced with zeros.

Urgent Pointer: Used to indicate a byte offset from the current sequence number at which urgent data are to
be found.

Options: This field was designed to provide a way to add extra facilities not covered by the regular header.
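
As an illustration of how these fields are laid out on the wire, the following sketch unpacks the fixed 20-byte TCP header with Python's standard struct module; it is a toy parser, not a complete TCP implementation:

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Unpack the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack,
     offset_reserved_flags, window, checksum, urgent) = struct.unpack(
        "!HHIIHHHH", segment[:20])
    data_offset = (offset_reserved_flags >> 12) & 0xF  # header length in 32-bit words
    flags = offset_reserved_flags & 0x3F               # the six flag bits (URG..FIN)
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len_bytes": data_offset * 4,
        "flags": flags, "window": window,
        "checksum": checksum, "urgent_ptr": urgent,
    }
```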

TCP has been the workhorse of the Internet, and a significant portion of Internet traffic today is carried via
TCP. The reliability and congestion control aspects of TCP make it ideally suited for a large number of
applications. TCP is formally defined in RFC 793. RFC 1122 provides some clarification and bug fixes, and a
few extensions are defined in RFC 1323.

User Datagram Protocol

User Datagram Protocol (UDP) is a connectionless transport protocol. UDP is basically an application interface to
IP. It adds no reliability, flow control, or error recovery to IP. It simply serves as a multiplexer/demultiplexer
for sending and receiving datagrams, using ports to direct the datagrams. UDP is a lightweight protocol with
very minimal overhead. The responsibility for recovering from errors, retransmission, etc., is left to the
application. Applications that need to communicate must identify a target more specifically than simply by its
IP address; UDP provides this function via the concept of ports. The format of the UDP datagram is shown in
Figure 2-6.
Figure 2-6. UDP header.

The following is a description of the fields of the UDP header:

Source and Destination Port: The two ports serve the same function as in TCP; they identify the endpoints
within the source and destination nodes.

UDP Length: This field gives the length of the datagram, i.e., the 8-byte UDP header plus the data.

UDP Checksum: The checksum is computed over the UDP header, a pseudo-header containing fields from the IP header, and the data.

Although UDP does not implement flow control or reliable/ordered delivery, it does a little more work than
simply demultiplex messages to some application: it checks the integrity of the message via the
checksum. UDP uses the same checksum algorithm as IP. UDP is described in RFC 768.
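
For illustration, the ones'-complement Internet checksum that UDP shares with IP and TCP can be sketched in a few lines of Python (construction of the pseudo-header is omitted; this is not a full UDP implementation):

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, as used by IP, TCP and UDP."""
    if len(data) % 2:                 # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF
```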

Q.5 What is IP addressing? Discuss different classes of IP Addressing.

Ans: IP Addressing:-
An Internet Protocol address (IP address) is a numerical label assigned to each
device (e.g. computer, printer) participating in a computer network that uses the Internet Protocol for
communication. An IP address serves two principal functions: host or network interface identification and
location addressing. Its role has been characterized as follows: "A name indicates what we seek. An address
indicates where it is. A route indicates how to get there."

The designers of computer network communication protocols defined an IP address as a 32-bit number and
this system, known as Internet Protocol Version 4 (IPv4), is still in use today. However, due to the enormous
growth of the Internet and the predicted depletion of available addresses, a new addressing system (IPv6),
using 128 bits for the address, was developed in 1995, standardized in 1998, and is now being deployed
world-wide.

Although IP addresses are stored as binary numbers, they are usually displayed in human-readable
notations, such as 172.16.254.1 (for IPv4), and 2001:db8:0:1234:0:567:1:1 (for IPv6).

The Internet Assigned Numbers Authority (IANA) manages the IP address space allocations globally and
cooperates with five regional Internet registries (RIRs) to allocate IP address blocks to local Internet
registries (Internet service providers) and other entities.

IP addresses were originally organized into classes. The address class determined the potential size of the
network.

The class of an address specified which bits were used to identify the network (the network ID) and which
bits were used to identify the host (the host ID). It also determined the total number of hosts per network.
There were five classes of IP addresses: classes A through E.

Classful addressing is no longer in common usage and has now been replaced with classless addressing.
Any netmask can now be assigned to any IP address range.

The four octets that make up an IP address are conventionally represented by a, b, c, and d respectively.
The following table shows how the octets are distributed in classes A, B, and C.

Class   IP Address   Network ID   Host ID
A       a.b.c.d      a            b.c.d
B       a.b.c.d      a.b          c.d
C       a.b.c.d      a.b.c        d

Class A: Class A addresses are assigned to networks with a very large number of hosts. Class A allows for
126 networks by using the first octet for the network ID. The first bit of this octet is always fixed to zero,
and the remaining seven bits complete the network ID. The 24 bits in the remaining three octets represent the
host ID, allowing for 126 networks and approximately 17 million hosts per network. Class A network numbers
begin at 1 and end at 126; 127 is reserved for loopback addresses.

Class B: Class B addresses are assigned to medium-to-large networks. Class B allows for 16,384
networks by using the first two octets for the network ID. The first two bits of the first octet are always
fixed to 10, and the remaining six bits, together with the second octet, complete the network ID. The 16 bits in
the third and fourth octets represent the host ID, allowing for approximately 65,000 hosts per network. Class B
network number values begin at 128 and end at 191.

Class C: Class C addresses are used in small local area networks (LANs). Class C allows for approximately
2 million networks by using the first three octets for the network ID. The first three bits of the first octet are
always fixed to 110, and the remaining 21 bits of the first three octets complete the network ID. The 8 bits of
the last octet represent the host ID, allowing for 254 hosts per network. Class C network number values begin
at 192 and end at 223.

Class D and E: Classes D and E are not allocated to hosts. Class D addresses are used for multicasting, and
class E addresses are not available for general use: they are reserved for future purposes.
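
A small, purely illustrative Python helper shows how the class of an IPv4 address follows from its first octet:

```python
def ipv4_class(address: str) -> str:
    """Return the historical class (A-E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if first_octet < 128:
        return "A"          # leading bit 0
    if first_octet < 192:
        return "B"          # leading bits 10
    if first_octet < 224:
        return "C"          # leading bits 110
    if first_octet < 240:
        return "D"          # leading bits 1110 (multicast)
    return "E"              # leading bits 1111 (reserved)

print(ipv4_class("10.1.2.3"))      # A
print(ipv4_class("172.16.254.1"))  # B
print(ipv4_class("192.168.0.1"))   # C
```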

Q..6 Define Cryptography. Discuss two cryptographic techniques.

Ans: Cryptography Definition :

Cryptography is the practice and study of encryption and decryption: encoding data so that it can only be
decoded by specific individuals. A system for encrypting and decrypting data is a cryptosystem. These usually
involve an algorithm for combining the original data ("plaintext") with one or more "keys", numbers or strings of
characters known only to the sender and/or recipient. The resulting output is known as "ciphertext".

The security of a cryptosystem usually depends on the secrecy of some of the keys rather than on the
supposed secrecy of the algorithm. A strong cryptosystem has a large range of possible keys, so that it is not
feasible simply to try every possible key (a "brute force" approach). A strong cryptosystem will produce
ciphertext which appears random to all standard statistical tests, and will resist all previously known methods
of breaking codes ("cryptanalysis").

Symmetric-key cryptography

Symmetric-key cryptography refers to encryption methods in which both the sender and receiver share the
same key (or, less commonly, in which their keys are different, but related in an easily computable way). This
was the only kind of encryption publicly known until June 1976.[12]

The modern study of symmetric-key ciphers relates mainly to the study of block ciphers and stream ciphers
and to their applications. A block cipher is, in a sense, a modern embodiment of Alberti's polyalphabetic
cipher: block ciphers take as input a block of plaintext and a key, and output a block of ciphertext of the same
size. Since messages are almost always longer than a single block, some method of knitting together
successive blocks is required. Several have been developed, some with better security in one aspect or
another than others. They are the modes of operation and must be carefully considered when using a block
cipher in a cryptosystem.

The Data Encryption Standard (DES) and the Advanced Encryption Standard (AES) are block cipher designs
which have been designated cryptography standards by the US government (though DES's designation was
finally withdrawn after the AES was adopted).[14] Despite its deprecation as an official standard, DES
(especially its still-approved and much more secure triple-DES variant) remains quite popular; it is used
across a wide range of applications, from ATM encryption[15] to e-mail privacy[16] and secure remote
access.[17] Many other block ciphers have been designed and released, with considerable variation in
quality. Many have been thoroughly broken; see Category:Block ciphers.[13][18]

Stream ciphers, in contrast to the 'block' type, create an arbitrarily long stream of key material, which is
combined with the plaintext bit-by-bit or character-by-character, somewhat like the one-time pad. In a stream
cipher, the output stream is created based on a hidden internal state which changes as the cipher operates.
That internal state is initially set up using the secret key material. RC4 is a widely used stream cipher; see
Category:Stream ciphers.[13] Block ciphers can be used as stream ciphers; see Block cipher modes of
operation.

Cryptographic hash functions are a third type of cryptographic algorithm. They take a message of any length
as input, and output a short, fixed length hash which can be used in (for example) a digital signature. For
good hash functions, an attacker cannot find two messages that produce the same hash. MD4 is a long-used
hash function which is now broken; MD5, a strengthened variant of MD4, is also widely used but broken in
practice. The U.S. National Security Agency developed the Secure Hash Algorithm series of MD5-like hash
functions: SHA-0 was a flawed algorithm that the agency withdrew; SHA-1 is widely deployed and more
secure than MD5, but cryptanalysts have identified attacks against it; the SHA-2 family improves on SHA-1,
but it isn't yet widely deployed, and the U.S. standards authority thought it "prudent" from a security
perspective to develop a new standard to "significantly improve the robustness of NIST's overall hash
algorithm toolkit."[19] Thus, a hash function design competition is underway and meant to select a new U.S.
national standard, to be called SHA-3, by 2012.

Message authentication codes (MACs) are much like cryptographic hash functions, except that a secret key
can be used to authenticate the hash value[13] upon receipt.
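
As a small illustration using Python's standard hashlib and hmac modules (chosen here for convenience, not because they are tied to any particular algorithm discussed above), a hash is computed from the message alone, while a MAC additionally requires the shared secret key:

```python
import hashlib
import hmac

message = b"attack at dawn"
secret_key = b"shared secret"            # known only to sender and receiver

digest = hashlib.sha256(message).hexdigest()                      # anyone can compute this
mac = hmac.new(secret_key, message, hashlib.sha256).hexdigest()   # requires the key

print("SHA-256 :", digest)
print("HMAC    :", mac)

# On receipt, the MAC is verified with a constant-time comparison:
expected = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
assert hmac.compare_digest(mac, expected)
```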

Public-key cryptography

Symmetric-key cryptosystems use the same key for encryption and decryption of a message, though a
message or group of messages may have a different key than others. A significant disadvantage of
symmetric ciphers is the key management necessary to use them securely. Each distinct pair of
communicating parties must, ideally, share a different key, and perhaps each ciphertext exchanged as well.
The number of keys required increases as the square of the number of network members, which very quickly
requires complex key management schemes to keep them all straight and secret. The difficulty of securely
establishing a secret key between two communicating parties, when a secure channel does not already exist
between them, also presents a chicken-and-egg problem which is a considerable practical obstacle for
cryptography users in the real world.
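
A quick illustrative calculation shows why this scales poorly: with n communicating parties, pairwise symmetric keys number n(n-1)/2, whereas a public-key scheme needs only one key pair per party.

```python
def symmetric_keys_needed(n: int) -> int:
    """Distinct keys required when every pair of n parties shares its own key."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, "parties ->", symmetric_keys_needed(n), "pairwise symmetric keys vs",
          2 * n, "keys (one public/private pair each)")
# 10 parties -> 45 pairwise symmetric keys vs 20 keys (one public/private pair each)
# 100 parties -> 4950 pairwise symmetric keys vs 200 keys (one public/private pair each)
# 1000 parties -> 499500 pairwise symmetric keys vs 2000 keys (one public/private pair each)
```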

In a groundbreaking 1976 paper, Whitfield Diffie and Martin Hellman proposed the notion of public-key (also,
more generally, called asymmetric key) cryptography in which two different but mathematically related keys
are used—a public key and a private key. A public key system is so constructed that calculation of one key
(the 'private key') is computationally infeasible from the other (the 'public key'), even though they are
necessarily related. Instead, both keys are generated secretly, as an interrelated pair.[21] The historian
David Kahn described public-key cryptography as "the most revolutionary new concept in the field since
polyalphabetic substitution emerged in the Renaissance".

In public-key cryptosystems, the public key may be freely distributed, while its paired private key must remain
secret. The public key is typically used for encryption, while the private or secret key is used for decryption.
Diffie and Hellman showed that public-key cryptography was possible by presenting the Diffie–Hellman key
exchange protocol.

In 1978, Ronald Rivest, Adi Shamir, and Len Adleman invented RSA, another public-key system.

In 1997, it finally became publicly known that asymmetric key cryptography had been invented by James H.
Ellis at GCHQ, a British intelligence organization, and that, in the early 1970s, both the Diffie–Hellman and
RSA algorithms had been previously developed (by Malcolm J. Williamson and Clifford Cocks, respectively).

The Diffie–Hellman and RSA algorithms, in addition to being the first publicly known examples of high quality
public-key algorithms, have been among the most widely used. Others include the Cramer–Shoup
cryptosystem, ElGamal encryption, and various elliptic curve techniques. See Category:Asymmetric-key
cryptosystems.

Padlock icon from the Firefox Web browser, meant to indicate a page has been sent in SSL or TLS-
encrypted protected form. However, such an icon is not a guarantee of security; any subverted browser
might mislead a user by displaying such an icon when a transmission is not actually being protected by SSL
or TLS.

In addition to encryption, public-key cryptography can be used to implement digital signature schemes. A
digital signature is reminiscent of an ordinary signature; they both have the characteristic that they are easy
for a user to produce, but difficult for anyone else to forge. Digital signatures can also be permanently tied to
the content of the message being signed; they cannot then be 'moved' from one document to another, for
any attempt will be detectable. In digital signature schemes, there are two algorithms: one for signing, in
which a secret key is used to process the message (or a hash of the message, or both), and one for
verification, in which the matching public key is used with the message to check the validity of the signature.
RSA and DSA are two of the most popular digital signature schemes. Digital signatures are central to the
operation of public key infrastructures and many network security schemes (e.g., SSL/TLS, many VPNs,
etc.).

Public-key algorithms are most often based on the computational complexity of "hard" problems, often from
number theory. For example, the hardness of RSA is related to the integer factorization problem, while
Diffie–Hellman and DSA are related to the discrete logarithm problem. More recently, elliptic curve
cryptography has developed in which security is based on number theoretic problems involving elliptic
curves. Because of the difficulty of the underlying problems, most public-key algorithms involve operations
such as modular multiplication and exponentiation, which are much more computationally expensive than the
techniques used in most block ciphers, especially with typical key sizes. As a result, public-key
cryptosystems are commonly hybrid cryptosystems, in which a fast high-quality symmetric-key encryption
algorithm is used for the message itself, while the relevant symmetric key is sent with the message, but
encrypted using a public-key algorithm. Similarly, hybrid signature schemes are often used, in which a
cryptographic hash function is computed over the message and only the resulting hash is digitally signed.

Assignment (Set-1)
Subject code: MI0036
Business intelligence & Tools
Q1. Define the term business intelligence tools? Briefly explain how the data from the one
end gets transformed into information at the other end?

Ans:

This section covers the definition of BI and the applications involved in it. Business intelligence
(BI) is a wide category of applications and technologies which gather, store, analyse, and provide
access to data.

It helps enterprise users make better business decisions. BI applications involve the activities of decision
support systems, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting,
and data mining.

Business intelligence tools provide information on how business is presently being conducted and which
areas need to be developed. Business intelligence tools are a kind of application software developed to
report, evaluate, and present data. The tools generally read data that have been stored previously, though not
necessarily, in a data warehouse. The following are some of the types of business intelligence tools commonly
used.

Multiplicity of business intelligence tools:

The multiplicity of business intelligence tools offers historical, current, and predictive views of business
operations. The main features of business intelligence methodologies include reporting, online analytical
processing, analytics, data mining, business performance management, benchmarking, text mining, and
predictive analytics.

BI encourages enhanced business decision-making processes; therefore it is also called a decision
support system. Even though the term business intelligence is frequently used as a synonym for competitive
intelligence, BI uses technologies, processes, and applications to examine mainly internal, structured data and
business processes, whereas competitive intelligence is carried out by gathering, analysing and distributing
information, with or without support from technology and applications, and concentrates mainly on all-source
information and data, mostly external but also internal to an organization, that helps in decision making.

Q2. What do you mean by data warehouse? What are the major concepts and terminology used
in the study of data warehouses?

Ans. In order to survive market competition, an organization has to monitor changes both within and outside the
organization. An organization that cannot study the current trends within the organization and outside it, such
as in its corporate relations, will not be able to survive in today's world. This is where data warehousing and
its applications come into play.

Data warehousing technology is the process by which the historical data of a company (also referred to as
corporate memory) is created and utilized. A data warehouse is the database that contains data relevant to
corporate information. This includes sales figures, market performance, accounts payable, and leave
details of employees. However, the data warehouse is not limited to the above-mentioned data. The data
available are useful in making decisions based on past performance of employees, expenses and
experiences.

To utilize data warehousing technology, companies can opt for online transaction processing (OLTP) or
online analytical processing (OLAP). The uses of data warehousing are many. Let us consider a bank
scenario to analyse the importance of data warehousing. In a bank, the accounts of several customers have
to be maintained, including balance, savings, and deposit details. The particulars of the bank
employees and the information regarding their performance also have to be maintained. Data warehousing
technologies are used for the same.

Data warehousing transforms data into information and enables organizations to analyse their operations
and performance. This is done by staging and transforming data from the data sources; the data
stores may reside on disk or in memory.

To extract, clean and load data from online transaction processing (OLTP) systems and other data repositories,
the data warehousing system uses back-end tools. The data storage layer is composed of the data warehouse,
the data marts and the operational data store. The system also provides tools like OLAP to organize,
partition and summarise data in the data warehouse and data marts. Mining, querying and reporting on data
require front-end tools.
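
As a highly simplified, hypothetical illustration of how raw operational data becomes information at the other end, the back-end tools effectively run an extract-transform-load (ETL) cycle like the following Python sketch; the record layout and field names are invented for the example:

```python
# Hypothetical ETL sketch: raw OLTP rows in, summarized information out.
raw_sales = [                                   # "extract": rows pulled from an OLTP source
    {"branch": "North", "amount": "1200.50", "status": "OK"},
    {"branch": "North", "amount": "  80.00", "status": "OK"},
    {"branch": "South", "amount": "950.25",  "status": "OK"},
    {"branch": "South", "amount": "bad-data", "status": "CANCELLED"},
]

def clean(row):
    """'Transform': drop cancelled or invalid rows and normalize types."""
    if row["status"] != "OK":
        return None
    try:
        return {"branch": row["branch"], "amount": float(row["amount"])}
    except ValueError:
        return None

facts = [r for r in (clean(row) for row in raw_sales) if r]   # "load" into the warehouse

summary = {}                                    # information: sales totals per branch
for fact in facts:
    summary[fact["branch"]] = summary.get(fact["branch"], 0.0) + fact["amount"]

print(summary)   # {'North': 1280.5, 'South': 950.25}
```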

Contrasting OLTP and Data Warehousing Environments

The following points illustrate key differences between an OLTP system and a data warehouse.

One major difference between the types of system is that data warehouses are not usually in third normal
form (3NF), a type of data normalization common in OLTP environments.

Data warehouses and OLTP systems have very different requirements. Here are some examples of
differences between typical data warehouses and OLTP systems:

• Workload: Data warehouses are designed to accommodate ad hoc queries. You might not know the workload of your data warehouse in advance, so a data warehouse should be optimized to perform well for a wide variety of possible query operations. OLTP systems support only predefined operations. Your applications might be specifically tuned or designed to support only these operations.

• Data modifications: A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. The end users of a data warehouse do not directly update the data warehouse. In OLTP systems, end users routinely issue individual data modification statements to the database. The OLTP database is always up to date, and reflects the current state of each business transaction.

• Schema design: Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance. OLTP systems often use fully normalized schemas to optimize update/insert/delete performance, and to guarantee data consistency.

• Typical operations: A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month." A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."

• Historical data: Data warehouses usually store many months or years of data. This is to support historical analysis. OLTP systems usually store data from only a few weeks or months. The OLTP system stores only historical data as needed to successfully meet the requirements of the current transaction.

Data Warehouse Architectures


Data warehouses and their architectures vary depending upon the specifics of an organization's situation.
Three common architectures are:
• Data Warehouse Architecture (Basic)
• Data Warehouse Architecture (with a Staging Area)
• Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic)


In the basic architecture, end users directly access data derived from several source systems through the
data warehouse. In addition to the metadata and raw data of a traditional OLTP system, an extra type of data,
summary data, is present. Summaries are very valuable in data warehouses because they pre-compute long
operations in advance. For example, a typical data warehouse query is to retrieve something like August sales.
A summary in Oracle is called a materialized view.

Data Warehouse Architecture (with a Staging Area)

In this architecture, you need to clean and process your operational data before putting it into the warehouse.
You can do this programmatically, although most data warehouses use a staging area instead. A staging area
simplifies building summaries and general warehouse management.

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the staging-area architecture is quite common, you may want to customize your warehouse's
architecture for different groups within your organization. You can do this by adding data marts, which are
systems designed for a particular line of business; for example, purchasing, sales, and inventories can be
separated into their own data marts. In this example, a financial analyst might want to analyze historical data
for purchases and sales.

Q3. What are the data modeling techniques used in data warehousing environment?
Ans: Data Modelling Multi-fact Star Schema or Snowflake Schema

Each of the dimension tables has a single-field primary key that has a one-to-many relationship with a
foreign key in the fact table. Let us look at some facts related to the star and snowflake schemas.

Model

The fact table contains the main data, and the other, smaller dimension tables contain descriptions for
each value in the dimensions. The dimension tables are connected to the fact table. The fact table consists of
a set of foreign keys that together make a composite primary key, while each dimension table has its own
primary key.

One of the reasons for using a star schema is that it is simple: queries are not complex, as the joins
and conditions involve a fact table and a few single-level dimension tables. In a snowflake schema the queries
are more complex because of the multiple levels of dimension tables.
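
A tiny, hypothetical example (using Python's built-in sqlite3 module; all table and column names are invented) shows the shape of a star schema and of a typical query against it, one fact table joined directly to its single-level dimension tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales  (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL,
        PRIMARY KEY (product_id, store_id)   -- foreign keys form the composite key
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_store   VALUES (1, 'Delhi'),  (2, 'Mumbai');
    INSERT INTO fact_sales  VALUES (1, 1, 100.0), (2, 1, 250.0), (1, 2, 75.0);
""")

# A typical star-schema query: the fact table joined directly to its dimensions.
for row in con.execute("""
        SELECT p.name, s.city, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_store   s ON s.store_id   = f.store_id
        GROUP BY p.name, s.city"""):
    print(row)
```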

Uses:

The star and snowflake schemas are used for dimensional data where the speed of retrieval is more important
than the efficiency of data management; therefore, the data are not normalized much. The decision as to which
schema should be used depends on two factors: the database platform and the query tool to be used. A star
schema is relevant in environments where the queries are simpler and the query tools expose the users
to the fundamental table structures. A snowflake schema is apt for environments with several queries
with complex conditions, where the user is detached from the fundamental table structures.

Data Normalization and storage:

The data in the database may be repeated. To reduce redundancy we use normalization: commonly
repeated data are moved into a new table, so the number of tables to be joined to execute a query
increases. However, normalization reduces the space required to store redundant data and the number of
places where it has to be updated. The dimension tables are small compared to the fact table where storage
is concerned.

What is Data Modeling?

Data modeling is the act of exploring data-oriented structures. Like other modeling artifacts data models can
be used for a variety of purposes, from high-level conceptual models to physical data models. From the
point of view of an object-oriented developer data modeling is conceptually similar to class modeling. With
data modeling you identify entity types whereas with class modeling you identify classes. Data attributes are
assigned to entity types just as you would assign attributes and operations to classes. There are
associations between entities, similar to the associations between classes – relationships, inheritance,
composition, and aggregation are all applicable concepts in data modeling.

Traditional data modeling is different from class modeling because it focuses solely on data – class models
allow you to explore both the behavior and data aspects of your domain, with a data model you can only
explore data issues. Because of this focus data modelers have a tendency to be much better at getting the
data “right” than object modelers. However, some people will model database methods (stored procedures,
stored functions, and triggers) when they are physical data modeling. It depends on the situation of course,
but I personally think that this is a good idea and promote the concept in my UML data modeling profile
(more on this later).

Although the focus of this article is data modeling, there are often alternatives to data-oriented artifacts
(never forget Agile Modeling’s Multiple Models principle). For example, when it comes to conceptual
modeling, ORM diagrams aren’t your only option: in addition to LDMs it is quite common for people to create
UML class diagrams and even Class Responsibility Collaborator (CRC) cards instead. In fact, my
experience is that CRC cards are superior to ORM diagrams because it is very easy to get project
stakeholders actively involved in the creation of the model. Instead of a traditional, analyst-led drawing
session you can instead facilitate stakeholders through the creation of CRC cards.

How are Data Models Used in Practice?

Although methodology issues are covered later, we need to discuss how data models can be used in
practice to better understand them. You are likely to see three basic styles of data model:

• Conceptual data models. These models, sometimes called domain models, are typically used to
explore domain concepts with project stakeholders. On Agile teams high-level conceptual models are
often created as part of your initial requirements envisioning efforts as they are used to explore the
high-level static business structures and concepts. On traditional teams conceptual data models are
often created as the precursor to LDMs or as alternatives to LDMs.

• Logical data models (LDMs). LDMs are used to explore the domain concepts, and their
relationships, of your problem domain. This could be done for the scope of a single project or for your
entire enterprise. LDMs depict the logical entity types, typically referred to simply as entity types, the
data attributes describing those entities, and the relationships between the entities. LDMs are rarely
used on Agile projects although often are on traditional projects (where they rarely seem to add much
value in practice).

• Physical data models (PDMs). PDMs are used to design the internal schema of a database, depicting the
data tables, the data columns of those tables, and the relationships between the tables. PDMs often prove to
be useful on both Agile and traditional projects and as a result the focus of this article is on physical
modeling.

Although LDMs and PDMs sound very similar, and they in fact are, the level of detail that they model can be
significantly different. This is because the goals of each diagram are different – you can use an LDM to
explore domain concepts with your stakeholders and the PDM to define your database design. Figure 1
presents a simple LDM and Figure 2 a simple PDM, both modeling the concept of customers and addresses
as well as the relationship between them. Both diagrams apply the Barker notation, summarized below.
Notice how the PDM shows greater detail, including an associative table required to implement the
association as well as the keys needed to maintain the relationships. More on these concepts later. PDMs
should also reflect your organization’s database naming standards, in this case an abbreviation of the entity
name is appended to each column name and an abbreviation for “Number” was consistently introduced. A
PDM should also indicate the data types for the columns, such as integer and char(5). Although Figure 2
does not show them, lookup tables (also called reference tables or description tables) for how the address is
used as well as for states and countries are implied by the attributes ADDR_USAGE_CODE, STATE_CODE,
and COUNTRY_CODE.

A simple logical data model.

A simple physical data model.
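
Purely as an illustration, and with names and data types that are assumptions rather than taken from the omitted figures, the Customer/Address physical model described above, with its associative table and lookup codes, might translate into DDL like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customer (
        Customer_Number   INTEGER PRIMARY KEY,
        Customer_Name     CHAR(40)
    );
    CREATE TABLE Address (
        Address_ID        INTEGER PRIMARY KEY,
        Street            CHAR(40),
        State_Code        CHAR(5),      -- implied lookup/reference table
        Country_Code      CHAR(5)       -- implied lookup/reference table
    );
    -- Associative table implementing the many-to-many relationship,
    -- with a code describing how the address is used.
    CREATE TABLE Customer_Address (
        Customer_Number   INTEGER REFERENCES Customer(Customer_Number),
        Address_ID        INTEGER REFERENCES Address(Address_ID),
        Addr_Usage_Code   CHAR(5),
        PRIMARY KEY (Customer_Number, Address_ID)
    );
""")
```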

An important observation about Figures 1 and 2 is that I’m not slavishly following Barker’s approach to
naming relationships. For example, between Customer and Address there really should be two names
“Each CUSTOMER may be located in one or more ADDRESSES” and “Each ADDRESS may be the site of
one or more CUSTOMERS”. Although these names explicitly define the relationship I personally think that
they’re visual noise that clutter the diagram. I prefer simple names such as “has” and then trust my readers
to interpret the name in each direction. I’ll only add more information where it’s needed, in this case I think
that it isn’t. However, a significant advantage of describing the names the way that Barker suggests is that
it’s a good test to see if you actually understand the relationship – if you can’t name it then you likely don’t
understand it.

Data models can be used effectively at both the enterprise level and on projects. Enterprise architects will
often create one or more high-level LDMs that depict the data structures that support your enterprise, models
typically referred to as enterprise data models or enterprise information models. An enterprise data model is
one of several views that your organization’s enterprise architects may choose to maintain and support –
other views may explore your network/hardware infrastructure, your organization structure, your software
infrastructure, and your business processes (to name a few). Enterprise data models provide information
that a project team can use both as a set of constraints as well as important insights into the structure of their
system.

Project teams will typically create LDMs as a primary analysis artifact when their implementation
environment is predominantly procedural in nature, for example they are using structured COBOL as an
implementation language. LDMs are also a good choice when a project is data-oriented in nature, perhaps a
data warehouse or reporting system is being developed (having said that, experience seems to show that
usage-centered approaches appear to work even better). However LDMs are often a poor choice when a
project team is using object-oriented or component-based technologies because the developers would rather
work with UML diagrams or when the project is not data-oriented in nature. As Agile Modeling advises, apply
the right artifact(s) for the job. Or, as your grandfather likely advised you, use the right tool for the job. It's
important to note that traditional approaches to Master Data Management (MDM) will often motivate the
creation and maintenance of detailed LDMs, an effort that is rarely justifiable in practice when you consider
the total cost of ownership (TCO) when calculating the return on investment (ROI) of those sorts of efforts.

When a relational database is used for data storage project teams are best advised to create a PDMs to
model its internal schema. My experience is that a PDM is often one of the critical design artifacts for
business application development projects.

What About Conceptual Models?

Halpin (2001) points out that many data professionals prefer to create an Object-Role Model (ORM), an
example is depicted in Figure 3, instead of an LDM for a conceptual model. The advantage is that the
notation is very simple, something your project stakeholders can quickly grasp, although the disadvantage is
that the models become large very quickly. ORMs enable you to first explore actual data examples instead
of simply jumping to a potentially incorrect abstraction – for example Figure 3 examines the relationship
between customers and addresses in detail.

A simple Object-Role Model.


People will capture information in the best place that they know of. As a result I typically discard ORMs after I'm finished with them. I sometimes use ORMs to explore the domain with project
stakeholders but later replace them with a more traditional artifact such as an LDM, a class diagram,
or even a PDM. As a generalizing specialist, someone with one or more specialties who also
strives to gain general skills and knowledge, this is an easy decision for me to make; I know that this
information that I’ve just “discarded” will be captured in another artifact – a model, the tests, or even
the code – that I understand. A specialist who only understands a limited number of artifacts and
therefore “hands-off” their work to other specialists doesn’t have this as an option. Not only are they
tempted to keep the artifacts that they create but also to invest even more time to enhance the
artifacts. Generalizing specialists are more likely than specialists to travel light.

Common Data Modeling Notations

The figure below presents a summary of the syntax of four common data modeling notations: Information Engineering (IE), Barker, IDEF1X, and the Unified Modeling Language (UML). This diagram isn't meant to be comprehensive; instead, its goal is to provide a basic overview. Furthermore, for the sake of brevity I wasn't
able to depict the highly-detailed approach to relationship naming that Barker suggests. Although I provide a
brief description of each notation in Table 1 I highly suggest David Hay’s paper A Comparison of Data
Modeling Techniques as he goes into greater detail than I do.

Comparing the syntax of common data modeling notations.


Table 1. Discussing common data modeling notations.

IE: The IE notation (Finkelstein 1989) is simple and easy to read, and is well suited for high-level logical and enterprise data modeling. The only drawback of this notation, arguably an advantage, is that it does not support the identification of attributes of an entity. The assumption is that the attributes will be modeled with another diagram or simply described in the supporting documentation.

Barker: The Barker notation is one of the more popular ones, it is supported by Oracle's toolset, and is well suited for all types of data models. Its approach to subtyping can become clunky with hierarchies that go several levels deep.

IDEF1X: This notation is overly complex. It was originally intended for physical modeling but has been misapplied for logical modeling as well. Although popular within some U.S. government agencies, particularly the Department of Defense (DoD), this notation has been all but abandoned by everyone else. Avoid it if you can.

UML: This is not an official data modeling notation (yet). Although several suggestions for a data modeling profile for the UML exist, none are complete and, more importantly, are not “official” UML yet. However, the Object Management Group (OMG) in December 2005 announced an RFP for data-oriented models.

How to Model Data


It is critical for application developers to have a grasp of the fundamentals of data modeling so that they can not only read data models but also work effectively with the Agile DBAs who are responsible for the data-oriented aspects of the project. Your goal in reading this section is not to learn how to become a data modeler; it is simply to gain an appreciation of what is involved.
The following tasks are performed in an iterative manner:
• Identify entity types
• Identify attributes
• Apply naming conventions
• Identify relationships
• Apply data model patterns
• Assign keys
• Normalize to reduce data redundancy
• Denormalize to improve performance

Identify Entity Types


An entity type, also simply called entity (not exactly accurate terminology, but very common in practice), is
similar conceptually to object-orientation’s concept of a class – an entity type represents a collection of
similar objects. An entity type could represent a collection of people, places, things, events, or concepts.
Examples of entities in an order entry system would include Customer, Address, Order, Item, and Tax. If you
were class modeling you would expect to discover classes with the exact same names. However, the
difference between a class and an entity type is that classes have both data and behavior whereas entity
types just have data.
Ideally an entity should be normal, the data modeling world’s version of cohesive. A normal entity depicts
one concept, just like a cohesive class models one concept. For example, customer and order are clearly
two different concepts; therefore it makes sense to model them as separate entities.

Identify Attributes
Each entity type will have one or more data attributes. For example, in Figure 1 you saw that the Customer
entity has attributes such as First Name and Surname and in Figure 2 that the TCUSTOMER table had
corresponding data columns CUST_FIRST_NAME and CUST_SURNAME (a column is the implementation
of a data attribute within a relational database).
Attributes should also be cohesive from the point of view of your domain, something that is often a judgment call. In Figure 1 we decided that we wanted to model the fact that people have both first and last names instead of just a name (e.g. “Scott” and “Ambler” vs. “Scott Ambler”), whereas we did not distinguish between the sections of an American zip code (e.g. 90210-1234-5678). Getting the level of detail right can have a significant impact on your development and maintenance efforts. Refactoring a single data column into several columns can be difficult (database refactoring is described in detail in Database Refactoring), while over-specifying an attribute (e.g. having three attributes for zip code when you only needed one) can result in overbuilding your system, and hence you incur greater development and maintenance costs than you actually needed.
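
As a hedged illustration of why such a refactoring is non-trivial, the following SQL sketch (hypothetical table and column names, not taken from the figures) splits a single Name column into FirstName and Surname. The migration step relies on an assumption about how the old values were formatted, which is exactly the kind of detail that makes column refactoring difficult in practice; ALTER TABLE syntax also varies slightly between database products.

-- Hypothetical sketch: split one Name column into two columns.
ALTER TABLE Customer ADD COLUMN FirstName VARCHAR(40);
ALTER TABLE Customer ADD COLUMN Surname   VARCHAR(40);

-- Naive migration: assumes every existing Name value is "First Last" with exactly one space.
UPDATE Customer
SET FirstName = SUBSTRING(Name FROM 1 FOR POSITION(' ' IN Name) - 1),
    Surname   = SUBSTRING(Name FROM POSITION(' ' IN Name) + 1);

-- Only after every reader and writer of the old column has been migrated can it be dropped.
ALTER TABLE Customer DROP COLUMN Name;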


Apply Data Naming Conventions


Your organization should have standards and guidelines applicable to data modeling, something you should
be able to obtain from your enterprise administrators (if they don’t exist you should lobby to have some put in
place). These guidelines should include naming conventions for both logical and physical modeling; the logical naming conventions should be focused on human readability, whereas the physical naming conventions will reflect technical considerations. You can clearly see that different naming conventions were applied in Figures 1 and 2.
As you saw in Introduction to Agile Modeling, AM includes the Apply Modeling Standards practice. The basic
idea is that developers should agree to and follow a common set of modeling standards on a software
project. Just like there is value in following common coding conventions, clean code that follows your chosen
coding guidelines is easier to understand and evolve than code that doesn't, there is similar value in
following common modeling conventions.

Identify Relationships
In the real world entities have relationships with other entities. For example, customers PLACE orders,
customers LIVE AT addresses, and line items ARE PART OF orders. Place, live at, and are part of are all
terms that define relationships between entities. The relationships between entities are conceptually
identical to the relationships (associations) between objects.
Figure 5 depicts a partial LDM for an online ordering system. The first thing to notice is the various styles applied to relationship names and roles – different relationships require different approaches. For example, the relationship between Customer and Order has two names, places and is placed by, whereas the relationship between Customer and Address has one. In this example having a second name on the relationship, the idea being that you want to specify how to read the relationship in each direction, is redundant – you're better off finding a clear wording for a single relationship name, decreasing the clutter on your diagram. Similarly, you will often find that specifying the roles that an entity plays in a relationship negates the need to give the relationship a name (although some CASE tools may inadvertently force you to do this). For example, the role of billing address and the label billed to are clearly redundant; you really only need one. Similarly, the role part of that Line Item has in its relationship with Order is sufficiently obvious without a relationship name.

A logical data model (Information Engineering notation).

You also need to identify the cardinality and optionality of a relationship (the UML combines the concepts of optionality and cardinality into the single concept of multiplicity). Cardinality represents the concept of “how many”, whereas optionality represents the concept of “whether you must have something.” For example, it is not enough to know that customers place orders. How many orders can a customer place? None, one, or several? Furthermore, relationships are two-way streets: not only do customers place orders, but orders are placed by customers. This leads to questions like: how many customers can be involved with any given order
and is it possible to have an order with no customer involved? Figure 5 shows that customers place one or
more orders and that any given order is placed by one customer and one customer only. It also shows that a
customer lives at one or more addresses and that any given address has zero or more customers living at it.
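
To make cardinality and optionality concrete, here is a minimal SQL sketch (table and column names are illustrative assumptions, not taken from the figures) of how “any given order is placed by one customer and one customer only” becomes a NOT NULL foreign key, while “a customer places one or more orders” is not directly enforceable by the schema itself.

-- Illustrative sketch only.
CREATE TABLE Customer (
  CustomerNumber INTEGER PRIMARY KEY,
  FirstName      VARCHAR(40),
  Surname        VARCHAR(40)
);

-- Named CustomerOrder because ORDER is a reserved word in SQL.
CREATE TABLE CustomerOrder (
  OrderNumber    INTEGER PRIMARY KEY,
  CustomerNumber INTEGER NOT NULL,   -- optionality: an order cannot exist without a customer
  OrderDate      DATE,
  FOREIGN KEY (CustomerNumber) REFERENCES Customer (CustomerNumber)
);
-- The "one or more" on the customer side is not enforced by this DDL; it would have to be
-- checked by application logic or a separate data-quality query.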


Although the UML distinguishes between different types of relationships – associations, inheritance,
aggregation, composition, and dependency – data modelers often aren’t as concerned with this issue as
much as object modelers are. Subtyping, one application of inheritance, is often found in data models, an
example of which is the is a relationship between Item and its two “sub entities” Service and Product.
Aggregation and composition are much less common and typically must be implied from the data model, as
you see with the part of role that Line Item takes with Order. UML dependencies are typically a software
construct and therefore wouldn’t appear on a data model, unless of course it was a very highly detailed
physical model that showed how views, triggers, or stored procedures depended on other aspects of the
database schema.

Assign Keys
There are two fundamental strategies for assigning keys to tables. First, you could assign a natural key, which is one or more existing data attributes that are unique to the business concept. In the Customer table of Figure 6 there were two candidate keys, in this case CustomerNumber and SocialSecurityNumber. Second, you could introduce a new column, called a surrogate key, which is a key that has no business meaning. An example is the AddressID column of the Address table in Figure 6. Addresses don't have an “easy” natural key because you would need to use all of the columns of the Address table to form a key for itself (you might be able to get away with just the combination of Street and ZipCode depending on your problem domain); therefore introducing a surrogate key is a much better option in this case.

Figure 6. Customer and Address revisited (UML notation).

Let's consider Figure 6 in more detail. It presents an alternative design to the one presented earlier: a different naming convention was adopted and the model itself is more extensive. In Figure 6 the Customer
table has the CustomerNumber column as its primary key and SocialSecurityNumber as an alternate key.
This indicates that the preferred way to access customer information is through the value of a person’s
customer number although your software can get at the same information if it has the person’s social security
number. The CustomerHasAddress table has a composite primary key, the combination of
CustomerNumber and AddressID. A foreign key is one or more attributes in an entity type that represents a
key, either primary or secondary, in another entity type. Foreign keys are used to maintain relationships
between rows. For example, the relationship between rows in the CustomerHasAddress table and the Customer table is maintained by the CustomerNumber column within the CustomerHasAddress table. The
interesting thing about the CustomerNumber column is the fact that it is part of the primary key for
CustomerHasAddress as well as the foreign key to the Customer table. Similarly, the AddressID column is
part of the primary key of CustomerHasAddress as well as a foreign key to the Address table to maintain the
relationship with rows of Address.

Although the "natural vs. surrogate" debate is one of the great religious issues within the data community,
the fact is that neither strategy is perfect and you'll discover that in practice (as we see in Figure 6)
sometimes it makes sense to use natural keys and sometimes it makes sense to use surrogate keys. In
Choosing a Primary Key: Natural or Surrogate? I describe the relevant issues in detail.
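
A minimal DDL sketch of the design described above might look as follows. The column types are assumptions; the key structure mirrors the description of Figure 6, with CustomerNumber as a natural primary key, SocialSecurityNumber as an alternate key, AddressID as a surrogate key, and a composite primary key on CustomerHasAddress whose parts double as foreign keys.

CREATE TABLE Customer (
  CustomerNumber       INTEGER  PRIMARY KEY,   -- natural primary key
  SocialSecurityNumber CHAR(11) UNIQUE,        -- alternate (candidate) key
  FirstName            VARCHAR(40),
  Surname              VARCHAR(40)
);

CREATE TABLE Address (
  AddressID INTEGER PRIMARY KEY,               -- surrogate key with no business meaning
  Street    VARCHAR(60),
  City      VARCHAR(40),
  State     CHAR(2),
  ZipCode   VARCHAR(10)
);

CREATE TABLE CustomerHasAddress (
  CustomerNumber INTEGER NOT NULL,
  AddressID      INTEGER NOT NULL,
  PRIMARY KEY (CustomerNumber, AddressID),     -- composite primary key
  FOREIGN KEY (CustomerNumber) REFERENCES Customer (CustomerNumber),
  FOREIGN KEY (AddressID)      REFERENCES Address (AddressID)
);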

Normalize to Reduce Data Redundancy

Data normalization is a process in which data attributes within a data model are organized to increase the
cohesion of entity types. In other words, the goal of data normalization is to reduce and even eliminate data
redundancy, an important consideration for application developers because it is incredibly difficult to store
objects in a relational database that maintains the same information in several places. Table 2 summarizes
the three most common normalization rules describing how to put entity types into a series of increasing
levels of normalization. Higher levels of data normalization (Date 2000) are beyond the scope of this book.
With respect to terminology, a data schema is considered to be at the level of normalization of its least
normalized entity type. For example, if all of your entity types are at second normal form (2NF) or higher
then we say that your data schema is at 2NF.

Table 2. Data Normalization Rules.

Level Rule
First normal form (1NF) An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF) An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are
fully dependent on its primary key.
Third normal form (3NF) An entity type is in 3NF when it is in 2NF and when all of its attributes are directly
dependent on the primary key.

Figure 7 depicts a database schema in 0NF whereas Figure 8 depicts a normalized schema in 3NF. Read the
Introduction to Data Normalization essay for details.

Why data normalization? The advantage of having a highly normalized data schema is that information is
stored in one place and one place only, reducing the possibility of inconsistent data. Furthermore, highly-
normalized data schemas in general are closer conceptually to object-oriented schemas because the object-
oriented goals of promoting high cohesion and loose coupling between classes result in similar solutions (at
least from a data point of view). This generally makes it easier to map your objects to your data schema.
Unfortunately, normalization usually comes at a performance cost. With the data schema of Figure 7 all the
data for a single order is stored in one row (assuming orders of up to nine order items), making it very easy
to access. With the data schema of Figure 7 you could quickly determine the total amount of an order by
reading the single row from the Order0NF table. To do so with the data schema of Figure 8 you would need
to read data from a row in the Order table, data from all the rows from the OrderItem table for that order and
data from the corresponding rows in the Item table for each order item. For this query, the data schema of
Figure 7 very likely provides better performance.
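
The trade-off can be seen in the shape of the queries themselves. The sketch below is illustrative only: the column names, and the assumption that the flat 0NF row carries a precomputed total, are mine rather than something stated for Figures 7 and 8, and the Order table is called OrderHeader because ORDER is a reserved word in SQL.

-- 0NF style (Figure 7): everything for one order lives in one wide row.
SELECT OrderTotal
FROM   Order0NF
WHERE  OrderNumber = 1001;

-- 3NF style (Figure 8): the same total requires a three-table join.
SELECT SUM(oi.Quantity * i.UnitPrice) AS OrderTotal
FROM   OrderHeader o
       JOIN OrderItem oi ON oi.OrderNumber = o.OrderNumber
       JOIN Item      i  ON i.ItemNumber   = oi.ItemNumber
WHERE  o.OrderNumber = 1001;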

Figure 7. An Initial Data Schema for Order (UML Notation).


Figure 8. Normalized schema in 3NF (UML Notation).


In class modeling, there is a similar concept called Class Normalization although that is beyond the scope of
this article.

Denormalize to Improve Performance

Normalized data schemas, when put into production, often suffer from performance problems. This makes
sense – the rules of data normalization focus on reducing data redundancy, not on improving performance of
data access. An important part of data modeling is to denormalize portions of your data schema to improve
database access times. For example, the data model of Figure 9 looks nothing like the normalized schema
of Figure 8. To understand why the differences between the schemas exist you must consider the
performance needs of the application. The primary goal of this system is to process new orders from online
customers as quickly as possible. To do this customers need to be able to search for items and add them to
their order quickly, remove items from their order if need be, then have their final order totaled and recorded
quickly. The secondary goal of the system is to process, ship, and bill the orders afterwards.

Figure 9. A Denormalized Order Data Schema (UML notation).


To denormalize the data schema the following decisions were made:

1. To support quick searching of item information the Item table was left alone.

2. To support the addition and removal of order items to an order the concept of an OrderItem table
was kept, albeit split in two to support outstanding orders and fulfilled orders. New order items can
easily be inserted into the OutstandingOrderItem table, or removed from it, as needed.


3. To support order processing the Order and OrderItem tables were reworked into pairs to handle
outstanding and fulfilled orders respectively. Basic order information is first stored in the
OutstandingOrder and OutstandingOrderItem tables and then when the order has been shipped and
paid for the data is then removed from those tables and copied into the FulfilledOrder and
FulfilledOrderItem tables respectively. Data access time to the two tables for outstanding orders is
reduced because only the active orders are being stored there. On average an order may be outstanding for a couple of days, whereas for financial reporting reasons it may be stored in the fulfilled order tables for several years until archived. There is a performance penalty under this scheme because of the need to delete outstanding orders and then resave them as fulfilled orders, clearly something that would need to be processed as a transaction (a minimal sketch of such a move appears after this list).

4. The contact information for the person(s) the order is being shipped and billed to was also
denormalized back into the Order table, reducing the time it takes to write an order to the database
because there is now one write instead of two or three. The retrieval and deletion times for that data
would also be similarly improved.
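
As noted in point 3, a minimal sketch of that move, assuming the table names above plus an OrderNumber key and identical column layouts between the outstanding and fulfilled tables, might look like this (transaction syntax varies slightly by database):

-- Promote order 1001 from "outstanding" to "fulfilled" atomically.
START TRANSACTION;

INSERT INTO FulfilledOrder
SELECT * FROM OutstandingOrder WHERE OrderNumber = 1001;

INSERT INTO FulfilledOrderItem
SELECT * FROM OutstandingOrderItem WHERE OrderNumber = 1001;

DELETE FROM OutstandingOrderItem WHERE OrderNumber = 1001;
DELETE FROM OutstandingOrder     WHERE OrderNumber = 1001;

COMMIT;  -- if any step fails the whole move is rolled back, so the order is never lost or duplicated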

Note that if your initial, normalized data design meets the performance needs of your application then it is
fine as is. Denormalization should be resorted to only when performance testing shows that you have a
problem with your objects and subsequent profiling reveals that you need to improve database access time.
As my grandfather said, if it ain’t broke don’t fix it.

Evolutionary/Agile Data Modeling

Evolutionary data modeling is data modeling performed in an iterative and incremental manner. The article
Evolutionary Development explores evolutionary software development in greater detail. Agile data
modeling is evolutionary data modeling done in a collaborative manner. The article Agile Data Modeling:
From Domain Modeling to Physical Modeling works through a case study which shows how to take an agile
approach to data modeling.

Although you wouldn’t think it, data modeling can be one of the most challenging tasks that an Agile DBA
can be involved with on an agile software development project. Your approach to data modeling will often be
at the center of any controversy between the agile software developers and the traditional data professionals
within your organization. Agile software developers will lean towards an evolutionary approach where data
modeling is just one of many activities whereas traditional data professionals will often lean towards a big
design up front (BDUF) approach where data models are the primary artifacts, if not THE artifacts. This
problem results from a combination of the cultural impedance mismatch, a misguided need to enforce the
"one truth", and “normal” political maneuvering within your organization. As a result Agile DBAs often find
that navigating the political waters is an important part of their data modeling efforts.

Q4. Discuss the categories in which data is divided before structuring it into a data warehouse?

Ans. Data Warehouse Testing Categories

Categories of data warehouse testing cover the different stages of the process. The testing is done on both an individual and an end-to-end basis.

A good part of data warehouse testing can be linked to 'Data Warehouse Quality Assurance'. Data warehouse testing includes the following areas:


Extraction Testing

This testing checks the following:

• The extraction process is able to extract the required fields.
• The extraction logic for each source system is working.
• Extraction scripts are granted security access to the source systems.
• Updating of the extract audit log and time stamping is happening.
• Movement from the source to the extraction destination is complete and accurate.
• Extraction is getting completed within the expected window.
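
A completeness check of this kind can often be expressed as a simple reconciliation query. The sketch below is a generic assumption (a source table and its extract copy in a staging schema), not part of any specific tool:

-- Compare row counts between the source system table and the extracted staging copy.
SELECT
  (SELECT COUNT(*) FROM source_system.Customer)   AS source_rows,
  (SELECT COUNT(*) FROM staging.Customer_extract) AS extracted_rows;
-- The extraction test passes for this table when the two counts (plus spot-checked
-- field values) match and the extract finished within the agreed window.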

Transformation Testing

• Transformation scripts are transforming the data as per the expected logic.
• The one-time transformation for historical snapshots is working.
• Detailed and aggregated data sets are created and are matching.
• Updating of the transformation audit log and time stamping is happening.
• There is no pilferage of data during the transformation process.
• Transformation is getting completed within the given window.
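
For example, the rule that detailed and aggregated data sets must match can be verified with a query along these lines (table and column names are hypothetical):

-- Recompute the aggregate from the detail rows and compare it with the stored summary.
SELECT s.SalesDate,
       s.TotalAmount  AS summary_amount,
       SUM(d.Amount)  AS recomputed_amount
FROM   DailySalesSummary s
       JOIN SalesDetail d ON d.SalesDate = s.SalesDate
GROUP  BY s.SalesDate, s.TotalAmount
HAVING s.TotalAmount <> SUM(d.Amount);  -- any row returned indicates a transformation defect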

Loading Testing

• There is no pilferage during the loading process.
• Any transformations applied during the loading process are working.
• Movement of data sets from staging to the loading destination is working.
• One-time historical snapshots are working.
• Both incremental and total refresh are working.
• Loading is happening within the expected window.

End User Browsing and OLAP Testing

• The business views and dashboards are displaying the data as expected.
• The scheduled reports are accurate and complete.
• The scheduled reports and other batch operations (such as view refreshes) are happening within the expected window.
• 'Analysis Functions' and 'Data Analysis' are working.
• There is no pilferage of data between the source systems and the views.

Ad-hoc Query Testing

• Ad-hoc query creation works as per the expected functionality.
• Ad-hoc query output response time is as expected.

Down Stream Flow Testing


• Data is extracted from the data warehouse and updated in the down-stream systems/data marts.
• There is no pilferage.

One Time Population testing

• The one-time ETL for the production data is working.
• The production reports and the data warehouse reports are matching.
• The time taken for one-time processing will be manageable within the conversion weekend.

End-to-End Integrated Testing

• End to end data flow from the source system to the down stream system is complete and accurate.


Stress and volume Testing

This part of testing involves placing maximum volume or failure points on the system to check its robustness and capacity. The level of stress testing depends upon the configuration of the test environment and the level of capacity planning done. Here are some examples:

• Server shutdown during batch process.


• Extraction, transformation and loading with two to three times the maximum expected data volume (for which the capacity is planned).
• Having two to three times more users placing large numbers of ad-hoc queries.
• Running a large number of scheduled reports.

Parallel Testing
Parallel testing is done by running the data warehouse on production data, as it would run in real life, and comparing its outputs with the existing set of reports to ensure that they are in sync or have explainable mismatches.

Q5. Discuss the purpose of executive information system in an organization?

Ans: Not a piece of hardware or software, but an infrastructure that supplies to a firm's executives the up-
to-the-minute operational data, gathered and sifted from various databases. The typical information mix
presented to the executive may include financial information, work in process, inventory figures, sales
figures, market trends, industry statistics, and market price of the firm's shares. It may even suggest what
needs to be done, but differs from a decision support system (DSS) in that it is targeted at executives and
not managers.

An Executive Information System (EIS) is a computer-based system intended to facilitate and support the
information and decision making needs of senior executives by providing easy access to both internal and
external information relevant to meeting the strategic goals of the organization. It is commonly considered as
a specialized form of Decision Support System (DSS).

The emphasis of EIS is on graphical displays and easy-to-use user interfaces. They offer strong reporting
and drill-down capabilities. In general, EIS are enterprise-wide DSS that help top-level executives analyze,
compare, and highlight trends in important variables so that they can monitor performance and identify
opportunities and problems. EIS and data warehousing technologies are converging in the marketplace.

Executive Information System (EC-EIS)

Purpose
An executive information system (EIS) provides information about all the factors that influence the
business activities of a company. It combines relevant data from external and internal sources and provides
the user with important current data which can be analyzed quickly.
The EC-Executive Information System (EC-EIS) is a system which is used to collect and evaluate
information from different areas of a business and its environment. Among others, sources of this information
can be the Financial Information System (meaning external accounting and cost accounting), the Human
Resources Information System and the Logistics Information System.
The information provided serves both management and the employees in Accounting.

Implementation Considerations
EC-EIS is the information system for upper management. It is generally suitable for the collection and
evaluation of data from different functional information systems in one uniform view.

Integration
The Executive Information System is based on the same data basis, and has the same data collection facilities, as Business Planning (EC-BP). In EC-EIS you can report on the data planned in EC-BP.

Features


When customizing your Executive Information System you set up an individual EIS database for your
business and have this supplied with data from various sub-information systems (Financial Information
System, Human Resources Information System, Logistics Information System, cost accounting, etc.) or with
external data. Since this data is structured heterogeneously, you can structure the data basis into separate
EIS data areas for different business purposes. These data areas are called aspects. You can define various
aspects for your enterprise containing, for example, information on the financial situation, logistics, human
resources, the market situation, and stock prices. For each aspect you can create reports to evaluate the
data. You can either carry out your own basic evaluations in the EIS presentation (reporting) system or
analyze the data using certain report groups created specifically for your requirements. To access the EIS
presentation functions, choose Information systems →EIS.

In this documentation the application functions are described in detail and the customizing functions in brief. It is intended for the EC-EIS user but also for those responsible for managing the system. To access the EC-EIS application menu, choose Accounting → Enterprise control → Executive InfoSystem.

To call up the presentation functions from the application menu, choose Environment → Executive menu.
Necessary preliminary tasks and settings are carried out in Customizing. You can find a detailed description
of the customizing functions in the implementation guidelines.

Setting Up the Data Basis


An aspect consists of characteristics and key figures. Characteristics are classification terms such as
division, region, department, or company. A combination of characteristic values for certain characteristics
(such as Division: Pharmaceuticals, Region: Northwest) is called an evaluation object. Key figures are
numerical values such as revenue, fixed costs, variable costs, number of employees and quantity produced.
They also form part of the structure of the aspect.

The key figure data is stored according to the characteristics in an aspect. Besides these key figures stored
in the database, you can also define calculated key figures in EC-EIS and EC-BP. Calculated key figures are
calculated with a formula and the basic key figures of the aspect (for example: CM1 per employee = (sales -
sales deductions – variable costs) / Number of employees).
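
Using the formula quoted above, a calculated key figure can be thought of as a value derived from the stored basic key figures. The SQL below is only a hedged sketch of that idea: EC-EIS keeps this data in its own aspect structures, so the table and column names here (AspectFinancials, Division, Region, FiscalYear) are assumptions for illustration.

-- Illustrative only: compute "CM1 per employee" from basic key figures,
-- broken down by the characteristics Division and Region.
SELECT Division,
       Region,
       (Sales - SalesDeductions - VariableCosts) / NumberOfEmployees AS CM1PerEmployee
FROM   AspectFinancials
WHERE  FiscalYear = 2010;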

Determining characteristics and key figures when setting up the system provides the framework for the
possible evaluations. You make these settings in customizing. When the structure of the aspect and the data
basis have been defined in Customizing, you can evaluate data.

Presentation of the Data


Drilldown reporting and the report portfolio help you to evaluate and present your data. You can evaluate EC-
EIS data interactively using drilldown reporting. You make a selection of the characteristics and key figures
from the data basis. You can analyze many types of variance (plan/actual comparisons, time comparisons,
object comparisons). Drilldown reporting contains easy-to-use functions for navigating through the dataset. In
addition, there are a variety of functions for interactively processing a report (selection conditions,
exceptions, sort, top n and so on). You can also access SAPgraphics and SAPmail and print using Microsoft
Word for Windows and Microsoft Excel.

Drilldown reporting, with its numerous functions, is aimed at trained users, especially financial controllers and
managers. By using the various function levels appropriately, other users can execute reports without
needing extensive training. The reports created in drilldown reporting can be combined for specific user
groups and stored in the graphical report portfolio. The report portfolio is aimed at users with basic
knowledge of the system who wish to access information put together for their specific needs. You can call
up report portfolio reports via a graphical menu. This menu can be set up individually for different user
groups. The navigation function in the report portfolio is limited to scrolling through reports created by the
relevant department.

Q6. Discuss the challenges involved in data integration and coordination process?

Ans:

Simple Data Integration


The data integration process can often seem overwhelming, and this is often compounded by the vast
number of large-scale, complex, and costly enterprise integration applications available on the market.
MapForce seeks to alleviate this burden with powerful data integration capabilities built into a straightforward
graphical user interface.

MapForce allows you to easily associate target and source data structures using drag and drop functionality.
Advanced data processing filters and functions can be added via a built-in function library, and you can use
the visual function builder to combine multiple inline and/or recursive operations in more complex data
integration scenarios.

Integrating Data from/into Multiple Files

MapForce lets you easily integrate data from multiple files or split data from one file into many. Multiple files
can be specified through support for wildcard characters (e.g., ? or *), a database table, auto-number
sequences, or other methods. This feature is very useful in a wide variety of data integration scenarios; for
example, it may be necessary to integrate data from a file collection or to generate individual XML files for
each main table record in a large database. The screenshot below shows an example in which two files from
a directory are integrated into a single target file.


As a complement to this feature, MapForce also allows you to use file names as parameters in your data
integration projects. This lets you create dynamic mappings in which this information is defined at run-time.

Re-usable Data Mappings

Whether it is an XML or database schema, EDI configuration file, or XBRL taxonomy and beyond, MapForce integrates data based on data structures regardless of the underlying content. This means that you can re-use your data integration mappings again and again as your business data changes.

Simply right click the data structure and choose Properties to access the component settings dialog to change your data source and, consequently, the output of your data integration project.

If you need to make some changes to your mapping along the way - to accommodate for underlying schema changes, for instance - MapForce offers a variety of automation features that help ease this process. For example, when you re-map a parent element, you will be asked if you would like to automatically reassign child elements or any other descendent connections accordingly.


Data integration output is created on-the-fly, and can be viewed at any time by simply clicking the Output tab
in the design pane.

Automated Data Integration

For XML mappings, MapForce automatically generates data integration code on-the-fly in XSLT 1.0/2.0 or
XQuery, based on your selection.


MapForce data mappings can also be fully automated through the generation of royalty-free data integration
application code in Java, C#, or C++. This enables you to implement scheduled or event-triggered data
integration/migration operations for inclusion in any reporting, e-commerce, or SOA-based applications.


MapForce data integration operations can also be automated via data integration API, ActiveX control, or the
command line.

Full integration with the Visual Studio and Eclipse IDEs helps developers use MapForce data integration
functionality as part of large-scale enterprise projects, without the hefty price tag.

Legacy Data Integration

As technology rapidly advances in the information age, organizations are often left burdened with legacy
data repositories that are no longer supported, making the data difficult to access and impossible to edit in its
native format. Here, MapForce provides the unique FlexText utility for parsing flat file output so that it can
easily be integrated with any other target structure.

FlexText enables you to create reusable legacy data integration templates for mapping flat files to modern
data formats like XML, databases, Excel 2007+, XBRL, Web services, and more.

In addition, legacy data formats like EDI can easily be integrated with modern accounting systems like ERP
and relational databases, or even translated to modern formats like XML.

Data Coordination Process:

1. A data coordination method of coordinating data between a source application program and a destination
application program in an information processing terminal, the information processing terminal including a
data storage unit configured to store therein data, a virus pattern file describing characteristics of a computer
virus, and a data string pattern file describing a detecting data string; and an applications storage unit that
stores therein a plurality of application programs each capable of creating data and storing the data in the
data storage unit, and a virus detection program configured to detect a virus contained in the data created by
any one of the application programs based on the virus pattern file before storing the data in the data storage
unit, the application programs including a source application program that creates a specific data and a destination application program that makes use of the specific data, the data coordination method
comprising: executing the virus detection program whereby the virus detection program looks for a data
string in the specific data based on the detecting data string in the data string pattern file in the data storage
unit and extracts the data string if such a data string is present in the specific data; and notifying the data
string extracted by the virus detection program at the executing and path information that specifies path of
the specific data as data coordination information to the destination application program.

2. The data coordination method according to claim 1, further comprising creating and storing the data string
pattern file in the storage unit.

3. The data coordination method according to claim 1, wherein the virus pattern file and the data string
pattern file being separate files.

4. The data coordination method according to claim 1, wherein the data string pattern file includes pattern
information for detecting a data string relating to any one of date information and position information or both
included in the data created by any one of the application programs.

5. The data coordination method according to claim 1, wherein the executing includes looking for a data
string in the specific data each time the destination application program requests the virus detection program
to detect a virus contained in the data.

6. The data coordination method according to claim 1, wherein the executing includes looking for a data
string in the specific data each time the destination application program is activated.

7. The data coordination method according to claim 1, wherein the executing includes looking for a data
string in the specific data at a timing specified by a user.

8. The data coordination method according to claim 1, wherein the destination application program is a
schedule management program that manages schedule by using at least one of calendar information and
map information.

9. The data coordination method according to claim 8, wherein the data is e-mail data, and the schedule
management program extracts e-mail data from the data storage unit based on the storage destination
information, and handles extracted e-mail data in association with the calendar information based on the
date information.

10. The data coordination method according to claim 8, wherein the data is image data, and the schedule
management program extracts image data from the data storage unit based on the storage destination
information, and handles extracted image data in association with the calendar information based on the
date information.

11. The data coordination method according to claim 8, wherein the data is e-mail data, and the schedule
management program extracts e-mail data from the data storage unit based on the storage destination
information, and handles extracted e-mail data in association with the map information.

12. The data coordination method according to claim 8, wherein the data is image data, and the schedule
management program extracts image data from the data storage unit based on the storage destination
information, and handles extracted image data in association with the map information.

13. A computer-readable recording medium that stores therein a computer program that implements on a
computer a data coordination method of coordinating data between a source application program and a
destination application program in an information processing terminal, the information processing terminal
including a data storage unit configured to store therein data, a virus pattern file describing characteristics of
a computer virus, and a data string pattern file describing a detecting data string; and an applications storage unit that stores therein a plurality of application programs each capable of creating data and storing the data
in the data storage unit, and a virus detection program configured to detect a virus contained in the data
created by any one of the application programs based on the virus pattern file before storing the data in the
data storage unit, the application programs including a source application program that creates a specific
data and a destination application program that makes use of the specific data, the computer program
causing the computer to execute: executing the virus detection program whereby the virus detection program
looks for a data string in the specific data based on the detecting data string in the data string pattern file in
the data storage unit and extracts the data string if such a data string is present in the specific data; and
notifying the data string extracted by the virus detection program at the executing and path information that
specifies path of the specific data as data coordination information to the destination application program.

14. An information processing terminal comprising: a data storage unit configured to store therein data, a
virus pattern file describing characteristics of a computer virus, and a data string pattern file describing a
detecting data string; an applications storage unit that stores therein a plurality of application programs each
capable of creating data and storing the data in the data storage unit, and a virus detection program
configured to detect a virus contained in the data created by any one of the application programs based on
the virus pattern file before storing the data in the data storage unit, the application programs including a
source application program that creates a specific data and a destination application program that makes
use of the specific data; an executing unit that executes the virus detection program whereby the virus
detection program looks for a data string in the specific data based on the detecting data string in the data
string pattern file in the data storage unit and extracts the data string if such a data string is present in the
specific data; and a notifying unit that notifies the data string extracted by the virus detection program and
path information that specifies path of the specific data as data coordination information to the destination
application program.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a technology for coordinating data between various application programs
that handle storage of data in a storage unit and a virus detection program that detects a virus contained in
the data based on a virus pattern file describing characteristics of viruses.

2. Description of the Related Art

Mobile phones are becoming multifunctional. Most mobile phones now have an e-mail function, a web page
browsing function, a music reproducing function, and a photograph function.

In a mobile phone, data files are generally stored in a storage device within the mobile phone. Along with the
increase of the functions of mobile phones, types and number of the data files that need to be stored have
increased. As a result, there is a need to efficiently group and manage the data files. Various techniques
have been proposed to achieve this. One approach includes adding unique classification information to data
files. Another approach includes using a data file name to facilitate classification.

For example, Japanese Patent Application Laid-Open (JP-A) No. 2003-134454 discloses a technique that
appends a date on which image data was photographed to be included in a data file indicative of the image
data. On the contrary, JP-A No. 2001-34632 discloses a technique that appends the name of the place
where image data was photographed to be included in a data file indicative of the image data.

Thus, the techniques disclosed in JP-A Nos. 2003-134454 and 2001-34632 include appending unique
information to data files. However, some application programs (AP) cannot handle such data files that are appended with additional information. In other words, there is a limitation on where the conventional
techniques can be employed.

Many APs (for example, an e-mail AP, a camera AP, and a web browser AP) are installed in a multifunctional
mobile phone, and generally, file formats that are handled by these APs are not the same. Accordingly,
although some APs can handle the data files that are appended with additional information, others cannot.

For example, when a user of a mobile phone wishes to rearrange data files stored in a predetermined
directory by the camera AP, it is necessary to shift the data file by creating a new directory. However, such
an operation is troublesome, and is not convenient for the user.

Data files such as e-mail data files, music files, web page data files, image data files that are handled by
mobile phones are in conformity with a standardized format. That is, these data files already include
information such as the date when the file is created.

One approach could be to create a search program that can extract information, such as date, from these
data files, and install the program to mobile phones. However, creation of a new search program increases
the costs.

Therefore, there is a need for a technique that can easily, and at low cost, coordinate data files among various APs. This issue is not limited to mobile phones, and applies likewise to information processing terminals such as a personal digital assistant (PDA).

Assignment (Set-2)
Subject code: MI0036
Business intelligence & Tools
Q.1 Explain business development life cycle in detail?

Ans: The Systems Development Life Cycle (SDLC), or Software Development Life Cycle in systems
engineering, information systems and software engineering, is the process of creating or altering systems,
and the models and methodologies that people use to develop these systems. The concept generally refers
to computer or information systems.

In software engineering the SDLC concept underpins many kinds of software development methodologies.
These methodologies form the framework for planning and controlling the creation of an information system:
the software development process.

Systems Development Life Cycle (SDLC) is a process used by a systems analyst to develop an information
system, including requirements, validation, training, and user (stakeholder) ownership. Any SDLC should
result in a high quality system that meets or exceeds customer expectations, reaches completion within time
and cost estimates, works effectively and efficiently in the current and planned Information Technology
infrastructure, and is inexpensive to maintain and cost-effective to enhance.[2]

Computer systems are complex and often (especially with the recent rise of Service-Oriented Architecture)
link multiple traditional systems potentially supplied by different software vendors. To manage this level of complexity, a number of SDLC models have been created: "waterfall"; "fountain"; "spiral"; "build and fix";
"rapid prototyping"; "incremental"; and "synchronize and stabilize". [3]

SDLC models can be described along a spectrum of agile to iterative to sequential. Agile methodologies,
such as XP and Scrum, focus on light-weight processes which allow for rapid changes along the
development cycle. Iterative methodologies, such as Rational Unified Process and Dynamic Systems
Development Method, focus on limited project scopes and expanding or improving products by multiple
iterations. Sequential or big-design-upfront (BDUF) models, such as Waterfall, focus on complete and correct
planning to guide large projects and risks to successful and predictable results. Other models, such
as Anamorphic Development, tend to focus on a form of development that is guided by project scope and
adaptive iterations of feature development.

In project management a project can be defined both with a project life cycle (PLC) and an SDLC, during
which slightly different activities occur. According to Taylor (2004) "the project life cycle encompasses all the
activities of the project, while the systems development life cycle focuses on realizing the product
requirements".[4]

Systems development phases

The System Development Life Cycle framework provides system designers and developers with a sequence of activities to follow. It consists of a set of steps or phases in which each phase of the SDLC uses the
results of the previous one.

A Systems Development Life Cycle (SDLC) adheres to important phases that are essential for developers,
such as planning, analysis, design, and implementation, and are explained in the section below. A number of
system development life cycle (SDLC) models have been created: waterfall, fountain, spiral, build and fix,
rapid prototyping, incremental, and synchronize and stabilize. The oldest of these, and the best known, is the
waterfall model: a sequence of stages in which the output of each stage becomes the input for the next.
These stages can be characterized and divided up in different ways, including the following[6]:

• Project planning, feasibility study: Establishes a high-level view of the intended project and
determines its goals.

• Systems analysis, requirements definition: Refines project goals into defined functions and
operation of the intended application. Analyzes end-user information needs.

• Systems design: Describes desired features and operations in detail, including screen layouts,
business rules, process diagrams, pseudocode and other documentation.

• Implementation: The real code is written here.

• Integration and testing: Brings all the pieces together into a special testing environment, then
checks for errors, bugs and interoperability.

• Acceptance, installation, deployment: The final stage of initial development, where the software is
put into production and runs actual business.

• Maintenance: What happens during the rest of the software's life: changes, correction, additions,
moves to a different computing platform and more. This, the least glamorous and perhaps most
important step of all, goes on seemingly forever.

In the following example (see picture) these stages of the Systems Development Life Cycle are divided into ten steps, from definition to creation and modification of IT work products:

Systems development life cycle topics


Management and control

SDLC Phases Related to Management Controls.

The Systems Development Life Cycle (SDLC) phases serve as a programmatic guide to project activity and
provide a flexible but consistent way to conduct projects to a depth matching the scope of the project. Each
of the SDLC phase objectives are described in this section with key deliverables, a description of
recommended tasks, and a summary of related control objectives for effective management. It is critical for
the project manager to establish and monitor control objectives during each SDLC phase while executing
projects. Control objectives help to provide a clear statement of the desired result or purpose and should be
used throughout the entire SDLC process. Control objectives can be grouped into major categories
(Domains), and relate to the SDLC phases as shown in the figure.

To manage and control any SDLC initiative, each project will be required to establish some degree of a Work
Breakdown Structure (WBS) to capture and schedule the work necessary to complete the project. The WBS
and all programmatic material should be kept in the “Project Description” section of the project notebook. The
WBS format is mostly left to the project manager to establish in a way that best describes the project work.
There are some key areas that must be defined in the WBS as part of the SDLC policy. The following
diagram describes three key areas that will be addressed in the WBS in a manner established by the project
manager.[8]

Work breakdown structured organization

Work Breakdown Structure.


The upper section of the Work Breakdown Structure (WBS) should identify the major phases and milestones
of the project in a summary fashion. In addition, the upper section should provide an overview of the full
scope and timeline of the project and will be part of the initial project description effort leading to project
approval. The middle section of the WBS is based on the seven Systems Development Life Cycle (SDLC)
phases as a guide for WBS task development. The WBS elements should consist of milestones and “tasks”
as opposed to “activities” and have a definitive period (usually two weeks or more). Each task must have a
measurable output (e.g. document, decision, or analysis). A WBS task may rely on one or more activities
(e.g. software engineering, systems engineering) and may require close coordination with other tasks, either
internal or external to the project. Any part of the project needing support from contractors should have a
Statement of work (SOW) written to include the appropriate tasks from the SDLC phases. The development
of a SOW does not occur during a specific phase of SDLC but is developed to include the work from the
SDLC process that may be conducted by external resources such as contractors and struct.

Q.2 Discuss the various components of a data warehouse?

Ans: The data warehouse architecture is based on a relational database management system server that
functions as the central repository for informational data. Operational data and processing is completely
separated from data warehouse processing. This central information repository is surrounded by a number of
key components designed to make the entire environment functional, manageable and accessible by both
the operational systems that source data into the warehouse and by end-user query and analysis tools.

Typically, the source data for the warehouse is coming from the operational applications. As the data enters
the warehouse, it is cleaned up and transformed into an integrated structure and format. The transformation
process may involve conversion, summarization, filtering and condensation of data. Because the data
contains a historical component, the warehouse must be capable of holding and managing large volumes of
data as well as different data structures for the same database over time.

The next sections look at the seven major components of data warehousing:

Data Warehouse Database

The central data warehouse database is the cornerstone of the data warehousing environment. This
database is almost always implemented on the relational database management system (RDBMS)
technology. However, this kind of implementation is often constrained by the fact that traditional RDBMS
products are optimized for transactional database processing. Certain data warehouse attributes, such as
very large database size, ad hoc query processing and the need for flexible user view creation including
aggregates, multi-table joins and drill-downs, have become drivers for different technological approaches to
the data warehouse database. These approaches include:

• Parallel relational database designs for scalability that include shared-memory, shared disk, or
shared-nothing models implemented on various multiprocessor configurations (symmetric
multiprocessors or SMP, massively parallel processors or MPP, and/or clusters of uni- or
multiprocessors).
• An innovative approach to speed up a traditional RDBMS by using new index structures to bypass
relational table scans.
• Multidimensional databases (MDDBs) that are based on proprietary database technology;
conversely, a dimensional data model can be implemented using a familiar RDBMS. Multi-
dimensional databases are designed to overcome any limitations placed on the warehouse by the
nature of the relational data model. MDDBs enable on-line analytical processing (OLAP) tools that
architecturally belong to a group of data warehousing components jointly categorized as the data
query, reporting, analysis and mining tools.

Sourcing, Acquisition, Cleanup and Transformation Tools


A significant portion of the implementation effort is spent extracting data from operational systems and
putting it in a format suitable for informational applications that run off the data warehouse.

The data sourcing, cleanup, transformation and migration tools perform all of the conversions, summarizations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by the decision support tool. They produce the programs and control statements, including the COBOL programs, MVS job-control language (JCL), UNIX scripts, and SQL data definition language (DDL), needed to move data into the data warehouse from multiple operational systems. These tools also maintain the meta data. The functionality, sketched in the example after this list, includes:

• Removing unwanted data from operational databases


• Converting to common data names and definitions
• Establishing defaults for missing data
• Accommodating source data definition changes
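
To make these functions concrete, the following is a minimal sketch of such a cleanup and transformation pass, written in Python with pandas purely for illustration; the column names and rules are hypothetical and far simpler than what commercial tools provide.

# Illustrative only: a tiny cleanup/transformation pass of the kind such tools automate.
# The column names (record_type, cust_nm, amt, region) are hypothetical.
import pandas as pd

def transform(source: pd.DataFrame) -> pd.DataFrame:
    df = source.copy()

    # Remove unwanted data coming from the operational database (e.g. internal test records)
    df = df[df["record_type"] != "TEST"]

    # Convert to common data names and definitions
    df = df.rename(columns={"cust_nm": "customer_name", "amt": "amount_usd"})

    # Establish defaults for missing data
    df["region"] = df["region"].fillna("UNKNOWN")
    df["amount_usd"] = df["amount_usd"].fillna(0.0)

    # Summarize/condense the detail before it is moved into the warehouse
    return df.groupby(["customer_name", "region"], as_index=False)["amount_usd"].sum()

if __name__ == "__main__":
    sample = pd.DataFrame({
        "record_type": ["SALE", "SALE", "TEST"],
        "cust_nm": ["Acme", "Acme", "QA"],
        "region": ["EU", None, "EU"],
        "amt": [100.0, None, 1.0],
    })
    print(transform(sample))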

The data sourcing, cleanup, extract, transformation and migration tools have to deal with some significant
issues including:

• Database heterogeneity. DBMSs are very different in data models, data access language, data
navigation, operations, concurrency, integrity, recovery etc.
• Data heterogeneity. This is the difference in the way data is defined and used in different models -
homonyms, synonyms, unit compatibility (U.S. vs metric), different attributes for the same entity and
different ways of modeling the same fact.

These tools can save a considerable amount of time and effort. However, significant shortcomings do exist.
For example, many available tools are generally useful for simpler data extracts. Frequently, customized
extract routines need to be developed for the more complicated data extraction procedures.

Meta data

Meta data is data about data that describes the data warehouse. It is used for building, maintaining,
managing and using the data warehouse. Meta data can be classified into:

• Technical meta data, which contains information about warehouse data for use by warehouse
designers and administrators when carrying out warehouse development and management tasks.
• Business meta data, which contains information that gives users an easy-to-understand perspective
of the information stored in the data warehouse.

Equally important, meta data provides interactive access to users to help understand content and find data.
One of the issues with meta data is that many data extraction tools' capabilities to gather meta data remain fairly immature. Therefore, there is often the need to create a meta data interface
for users, which may involve some duplication of effort.

Meta data management is provided via a meta data repository and accompanying software. Meta data
repository management software, which typically runs on a workstation, can be used to map the source data
to the target database; generate code for data transformations; integrate and transform the data; and control
moving data to the warehouse.

As users' interactions with the data warehouse increase, their approaches to reviewing the results of their
requests for information can be expected to evolve from relatively simple manual analysis for trends and
exceptions to agent-driven initiation of the analysis based on user-defined thresholds. The definition of these
thresholds, configuration parameters for the software agents using them, and the information directory
indicating where the appropriate sources for the information can be found are all stored in the meta data
repository as well.

Access Tools


The principal purpose of data warehousing is to provide information to business users for strategic decision-
making. These users interact with the data warehouse using front-end tools. Many of these tools require an
information specialist, although many end users develop expertise in the tools. Tools fall into four main
categories: query and reporting tools, application development tools, online analytical processing tools, and
data mining tools.

Query and Reporting tools can be divided into two groups: reporting tools and managed query tools.
Reporting tools can be further divided into production reporting tools and report writers. Production reporting
tools let companies generate regular operational reports or support high-volume batch jobs such as
calculating and printing paychecks. Report writers, on the other hand, are inexpensive desktop tools
designed for end-users.

Managed query tools shield end users from the complexities of SQL and database structures by inserting a
metalayer between users and the database. These tools are designed for easy-to-use, point-and-click
operations that either accept SQL or generate SQL database queries.

Often, the analytical needs of the data warehouse user community exceed the built-in capabilities of query
and reporting tools. In these cases, organizations will often rely on the tried-and-true approach of in-house
application development using graphical development environments such as PowerBuilder, Visual Basic and
Forte. These application development platforms integrate well with popular OLAP tools and access all major
database systems including Oracle, Sybase, and Informix.

OLAP tools are based on the concepts of dimensional data models and corresponding databases, and allow
users to analyze the data using elaborate, multidimensional views. Typical business applications include
product performance and profitability, effectiveness of a sales program or marketing campaign, sales
forecasting and capacity planning. These tools assume that the data is organized in a multidimensional
model. A critical success factor for any business today is the ability to use information effectively. Data
mining is the process of discovering meaningful new correlations, patterns and trends by digging into large
amounts of data stored in the warehouse using artificial intelligence, statistical and mathematical techniques.
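
As a rough illustration of the multidimensional view that OLAP tools build, the short Python/pandas sketch below pivots a handful of invented sales records by year and region and then drills down by product; real OLAP servers do this against cubes rather than flat tables.

# OLAP-style slicing of a small, invented sales table with pandas.
import pandas as pd

sales = pd.DataFrame({
    "year":    [2009, 2009, 2010, 2010, 2010],
    "region":  ["East", "West", "East", "West", "West"],
    "product": ["A", "A", "A", "B", "B"],
    "revenue": [120.0, 80.0, 150.0, 60.0, 90.0],
})

# "Cube" view: revenue by year (rows) and region (columns)
cube = sales.pivot_table(values="revenue", index="year", columns="region",
                         aggfunc="sum", fill_value=0)
print(cube)

# Drill down: add the product dimension to the rows
drill = sales.pivot_table(values="revenue", index=["year", "product"],
                          columns="region", aggfunc="sum", fill_value=0)
print(drill)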

Data Marts

The concept of a data mart is causing a lot of excitement and attracting much attention in the data warehouse industry. Mostly, data marts are presented as an alternative to a data warehouse that takes significantly less time and money to build. However, the term data mart means different things to different people. A rigorous definition of this term is a data store that is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of data (often called a subject area) that is created for the use of a dedicated group of users. A data mart might, in fact, be a set of denormalized, summarized, or aggregated data. Sometimes, such a set could be placed on the data warehouse rather than in a physically separate store of data. In most instances, however, the data mart is a physically separate store of data resident on a separate database server, often on a local area network serving a dedicated user group. Sometimes the data mart simply comprises relational OLAP technology that creates a highly denormalized dimensional model (e.g., a star schema) implemented on a relational database. The resulting hypercubes of data are used for analysis by groups of users with a common interest in a limited portion of the database.
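
The following is a minimal sketch of such a star schema for a hypothetical sales data mart, using SQLite from Python purely for illustration; the table and column names are invented.

# A toy star schema: one fact table surrounded by dimension tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units       INTEGER,
    revenue     REAL
);
""")
conn.execute("INSERT INTO dim_date VALUES (20100101, 2010, 1)")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware')")
conn.execute("INSERT INTO fact_sales VALUES (20100101, 1, 10, 250.0)")

# The kind of denormalized query a dedicated user group might run against the mart
for row in conn.execute("""
    SELECT d.year, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.year, p.category
"""):
    print(row)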

These types of data marts, called dependent data marts because their data is sourced from the data
warehouse, have a high value because no matter how they are deployed and how many different enabling
technologies are used, different users are all accessing the information views derived from the single
integrated version of the data.

Unfortunately, the misleading statements about the simplicity and low cost of data marts sometimes result in
organizations or vendors incorrectly positioning them as an alternative to the data warehouse. This viewpoint
defines independent data marts that in fact, represent fragmented point solutions to a range of business
problems in the enterprise. This type of implementation should rarely be deployed in the context of an overall
technology or applications architecture. Indeed, it is missing the ingredient that is at the heart of the data
warehousing concept -- that of data integration. Each independent data mart makes its own assumptions
about how to consolidate the data, and the data across several data marts may not be consistent.


Moreover, the concept of an independent data mart is dangerous -- as soon as the first data mart is created,
other organizations, groups, and subject areas within the enterprise embark on the task of building their own
data marts. As a result, you create an environment where multiple operational systems feed multiple non-
integrated data marts that are often overlapping in data content, job scheduling, connectivity and
management. In other words, you have transformed a complex many-to-one problem of building a data
warehouse from operational and external data sources to a many-to-many sourcing and management
nightmare.

Data Warehouse Administration and Management

Data warehouses tend to be as much as 4 times as large as related operational databases, reaching
terabytes in size depending on how much history needs to be saved. They are not synchronized in real time
to the associated operational data but are updated as often as once a day if the application requires it.

In addition, almost all data warehouse products include gateways to transparently access multiple enterprise
data sources without having to rewrite applications to interpret and utilize the data. Furthermore, in a
heterogeneous data warehouse environment, the various databases reside on disparate systems, thus
requiring inter-networking tools. The need to manage this environment is obvious.

Managing data warehouses includes security and priority management; monitoring updates from the multiple
sources; data quality checks; managing and updating meta data; auditing and reporting data warehouse
usage and status; purging data; replicating, subsetting and distributing data; backup and recovery and data
warehouse storage management.

Information Delivery System

The information delivery component is used to enable the process of subscribing for data warehouse
information and having it delivered to one or more destinations according to some user-specified scheduling
algorithm. In other words, the information delivery system distributes warehouse-stored data and other
information objects to other data warehouses and end-user products such as spreadsheets and local
databases. Delivery of information may be based on time of day or on the completion of an external event.
The rationale for the delivery systems component is based on the fact that once the data warehouse is
installed and operational, its users don't have to be aware of its location and maintenance. All they need is
the report or an analytical view of data at a specific point in time. With the proliferation of the Internet and the
World Wide Web such a delivery system may leverage the convenience of the Internet by delivering
warehouse-enabled information to thousands of end-users via the ubiquitous world wide network.
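
A much-simplified sketch of the time-based part of such a delivery component is given below; the subscription list and destinations are hypothetical, and a real system would also handle event triggers, retries and many destination types.

# A toy delivery loop: deliver each subscribed report at its scheduled hour.
import datetime

subscriptions = [
    # (report name, destination, hour of day to deliver) -- all invented
    ("daily_sales_summary", "sales_team@example.com", 6),
    ("inventory_snapshot",  "ops_share/inventory.csv", 7),
]

def deliver(report: str, destination: str) -> None:
    # Placeholder for exporting a warehouse view to a spreadsheet, e-mail, etc.
    print(f"{datetime.datetime.now():%H:%M} delivering {report} -> {destination}")

def run_once(now: datetime.datetime) -> None:
    for report, destination, hour in subscriptions:
        if now.hour == hour:
            deliver(report, destination)

if __name__ == "__main__":
    run_once(datetime.datetime.now())   # in practice this would be run on a schedule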

In fact, the Web is changing the data warehousing landscape since at the very high level the goals of both
the Web and data warehousing are the same: easy access to information. The value of data warehousing is
maximized when the right information gets into the hands of those individuals who need it, where they need it
and when they need it most. However, many corporations have struggled with complex client/server systems to
give end users the access they need. The issues become even more difficult to resolve when the users are
physically remote from the data warehouse location. The Web removes a lot of these issues by giving users
universal and relatively inexpensive access to data. Couple this access with the ability to deliver required
information on demand and the result is a web-enabled information delivery system that allows users
dispersed across continents to perform a sophisticated business-critical analysis and to engage in collective
decision-making.


Q.3 Discuss the data extraction process. What are the various methods used for data extraction?

Ans: Data extract: A data extract is the output of the data extraction process, a very important aspect of data warehouse implementation.

A data warehouse gathers data from several sources and utilizes these data to serve as vital information for
the company. These data will be used to spot patterns and trends both in the business operations as well as
in industry standards.

Since the data coming to the data warehouse may come from different sources, which commonly are disparate systems with different data formats, a data warehouse uses three processes to make use of the data. These processes are extraction, transformation and loading (ETL).

Data extraction is a process that involves retrieval of all formats and types of data out of unstructured or badly structured data sources. These data will be further used for processing or data migration. Raw data is usually imported into an intermediate extracting system before being processed for data transformation, where it will possibly be padded with meta data before being exported to another stage in the data warehouse workflow. The term data extraction is often applied when experimental data is first imported into a computer server from primary sources such as recording or measuring devices.

During the process of data extraction in a data warehouse, data may be removed from the system source or
a copy may be made with the original data being retained in the source system. It is also common practice in some data extraction implementations to move historical data that accumulates in the operational system to a data warehouse in order to maintain performance and efficiency.

Data extracts are loaded into the staging area of a relational database for further manipulation in the ETL methodology.

The data extraction process in general is performed within the source system itself. This can be most appropriate if the extraction is added to a relational database. Some database professionals implement data extraction using extraction logic in the data warehouse staging area and query the source system for data using an application programming interface (API).

Data extraction is a complex process but there are various software applications that have been developed
to handle this process.

Some generic extraction applications can be found free on the internet. CD extraction software can create digital copies of audio CDs on the hard drive. There are also email extraction tools which can extract email addresses from different websites, including results from Google searches. These emails can be exported to text, HTML or XML formats.

Another data extraction tool is a web data or link extractor which can extract URLs, meta tags (like keywords, title and descriptions), body text, email addresses, phone and fax numbers and many other data from a website.
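
As a toy version of such extractors, the Python sketch below pulls the title, keywords and email addresses out of an invented HTML snippet with regular expressions; production tools are considerably more robust.

# Simplistic extraction of meta tags and email addresses from a made-up page.
import re

html = """
<html><head>
<title>Example Listings</title>
<meta name="keywords" content="data, warehouse, extraction">
</head><body>Contact us at info@example.com or sales@example.org</body></html>
"""

emails   = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", html)
title    = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
keywords = re.search(r'<meta name="keywords" content="(.*?)"', html, re.IGNORECASE)

print(emails)                  # ['info@example.com', 'sales@example.org']
print(title.group(1).strip())  # Example Listings
print(keywords.group(1))       # data, warehouse, extraction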

There is a wide array of data extracting tools. Some are used for individual purposes such as extracting data
for entertainment while some are used for big projects like data warehousing.

Since data warehouses need to carry out other processes and not just extraction alone, database managers or programmers usually write programs that repeatedly check many different sites for new data updates. This way, the code just sits in one area of the data warehouse sensing new updates from the data sources. Whenever new data is detected, the program automatically does its function to update and transfer the data to the ETL process.

Three common methods for data extraction

Probably the most common technique used traditionally to do this is to cook up some regular expressions
that match the pieces you want (e.g., URLs and link titles). Our screen-scraper software actually started out
as an application written in Perl for this very reason. In addition to regular expressions, you might also use
some code written in something like Java or Active Server Pages to parse out larger chunks of text. Using
raw regular expressions to pull out the data can be a little intimidating to the uninitiated, and can get a bit
messy when a script contains a lot of them. At the same time, if you’re already familiar with regular
expressions, and your scraping project is relatively small, they can be a great solution.
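
A quick sketch of this raw regular-expression approach, pulling URLs and link titles out of a stand-in page, might look like the following; real pages are of course much messier.

# Pull URLs and link titles out of anchor tags with a single regular expression.
import re

page = """
<a href="http://example.com/news/1">Quarterly results announced</a>
<a href='http://example.com/news/2'>New data mart goes live</a>
"""

link_pattern = re.compile(
    r"""<a\s+[^>]*href=["'](?P<url>[^"']+)["'][^>]*>(?P<title>.*?)</a>""",
    re.IGNORECASE | re.DOTALL,
)

for match in link_pattern.finditer(page):
    print(match.group("url"), "->", match.group("title").strip())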

Other techniques for getting the data out can get very sophisticated as algorithms that make use of artificial
intelligence and such are applied to the page. Some programs will actually analyze the semantic content of
an HTML page, then intelligently pull out the pieces that are of interest. Still other approaches deal with
developing “ontologies“, or hierarchical vocabularies intended to represent the content domain.

There are a number of companies (including our own) that offer commercial applications specifically intended
to do screen-scraping. The applications vary quite a bit, but for medium to large-sized projects they’re often a
good solution. Each one will have its own learning curve, so you should plan on taking time to learn the ins
and outs of a new application. Especially if you plan on doing a fair amount of screen-scraping it’s probably a
good idea to at least shop around for a screen-scraping application, as it will likely save you time and money
in the long run.

So what’s the best approach to data extraction? It really depends on what your needs are, and what
resources you have at your disposal. Here are some of the pros and cons of the various approaches, as well
as suggestions on when you might use each one:


Raw regular expressions and code

Advantages:

• If you’re already familiar with regular expressions and at least one programming language, this can
be a quick solution.
• Regular expressions allow for a fair amount of “fuzziness” in the matching such that minor changes
to the content won’t break them.
• You likely don’t need to learn any new languages or tools (again, assuming you’re already familiar
with regular expressions and a programming language).
• Regular expressions are supported in almost all modern programming languages. Heck, even
VBScript has a regular expression engine. It’s also nice because the various regular expression
implementations don’t vary too significantly in their syntax.

Disadvantages:

• They can be complex for those who don't have a lot of experience with them. Learning regular
expressions isn’t like going from Perl to Java. It’s more like going from Perl to XSLT, where you have
to wrap your mind around a completely different way of viewing the problem.
• They’re often confusing to analyze. Take a look through some of the regular expressions people
have created to match something as simple as an email address and you’ll see what I mean.
• If the content you’re trying to match changes (e.g., they change the web page by adding a new “font”
tag) you’ll likely need to update your regular expressions to account for the change.
• The data discovery portion of the process (traversing various web pages to get to the page
containing the data you want) will still need to be handled, and can get fairly complex if you need to
deal with cookies and such.

When to use this approach: You’ll most likely use straight regular expressions in screen-scraping when you
have a small job you want to get done quickly. Especially if you already know regular expressions, there’s no
sense in getting into other tools if all you need to do is pull some news headlines off of a site.

Ontologies and artificial intelligence

Advantages:

• You create it once and it can more or less extract the data from any page within the content domain
you’re targeting.
• The data model is generally built in. For example, if you’re extracting data about cars from web sites
the extraction engine already knows what the make, model, and price are, so it can easily map them
to existing data structures (e.g., insert the data into the correct locations in your database).
• There is relatively little long-term maintenance required. As web sites change you likely will need to
do very little to your extraction engine in order to account for the changes.

Disadvantages:

• It’s relatively complex to create and work with such an engine. The level of expertise required to
even understand an extraction engine that uses artificial intelligence and ontologies is much higher
than what is required to deal with regular expressions.
• These types of engines are expensive to build. There are commercial offerings that will give you the
basis for doing this type of data extraction, but you still need to configure them to work with the
specific content domain you’re targeting.
• You still have to deal with the data discovery portion of the process, which may not fit as well with
this approach (meaning you may have to create an entirely separate engine to handle data
discovery). Data discovery is the process of crawling web sites such that you arrive at the pages
where you want to extract data.


When to use this approach: Typically you’ll only get into ontologies and artificial intelligence when you’re
planning on extracting information from a very large number of sources. It also makes sense to do this when
the data you’re trying to extract is in a very unstructured format (e.g., newspaper classified ads). In cases
where the data is very structured (meaning there are clear labels identifying the various data fields), it may
make more sense to go with regular expressions or a screen-scraping application.

Screen-scraping software

Advantages:

• Abstracts most of the complicated stuff away. You can do some pretty sophisticated things in most
screen-scraping applications without knowing anything about regular expressions, HTTP, or cookies.
• Dramatically reduces the amount of time required to set up a site to be scraped. Once you learn a
particular screen-scraping application the amount of time it requires to scrape sites vs. other
methods is significantly lowered.
• Support from a commercial company. If you run into trouble while using a commercial screen-
scraping application, chances are there are support forums and help lines where you can get
assistance.

Disadvantages:

• The learning curve. Each screen-scraping application has its own way of going about things. This
may imply learning a new scripting language in addition to familiarizing yourself with how the core
application works.
• A potential cost. Most ready-to-go screen-scraping applications are commercial, so you’ll likely be
paying in dollars as well as time for this solution.
• A proprietary approach. Any time you use a proprietary application to solve a computing problem
(and proprietary is obviously a matter of degree) you’re locking yourself into using that approach.
This may or may not be a big deal, but you should at least consider how well the application you’re
using will integrate with other software applications you currently have. For example, once the
screen-scraping application has extracted the data how easy is it for you to get to that data from your
own code?

When to use this approach: Screen-scraping applications vary widely in their ease-of-use, price, and
suitability to tackle a broad range of scenarios. Chances are, though, that if you don’t mind paying a bit, you
can save yourself a significant amount of time by using one. If you’re doing a quick scrape of a single page
you can use just about any language with regular expressions. If you want to extract data from hundreds of
web sites that are all formatted differently you’re probably better off investing in a complex system that uses
ontologies and/or artificial intelligence. For just about everything else, though, you may want to consider
investing in an application specifically designed for screen-scraping.

As an aside, I thought I should also mention a recent project we’ve been involved with that has actually
required a hybrid approach of two of the aforementioned methods. We’re currently working on a project that
deals with extracting newspaper classified ads. The data in classifieds is about as unstructured as you can
get. For example, in a real estate ad the term “number of bedrooms” can be written about 25 different ways.
The data extraction portion of the process is one that lends itself well to an ontologies-based approach,
which is what we’ve done. However, we still had to handle the data discovery portion. We decided to use
screen-scraper for that, and it’s handling it just great. The basic process is that screen-scraper traverses the
various pages of the site, pulling out raw chunks of data that constitute the classified ads. These ads then get
passed to code we’ve written that uses ontologies in order to extract out the individual pieces we’re after.
Once the data has been extracted we then insert it into a database.
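
To give a flavour of the ontology idea at a very small scale, the sketch below maps a few of the ways an ad might express "number of bedrooms" onto a single canonical field; the synonym list and ad text are invented, and a real ontology is far richer.

# Normalize one concept ("number of bedrooms") across several surface forms.
import re

BEDROOM_SYNONYMS = r"(?:bedrooms?|beds?|bdrms?|br)"
pattern = re.compile(r"(\d+)\s*" + BEDROOM_SYNONYMS + r"\b", re.IGNORECASE)

ads = [
    "Charming bungalow, 3 bdrm, close to schools",
    "Spacious flat - 2 BR, river view",
    "4 bedrooms, double garage, large garden",
]

for ad in ads:
    match = pattern.search(ad)
    bedrooms = int(match.group(1)) if match else None
    print({"text": ad, "bedrooms": bedrooms})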

Q.4 Discuss the need for developing OLAP tools in detail.

Ans: OLAP


Short for Online Analytical Processing, a category of software tools that provides analysis of data stored in a
database. OLAP tools enable users to analyze different dimensions of multidimensional data. For example, it
provides time series and trend analysis views. OLAP often is used in data mining.

The chief component of OLAP is the OLAP server, which sits between a client and a database management
system (DBMS). The OLAP server understands how data is organized in the database and has special
functions for analyzing the data. There are OLAP servers available for nearly all the major database
systems.

The first commercial multidimensional (OLAP) products appeared approximately 30 years ago (Express).
When Edgar Codd introduced the OLAP definition in his 1993 white paper, there were already dozens of
OLAP products for client/server and desktop/file server environments. Usually those products were
expensive, proprietary, standalone systems afforded only by large corporations, and performed only OLAP
functions.

After Codd's research appeared, the software industry began appreciating OLAP functionality and many
companies have integrated OLAP features into their products (RDBMS, integrated business intelligence
suites, reporting tools, portals, etc.). In addition, for the last decade, pure OLAP tools have considerably
improved and become cheaper and more user-friendly.

These developments brought OLAP functionality to a much broader range of users and organizations. Now
OLAP is used not only for strategic decision-making in large corporations, but also to make daily tactical
decisions about how to better streamline business operations in organizations of all sizes and shapes.

However, the acceptance of OLAP is far from maximized. For example, one year ago, The OLAP Survey 2 found that only thirty percent of its participants actually used OLAP.

General purpose tools with OLAP capabilities

Organizations may not want to use pure OLAP tools or integrated business intelligence suites for various reasons. But many organizations may want to use OLAP capabilities integrated into the popular general purpose application development tools which they already use. In this case, the organizations do not need to buy and deploy new software products, train staff to use them or hire new people.

There is another argument for creating general purpose tools with OLAP capabilities. End users work with
the information they need via applications. The effectiveness of this work depends very much on the number
of applications (and the interfaces, data formats, etc. associated with them). So it is very desirable to reduce
the number of applications (ideally to one application). General purpose tools with OLAP capabilities allow us
to reach the goal. In other words, there's no need to use separate applications based on pure OLAP tools.

The advantages of such an approach to developers and end users are clear, and Microsoft and Oracle have
recognized this. Both corporations have steadily integrated OLAP into their RDBMSs and general purpose
database application development tools.

Microsoft provides SQL Server to handle a relational view of data and the Analysis Services OLAP engine to
handle a multidimensional cube view of data. Analysis Services provides the OLE DB for OLAP API and the
MDX language for processing multidimensional cubes, which can be physically stored in relational tables or
a multidimensional store. Microsoft Excel and Microsoft Office both provide access to Analysis Services data.

Oracle has finally incorporated the Express OLAP engine into the Oracle9i Database Enterprise Edition
Release 2 (OLAP Option). Multidimensional cubes are stored in analytical workspaces, which are managed
in an Oracle database using an abstract data type. The existing Oracle tools such as PL/SQL, Oracle
Reports, Oracle Discoverer and Oracle BI Beans can query and analyze analytical workspaces. The OLAP
API is Java-based and supports a rich OLAP manipulation language, which can be considered to be the
multidimensional equivalent of Oracle PL/SQL.


Granulated OLAP

There is yet another type of OLAP tool, different from pure OLAP tools and general purpose tools with OLAP
capabilities: OLAP components. It seems that this sort of OLAP is not as widely appreciated.

The OLAP component is the minimal and elementary tool (granula) for developers to embed OLAP
functionality in applications. So we can say that OLAP components are granulated OLAP.

Each OLAP component is used within some application development environment. At present, almost all
known OLAP components are ActiveX or VCL ones.

All OLAP components are divided into two classes: OLAP components without an OLAP engine (MOLAP
components) and OLAP components with this engine (ROLAP components).

The OLAP components without OLAP engines allow an application to access existing multidimensional cubes on a MOLAP server that performs a required operation and returns results to the application. At present all the OLAP components of this kind are designed for access to MS Analysis Services (more precisely, they use OLE DB for OLAP) and were developed by Microsoft partners (Knosys, Matrix, etc.).

The ROLAP components (with OLAP engine) are of much greater interest than the MOLAP ones and
general purpose tools with OLAP capabilities because they allow you to create flexible and efficient
applications that cannot be developed with other tools.

The ROLAP component with an OLAP engine has three interfaces:

• A data access mechanism interface through which it gets access to data sources
• APIs through which a developer uses a language like MDX to define data access and processing to
build a multidimensional table (cube). The developer manages properties and behavior of the
component so it fully conforms to the application in which the component is embedded
• End-user GUI that can pivot, filter, drill down and drill up data and generate numbers of views from a
multidimensional table (cube).

As a rule, ROLAP components are used in client-side applications, so the OLAP engine functions on a client
PC. Data access mechanisms like BDE (Borland Database Engine) or ADO.NET allow you to get source
data from relational tables and flat files of an enterprise. PC performance nowadays allows the best ROLAP
components to quickly process hundreds of thousands or even millions of records from these data sources,
and dynamically build multidimensional cubes and perform operations with them. So very effective ROLAP
and DOLAP are realized -- and in many cases, they are preferable to MOLAP.

For example, the low price and simplicity of using a ROLAP component make it the obvious (and possibly the only) choice for developers creating mass-deployed, small and cheap applications, especially single-user DOLAP applications. Another field in which these components may be preferable is real-time analytical applications (no need to create and maintain a MOLAP server or load cubes).

The most widely-known OLAP component is the Microsoft PivotTable, which has an OLAP engine and access to MS Analysis Services, so the component is both MOLAP and ROLAP. Another well-known OLAP (ROLAP) component is DecisionCube from Borland.

Some ROLAP components have the ability to store dynamically-built multidimensional cubes which are
usually named microcubes. This feature deserves attention from applications architects and designers
because it allows them to develop flexible and cheap applications like an enterprise-wide distributed corporate reporting system or Web-based applications.

For example, the ContourCube component stores a microcube with all associated metadata (in fact, it is a
container of an analytical application like an Excel workbook) in compressed form (from 10 up to 100 times).


So this microcube is optimized for use on the Internet and can be transferred through HTTP and FTP
protocols, and via e-mail. An end user with ContourCube is able to fully analyze the microcube. InterSoft
Lab, the developer of the ContourCube component, has also developed several additional tools to facilitate the
development, deployment and use of such distributed applications.
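
The following Python sketch is only a loose analogy to the microcube idea, not the ContourCube format: it builds a small cube from detail records, then serializes and compresses it so that it could be shipped over HTTP, FTP or e-mail and re-opened for analysis.

# Build a small cube, then compress it into a compact payload for transfer.
import gzip
import pickle
import pandas as pd

detail = pd.DataFrame({
    "year":   [2009, 2009, 2010, 2010],
    "branch": ["North", "South", "North", "South"],
    "profit": [10.0, 12.5, 11.0, 14.0],
})

cube = detail.pivot_table(values="profit", index="year", columns="branch", aggfunc="sum")

payload = gzip.compress(pickle.dumps(cube))        # compact form for transfer
print(f"compressed size: {len(payload)} bytes")

restored = pickle.loads(gzip.decompress(payload))  # what the recipient would analyze
print(restored)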

At present, there is a broad range of pure OLAP tools, general purpose tools with OLAP capabilities and
OLAP components. To make the best choice, developers must understand the benefits and disadvantages
of all sorts of OLAP.

Q.5 What do you understand by the term statistical analysis? Discuss the most important statistical techniques.

Ans: Developments in the field of statistical data analysis often parallel or follow advancements in other fields to which statistical methods are fruitfully applied. Because practitioners of statistical analysis often address particular applied decision problems, methods development is consequently motivated by the search for better decision making under uncertainty.

The decision-making process under uncertainty is largely based on the application of statistical data analysis for probabilistic risk assessment of the decision. Managers need to understand variation for two key reasons: first, so that they can lead others to apply statistical thinking in day-to-day activities; and second, to apply the concept for the purpose of continuous improvement. This course will provide you with hands-on
experience to promote the use of statistical thinking and techniques to apply them to make educated
decisions whenever there is variation in business data. Therefore, it is a course in statistical thinking via a
data-oriented approach.

Statistical models are currently used in various fields of business and science. However, the terminology
differs from field to field. For example, the fitting of models to data, called calibration, history matching, or data assimilation, is synonymous with parameter estimation.
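
As a tiny illustration of fitting a model to data, the Python snippet below estimates the slope and intercept of a straight line by least squares from simulated observations; the data and the model are invented for the example.

# Parameter estimation ("calibration") of a straight-line model by least squares.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 25)
y = 2.5 * x + 1.0 + rng.normal(0, 0.5, size=x.size)   # true slope 2.5, intercept 1.0

slope, intercept = np.polyfit(x, y, deg=1)            # estimated (calibrated) parameters
print(f"estimated slope = {slope:.2f}, intercept = {intercept:.2f}")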

Your organization's database contains a wealth of information, yet decision technology group members tap only a fraction of it. Employees waste time scouring multiple sources for a database. The decision-makers are
frustrated because they cannot get business-critical data exactly when they need it. Therefore, too many
decisions are based on guesswork, not facts. Many opportunities are also missed, if they are even noticed at
all.

Knowledge is what we know well. Information is the communication of knowledge. In every knowledge exchange, there is a sender and a receiver. The sender makes common what is private, does the informing, the communicating. Information can be classified into explicit and tacit forms. Explicit information can be explained in structured form, while tacit information is inconsistent and fuzzy to explain. Know that data are only crude information and not knowledge by themselves.

Data is known to be crude information and not knowledge by itself. The sequence from data to knowledge is:
from Data to Information, from Information to Facts, and finally, from Facts to Knowledge. Data
becomes information, when it becomes relevant to your decision problem. Information becomes fact, when
the data can support it. Facts are what the data reveals. However the decisive instrumental (i.e., applied)
knowledge is expressed together with some statistical degree of confidence.

Fact becomes knowledge, when it is used in the successful completion of a decision process. Once you
have a massive amount of facts integrated as knowledge, then your mind will be superhuman in the same
sense that mankind with writing is superhuman compared to mankind before writing. The following figure
illustrates the statistical thinking process based on data in constructing statistical models for decision making
under uncertainties.


The above figure depicts the fact that as the exactness of a statistical model increases, the level of
improvements in decision-making increases. That's why we need statistical data analysis. Statistical data
analysis arose from the need to place knowledge on a systematic evidence base. This required a study of
the laws of probability, the development of measures of data properties and relationships, and so on.

Statistical inference aims at determining whether any statistical significance can be attached to results after
due allowance is made for any random variation as a source of error. Intelligent and critical inferences
cannot be made by those who do not understand the purpose, the conditions, and applicability of the various
techniques for judging significance.

Considering the uncertain environment, the chance that "good decisions" are made increases with the
availability of "good information." The chance that "good information" is available increases with the level of
structuring the process of Knowledge Management. The above figure also illustrates the fact that as the
exactness of a statistical model increases, the level of improvements in decision-making increases.

Knowledge is more than knowing something technical. Knowledge needs wisdom. Wisdom is the power to
put our time and our knowledge to the proper use. Wisdom comes with age and experience. Wisdom is the
accurate application of accurate knowledge and its key component is knowing the limits of your
knowledge. Wisdom is about knowing how something technical can be best used to meet the needs of the
decision-maker. Wisdom, for example, creates statistical software that is useful, rather than technically
brilliant. For example, ever since the Web entered the popular consciousness, observers have noted that it
puts information at your fingertips but tends to keep wisdom out of reach.

Almost every professional needs a statistical toolkit. Statistical skills enable you to intelligently collect, analyze and interpret data relevant to your decision-making. Statistical concepts enable us to solve problems in a diversity of contexts. Statistical thinking enables you to add substance to your decisions.

The appearance of computer software, JavaScript Applets, Statistical Demonstrations Applets, and Online
Computation are the most important events in the process of teaching and learning concepts in model-based
statistical decision making courses. These tools allow you to construct numerical examples to understand the
concepts, and to find their significance for yourself.

We will apply the basic concepts and methods of statistics you've already learned in the previous statistics
course to real-world problems. The course is tailored to meet your needs in statistical business-data
analysis using widely available commercial statistical computer packages such as SAS and SPSS. By doing
this, you will inevitably find yourself asking questions about the data and the method proposed, and you will
have the means at your disposal to settle these questions to your own satisfaction. Accordingly, all the
applications problems are borrowed from business and economics. By the end of this course you'll be able to
think statistically while performing any data analysis.

There are two general views of teaching/learning statistics: Greater and Lesser Statistics. Greater statistics is
everything related to learning from data, from the first planning or collection, to the last presentation or report.
Lesser statistics is the body of statistical methodology. This is a Greater Statistics course.


There are basically two kinds of "statistics" courses. The real kind shows you how to make sense out of data.
These courses would include all the recent developments and all share a deep respect for data and truth.
The imitation kind involves plugging numbers into statistics formulas. The emphasis is on doing the
arithmetic correctly. These courses generally have no interest in data or truth, and the problems are
generally arithmetic exercises. If a certain assumption is needed to justify a procedure, they will simply tell
you to "assume the ... are normally distributed" -- no matter how unlikely that might be. It seems like you all
are suffering from an overdose of the latter. This course will bring out the joy of statistics in you.

Statistics is a science assisting you to make decisions under uncertainties (based on some numerical
and measurable scales). The decision-making process must be based on data, not on personal opinion or belief.

It is already an accepted fact that "Statistical thinking will one day be as necessary for efficient citizenship as
the ability to read and write." So, let us be ahead of our time.


Analysis Of Variance

An important technique for analyzing the effect of categorical factors on a response is to perform an
Analysis of Variance. An ANOVA decomposes the variability in the response variable amongst the
different factors. Depending upon the type of analysis, it may be important to determine: (a) which
factors have a significant effect on the response, and/or (b) how much of the variability in the
response variable is attributable to each factor.

STATGRAPHICS Centurion provides several procedures for performing an analysis of variance:

1. One-Way ANOVA - used when there is only a single categorical factor. This is equivalent to
comparing multiple groups of data.

2. Multifactor ANOVA - used when there is more than one categorical factor, arranged in a crossed
pattern. When factors are crossed, the levels of one factor appear at more than one level of the other
factors.

3. Variance Components Analysis - used when there are multiple factors, arranged in a hierarchical
manner. In such a design, each factor is nested in the factor above it.

4. General Linear Models - used whenever there are both crossed and nested factors, when some
factors are fixed and some are random, and when both categorical and quantitative factors are
present.

One-Way ANOVA

A one-way analysis of variance is used when the data are divided into groups according to only one factor. The questions of interest are usually: (a) Is there a significant difference between the groups?, and (b) If so, which groups are significantly different from which others? Statistical tests are provided to compare group means, group medians, and group standard deviations. When comparing means, multiple range tests are used, the most popular of which is Tukey's HSD procedure. For equal size samples, significant group differences can be determined by examining the means plot and identifying those intervals that do not overlap.
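
For readers who prefer to see the computation, the following minimal Python example runs a one-way ANOVA on three invented groups; any statistics package, including STATGRAPHICS, produces the equivalent F-test.

# One-way ANOVA: does at least one group mean differ?
from scipy import stats

group_a = [12.1, 11.8, 12.5, 12.0]
group_b = [13.4, 13.1, 12.9, 13.6]
group_c = [12.2, 12.4, 11.9, 12.3]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. below 0.05) suggests at least one group mean differs;
# a multiple range test such as Tukey's HSD would then show which ones.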

Multifactor ANOVA

When more than one factor is present and the factors are crossed, a multifactor ANOVA is appropriate. Both main effects and interactions between the factors may be estimated. The output includes an ANOVA table and a new graphical ANOVA from the latest edition of Statistics for Experimenters by Box, Hunter and Hunter (Wiley, 2005). In a graphical ANOVA, the points are scaled so that any levels that differ by more than the scatter exhibited in the distribution of the residuals are significantly different.
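
A sketch of a two-factor crossed ANOVA in Python with statsmodels is shown below (an assumption for illustration; the text itself describes STATGRAPHICS output). The data are simulated, so the resulting table only demonstrates the main effect and interaction rows.

# Multifactor (two-way crossed) ANOVA on simulated data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
data = pd.DataFrame({
    "layout":  np.repeat(["L1", "L2"], 12),
    "fixture": np.tile(np.repeat(["F1", "F2", "F3"], 4), 2),
})
data["time"] = 20 + rng.normal(0, 2, size=len(data))

model = ols("time ~ C(layout) * C(fixture)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effects and the interaction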

Variance Components Analysis

A Variance Components Analysis is most commonly used to determine the level at which variability is being introduced into a product. A typical experiment might select several batches, several samples from each batch, and then run replicate tests on each sample. The goal is to determine the relative percentages of the overall process variability that is being introduced at each level.

General Linear Model

The General Linear Models procedure is used whenever the above procedures are not appropriate.
It can be used for models with both crossed and nested factors, models in which one or more of the
variables is random rather than fixed, and when quantitative factors are to be combined with
categorical ones. Designs that can be analyzed with the GLM procedure include partially nested
designs, repeated measures experiments, split plots, and many others. For example, pages 536-540
of the book Design and Analysis of Experiments (sixth edition) by Douglas Montgomery (Wiley,
2005) contain an example of an experimental design with both crossed and nested factors. For that
data, the GLM procedure produces several important tables, including estimates of the variance
components for the random factors.

Analysis of Variance for Assembly Time

Source           Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Model                     243.7   23         10.59      4.54    0.0002
Residual                   56.0   24         2.333
Total (Corr.)             299.7   47

Type III Sums of Squares

Source                     Sum of Squares   Df   Mean Square   F-Ratio   P-Value
Layout                              4.083    1         4.083      0.34    0.5807
Operator(Layout)                    71.92    6         11.99      2.18    0.1174
Fixture                             82.79    2          41.4      7.55    0.0076
Layout*Fixture                      19.04    2         9.521      1.74    0.2178
Fixture*Operator(Layout)            65.83   12         5.486      2.35    0.0360
Residual                             56.0   24         2.333
Total (corrected)                   299.7   47
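
Reading this table, each F-ratio is the factor's mean square divided by the mean square of its error term. Fixture*Operator(Layout), for example, is tested against the residual, giving F = 5.486 / 2.333 ≈ 2.35, while Fixture is tested against Fixture*Operator(Layout), giving F = 41.4 / 5.486 ≈ 7.55; which error term applies to which factor follows from the Expected Mean Squares table below.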


Expected Mean Squares

Source EMS
Layout (6)+2.0(5)+6.0(2)+Q1
Operator(Layout) (6)+2.0(5)+6.0(2)
Fixture (6)+2.0(5)+Q2
Layout*Fixture (6)+2.0(5)+Q3
Fixture*Operator(Layout) (6)+2.0(5)
Residual (6)

Variance Components

Source Estimate
Operator(Layout) 1.083
Fixture*Operator(Layout) 1.576
Residual 2.333
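
These estimates follow from equating the observed mean squares to their expected values above: for Operator(Layout), (11.99 - 5.486) / 6.0 ≈ 1.083; for Fixture*Operator(Layout), (5.486 - 2.333) / 2.0 ≈ 1.576; and the residual variance is estimated directly by its mean square, 2.333.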
