
15 November 2016

Data files commonly come in CSV, XLS and SAV formats.

Convert the data into the required format before any operations are done on it.
The most widely used format is the Excel (XLS) format.
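
As a minimal sketch of the reading step, assuming the data is available as a CSV file: base R reads CSV directly with read.csv(). The file contents and the column names (company, industry, sales) are hypothetical and are written to a temporary file only so that the example runs on its own.

```r
# Hypothetical data written to a temporary CSV so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("company,industry,sales",
             "Alpha,Steel,120",
             "Beta,Textiles,95",
             "Gamma,Steel,150"), csv_path)

# read.csv() ships with base R and returns a data frame.
Dataset <- read.csv(csv_path, stringsAsFactors = FALSE)
str(Dataset)

# XLS and SAV files need add-on packages, for example
# readxl::read_excel() or foreign::read.spss() (assumed to be installed).
```

Once the file is loaded as a data frame, the same operations apply whatever the original format was.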
Compute the following for the data given in the file:
1. Average and standard deviation of sales for all companies
2. Sample standard deviation for population, industry-wise
3. Frequency distribution of companies
4. Bivariate (cross-classification) table for company and industry
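
The four tasks can be sketched in base R as follows. The data frame below is hypothetical, since the file itself is not reproduced in these notes, and task 2 is assumed here to mean the standard deviation of sales within each industry.

```r
# Hypothetical data: the real file is not reproduced in these notes.
firms <- data.frame(
  company  = c("Alpha", "Beta", "Gamma", "Delta", "Alpha", "Beta"),
  industry = c("Steel", "Textiles", "Steel", "Textiles", "Steel", "Textiles"),
  sales    = c(120, 95, 150, 80, 130, 105)
)

# 1. Average and (sample) standard deviation of sales over all rows
avg_sales <- mean(firms$sales)
sd_sales  <- sd(firms$sales)

# 2. Sample standard deviation industry-wise (assumed to refer to sales)
sd_by_industry <- tapply(firms$sales, firms$industry, sd)

# 3. Frequency distribution of companies
freq_company <- table(firms$company)

# 4. Cross-classification of company by industry
cross_tab <- table(firms$company, firms$industry)

print(avg_sales)
print(sd_sales)
print(sd_by_industry)
print(freq_company)
print(cross_tab)
```

sd() in R uses the n - 1 denominator, so it is the sample standard deviation.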

Solution procedure:
1. Problem
2. Data
3. Program

23 November 2016
Statistical operations
Calculate numerical summaries such as
1. average
2. standard deviation and
3. third quartile for all numerical variables

> numSummary(Dataset[,c("Total", "Unit.Cost", "Units")],
  statistics=c("mean", "sd", "cv"), quantiles=c(0,.25,.5,.75,1))
               mean        sd        cv  n
Total     456.46233 447.02210 0.9793187 43
Unit.Cost  20.30860  47.34512 2.3312836 43
Units      49.32558  30.07825 0.6097900 43

> numSummary(Dataset[,c("Total", "Unit.Cost", "Units")],
  statistics=c("quantiles"), quantiles=c(.25,.5,.75,1.0))
             25%    50%    75%    100%  n
Total     144.59 299.40 600.18 1879.06 43
Unit.Cost   3.99   4.99  17.99  275.00 43
Units      27.50  53.00  74.50   96.00 43
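
The cv column in the first summary is the coefficient of variation, that is, the standard deviation divided by the mean. A small check using the means and standard deviations printed above (the vector names are chosen here only for illustration):

```r
# Values copied from the numSummary output above.
means <- c(Total = 456.46233, Unit.Cost = 20.30860, Units = 49.32558)
sds   <- c(Total = 447.02210, Unit.Cost = 47.34512, Units = 30.07825)

# Coefficient of variation = standard deviation / mean
cv <- sds / means
print(round(cv, 4))
```

A cv above 1, as for Unit.Cost, means the spread is larger than the mean itself.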

Contingency summaries
Two-way table
Frequency table (Item by Region):
Item: Binder, Desk, Pen, Pen Set, Pencil
Region: Central, East, West
Cell counts (only partly preserved): 3, 4, 1, 3, 0, 2

Pearson's Chi-squared test
data: .Table
X-squared = 7.326, df = 8, p-value = 0.5019
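
The degrees of freedom and the p-value follow from the table size and the test statistic. A short sketch, using only the numbers reported above (variable names are illustrative):

```r
# Statistic and table size taken from the output above.
x_squared <- 7.326
n_items   <- 5   # Binder, Desk, Pen, Pen Set, Pencil
n_regions <- 3   # Central, East, West

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
df <- (n_items - 1) * (n_regions - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)
print(df)
print(round(p_value, 4))
```

With a p-value this far above 0.05, the test gives no evidence of an association between Item and Region.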

The prices of shares of a company on different days in a month were found to be 66, 65, 69, 70, 69, 71, 63, 70, 64 and 68. Test at the 5% level whether the average share price is Rs. 65.

with(company, (t.test(price,
alternative='two.sided', mu=65,
conf.level=.95)))
One Sample t-test
data: price
t = 2.8247, df = 9, p-value = 0.01989
alternative hypothesis: true mean is not equal
to 65
95 percent confidence interval:
65.49785 69.50215
sample estimates:
mean of x
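67.5
Decision rule: if p < alpha, reject the hypothesis; if p > alpha, accept it.
Here p = 0.01989 < 0.05, therefore the hypothesis that the average share price is Rs. 65 is rejected.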
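
The t statistic and p-value can also be computed step by step, which shows what t.test does internally. This is a sketch in base R; the variable names are illustrative.

```r
price <- c(66, 65, 69, 70, 69, 71, 63, 70, 64, 68)
mu0   <- 65      # hypothesised mean (Rs.)
alpha <- 0.05    # significance level

n  <- length(price)
se <- sd(price) / sqrt(n)                 # standard error of the mean
t_stat  <- (mean(price) - mu0) / se       # one-sample t statistic
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-sided

print(round(t_stat, 4))
print(round(p_value, 5))
print(p_value < alpha)   # TRUE means the hypothesis is rejected
```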

Performance of three salesmen A, B and C over a period of time is shown in the following data.

salesmen  quantity
A         300  400  300  500  100
B         600  300  300  400
C         700  300  400  600   50

Test whether the average performance of these three salesmen differs significantly.
> AnovaModel.2 <- aov(quantity ~ salesmen, data=salesmen)
> summary(AnovaModel.2)
            Df Sum Sq Mean Sq F value Pr(>F)
salesmen     2  23750   11875   0.319  0.734
Residuals   11 410000   37273


> with(salesmen, numSummary(quantity, groups=salesmen, statistics=c("mean", "sd")))
  mean       sd data:n
A  320 148.3240      5
B  400 141.4214      4
C  410 255.9297      5
p-value = 0.734 > 0.05, therefore the hypothesis of equal average performance is accepted.
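
The analysis can be reproduced in base R. The sketch below assumes the 14 quantities are grouped as A: 300, 400, 300, 500, 100; B: 600, 300, 300, 400; C: 700, 300, 400, 600, 50. This grouping is an assumption, chosen because it gives exactly the group means and standard deviations reported above. The data frame name perf is illustrative.

```r
# Assumed grouping of the 14 observations (consistent with the reported
# group means 320, 400, 410 and the ANOVA sums of squares).
perf <- data.frame(
  salesmen = factor(rep(c("A", "B", "C"), times = c(5, 4, 5))),
  quantity = c(300, 400, 300, 500, 100,
               600, 300, 300, 400,
               700, 300, 400, 600, 50)
)

# One-way analysis of variance
fit <- aov(quantity ~ salesmen, data = perf)
anova_tab <- summary(fit)[[1]]
print(anova_tab)

# Group means and standard deviations
group_means <- tapply(perf$quantity, perf$salesmen, mean)
group_sds   <- tapply(perf$quantity, perf$salesmen, sd)
print(group_means)
print(round(group_sds, 4))
```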

30/11
(One-sample) t-test:

From the following 10 observations, test whether the population mean is 45.
Sample: 16 46 55 41 49 51 50 44 47 42
Sol:

H0: μ = 45
> with(ttest, (t.test(X,
alternative='two.sided', mu=45,
conf.level=.95)))
One Sample t-test
data: X
t = -0.26464, df = 9, p-value = 0.7972
alternative hypothesis: true mean is not
equal to 45
95 percent confidence interval:
36.40682 51.79318
sample estimates:
mean of x
44.1
p-value = 0.7972 > 0.05, therefore the hypothesis is accepted, i.e., μ = 45.
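
The 95 percent confidence interval in the output can be rebuilt from the sample, which gives a second way to read the result: the hypothesised mean 45 lies inside the interval. A sketch in base R, with illustrative variable names:

```r
X <- c(16, 46, 55, 41, 49, 51, 50, 44, 47, 42)

n      <- length(X)
se     <- sd(X) / sqrt(n)            # standard error of the mean
t_crit <- qt(0.975, df = n - 1)      # two-sided 5% critical value

# mean +/- critical value * standard error
ci <- mean(X) + c(-1, 1) * t_crit * se
print(round(ci, 5))

# TRUE when 45 lies inside the interval, i.e. the hypothesis is not rejected
print(ci[1] <= 45 && 45 <= ci[2])
```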

Paired t-test for two dependent samples

The sales performance of 8 salesmen is recorded before and after training.

no  before  after
1       75     77
2       90    101
3       94     93
4       95     92
5      100    105
6       90     88
7       70     76
8       64     68

with(sales, (t.test(before, after,
alternative='two.sided', conf.level=.95,
paired=TRUE)))
Paired t-test
data: before and after
t = -1.6503, df = 7, p-value = 0.1429
alternative hypothesis: true difference in means is
not equal to 0
95 percent confidence interval:
-6.690337 1.190337
sample estimates:
mean of the differences
-2.75
p-value = 0.1429 > 0.05, therefore the hypothesis of no difference between before and after is accepted.
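
A paired t-test is the same as a one-sample t-test on the differences, tested against 0. The sketch below checks this with the before and after values from the table above, entered as plain vectors (the names d, paired and one_smp are illustrative).

```r
before <- c(75, 90, 94, 95, 100, 90, 70, 64)
after  <- c(77, 101, 93, 92, 105, 88, 76, 68)

# Differences within each salesman
d <- before - after

paired  <- t.test(before, after, paired = TRUE)   # paired t-test
one_smp <- t.test(d, mu = 0)                      # one-sample test on differences

print(mean(d))
print(unname(paired$statistic))
print(unname(one_smp$statistic))
print(paired$p.value)
```

Both calls give the same t statistic and p-value, because pairing removes the variation between salesmen and leaves only the change per salesman.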
