
Computing in Archaeology
Basic Statistics
Week 8 (25/04/07)
© Richard Haddlesey www.medievalarchitecture.net
Aims
• To familiarise ourselves with KEY statistical terms and their meanings
• To understand the use of stats in archaeology
• To assign appropriate levels of measurement to variables at the recording stage
Key texts
Basic Stats
[Diagram: a batch (Post holes) is made up of cases (each post hole, identified by its Post hole ID); each case is described by variables such as length, area and diameter]
Variables
• Variables are measured according to one of FOUR levels:
  • Nominal = arbitrary name (no order)
  • Ordinal = ordered sequence, but with no fixed distance between values
  • Interval = ordered sequence with a fixed distance between values
  • Ratio = as interval, but with a fixed datum (true zero)
Vince NOIR

• Nominal
• Ordinal
• Interval
• Ratio
Nominal examples
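• Variables recorded: Condition, Age, Diameter, Length, Context, Period
• Of these, Context is nominal: a context number is an arbitrary name, not a quantity or a rank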
Ordinal examples
• Condition
  1. Excellent
  2. Good
  3. Fair
  4. Poor
• Here “2” lies between “1” and “3”, but the step from 1 to 2 is unlikely to equal the step from 2 to 3
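A minimal Python sketch of an ordinal variable, using the slide's condition ranking. The `Condition` class and the three finds are hypothetical; the point is that the codes support ordering and "better/worse" comparisons, while differences between them carry no meaning.

```python
from enum import IntEnum


class Condition(IntEnum):
    """Ordinal scale: the numbers give the order only, not equal steps."""
    EXCELLENT = 1
    GOOD = 2
    FAIR = 3
    POOR = 4


# Hypothetical finds, each recorded with a condition category.
finds = {
    "sherd A": Condition.FAIR,
    "sherd B": Condition.EXCELLENT,
    "sherd C": Condition.POOR,
}

# Ordering is meaningful: sort from best to worst condition.
ranked = sorted(finds, key=lambda name: finds[name])
print(ranked)  # ['sherd B', 'sherd A', 'sherd C']

# Comparisons such as "better than" are meaningful on an ordinal scale.
print(Condition.GOOD < Condition.FAIR)  # True

# Python will happily compute Condition.FAIR - Condition.GOOD (= 1), but the
# result has no meaning: the step Good -> Fair need not equal Excellent -> Good.
```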
Interval examples
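• Period
  1. Late Bronze (1200-650 BC)
  2. Early Iron (649-100 BC)
  3. Late Iron (100 BC+)
• Here, if we have 3 artefacts dated 150 BC (a), 300 BC (b) and 450 BC (c), b is an equal distance (150 years) from a and c, but b is not twice as old as a, even though 300 is twice 150
• This is because the BC/AD calendar has no true datum: its reference point is arbitrary, not the zero of age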
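A minimal Python sketch of the same three artefacts. It assumes ages are counted back from AD 1950 (the reference year of the "Before Present" convention), chosen here only for illustration; `REFERENCE_YEAR_AD` and `age_years` are hypothetical names. Differences on the calendar scale are meaningful, but the ratio 300/150 = 2 is not; once dates are converted to ages (which do have a zero), ratios become meaningful.

```python
# Three artefacts dated on the BC calendar (interval level of measurement).
dates_bc = {"a": 150, "b": 300, "c": 450}

# Differences are meaningful: b is 150 years from both a and c.
print(dates_bc["b"] - dates_bc["a"], dates_bc["c"] - dates_bc["b"])  # 150 150

# Ratios of calendar numbers are not: 300 / 150 = 2, yet b is not twice as old as a.
print(dates_bc["b"] / dates_bc["a"])  # 2.0

# Assumption: measure age from AD 1950 (the reference year of the "BP" convention).
REFERENCE_YEAR_AD = 1950


def age_years(year_bc: int) -> int:
    """Years from year_bc BC to REFERENCE_YEAR_AD; subtract 1 because there is no year 0."""
    return year_bc + REFERENCE_YEAR_AD - 1


# Age has a datum (0 = the reference year), so it is a ratio variable.
ages = {name: age_years(year) for name, year in dates_bc.items()}
print(ages)  # {'a': 2099, 'b': 2249, 'c': 2399}
print(round(ages["b"] / ages["a"], 3))  # 1.071
```

So b is only about 1.07 times as old as a, not twice as old.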
Ratio examples
• Age instead of period
  • 1000 years ago (ya) is twice as long ago as 500 ya
  • 20 kg is twice 10 kg
• Ratio is the highest level of measurement because, as well as a fixed distance between values, it has a datum (true zero)
[Figure: a Mortlake style bowl, a Fengate style bowl and a Grooved ware jar, captioned “Nominal, Ordinal and Interval”]

Note!
• Avoid using only 0 or 1 to indicate yes or no for such variables, as we may need to know whether a value means “no” or “no data”
• Similarly, when recording presence or absence, you may wish to add a “missing” category to avoid confusion
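A minimal Python sketch of recording presence/absence with an explicit "missing" category. The `Presence` enum and the five sherd records are hypothetical; the final lines show how a bare 1/0 coding would merge "no" with "no data".

```python
from collections import Counter
from enum import Enum


class Presence(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    MISSING = "missing"  # not recorded / no data, kept distinct from ABSENT


# Hypothetical record: was burning observed on each of five sherds?
burnt = [Presence.PRESENT, Presence.ABSENT, Presence.MISSING,
         Presence.ABSENT, Presence.MISSING]

counts = Counter(burnt)
print(counts[Presence.PRESENT], counts[Presence.ABSENT], counts[Presence.MISSING])  # 1 2 2

# If the same data had been coded 1/0, with 0 used for both "no" and "no data",
# the 0s would suggest four unburnt sherds instead of two.
as_binary = [1 if b is Presence.PRESENT else 0 for b in burnt]
print(as_binary.count(0))  # 4
```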
Further distinction
• Nominal and Ordinal
  • = categorical
  • = qualitative
• Interval and Ratio
  • = continuous
  • = quantitative
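The four levels and the categorical/continuous split can be summarised as data. This is a minimal Python sketch; the `Level` class, its field names and the `kind` helper are hypothetical, and each flag simply restates the definitions given for nominal, ordinal, interval and ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    ordered: bool         # values fall in a sequence
    fixed_distance: bool  # equal steps between values
    datum: bool           # a fixed datum (true zero), so "twice as much" makes sense


LEVELS = [
    Level("nominal", ordered=False, fixed_distance=False, datum=False),
    Level("ordinal", ordered=True, fixed_distance=False, datum=False),
    Level("interval", ordered=True, fixed_distance=True, datum=False),
    Level("ratio", ordered=True, fixed_distance=True, datum=True),
]


def kind(level: Level) -> str:
    """Nominal/ordinal are categorical (qualitative); interval/ratio are continuous (quantitative)."""
    return "continuous (quantitative)" if level.fixed_distance else "categorical (qualitative)"


for level in LEVELS:
    print(f"{level.name:<9}{kind(level)}")
```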
Coding
• Nominal and Ordinal variables often need coding, via a keyword index, to minimise errors, e.g.:
  • con = context
  • str = stray find
  • set = settlement
  • bur = burial
• Avoid numeric codes (1, 2, 3, etc.), as you will have to keep looking up their meanings, which is time-consuming
Coding

NOTE!
EVERY DATA VALUE MUST HAVE A CODE AND ONLY ONE CODE!
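A minimal Python sketch of a keyword index and two checks for the rule above. The codes come from the coding slide; `KEYWORD_INDEX`, `check_one_code_per_value` and `uncoded` are hypothetical names. Dictionary keys are unique, so the first check looks for a value listed under two codes, and the second flags recorded entries that have no code at all.

```python
# Keyword index from the slide: short mnemonic codes rather than 1, 2, 3.
KEYWORD_INDEX = {
    "con": "context",
    "str": "stray find",
    "set": "settlement",
    "bur": "burial",
}


def check_one_code_per_value(index: dict[str, str]) -> None:
    """Raise if any data value appears under more than one code."""
    values = list(index.values())
    repeated = sorted({v for v in values if values.count(v) > 1})
    if repeated:
        raise ValueError(f"values with more than one code: {repeated}")


def uncoded(records: list[str], index: dict[str, str]) -> list[str]:
    """Return recorded entries that have no code in the keyword index."""
    return [entry for entry in records if entry not in index]


check_one_code_per_value(KEYWORD_INDEX)  # passes: each value has exactly one code
print(uncoded(["con", "bur", "stray", "set"], KEYWORD_INDEX))  # ['stray']
```

The `dict[str, str]` annotations assume Python 3.9 or later.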
Grouping
• Grouping is useful for periods, as in:
  • Late Bronze (1200-650 BC)
  • Early Iron (649-100 BC)
  • Late Iron (100 BC+)
• NOTE: it is better to record the date as a continuous variable (e.g. 780 BC), then group it as an output (e.g. Late Bronze)
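A minimal Python sketch of "record continuous, group as output". The `period` function and the four dates are hypothetical. The slide's ranges both include 100 BC, so this sketch assumes 100 BC belongs to Early Iron and handles only BC dates.

```python
from collections import Counter


def period(year_bc: int) -> str:
    """Group a date recorded in years BC into the slide's periods.

    Assumption: 100 BC itself is counted as Early Iron (the slide's ranges both
    include 100), and any later BC date as Late Iron.
    """
    if 650 <= year_bc <= 1200:
        return "Late Bronze"
    if 100 <= year_bc <= 649:
        return "Early Iron"
    if year_bc < 100:
        return "Late Iron"
    raise ValueError(f"{year_bc} BC is earlier than the periods defined here")


# Hypothetical dates, recorded as continuous values and left untouched.
dates_bc = [780, 420, 60, 1100]

# Grouping is done only at output time.
print([period(d) for d in dates_bc])  # ['Late Bronze', 'Early Iron', 'Late Iron', 'Late Bronze']
print(Counter(period(d) for d in dates_bc))
```

Because the raw dates are kept, the period boundaries can later be redrawn without re-recording anything.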
Good Practice

• Always keep a “CLEAN” (unaltered) version of the original data set
Exploring the data
example data set
Context FNO Taxon Bone z1 z2 z3 z4 z5 z6 F/U L/R art. sex NISP chop cut m1 m2 m3 m4
269 58 bs mn 0 0 0 0 0 0 - r - - 1 35.9 14.6
722 191 eq sc 1 1 1 1 1 1 f r 2 - 1 78.2 40.7 55.6
722 191 eq sc 1 1 1 1 1 1 f l 2 - 1 78.7 41.4 48.5
371 102 eq sc 1 1 1 1 1 1 f r - - 1 45.0 58.0 52.9
722 191 eq cal 1 1 1 1 1 0 f r 2 - 1 90.6 45.0
722 191 eq mp 1 1 1 0 0 0 f l 2 - 1 41 45.6 40.3 28.7
722 191 eq mp 1 1 1 0 0 0 f r 2 - 1 42 46.0 39.5 29.4
722 191 eq mp 1 1 1 0 0 0 f r 2 - 1 46.0 39.7 28.5
285 72 bs cal 1 1 1 1 1 0 f r - - 1 1 1 137.5 46.3
722 191 eq mp 1 1 1 0 0 0 f l 2 - 1 42 46.3 40.0 29.2
722 191 eq pp 1 1 1 0 0 0 f l 2 - 1 71 48.7 45.0 32.5
722 191 eq pp 1 1 1 0 0 0 f r 2 - 1 71 48.8 45.2 32.5
722 191 eq pp 1 1 1 0 0 0 f r 2 - 1 68 49.0 45.0 34.1
722 191 eq pel 1 1 1 1 1 1 f l 2 - 1 60.1 52.2
722 191 eq ast 1 1 1 1 0 0 - r 2 - 1 51 53 44.9
722 191 eq ast 1 1 1 1 0 0 - l 2 - 1 51 54 44.4 52.7
722 191 eq mciii 1 1 1 1 1 1 f r 2 - 1 187 179 43.7 28.6
722 191 eq mciii 1 1 1 1 1 1 f l 2 - 1 187 180 42.8
722 191 eq mtiii 1 1 1 1 1 1 f l 2 - 1 229 223 41.4 39.1
722 191 eq mtiii 1 1 1 1 1 1 f r 2 - 1 229 223 42.8 39.5
722 191 eq hum 1 1 1 1 1 1 f/f r 2 - 1 232 30.8
722 191 eq rad 1 1 1 1 1 1 f/f l 2 - 1 274 71.7 64.2
univariate frequency table
species   frequency
cattle    187
sheep     109
pig       78
horse     21
Total     395
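A minimal Python sketch of building a univariate frequency table with `collections.Counter`. As an illustration only, the list of identifications is rebuilt from the table's own counts; real records would come from the bone data set. The percentage column is the kind of value a bar chart of this table can plot.

```python
from collections import Counter

# Illustration only: a list of species identifications rebuilt from the table's counts.
identifications = ["cattle"] * 187 + ["sheep"] * 109 + ["pig"] * 78 + ["horse"] * 21

freq = Counter(identifications)
total = sum(freq.values())

for species, n in freq.most_common():
    # Frequency and percentage of the total for each species.
    print(f"{species:<8}{n:>5}{100 * n / total:>7.1f}%")
print(f"{'Total':<8}{total:>5}")
```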
bivariate frequency table
species   pits   ditches   Total
cattle    67     120       187
sheep     63     46        109
pig       41     37        78
horse     3      18        21
Total     174    221       395

bivariate frequency table (column percentages)
species   pits         ditches      Total
cattle    67 (39%)     120 (54%)    187
sheep     63 (36%)     46 (21%)     109
pig       41 (24%)     37 (17%)     78
horse     3 (2%)       18 (8%)      21
Total     174 (100%)   221 (100%)   395
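A minimal Python sketch reproducing the column percentages from the counts in the table. Each cell is divided by its own column total, which lets pits and ditches be compared even though they yielded different numbers of bones. The variable names are hypothetical; none of the values falls exactly on .5, so Python's `round` gives the same whole percentages as the table.

```python
# Counts from the bivariate table: species by feature type.
counts = {
    "cattle": {"pits": 67, "ditches": 120},
    "sheep": {"pits": 63, "ditches": 46},
    "pig": {"pits": 41, "ditches": 37},
    "horse": {"pits": 3, "ditches": 18},
}
features = ["pits", "ditches"]

# Column totals: all identified bones from each feature type.
col_totals = {f: sum(row[f] for row in counts.values()) for f in features}

# Column percentage: each cell as a share of its own column total, rounded to whole %.
col_pct = {
    species: {f: round(100 * row[f] / col_totals[f]) for f in features}
    for species, row in counts.items()
}

for species, row in counts.items():
    cells = "  ".join(f"{row[f]:>4} {col_pct[species][f]:>3}%" for f in features)
    print(f"{species:<8}{cells}  {sum(row.values()):>4}")
print(f"{'Total':<8}" + "  ".join(f"{col_totals[f]:>4} 100%" for f in features)
      + f"  {sum(col_totals.values()):>4}")
```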


Multivariate

• Multivariate methods tend to operate on a table, or matrix, of items, each described in terms of a set of variables
Pictorial displays for categorical data
bar chart
[Figure: bar chart of the percentage (%) of cattle, sheep, pig and horse]
multiple bar chart
[Figure: multiple bar chart of the percentage (%) of cattle, sheep, pig and horse in pits and in ditches]
pie chart
Pictorial displays for continuous data
histogram
[Figure: histograms of Bd (mm) for Hunt's House and Monkton; y-axis Count, x-axis Bd from 49.0 to 72.0 mm]
Basic descriptive statistics:

• mode
• median
• mean
• range
• variance
• standard deviation
pottery fragments (weights in grams):
2, 2, 3, 5, 8
Mode = 2
Mode
• The mode (the most frequent value) is the only way to measure the average/typical value for Nominal data
• If two values are equally the most frequent, the data are bimodal (e.g. 1, 2, 3, 3, 6, 6, 7, 8, 9 has modes 3 and 6)
• Three modes = trimodal, etc.
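A minimal Python sketch using the standard `statistics` module (`multimode` needs Python 3.8 or later). It finds the mode of the pottery weights, shows that the bimodal example has two modes, and applies the mode to hypothetical nominal codes, where no other average makes sense.

```python
from statistics import mode, multimode

weights = [2, 2, 3, 5, 8]  # pottery fragment weights in grams
print(mode(weights))  # 2

# multimode() returns every most-frequent value, so it reveals bimodal data.
print(multimode([1, 2, 3, 3, 6, 6, 7, 8, 9]))  # [3, 6]

# The mode also works for nominal codes (hypothetical find-type codes here).
print(mode(["bur", "set", "bur", "con"]))  # bur
```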


For the same pottery fragments: Mode = 2, Median = 3
Median
• The median is the middle value once the values are sorted into order
• It can be used for Ordinal data and above (Interval and Ratio)
• If the number of values is even, the median is halfway between the two middle values
• (1, 2, 3, 4, 5, 6, 7, 8: median = (4 + 5)/2 = 4.5)
For the same pottery fragments: Mode = 2, Median = 3
Mean = (2 + 2 + 3 + 5 + 8)/5 = 20/5 = 4
Mean
• The most commonly used average; it only works for Interval and Ratio data
• It is the most important measure of position because many further statistical analyses (such as the variance and standard deviation) are based on it
Conclusion
• It is important to understand that the mode, median and mean are three quite different measures of position, which can give three different values when applied to the same data set

Data set:   2, 2, 3, 5, 8    2, 2, 3, 5, 6, 8
Mode        2                2
Median      3                4
Mean        4                4.333
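A minimal Python sketch computing all three measures of position for both data sets with the standard `statistics` module. The `results` dictionary is a hypothetical name; note that `median` returns 4.0 (a float) for the even-sized set, because it averages the two middle values.

```python
from statistics import mean, median, mode

results = {}
for data in ([2, 2, 3, 5, 8], [2, 2, 3, 5, 6, 8]):
    # Mode, median and mean of the same data set, mean rounded to 3 decimal places.
    results[tuple(data)] = (mode(data), median(data), round(mean(data), 3))
    m, med, avg = results[tuple(data)]
    print(f"{str(data):<20} mode={m}  median={med}  mean={avg}")
```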
The skew

[Figure: a symmetrical distribution, a positively skewed distribution and a negatively skewed distribution]


Measures of variability – the spread
pottery fragments (weights in grams): 2, 2, 3, 5, 8
Range = max - min = 8 - 2 = 6 g
• Very simple, but of limited use, because it depends only on the two most extreme values
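A minimal Python sketch of the range. `data_range` and the second data set are hypothetical; the second set shows why the range is of limited use: it depends only on the minimum and maximum, so quite different data can share the same range.

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)


weights = [2, 2, 3, 5, 8]  # pottery fragment weights in grams
print(data_range(weights))  # 6

# Only the two extremes count, so a very different set can give the same range.
other = [2, 5, 5, 5, 8]
print(data_range(other))  # 6
```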


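variance
Key: $x$ = each value, $\bar{x}$ = the mean, $n$ = the number of values
For the same pottery fragments (weights in grams): 2, 2, 3, 5, 8
Mean: $\bar{x} = (2 + 2 + 3 + 5 + 8)/5 = 4$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n}$$
$$s^2 = \frac{(2-4)^2 + (2-4)^2 + (3-4)^2 + (5-4)^2 + (8-4)^2}{5} = \frac{4 + 4 + 1 + 1 + 16}{5} = \frac{26}{5} = 5.2$$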
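A minimal Python sketch of the variance calculation above, step by step, checked against `statistics.pvariance`, which also divides by n. As a caution, `statistics.variance` divides by n - 1 (the sample variance) and so gives a different answer for the same data.

```python
from statistics import pvariance, variance

weights = [2, 2, 3, 5, 8]  # pottery fragment weights in grams
n = len(weights)
x_bar = sum(weights) / n                                   # mean = 4.0
squared_deviations = [(x - x_bar) ** 2 for x in weights]   # [4.0, 4.0, 1.0, 1.0, 16.0]
s2 = sum(squared_deviations) / n                           # 26 / 5 = 5.2

print(s2, pvariance(weights))  # 5.2 5.2

# statistics.variance() divides by n - 1 (the sample variance), giving 6.5 here.
print(variance(weights))  # 6.5
```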
standard deviation
For the same pottery fragments (weights in grams): 2, 2, 3, 5, 8
Variance $s^2 = 5.2$
$$s = \sqrt{s^2} = \sqrt{5.2} \approx 2.28$$
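A minimal Python sketch: the standard deviation as the square root of the variance, checked against `statistics.pstdev`, which uses the same divisor n as the worked example. `variance_g2` is a hypothetical name for the variance in grams squared.

```python
import math
from statistics import pstdev

variance_g2 = 5.2  # variance from the worked example, in grams squared
sd = math.sqrt(variance_g2)
print(round(sd, 2))  # 2.28

# Same result straight from the data (population standard deviation, divisor n).
print(round(pstdev([2, 2, 3, 5, 8]), 2))  # 2.28
```

Unlike the variance, the standard deviation is back in the original units (grams).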
Summary
• Variables are measured according to one of FOUR levels:
  • Nominal = arbitrary name (no order)
  • Ordinal = ordered sequence, but with no fixed distance between values
  • Interval = ordered sequence with a fixed distance between values
  • Ratio = as interval, but with a fixed datum (true zero)
Summary
• Measures of position (average/typical)
  • Mode
  • Median
  • Mean
• Measures of variability (the spread)
  • Range
  • Variance
  • Standard Deviation
