
What Do Experimenters Want?

• To determine experimental bias (artifacts)
• To find outliers in expression space
• Marker discovery
• Tumor diagnosis

How Can These Be Determined by Visualization and Data Analysis?


• Determine experimental bias (artifacts)
  - Scatter Plots (slope of best-fit line)
  - Histograms (position of peak(s))
• Variations in expression
  - across experiments
  - over time
• Find outliers in expression space
  - Scatter Plots
  - Histograms
• Interrelationships between genes
  - Similarly behaving genes
  - Complex relationships (cell function, new pathways)
  - Prediction
Sorin Draghici BioDiscovery Inc. Wayne State University, 2000
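The first bullet reads experimental bias off the slope of the best-fit line in a scatter plot of one experiment against another. A minimal sketch, assuming two equal-length Python lists holding the expression values of the same genes in two experiments (the function name `best_fit_slope` and the sample numbers are hypothetical): it computes the ordinary least-squares slope of the second experiment on the first. If the two experiments are expected to agree, a slope well away from 1 hints at a systematic scaling difference between them.

```python
def best_fit_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations (numerator) and of squared x-deviations (denominator)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    return sxy / sxx

# Hypothetical data: experiment B reads roughly twice as bright as experiment A.
exp_a = [100.0, 200.0, 300.0, 400.0]
exp_b = [210.0, 390.0, 610.0, 790.0]
print(round(best_fit_slope(exp_a, exp_b), 2))  # 1.96
```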

• Find similarly behaving genes
  - Scatter Plots
  - Cluster Diagrams
• Variations across experiments/time
  - Parallel coordinate planes / Time Series Analysis

Statistics Fundamentals

Measures of Central Tendency: Mean, Median, and Mode


Mode
• The mode is the value that occurs most often in a data set.
  962 980 975 1005 1042 1005 965 989 987 1005 1033 1030 955 1000 998 786 1005 783 970 999
  The mode of this set of data is 1005, since it occurs 4 times.

Mode
• A mode is a value that occurs more often than the nearby values.
• This is a bimodal distribution:
[Figure: a bimodal distribution (not preserved)]
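The "most often" definition translates directly into counting. A minimal sketch using Python's `collections.Counter` (the helper name `modes` is hypothetical); it returns every value tied for the highest count, since a data set can have more than one mode. It does not capture the "more often than the nearby values" sense behind a bimodal distribution, which would require grouping nearby values, for example into histogram bins.

```python
from collections import Counter

def modes(values):
    """Return (sorted list of most frequent values, their count)."""
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)
    top = max(counts.values())
    # Keep every value that reaches the top count (there may be ties).
    return sorted(v for v, c in counts.items() if c == top), top

data = [962, 980, 975, 1005, 1042, 1005, 965, 989, 987, 1005,
        1033, 1030, 955, 1000, 998, 786, 1005, 783, 970, 999]
print(modes(data))  # ([1005], 4)
```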


Median
• The median is the value that is in the middle when the values are ordered. When the number of values is even, the median is the average of the two middle values.
  Data: 96, 78, 90, 62, 73, 89, 92, 84, 76, 86
  Ordered: 62, 73, 76, 78, 84, 86, 89, 90, 92, 96
  Median = (84 + 86)/2 = 85

Mean (arithmetic mean):
• The mean is the sum of the measurements divided by the total number of measurements. Quite often, when people talk about the average, they are referring to the mean:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
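The median and mean definitions above can be sketched as two small Python functions (the names `median` and `mean` are illustrative; the standard library's `statistics.median` and `statistics.mean` provide the same calculations). The median sorts the data and, for an even count, averages the two middle values, which reproduces the worked example.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mean(values):
    """Arithmetic mean: sum of the measurements divided by their number."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

scores = [96, 78, 90, 62, 73, 89, 92, 84, 76, 86]
print(median(scores))  # 85.0
print(mean(scores))    # 82.6
```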

55.20, 18.06, 28.16, 44.14, 61.61, 4.88, 180.29, 399.11, 97.47, 56.89, 271.95, 365.29, 807.80, 9.98, 82.73

$$\bar{x} = \frac{55.20 + 18.06 + \cdots + 82.73}{15} = \frac{2483.56}{15} \approx 165.57$$

Mode vs. median


Median:
• The median is the central value; 50% of the measurements lie above it and 50% below it.
• There is ONLY one median for a data set.
• It is not influenced by extreme measurements.

Mode:
• The mode is the most frequent measurement in the data set.
• There can be MORE than one mode for a data set.
• It is not influenced by extreme measurements.
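The claim that extreme measurements do not influence the median, while they do influence the mean, can be checked on the 15 measurements from the mean example. This sketch uses Python's standard `statistics` module and then drops the largest value (807.80) to see how much each measure moves.

```python
import statistics

values = [55.20, 18.06, 28.16, 44.14, 61.61, 4.88, 180.29, 399.11,
          97.47, 56.89, 271.95, 365.29, 807.80, 9.98, 82.73]

print(round(statistics.mean(values), 2))  # 165.57
print(statistics.median(values))          # 61.61

# Drop the single largest measurement and compare both measures again.
without_max = sorted(values)[:-1]
print(round(statistics.mean(without_max), 2))
print(round(statistics.median(without_max), 2))
```

Removing that one value lowers the mean by roughly 46 (from about 165.57 to about 119.70) but moves the median by less than 3 (from 61.61 to 59.25).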

Mode vs. median


• Where are the modes and medians in the distributions below?
• What happens with the modes and medians when the distribution is not normal?
[Figure: example distributions (not preserved)]


Major characteristics of each measure of central tendency

• The mean is the arithmetic average of the measurements in a data set.
• There is ONLY one mean for a data set.
• Its value is influenced by extreme measurements; trimming can help reduce the degree of influence.
• Means of subsets can be combined to determine the mean of the complete data set.

Measures of variability

Percentile: The pth percentile of a set of n measurements arranged in order of magnitude is the value that has p% of the measurements below it and (100 - p)% above it.
[Figure: relative-frequency distribution with the 60th percentile marked; 60% of the measurements lie below it and 40% above]
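The percentile definition can be sketched with the nearest-rank method, which is only one of several conventions: with a finite data set, no single value may split the measurements exactly into p% and (100 - p)%, and libraries such as NumPy (`numpy.percentile`) interpolate between values by default, so results can differ slightly. The function name `percentile_nearest_rank` is hypothetical.

```python
import math

def percentile_nearest_rank(values, p):
    """p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not values:
        raise ValueError("percentile of empty data")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # 1-based rank of the smallest value with at least p% of the data at or below it
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

scores = [96, 78, 90, 62, 73, 89, 92, 84, 76, 86]
print(percentile_nearest_rank(scores, 60))  # 86
```

For the ten test scores, 86 has 6 of the 10 values (60%) at or below it and 4 (40%) above it.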
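Two properties of the mean listed under the major characteristics of central tendency, that trimming reduces the influence of extreme measurements and that means of subsets combine into the mean of the complete data set, can be illustrated with a short sketch. The helper names `trimmed_mean` and `combined_mean` are hypothetical, and trimming is assumed to cut the same proportion from each end of the ordered data (SciPy's `scipy.stats.trim_mean` offers a comparable function). Subset means are combined by weighting each one by its subset size.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the ordered values from EACH end."""
    if not values:
        raise ValueError("trimmed mean of empty data")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def combined_mean(groups):
    """Overall mean from (subset_mean, subset_size) pairs, weighted by size."""
    total = sum(size for _, size in groups)
    if total == 0:
        raise ValueError("no measurements")
    return sum(m * size for m, size in groups) / total

values = [55.20, 18.06, 28.16, 44.14, 61.61, 4.88, 180.29, 399.11,
          97.47, 56.89, 271.95, 365.29, 807.80, 9.98, 82.73]
# Cutting 10% from each end of 15 values drops one value at each end
# (4.88 and 807.80 here), which moves the mean closer to the median.
print(round(trimmed_mean(values, 0.1), 2))  # 128.53

# Two subsets of the test scores; their size-weighted means combine
# into the mean of the full data set.
first, second = [96, 78, 90, 62, 73], [89, 92, 84, 76, 86]
groups = [(sum(g) / len(g), len(g)) for g in (first, second)]
print(round(combined_mean(groups), 2))  # 82.6
```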
