Workshop Agenda
Welcome
Where does decision uncertainty come from?
You can't find the answer if you don't know the question!
Beware of statistics bearing assumptions
The cure for sampling dilemmas: use emerging best practices
10June2008 Triad Investigations: New Approaches and Innovative Strategies 3
Instructors
Deana Crumbling, crumbling.deana@epa.gov
Office of Superfund Remediation & Technology Innovation
U.S. Environmental Protection Agency
Washington, D.C.
(703) 603-0643

Robert Johnson, rlj@anl.gov
Environmental Science Division
Argonne National Laboratory
Argonne, Illinois
(630) 252-7004
Module 2
As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know. Donald Rumsfeld, Feb. 12, 2002, Department of Defense news briefing
Decision Quality Only as Good as the Weakest Link in the Data Quality Chain
[Figure: the data quality chain, with links grouped under Sampling, Analysis, and Interpretive. Link labels that survive: Sample Support, Sampling Design, Sample Preservation, SubSampling, Extract Cleanup Method, Determinative Method, Result Reporting.]
Each link represents a variable contributing toward the quality of the analytical result. All links in the data quality chain must be intact for data to be of decision-making quality!
[Figure: GC result reported as 23.4567 ppm]
[Table residue; values not recoverable. Row labels for soil particle-size fractions:]
Between 3/8 and 4-mesh
Between 4- and 10-mesh
Between 10- and 50-mesh
Between 50- and 200-mesh
Less than 200-mesh
Bulk Total
The decision determines representativeness
[Figure residue; layout not recoverable. Surviving labels: Sample Prep; 1 g, 2 g, 5 g, 10 g, 100 g; 136 ppm; 27,700 ppm; 42,800 ppm; 2 ft.]
Uncertainty Math Magnifies Weakest Link's Effects in Data Quality Chain
Uncertainties add according to $a^2 + b^2 = c^2$
[Figure labels: Analytical Uncertainty, Total Uncertainty]
We can't control the effects of uncertainty on our decisions if we don't know where it is coming from. Historically, sampling programs have focused resources on the wrong sources of data uncertainty.
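A minimal sketch of why the weakest link dominates when uncertainties add in quadrature. The standard deviations below are hypothetical values chosen only for illustration, and the two sources are assumed to be independent.

```python
import math


def total_uncertainty(sampling_sd: float, analytical_sd: float) -> float:
    """Combine two independent uncertainties in quadrature: c = sqrt(a^2 + b^2)."""
    return math.hypot(sampling_sd, analytical_sd)


# Hypothetical standard deviations in ppm (not from the slides).
sampling_sd = 90.0
analytical_sd = 10.0

baseline = total_uncertainty(sampling_sd, analytical_sd)
better_lab = total_uncertainty(sampling_sd, analytical_sd / 2)       # halve analytical error
better_sampling = total_uncertainty(sampling_sd / 2, analytical_sd)  # halve sampling error

print(f"baseline total:          {baseline:.1f} ppm")
print(f"analytical error halved: {better_lab:.1f} ppm")
print(f"sampling error halved:   {better_sampling:.1f} ppm")
```

With these inputs, halving the smaller (analytical) term barely moves the total, while halving the larger (sampling) term nearly halves it.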
Module 3
You can't find the answer if you don't know the question!
The decision driving sample collection: Can it be shown that atmospheric deposition caused contamination?
Graphic adapted from Columbia Technologies
Different sample supports yield different concentration estimates that are all correct but lead to different conclusions
Sample support is critical, yet it is currently determined by the convenience or whim of samplers & analysts. If so, data quality is being left to chance!
Module 4
We'd prefer to ignore statistics when they tell us something we don't want to hear
Representativeness Assumed
SAP says: "Representative samples will be collected." But it provides no explanation of how, or of what the samples are supposed to represent.
Non-representative data lead to decision errors:
Sample support mismatched to cleanup criteria (single grab vs. area average)
Samples with different supports mixed together in databases & statistical analysis
Use of spatially clustered locations or biased samples when calculating average concentrations
Mixing different populations together
[Figure: three frequency histograms of concentration (ppm), x-axis 0 to 600 ppm]
Lognormal seen with small sample supports & when data from different supports are mixed together
[Figure: site map with EU#1, EU#2, and the dump labeled]
Exposure Unit #2: Mix 2 populations (cleaner area & dump) into the same sampling design & data set
[Figure: concentration scale with mean = 425, AL = 500, 95% UCL = 525]
Need more samples if you want to make a confident decision about risk or compliance: redo project
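A minimal sketch of the situation above: the mean falls below the action level (AL) but the 95% upper confidence limit (UCL) falls above it, so no confident decision is possible. The results below are hypothetical (chosen so the mean is 425 ppm against a 500 ppm AL), and the t-based UCL assumes roughly normal data, which is exactly the kind of assumption this module warns about.

```python
import math
import statistics

# One-sided 95% Student's t critical values by degrees of freedom (standard table values).
T_95 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
        6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}


def ucl95(results):
    """One-sided 95% upper confidence limit on the mean (t-based)."""
    n = len(results)
    standard_error = statistics.stdev(results) / math.sqrt(n)
    return statistics.mean(results) + T_95[n - 1] * standard_error


def compare_to_action_level(results, action_level):
    """Classify the data set relative to the action level."""
    if ucl95(results) < action_level:
        return "confidently below AL"
    if statistics.mean(results) < action_level:
        return "inconclusive: mean below AL but 95% UCL above it"
    return "mean at or above AL"


# Hypothetical results in ppm (not from the slides).
results = [300, 350, 400, 450, 625]
print(f"mean = {statistics.mean(results)}, 95% UCL = {ucl95(results):.0f}")
print(compare_to_action_level(results, action_level=500))
```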
[Figure: three frequency histograms of concentration (ppm), x-axis 0 to 600 ppm]
[Figure: concentration scale with mean = 650, AL = 700, 95% UCL = 725]
Recognize that these input values will be different for different contaminants on the same site
Different field concentrations, different ALs
VSP may predict 10 samples for Zn, 500 for PAHs, 1000 for Hg
Characterize or verify cleanup?
Statistical confidence desired
How close to each other are the true mean & AL?
How much variability is present in the soil concentrations?
[Table residue; row and column labels not recoverable. Surviving entries: 200 ppm, 100 ppm; 2, 5, 8; 3, 10, 21; 4, 36, 79.]
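A rough sketch of how the inputs listed above (desired confidence, distance between the true mean and the AL, and variability) drive the number of samples. It uses one widely used normal-approximation formula for testing a mean against a fixed threshold, n = ((z_(1-alpha) + z_(1-beta)) * s / delta)^2 + 0.5 * z_(1-alpha)^2. The default error rates and the example inputs are assumptions, and this is not claimed to be the calculation behind any particular tool or table.

```python
import math
from statistics import NormalDist


def samples_needed(sd, delta, alpha=0.05, beta=0.20):
    """Approximate sample count for comparing a mean to an action level.

    sd    -- estimated standard deviation of concentrations (ppm)
    delta -- distance between the true mean and the action level (ppm)
    alpha -- tolerated false-rejection rate (assumed default)
    beta  -- tolerated false-acceptance rate (assumed default)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = ((z_alpha + z_beta) * sd / delta) ** 2 + 0.5 * z_alpha ** 2
    return math.ceil(n)


# Hypothetical inputs: more variability or a smaller mean-to-AL gap means more samples.
for sd, delta in [(100, 100), (100, 50), (200, 50)]:
    print(f"SD = {sd} ppm, gap = {delta} ppm -> n = {samples_needed(sd, delta)}")
```

Because the required count scales with (s / delta) squared, a poor guess at either input can change the answer by an order of magnitude, which is the dilemma described here.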
Fact is, if we knew everything we needed to know in order to design a statistical sampling program correctly, we wouldn't need to do the sampling!
Dilemma Resolutions
How can good approximation inputs be chosen?
Data & experience from similar sites
Historical data from your site
Pilot study (efficient if part of dynamic field work)
There are three kinds of lies: lies, damned lies, and statistics
-- Attributed to Benjamin Disraeli (as popularized by Mark Twain)
Module 5
Planning Systematically
2 Fundamental Concepts for Sampling Design & Statistics: (1) Decision Unit
Decision Unit: Area, volume, or set of objects (e.g. -acre area, bin of soil, set of drums)
All items treated as a single unit for decision-making
Statistical goal: discover true mean for that single unit
Amount of variability w/in the unit creates uncertainty in estimating the true mean
Therefore, statistics used to express amount of uncertainty around the estimate of the mean
Valley of the Drums: These need to be characterized, transported, and disposed properly. What is the decision unit? How do you sample it?
[Figure: drums grouped into Batch #1, Batch #2, Batch #3, Batch #4]
40 drums were cleaned in batches of 20. You need to ensure the cleaning process worked. What is the decision unit and how would you sample it?
A well-articulated conceptual site model (CSM) serves as the point of stakeholder consensus. CSMs are living: as new data become available, incorporate them into the CSM.
CSM is mature when desired decision confidence is achieved
Statistics cannot be used properly w/o a CSM that defines the statistical populations!!
Improving Representativeness
Cleanup Standards for Groundwater and Soil, Interim Final Guidance (State of Maryland, 2001)
no more than 3 adjacent samples allowed
Improving Representativeness
Compositing
[Figure: site divided into decision units (Decision Units 1, 2, 5, and 6 labeled), with an MI sample marked]
Improving Representativeness
Multi-Increment Sampling
Effective when cost of analysis is significantly greater than cost of sample acquisition/handling
How many increments?
Practical upper limit imposed by homogenization capacity, background concentration & magnitude of non-background concentration
Enough to bring sampling error under control relative to other sources of error
[Figure: frequency histogram of concentration (ppm), x-axis 0 to 600 ppm, with an MI sample marked]
Physical equivalent of averaging many individual sample results mathematically
MI sampling creates larger sample supports & tends to normalize statistical data distributions
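A small simulation sketch of the claim above, under stated assumptions: the field is modeled as a hypothetical lognormal population of small-support increments, and a multi-increment (MI) sample is idealized as the arithmetic mean of its increments (equal increment masses, perfect homogenization, no analytical error). The lognormal parameters and the choice of 30 increments are illustrative only.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable


def field_increment():
    """One small-support increment from a hypothetical skewed (lognormal) field, in ppm."""
    return rng.lognormvariate(5.0, 1.0)


def mi_sample(n_increments):
    """Idealized MI sample: the mean of n pooled increments."""
    return statistics.mean(field_increment() for _ in range(n_increments))


discrete = [field_increment() for _ in range(500)]
multi_increment = [mi_sample(30) for _ in range(500)]

for label, data in [("discrete", discrete), ("30-increment MI", multi_increment)]:
    skew_ratio = statistics.mean(data) / statistics.median(data)
    print(f"{label:16s} SD = {statistics.stdev(data):6.1f} ppm, mean/median = {skew_ratio:.2f}")
```

A mean/median ratio near 1 indicates a more symmetric distribution; in this sketch the MI results are both tighter and closer to symmetric than the discrete results.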
Improving Representativeness
Not as useful for subsurface & other sampling where sampling costs are higher than analytical costs
Requires special design & handling for volatile contaminants (Hg, VOCs, etc.)
In situ & other cost-effective high-density analyses (like XRF) can potentially substitute for or augment MIS
Adaptive analytics
Strategies to produce collaborative data sets with sufficient analytical & sampling QC checks
Adaptive sampling
Strategies for confident estimates of a DU's mean
Strategies for delineating contaminant populations
Adaptive compositing
Efficient strategies for searching for contamination
How probable is it that contamination is present? The less likely it is that contamination is present, the larger the number of samples that can be composited
2-sample composite: 55 ppm
3-sample composite: 40 ppm
4-sample composite: 33 ppm
5-sample composite: 28 ppm
6-sample composite: 25 ppm
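The composite values listed above are consistent with a simple screening rule: threshold = background + (AL - background) / n, i.e. the composite concentration you would see if one of the n samples sat exactly at the action level and the rest were at background. The sketch below assumes AL = 100 ppm and background = 10 ppm; both inputs are inferred from the listed numbers and are not stated in the slides.

```python
def composite_threshold(action_level, background, n_samples):
    """Composite result at which one member sample could be at the action level."""
    return background + (action_level - background) / n_samples


ACTION_LEVEL = 100.0  # ppm, inferred (assumption)
BACKGROUND = 10.0     # ppm, inferred (assumption)

thresholds = {n: composite_threshold(ACTION_LEVEL, BACKGROUND, n) for n in range(2, 7)}
for n, value in thresholds.items():
    print(f"{n}-sample composite: {value:.1f} ppm")
```

A composite at or above its threshold would be flagged so its member samples can be analyzed individually; the more samples per composite, the closer the threshold gets to background, which limits how far compositing can go.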
Color coding for probabilities that 1-ft deep volumes > 250 ppm Pb (actual Pb conc not shown)
Decision plan: Any soil w/ Pr(Pb > 250 ppm) > 40% will be landfilled. Soil with Pr(Pb > 250 ppm) < 40% will be reused in new firing berm.
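The decision plan above can be written as a small rule. This is only a sketch: the function name is hypothetical, and because the plan as stated does not say what happens at exactly 40%, that case is flagged rather than guessed.

```python
def soil_disposition(prob_exceeds_250ppm: float, cutoff: float = 0.40) -> str:
    """Apply the stated decision plan to one soil volume's exceedance probability."""
    if not 0.0 <= prob_exceeds_250ppm <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if prob_exceeds_250ppm > cutoff:
        return "landfill"
    if prob_exceeds_250ppm < cutoff:
        return "reuse in new firing berm"
    return "not specified by the decision plan"


# Hypothetical probabilities for three 1-ft deep soil volumes.
for p in (0.10, 0.40, 0.85):
    print(f"Pr(Pb > 250 ppm) = {p:.0%}: {soil_disposition(p)}")
```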
Module 6
This slimmed-down case study illustrates how to determine & control data error in real time to generate definitive data
This project used a handheld X-ray fluorescence (XRF) instrument to measure Pb in minutes at the site of sample collection
Plastic bag of soil
Divide each section into 5 equal-area subsections
The subsections will be sampled by taking 1 grab soil sample (~300 g) per subsection & placing it into a plastic bag for XRF analysis
[Figure: yard sections around the house footprint, with area fractions labeled (Area fx = 0.15, Area fx = 0.6)]
Total yard average determined statistically (& area-weighted) as 410 +/- 25 (385-435 ppm Pb)
Front yard average (at 95% statistical confidence) = 700 +/- 150 (550-850 ppm Pb)
Side yard average (at 95% statistical confidence) = 500 +/- 100 (400-600 ppm Pb)
Back yard average (at 95% statistical confidence) = 300 +/- 50 (250-350 ppm Pb)
Follow Decision Tree #1
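A sketch of the area-weighting step behind the total yard average. Only two area fractions (0.15 and 0.6) survive in the slide text; the third (0.25) is inferred as the remainder, and the assignment of fractions to the front, side, and back yards is an assumption chosen because it reproduces the stated 410 ppm. The +/- 25 uncertainty is not reproduced here.

```python
# Section means are from the slide; the fraction-to-section mapping is an assumption.
sections = {
    "front": {"mean_ppm": 700.0, "area_fraction": 0.15},
    "side": {"mean_ppm": 500.0, "area_fraction": 0.25},  # inferred: 1 - 0.15 - 0.6
    "back": {"mean_ppm": 300.0, "area_fraction": 0.60},
}


def area_weighted_mean(sections):
    """Weight each section mean by its share of the total yard area."""
    total_fraction = sum(s["area_fraction"] for s in sections.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(s["mean_ppm"] * s["area_fraction"] for s in sections.values())


yard_mean = area_weighted_mean(sections)
print(f"Area-weighted yard mean: {yard_mean:.0f} ppm Pb")
```

Note that a simple unweighted average of the three section means would give 500 ppm, right at the action level, which is why the weighting matters for the decision.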
Decision Tree #1
Evaluate statistical results for the yard & compare to the 500 ppm AL
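Is there statistical confidence that mean is below AL?
Is there statistical confidence that mean is above AL?
[Flowchart: each question has a "yes" branch whose outcome is not recoverable from the extracted text; the remaining path reads "Go to Decision Tree #2".]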
Section bag #1 shots (ppm Pb): #1 700, #2 670, #3 740, #4 650

Bag    Bag Mean    W/in-Bag SD
#1     690         39
#2     582         54
#3     456         53
#4     810         107
#5     475         65

Between-bag error (SD) for the 5 bag means = 150
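A short sketch showing how the within-bag and between-bag statistics in this case study can be recomputed from the numbers on the slide, using the sample standard deviation (n - 1 denominator). Variable names are illustrative.

```python
import statistics

# XRF shots on bag #1 and the five bag means, as listed on the slide (ppm Pb).
bag1_shots = [700, 670, 740, 650]
bag_means = [690, 582, 456, 810, 475]

bag1_mean = statistics.mean(bag1_shots)
within_bag1_sd = statistics.stdev(bag1_shots)  # spread of repeat shots on one bag
between_bag_sd = statistics.stdev(bag_means)   # spread of bag means across the section

print(f"Bag #1 mean:       {bag1_mean}")
print(f"Within-bag #1 SD:  {within_bag1_sd:.1f}")
print(f"Between-bag SD:    {between_bag_sd:.1f}")
```

The results come out at roughly 39 for bag #1 and roughly 149 between bags, in line with the slide's 39 and its rounded 150.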
Decision Tree #2
Determine the greater source of data variability (decision uncertainty)
Is within-bag variability GREATER than between-bag variability?
yes: Go to Decision Tree #3
no, they are ~equal: Go to Decision Tree #5
Decision Tree #3
Major source of data error: heterogeneity within sample bag (subsampling error)
To control this source of variability: Re-shoot each bag another 4 times (total of 8 shots/bag). Add results to spreadsheet & recalculate stats for whole yard. Examine results.
Is within-bag variability sufficiently reduced?
no: Take add'l corrective action
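A rough sketch of what the extra shots buy. If repeat shots on a bag behave like independent readings with the same spread, the standard error of the bag mean falls as SD / sqrt(n), so going from 4 to 8 shots shrinks it by a factor of about 1.4. The SD of 39 is bag #1's within-bag SD from the case study; the independence assumption is mine and holds only if the bag is re-mixed or re-positioned between shots.

```python
import math


def standard_error(sd: float, n_shots: int) -> float:
    """Standard error of a bag mean from n independent shots with the given SD."""
    return sd / math.sqrt(n_shots)


within_bag_sd = 39.0  # ppm Pb, bag #1 in the case study

for n_shots in (4, 8):
    print(f"{n_shots} shots/bag: standard error of bag mean = "
          f"{standard_error(within_bag_sd, n_shots):.1f} ppm")
```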
Decision Tree #2
Determine the greater source of data variability (decision uncertainty)
Is within-bag variability significantly GREATER than between-bag variability?
yes: Go to Decision Tree #3
no, they are ~equal: Go to Decision Tree #5
no: Is within-bag variability significantly LESS than between-bag variability?
yes: Go to Decision Tree #4
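The routing in Decision Tree #2 can be sketched as a small function. The slides do not define "significantly", so the ratio cutoff of 1.5 below is a hypothetical stand-in (a formal comparison of variances could be used instead), and pooling the within-bag SDs as a root mean square assumes the same number of shots per bag.

```python
import math


def pooled_within_sd(within_sds):
    """Root-mean-square of per-bag SDs (assumes equal shots per bag)."""
    return math.sqrt(sum(sd ** 2 for sd in within_sds) / len(within_sds))


def route_decision_tree_2(within_sd, between_sd, ratio_cutoff=1.5):
    """Pick the next decision tree; ratio_cutoff is a hypothetical threshold."""
    if within_sd > ratio_cutoff * between_sd:
        return "Decision Tree #3"  # subsampling error dominates
    if between_sd > ratio_cutoff * within_sd:
        return "Decision Tree #4"  # variation across the section dominates
    return "Decision Tree #5"      # the two are roughly equal


# Within-bag SDs and between-bag SD from the case-study slide (ppm Pb).
within = pooled_within_sd([39, 54, 53, 107, 65])
next_tree = route_decision_tree_2(within, between_sd=150)
print(f"pooled within-bag SD = {within:.1f}, between-bag SD = 150 -> {next_tree}")
```

With the case-study numbers, between-bag variability is more than twice the pooled within-bag variability, so under this cutoff the sketch routes to Decision Tree #4.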
Decision Tree #4
Major source of data error is from concentration variations across the yard section area.
To control this source of variability: Collect another 5 bag samples from section area. Analyze 4 times/bag. Add results to spreadsheet & recalculate statistics for whole yard.
Is between-bag variability sufficiently reduced?
no: Take add'l corrective action
yes: Make decision at 500 ppm AL w/ desired statistical confidence
Decision Tree #5
Concentration variability across yard section & within samples about the same.
Analyze original bags an add'l 4 times each. Also collect another 5 bag samples from the section & analyze 8 times each. Add all results to spreadsheet & recalculate statistics for whole yard.
Is statistical decision uncertainty now sufficiently resolved?
no: Take add'l corrective action
yes: Make decision at 500 ppm AL w/ desired statistical confidence