Statistical Computing
1 / 17
Outline
Introduction
Robust Regression
Time Series Analysis in Intensive Care
Statistics and Computer Science
Using the Toolkits of Computer Science
Problem transformation
Example
Robust Regression
Definition (Donoho and Huber (1983))
The (finite sample) breakdown point is the smallest fraction of data points that need to be changed to have an unbounded effect on the estimate.
[Figure: LS and LQD fits plotted against Year, 1950 to 1970]
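The definition can be checked numerically. The sketch below is not taken from the slides: it fits a least-squares (LS) line to clean data and then moves a single observation further and further away. One changed point out of n is enough to move the LS slope without bound, which corresponds to a breakdown point of 1/n. The data and the helper `ls_slope` are assumptions chosen for illustration.

```python
def ls_slope(xs, ys):
    """Least-squares slope of y on x (model with intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]   # hypothetical clean data on the line y = 2x + 1

clean_slope = ls_slope(xs, ys)

slopes = []
for shift in (1e2, 1e4, 1e6):
    corrupted = ys[:]
    corrupted[-1] += shift          # change a single observation
    slopes.append(ls_slope(xs, corrupted))
```

The entries of `slopes` grow in proportion to `shift`, so the estimate can be driven arbitrarily far from `clean_slope` by corrupting one point.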
[Figure: time series plot; x-axis: time, 1000 to 5000]
Time series data is monitored online, e.g. in intensive care
Regression techniques have to be applied to a moving time window
Robust regression may be used to reduce the effect of outliers
Need for fast offline and online algorithms
Thorsten Bernholt, Robin Nunkesser Fast Algorithms for Robust Regression Statistical Computing 4 / 17
Definition
An algorithmic problem consists of a description of the set of allowable inputs and a description of a function that maps each allowable input to a non-empty set of correct outputs (answers, results).
Computational Geometry is a related field of research
The geometric flavour of statistics becomes apparent when a sample is regarded as a set of points in Euclidean space.
Searching nearest neighbours may be reformulated to compute the Hodges-Lehmann estimator and an estimator of scale.
It is often useful to consider underlying decision problems
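For reference, the Hodges-Lehmann estimator mentioned above is the median of all pairwise averages of the sample. The sketch below is a naive quadratic-size computation for illustration only; it is not the nearest-neighbour reformulation the slide refers to, and the sample values are made up.

```python
from itertools import combinations_with_replacement
from statistics import median

def hodges_lehmann(sample):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(sample, 2)]
    return median(walsh)

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # hypothetical sample with one outlier
estimate = hodges_lehmann(data)
```

On this sample the estimate stays at 3.0 while the arithmetic mean is pulled to 22.0 by the single outlier.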
Decision Problems
Definition
A decision problem is an algorithmic problem where the set of outputs is restricted to Yes and No.
Obvious decision problems are:
Is the optimal value of the objective function better than x?
Is the local solution y the global solution?
[Figure: two panels, "primal space" and "dual space"; axes x and y]
The problem has $O(n^4)$ possible solutions
Original running time $O(n^5 \log n)$
We achieve $O(n^2 \log n)$
In this arrangement of lines, we search for the lowest point $(\beta, r)$ with $\binom{n}{2} + \binom{h}{2}$ subjacent or intersecting lines. $\beta$ equals the slope of the LQD fit, and $r$ equals the minimised order statistic.
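The line count in this statement can be related to pairs of observations. The sketch below rests on an assumption the slides do not spell out: that each pair of observations contributes two dual lines, $v = \Delta y - \beta \Delta x$ and its mirror image $v = -(\Delta y - \beta \Delta x)$. Under that assumption, for any $r \ge 0$ the number of lines at or below a point $(\beta, r)$ is $\binom{n}{2}$ plus the number of pairs whose residual difference is at most $r$ in absolute value. The function names and data are hypothetical.

```python
from itertools import combinations

def pair_differences(xs, ys):
    """Differences (dx, dy) for all pairs i < j of observations."""
    pts = list(zip(xs, ys))
    return [(xj - xi, yj - yi) for (xi, yi), (xj, yj) in combinations(pts, 2)]

def lines_at_or_below(diffs, beta, r):
    """Count dual lines lying below or passing through the point (beta, r).

    Assumption: each pair contributes the two lines
    v = dy - beta*dx and v = -(dy - beta*dx).
    """
    count = 0
    for dx, dy in diffs:
        d = dy - beta * dx          # difference of the two residuals
        count += (d <= r) + (-d <= r)
    return count

def pairs_within(diffs, beta, r):
    """Count pairs whose absolute residual difference is at most r."""
    return sum(abs(dy - beta * dx) <= r for dx, dy in diffs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 9.0]     # hypothetical data, last point is an outlier
diffs = pair_differences(xs, ys)
n_pairs = len(diffs)                # equals n choose 2
```

For every slope `beta` and every `r >= 0`, `lines_at_or_below(diffs, beta, r)` equals `n_pairs + pairs_within(diffs, beta, r)`, because at least one line of each mirrored pair is never above height 0.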
The dual problem is equivalent to two problems from Computational Geometry:
Minimum k-level point
k-violation linear programming
Theoretically superior algorithms are often hard to implement or even impractical
Our own algorithms achieve slightly inferior theoretical running times
The framework of the algorithms:
1. [step garbled in extraction; only the term $\binom{n}{2}$ survives]
2. Search for the optimal solution with the help of the underlying decision problem
3. Output the solution
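The search step of this framework can be sketched generically. The code below is a simplification, not the authors' algorithm: it assumes the decision problem is monotone in the candidate value and uses floating-point bisection rather than an exact search. The helper `smallest_yes` and the toy decision problem (finding the k-th order statistic of a sample) are hypothetical.

```python
def smallest_yes(decide, lo, hi, tol=1e-9):
    """Approximate the smallest value in [lo, hi] for which decide() is True.

    Assumes decide is monotone: False below some threshold, True from it on,
    with decide(lo) False and decide(hi) True.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if decide(mid):
            hi = mid     # the answer is mid or smaller
        else:
            lo = mid     # the answer is larger than mid
    return hi

data = [4.0, 1.5, 9.0, 2.5, 7.0]    # hypothetical sample
k = 3

def at_least_k_below(r):
    """Decision problem: are at least k observations <= r?"""
    return sum(x <= r for x in data) >= k

kth = smallest_yes(at_least_k_below, min(data) - 1.0, max(data))
```

Here `kth` converges to the third smallest observation, 4.0, using only Yes/No answers from the decision procedure.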
Compute all intersections of the lines with this height
Sift through the sorted intersections and update the number of subjacent lines accordingly
If the number equals $\binom{n}{2} + \binom{h}{2}$, decide YES
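A sweep of this kind can be sketched as follows. This is an illustrative restatement, not the authors' implementation, and it assumes that the quantity of interest for a pair of observations is $|\Delta y - \beta \Delta x|$. For a fixed height `r`, each pair is within `r` on an interval of slopes whose endpoints are the intersections with that height; sorting the endpoints and sweeping gives the largest number of pairs satisfied simultaneously. The threshold `m` is expressed in pairs rather than in lines, and the function name `decide` is hypothetical.

```python
from itertools import combinations

def decide(xs, ys, r, m):
    """Is there a slope beta with at least m pairs (i < j) satisfying
    |(y_j - y_i) - beta * (x_j - x_i)| <= r ?  Requires r >= 0."""
    always = 0                      # pairs with equal x do not depend on beta
    events = []
    pts = list(zip(xs, ys))
    for (xi, yi), (xj, yj) in combinations(pts, 2):
        dx, dy = xj - xi, yj - yi
        if dx == 0:
            always += abs(dy) <= r
            continue
        a, b = (dy - r) / dx, (dy + r) / dx
        events.append((min(a, b), 0))   # 0: interval opens (sorts before closes)
        events.append((max(a, b), 1))   # 1: interval closes
    events.sort()
    depth = best = 0
    for _, kind in events:
        if kind == 0:
            depth += 1
            best = max(best, depth)
        else:
            depth -= 1
    return always + best >= m

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]    # hypothetical data: four collinear points, one outlier
answer = decide(xs, ys, 0.0, 6)
```

With these data, `decide(xs, ys, 0.0, 6)` returns `True` (the six pairs among the collinear points agree on slope 1), while `decide(xs, ys, 0.0, 7)` returns `False`. Sorting dominates the cost, so one decision takes time proportional to $n^2 \log n$ in this sketch.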
Stopping Criterion:
Search until no intersection remains between the lower and the upper bound.
Label the lines according to their intersection with the upper horizontal line.
Interpret the intersections with the lower horizontal line as a permutation of these labels (e.g. (8, 1, 5, 2, 10, 3, 7, 4, 9, 6, 11)).
Calculate the number of inversions of the permutation, e.g. with merge sort.
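Counting inversions with merge sort can be sketched as below. Two lines cross between the horizontal bounds exactly when their order differs at the two heights, so the inversion count gives the number of intersections remaining in the strip. The function name `count_inversions` is an assumption; the permutation is the one from the example above.

```python
def count_inversions(seq):
    """Return (sorted copy, number of inversions) using merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, inv_left = count_inversions(seq[:mid])
    right, inv_right = count_inversions(seq[mid:])
    merged, inversions = [], inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions

labels = (8, 1, 5, 2, 10, 3, 7, 4, 9, 6, 11)   # permutation from the example above
_, crossings = count_inversions(labels)
```

For the example permutation this yields 18 inversions, computed in $O(n \log n)$ time instead of checking all pairs.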
Summary
Results from Computational Geometry are applicable to problems from Statistics.
Solving dual or equivalent problems may lead to superior running times.
Results from Statistics may also help computer scientists, e.g. in the analysis of running times.
Thank you!
Bibliography
Chan, T. M., 1999. Geometric applications of a randomized optimization technique. Discrete and Computational Geometry 22 (4), 547–567.
Cole, R., Sharir, M., Yap, C. K., 1987. On k-hulls and related problems. SIAM J. Comput. 16 (1), 61–77.
Croux, C., Rousseeuw, P. J., Hössjer, O., 1994. Generalized S-estimators. J. Amer. Statist. Assoc. 89, 1271–1281.
Donoho, D., Huber, P., 1983. The notion of breakdown point. In: Bickel, P., Doksum, K., Hodges, J. L., Jr. (Eds.), A Festschrift for Erich L. Lehmann. Wadsworth, pp. 157–184.
Roos, T., Widmayer, P., 1994. k-violation linear programming. Inf. Process. Lett. 52 (2), 109–114.