
Reasoning under Uncertainty

Uncertainty
The real world is uncertain and ambiguous: an agent can never be certain about the state of the world and its domain. This uncertainty has several sources:

True uncertainty: the rules are probabilistic in nature
- e.g., rolling dice, flipping a coin

Laziness: it is too hard to determine exception-less rules
- it takes too much work to determine all of the relevant factors, and the enormous rules that result are too hard to use

Theoretical ignorance: we don't know all the rules
- the problem domain has no complete, consistent theory (e.g., medical diagnosis)

Practical ignorance: we do know all the rules, BUT
- we haven't collected all the relevant information for a particular case

Uncertainty (cont'd)
Plausible/probabilistic inference: "I've got this evidence; what's the chance that this conclusion is true?"

Suppose you wake up with a headache. Do you have the flu?
- Logical inference: if headache then flu. But not all patients with a headache have the flu.
- A better inference rule would be: if headache then the problem is flu with probability 0.8, or P(flu | headache) = 0.8, i.e., the probability of flu is 0.8 given that a headache is observed.

Uncertainty (cont'd)
Probability theory
A formal language for representing and reasoning with uncertain knowledge: compute the probability of an event or decision given the evidence or observations.

Rather than reasoning about the truth or falsity of a proposition, we reason about the degree of belief that a proposition or event is true or false.

Source of Probabilities
Relative frequency: probability is the fraction that would be observed in the limit of a large number of samples
- if 10 of 100 people tested have a cavity, then P(cavity) = 0.1

Objective-based probabilities are real aspects of the world: objects have a tendency to behave in certain ways
- a coin has a tendency to come up heads with probability 0.5

Subjective-based probabilities characterize an agent's belief, i.e., the experience and judgment of the person making the estimates
- the probability that you'll pass the final exam can be based on your own subjective evaluation of your hard work and understanding of the material

Sample Space
A space of events/outcomes to which we assign probabilities. Events can be binary, multi-valued, or continuous, and events are mutually exclusive.

Examples:
- Coin flip: {head, tail}
- Die roll: {1, 2, 3, 4, 5, 6}
- English words: a dictionary
- Temperature tomorrow: R+ (Kelvin)

Random Variable
A variable, X, whose domain is a sample space and whose value is (somewhat) uncertain. A random variable assigns a number to every possible outcome of an experiment.

Examples:
- X = coin flip outcome
- X = first word in tomorrow's headline news
- X = tomorrow's temperature

For a given task, the user defines a set of random variables for describing the world.

Random Variable (cont'd)

Random variables refer to attributes of the world whose "status" is unknown. They have one and only one value at a time, and a domain of values that are the possible states of the world:
- Boolean: domain = <true, false>
- Discrete: domain is countable, and values are mutually exclusive and exhaustive, e.g., Sky domain = <clear, partly_cloudy, overcast>
- Continuous: domain is the real numbers

Probability for Discrete Events


An agent's uncertainty is represented by P(A=a), or simply P(a):
- the agent's degree of belief that variable A takes on value a, given no other information relating to A
- a single probability, called an unconditional or prior probability

Examples:
- P(head) = P(tail) = 0.5 for a fair coin
- P(A = head or tail) = 0.5 + 0.5 = 1
- P(A = even number) = 1/6 + 1/6 + 1/6 = 0.5 for a fair 6-sided die
- P(two dice rolls sum to 2) = 1/6 * 1/6 = 1/36 (both dice must show 1)

Probability Distributions
Given A is a RV taking values in <a1, a2, ..., an>, e.g., if A is Sky, then a is one of <clear, partly_cloudy, overcast>:
- P(a) represents a single probability where A=a, e.g., if A is Sky, then P(a) means any one of P(clear), P(partly_cloudy), P(overcast)
- P(A) represents a probability distribution: the set of all possible values of a random variable and their associated probabilities, <P(a1), P(a2), ..., P(an)>, e.g., if A is Sky, then P(Sky) is the set of probabilities <P(clear), P(partly_cloudy), P(overcast)>
- the sum over all values in the domain of variable A is 1:

P(a1) + P(a2) + ... + P(an) = 1
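To make the distinction between P(a) and P(A) concrete, here is a minimal Python sketch; the Sky probabilities are assumed values chosen only for illustration:

```python
# A discrete probability distribution P(Sky), represented as a dict
# mapping each value in the domain to its probability.
# The numbers are illustrative assumptions, not from the slides.
sky_dist = {"clear": 0.5, "partly_cloudy": 0.3, "overcast": 0.2}

# The probabilities must sum to 1 over the whole domain.
assert abs(sum(sky_dist.values()) - 1.0) < 1e-9

print(sky_dist["clear"])  # P(Sky=clear): a single probability P(a)
print(sky_dist)           # P(Sky): the full distribution P(A)
```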

Useful Probability Distributions


- Binomial distribution: describes the number of successes in independent trials of a Bernoulli process (a process with two outcomes)
- Normal distribution: bell-shaped distribution that is a function of two parameters, the mean and the standard deviation
- Exponential distribution: used in dealing with queuing problems; often used to describe the time required to service a customer
- Poisson distribution: describes the number of customer arrivals during a certain time interval
- F distribution: helpful in testing hypotheses about variances
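The first four of these are available in scipy.stats; the sketch below freezes each one with assumed parameter values chosen only for illustration:

```python
# Sketch: the four distributions above via scipy.stats (assumed parameters).
from scipy import stats

binom = stats.binom(n=10, p=0.5)    # successes in 10 Bernoulli(0.5) trials
norm = stats.norm(loc=0, scale=1)   # mean 0, standard deviation 1
expon = stats.expon(scale=2.0)      # mean service time of 2 time units
poisson = stats.poisson(mu=3.0)     # mean of 3 arrivals per interval

print(binom.pmf(5))    # P(exactly 5 successes)
print(norm.pdf(0.0))   # density at the mean
print(expon.cdf(1.0))  # P(service time <= 1)
print(poisson.pmf(2))  # P(exactly 2 arrivals)
```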

Useful Probability Distributions (cont'd)


[Figure: example plots of the Binomial, Normal, Exponential, and Poisson distributions]

Axioms of Probability
- 0 <= P(A=a) <= 1 for all a in the sample space of A
- P(True) = 1, P(False) = 0
- P(A v B) = P(A) + P(B) - P(A ^ B)

Derived properties:
- P(~A) = 1 - P(A)
- if A can take k different values a1, ..., ak: P(A=a1) + ... + P(A=ak) = 1
- P(A) = P(A ^ B) + P(A ^ ~B) if B is a binary event
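The inclusion-exclusion axiom can be checked numerically; this sketch uses a fair die and two assumed events (an even roll, and a roll greater than 3):

```python
# Verify P(A v B) = P(A) + P(B) - P(A ^ B) on a fair six-sided die.
from fractions import Fraction

p = Fraction(1, 6)            # each face is equally likely
A = {2, 4, 6}                 # event A: roll is even
B = {4, 5, 6}                 # event B: roll is greater than 3

P_A, P_B = p * len(A), p * len(B)
P_A_and_B = p * len(A & B)    # P(A ^ B) = 2/6
P_A_or_B = p * len(A | B)     # P(A v B) = 4/6

assert P_A_or_B == P_A + P_B - P_A_and_B   # 2/3 == 1/2 + 1/2 - 1/3
```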

Joint Probabilities
Joint probabilities specify the probabilities for a conjunction of events
Bird   Flier   Young   Probability
T      T       T       0
T      T       F       0.2
T      F       T       0.04
T      F       F       0.01
F      T       T       0.01
F      T       F       0.01
F      F       T       0.23
F      F       F       0.5

Joint Probabilities
With n Boolean variables, the table will be of size 2^n. And if n variables each had k possible values, then the table would be of size k^n.

Example:
- P(Bird=T) = P(bird) = 0.0 + 0.2 + 0.04 + 0.01 = 0.25
- P(bird, ~flier) = 0.04 + 0.01 = 0.05
- P(bird v flier) = 0.0 + 0.2 + 0.04 + 0.01 + 0.01 + 0.01 = 0.27
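These marginals fall out of the table mechanically; here is a minimal sketch with the joint table keyed by (bird, flier, young) truth values:

```python
# The Bird/Flier/Young joint table from the slide, as a Python dict.
joint = {
    (True, True, True): 0.0,    (True, True, False): 0.2,
    (True, False, True): 0.04,  (True, False, False): 0.01,
    (False, True, True): 0.01,  (False, True, False): 0.01,
    (False, False, True): 0.23, (False, False, False): 0.5,
}

# P(bird): sum all rows where Bird=T.
print(sum(p for (b, f, y), p in joint.items() if b))            # 0.25

# P(bird, ~flier): rows where Bird=T and Flier=F.
print(sum(p for (b, f, y), p in joint.items() if b and not f))  # ~0.05

# P(bird v flier): rows where Bird=T or Flier=T.
print(sum(p for (b, f, y), p in joint.items() if b or f))       # ~0.27
```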

Conditional Probabilities
Conditional probabilities formalize the process of accumulating evidence and updating probabilities based on new evidence:

P(A|B) = P(A ^ B)/P(B) = P(A,B)/P(B)

Example:
- P(~B|F) = P(~B,F) / P(F) = (P(~B,F,Y) + P(~B,F,~Y)) / P(F) = (0.01 + 0.01) / P(F)
- P(B|F) = P(B,F) / P(F) = (P(B,F,Y) + P(B,F,~Y)) / P(F) = (0.0 + 0.2) / P(F)
- P(~B|F) + P(B|F) = 1, so substituting and solving for P(F) gives P(F) = 0.22
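The same table yields P(F) directly by marginalization, confirming the value solved for above; this sketch reuses the `joint` dict from the previous snippet:

```python
# P(F): marginalize the joint table over Bird and Young.
p_flier = sum(p for (b, f, y), p in joint.items() if f)
print(p_flier)  # ~0.22, matching the value solved for on the slide

# P(B|F) = P(B,F) / P(F), straight from the definition.
p_bird_and_flier = sum(p for (b, f, y), p in joint.items() if b and f)
print(p_bird_and_flier / p_flier)  # ~0.909
```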

Conditional Probabilities (cont'd)

- Product rule: P(A,B) = P(A|B)P(B)
- Chain rule: P(A,B,C,D) = P(A|B,C,D)P(B|C,D)P(C|D)P(D)
- Conditionalized version of the chain rule: P(A,B|C) = P(A|B,C)P(B|C)
- Bayes' rule: P(A|B) = P(A)P(B|A) / P(B)
- Conditionalized version of Bayes' rule: P(A|B,C) = P(B|A,C)P(A|C) / P(B|C)
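The product rule can be verified numerically against the same table; a sketch, again reusing `joint` from the earlier snippet:

```python
def marginal(pred):
    """Sum the probabilities of all joint-table rows matching pred."""
    return sum(p for row, p in joint.items() if pred(row))

p_f = marginal(lambda r: r[1])                 # P(F)
p_b_and_f = marginal(lambda r: r[0] and r[1])  # P(B,F)
p_b_given_f = p_b_and_f / p_f                  # P(B|F)

# Product rule: P(B,F) = P(B|F) * P(F), up to float rounding.
assert abs(p_b_and_f - p_b_given_f * p_f) < 1e-12
```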

Conditional Probabilities (cont'd)

- Conditioning/addition rule: P(A) = Σ_b P(A|B=b)P(B=b), where the sum is over all possible values b in the sample space of B
- P(~B|A) = 1 - P(B|A)

Example:
P(~Bird | Flier, ~Young) = P(~B,F,~Y) / P(F,~Y)
  = P(~B,F,~Y) / (P(~B,F,~Y) + P(B,F,~Y))
  = 0.01 / (0.01 + 0.2) = 0.048
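The addition rule likewise checks out on the table: conditioning P(Bird) on Flier and summing out recovers the marginal computed earlier. A sketch reusing `joint`:

```python
p_f = sum(p for (b, f, y), p in joint.items() if f)   # P(F)
p_not_f = 1 - p_f                                     # P(~F)

p_b_given_f = sum(p for (b, f, y), p in joint.items() if b and f) / p_f
p_b_given_not_f = sum(p for (b, f, y), p in joint.items() if b and not f) / p_not_f

# P(B) = P(B|F)P(F) + P(B|~F)P(~F), the conditioning/addition rule.
print(p_b_given_f * p_f + p_b_given_not_f * p_not_f)  # ~0.25, as before
```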

Bayes Rule
Given a prior model of the world P(A) and new evidence B, Bayes' rule says how this piece of evidence decreases our ignorance about the world:
- initially we know the prior P(A)
- after observing B, we update to the posterior P(A|B) = P(B|A)P(A) / P(B)

Generalizing Bayes' rule to two pieces of evidence, B and C:
P(A|C,B) = P(C,B|A) P(A) / P(C,B)
         = P(C|B,A) P(B|A) P(A) / [P(C|B) P(B)]
         = P(A) * [P(B|A)/P(B)] * [P(C|B,A)/P(C|B)]
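A minimal sketch of one Bayes-rule update, reusing the headache/flu posterior of 0.8 from the earlier slide; the prior, likelihood, and evidence values are assumptions chosen only to be consistent with that posterior:

```python
p_flu = 0.02                 # assumed prior P(flu)
p_headache_given_flu = 0.9   # assumed likelihood P(headache | flu)
p_headache = 0.0225          # assumed evidence probability P(headache)

# Posterior: P(flu | headache) = P(headache | flu) P(flu) / P(headache)
print(p_headache_given_flu * p_flu / p_headache)  # 0.8
```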

Independence
RVs A and B are independent if:
- P(A|B) = P(A)
- P(B|A) = P(B)
- P(A,B) = P(A)P(B)

RVs A and B are conditionally independent given C if:
- P(A|B,C) = P(A|C)
- P(B|A,C) = P(B|C)
- P(A,B|C) = P(A|C)P(B|C)
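Independence can be checked directly on a joint distribution; this sketch uses two fair coin flips, an assumed example:

```python
import itertools

# Joint distribution over (flip1, flip2): all four outcomes equally likely.
coin_joint = {pair: 0.25 for pair in itertools.product("HT", repeat=2)}

p_first_h = sum(p for (a, b), p in coin_joint.items() if a == "H")
p_second_h = sum(p for (a, b), p in coin_joint.items() if b == "H")

# P(A,B) = P(A) P(B) holds, so the flips are independent.
assert coin_joint[("H", "H")] == p_first_h * p_second_h  # 0.25 == 0.5 * 0.5
```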

Independence (cont'd)
Bayes' rule with multiple, independent evidence: assuming B and C are conditionally independent given A (the step replacing P(C|B) with P(C) additionally assumes B and C are marginally independent), Bayes' rule can be simplified as:
P(A|B,C) = P(A) P(B,C|A) / P(B,C)
         = P(A) P(B|A)P(C|A) / [P(B) P(C|B)]
         = P(A) P(B|A)P(C|A) / [P(B) P(C)]
         = P(A) * [P(B|A)/P(B)] * [P(C|A)/P(C)]

Bayes Rule Example


RVs: P = PickledLiver (disease), J = Jaundice (symptom), B = EyesBloodshot (symptom).
Given P(P) = 2^-17, P(J) = 2^-10, P(J|P) = 2^-3, P(B) = 2^-6, P(B|P) = 2^-1, and that J and B are independent, determine the likelihood that the patient has a PickledLiver:
- P(PickledLiver | Jaundice) = P(J|P) P(P) / P(J) = (2^-17 * 2^-3) / 2^-10 = 2^-10
- P(PickledLiver | Jaundice, Bloodshot) = P(P) P(J|P) P(B|P) / [P(J) P(B)] = 2^-10 * [2^-1 / 2^-6] = 2^-5
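Since every quantity is a power of two, the arithmetic can be verified exactly with Python's fractions module:

```python
from fractions import Fraction

p_P = Fraction(1, 2**17)          # prior P(PickledLiver)
p_J, p_J_given_P = Fraction(1, 2**10), Fraction(1, 2**3)
p_B, p_B_given_P = Fraction(1, 2**6), Fraction(1, 2**1)

# One piece of evidence: the posterior rises from 2^-17 to 2^-10.
assert p_J_given_P * p_P / p_J == Fraction(1, 2**10)

# Two independent pieces of evidence: the posterior rises to 2^-5.
assert p_P * p_J_given_P * p_B_given_P / (p_J * p_B) == Fraction(1, 2**5)
```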

Naive Bayes Classifier


Say we have a class/diagnosis/decision variable A. The goal is to find the value of A that is most likely given the evidence B, C, D, i.e., find a such that P(A=a | B,C,D) is maximized:

argmax_a P(A=a) P(B|A=a) P(C|A=a) P(D|A=a) / P(B,C,D)

P(B,C,D) is constant for all a, so it can be ignored. In general, for a class variable V and evidence variables X1, ..., Xn:

argmax_v P(V=v) Π_{i=1..n} P(Xi=xi | V=v)
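A minimal sketch of this decision rule; the classes (flu, cold), the symptoms, and all the probabilities below are assumed numbers chosen only to illustrate the argmax:

```python
prior = {"flu": 0.3, "cold": 0.7}           # P(V=v), assumed
likelihood = {                              # P(Xi=True | V=v), assumed
    "flu": {"headache": 0.8, "fever": 0.9},
    "cold": {"headache": 0.6, "fever": 0.2},
}

def classify(symptoms):
    """Return argmax_v P(V=v) * prod_i P(Xi=xi | V=v)."""
    def score(v):
        s = prior[v]
        for name, present in symptoms.items():
            p = likelihood[v][name]
            s *= p if present else (1 - p)
        return s
    return max(prior, key=score)

print(classify({"headache": True, "fever": True}))   # -> flu
print(classify({"headache": True, "fever": False}))  # -> cold
```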

Homework
Discrete Mathematics and Its Applications: study Chapter 1 and solve related examples/problems.
Quantitative Analysis for Management: study Chapters 1 & 2 and solve related examples/problems.
