
Pattern Recognition, Vol. 15, No. 3, pp. 201-216, 1982. Printed in Great Britain.

0031-3203/82/030201-16 $03.00/0. Pergamon Press Ltd. © 1982 Pattern Recognition Society.

SOME APPROACHES TO OPTIMAL CLUSTER LABELING WITH APPLICATIONS TO REMOTE SENSING


C. B. CHITTINENI

CONOCO Inc., 1000 South Pine, P.O. Box 1267, Ponca City, OK 74603, U.S.A.

(Received 20 November 1980; in revised form 5 May 1981; received for publication June 1981)
Abstract - This paper presents some approaches to the problem of labeling clusters using information from a given set of labeled and unlabeled patterns. Assigning class labels to the clusters is formulated as finding the best label assignment over all possible label assignments with respect to a criterion. Labeling clusters is also viewed as obtaining probabilities of class labels for the clusters, with the maximization of the likelihood function and of the probability of correct labeling as criteria. Closed form solutions are obtained for the probabilities of class labels to the clusters by maximizing a lower bound on the likelihood criterion. Fixed point iteration equations are developed for obtaining probabilities of class labels to the clusters. The problem of obtaining class labels for the clusters is further formulated as that of minimizing the variance of the proportion estimates of the classes, using both the given labeled and unlabeled patterns. Imperfections in the labels of the given labeled set are incorporated into the criteria. Furthermore, the results of applying these techniques to the processing of remotely sensed multispectral scanner imagery data are presented.

Key words: Aerospace imagery, Asymptotic variance, Cluster labeling, Label imperfections, Maximum likelihood, Mode, Probabilities of class labels, Probability of correct labeling, Probability of error, Remote sensing, Unlabeled patterns, Variance of proportion estimate.

1. INTRODUCTION

Recently, there has been considerable interest in the development of systems for the machine processing of remotely sensed data for predicting crop yields, monitoring crop conditions, inventorying natural resources, etc. The inherent classes in the data are usually multimodal, and clustering techniques(1,2) have been found to be effective in the analysis of remotely sensed data.(3) These usually break up the image into its inherent modes or clusters. In the prediction of crop yields and the inventory of natural resources, an important step in machine processing is the estimation of the proportion of the class of interest in the image. The proportion estimation may be done by either of the following methods:(4)

1. By classifying every pixel in the image using a classifier, such as the maximum likelihood classifier. The use of the maximum likelihood classifier with the clustering requires the labeling of the clusters for computing the probability density functions of the classes.
2. By estimating the proportion of the class of interest directly from the labeled clusters. It is observed(4) that the proportions estimated directly from the labeled clusters agree closely with the ones obtained from the classification of every pixel in the image.

Labeling the clusters is one of the crucial problems in the application of clustering techniques, either for the estimation of the proportion of the class of interest in the image or in the classification of imagery data.

There is considerable interest in cluster labeling in the statistical literature.(5) This problem is also common in labeling the regions, using segmentation algorithms, in the development of scene understanding systems. In the recent literature, relaxation labeling algorithms(6,7) have been proposed for labeling the segmented regions, but these use relational or spatial properties of the regions through compatibility coefficients. In cluster labeling, however, the relational properties of the clusters are either unavailable or not meaningful. For example, in aerospace imagery, the regions of interest are crops, nonagricultural areas, etc., and can be anywhere in the image; hence, it is difficult to define relational properties.

It is the purpose of this paper to address the problem of labeling the clusters using the information from a given set of labeled patterns. It is assumed that the probability density functions and a priori probabilities of the clusters or modes are given.


Let these respectively be p(X | Ω = l) and δ_l, l = 1, 2, ..., m, where m is the number of modes or clusters. It is also assumed that a set of labeled patterns X_i(j) with labels ω_i(j) = i, j = 1, 2, ..., N_i; i = 1, 2, ..., C, is given, where C is the number of classes. In remote sensing, the labels for these patterns are provided by an analyst interpreter (AI) who examines imagery films and uses other information such as historic information, crop calendar models, etc. Very often, the AI labels are imperfect. It is relatively expensive to acquire labels, and a large number of unlabeled patterns is usually available. Some approaches that use all the given information are presented in this paper for optimum cluster labeling.

This paper is organized as follows. In Section 2, the problem of obtaining optimum class labels for the modes is formulated as one of maximizing a likelihood criterion by exhaustive search over all possible label assignments. Section 3 considers the problem of obtaining probabilities of class labels for the clusters using a maximum likelihood criterion. A closed form solution that maximizes a lower bound on the criterion is presented in Section 3. Also, iteration equations are developed for obtaining the probabilities of class labels for the clusters. In Section 4, the probability of correct labeling is used as a criterion for obtaining probabilities of class labels for the modes. In Section 5, the variance of the class proportion estimates is proposed as a criterion that uses both the given labeled and unlabeled pattern sets for obtaining the probabilities of class labels for the modes. Imperfections in the labels of the given labeled set are considered in Section 6. Section 7 contains the experimental results in the processing of remotely sensed imagery data, and a concluding summary is given in Section 8. Fixed point iteration schemes for the probability of correct labeling criterion are presented in Appendix A.

2. LABEL ASSIGNMENT TO CLUSTERS BY EXHAUSTIVE SEARCH

If the clustering algorithm generates a relatively small number of clusters (in remote sensing, typically around 12), the optimal class label assignment for the clusters can easily be obtained by exhaustive search. By giving all possible class label assignments to the clusters and computing the value of the criterion for each assignment, the optimal class label assignment can be chosen as the one that extremizes the criterion. If the density functions of the modes are Gaussian, the criterion takes a relatively simple form (for example, if a clustering algorithm based on the maximum likelihood equations(2) is used to fit the Gaussian density functions for the modes).

In general, the class-conditional density functions are multimodal. Let C_i be the number of modes of class i, where

    \sum_{i=1}^{C} C_i = m.

By defining a criterion, the class label assignment to the modes that maximizes the criterion can be chosen as the optimal assignment. Let p_i(X) be the density function of class i, p_{ij}(X) be the density function of mode j of class i, q_{ij} be the a priori probability of mode j of class i, q_i be the a priori probability of class i, and p(X) be the mixture density function. Then we have the following relationships:

    \sum_{i=1}^{C} q_i = 1, \qquad \sum_{j=1}^{C_i} q_{ij} = 1,
    p_i(X) = \sum_{j=1}^{C_i} q_{ij} \, p_{ij}(X),
    p(X) = \sum_{i=1}^{C} q_i \, p_i(X).    (2-1)

Choosing the likelihood function as a criterion, the likelihood of occurrence of the given patterns with their corresponding labels can be expressed by the quantity L', where

    L' = \prod_{i=1}^{C} \prod_{j=1}^{N_i} p[X_i(j), \omega_i(j) = i]
       = \prod_{i=1}^{C} \prod_{j=1}^{N_i} p[X_i(j) \mid \omega_i(j) = i] \, P[\omega_i(j) = i].    (2-2)

Since the logarithm is a monotonic function of its argument, taking the logarithm of equation (2-2) results in

    L = \log L' = \sum_{i=1}^{C} \Big[ \sum_{j=1}^{N_i} \log p[X_i(j) \mid \omega = i] + N_i \log q_i \Big].    (2-3)

The a priori probability of mode j of class i, q_{ij}, may be estimated as follows. For a particular labeling assignment, let the modes 1, 2, ..., C_i belong to class i. Then

    q_{ij} = \delta_j \Big/ \sum_{r=1}^{C_i} \delta_r.    (2-4)

2.1 Case in which the number of modes is equal to the number of classes and the mode-conditional densities are Gaussian

Consider a simple case in which the number of classes is equal to the number of modes and the mode-conditional densities are Gaussian. That is,

    p(X \mid \Omega = l) \sim N(M_l, \Sigma_l).    (2-5)

The problem is to assign class labels to the modes such that the criterion L of equation (2-3) is maximized. For a particular assignment of labels, let the mode l be given the class label i. In the unimodal case, then

    p(X \mid \omega = i) = p(X \mid \Omega = l).    (2-6)

Let

    p(X \mid \omega = i) \sim N(M_i^c, \Sigma_i^c)

where M_i^c = M_l, Σ_i^c = Σ_l, and q_i = δ_l. Then, the log likelihood criterion becomes

    L = -\frac{n}{2}\Big(\sum_{i=1}^{C} N_i\Big)\log(2\pi) - \frac{1}{2}\sum_{i=1}^{C} N_i \log|\Sigma_i^c| + \sum_{i=1}^{C} N_i \log q_i - \frac{1}{2} L_1    (2-7)

where n is the dimensionality of the patterns and

    L_1 = \sum_{i=1}^{C} \sum_{j=1}^{N_i} [X_i(j) - M_i^c]^T (\Sigma_i^c)^{-1} [X_i(j) - M_i^c].

The maximization of L is equivalent to the minimization of L_1. That is, when the number of modes is equal to the number of classes, the assignment of class labels to the modes by the maximization of the criterion L is based on the smallest quadratic distance between the given labeled patterns of each class and the mode-conditional densities.

2.2 General case in which the number of modes is greater than the number of classes

In this case, the criterion L becomes

    L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log \Big[ \sum_{l=1}^{C_i} q_{il}^{d} \, p_{il}[X_i(j)] \Big] + \sum_{i=1}^{C} N_i \log q_i^{d}    (2-8)

where d refers to a particular assignment of class labels. The probabilities q_{il}^d and q_i^d are computed according to equation (2-4) for this label assignment. Equation (2-8) can be used as a criterion. However, for Gaussian densities, a simpler criterion can be obtained by using the fact that the logarithm is a convex upward function to derive a lower bound on L. Since the logarithm is a convex upward function, we have the inequality

    \log \Big[ \sum_{i} a_i \, g_i(X) \Big] \ge \sum_{i} a_i \log g_i(X)    (2-9)

where

    \sum_{i} a_i = 1 \quad \text{and} \quad a_i > 0.    (2-10)

Let the density function of the lth mode of class i be

    p_{il}(X) \sim N(M_{il}^{d}, \Sigma_{il}^{d}).    (2-11)

Using equations (2-9)-(2-11) in equation (2-8) yields

    L \ge -\frac{n}{2}\Big(\sum_{i=1}^{C} N_i\Big)\log(2\pi) - L_2    (2-12)

where

    L_2 = \frac{1}{2} \sum_{i=1}^{C} \sum_{l=1}^{C_i} q_{il}^{d} \Big\{ N_i \log|\Sigma_{il}^{d}| + \sum_{j=1}^{N_i} [X_i(j) - M_{il}^{d}]^T (\Sigma_{il}^{d})^{-1} [X_i(j) - M_{il}^{d}] \Big\} - \sum_{i=1}^{C} N_i \log q_i^{d}.    (2-13)

Thus, the optimal class label assignment can be chosen as the one that minimizes L_2. Combinatorial algorithms(8) can be used to efficiently generate all possible class label assignments for exhaustive search.
3. PROBABILISTIC CLUSTER LABELING BASED ON MAXIMUM LIKELIHOOD CRITERION

The last section addressed the problem of obtaining class labels for the clusters by exhaustive search. This section considers the problem of obtaining a probabilistic description for the class labels of the clusters. The criterion used is the likelihood function

    L' = \prod_{i=1}^{C} \prod_{j=1}^{N_i} p[\omega_i(j) = i \mid X_i(j)].    (3-1)

The mixture density p(X) can be written in terms of class-conditional densities as

    p(X) = \sum_{i=1}^{C} p(\omega = i) \, p(X \mid \omega = i).    (3-2)

The mixture density p(X) can also be written in terms of mode-conditional densities as

    p(X) = \sum_{l=1}^{m} P(\Omega = l) \, p(X \mid \Omega = l)
         = \sum_{l=1}^{m} \sum_{i=1}^{C} P(\omega = i, \Omega = l) \, p(X \mid \omega = i, \Omega = l).    (3-3)

On comparing equations (3-2) and (3-3), the following assumption is made:

    p(X \mid \omega = i) = \sum_{l=1}^{m} P(\Omega = l \mid \omega = i) \, p(X \mid \Omega = l).    (3-4)

Equation (3-4) can be rewritten as

    p(\omega = i \mid X) = \sum_{l=1}^{m} P(\omega = i \mid \Omega = l) \, p(\Omega = l \mid X).    (3-5)

Since the logarithm is a monotonic function of its argument, taking the logarithm of L' of equation (3-1) and using equation (3-5) yields the following:

    L = \log L' = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log \Big[ \sum_{l=1}^{m} \alpha_{il} \, p[\Omega = l \mid X_i(j)] \Big]    (3-6)

where α_il = P(ω = i | Ω = l) is the probability that the label of mode l is class i. The probabilities α_il satisfy the constraints given in equation (3-7):

    \alpha_{il} \ge 0; \quad i = 1, 2, ..., C \text{ and } l = 1, 2, ..., m,
    \sum_{i=1}^{C} \alpha_{il} = 1; \quad l = 1, 2, ..., m.    (3-7)

Closed form solutions for α_il that maximize L of equation (3-6), subject to the constraints of equation (3-7), seem to be difficult. The probabilities α_il can easily be obtained using optimization techniques such as Davidon-Fletcher-Powell.(9-11) The Davidon-Fletcher-Powell procedure, in conjunction with an exterior penalty function, very efficiently carries out the optimization of the performance function, subject to various constraints. In general, these constraints must be continuously differentiable functions of the parameters. The original likelihood function is augmented with functions of the constraints. The augmented likelihood function is penalized whenever the constraints are violated. For sufficiently large penalties, the unconstrained optimization of the augmented likelihood function can be shown to be equivalent to the original constrained optimization. The following fixed point iteration equation for the solution of the above optimization problem can easily be obtained by introducing Lagrangian multipliers:

    \alpha_{il} = \frac{\sum_{j=1}^{N_i} d_{ilj}}{\sum_{i=1}^{C} \sum_{j=1}^{N_i} d_{ilj}}    (3-8)

where

    d_{ilj} = \frac{\alpha_{il} \, p[\Omega = l \mid X_i(j)]}{\sum_{r=1}^{m} \alpha_{ir} \, p[\Omega = r \mid X_i(j)]}.    (3-9)

Equations (3-8) and (3-9) are similar to the maximum likelihood equations(12,13) in the estimation of the parameters of a mixture of Gaussian densities. However, closed form solutions for α_il can be obtained with the maximization of a lower bound on L as the criterion. Using the inequality (2-9) in equation (3-6), a lower bound on the log likelihood function L can be obtained as

    L \ge \sum_{i=1}^{C} \sum_{j=1}^{N_i} \sum_{l=1}^{m} p[\Omega = l \mid X_i(j)] \, \log \alpha_{il}.    (3-10)

Introducing the Lagrangian multipliers, the probabilities α_il that maximize the lower bound of equation (3-10), subject to the constraints of equation (3-7), can easily be shown to be

    \alpha_{il} = \frac{N_i \, e_{il}}{\sum_{r=1}^{C} N_r \, e_{rl}}    (3-11)

where

    e_{il} = \frac{1}{N_i} \sum_{j=1}^{N_i} p[\Omega = l \mid X_i(j)].    (3-12)

This solution simply states that the probability of the ith class label for a given cluster l is the ratio of the a posteriori probability of cluster l given the labeled patterns from class i to the sum over all classes of the a posteriori probabilities of cluster l given the labeled patterns from each class.

Having obtained α_il, the proportion q_i of class i can be estimated as follows. Consider

    q_i = \sum_{l=1}^{m} P(\omega = i, \Omega = l) = \sum_{l=1}^{m} \delta_l \, \alpha_{il}.

Hence q̂_i, the estimate of q_i, can be computed from the following:

    \hat{q}_i = \sum_{l=1}^{m} \delta_l \, \alpha_{il}.    (3-13)
4. CLUSTER LABELING BASED ON THE CRITERION OF PROBABILITY OF CORRECT LABELING

If the class-conditional densities are known, the a posteriori probabilities of the classes can be expressed as functions of the pattern X and the a priori probabilities. Since the label of the pattern X_i(j) is i, for particular class-conditional densities and a priori probabilities, p[ω = i | X_i(j)] is the probability with which the pattern X_i(j) is correctly recognized. Let p_ii be the probability that the pattern comes from class i and is assigned to class i for particular class-conditional densities and a priori probabilities. Similarly, let p_il be the probability with which the pattern comes from class i and is assigned to class l. Then these probabilities can be expressed as

    p_{ii} = P(\omega = i) \int p(\omega = i \mid X) \, p(X \mid \omega = i) \, dX = P(\omega = i) \, E_i[p(\omega = i \mid X)]    (4-1)

and

    p_{il} = P(\omega = i) \, E_i[p(\omega = l \mid X)]    (4-2)

where E_i denotes the expectation with respect to p(X | ω = i). The probability of correct labeling, or the probability with which a pattern comes from a particular class and is assigned to the same class, is

    P_s = \sum_{i=1}^{C} p_{ii}.    (4-3)

The error probability, or the probability with which the pattern comes from a particular class and is assigned to some other class, is

    P_e = \sum_{i=1}^{C} \sum_{\substack{l=1 \\ l \ne i}}^{C} p_{il}.    (4-4)

From equations (4-1)-(4-4), it is easily seen that P_s + P_e = 1.

4.1 Estimation of optimal α_il

It is observed that equations (4-1) and (4-3) are based on treating the a posteriori probabilities of the classes as continuous variables, and they differ from the usual estimates based on counts. From equations (4-1) and (4-3),

    P_s = \sum_{i=1}^{C} q_i \, E_i[p(\omega = i \mid X)].    (4-5)

The probability P_s can be estimated from the given labeled pattern set as

    \hat{P}_s = \sum_{i=1}^{C} q_i \, \hat{u}_i    (4-6)

where

    \hat{u}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} r_i(j)    (4-7)

and r_i(j) = p[ω = i | X_i(j)].

The following analysis shows that the estimate for P_s of equation (4-6) has less variance than the estimate based on counting the classification errors. The estimate of equation (4-6) is unbiased. That is,

    E(\hat{P}_s) = \sum_{i=1}^{C} q_i \, E(\hat{u}_i) = \sum_{i=1}^{C} q_i u_i = P_s.    (4-8)

Assuming the patterns are independent, an expression for the variance of P̂_s can be obtained as follows. Consider

    Var(\hat{P}_s) = E[(\hat{P}_s - P_s)^2]
    = \sum_{i=1}^{C} q_i^2 \frac{1}{N_i^2} \sum_{j=1}^{N_i} \sum_{k=1}^{N_i} E\{[r_i(j) - u_i][r_i(k) - u_i]\}
    = \sum_{i=1}^{C} \frac{q_i^2}{N_i} \big( E\{[r_i(j)]^2\} - u_i^2 \big).    (4-9)

But we have the relationship

    0 \le r_i(j) \le 1.    (4-10)

Using equation (4-10) in equation (4-9) yields

    Var(\hat{P}_s) \le \sum_{i=1}^{C} \frac{q_i^2}{N_i} \{E[r_i(j)] - u_i^2\} = \sum_{i=1}^{C} \frac{q_i^2}{N_i} u_i (1 - u_i).    (4-11)

Hence, the variance of P̂_s is less than the variance of the estimate based on counts of correctly classified patterns.(15) The criterion of either the maximization of P_s or the minimization of P_e can be used to obtain a probabilistic class label assignment for the clusters. Using the relationship of equation (3-5) between the a posteriori probabilities of classes and clusters, in terms of the probabilities α_il, in equation (4-6) results in

    \hat{P}_s = \sum_{i=1}^{C} P(\omega = i) \frac{1}{N_i} \sum_{j=1}^{N_i} \sum_{l=1}^{m} \alpha_{il} \, p[\Omega = l \mid X_i(j)]
    = \sum_{i=1}^{C} q_i \sum_{l=1}^{m} \alpha_{il} \, e_{il}    (4-12)

where

    e_{il} = \frac{1}{N_i} \sum_{j=1}^{N_i} p[\Omega = l \mid X_i(j)].    (4-13)

The probabilities q_i and α_il are related as follows:

    q_i = \sum_{l=1}^{m} \alpha_{il} \, \delta_l; \quad i = 1, 2, ..., C.    (4-14)

Now the problem of estimation of the proportions and the probabilities of class labels for the clusters can be formulated as follows. Find q_i, i = 1, 2, ..., C, and α_il, i = 1, 2, ..., C; l = 1, 2, ..., m, such that P̂_s is maximized, where

    \hat{P}_s = \sum_{i=1}^{C} q_i \sum_{l=1}^{m} \alpha_{il} \, e_{il},    (4-15)

subject to the constraints

    \sum_{i=1}^{C} q_i = 1,
    \sum_{i=1}^{C} \alpha_{il} = 1; \quad l = 1, 2, ..., m,
    \sum_{l=1}^{m} \alpha_{il} \, \delta_l = q_i; \quad i = 1, 2, ..., C,
    q_i \ge 0; \quad i = 1, 2, ..., C,
    \alpha_{il} \ge 0; \quad i = 1, 2, ..., C; \; l = 1, 2, ..., m.    (4-16)

Comparing equations (3-6) and (4-15), it is seen that q_i now enters directly into the problem. Optimization techniques such as Davidon-Fletcher-Powell(9-11) can easily be used to solve the above problem.
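The paper solves this with the Davidon-Fletcher-Powell procedure and an exterior penalty function; as a rough modern stand-in (an assumption, not the author's implementation), SciPy's SLSQP solver handles the constraints of equation (4-16) directly. Optimizing over α alone and setting q_i = Σ_l α_il δ_l keeps the coupling constraint of equation (4-14) satisfied by construction, and Σ_i q_i = 1 follows automatically because the columns of α and the δ_l each sum to one:

```python
import numpy as np
from scipy.optimize import minimize

def maximize_correct_labeling(e, delta):
    """Maximize P_s of eq. (4-15) over alpha, with q = alpha @ delta (eq. 4-14).

    e     : (C, m) matrix of e_il values, eq. (4-13)
    delta : (m,) a priori probabilities of the clusters
    """
    C, m = e.shape

    def neg_Ps(x):
        a = x.reshape(C, m)
        q = a @ delta                              # q_i of eq. (4-14)
        return -np.sum(q * (a * e).sum(axis=1))    # -P_s of eq. (4-15)

    # each cluster's label probabilities must sum to one, eqs. (3-7)/(4-16)
    cons = [{"type": "eq", "fun": lambda x, l=l: x.reshape(C, m)[:, l].sum() - 1.0}
            for l in range(m)]
    x0 = np.full(C * m, 1.0 / C)                   # uniform starting point
    res = minimize(neg_Ps, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * (C * m), constraints=cons)
    return res.x.reshape(C, m), -res.fun
```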


4.2 A relationship between maximum likelihood and probability of correct labeling criteria

Since the logarithm is a monotonic function of its argument, taking the logarithm of equation (4-12) and using the inequality of equation (2-9) twice yields equation (4-17):

    \log(\hat{P}_s) \ge \sum_{i=1}^{C} q_i \log \Big\{ \frac{1}{N_i} \sum_{j=1}^{N_i} \sum_{l=1}^{m} \alpha_{il} \, p[\Omega = l \mid X_i(j)] \Big\}
    \ge \sum_{i=1}^{C} \frac{q_i}{N_i} \sum_{j=1}^{N_i} \log \Big[ \sum_{l=1}^{m} \alpha_{il} \, p[\Omega = l \mid X_i(j)] \Big].    (4-17)

Substituting the sample estimates q̂_i = N_i / N for q_i, where N = Σ_i N_i, in equation (4-17) results in the following:

    \log(\hat{P}_s) \ge \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log \Big[ \sum_{l=1}^{m} \alpha_{il} \, p[\Omega = l \mid X_i(j)] \Big].    (4-18)

From equation (4-18), it is seen that the maximum likelihood criterion provides a lower bound on the probability of correct labeling criterion.

5. USE OF LABELED AND UNLABELED PATTERNS FOR PROBABILISTIC CLUSTER LABELING

One of the important objectives in the processing of remotely sensed imagery data is to estimate the proportions of the classes of interest. Ideally, these estimates should be unbiased and of minimum variance. It is the purpose of this section to develop a scheme that uses both the given labeled and unlabeled patterns for obtaining the probabilities of class labels for the clusters by minimizing the variance of the proportion estimates. It is assumed that we are given a set of labeled patterns X_i(j) with ω_i(j) = i, j = 1, 2, ..., N_i; i = 1, 2, ..., C, and a set of unlabeled patterns Z_i, i = 1, 2, ..., N. Let N_T be the total number of labeled and unlabeled patterns. That is,

    N_T = \sum_{i=1}^{C} N_i + N.

Let Y_i, i = 1, 2, ..., N_T, be the given labeled and unlabeled patterns. Let the Bayes classifier be used to classify the given labeled and unlabeled pattern sets. For particular class-conditional densities and a priori probabilities, let the resulting confusion matrix of the given labeled pattern set and the classification matrix of the unlabeled pattern set be as shown in Table 1. Let ω be the given label and ω_c be the classifier label. Let λ_ij = P(ω = i | ω_c = j) be the probability that the true label is i, given that the classifier label is j. Let P_ij = P(ω = i, ω_c = j) be the probability that the true label of the pattern is i and the classifier label is j. Let P_c(i) = P(ω_c = i) be the probability that the classifier classifies a pattern into class i, and let q_i = P(ω = i) be the a priori probability of class i. Then we obtain

    P_{ij} = P(\omega = i, \omega_c = j) = P(\omega_c = j) \, P(\omega = i \mid \omega_c = j) = P_c(j) \, \lambda_{ij}.    (5-1)

Since each classification is independent, the likelihood function of the observed n's and V's can be written as

    L = K \prod_{i=1}^{C} \prod_{j=1}^{C} (\lambda_{ij})^{n_{ij}} \prod_{j=1}^{C} [P_c(j)]^{V_j + n_{.j}}    (5-2)

where K is a constant. The values of P_c(j) and λ_ij which maximize L, subject to the probability constraints on P_c(j) and λ_ij, can be shown to be(16,17)

    \hat{P}_c(j) = \frac{n_{.j} + V_j}{\sum_{l=1}^{C} (n_{.l} + V_l)}    (5-3)

and

    \hat{\lambda}_{ij} = \frac{n_{ij}}{n_{.j}}.    (5-4)

An estimate q̂_i for the proportion q_i may be obtained as follows:

    q_i = P(\omega = i) = \sum_{j=1}^{C} P(\omega = i, \omega_c = j) = \sum_{j=1}^{C} P_c(j) \, \lambda_{ij}.    (5-5)

From equations (5-3)-(5-5), the following is obtained:

    \hat{q}_i = \frac{\sum_{j=1}^{C} \dfrac{n_{ij}}{n_{.j}} (n_{.j} + V_j)}{\sum_{l=1}^{C} (n_{.l} + V_l)}.    (5-6)

The estimate of equation (5-6) can be interpreted as follows. The ratio (n_ij / n_.j) gives the proportion of patterns truly belonging to class i among the patterns classified into class j. Multiplying this ratio by (n_.j + V_j) and summing from 1 to C gives an estimate of the number of patterns of class i among the patterns classified into all the classes. Dividing this by the total number of patterns gives an estimate for the proportion of class i. It can be shown(16,17) that the estimate of equation (5-6) is asymptotically unbiased,

    E(\hat{q}_i) \to q_i,    (5-7)

and that its asymptotic variance is given by the following:

    Var(\hat{q}_i) = \frac{1}{n} \sum_{j=1}^{C} \lambda_{ij} (1 - \lambda_{ij}) \, P_c(j) + \frac{N_T - n}{N_T \, n} \Big[ \sum_{j=1}^{C} P_c(j) \, \lambda_{ij}^2 - q_i^2 \Big].    (5-8)

Table 1. Classifications of labeled and unlabeled pattern sets

(a) Confusion matrix of the labeled pattern set

                           Classifier label
    True label        1      2      ...    C      Number belonging to each class
    1                 n_11   n_12   ...    n_1C   n_1. = N_1
    2                 n_21   n_22   ...    n_2C   n_2. = N_2
    ...               ...    ...    ...    ...    ...
    C                 n_C1   n_C2   ...    n_CC   n_C. = N_C
    Number classified
    into each class   n_.1   n_.2   ...    n_.C   n = n_..

(b) Matrix of classifications of the unlabeled set

    Classifier label  1      2      ...    C
                      V_1    V_2    ...    V_C

where
n_ij = number of labeled patterns for which the true or given label is i and the classifier label is j;
C = number of classes;
n_i. = \sum_{j=1}^{C} n_{ij};
n_.j = \sum_{i=1}^{C} n_{ij};
n = n_.. = \sum_{i=1}^{C} \sum_{j=1}^{C} n_{ij}, the total number of labeled patterns;
V_j = number of unlabeled patterns for which the classifier label is j.
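As an illustration of equations (5-3)-(5-6), a small numpy sketch (function name and example counts are assumed) that turns the counts of Table 1 into the proportion estimate:

```python
import numpy as np

def proportion_estimate(n_conf, v):
    """Proportion estimate of eq. (5-6) from the counts of Table 1.

    n_conf : (C, C) confusion matrix, n_conf[i, j] = n_ij (true label i,
             classifier label j) on the labeled set
    v      : (C,) counts V_j of unlabeled patterns classified into class j
    """
    n_dot_j = n_conf.sum(axis=0)             # n_.j
    N_T = n_conf.sum() + v.sum()             # total labeled + unlabeled patterns
    lam = n_conf / n_dot_j                   # lambda_ij = n_ij / n_.j, eq. (5-4)
    return lam @ (n_dot_j + v) / N_T         # eq. (5-6)

# Hypothetical example: two classes, 90 labeled and 60 unlabeled patterns
# n_conf = np.array([[40, 10], [5, 35]]); v = np.array([25, 35])
# proportion_estimate(n_conf, v)
```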

For particular a priori probabilities and class-conditional densities, the probabilities P_c(j) and λ_ij may be expressed as

    P_c(j) = \int p(\omega_c = j \mid X) \, p(X) \, dX    (5-9)

and

    \lambda_{ij} = P(\omega = i \mid \omega_c = j) = \frac{P(\omega = i, \omega_c = j)}{\sum_{i=1}^{C} P(\omega = i, \omega_c = j)}
    = \frac{q_i \int p(\omega_c = j \mid X) \, p(X \mid \omega = i) \, dX}{\sum_{i=1}^{C} q_i \int p(\omega_c = j \mid X) \, p(X \mid \omega = i) \, dX}.    (5-10)

Using the given labeled and unlabeled patterns, estimates for P_c(j) and λ_ij can be obtained from equations (3-5), (5-9) and (5-10) as follows:

    \hat{P}_c(j) = \frac{1}{N_T} \sum_{i=1}^{N_T} p(\omega = j \mid Y_i) = \sum_{s=1}^{m} \alpha_{js} \, \bar{e}_s,    (5-11)

where

    \bar{e}_s = \frac{1}{N_T} \sum_{i=1}^{N_T} p(\Omega = s \mid Y_i),

and

    \hat{\lambda}_{ij} = \frac{q_i \sum_{s=1}^{m} \alpha_{js} \, e_{is}}{\sum_{r=1}^{C} q_r \sum_{s=1}^{m} \alpha_{js} \, e_{rs}}    (5-12)

where e_is is as defined in equation (4-13).
Using equations (5-11) and (5-12) in equation (5-5) yields

    \hat{q}_i = \sum_{j=1}^{C} \hat{P}_c(j) \, \hat{\lambda}_{ij}; \quad i = 1, 2, ..., C.    (5-13)

Let S be the sum of the asymptotic variances of the proportion estimates. From equations (5-8), (5-11) and (5-12), the following estimate for S is obtained:

    \hat{S} = \sum_{i=1}^{C} Var(\hat{q}_i) = \frac{1}{n} \Big[ 1 - \sum_{i=1}^{C} \sum_{j=1}^{C} \hat{\lambda}_{ij}^2 \, \hat{P}_c(j) \Big] + \frac{N_T - n}{N_T \, n} \sum_{i=1}^{C} \Big[ \sum_{j=1}^{C} \hat{P}_c(j) \, \hat{\lambda}_{ij}^2 - \hat{q}_i^2 \Big].    (5-14)

Now the problem of obtaining a probabilistic class label assignment for the clusters may be formulated as follows. Find α_sj, s = 1, 2, ..., m; j = 1, 2, ..., C, and q_i, i = 1, 2, ..., C, such that Ŝ of equation (5-14) is minimized, subject to the constraints of equations (4-16) and (5-13). Optimization techniques such as Davidon-Fletcher-Powell(9-11) can easily be used to solve the above optimization problem.

6. FORMULATION WITH LABEL IMPERFECTIONS OF THE GIVEN LABELED SET

In practice, such as in the classification of remotely sensed multispectral scanner imagery data, it is difficult and expensive to obtain labels for the pattern set. The labels for the patterns are usually provided by an analyst interpreter on examining imagery films and using some other information such as historic information, crop calendar models, etc. These labels are very often imperfect. Recently, Chittineni(16,18,19) investigated techniques for estimating the probabilities of label imperfections. Once the probabilities of label imperfections are known, they can be used in obtaining the class labels for the clusters through their densities and proportions.

Let ω and ω' be the perfect and imperfect labels, respectively, each of which takes values 1, 2, ..., C. The imperfections in the labels are described by the probabilities

    \beta_{ji} = P(\omega' = i \mid \omega = j)    (6-1)

where

    \sum_{i=1}^{C} \beta_{ji} = 1.    (6-2)

The a priori probabilities, the class-conditional densities, and the a posteriori probabilities of the classes with and without imperfections in the labels are related in terms of the probabilities of label imperfections as(18)

    P(\omega' = i) = \sum_{j=1}^{C} \beta_{ji} \, P(\omega = j),    (6-3)

    P(\omega' = i) \, p(X \mid \omega' = i) = \sum_{j=1}^{C} \beta_{ji} \, P(\omega = j) \, p(X \mid \omega = j),    (6-4)

    p(\omega' = i \mid X) = \sum_{j=1}^{C} \beta_{ji} \, p(\omega = j \mid X),    (6-5)

where it is assumed that

    p(X \mid \omega = j) = p(X \mid \omega' = i, \omega = j).    (6-6)

Let β be the matrix of probabilities of label imperfections, where

    \beta = [\beta_{ij}],    (6-7)

and let

    \nu = (\beta^T)^{-1}.    (6-8)

Using equations (6-7) and (6-8) and inverting equation (6-4) results in

    P(\omega = i) \, p(X \mid \omega = i) = \sum_{j=1}^{C} \nu_{ij} \, P(\omega' = j) \, p(X \mid \omega' = j).    (6-9)

Using these relationships, the criteria developed in the previous sections for labeling the clusters can be reformulated to take the imperfections in the labels into account, once the β_ji are known or estimated. In the following, it is assumed that the probabilities of label imperfections β_ji are available.

6.1 Maximum likelihood criterion with imperfections in the labels

The log of the likelihood function of equation (3-1) with imperfections in the labels can be written as

    L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log \{ p[\omega'_i(j) = i \mid X_i(j)] \}.    (6-10)

Using equations (3-5) and (6-5) in equation (6-10) yields

    L = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log \Big\{ \sum_{l=1}^{C} \beta_{li} \, p[\omega = l \mid X_i(j)] \Big\}
    = \sum_{i=1}^{C} \sum_{j=1}^{N_i} \log \Big\{ \sum_{l=1}^{C} \sum_{r=1}^{m} \beta_{li} \, \alpha_{lr} \, p[\Omega = r \mid X_i(j)] \Big\}.    (6-11)

For given β_li, the problem of estimating α_lr can be formulated as follows. Find α_lr, l = 1, 2, ..., C; r = 1, 2, ..., m, such that L of equation (6-11) is maximized, subject to the constraints of equation (3-7). Closed form solutions for the above optimization problem seem to be difficult. However, the following fixed point iteration scheme, similar to equation (3-8), can easily be obtained by introducing the Lagrangian multipliers:

    \alpha_{lr} = \frac{\sum_{i=1}^{C} \sum_{j=1}^{N_i} d_{ilrj}}{\sum_{l=1}^{C} \sum_{i=1}^{C} \sum_{j=1}^{N_i} d_{ilrj}}    (6-12)

where

    d_{ilrj} = \frac{\beta_{li} \, \alpha_{lr} \, p[\Omega = r \mid X_i(j)]}{\sum_{s=1}^{C} \sum_{k=1}^{m} \beta_{si} \, \alpha_{sk} \, p[\Omega = k \mid X_i(j)]}.    (6-13)

Also, optimization methods such as Davidon-Fletcher-Powell(9-11) can easily be used to solve the above optimization problem.

6.2 The criterion of probability of correct labeling with label imperfections

In Section 4, the probability of correct labeling was proposed as a criterion for obtaining the probabilities of class labels for the clusters. From equations (4-1), (4-3) and (6-9), we obtain

    P_s = \sum_{i=1}^{C} \int [p(\omega = i \mid X)] \, [P(\omega = i) \, p(X \mid \omega = i)] \, dX
    = \sum_{i=1}^{C} \sum_{j=1}^{C} \nu_{ij} \, P(\omega' = j) \int p(\omega = i \mid X) \, p(X \mid \omega' = j) \, dX.    (6-14)

An estimate for P_s can be obtained in terms of the given, imperfectly labeled patterns as

    \hat{P}_s = \sum_{i=1}^{C} \sum_{j=1}^{C} \nu_{ij} \, P(\omega' = j) \Big[ \frac{1}{N_j} \sum_{s=1}^{N_j} p[\omega = i \mid X_j(s)] \Big].    (6-15)

Using equations (3-5) and (6-3) in equation (6-15) yields

    \hat{P}_s = \sum_{i=1}^{C} \sum_{j=1}^{C} \nu_{ij} \Big[ \sum_{a=1}^{C} \beta_{aj} \, q_a \Big] \Big\{ \frac{1}{N_j} \sum_{s=1}^{N_j} \sum_{r=1}^{m} \alpha_{ir} \, p[\Omega = r \mid X_j(s)] \Big\}
    = \sum_{s=1}^{C} q_s \sum_{r=1}^{m} \sum_{i=1}^{C} \alpha_{ir} \, b_{rsi}    (6-16)

where

    b_{rsi} = \sum_{j=1}^{C} \beta_{sj} \, \nu_{ij} \, \bar{e}_{jr}, \qquad \bar{e}_{jr} = \frac{1}{N_j} \sum_{k=1}^{N_j} p[\Omega = r \mid X_j(k)].    (6-17)

Now the problem can be formulated as follows. Find α_ir, i = 1, 2, ..., C; r = 1, 2, ..., m, and q_s, s = 1, 2, ..., C, such that P̂_s of equation (6-16) is maximized, subject to the constraints of equation (4-16). Optimization techniques such as Davidon-Fletcher-Powell(9-11) can easily be used to solve the above optimization problem.

6.3 The criterion of variance of proportion estimate with imperfections in the labels

The probabilities of label imperfections can be taken into account in estimating the probabilities of class labels for the modes through the probabilities λ_ij. Using equation (6-9) in equation (5-10) yields

    \lambda_{ij} = \frac{\sum_{l=1}^{C} \nu_{il} \, P(\omega' = l) \int p(\omega_c = j \mid X) \, p(X \mid \omega' = l) \, dX}{\sum_{i=1}^{C} \sum_{l=1}^{C} \nu_{il} \, P(\omega' = l) \int p(\omega_c = j \mid X) \, p(X \mid \omega' = l) \, dX}.    (6-18)

An estimate for λ_ij can be obtained from the given, imperfectly labeled patterns as

    \hat{\lambda}_{ij} = \frac{\sum_{l=1}^{C} \nu_{il} \, P(\omega' = l) \, \frac{1}{N_l} \sum_{s=1}^{N_l} p[\omega_c = j \mid X_l(s)]}{\sum_{i=1}^{C} \sum_{l=1}^{C} \nu_{il} \, P(\omega' = l) \, \frac{1}{N_l} \sum_{s=1}^{N_l} p[\omega_c = j \mid X_l(s)]}.    (6-19)

Using equations (6-3), (3-5), (6-19) and (5-11) in equation (5-8) yields a criterion similar to equation (5-14).
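Before turning to the experiments, here is a minimal sketch (function name assumed) of the inversion in equations (6-3) and (6-8) that underlies all three reformulations of this section:

```python
import numpy as np

def invert_label_imperfections(beta, q_imperfect):
    """Recover perfect-label priors from imperfect ones via eqs. (6-3), (6-8).

    beta        : (C, C) matrix with beta[j, i] = P(omega' = i | omega = j)
    q_imperfect : (C,) a priori probabilities of the imperfect labels
    Returns the a priori probabilities of the perfect labels.
    """
    nu = np.linalg.inv(beta.T)      # nu = (beta^T)^{-1}, eq. (6-8)
    # inverts P(omega' = i) = sum_j beta_ji P(omega = j), eq. (6-3);
    # as noted in Appendix A, some entries of nu may be negative
    return nu @ q_imperfect
```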

7. EXPERIMENTAL RESULTS

This section presents some results obtained in the processing of remotely sensed Landsat multispectral scanner (MSS) imagery data. The objective of the processing is to estimate the proportion of the class of interest in each image. Several segments were processed in the following manner. (A segment is a 9 × 11 km or 5 × 6 nautical mile area for which the MSS image is divided into a rectangular array of pixels, 117 rows by 196 columns.) The image is overlaid with a rectangular grid of 209 grid intersections. Ground truth labels, or true labels, of the pixels or dots corresponding to each grid intersection are acquired. Also, for a subset of the pixels of the 209 grid intersections, the labels are provided by an analyst interpreter (AI) on examining the imagery films and using information such as historic information, crop calendar models, etc. These are imperfect labels. There are two classes in the image. Class 1 is wheat and class 2 is nonwheat, designated "other". The class of interest is wheat. Several acquisitions are used for each segment. The number of features or the number of channels used for each segment is listed in Table 2. The Gaussian mode-conditional densities and a priori probabilities of the inherent modes in the data of each segment are obtained using a maximum likelihood clustering algorithm.(2,20)

Table 2. Estimation of proportion of class 1 with maximum likelihood criterion. (For each of the nine segments: location; number of labeled patterns; number of features d and clusters m; the β-matrix computed by comparing AI and ground truth (GT) labels; the proportion estimates from the closed form solution and from the iterative scheme, using AI labels, GT labels, and AI labels with imperfections taken into account (initial α_il = 0.5); the GT proportion; and the bias and MSE of each estimator.)

Table 3. Estimation of proportion of class 1 by exhaustive search. (For each segment: location; number of labeled patterns; the proportion estimates under the maximum likelihood criterion and under the probability of correct labeling criterion, each with AI labels and with GT labels; the ground truth proportion; and the bias and MSE of each estimator.)

The number of modes generated for each segment is listed in Table 2. The probabilities of label imperfections of the AI labels, or the β-matrix, are estimated for each segment and are listed in Table 2. The theory developed in Section 3 is applied in estimating the probabilities of class labels for the modes of each segment, using AI labeled patterns and using ground truth labeled patterns. The proportion of class 1, the class of interest, is estimated for each segment using equation (3-13) and listed in Table 2. The theory developed in Section 6.1 is used with the AI labeled patterns and the corresponding β-matrix in estimating the probabilities of class labels for the modes. Equation (3-13) is then used in estimating the proportion of class 1 for each segment, and the results are listed in Table 2. From Table 2, it is observed that the estimates obtained with the closed form solution of equation (3-11) for the probabilities of class labels to the modes are in close agreement with the ones obtained using the fixed point iteration scheme. Better proportion estimates are obtained by taking the imperfections in the AI labels into account through the β-matrix than by estimating the proportions directly using AI labeled patterns.

The estimated proportions of class 1 by exhaustive search, using the maximum likelihood criterion and the maximization of the probability of correct labeling criterion with both the AI and ground truth labels, are listed in Table 3. The estimated proportions of class 1 from the given, randomly labeled patterns are listed in Table 4 for all the processed segments. On comparison of Tables 2, 3 and 4, it is seen that there is improvement in the estimates through machine processing.

For all the segments, the method developed in reference 19 is used to estimate the probabilities of label imperfections of the AI labels. The number of labeled patterns used for each segment is listed in Table 5, and the number of unlabeled patterns used is 836. The estimated labeling accuracies and the proportion estimates obtained using the maximum likelihood criterion to label the clusters with these probabilities of label imperfections are listed in Table 5. From Table 5, it is seen that there is considerable improvement in the proportion estimates with the use of the estimated β-matrix over those obtained directly using imperfectly labeled patterns.

8. CONCLUDING SUMMARY

In the classification of imagery data, such as in the machine processing of remotely sensed multispectral scanner data, unsupervised classification techniques have been found to be effective. Clustering techniques break up the image into its inherent modes. One of the crucial problems in the machine classification of imagery data is to label these clusters. This paper addressed the problem of labeling the modes and proposed various techniques. It is assumed that the a priori probabilities of the modes and the mode-conditional probability densities are available. It is also assumed that a set of labeled patterns from the classes of the data and a set of unlabeled patterns are given. The labels of these patterns might be imperfect.

Using the given labeled pattern set, the problem of assigning the class labels to the modes is formulated as a combinatorial problem. If the number of modes is small, the best assignment of class labels to the modes can easily be obtained by exhaustive search, using a criterion such as maximum likelihood. The problem is also formulated as that of obtaining a probabilistic class label assignment to the modes, using the maximization of either the likelihood function or the probability of correct labeling as a criterion. A closed form solution is obtained for the probabilities of class labels to the modes with the maximization of a lower bound on the likelihood function as the criterion. In the processing of remotely sensed data, one of the important objectives is to estimate the proportion of the class of interest.

Table 4. Estimation of proportion of class 1 from randomly labeled patterns

               Number of          Proportion of class 1 estimated from labeled patterns
    Segment    labeled patterns   AI labels       GT labels       GT proportion
    1005*      97                 0.2061          0.3368          0.348
    1060*      106                0.1600          0.2830          0.231
    1231*      96                 0.7395          0.7604          0.744
    1520†      91                 0.2197          0.2637          0.301
    1604†      101                0.3069          0.4950          0.524
    1675†      107                0.0934          0.2897          0.291
    1805†      144                0.1002          0.1389          0.164
    1853*      91                 0.2637          0.2857          0.306
    1899†      95                 0.6316          0.6484          0.596
    Bias                          0.86611E-01     0.37778E-03
    MSE                           0.13840E-01     0.10134E-02

* Segments in which class 1 is winter wheat.
† Segments in which class 1 is spring wheat.

Table 5. Estimated labeling errors and proportion estimates. (For each segment: location; the number of AI labeled patterns in the wheat and "other" classes; the β-matrix estimated using the method developed in reference 19; the β-matrix computed by comparing AI and GT labels; the proportion estimate P_1 obtained using the estimated β-matrix; the proportion estimate P_1 obtained directly with the AI labels; the ground truth proportion; and the bias and MSE of each estimator.)


Using the given labeled and unlabeled patterns, the problem of obtaining class labels for the modes is formulated as that of minimizing the variance of the proportion estimates of the classes. The criteria of maximum likelihood, maximization of the probability of correct labeling, and minimization of the variance of the proportion estimates are reformulated to take into account label imperfections in the given labeled set, for known probabilities of label imperfections. Furthermore, experimental results in the processing of remotely sensed multispectral scanner imagery data are presented.

REFERENCES

1. B. Everitt, Cluster Analysis, Wiley, New York (1974).
2. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York (1973).
3. R. P. Heydorn, Methods for segment wheat-area estimation, Proceedings of the LACIE Symposium, NASA/JSC, Houston, Texas (1978).
4. J. G. Carnes, Detailed Analysis of CAMS Procedures for Phase III Using Ground-Truth Inventories, Lockheed Tech. Memo LEC-13343, NASA/JSC, Houston, April (1979).
5. W. G. Cochran, Sampling Techniques, Wiley, New York (1953).
6. A. Rosenfeld, R. Hummel and S. W. Zucker, Scene labeling by relaxation operations, IEEE Trans. Systems, Man, and Cybernetics SMC-6, 420-433, June (1976).
7. S. W. Zucker, E. V. Krishnamurthy and R. L. Haar, Relaxation processes for scene labeling: convergence, speed and stability, IEEE Trans. Systems, Man, and Cybernetics SMC-8, 41-48, January (1978).
8. E. F. Beckenbach, Applied Combinatorial Mathematics, Wiley, New York (1964).
9. R. Fletcher and M. J. D. Powell, A rapidly convergent descent method for minimization, Computer J. 6, 163-168 (1963).
10. I. L. Johnson, The Davidon-Fletcher-Powell penalty function method: a generalized iterative technique for solving parameter optimization problems, Technical Note TN D-8251, NASA/JSC, Houston, Texas, May (1976).
11. L. Cooper and D. Steinberg, Introduction to Methods of Optimization, W. B. Saunders, Philadelphia (1970).
12. V. Hasselblad, Estimation of parameters for a mixture of normal distributions, Technometrics 8, 431-446 (1966).
13. N. E. Day, Estimating the components of a mixture of normal distributions, Biometrika 56, 463-474 (1969).
14. C. R. Rao, Linear Statistical Inference and Its Applications, Wiley, New York (1965).
15. W. H. Highleyman, The design and analysis of pattern recognition experiments, Bell Systems Tech. J. 41, 723-744 (1962).
16. C. B. Chittineni, Maximum Likelihood Estimation of Label Imperfection Probabilities and Its Use in the Identification of Mislabeled Patterns, Technical Memorandum, LEC-13678, JSC-16079, Houston, Texas, September (1979). Also presented at the 1980 Machine Processing of Remotely Sensed Data Symposium, LARS, Purdue University.
17. A. Tenenbein, Estimation From Data Subject to Measurement Error, Ph.D. Dissertation, Statistics Department, Harvard University, Cambridge, Mass. (1969).
18. C. B. Chittineni, Learning With Imperfectly Labeled Patterns, Technical Memorandum, LEC-13068, JSC-14867, April (1979). Also in Proc. IEEE Conference on Pattern Recognition and Image Processing, Chicago, Illinois, 1979, pp. 52-62.
19. C. B. Chittineni, Estimation of Probabilities of Label Imperfections and Correction of Mislabels, Technical Memorandum, LEMSCO-14356, JSC-16342, March (1980).
20. R. K. Lennington and M. E. Rassbach, Mathematical Description and Program Documentation for CLASSY: An Adaptive Maximum Likelihood Clustering Method, Technical Memorandum, LEC-12177, Lockheed Electronics Company, Inc., Houston, Texas (1979).
21. K. Fukunaga and D. L. Kessel, Nonparametric error estimation using unclassified samples, IEEE Trans. Information Theory IT-19, 434-440 (1973).

APPENDIX A

Fixed point iteration schemes for probabilistic cluster labeling with the criterion of probability of correct labeling

Fixed point iteration schemes are presented in this appendix for obtaining α_il, the probabilities of class labels for the clusters, assuming that the a priori probabilities q_i of the classes can be approximately estimated from the given labeled patterns.

A.1 The labels of the given pattern set are perfect

Since the logarithm is a monotonic function of its argument, taking the log of equation (4-15) results in

    P'_s = \log(\hat{P}_s) = \log \Big[ \sum_{i=1}^{C} q_i \sum_{l=1}^{m} \alpha_{il} \, e_{il} \Big].    (A-1)

A fixed point iteration scheme for computing the probabilities of class labels for the clusters, which maximizes P'_s of equation (A-1) subject to the constraints of equation (3-7), can easily be shown to be the following:

    \alpha_{il} = \frac{\alpha_{il} \, q_i \, e_{il}}{\sum_{r=1}^{C} \alpha_{rl} \, q_r \, e_{rl}}.    (A-2)
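A minimal sketch of the iteration (A-2) as reconstructed above (the function name and stopping rule are assumptions):

```python
import numpy as np

def fixed_point_alpha(e, q, n_iter=200, tol=1e-8):
    """Fixed point iteration of eq. (A-2) for the cluster-label probabilities.

    e : (C, m) matrix e_il of eq. (4-13)
    q : (C,) class priors estimated from the labeled patterns
    """
    C, m = e.shape
    alpha = np.full((C, m), 1.0 / C)       # uniform start (cf. Table A-1)
    for _ in range(n_iter):
        w = alpha * q[:, None] * e         # alpha_il * q_i * e_il
        new = w / w.sum(axis=0, keepdims=True)   # renormalize each cluster column
        if np.abs(new - alpha).max() < tol:
            alpha = new
            break
        alpha = new
    return alpha
```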

Table A-1. Estimation of proportion of class 1 with the criterion of probability of correct labeling. (For each segment: location (county, state); number of labeled patterns; number of features d and clusters m; the proportion estimates from the fixed point iteration scheme of equation (A-2) with AI labels and with GT labels; the estimates from the scheme of equation (A-11) with label imperfections (initial α_il = 0.5), using the β-matrix of column 4 and of column 5 of Table 5; the ground truth proportion; and the bias and MSE of each estimator.)
216

C.B. CHITTINEN! Substituting sample estimates for p(ta' = j), r/, can be computed from equations (A-7) and (A-9) for use in equation (ARID. A.3 Experimental results. Some simulation results in estimating the proportion of the class of interest using the schemes outlined in sections A. 1 and A.2 are presented in this section. The a priori probabilities in equation (A-2) are estimated as the sample estimates from the given labeled patterns. The same labeled patterns and the cluster statistics of Section 7 are used. The proportion of the class of interest, class 1, is estimated using equation (3-13). The fixed point iteration scheme of equation (A-I l) is used to obtain the probabilities of class labels to the clusters by taking into account the imperfections in the labels. The results are listed in Table A-I. From Table A-I, it is seen that the better estimates are obtained by taking into account the imperfections in the labels.

c where

'

Xj(k)]t

(A-8)

~l~j = P(t~ = i}flij.

(A-9}

Using equation {3-5) in equation (A-8) yields


C ~ m

Ps = F.
i=Ij=Z

,,i F. ~,,ej..
r=l

(A-IOI

By introducing Lagrangian multipliers, a fixed point iteration scheme for computing the optimal probabilities of class labels to the clusters, which maximizes Ps of equation (A-10) subject to the constraints of equation (3-7), can be easily shown to be the following.
C
Otri -C

J='C

(A-II)

~. ~rl ~ ~ljejr 1=1 j=!

About the Author - C. B. CHITTINENI received a B.S. degree from Mysore University, India, in 1966; an M.S. degree from the Indian Institute of Science, India, in 1968; and a Ph.D. degree from the University of Calgary, Canada, in 1970, all in electrical engineering. He was a postdoctoral fellow at the University of Waterloo, Canada, from 1971 to 1972 and a postdoctoral researcher at the University of California, Irvine, from 1972 to 1973, working in the areas of pattern recognition and image processing. In the Fall of 1973, he was a visiting assistant professor at the State University of New York, Buffalo, and taught graduate courses on information theory. From 1974 to 1978, he was a senior engineer with the 3M Company, St. Paul, Minnesota, working on the development of systems for visual inspection, process monitoring, control, etc. From 1978 to 1981 he was a principal scientist with Lockheed Electronics Company, Inc., Houston, Texas, working primarily in the applications of pattern recognition and image processing to geological and remotely sensed data. Since 1981 he has been a senior engineering scientist with the Continental Oil Company, Ponca City, Oklahoma, working on resource exploration problems. He has done research in such areas as pattern recognition, digital signal processing, digital image processing, modeling, and adaptive control, and has published over 55 papers in the areas of his research. Dr. Chittineni is a member of the IEEE.
