
Comparison of random forest, support vector machine and back propagation neural network for electronic tongue data classification
Introduction

 Finding a good pattern recognition method is an important aspect of e-tongue research.
 Several pattern recognition methods have been used successfully in the past: BPNN, SVM, LDA, QDA, etc.
 But most of these methods need data preprocessing, and some assume multivariate normality.
Introduction (Continued)

 Doing so makes the data structure more complex and the results harder to interpret.
 It is also time consuming.
 Random forest does not need data preprocessing.
 And no research has yet been conducted on random forest for e-tongue data.
Objective

 Introduce random forest (RF) for e-tongue data and investigate its performance.
 Focus on prediction performance in classification.
 Then compare the performance of RF with the currently more popular methods, BPNN and SVM.
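The three-way comparison above can be sketched with scikit-learn. This is a minimal illustration, not the authors' setup: the data are synthetic stand-ins for e-tongue readings, and `MLPClassifier` is used as a stand-in for a back-propagation neural network.

```python
# Sketch: fitting the three classifiers compared in these slides.
# Synthetic data with 7 features mimics the e-tongue sensor signals.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=7, n_informative=5,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/6, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "BPNN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                          random_state=0),  # stand-in for a BPNN
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```

Hyperparameters (500 trees, RBF kernel, one hidden layer of 10 units) are illustrative assumptions, not values from the study.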
Data

 Orange beverages at two concentration levels, 100% and 10% original juice content.
 Chinese vinegars of two quality grades, grade one and grade two (according to total acid content).
 Chinese vinegars made from four kinds of raw materials, abbreviated as alcohol (AlcoholV), grain sorghum (Gra-SorV), glutinous rice (Glu-RiceV) and rice (RiceV).
 Chinese aromatic vinegars of twelve brands.
Data

 Each data set has 7 features; each feature is the signal of one e-tongue sensor response.
Data

 The data were not preprocessed.
 1/6 of each data set was randomly taken as the external testing set.
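The hold-out step above can be sketched as follows. The slides only say the 1/6 split was random; stratifying by class, as done here, is an added assumption, and the data are synthetic placeholders.

```python
# Sketch of the external-test split: hold out 1/6 of the samples at random.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(120, 7))       # 7 e-tongue sensor signals per sample
y = rng.integers(0, 4, size=120)    # e.g. four vinegar raw materials

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/6, stratify=y, random_state=42)
print(X_test.shape)  # (20, 7) -- 1/6 of the 120 samples
```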
Performance Assessment (multiclass)

 N_c = number of correctly classified samples
 N_nc = number of misclassified samples
 CR = correct rate = N_c / (N_c + N_nc)
 ER = error rate = N_nc / (N_c + N_nc)
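A minimal illustration of the multiclass measures above, with hypothetical labels:

```python
# CR = Nc / (Nc + Nnc) and ER = Nnc / (Nc + Nnc), on made-up predictions.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

n_c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified
n_nc = len(y_true) - n_c                           # misclassified
cr = n_c / (n_c + n_nc)
er = n_nc / (n_c + n_nc)
print(cr, er)  # 0.75 0.25
```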
Performance Assessment (two-class)

 SEN – class sensitivity
 SPE – class specificity
 EFF – efficiency
 PRE – precision
 ACC – overall prediction accuracy
 MCC – Matthews correlation coefficient
 TPR – true positive rate
 TNR – true negative rate
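The two-class measures above can be computed from a confusion matrix. The TP/TN/FP/FN counts here are hypothetical, and EFF is computed as the geometric mean of SEN and SPE, which is one common definition (the slides do not define it):

```python
# Two-class performance measures from hypothetical confusion-matrix counts.
import math

tp, tn, fp, fn = 40, 35, 5, 10

sen = tp / (tp + fn)                   # sensitivity = true positive rate (TPR)
spe = tn / (tn + fp)                   # specificity = true negative rate (TNR)
pre = tp / (tp + fp)                   # precision
acc = (tp + tn) / (tp + tn + fp + fn)  # overall prediction accuracy
eff = math.sqrt(sen * spe)             # efficiency (assumed geometric mean)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Matthews corr. coeff.
print(f"SEN={sen:.2f} SPE={spe:.2f} PRE={pre:.2f} ACC={acc:.2f} MCC={mcc:.2f}")
```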
Validation method

 A 5-fold cross validation (CV) procedure with 20 replications is used.
 The average of these 20 5-fold CV results was taken as the overall assessment of performance during model optimization.
 An independent (external) testing set was used to assess the final performance.
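The validation scheme above maps directly onto scikit-learn's repeated stratified k-fold. The classifier and data below are placeholders, not the study's:

```python
# Sketch: 5-fold CV repeated 20 times; the mean of the 100 fold scores is
# the overall CV performance estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=120, n_features=7, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0), X, y, cv=cv)
print(len(scores), scores.mean())  # 100 fold scores; mean = CV estimate
```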
Results and Discussion –(PCA)

 PCA was performed on the four data sets prior to classification to visualize the distribution of the samples of each class.
 The score plots of the samples on the first two PCs are presented.
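The PCA projection behind such score plots can be sketched as follows; the data are synthetic stand-ins for the e-tongue measurements. A score plot would then scatter `scores[:, 0]` against `scores[:, 1]`, coloured by class.

```python
# Sketch: project 7-feature samples onto the first two principal components.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(n_samples=120, n_features=7, n_informative=5,
                           n_classes=4, random_state=0)
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)  # (120, 2) coordinates for the 2-D score plot
print(scores.shape, pca.explained_variance_ratio_.round(2))
```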
Results and Discussion – (PCA Continued)

 These results imply that the investigated cases are not linearly separable.
 The overlapping areas indicate that the data structure is complex.
 Hence linear classifiers would not be suitable.
Results and Discussion

 The juice-concentration data: 100% labeled positive and 10% negative.
ACC is 100% on the CV set and the external testing set for all three classifiers.
Results and Discussion

 Vinegar-grade data (classification): Grade 1 labeled positive and Grade 2 negative.
Results and Discussion

 Vinegar-grade data (regression)
Results and Discussion

 Vinegar-material data (correct rates)
 Confusion matrix using SVM
Results and Discussion

 AromaticV – brand data
Results and Discussion

 Summary of results
Conclusion

 Random forest exhibits better classification performance for e-tongue data than the two other classification methods, without any data preprocessing.
 This holds especially for unbalanced, multiclass and small-sample data sets.
 Random forest has great potential to be introduced to the e-tongue field for similar applications.