neural network for electronic tongue data classification
Introduction
Finding a good pattern recognition method is an important aspect of e-tongue research.
Several pattern recognition methods have been used successfully in the past, such as BPNN, SVM, LDA and QDA.
However, most of these methods need data preprocessing, and some assume multivariate normality.
Introduction(Continued)
Preprocessing and distributional assumptions make the data structure more complex and the results harder to interpret. They are also time consuming.
Random forest does not need data preprocessing.
No e-tongue research based on random forest has been conducted so far.
Objective
Introduce random forest (RF) for e-tongue data and investigate its performance, focusing on prediction performance in classification.
Then compare the performance of RF with the currently more popular methods, BPNN (back-propagation neural network) and SVM (support vector machine).
Data
Orange beverages of two concentration levels: 100% and 10% original juice content.
Chinese vinegars of two quality grades: grade one and grade two (according to total acid content).
Chinese vinegars made from four kinds of materials: alcohol (AlcoholV), grain sorghum (Gra-SorV), glutinous rice (Glu-RiceV) and rice (RiceV).
Chinese aromatic vinegar of twelve brands.
Data
Each data set has 7 features.
Each feature is the response signal of one e-tongue sensor.
Data
The data were not preprocessed.
One sixth of each data set was randomly taken as an external testing set.
Performance Assessment (multiclass)
A 5-fold cross-validation (CV) procedure with 20 replications was used.
The average of these 20 5-fold CV results was taken as the overall
assessment of performance during model optimization.
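The procedure above, together with the one-sixth external testing set described for the data, can be sketched with scikit-learn as follows. This is only an illustration under assumptions: the data are synthetic stand-ins with 7 features (not the e-tongue measurements), stratified sampling is used, and the forest size and random seeds are arbitrary choices that the slides do not state.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (RepeatedStratifiedKFold,
                                     cross_val_score, train_test_split)

# Synthetic stand-in for one data set: 7 features, 4 classes (assumed sizes).
X, y = make_classification(n_samples=120, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=4,
                           n_clusters_per_class=1, random_state=0)

# Hold out one sixth of the samples as the external testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 6, stratify=y, random_state=0)

# 5-fold CV with 20 replications on the remaining samples.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=20, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")

# Scores are ordered replication by replication: average each 5-fold run,
# then average the 20 runs to get the overall CV assessment.
per_run = scores.reshape(20, 5).mean(axis=1)
cv_accuracy = per_run.mean()

# Accuracy on the external testing set, which was never used during CV.
external_accuracy = model.fit(X_train, y_train).score(X_test, y_test)
print(f"CV accuracy: {cv_accuracy:.3f}, external: {external_accuracy:.3f}")
```

Keeping the external set out of the CV loop means the hyperparameters tuned on the CV average are judged on samples the model has not seen.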
An independent (external) testing set was used to assess the
performance.
Results and Discussion –(PCA)
PCA was performed on the four data sets prior to classification to
visualize the distribution of the samples in each class.
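As a rough sketch of how such PCA scores can be obtained, the following uses a singular value decomposition in NumPy. The data are synthetic (two overlapping classes with 7 features, an assumption for illustration), and the only transformation applied is the mean-centering that PCA itself requires; no scaling is applied.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 60 samples x 7 sensor features, 2 classes (assumed).
X = rng.normal(size=(60, 7))
labels = np.repeat([0, 1], 30)
X[labels == 1] += 0.5  # small shift so that the two classes overlap


def pca_scores(X, n_components=2):
    """Return PC scores and the explained-variance ratio of each kept PC."""
    Xc = X - X.mean(axis=0)                  # centering is part of PCA
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


scores, explained = pca_scores(X)
# scores[:, 0] against scores[:, 1], colored by `labels`, gives a score plot.
print(scores.shape, explained)
```

Plotting the two score columns with one marker per class shows at a glance whether the classes overlap in the first two PCs.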
The score plots of the samples on the first two PCs are presented.
Results and Discussion –(PCA Continued)
These results imply that the investigated cases are not linearly separable problems.
The overlapping areas mean that the data structure is complex, so linear classifiers would not be suitable.
Results and Discussion
The Juice-concentration data
100% juice is the positive class and 10% juice is the negative class.
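With a positive and a negative class defined this way, accuracy (ACC) is the fraction of samples assigned to the correct class. The sketch below computes it in plain Python from true and predicted labels. The label lists are made-up toy values, not results from the study, and sensitivity and specificity are included only as common companion measures (an assumption; the slides do not say they were reported).

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute ACC, sensitivity and specificity for a two-class problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy labels for illustration only: one positive sample is misclassified.
y_true = ["100%", "100%", "100%", "10%", "10%", "10%"]
y_pred = ["100%", "100%", "10%", "10%", "10%", "10%"]
metrics = binary_metrics(y_true, y_pred, positive="100%")
print(metrics)
```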
Accuracy (ACC) is 100% on both the CV set and the external testing set for all 3 classifiers.
Results and Discussion
Vinegar - grade data (classification)
Grade 1 - positive and Grade 2 - negative
Results and Discussion
Vinegar - grade data (Regression)
Results and Discussion
Vinegar - material data (Correct Rates)
Confusion matrix using SVM
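For readers unfamiliar with the layout, here is a minimal sketch of how a confusion matrix and per-class correct rates can be computed for the four material classes. The true and predicted labels are invented toy values for illustration only; they are not the SVM results from the study.

```python
CLASSES = ["AlcoholV", "Gra-SorV", "Glu-RiceV", "RiceV"]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def correct_rates(matrix):
    """Fraction of each true class that lands on the diagonal."""
    return [row[i] / sum(row) if sum(row) else float("nan")
            for i, row in enumerate(matrix)]


# Toy labels (hypothetical): two samples per class, two misclassified.
y_true = ["AlcoholV", "AlcoholV", "Gra-SorV", "Gra-SorV",
          "Glu-RiceV", "Glu-RiceV", "RiceV", "RiceV"]
y_pred = ["AlcoholV", "AlcoholV", "Gra-SorV", "RiceV",
          "Glu-RiceV", "Glu-RiceV", "RiceV", "Glu-RiceV"]

matrix = confusion_matrix(y_true, y_pred, CLASSES)
rates = correct_rates(matrix)
for name, row, rate in zip(CLASSES, matrix, rates):
    print(f"{name:10s} {row} correct rate = {rate:.2f}")
```

Off-diagonal entries show which materials are mistaken for each other, which a single overall accuracy figure hides.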
Results and Discussion
AromaticV –brand data
Results and Discussion
Summary of results
Conclusion
Random forest exhibits better classification performance
for e-tongue data than the two other classification methods (BPNN and SVM), without any data preprocessing,
especially for unbalanced, multiclass and small-sample
data sets.
Random forest has a great deal of potential to be
introduced to the e-tongue field for similar applications.
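A comparison of this kind could be set up along the following lines with scikit-learn. Everything specific here is an assumption made for illustration: the data are synthetic (7 features, 4 classes), the hyperparameters are arbitrary, and `MLPClassifier` (a feed-forward network trained by backpropagation) stands in for the BPNN used in the study. No scaling or other preprocessing is applied to any classifier. Scores obtained on synthetic data say nothing about the e-tongue results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in data: 7 features, 4 classes (assumed sizes).
X, y = make_classification(n_samples=120, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=4,
                           n_clusters_per_class=1, random_state=1)

# Arbitrary, untuned settings chosen only to make the sketch runnable.
classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),
    "BPNN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                          random_state=0),
}

# Mean 5-fold CV accuracy for each classifier on the raw features.
mean_accuracy = {
    name: cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
    for name, clf in classifiers.items()
}

for name, acc in mean_accuracy.items():
    print(f"{name}: {acc:.3f}")
```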