
Chapter 3

ECG Signal Classification

The previous chapter reviewed existing methods for ECG signal and image classification. In this chapter, methods for ECG signal pre-processing, feature extraction and classification are presented. Support Vector Machines (SVM), Autoassociative Neural Networks (AANN) and Gaussian Mixture Models (GMM) are the models used for classification. Section 3.2 explains the method for feature extraction of the ECG signal. Section 3.2.1 explains the procedure for ECG signal preprocessing. Section 3.2.2 describes the morphological feature extraction, and Sections 3.2.3 and 3.2.4 describe the LPC and MFCC computation respectively. Modeling techniques used for classification are presented in Section 3.3. Section 3.4 presents the performance measures used in the proposed work. The experimental results for normal/abnormal classification are given in Section 3.5.1. Section 3.5.2 discusses the experimental results for disease classification. In Section 3.5.3 the disease subcategory classification performance is presented.

3.1 Introduction

Feature extraction plays an important role in any classification task. It is the process of

finding the most informative and yet compact set of features, so that the effectiveness

of the machine learning task can be enhanced. The main objective of the ECG feature

extraction process is to derive a set of parameters that best characterizes the ECG

signal. These parameters should contain maximum information about the ECG signal.

Hence the selection of these parameters is an important criterion to be considered for

proper classification. Cardiac disease classification, therefore, involves determination

of several characteristic features of the ECG signal [128]. More importantly, the most

common and appropriate ways of representing data for any classification and regression

problems are feature vectors.

In this work, ECG signals are first classified as normal or abnormal, and the abnormal category is focused on three major diseases: Arrhythmia, Myocardial Infarction and Conduction Blocks. Each disease is further classified into three subcategories: Supraventricular tachycardia, Atrial fibrillation and Ventricular tachycardia for Arrhythmia; Anteroseptal infarction, Anterior infarction and Inferior infarction for Myocardial infarction; and Atrioventricular blocks, Left bundle branch blocks and Right bundle branch blocks for Conduction blocks.

Morphological features of an ECG are essential for diagnosing various cardiac diseases, and morphological analysis of the ECG signal has adopted various signal processing strategies over the past two decades [62]. Linear prediction is one of the most powerful signal analysis techniques for encoding good quality signals at a low bit rate, and it is widely used in a variety of fields such as medical, speech and audio signal processing. The ECG is often contaminated by noise and artifacts; denoising is therefore the most significant step in the preprocessing stage. After preprocessing, the second stage towards classification is to extract features from the signals. Since the ECG is a non-stationary signal, its irregularities may not be periodic and may show up at different intervals, so selecting an efficient technique to analyze such signals is an important task. The wavelet transform has proven to be a useful tool for ECG signal analysis [6] and is widely used in biomedical signal processing and denoising applications [7], [8], [9].

In the proposed work three types of features are used for ECG signal classification.

They are Morphological features of ECG, Linear Prediction Coefficients (LPC) and

Mel Frequency Cepstral Coefficients (MFCC). In the preprocessing stage, the noise and artifacts in the signal are removed using the DWT. This chapter addresses preprocessing,

feature extraction and classification methods for ECG signal.

3.2 ECG Signal Feature Extraction


3.2.1 Preprocessing

The ECG recordings retrieved from the databases may contain various kinds of noise caused by muscle contraction, electrode displacement, patient movement and so on. Hence the signal should be preprocessed to remove these artifacts before further processing. ECG signals are preprocessed by filtering; in this work, the Daubechies wavelet of the Discrete Wavelet Transform (DWT) is used for denoising and for morphological feature extraction of the ECG signal. Throughout this work, ECG signals sampled at 360 Hz and encoded in 16-bit monophonic Pulse Code Modulation (PCM) format are used.

Discrete Wavelet Transform

An ECG signal changes over time with respect to heartbeat events. The wavelet transform is a method for time-frequency localization of a signal: it uses long analysis windows where low-frequency information is needed and shorter windows where high-frequency information is present. The major advantage of the wavelet transform is its ability to perform multiresolution analysis, localizing events with respect to all frequency components in the data over time. Thus, wavelet analysis is capable of revealing aspects of data that other signal analysis techniques miss, such as breakdown points and discontinuities in higher

derivatives [129].

Daubechies wavelets are compactly supported orthonormal wavelets that make discrete wavelet analysis practicable [129]. The Daubechies wavelet is conceptually more complex than the Haar wavelet, but it picks up detail that the Haar wavelet algorithm misses [130]. Choosing a wavelet function which closely matches the signal to be processed is of extreme importance in wavelet applications [131]. The wavelet is used to obtain the characteristic waves of the ECG signal, from which a set of features is derived. An 8-level wavelet decomposition based on the Daubechies 6 wavelet function is considered here. The Daubechies 6 wavelet is chosen based on its shape and its ability to analyze the signal in this particular application: its shape is similar to that of the QRS complex and its energy spectrum is concentrated around low frequencies. Wavelet processing is based on the idea of sub-band decomposition and coding. The two basic wavelet processes are decomposition and reconstruction; decomposition expresses a signal in terms of a set of basis functions called wavelets. Signal decomposition using DWT is shown in Fig. 3.1.

Fig. 3.1: Signal decomposition using DWT

LoD and HiD are the low-pass and high-pass decomposition filters respectively, ↓2 represents downsampling by 2, and cA and cD are the approximation and detail coefficients.

Wavelet transform theory uses two major concepts: scaling and shifting. Scaling, through dilation or compression, provides the capability to analyse a signal over different windows (sampling periods) in the data, while shifting, through delay or advancement, provides translation of the wavelet kernel over the entire signal. The wavelet transform is based on the principle of linear series expansion of a signal using a set of orthonormal basis functions. Through linear series expansion, a signal f(t) can be uniquely decomposed as a weighted combination of orthonormal basis functions as

$$f(t) = \sum_{n} a_n \, \varphi_n(t) \qquad (3.1)$$

where n is an integer index with n ∈ Z (Z is the set of all integers), $a_n$ are the weights and $\varphi_n(t)$ are the orthonormal basis functions.

A single level of decomposition puts a signal through two complementary low-pass and high-pass filters. The output of the low-pass filter gives the Approximation (A) coefficients, while the high-pass filter gives the Detail (D) coefficients. The A and D coefficients can be used to reconstruct the signal exactly when run through the mirror reconstruction filters of the wavelet family. The Daubechies-6 wavelet family is used for filtering the noise. The output of each filter is the downsampled version of the coefficients at that level; the 8th-level reconstruction, in which the R wave produces the largest variations, is used to find the R peak. Minor variations are already visible at the lower levels, so the other peaks such as P, Q, S and T are detected at those levels of reconstruction. To obtain the ECG features, the characteristic points P, Q, R, S and T are therefore located at different decomposition levels [128].
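A minimal sketch of this style of wavelet denoising is shown below, assuming the ECG is available as a NumPy array and using the PyWavelets (pywt) package; the soft-thresholding rule applied to the detail coefficients is an illustrative choice, not the exact setting used in this work.

```python
import numpy as np
import pywt

def denoise_ecg(signal, wavelet="db6", level=8):
    """Denoise an ECG signal with an 8-level Daubechies-6 DWT (illustrative sketch)."""
    # Decompose the signal into approximation and detail coefficients.
    coeffs = pywt.wavedec(signal, wavelet, level=level)

    # Soft-threshold the detail coefficients (universal threshold as an example rule).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]

    # Reconstruct the cleaned signal from the thresholded coefficients.
    return pywt.waverec(denoised, wavelet)[:len(signal)]
```

The same decomposition also exposes the detail bands (D2-D7) that the peak-detection rules of Section 3.2.2 combine.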

3.2.2 Morphological Feature Extraction

In this work, morphological features such as the R peak count, QRS interval, PR interval, QT interval, ST interval, RR interval, PP interval and TT interval are extracted. The R peak has the highest amplitude in an ECG signal and is detected using the Daubechies 8th-level reconstructed coefficients. The heart rate for each patient is calculated by finding the distance between two successive R peaks (R-R). The other peaks are identified by traversing a window on either side of the R peak. The Q and S peaks are found by traversing the left and right sides of the R peak within the specified window and locating the minimum (negative) peak values. By traversing the left side of the Q peak, the maximum value is found to be the P peak; similarly, by traversing the right side of the S peak, the maximum value is found to be the T peak. The onset and offset of all points are then calculated, and from these data points the morphological features are extracted. The steps in the 8-level wavelet decomposition are given as follows:

8-level wavelet decomposition using Daubechies 6

• R peak: The R peak is detected by combining detail components D3, D4 and D5 and applying an adaptive threshold; values greater than the threshold are taken as R peaks (a sketch of this rule is given after this list), where

$$\text{Threshold value} = \text{Max[signal]} \times \text{Mean[signal]} \qquad (3.2)$$

• Q peak: The Q peak is detected as the local minimum point within the next 25 samples to the left of the R peak (before the R peak), by combining detail components D2, D3, D4 and D5.

• P peak: The P peak is detected as the maximum point within the next 5 samples to the left of the Q wave (before the Q wave), by combining detail components D6 and D7.

• S peak: The S peak is detected as the local minimum point within the next 5 samples to the right of the R peak (after the R peak), by combining detail components D2, D3, D4 and D5.

• T peak: The T peak is detected as the maximum point within 90 samples from the P peak, by combining detail components D6 and D7.
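The following sketch illustrates one way to implement the R-peak rule above with PyWavelets, assuming a 1-D NumPy signal; applying the threshold of (3.2) to the detail-only reconstruction (and using the absolute mean) is an interpretation for illustration, not the exact implementation used in this work.

```python
import numpy as np
import pywt

def detect_r_peaks(signal, wavelet="db6"):
    """Detect R peaks from combined detail bands D3-D5 using the adaptive threshold of (3.2)."""
    coeffs = pywt.wavedec(signal, wavelet, level=8)
    # Keep only the D3, D4 and D5 detail bands and reconstruct.
    kept = [np.zeros_like(c) for c in coeffs]
    for level in (3, 4, 5):                 # coeffs order: [cA8, cD8, cD7, ..., cD1]
        idx = len(coeffs) - level           # position of cD3, cD4, cD5
        kept[idx] = coeffs[idx]
    detail = pywt.waverec(kept, wavelet)[: len(signal)]

    threshold = np.max(detail) * np.mean(np.abs(detail))   # adaptive rule, eq. (3.2)
    above = detail > threshold

    # Take one local maximum per above-threshold region as an R peak.
    peaks, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            peaks.append(start + int(np.argmax(detail[start:i])))
            start = None
    return np.array(peaks)
```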

Based on all the peak values obtained, the wavelet-domain parameters of the ECG are calculated. They are:

1. P-P interval ($I_{PP}$) is the mean of the P-P interval durations. The P-P interval is obtained by

$$I_{PP} = P_{i+1} - P_i, \quad i = 1, 2, \ldots, N-1 \qquad (3.3)$$

2. R-R interval ($I_{RR}$) is the mean of the R-R interval durations. The R-R interval is obtained by

$$I_{RR} = R_{i+1} - R_i, \quad i = 1, 2, \ldots, N-1 \qquad (3.4)$$

3. P-R interval ($I_{PR}$) is the time duration between successive P and R waves in each beat. The P-R interval is obtained by

$$I_{PR} = R - P_{\text{onset}} \qquad (3.5)$$

4. QRS duration ($I_{QRS}$) is the time duration from the beginning of the Q wave to the end of the S wave. The QRS duration is calculated by

$$I_{QRS} = T_S - T_Q \qquad (3.6)$$

5. QT interval ($I_{QT}$) is the time from the beginning of the Q wave to the end of the T wave. It is obtained by

$$I_{QT} = T_{\text{offset}} - Q \qquad (3.7)$$

6. T-T interval ($I_{TT}$) is the mean of the T-T interval durations, obtained by

$$I_{TT} = T_{i+1} - T_i, \quad i = 1, 2, \ldots, N-1 \qquad (3.8)$$

7. S-T interval ($I_{ST}$) is the time from the end of the S wave to the end of the T wave, obtained by

$$I_{ST} = T_{\text{offset}} - S \qquad (3.9)$$

The heart rate is calculated from the R-R interval time series as given by (3.10):

$$HR = \frac{60}{\text{R-R interval (seconds)}} \qquad (3.10)$$
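Once the peak and onset/offset sample indices are available, some of the interval features and the heart rate follow directly from the definitions above. The sketch below assumes the indices are stored in NumPy arrays (the argument names are hypothetical and assumed to be aligned per beat) and uses the 360 Hz sampling rate adopted in this work.

```python
import numpy as np

FS = 360  # sampling rate in Hz, as used throughout this work

def interval_features(r_peaks, p_peaks, t_peaks, q_onsets, s_offsets):
    """Compute mean interval durations (in seconds) from peak/onset sample indices."""
    to_sec = lambda idx: np.asarray(idx, dtype=float) / FS

    r, p, t = to_sec(r_peaks), to_sec(p_peaks), to_sec(t_peaks)
    q_on, s_off = to_sec(q_onsets), to_sec(s_offsets)

    rr = np.diff(r)                        # R-R intervals, eq. (3.4)
    return {
        "PP": np.mean(np.diff(p)),         # eq. (3.3)
        "RR": np.mean(rr),
        "TT": np.mean(np.diff(t)),         # eq. (3.8)
        "QRS": np.mean(s_off - q_on),      # eq. (3.6)
        "HR": 60.0 / np.mean(rr),          # eq. (3.10)
    }
```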

3.2.3 Linear Prediction Coefficients

Linear Prediction Coefficient (LPC) analysis is one of the most powerful tools in signal processing, especially for speech signals, where it is able to extract the dominant features of the signal. Its ability to estimate signal parameters precisely with fast computation is an advantage of this technique, and it has become a basis for using these coefficients to evaluate changes in the ECG signal [53]. A pth-order LP analysis is used to capture the properties of the signal.

In the LP analysis of the ECG, each sample is predicted as a linear weighted sum of the past p samples, where p represents the order of prediction [47], [48]. If s(n) is the present sample, then it is predicted from the past p samples as

$$\hat{s}(n) = -\sum_{k=1}^{p} a_k \, s(n-k) \qquad (3.11)$$

The LPC are obtained using the Levinson-Durbin recursive algorithm; this is known as LP analysis. The difference between the actual and the predicted sample value is termed the prediction error or residual, and is given by

$$e(n) = s(n) - \hat{s}(n) = s(n) + \sum_{k=1}^{p} a_k \, s(n-k) \qquad (3.12)$$

$$= \sum_{k=0}^{p} a_k \, s(n-k), \quad a_0 = 1 \qquad (3.13)$$

For an ECG frame of size m samples, the mean square of the prediction error over the whole frame is given by

$$E = \sum_m e^2(m) = \sum_m \left[ s(m) - \sum_{k=1}^{p} a_k \, s(m-k) \right]^2 \qquad (3.14)$$

Optimal predictor coefficients will minimize this mean square error. At the minimum value of E,

$$\frac{\partial E}{\partial a_k} = 0, \quad k = 1, 2, \ldots, p \qquad (3.15)$$

Differentiating (3.14) and equating to zero, we get

$$Ra = r \qquad (3.16)$$

where $a = [a_1 \; a_2 \; \cdots \; a_p]^T$, $r = [r(1) \; r(2) \; \cdots \; r(p)]^T$, and R is a Toeplitz symmetric autocorrelation matrix given by

$$R = \begin{bmatrix} r(0) & r(1) & \cdots & r(p-1) \\ r(1) & r(0) & \cdots & r(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r(p-1) & \cdots & \cdots & r(0) \end{bmatrix} \qquad (3.17)$$

Equation (3.16) can be solved for the predictor coefficients using Durbin's algorithm as follows:

$$E^{(0)} = r[0] \qquad (3.18)$$

$$k_i = \frac{r[i] - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} \, r[|i-j|]}{E^{(i-1)}}, \quad 1 \le i \le p \qquad (3.19)$$

$$\alpha_i^{(i)} = k_i \qquad (3.20)$$

$$\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i \, \alpha_{i-j}^{(i-1)} \qquad (3.21)$$

$$E^{(i)} = (1 - k_i^2) \, E^{(i-1)} \qquad (3.22)$$

The above set of equations is solved recursively for i = 1, 2, ..., p. The final solution is given by

$$a_m = \alpha_m^{(p)}, \quad 1 \le m \le p \qquad (3.23)$$

where the $a_m$ are the linear prediction coefficients. In this work, 14th-order LP coefficients are extracted. As previously mentioned, linear prediction coefficients are used to directly estimate the parameters of a signal, with speaker recognition, speech recognition, speech classification and signal dereverberation among the important applications of this analysis. It should be noted that, because of the time-varying nature of the signal, the coefficients must be calculated from short segments [132]. Consequently, 14 LPC coefficients are extracted from an interval containing 100 samples around the QRS complex, and these coefficients are used as inputs to the classification block.
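A compact sketch of this computation is given below for a 1-D NumPy segment; it uses the autocorrelation method with the Levinson-Durbin recursion of (3.18)-(3.23), and the 100-sample segment and 14th order quoted above are simply passed in by the caller.

```python
import numpy as np

def lpc_coefficients(segment, order=14):
    """Estimate LP coefficients of a signal segment via the Levinson-Durbin recursion."""
    seg = np.asarray(segment, dtype=float)
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(seg[: len(seg) - k], seg[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)      # a[0] is implicitly 1
    e = r[0]                     # prediction error energy E^(0)
    for i in range(1, order + 1):
        acc = r[i] - np.dot(a[1:i], r[1:i][::-1])   # r[i] - sum_j alpha_j r[i-j]
        k = acc / e                                  # reflection coefficient k_i, eq. (3.19)
        a_new = a.copy()
        a_new[i] = k                                 # eq. (3.20)
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]      # update alpha_j^(i), eq. (3.21)
        a, e = a_new, (1 - k * k) * e                # update error E^(i), eq. (3.22)
    return a[1:]                                     # the 14 LP coefficients
```

For example, `lpc_coefficients(qrs_segment, order=14)` would return the 14-dimensional feature vector used as input to the classifiers.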

3.2.4 Mel Frequency Cepstral Coefficients

Mel Frequency Cepstral Coefficients (MFCC) are short-term spectral features that are widely used in the area of audio and speech processing. The mel frequency cepstrum has proven to be highly effective in recognizing the structure of audio signals and in modeling their subjective pitch and frequency content. MFCCs have been applied in a range of audio mining tasks and have shown good performance compared to other features [49]. MFCCs have been computed by various authors in different ways. Although MFCCs have been used in music identification, very little work has been done on heart sound analysis using MFCCs. In this work the MFCC features are used for analysing the electrocardiogram signal.

The MFCC filters are spaced linearly at low frequencies and logarithmically at high frequencies to capture the important characteristics of the ECG signal. To obtain MFCCs, the ECG signals are segmented and windowed into frames of 10 seconds. Fig. 3.2 describes the procedure for extracting the MFCC features.

Fig. 3.2: Block diagram of MFCC computation

• Mel frequency wrapping: The magnitude spectrum is computed for each of these frames using the FFT and converted into a set of mel-scale filter bank outputs.

The filterbank analysis provides a much more straightforward route to obtain

the desired non-linear frequency resolution. However, filterbank amplitudes are

highly correlated and hence, the use of a cepstral transformation in this case is

virtually mandatory. A simple Fourier transform based filterbank is designed

to give approximately equal resolution on a mel-scale. Fig. 3.3 illustrates the

general form of this filterbank. As can be seen, the filters used are triangular

and they are equally spaced along the mel-scale, which is defined by

$$\text{Mel}(f) = 2595 \, \log_{10}\!\left(1 + \frac{f}{700}\right) \qquad (3.24)$$

To implement this filterbank, the window of ECG data is transformed using a

Fourier transform and the magnitude is taken. The magnitude coefficients are

then binned by correlating them with each triangular filter. Here binning means

that each FFT magnitude coefficient is multiplied by the corresponding filter

gain and the results are accumulated. Thus, each bin holds a weighted sum

representing the spectral magnitude in that filterbank channel.

Normally the triangular filters are spread over the whole frequency range from

zero up to the Nyquist frequency. However, band-limiting is often useful to

reject unwanted frequencies or avoid allocating filters to frequency regions in

which there is no useful signal energy. For filterbank analysis, lower and upper

frequency cut-offs can be set. When low and high pass cut-offs are set in this

way, the specified number of filterbank channels are distributed equally on the

mel-scale across the resulting pass-band.

Fig. 3.3: Mel scale filter bank

• Cepstrum: The logarithm is then applied to the filter bank outputs, followed by a discrete cosine transformation to obtain the MFCCs. Because the mel spectrum coefficients are real numbers (and so are their logarithms), they may be converted to the time domain using the DCT. In practice, the last step of taking an inverse DFT is replaced by the DCT for computational efficiency. Typically, the first 13 MFCCs are used as features [49], [133].

In the proposed work, 360 sample values are reduced to 8-dimensional morphological, 14-dimensional LPC and 13-dimensional MFCC features, respectively. These features are extracted as described above for the different categories of cardiac diseases.
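A sketch of this MFCC pipeline for one 10-second ECG frame is given below, assuming the librosa library is available; the FFT size, hop length and number of mel filters are illustrative choices rather than values specified in this work.

```python
import numpy as np
import librosa

FS = 360  # ECG sampling rate in Hz

def mfcc_features(frame, n_mfcc=13):
    """Compute 13 MFCCs for one ECG frame (magnitude FFT -> mel filterbank -> log -> DCT)."""
    frame = np.asarray(frame, dtype=float)
    mfcc = librosa.feature.mfcc(
        y=frame, sr=FS, n_mfcc=n_mfcc,
        n_fft=256, hop_length=128, n_mels=26,   # illustrative analysis settings
    )
    # Average over the sub-frames to obtain a single 13-dimensional feature vector.
    return mfcc.mean(axis=1)
```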

3.3 Modeling Techniques

SVM, AANN and GMM classifiers are the models used in this work for classifying the

ECG data based on Morphological features, LPC and MFCC extracted from the ECG

signal.

3.3.1 Support Vector Machine

The Support Vector Machine (SVM) is a statistical machine learning technique that has been successfully applied in the pattern recognition area and is based on the principle of structural risk minimization [73], [74], [134], [135]. Fig. 3.4 shows the architecture of the SVM.

Fig. 3.4: Architecture of the SVM (Ns is the number of support vectors).

The SVM constructs a linear model to estimate the decision function using non-linear class boundaries based on support vectors. If the data are linearly separable, the SVM trains a linear machine to find an optimal hyperplane that separates the data without error and with the maximum distance between the hyperplane and the closest training points. The training points that are closest to the optimal separating hyperplane are called support vectors.

SVM maps the input patterns into a higher dimensional feature space through

some nonlinear mapping chosen a priori. A linear decision surface is then constructed

in this high dimensional feature space. Thus, SVM is a linear classifier in the parameter

space, but it becomes a non-linear classifier as a result of the non-linear mapping of

the space of the input patterns into the high dimensional feature space.

Fig. 3.5: An example for SVM kernel function Φ(x) maps 2-dimensional input space to
higher 3-dimensional feature space. (a) Nonlinear problem. (b) Linear problem.

For linearly separable data, the SVM finds a separating hyperplane which separates the data with the largest margin. For linearly inseparable data, it maps the data from the input space into a high-dimensional space, x ∈ R^I ↦ Φ(x) ∈ R^H, using a mapping function Φ(x), and finds the separating hyperplane there. An example in which the mapping Φ(x) takes a 2-dimensional input space to a higher 3-dimensional feature space is shown in Fig. 3.5. The SVM was originally developed for two-class classification problems; an N-class classification problem can be solved using N SVMs, each of which separates a single class from all the remaining classes (one-vs-rest approach).

The SVM generally applies to linear boundaries. In cases where a linear boundary is inappropriate, the SVM can map the input vector into a high-dimensional feature space. By choosing a non-linear mapping, the SVM constructs an optimal separating hyperplane in this higher-dimensional space, as shown in Fig. 3.5. The function K is defined as the kernel function for generating the inner products used to construct machines with different types of non-linear decision surfaces in the input space:

$$K(x, x_i) = \Phi(x) \cdot \Phi(x_i) \qquad (3.25)$$

The kernel function may be any of the symmetric functions that satisfy the Mercer’s

Table 3.1: Types of SVM inner product kernels.

Types of kernels   Inner Product Kernel K(x, x_i)      Details
Polynomial         (x^T x_i + 1)^p                     x is the input pattern, x_i are the support vectors, p is the degree of the polynomial
Gaussian           exp(−‖x − x_i‖² / (2σ²))            σ² is the variance, 1 ≤ i ≤ Ns, Ns is the number of support vectors
Sigmoidal          tanh(β0 (x^T x_i) + β1)             β0, β1 are constant values

conditions. There are several SVM kernel functions, as given in Table 3.1. The dimension of the feature space vector Φ(x) for the polynomial kernel of degree p and an input pattern dimension of d is given by

$$\frac{(p+d)!}{p! \, d!} \qquad (3.26)$$

For the sigmoidal and Gaussian kernels, the dimension of the feature space vectors is shown to be infinite. Finding a suitable kernel for a given task is an open research problem. Given a set of ECGs corresponding to N categories for training, N SVMs are trained. Each SVM is trained to distinguish between one category and all other categories in the training set. During testing, the class label l of an ECG x can be determined using (3.27):

$$l = \begin{cases} n, & \text{if } d_n(x) + t > 0 \\ 0, & \text{if } d_n(x) + t \le 0 \end{cases} \qquad (3.27)$$

where $d_n(x) = \max \{ d_i(x) \}_{i=1}^{N}$, and $d_i(x)$ is the distance from x to the SVM hyperplane corresponding to category i. The classification threshold is t, and the class label l = 0 stands for unknown.
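A minimal sketch of this one-vs-rest decision rule with scikit-learn is shown below, assuming the extracted feature vectors and disease labels are already available as arrays; the Gaussian (RBF) kernel follows the choice reported later in this chapter, while the zero threshold is an illustrative setting.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

def train_ovr_svm(train_features, train_labels):
    """Train one SVM per category with a Gaussian (RBF) kernel, one-vs-rest."""
    model = OneVsRestClassifier(SVC(kernel="rbf"))
    return model.fit(train_features, train_labels)

def classify(model, x, threshold=0.0):
    """Assign the label whose hyperplane distance d_n(x) is largest, per eq. (3.27)."""
    distances = model.decision_function(x.reshape(1, -1))[0]  # d_i(x) for each class
    best = int(np.argmax(distances))
    if distances[best] + threshold > 0:
        return model.classes_[best]
    return 0  # label 0 stands for "unknown"
```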

3.3.2 Autoassociative Neural Network Model

The five layer Autoassociative Neural Network (AANN) model is used to capture the

distribution of the ECG feature vectors. The general topology of AANN is discussed

in this section. In this network, the second and fourth layers have more units than

the input layer. The third layer has fewer units than the first or fifth. The processing

units in the first and third hidden layer are non-linear, and the units in the second

compression/hidden layer can be linear or non-linear. The network is trained using the backpropagation algorithm [84], [136]. As the error between the actual and the desired

output vectors is minimized, the cluster of points in the input space determines the

shape of the hypersurface obtained by the projection onto the lower dimensional space.

Fig. 3.6(b) shows the space spanned by the one dimensional compression layer for the

2 dimensional data shown in Fig. 3.6(a) for the network structure 2L 10N 1N 10N 2L,

where L denotes a linear unit and N denotes a non-linear unit. The non-linear units

use tanh(s) as the activation function, where s is the activation value of the unit. The

integer value indicates the number of units used in that layer. The backpropagation

learning algorithm is used to adjust the weights of the network to minimize the mean

square error for each feature vector. The solid lines shown in Fig. 3.6(b) indicate

mapping of the given input points due to the one dimensional compression layer. Thus,

one can say that the AANN captures the distribution of the input data depending on

the constraints imposed by the structure of the network.

In order to visualize the distribution better, one can plot the error for each input

data point in the form of some probability surface as shown in Fig. 3.6(c). The error

Ei for the data point i in the input space is plotted as pi = exp(−Ei /α) , where α is a

constant. Note that pi is not strictly a probability density function, but the resulting

surface is called probability surface. The plot of the probability surface shows a large

amplitude for smaller error Ei, indicating a better match of the network for that data point.

Fig. 3.6: Distribution capturing ability of the AANN model. From [1]. (a) Artificial 2-dimensional data. (b) 2-dimensional output of the AANN model with the structure 2L 10N 1N 10N 2L. (c) Probability surfaces realized by the network structure 2L 10N 1N 10N 2L.

The constraints imposed by the network can be seen by the shape the error

surface takes in both the cases. One can use the probability surface to study the

characteristics of the distribution of the input data captured by the network. Ideally,

one would like to achieve the best probability surface, best defined in terms of some

measure corresponding to a low average error.

During AANN training, the weights of the network are adjusted to minimize the

mean square error obtained for each feature vector. If the adjustment of weights is done

for all feature vectors once, then the network is said to be trained for one epoch. For

successive epochs, the mean square error is averaged over all feature vectors. During

testing phase, the features extracted from the test data are given to the trained AANN

model to obtain the average error.

The standard backpropagation neural network training algorithm is used to adjust the

weights in AANN. All the initial weights are randomly chosen by the backpropagation

training algorithm. Only the number of epochs is to be specified.
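The sketch below shows the general idea of such a five-layer AANN as a distribution-capturing model, written here in PyTorch as an assumption for illustration; the layer sizes mirror the MFCC structure reported later in Table 3.6, while the choice of activation in every hidden layer and the training details are not specified by this chapter.

```python
import torch
import torch.nn as nn

class AANN(nn.Module):
    """Five-layer autoassociative network: input -> expansion -> compression -> expansion -> output."""
    def __init__(self, dim=13, expansion=26, compression=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, expansion), nn.Tanh(),         # first (expansion) hidden layer
            nn.Linear(expansion, compression), nn.Tanh(), # compression layer
            nn.Linear(compression, expansion), nn.Tanh(), # second expansion layer
            nn.Linear(expansion, dim),                    # linear output layer (identity mapping target)
        )

    def forward(self, x):
        return self.net(x)

def confidence(model, y):
    """Normalized squared error E = ||y - o||^2 / ||y||^2 mapped to the score C = exp(-E)."""
    with torch.no_grad():
        o = model(y)
    e = torch.sum((y - o) ** 2) / torch.sum(y ** 2)
    return torch.exp(-e).item()
```

One such model would be trained per class for a fixed number of epochs, and at test time the class with the highest average confidence score is selected, as described in Section 3.5.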

3.3.3 Gaussian Mixture Models

The probability distribution of feature vectors is modeled by parametric or non-

parametric methods. Models which assume the shape of probability density func-

tion are termed parametric. In non-parametric modeling, minimal or no assumptions

are made regarding the probability distribution of feature vectors. The potential of

Gaussian mixture models to represent an underlying set of ECG classes by individual

Gaussian components, in which the spectral shape of the ECG class is parameterized

by the mean vector and the covariance matrix, is significant. Also, these models have

the ability to form a smooth approximation to the arbitrarily-shaped observation den-

sities in the absence of other information [137]. With Gaussian mixture models, each

ECG is modeled as a mixture of several Gaussian clusters in the feature space. The

basis for using GMM is that the distribution of feature vectors extracted from a class

can be modeled by a mixture of Gaussian densities as shown in Fig. 3.7. For a D

dimensional feature vector x, the mixture density function for category s is defined as

$$p(x \mid \lambda^s) = \sum_{i=1}^{M} \alpha_i^s \, f_i^s(x)$$

The mixture density function is a weighted linear combination of M component unimodal Gaussian densities $f_i^s(\cdot)$. Each Gaussian density function $f_i^s(\cdot)$ is parameterized by the mean vector $\mu_i^s$ and the covariance matrix $\Sigma_i^s$ as

$$f_i^s(x) = \frac{1}{\sqrt{(2\pi)^D \, |\Sigma_i^s|}} \exp\!\left( -\tfrac{1}{2} (x - \mu_i^s)^T (\Sigma_i^s)^{-1} (x - \mu_i^s) \right),$$

where $(\Sigma_i^s)^{-1}$ and $|\Sigma_i^s|$ denote the inverse and determinant of the covariance matrix $\Sigma_i^s$, respectively. The mixture weights $(\alpha_1^s, \alpha_2^s, \ldots, \alpha_M^s)$ satisfy the constraint $\sum_{i=1}^{M} \alpha_i^s = 1$. Collectively, the parameters of the model are denoted as $\lambda^s = \{\alpha_i^s, \mu_i^s, \Sigma_i^s\}$, $i = 1, 2, \ldots, M$. The number of mixture components is chosen empirically for a given data set. The parameters of the GMM are estimated using the iterative expectation-maximization algorithm [138].

Fig. 3.7: Gaussian mixture models
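A small sketch of this per-class GMM modeling with scikit-learn is shown below, assuming one feature matrix per ECG category; the four-component setting follows the empirical finding reported later in this chapter, while the diagonal covariance type is an illustrative assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(features_by_class, n_components=4):
    """Fit one GMM (via EM) to the feature vectors of each ECG category."""
    return {
        label: GaussianMixture(n_components=n_components, covariance_type="diag").fit(feats)
        for label, feats in features_by_class.items()
    }

def classify(gmms, x):
    """Assign the category whose mixture gives the highest log-likelihood for x."""
    x = np.asarray(x, dtype=float).reshape(1, -1)
    scores = {label: gmm.score(x) for label, gmm in gmms.items()}
    return max(scores, key=scores.get)
```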


3.4 Performance Measures

Performance measurement is the process of collecting, analyzing and/or reporting in-

formation regarding the performance of a system or component. Sensitivity and speci-

ficity are statistical measures of the performance of a binary classification test, also

known in statistics as a classification function.

Sensitivity:

Sensitivity (also called the true positive rate, or the recall in some fields) measures

the proportion of positives that are correctly identified as such. It refers to the test’s

ability to correctly detect patients who do have the condition. Mathematically, this

can be expressed as:

$$\text{Sensitivity} = \frac{\text{No. of True Positives}}{\text{No. of True Positives} + \text{No. of False Negatives}} \qquad (3.28)$$

Specificity:

Specificity (also called the true negative rate) measures the proportion of negatives

that are correctly identified as such. It relates to the test’s ability to correctly detect

patients without a condition. Mathematically, this can also be written as:

$$\text{Specificity} = \frac{\text{No. of True Negatives}}{\text{No. of True Negatives} + \text{No. of False Positives}} \qquad (3.29)$$

Precision:

In a classification task, the precision for a class is the number of true positives (i.e.

the number of items correctly labeled as belonging to the positive class) divided by

the total number of elements labeled as belonging to the positive class (i.e. the sum

of true positives and false positives, which are items incorrectly labeled as belonging

to the class).
$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad (3.30)$$

Recall:

Recall is defined as the number of true positives divided by the total number of elements

that actually belong to the positive class (i.e. the sum of true positives and false

negatives, which are items which were not labeled as belonging to the positive class

but should have been).

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \qquad (3.31)$$

F- Measure:

F-Measure is a measure of a test’s accuracy that considers both the precision and the

recall of the test to compute the score (Harmonic mean).

$$\text{F-Measure} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3.32)$$

Accuracy:

The accuracy of a measurement system is the degree to which it yields true (no systematic errors) and consistent (no random errors) results.

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total No. of Samples}} \qquad (3.33)$$
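The sketch below computes these measures directly from confusion-matrix counts, so that eqs. (3.28)-(3.33) can be applied uniformly to each classifier's predictions; the function and argument names are illustrative.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the performance measures of eqs. (3.28)-(3.33) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # also the recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "recall": sensitivity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }
```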

3.5 Experimental Results

Experiments were carried out using different signal and image processing features, but features such as the morphological features, LPC and MFCC have given better performance compared with other features.

Dataset: In order to validate the effectiveness of the proposed algorithms, assessment

of the performance in terms of accuracy should be made. As there is no golden rule to

determine the peak, onset and offset of the ECG waves, validation of the ECG feature

detection algorithms must be done using databases with manual annotations. The

basic idea here is to compare manually annotated results of the clinical features of a

specific heartbeat to the ones generated by the algorithms.

The experiments are conducted on ECG WAV data collected from different PhysioBank databases, covering different age groups of both men (1186) and women (814). The total duration of each ECG recording is from 30 minutes to 1 hour, sampled at 360 Hz and encoded in 16-bit Pulse Code Modulation (PCM) format. An ECG segment of 10 seconds duration is taken from each recording for experimentation, and 2000 ECG clips covering all categories are used in the experiments. For each disease, 200 ECG clips are considered, of which 150 are used for training and 50 for testing. The classification accuracy for different training/testing ratios is shown in Table 3.4. The datasets collected from the different sources are given in Tables 3.2 and 3.3. Along with the PhysioBank data, real-time datasets collected from Raja Muthaiah Medical College Hospital (RMMCH), Annamalai University, and Mahatma Gandhi Medical College Hospital (MGMCH), Pondicherry, are also used in the experiments. The hospital datasets are used to evaluate the proposed algorithms on standard 12-lead ECG for various disease categories and demonstrate their effectiveness.

Evaluation using SVM, AANN and GMM:

The ECG signal is preprocessed and the features namely Morphological features, Linear

Prediction Coefficients (LPC) and Mel Frequency Cepstral Coefficients (MFCC) are

extracted with 8, 14 and 13 dimensions. The ECG recording of about 30 minutes to

1 hour duration is taken for experimentation. The sampling rate is 360 Hertz and the

duration of training data is 10 seconds.

Initially the Morphological features with 8 dimensions, LPC features with 14 di-

mensions and MFCC features with 13 dimensions are trained using SVM. N SVMs

Table 3.2: Dataset I

S.No  Category                       Data Source
1     Atrial Fibrillation            MIT-BIH Atrial Fibrillation Database (afdb), St. Petersburg INCART 12-lead Arrhythmia Database (incartdb), MIT-BIH Supraventricular Arrhythmia Database and Hospitals
2     Supraventricular Tachycardia   MIT-BIH Supraventricular Arrhythmia Database (svdb), MIT-BIH Arrhythmia Database and Hospitals
3     Ventricular Tachycardia        CU Ventricular Tachyarrhythmia Database (cudb), MIT-BIH Arrhythmia Database and Hospitals
4     Anteroseptal Infarction        St. Petersburg INCART 12-lead Arrhythmia Database (incartdb), PTB Diagnostic ECG Database (ptbdb), European ST-T Database and Hospitals
5     Anterior Infarction            PTB Diagnostic ECG Database (ptbdb), European ST-T Database and Hospitals
6     Inferior Infarction            PTB Diagnostic ECG Database (ptbdb), European ST-T Database and Hospitals
7     Atrioventricular Blocks        St. Petersburg INCART 12-lead Arrhythmia Database (incartdb), MIT-BIH Arrhythmia Database and Hospitals

Table 3.3: Dataset II

S.No  Category                       Data Source
8     Left Bundle Branch Blocks      St. Petersburg INCART 12-lead Arrhythmia Database (incartdb), MIT-BIH Arrhythmia Database and Hospitals
9     Right Bundle Branch Blocks     St. Petersburg INCART 12-lead Arrhythmia Database (incartdb), MIT-BIH Arrhythmia Database and Hospitals
10    Normal                         MIT-BIH Database and Hospitals

Table 3.4: Ratio of training and testing data in terms of accuracy

Ratio of training and testing data Accuracy

70 : 30 97.40%

80 : 20 98.60%

60 : 40 96.00%

are created for each feature of the ECG samples. For training, 1000 ECG samples are considered, which include the normal and nine abnormal categories. 100 feature vectors from each category are considered for the morphological, LPC and MFCC features. The

training process analyzes ECG training data to find an optimal way to classify ECG

frames into their respective classes.

The derived support vectors are used to classify sub categories of the disease from

ECG data. For testing, 1000 ECG samples were considered. During testing, 8 dimen-

sional Morphological features, 14 dimensional LPC and 13 dimensional MFCC features

are given as input to SVM model and the distance between each of the feature vectors

and the SVM hyperplane is obtained.

The average distance is calculated for each model. The disease corresponding to

the ECG is decided based on the maximum distance. The same process is repeated for

all the sub categories of the diseases, and the performance is studied. The performance

of ECG classification for the Polynomial, Gaussian and Sigmoidal kernels is studied. From the analysis, the Gaussian kernel function in SVM using MFCC features provides an improvement in performance for all three levels of classification. Hence the Gaussian kernel is applied in this work. The performance of the kernel functions for normal/abnormal classification

is shown in Table 3.5.

Table 3.5: Performance of SVM for normal/abnormal classification

Performance (in %) Polynomial Gaussian Sigmoidal

Morphological 92.72 97.90 94.30

LPC 82.30 94.00 91.21

MFCC 91.40 98.50 96.19

The distribution of the Morphological, LPC and MFCC feature vectors in the

feature space is captured using an AANN model. Separate AANN models are used to

capture the distribution of feature vectors of each class, and the network is trained for

400 epochs. One epoch of training is a single presentation of all the training vectors

to the network. For evaluating the performance of the system, the feature vector is

given as input to each of the models. The output of the model is compared with the

input to compute the normalized squared error.


||y−o||2
The normalized squared error (E) for the feature vector y is given by, E = ||y||2

where o is the output vector given by the model. The error (E) is transformed into a

confidence score (C) using C = exp(−E). The average confidence score is calculated

for each model. The class is decided based on the highest confidence score. The performance of the system is evaluated, and the method achieves a classification rate of about 98.80% using MFCC for normal/abnormal classification. The structure of the AANN model plays an important role in capturing the distribution of the feature vectors. After trial and error, the network structures obtained for the three features are shown in Table 3.6; these structures give good performance in terms of classification accuracy. For testing, the feature vectors extracted from the various classes are given as input to each model, and the class with the maximum confidence score is selected.

Table 3.6: Structure of AANN for ECG signal classification

Feature Dimension AANN structure

Morphological 8 8L 16N 3N 16N 8L

LPC 14 14L 28N 7N 28N 14L

MFCC 13 13L 26L 6N 26N 13L

The number of units in the third layer (compression layer) determines the number

of components captured by the network. The AANN model projects the input vectors

onto the subspace spanned by the number of units (Nc ) in the compression layer. If

there are Nc units in the compression layer, then the ECG feature vectors are projected

onto the subspace spanned by Nc components to realize them at the output layer. The

effect of changing the value of Nc on the performance of normal/abnormal (Level-I)

classification is studied. There is no major change in the performance when Nc and Ne lie in the ranges 4 ≤ Nc ≤ 6 and 24 ≤ Ne ≤ 26 for the MFCC features, as shown in Tables 3.7 and 3.8. Outside these ranges the performance of the system decreases, because there may not be a clear boundary between the components representing the disease information, and the training ECG samples may not be sufficient for capturing the distribution of the feature vectors.

Table 3.7: Performance in terms of number of units in the compression layer (Nc) for normal/abnormal classification (Level-I)

Feature: MFCC        Nc = 2   Nc = 4   Nc = 6   Nc = 8
Accuracy (in %)      97.23    97.95    98.80    96.82

Table 3.8: Performance in terms of number of units in the expansion layer (Ne) for normal/abnormal classification

Feature: MFCC        Ne = 22   Ne = 24   Ne = 26   Ne = 28
Accuracy (in %)      96.11     96.20     98.80     96.84

Table 3.6 shows the structures of the AANN used in this work. The general topology of the AANN is illustrated in Fig. 3.6. The AANN performs an identity mapping, and hence the input and output layers contain the same number of nodes.

For the GMM, the ECG samples of each category are fitted with an individual mixture model. A setting of 4 or more components provides better accuracy than smaller settings. Based on the characteristics of each disease, the subcategories are analysed. Different numbers of components in the GMM using morphological, LPC and MFCC features are analyzed for the three levels. The number of Gaussian mixtures is increased from 2 to 10 and the performance in terms of classification accuracy is studied.

When the number of mixtures is 2, the performance is very low. When the mixtures

are increased from 2 to 4, the classification performance slightly increases. When

the number of mixtures varies from 4 to 10, there is no considerable increase in the

performance and the maximum performance is achieved. There is no considerable

increase in the performance when the number of mixtures is above 10. With GMM, the

best performance is achieved with 4 Gaussian mixtures for three levels of classification.

Table 3.9: Performance of GMM for normal/abnormal classification

Performance (in %)   No. of Mixtures
                     2       4       6       8
Morphological        80.79   95.64   94.92   94.89
LPC                  79.50   82.50   82.46   82.11
MFCC                 78.43   83.30   82.80   81.92

3.5.1 Normal/Abnormal Classification using SVM, AANN and


GMM (Level - I)

The classification is carried out in three levels in this work. The first level is focused on classifying ECG samples into the normal or abnormal category. The performance for normal/abnormal classification is shown in Table 3.10. From the experimental analysis it is observed that MFCC with the AANN classifier provides better results than the other techniques for the ECG signal. In normal/abnormal classification only the characteristics of the normal ECG samples are considered, and the characteristics of the individual diseases are not considered. The accuracy for the normal/abnormal categories is shown in Fig. 3.8.

Table 3.10: Performance of SVM, AANN and GMM for normal/abnormal classification

Classifiers SVM AANN GMM

Performance (in %) Spec Sen Acc Spec Sen Acc Spec Sen Acc

Morphological 93.71 97.93 97.55 98.45 98.28 97.90 93.12 95.64 93.10

LPC 93.90 94.73 94.00 95.01 95.90 96.00 81.71 82.40 82.50

MFCC 98.21 97.95 98.50 98.79 98.64 98.80 86.90 83.30 87.60

Fig. 3.8: Performance of SVM, AANN and GMM for normal / abnormal classification

3.5.2 Disease Classification using SVM, AANN and GMM


(Level -II)

The second level focuses on the classification of three cardiac diseases. The performance for the three diseases, namely Arrhythmia, Myocardial Infarction and Conduction Blocks, using the different classifiers is discussed in this section. From the analysis it is observed that AANN provides better performance compared to the other classifiers, as shown in Table 3.11. The accuracy of AANN for the three major cardiac diseases is shown in Fig. 3.9.

Table 3.11: Performance of AANN for disease classification (Level - II)

Arrhythmia

Performance (in %) Precision Recall F-score Accuracy

Morphological 97.50 97.50 97.50 98.50

LPC 94.97 94.50 94.73 96.02

MFCC 95.52 96.00 95.76 97.16

Myocardial Infarction

Morphological 97.97 97.00 97.48 98.33

LPC 95.50 95.50 95.50 96.23

MFCC 96.01 96.50 96.25 97.50

Conduction Blocks

Morphological 96.50 97.50 96.99 98.00

LPC 95.01 95.50 95.25 95.45

MFCC 96.46 95.50 95.97 97.33

3.5.3 Disease Subcategory Classification using SVM, AANN


and GMM (Level - III)

In the third level, the subcategories of Arrhythmia (Arr), Myocardial Infarction (MI) and Conduction Blocks (CB) are considered. The performance of ECG classification for the disease subcategories using SVM, AANN and GMM is shown in Figs. 3.10, 3.11 and 3.12.

Fig. 3.9: Performance of AANN for disease classification

Table 3.11 shows the performance of AANN for Level-II classification, where the morphological features show a high precision of 97.50% for Arrhythmia, 97.97% for Myocardial infarction and 96.50% for Conduction blocks, respectively.

In the literature, F-measure and Accuracy are the two main performance measures for classification; hence they are used in this work. As defined in Section 3.4, the F-Measure considers both the precision and the recall of the test (their harmonic mean),

$$\text{F-Measure} = 2 \cdot \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3.34)$$

and the accuracy is the proportion of samples that are classified correctly,

$$\text{Accuracy} = \frac{\text{True Positives} + \text{True Negatives}}{\text{Total No. of Samples}} \qquad (3.35)$$

In this work a hierarchical classification is performed. In Level I, normal and abnormal ECGs are separated; in Level II, the three major cardiac diseases, namely Arrhythmia, Myocardial Infarction and Conduction Blocks, are classified; and in Level III, the disease subcategories are classified.

Fig. 3.10: Performance of Arrhythmia

Fig. 3.11: Performance of Myocardial Infarction

Fig. 3.12: Performance of Conduction Blocks

For Arrhythmia, the three subcategories Supraventricular Tachycardia, Atrial Fibrillation and Ventricular Tachycardia are classified; for Myocardial Infarction, the subcategories Anteroseptal Infarction, Anterior Infarction and Inferior Infarction; and for Conduction Blocks, the subcategories Atrioventricular Blocks, Left Bundle Branch Blocks and Right Bundle Branch Blocks.

Novelty of the Work: The novelty of the work is that the signal is also processed using image processing techniques.

3.6 Summary

This chapter discussed morphological, LPC and MFCC feature extraction and the classification of ECG data using SVM, AANN and GMM classifiers. Nine categories of cardiac diseases are classified in the proposed work, and the performance of the system is studied for all nine categories. The performance of the system is evaluated on a large dataset collected from the PhysioBank databases and a real-time dataset from hospitals, for normal ECGs and nine types of diseases. Most of the samples are correctly detected, and it is observed that AANN with morphological features gives better performance when compared with the other techniques.

