Abstract - In this paper, the cry characteristics of newborn infants were investigated and correlated with the Apgar score. The Apgar score is a rapid method to evaluate the physical condition of newborn infants at 1 and 5 minutes after birth, and may be repeated later if the score is and remains low. The cries of premature and mature infants with low and normal Apgar scores were analyzed using principal component analysis (PCA). Pre-processing of the voiced or unvoiced segments of cry signals includes zero-crossing rate, short-time energy and filtering. Through principal component analysis, the reduced-dimension cry signal is investigated to extract features to be correlated with Apgar scores. This work provides the foundation for the design of an automatic algorithm to replace the manual Apgar scoring system.

I. INTRODUCTION
The crying of infants is the sign of life after birth and the first tool of communication. It involves characteristics of vocalizations, facial expressions and limb movements [1]. Like adult speech, the infant cry is used to communicate needs or problems. Previous researchers state that the infant cry carries useful information regarding the physical, psychological and pathological state of the infant. It has been shown that there exist significant differences among the various types of crying, such as healthy infant cry, pain cry and pathological cry [2, 3]. Infants at medical risk, such as premature infants or infants with metabolic disturbances, cry at a higher frequency than normal. That indicates the infant may have a problem and need immediate medical treatment.
Many researchers have analyzed cries to relate them to disease. They used various analysis techniques such as auditory analysis, time-domain analysis, frequency-domain analysis, spectrographic analysis and computer-based analysis. Michelsson et al. [4] defined healthy and unhealthy cry types by spectrography. They introduced a modified spectrogram to analyse the cries of infants with hypothyroidism, asphyxia or meningitis. Schonweiler et al. [1, 5, 6] investigated the cries of hearing-impaired infants. They found differences in the duration of the cry signals between 3 healthy infants and 4 infants with hearing diseases.
The clinical status of a newborn infant is assessed at 1 and 5 minutes after birth using a scoring system known as the Apgar score. The assessment may be repeated later if the score is and remains low. The method was designed to help doctors or nurses assess the overall physical condition of a newborn so that they can quickly determine whether the baby needs immediate medical care. This scoring system includes 5 components: heart rate, respiratory effort, muscle tone, reflex irritability and color, each of which is given a score of 0, 1 or 2. The Apgar score is the sum of the 5 components. Scores of 3 and below are regarded as critically low, 4 to 6 fairly low, and 7 to 10 generally normal. Premature infants are more likely to have low Apgar scores (i.e. 3 and below) than normal infants [7].
This paper presents principal component analysis (PCA) of mel-frequency cepstral coefficients (MFCC) computed from infant cries with low and high Apgar scores. It reveals the differences in the PCA results between the low and high Apgar scores.

II. ANALYSIS TECHNIQUES
A. Mel Frequency Cepstral Coefficient (MFCC)
MFCC is used to encode the signal [8]. The steps involved in this process are frame blocking, windowing, fast Fourier transform (FFT), mel-frequency warping and cepstrum.
In frame blocking, the signal is blocked into frames. Then, a Hamming window is applied to each individual frame so that the signal discontinuities at the beginning and end of each frame are minimized. The next step is the Fast Fourier Transform, which converts each frame from the time domain into the frequency domain. The FFT is a fast algorithm to implement the Discrete Fourier Transform (DFT).
Human perception of the frequency content of sounds for speech signals does not follow a linear scale. Thus for each tone with an actual frequency, a subjective pitch is measured on a scale called the 'mel' scale. The mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 mels. Therefore, equation (1) can be used to compute the mels for a given frequency f in Hz [9].

mel(f) = 2595*log10(1 + f/700). (1)

To simulate the subjective spectrum, a filter bank is used, with one filter for each desired mel-frequency component. The filter bank has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval [9]. In the final step, the log mel spectrum is converted back into the time domain. The result is called the mel frequency cepstrum coefficients (MFCCs).
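As an illustration of equation (1) and of the filter-bank spacing described above, the following Python sketch converts between Hz and mels, places triangular filter edges at a constant mel interval, and applies a cosine transform to log filter-bank energies. It is not the implementation used in this study: the function names, the choice of 10 filters and the 0 to 8000 Hz range are assumptions made for the example, and the discrete cosine transform is assumed as the step that converts the log mel spectrum back to the time domain, following common MFCC practice.

```python
import math


def hz_to_mel(f_hz):
    """Equation (1): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Algebraic inverse of equation (1)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_low_hz, f_high_hz):
    """Return n_filters + 2 edge frequencies in Hz, equally spaced in mels.

    Triangular filter i rises from edges[i] to its peak at edges[i + 1]
    and falls back to zero at edges[i + 2].
    """
    mel_low = hz_to_mel(f_low_hz)
    mel_high = hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (n_filters + 1)
    return [mel_to_hz(mel_low + i * step) for i in range(n_filters + 2)]


def triangular_weight(f_hz, left, centre, right):
    """Response of one triangular band-pass filter at frequency f_hz."""
    if f_hz <= left or f_hz >= right:
        return 0.0
    if f_hz <= centre:
        return (f_hz - left) / (centre - left)
    return (right - f_hz) / (right - centre)


def cepstrum_from_log_mel(log_mel_energies, n_coeffs):
    """Cosine transform of log filter-bank energies (assumed final MFCC step)."""
    k_total = len(log_mel_energies)
    return [
        sum(
            energy * math.cos(n * (k - 0.5) * math.pi / k_total)
            for k, energy in enumerate(log_mel_energies, start=1)
        )
        for n in range(1, n_coeffs + 1)
    ]


if __name__ == "__main__":
    # Hypothetical settings: 10 filters between 0 Hz and 8000 Hz.
    edges = mel_filter_edges(n_filters=10, f_low_hz=0.0, f_high_hz=8000.0)
    print("1 kHz in mels:", hz_to_mel(1000.0))
    print("filter edges (Hz):", [round(e) for e in edges])
```

Evaluating hz_to_mel(1000.0) gives a value within a fraction of a mel of 1000, which matches the reference point quoted for the scale.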
Fig. 2. The MFCC filter bank used in the analysis [plot title: Mel-spaced filterbank; x-axis: Frequency (Hz)]

To eliminate redundancy in the MFCC results, PCA was conducted on all 10 coefficients. The eigenvalue-one criterion and the scree test were performed to determine the eigenvalue for each MFCC. Under the eigenvalue-one criterion, an MFCC whose eigenvalue is greater than one is retained. In the scree test, the principal components of the MFCC were selected based on [...] to confirm the results obtained from the scree test, the analysis [...] spectrogram before and after filtering are shown in Fig. 3 and Fig. 4 [...] which is in agreement with those reported by Wasz-Hockert et al. [13].

Fig. 3. Cry signal with low Apgar score and its spectrogram [panels: Amplitude vs. Time (s); Frequency (Hz) vs. Time (s)]

Fig. 4. Cry signal with low Apgar score and its spectrogram after filtering [panels: Amplitude vs. Time (s); Frequency (Hz) vs. Time (s)]

A. Mel Frequency Cepstrum Coefficient (MFCC)
The MFCC features obtained from cries with low and high Apgar scores are shown in Fig. 5 and Fig. 6. Note that the amplitudes of coefficients 2 to 8 for the high Apgar score are below 10 dB. Only coefficients 9 and 10 have amplitudes between 10 and 20 dB. The MFCC features of low Apgar scores show a different pattern from those of high Apgar scores. High energy (amplitude approaching 20 dB) can be observed in the MFCC features from coefficient 6 to coefficient 10.

[Two surface plots titled "MFCCs"; axes: Length of Frame, Coefficient, Amplitude]
B. Principal Component Analysis (PCA)
Tables 1 and 2 show the eigenvalues for high (normal cry) and low (premature cry) Apgar score infants respectively. For the high Apgar score, the eigenvalue for coefficient 1 is 3.461 whereas the eigenvalue for coefficient 2 is 2.465. This variation is consistent with the earlier statement that the first components extracted tend to account for relatively large amounts of variance, while the later components account for relatively smaller amounts. Table 2 also shows the same variation. The total of the eigenvalues for both cases is exactly the [...]

TABLE 1
HIGH APGAR SCORE (NORMAL INFANT CRY)
Component  Eigenvalue  Difference  Proportion  Cumulative
1          3.4612      0.9962      0.3461      0.3461
2          2.4650      1.1283      0.2465      0.5926
3          1.3367      0.5039      0.1337      0.7263
4          0.8329      0.1003      0.0833      0.8096
5          0.7325      0.3543      0.0733      0.8828
6          0.3783      0.0851      0.0378      0.9207

[...] the sixth coefficient to the tenth coefficient, there is a small difference in eigenvalues. For the low Apgar score, a large difference occurred between coefficients 1 and 8. Note that there is a relatively small difference between coefficients 8 and 10. These coefficients are regarded as trivial. The differences in eigenvalues for both cases can be observed clearly in Fig. 8. The scree test results suggest that coefficients 1 to 6 for the high Apgar score and coefficients 1 to 8 for the low Apgar score should be retained.

[Figure: "The Scree Test"; y-axis: Eigenvalue]
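The eigenvalue-one criterion and the quantities read off a scree plot can be reproduced with a few lines of arithmetic. The sketch below uses the six eigenvalues printed in Table 1 for the high Apgar score group; the remaining four are not available here and are left out. It assumes that the total variance equals 10 (one unit per standardized MFCC), an assumption that is consistent with the tabulated value of 0.3461 for the first component. The function names are illustrative only.

```python
# Eigenvalues of components 1 to 6 as printed in Table 1 (high Apgar score).
# Components 7 to 10 are not reproduced in the table, so they are omitted.
TABLE1_EIGENVALUES = [3.4612, 2.4650, 1.3367, 0.8329, 0.7325, 0.3783]

# Assumption: ten standardized MFCCs, so the eigenvalues sum to 10.
N_VARIABLES = 10


def eigenvalue_one(eigenvalues):
    """Return 1-based indices of components whose eigenvalue exceeds 1."""
    return [i for i, ev in enumerate(eigenvalues, start=1) if ev > 1.0]


def proportions(eigenvalues, total_variance):
    """Share of the total variance accounted for by each component."""
    return [ev / total_variance for ev in eigenvalues]


def cumulative(values):
    """Running sum, as used for the cumulative percent of variance."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out


def successive_differences(eigenvalues):
    """Drop between consecutive eigenvalues; a scree plot flattens where
    these drops become small."""
    return [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]


if __name__ == "__main__":
    props = proportions(TABLE1_EIGENVALUES, N_VARIABLES)
    print("retained by eigenvalue-one:", eigenvalue_one(TABLE1_EIGENVALUES))
    print("proportions:", [round(p, 4) for p in props])
    print("cumulative:", [round(c, 4) for c in cumulative(props)])
    print("differences:", [round(d, 4) for d in successive_differences(TABLE1_EIGENVALUES)])
```

With these values, eigenvalue_one returns components 1 to 3, and the cumulative proportion after six components is about 0.92.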
TABLE 3
ANALYSIS OF PRINCIPAL COMPONENT ON HIGH APGAR SCORE (NORMAL) INFANT CRY
Sample PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Bnormal_1 0.1394 -0.4756 0.3213 -0.1877 0.3991 -0.2369 0.0889 0.0714 0.5999 0.1660
Bnormal_2 -0.2450 0.4180 -0.4017 -0.0907 -0.2563 -0.0466 -0.1115 0.0306 0.6673 0.2629
Bnormal_3 0.4299 0.1146 -0.0442 0.2594 0.3146 0.4460 -0.6246 0.0082 0.1835 -0.1055
Bnormal_4 0.3009 0.2583 0.2335 0.6618 -0.0949 -0.0680 0.3974 0.3084 0.0809 0.2775
Bnormal_5 -0.2226 0.2909 -0.3128 0.0147 0.7747 0.1098 0.3866 0.0015 -0.0622 -0.0466
Cnormal_1 0.4509 0.2289 0.0076 -0.0881 -0.0044 -0.1641 0.1729 -0.8163 0.0234 0.1130
Cnormal_2 -0.0163 0.5017 0.3311 -0.1204 0.1818 -0.6283 -0.3410 0.1697 -0.0955 -0.2048
Cnormal_3 -0.4536 0.0150 0.3390 0.1485 0.1498 0.1138 -0.2846 -0.2566 -0.1837 0.6641
Cnormal_4 -0.3957 -0.1819 0.0136 0.5978 -0.0048 -0.1944 -0.0553 -0.3652 0.2361 -0.4719
Cnormal_5 0.1724 -0.3118 -0.5969 0.2220 0.1066 -0.5037 -0.2279 0.0667 -0.2271 0.3075
Explained Variance 0.1108 0.1029 0.1110 0.0888 0.0839 0.0958 0.1071 0.1043 0.0945 0.1008
Proportion of Total Variance (%) 11.0826 10.2931 11.0978 8.8788 8.3861 9.5842 10.7125 10.4324 9.4500 10.0825
Cumulative Total Variance (%) 11.0826 21.3758 32.4735 41.3524 49.7385 59.3227 70.0352 80.4676 89.9175 100.0000
TABLE 4
ANALYSIS OF PRINCIPAL COMPONENT ON LOW APGAR SCORE (PREMATURE) INFANT CRY
Sample PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Prem2_1 0.0731 -0.5620 0.0651 -0.0915 0.3414 -0.2102 0.1643 -0.1039 0.5176 0.4469
Prem2_2 -0.1594 0.5633 0.1088 -0.1308 -0.2335 0.1500 -0.1557 0.0854 0.3219 0.6451
Prem2_3 0.3810 0.3532 -0.0826 0.1409 -0.2615 -0.3219 0.4834 -0.2904 0.3978 -0.2349
Prem2_4 0.4727 0.0318 -0.0855 0.0781 -0.0273 -0.5279 -0.6837 0.0150 -0.0705 0.1002
Prem2_5 -0.1893 -0.0155 -0.4893 0.6681 0.0958 0.2644 -0.2213 -0.3329 0.1911 0.0533
Prem3_1 -0.3147 0.3517 0.0810 0.1135 0.5259 -0.4578 0.1939 -0.2520 -0.3690 0.1856
Prem3_2 0.0498 0.0379 0.7329 0.0838 0.1122 0.2223 -0.2794 -0.4942 0.1474 -0.2140
Prem3_3 0.5206 -0.0080 -0.1831 -0.2201 0.0444 0.3762 0.1492 -0.4360 -0.4112 0.3493
Prem3_4 -0.3588 -0.0157 -0.3437 -0.6214 -0.1012 -0.1311 -0.2272 -0.4726 0.1401 -0.2112
Prem3_5 -0.2521 -0.3397 0.1871 0.2186 -0.6703 -0.2517 0.0951 -0.2556 -0.2867 0.2628
Explained Variance 0.1106 0.1094 0.1111 0.1105 0.1108 0.1024 0.1085 0.0396 0.1074 0.0899
Proportion of Total Variance (%) 11.0559 10.9360 11.1110 11.0476 11.0774 10.2353 10.8535 3.9584 10.7395 8.9854
Cumulative Total Variance (%) 11.0559 21.9919 33.1030 44.1506 55.2280 65.4633 76.3168 80.2752 91.0146 100.0000
The analysis of cry signals with high and low Apgar scores using MFCC and PCA has been described in this paper. In the MFCC analysis, 10 coefficients were extracted for each 1-second signal. The MFCC features of the low Apgar score have higher energy than those of the high Apgar score.
The PCA results show that the eigenvalues for high Apgar score infants are larger than those for low Apgar score infants. Based on the results obtained from the eigenvalue-one criterion, the scree test and the cumulative percent of variance, it is confirmed that coefficients 2 and 3 should be retained for both high and low Apgar scores. Even though all methods agree that coefficient 1 should be retained, the MFCC analysis results indicate that this coefficient should be ignored since it does not carry significant information.
The scree test and cumulative percent of variance also suggest that coefficients 4 to 6 should be considered. Therefore, further investigation has to be carried out using a larger number of samples. The results obtained in this study provide the foundation for the design of an automatic algorithm to replace the manual Apgar scoring system.

[1] Gyorgy Varallyay Jr., "Future Prospects of the Application of the Infant Cry in the Medicine", Periodica Polytechnica Ser. El. Eng., vol. 50, no. 1-2, 2006.
[2] Orozco J. and Garcia C. A. R., "Detecting Pathologies from Infant Cry Applying Scaled Conjugate Gradient Neural Networks", Proceeding of ESANN, 249-354, 2003.
[3] Boukydis, C. F. Z., "Perception of Infant Crying as an Interpersonal Event", in: Infant Crying: Theoretical and Research Perspectives, ed. Lester, B. M. & Boukydis, C. F. Z., Plenum Press, New York, 1985.
[4] Michelsson, K., Michelsson, O., "Phonation in the Newborn, Infant Cry", Int. J. Pediatr. Otorhinolaryngol., 49/1, pp. S297-S301 (1999).
[5] Schonweiler, R., Kaese, S., Moller, S., Rinscheid, A., Ptok, M., "Neuronal Networks and Self-organizing Maps: New Computer Techniques in the Acoustic Evaluation of the Infant Cry", Int. J. Pediatr. Otorhinolaryngol., 38, pp. 1-11 (1996).
[6] Schonweiler, R., Kaese, S., Moller, S., Rinscheid, A., Ptok, M., "Classification of Spectrographic Voice Patterns Using Self-organizing Neuronal Networks (Kohonen Maps) in the Evaluation of the Infant Cry with and without Time-delayed Feedback", Int. J. Pediatr. Otorhinolaryngol., 38/2, pp. 181 (1996).
[7] Michael O. Gardner, MD and Robert L. Goldenberg, MD, "Predicting Low Apgar Scores of Infants Weighing Less Than 1000 grams: The Effect of Corticosteroids", Elsevier, Inc., 1995.
[8] Huang, X., Acero, A., Hon, H., "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development", Prentice Hall, Inc., USA, 2001.
[9] Md. Rashidul Hasan, Mustafa Jamil, Md. Golam Rabbani, Md. Saifur Rahman, "Speaker Identification using Mel Frequency Cepstral Coefficients", 3rd International Conference on Electrical & Computer Engineering ICECE 2004, Dhaka, Bangladesh, 28-30 December 2004.
[10] Deller Jr., J., Hansen, J., and Proakis, J., "Discrete-Time Processing of Speech Signals", second ed., IEEE Press, New York, 2000.
[11] F. Soong, E. Rosenberg, B. Juang, and L. Rabiner, "A Vector Quantization Approach to Speaker Recognition", AT&T Technical Journal, vol. 66, March/April 1987, pp. 14-26.
[12] http://support.sas.com/publishing/pubcat/chap5/55129.pdf
[13] O. Wasz-Hockert, J. Lind, V. Vuorenkoski, T. Partanen and E. Valanne, "The Infant Cry - A Spectrographic and Auditory Analysis", Spastic International Medical Publications in Association with William Heinemann Medical Books Ltd., 1968.