Developing An Appropriate Data Normalization Method: Balemir Uragun and Ramesh Rajan

2011 10th International Conference on Machine Learning and Applications
Developing an appropriate data normalization method

Balemir Uragun and Ramesh Rajan
Monash University
Physiology Department
Clayton, Victoria, Australia
b.uragun@hotmail.com and Ramesh.Rajan@monash.edu

978-0-7695-4607-0/11 $26.00 © 2011 IEEE
DOI 10.1109/ICMLA.2011.53

Abstract - The issue of data normalization has been extensively studied with respect to many different applications. However, before implementing any data normalization, it is necessary to establish whether the data requires normalization, and this can be determined by a simple test-bench test. Our test bench compared seven well-known normalization techniques on nine possible sensitivity functions recorded from brain neurons to variations in Interaural Level Differences (ILD), a cue to azimuthal location of a sound source. These nine realistic ILD functions were systematically modified in order to arrive at the most suitable normalization technique. This method then helped to select a coherent normalization technique for the data before applying the statistical technique of Cluster Analysis, which can be explored in future studies.

Keywords - Data Normalization; Interaural Level Difference

I. INTRODUCTION TO DATA NORMALIZATION

Data normalization is a scaling process for numbers in a data array and is used where heterogeneity in the numbers renders any standard statistical analysis difficult. The data are then often normalized before any application process is applied, and therefore data normalization is usually termed data pre-processing. Many different data normalization techniques have been developed in diverse applications such as diagnostic circuits in electronics [1], temporal coding in vision [2], predictive control systems in seismic activities [3], modeling labor market activity [4], pattern recognition [5], and extensively in microarray data analysis in genetics [6], [7], [8], [9], [10], [11], [12], [13].

The purpose of data normalization depends on the proposed application, and hence data normalization includes use of linear scaling to compress a large dynamic range [1], scaling of values to correct for variation in laser intensity [11], handling obscure variation [6] or removing systematic errors in data [5], [10], [13], or efficiently removing redundancy in a non-linear model as an optimal transformation for temporal processing [2]. Although the benefits of data normalization depend on data type, data size and normalization method, generally the advantages of data normalization are (a) to give a more meaningful range of scaled numbers for use, (b) to rearrange the data array in a more regular distribution, (c) to enhance the correctness of subsequent calculations, and (d) to increase the significance or importance of the most descriptive numbers in a non-normally distributed data set.

Here we examine the use of data normalization methods to model responses of brain neurons to variations in an important parameter for localization of the azimuthal location of high-frequency sounds, Interaural Level Differences (ILDs) [14]. ILDs are the difference in sound levels at the two ears as a sound source moves about an animal and are created by head and body shadowing effects which affect high frequency sounds more than low frequency sounds [15]. There is a vast literature on the importance of ILDs and how neurons at various brain levels respond to ILDs that cover a wide azimuthal range across frontal space, from opposite one ear across to opposite the other. We focus on application of data normalization techniques to the response heterogeneity in ILD functions (plots of the strength of neuronal responses to variations in ILDs) recorded from neurons in an obligatory midbrain auditory relay structure, the Inferior Colliculus. To examine the effect of different normalization methods, we created a test bench of theoretical ILD functions. These functions included variations in different features of ILD functions to simulate all the ILD variants reported in the literature. Use of this standard simulated set of ILD functions allowed direct comparison between different normalization methods, and we carried out a three-step procedure to find the best ‘tailored’ data normalization for prototypical ILD functions.

II. GENERATING PROTOTYPICAL ILD FUNCTIONS

Our own database and an extensive literature review showed that four prototypical ILD functions can be recorded in different neurons at all levels of the brain above the brainstem. These four functions (Figure 1) consist of (a) two Sigmoid functions where neuronal responses vary sigmoidally across a wide range of ILDs with a plateau of responses in the ILD range favoring either one ear or the other, (b) a Peaked function where neuronal responses are peaked at some ILD within the range encompassing frontal space, and (c) Insensitive functions where neuronal responses vary very little with ILDs. Each of these four broad response categories encompasses functions that can vary in metrics defining the features of the ILD function, e.g. the position along the ILD axis of the peak of responses or of the slope from maximum to minimum responses, and the steepness of the slope. These features have been variously discussed as information-bearing elements that may be used to derive the azimuthal location of a sound source [16].

In the simulated ILD sensitivity functions, 13 ILD values were used ranging from +30 dB (30 dB louder in one ear) to -30 dB (30 dB louder in the other ear) as detailed in Section V below. Neuronal responses were represented as spikes/stimulus on a scale from ‘0’ to ‘100’ (“m”, maximum response). This normalized scale allowed us to simulate ILD functions in absolute values across all normalization tests to allow comparison of effects across the tests.
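
As an illustration of the 13-point ILD axis and the 0 to 100 response scale described in Section II, the following Python sketch builds one vector per prototype. The logistic and Gaussian shapes, and all function and parameter names, are assumptions chosen for illustration only, since the paper does not state the equations used to generate its prototypes. The 5 dB step is the one given in Section V.

```python
import math

# 13 ILD values from +30 dB down to -30 dB in 5 dB steps
# (position 1 = +30 dB, position 13 = -30 dB).
ILD_AXIS = list(range(30, -35, -5))
MAX_RESPONSE = 100.0  # responses are expressed on a 0-100 spikes/stimulus scale


def sigmoid_ild(cutoff_db=0.0, slope=0.3, peak=MAX_RESPONSE,
                plateau_at_positive_ild=True):
    """Hypothetical logistic stand-in for a Sigmoidal ILD function."""
    sign = 1.0 if plateau_at_positive_ild else -1.0
    return [peak / (1.0 + math.exp(-sign * slope * (ild - cutoff_db)))
            for ild in ILD_AXIS]


def peaked_ild(peak_db=0.0, width_db=10.0, peak=MAX_RESPONSE):
    """Hypothetical Gaussian stand-in for a Peaked ILD function."""
    return [peak * math.exp(-((ild - peak_db) ** 2) / (2.0 * width_db ** 2))
            for ild in ILD_AXIS]


def insensitive_ild(level=50.0):
    """Flat response: the spike count does not change with ILD."""
    return [level for _ in ILD_AXIS]


# One vector per prototype; each holds 13 spike counts.
prototypes = {
    "sigmoidal_plateau_positive": sigmoid_ild(plateau_at_positive_ild=True),
    "sigmoidal_plateau_negative": sigmoid_ild(plateau_at_positive_ild=False),
    "peaked": peaked_ild(),
    "insensitive": insensitive_ild(),
}
```

Varying `cutoff_db`, `slope`, and `peak` in this sketch corresponds loosely to the feature variations (cut-off position, slope steepness, maximum firing rate) used to derive the nine test-bench variants.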
III. GENERATING VARIANTS OF ILD FUNCTION

To appropriately model the range of ILD functions seen experimentally, we generated the four prototypical ILD functions (i.e. Sigmoidal-EI, Sigmoidal-IE, Peaked, and Insensitive) and then added in variants in feature elements of each class to produce a data set of nine possible variants in ILD functions. For Sigmoidal functions, three important features likely to be critical for coding ILDs were (a) position along the ILD axis of the cut-off of the slope (the point where the ILD function starts to turn over), (b) steepness of slope, and (c) ILD at which the maximum firing rate occurs. For Peaked functions, the three important features were (a) position of peak, (b) ILD at which the maximum firing rate occurs, and (c) position of cut-off (tip of the function) with varying steepness of slope. For Insensitive functions, the sole feature was the ILD at which the maximum firing rate occurs. Additionally, to model experimentally observed variants between prototypical classes, we identified two transitional ILD types. In one, a Peaked function transitioned to a Sigmoidal function as one arm of the Peaked function moved gradually to increasing spike counts (“vector” v#1 through v#4 in Fig. 1G). In the second transitional type, spike counts on both descending arms of a Peaked function increased so that it transitioned to becoming an Insensitive ILD function (Fig. 1H).

Thus the nine different test bench ILD functions were:
(i) Three categories of variants in Sigmoidal ILD functions with variations in (a) where the maximum firing rate occurs (Fig. 1A), (b) location of cut-off along the ILD axis (Fig. 1B), and (c) steepness of slope of cut-off (Fig. 1C).
(ii) Three categories of variants in Peaked ILD functions with variations in (a) where the maximum firing rate occurs (Fig. 1D), (b) location of cut-off along the ILD axis (Fig. 1E), and (c) steepness of the slope of cut-off (Fig. 1F).
(iii) For the two types of transitional ILDs (transitioning from Peaked to Sigmoid by increasing firing rate unilaterally in one arm of the Peaked function, or from Peaked to Insensitive by increasing firing rate bilaterally in both arms of the Peaked function): (a) changes in one limb only to change from Peaked to Sigmoidal (Fig. 1G), (b) changes in both limbs to change from Peaked to Insensitive (Fig. 1H), and (c) variations in where the maximum firing rate occurs in Insensitive response functions (Fig. 1I).

IV. APPLYING SPIKE COUNT PERTURBATIONS

To produce more realistic looking ILD functions we applied a small and statistically insignificant perturbation to deform the shape of the nine ILD functions. All points in the data groups (viz., numbers of spike counts) were arbitrarily perturbed within ±6% of the original values. The ±6% perturbation range was determined from initial visual inspection of different test ranges, which showed that perturbation by < 6% made the ILD functions still look too ideal and perturbation by > 6% made it too easy to confuse different ILD patterns. For example, a 7% increase in one part (and 7% decrease in another part) of the Insensitive ILD function (v#1s of Fig. 1A and Fig. 1I) made it look like a Sigmoidal ILD function.

Data perturbation was carried out in three steps: (i) generate 13 random numbers varying between -0.06 and +0.06 for every unit (i.e. ±6 for the maximum of 100 spike counts), (ii) add the 13 random numbers arbitrarily to each ILD pattern (each ILD pattern contains 13 numbers and each number represents a spike count), and (iii) monitor the previous steps to verify that the perturbed data validly fit in the range from minimum to maximum (e.g., log normalization may produce errors for some values if divided by zero). The ±6% perturbation was applied to the nine ILD functions to produce the final test bench.

V. GENERATING THE FINAL TEST BENCH ILD FUNCTIONS

Simulated ILD functions were generated using MATLAB Statistical toolbox (v5.0.2) to generate a matrix of ILD data which could be tested with seven common normalization techniques, and to graphically display results. In theory, the nine possible ILD variants can be classified using mathematical models. This approach has been used to classify neuronal Inter Spike Interval (ISI) distributions, where around 200 ISI histograms from various neurons were classed using 10 different mathematical models [17]. However, our data has not been categorized yet to allow classification by a suitable mathematical model. Therefore, the following method was applied.

For the ILD axis, we used ILDs from +30 dB through 0 dB to -30 dB, in 5 dB increments as used in many electrophysiological studies [18], [19], [20], [21], including our own studies that provided the neuronal data here. In our studies, ILDs were generated by varying the levels in both ears symmetrically about some average level (the Average Binaural Intensity (ABI) constant method [22]). At any one ABI level, the 13 ILD levels were represented on a scale of 1 to 13, where 1 = +30 dB and 13 = -30 dB. We initially mathematically modeled four types of ILD sensitivity functions to mimic the four general forms of ILD sensitivity (see Section II) [23], [24], [25], [21]. Then, to mimic the variations in neuronal response functions that are classed experimentally as being of the same general form, the basic prototypes were intentionally varied systematically (v#4s of Fig. 1A, 1G, 1D and 1I), as explained in detail later.

VI. DATA NORMALIZATION: RESULTS AND DISCUSSION

Since the appropriate normalization technique for ILD data is unknown, we tested seven widely used techniques against our data set of nine ideal ILD function variants.

A. Normalization by mean correction

The covariance matrix, or mean-corrected data, removes mean values from each vector’s variables one by one (Table 1.1). In this method “µ” is the scalar mean value of “Xn” data, which has “n” number of ILD patterns. Data normalization by the covariance matrix is data scaling that has a zero mean [26], [27]. The covariance matrix is set by each row of observations with each column of variables that measures the linear relationship between variables. The covariance matrix method is not often used and the more
preferred method is use of correlation matrices for reduction of data dimension (such as in Principal Component Analyses) because the correlation matrix is a normalized measure of linear relationship between variables [26]. The result of this data normalization technique is that (1) maximum and minimum values of all normalized data variables are spread between negative and positive values, which misrepresents the original values of the ILD data set, and (2) variations of the four Insensitive ILD patterns are exaggerated by normalization.

B. Data normalization by a single maximum value

All unnormalized data are scaled to range between 0 and a single maximum value (Table 1.2). This maximum value is the largest of the maxima across all unnormalized data sets and is used to divide each matrix component to generate a “Vn(i, j)” normalized matrix, i.e. the maximum is always = 1 and the minimum always = 0 (where, ∀ Xn(i) ≠ 0). This technique is often used for microarray data [13], [28]. However, for our data it resulted in value changes that were so small that it was not possible to distinguish very small changes among similar types of ILD patterns.

C. Data normalization by each vector’s maximum value

Data points in each ILD function are divided by the maximum spike count in that function, i.e., each vector’s maximum “Xn (i, j)” is used to normalize the “Xn” raw matrix (Table 1.3). It has the advantage that all functions are scaled to a maximum of “1” (i.e. the maximum spike count for that function) and other values are effectively expressed as a proportion of this maximum spike count for that function. The disadvantage of this procedure was that the shapes of ILD functions with small spike counts became distorted because each vector’s maximum spike counts were all normalized to the same maximum of 1. Thus, this technique was not suitable for all ILD functions, especially those with small spike count changes with slight perturbations.

D. Data normalization by each vector’s standard deviation

An uncommon technique is where each data point is normalized by its standard deviation (Table 1.4: “σ” = SD, “X” = raw matrix, and “V” = normalized matrix). This produces better data spread, especially for asymmetric functions (e.g., our Sigmoidal functions) [29]. However, it was unsuitable for our data because it did not preserve the different number of spike counts in a proportionally scaled manner for similar types of nonlinear ILDs.

E. Logarithmic normalization

Logarithmic normalization is widely used, especially when data analysis involves a large number of data [30]. It is a nonlinear procedure [10] that is suitable to deal with nonlinear data [12]. Each data point “Xn” is divided by the mean value “µn” for that function. Then the logarithm to base 2 (log2) “Vn” is calculated (Table 1.5). With this transformation there is decreased variance [8], as large spike counts are reduced more than low ones. However, this technique was also unsuitable for the ILD functions. The number of spike counts for all nine ILD functions originally varied from nearly 0 to 100 (spikes/stimulus). Logarithmic normalization scaled down the spike count range by log2 but the outcome was an exaggerated perturbation of ILD functions, which deformed the original functions. Moreover, some irregular transformations occurred; in particular, transitions from Peaked to Sigmoidal and to Insensitive ILD functions were not smooth as in the original data sets.

F. Unit Total Probability Mass normalization

The unit total probability mass (UTPM) normalization or total intensity normalization [9] is achieved by dividing each vector’s element by the sum of that vector’s variables, and multiplying by the mean (Table 1.6, where “Xn” = raw data, “µn” = mean value, and “Vn” = normalized data) [3], [31], [4], [32]. It has been used in the cumulative distribution function [33] as a discrete type of data normalization and in normalizing input for cluster analysis [8]. It was well suited to our test bench because: (1) the perturbations in variations of Sigmoidal, Peaked, and Insensitive ILD functions with varying spike counts were not distorted; (2) the positions of varying cut-off and varying slopes of the ILD functions were kept the same as in the original functions; and (3) the irregular transformations (transitions from Peaked to Sigmoidal and to Insensitive ILD functions) were not too smooth as in the original data set. The maximum values of each ILD function are all scaled down by ~13% without losing their original shapes.

G. Normalization by data standardization

Data standardization is achieved by dividing the mean-subtracted (mean-corrected) data by its standard deviation (Table 1.7: the mean “µn” is subtracted from each data point “Xn” and the result is then divided by its SD “σn”). The variances of standardized variables are 1 and therefore covariance of standardized variables always ranges between ‘-1’ and ‘+1’ [28]. An advantage of this normalization is that data are expressed in comparable units. On some occasions, data have been standardized by zero mean and unity SD (like σ = 1 in [34], towards a standard procedure for Principal Component Analysis). This normalization can also make a more efficient front-end application for neural network training [1]. However, this technique was not suitable for the Insensitive type of ILD functions, as perturbations of the ILD functions were adversely affected by this technique, causing ILD function distortion. The maxima and minima values in this normalization method are also spread between negative and positive values to keep the mean = 0 and SD = 1 for more nonlinear types of functions, i.e. not for Insensitive type ILD functions.

VII. CONCLUSION AND DISCUSSION

In general terms normalization is signal intensity divided by a reference value, to reduce systematic errors in data variables [30]. Data normalization also maximizes variance [34], which is especially important before applying data
dimension reduction technique for ILD type data. Data normalization is often a prerequisite for statistical data analysis, and finding a suitable scaling technique for the data is an important task. In a novel approach for the field, we developed a test bench of prototypical ILD functions to investigate appropriate normalization techniques. We found that the unit total probability mass normalization method was the best for ILD response functions.

Other data normalization techniques can be generated by variation of existing ones, such as dividing by the sum of all signals or by the standard error of the signals after mean correction [12]. Also, slight variations of normalization techniques used here could give a procedure useful for a non-linear feature for the data [10]. For our data this could be achieved by multiplication of the mean value of data for our unit total probability mass normalization technique for each ILD function.

In addition to visual comparisons of the result of a selected normalization technique against the raw data, other methods are also available for better selection of the correct normalization technique. The quality of the normalization technique can be estimated by (i) calculating the sum of squares of differences between the model and normalization histogram, and (ii) using Pearson correlation coefficients between the values before and after data normalization [35]. Such a quantification method for normalization selection is worth investigation, but is beyond the scope of this study.

REFERENCES

[1] M. Aminian and F. Aminian, "Neural-network based analog-circuit fault diagnosis using wavelet transform as preprocessor," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, v. 47, pp. 151-6, 2000.
[2] M. Buiatti and C. van Vreeswijk, "Variance normalisation: a key mechanism for temporal adaptation in natural vision?," Vision Research, v. 43, pp. 1895-906, 2003.
[3] Y. Kosugi, M. Sase, H. Kuwatani, N. Kinoshita, T. Momose, J. Nishikawa, and T. Watanabe, "Neural network mapping for nonlinear stereotactic normalization of brain MR images," Journal of Computer Assisted Tomography, v. 17, pp. 455-60, 1993.
[4] G. Skoog and J. Ciecka, "Probability mass functions for additional years of labor market activity induced by the Markov (increment-decrement) model," Economics Letters, v. 77, pp. 425-31, 2002.
[5] H. Kim, D. Kim, and S. Bang, "Face recognition using the mixture-of-eigenfaces method," Pattern Recognition Letters, v. 23, pp. 1549-58, 2002.
[6] B. Bolstad, R. Irizarry, M. Astrand, and T. Speed, "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias," Bioinformatics (Oxford, UK), v. 19, pp. 185-93, 2003.
[7] E. Dougherty, J. Barrera, M. Brun, S. Kim, R. Cesar, Y. Chen, M. Bittner, and J. Trent, "Inference from clustering with application to gene-expression microarrays," Journal of Computational Biology, v. 9, pp. 105-26, 2002.
[8] J. Kasturi, R. Acharya, and M. Ramanathan, "An information theoretic approach for analyzing temporal patterns of gene expression," Bioinformatics (Oxford, UK), v. 19, pp. 449-58, 2003.
[9] J. Quackenbush, "Microarray data normalization and transformation," Nature Genetics, v. 32, pp. 496-501, 2002.
[10] G. Tseng, M. Oh, L. Rohlin, J. Liao, and W. Wong, "Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects," Nucleic Acids Research, v. 29, pp. 2549-57, 2001.
[11] D. Venet, "MatArray: a Matlab toolbox for microarray data," Bioinformatics, v. 19, pp. 659-60, 2003.
[12] J. Weiner III, C. Zimmerman, H. Gohlmann, and R. Herrmann, "Transcription profiles of the bacterium Mycoplasma pneumoniae grown at different temperatures," Nucleic Acids Research, v. 31, pp. 6306-20, 2003.
[13] C. Workman, L. Jensen, H. Jarmer, R. Berka, L. Gautier, H. Nielser, et al., "A new non-linear normalization method for reducing variability in DNA microarray experiments," Genome Biology, v. 3, 2002.
[14] D. Irvine, "IID in the cat: changes in sound pressure level at the two ears associated with azimuthal displacements in the frontal horizontal plane," Hearing Research, v. 26, pp. 267-86, 1987.
[15] W. Hartmann and B. Rakerd, "ILD: Diffraction and localization by human listeners," The Journal of the Acoustical Society of America, v. 129, p. 2622, 2011.
[16] B. Grothe, M. Pecka, and D. McAlpine, "Mechanisms of Sound Localization in Mammals," Physiological Reviews, v. 90, pp. 983-1012, 2010.
[17] H. Tuckwell, Stochastic processes in the neurosciences. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 1989.
[18] D. Irvine, The Auditory Brainstem: A Review of the Structure and Function of Auditory Brainstem Processing Mechanisms, 1st ed. N.Y.: Springer-Verlag, 1986.
[19] D. Irvine, V. Park, and L. McCormick, "Mechanisms underlying the sensitivity of neurons in the lateral superior olive to IID," Journal of Neurophysiology, v. 86, pp. 2647-66, 2001.
[20] T. Lohuis and Z. Fuzessery, "Neuronal sensitivity to interaural time differences in the sound envelope in the auditory cortex of the pallid bat," Hearing Research, v. 143, pp. 43-57, 2000.
[21] D. Phillips and D. Irvine, "Responses of single neurons in physiologically defined area AI of cat cerebral cortex: sensitivity to IID," Hearing Research, v. 4, pp. 299-307, 1981.
[22] L. Aitkin, The auditory midbrain: structure and function in the central auditory pathway. Clifton, N.J.: Humana Press, 1986.
[23] L. Aitkin, D. Irvine, J. Nelson, M. Merzenich, and J. Clarey, "Frequency representation in the auditory midbrain and forebrain of a marsupial, the northern native cat," Brain Behaviour and Evolution, v. 29, pp. 17-28, 1986.
[24] L. Aitkin, The auditory cortex: structural and functional bases of auditory perception, 1st ed. London: Chapman & Hall, 1990.
[25] L. Aitkin, M. Merzenich, D. Irvine, J. Clarey, and J. Nelson, "Frequency representation in auditory cortex of the common marmoset," Journal of Comparative Neurology, v. 252, pp. 175-85, 1986.
[26] J. Jackson, A user's guide to principal components. N.Y.: Wiley & Sons, Inc., 1991.
[27] H. Moghaddam and K. Zadeh, "Fast adaptive algorithms and networks for class-separability features," Pattern Recognition, v. 36, pp. 1695-702, 2003.
[28] S. Sharma, Applied multivariate techniques. N.Y.: John Wiley, 1996.
[29] A. Zaknich, Neural networks for intelligent signal processing vol. 4. River Edge, NJ, USA: World Scientific, 2003.
[30] D. Geschwind and J. Gregg, Microarrays for the neurosciences: an essential guide. Cambridge, MA, USA: MIT Press, 2002.
[31] Z. Ahmad, L. Balsamo, B. Sachs, B. Xu, and W. Gaillard, "Auditory comprehension of language in young children: neural networks identified with fMRI," Neurology, v. 60, pp. 1598-605, 2003.
[32] S. Patra and R. Misra, "Evaluation of probability mass function of flow in a communication network considering a multistate model of network links," Microelectronics and Reliability, v. 36, pp. 415-21, 1996.
[33] C. C. Abnet and D. M. Freeman, "Deformations of the isolated mouse tectorial membrane produced by oscillatory forces," Hearing Research, v. 144, pp. 29-46, June 2000.
[34] J. Lattin, P. Green, and J. Carroll, Analyzing multivariate data. Pacific Grove, CA, USA: Thomson Brooks/Cole, 2003.
[35] I. Sidorov, D. Hosack, D. Gee, J. Yang, M. Cam, R. Lempicki, and D. Dimitrov, "Oligonucleotide microarray data distribution and normalization," Information Sciences, v. 146, pp. 67-73, 2002.
TABLE 1: SEVEN DATA NORMALIZATION METHODS (WITH THEIR EQUATIONS) APPLIED TO THE NINE (“A” TO “I”) PROTOTYPICAL ILD FUNCTIONS. RESULTS ARE PRESENTED AS THE MINIMUM OF MINIMA (FOUR VECTORS) / MAXIMUM OF MAXIMA (FOUR VECTORS), ALL SHOWN IN SPIKE COUNTS.
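
The seven methods of Table 1 can be sketched in Python from their verbal descriptions in Section VI. This is an illustrative reading, not the authors' MATLAB implementation: the function names are hypothetical, the mean and SD are assumed to be computed per vector, the SD is assumed to be the sample SD (N - 1 denominator), and mapping a zero spike count to negative infinity in the logarithmic method is a choice made here. Under the per-vector reading, method F reduces algebraically to dividing each value by the number of points in the vector.

```python
import math
import statistics


def _mean(v):
    return sum(v) / len(v)


def mean_corrected(v):
    """A: subtract the vector's mean from every element."""
    mu = _mean(v)
    return [x - mu for x in v]


def by_single_max(v, global_max):
    """B: divide by the largest value found across ALL vectors."""
    return [x / global_max for x in v]


def by_vector_max(v):
    """C: divide by this vector's own maximum."""
    top = max(v)
    return [x / top for x in v]


def by_vector_sd(v):
    """D: divide by this vector's sample SD (fails for a constant vector)."""
    sd = statistics.stdev(v)
    return [x / sd for x in v]


def log2_ratio(v):
    """E: log2 of each value divided by the vector's mean.

    Assumption: a zero spike count maps to -inf instead of raising an error.
    """
    mu = _mean(v)
    return [math.log2(x / mu) if x > 0 else float("-inf") for x in v]


def utpm(v):
    """F: divide by the vector's sum, then multiply by the vector's mean."""
    total, mu = sum(v), _mean(v)
    return [x / total * mu for x in v]


def standardized(v):
    """G: subtract the mean, then divide by the sample SD."""
    mu, sd = _mean(v), statistics.stdev(v)
    return [(x - mu) / sd for x in v]


# Hypothetical toy data: two short vectors rather than real 13-point functions.
data = [[0.0, 50.0, 100.0], [10.0, 20.0, 30.0]]
global_max = max(max(v) for v in data)
table = {
    "A": [mean_corrected(v) for v in data],
    "B": [by_single_max(v, global_max) for v in data],
    "C": [by_vector_max(v) for v in data],
    "F": [utpm(v) for v in data],
}
```

Methods D and G divide by the SD, so a perfectly flat (Insensitive) vector would need special handling before they can be applied.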
Fig. 1: The four ILD patterns, Sigmoidal-EI (A4), Sigmoidal-IE (G4), Peaked (D4), and Insensitive (I2), are all described in numbers of spike counts “#sp.c.” (spikes/stimulus), which varied between a maximum of ‘100’ and a minimum of ‘0’ over ILDs from -30 dB to +30 dB. Nine possible ILD functions are generated from variations of those four typical ILD sensitivity functions. These are: Sigmoidal functions with varying spike count (A), position of the cutoff (B), and steepness of the slope (C); four Peaked functions with varying spike count (D), cutoff (E), and cutoff and slope (F); Peaked with unilateral transition to Sigmoidal (G); Peaked with bilateral transition to Insensitive (H); and four Insensitive functions with varying spike count (I).
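
The three-step ±6% perturbation of Section IV, applied to one 13-point vector of the kind shown in Fig. 1, might be sketched in Python as follows. The function name, the example vector, and the use of a uniform random draw are assumptions. The perturbation is read here as relative to each original value, and clipping to the 0 to 100 range stands in for the paper's "monitor and verify" step, which is not specified in detail.

```python
import random


def perturb(vector, fraction=0.06, lo=0.0, hi=100.0, rng=None):
    """Return a copy of `vector` with each point perturbed by up to +/- fraction."""
    rng = rng or random.Random()
    # Step (i): one random number in [-fraction, +fraction] per ILD point.
    noise = [rng.uniform(-fraction, fraction) for _ in vector]
    # Step (ii): apply it relative to each original spike count.
    perturbed = [x * (1.0 + r) for x, r in zip(vector, noise)]
    # Step (iii): keep the result inside the valid range (assumed: clip).
    return [min(hi, max(lo, y)) for y in perturbed]


# Hypothetical sigmoidal-looking vector on the 13-point ILD axis.
ideal = [100.0, 100.0, 100.0, 95.0, 80.0, 50.0, 20.0,
         5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
noisy = perturb(ideal, rng=random.Random(0))  # seeded for repeatability
```

Because the noise scales with the spike count, points at zero stay at zero, which matters for methods such as logarithmic normalization that cannot handle zero values.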