“BOON TO BLIND” USING “JAWS” SOFTWARE

an Artificial Intelligence application


ABSTRACT:

The first part of this paper introduces the necessity of the refreshable Braille display, a device which rests on the keyboard and presents text by means of raising dots through holes on a flat surface. The second part focuses on the fundamentals of speech synthesis, which converts text to speech, and on speech units. The third part portrays speech production by the vocal cords and acoustic phonetics. The next part deals with a detailed description of the various speech units (words, syllables, demisyllables, diphones, allophones, phonemes) and the way in which these primitive speech units are combined to form full speech with phonetic sounds. The paper then covers the analysis of speech signals, the various synthesis methods, and the coding of vocal tract parameters using Linear Predictive Coding (LPC) synthesis with its block diagram, and finally the conclusion, followed by the bibliography.

INDEX

• Introduction
• Why JAWS
• Fundamentals of speech synthesis
• Speech units
• Speech production and acoustic units
• Synthesis methods
• Conclusion

INTRODUCTION:

The refreshable Braille display is an electro-mechanical device for displaying Braille characters, usually by means of raising dots through holes in a flat surface. The display sits under the computer keyboard. It is used to present text to computer users who are blind and cannot use a normal computer monitor. Speech synthesizers are also commonly used for the same task, and a blind user may switch between the two systems depending on circumstances. Because of the complexity of producing a reliable display that will cope with daily wear and tear, these displays are expensive. Usually, only 40 or 80 Braille cells are displayed; models with 18-40 cells exist in some note-taker devices. On some models the position of the cursor is represented by vibrating the dots, and some models have a switch associated with each cell to move the cursor to that cell directly.

The mechanism which raises the dots uses the piezo effect: some crystals expand when a voltage is applied to them. Such a crystal is connected to a lever, which in turn raises the dot. There has to be such a crystal for each dot of the display, i.e. eight per character.

The software that controls the display is called a screen reader. It gathers the content of the screen from the operating system, converts it into Braille characters, and sends it to the display. Screen readers for graphical operating systems are especially complex, because graphical elements like windows or slide bars have to be interpreted and described in text form.

A new development, called the rotating-wheel Braille display, was developed in 2000 by the National Institute of Standards and Technology (NIST) and is still in the process of commercialization. Braille dots are put on the edge of a spinning wheel, which allows the blind user to read continuously with a stationary finger while the wheel spins at a selected speed. The Braille dots are set in a simple scanning-style fashion as the dots on the wheel spin past a stationary actuator that sets the Braille characters. As a result, manufacturing complexity is greatly reduced, and rotating-wheel Braille displays will be much less expensive than traditional Braille displays.

FUNDAMENTALS OF SPEECH SYNTHESIS:

Speech synthesis is the automatic generation of speech waveforms based on an input of text to synthesize and on previously analyzed digital speech data. The critical issues for current speech synthesizers concern trade-offs among the conflicting demands of maximising the quality of speech while minimising memory space, algorithmic complexity, and computation speed. A block diagram of the steps in speech synthesis is shown below. The text might be entered by keyboard or optical character recognition, or obtained from a stored database. Speech synthesizers then convert the text into a sequence of speech units by the lexical access routines. Using large speech units, such as phrases and sentences, can give high-quality output speech but requires much more memory.

fig: block diagram of text to speech synthesis

The stored speech units are retrieved and concatenated to output the synthetic speech. Speech is often modeled as the response of a time-varying linear filter (corresponding to the vocal tract from the glottis to the lips) to an excitation waveform consisting of broadband noise, a periodic waveform of pulses, or a combination. In summary, the most important components of a synthesizer's algorithm are:

1. Stored speech units: storage of speech parameters, obtained from natural speech and organized in terms of speech units, and
2. Concatenation routines: a program of rules to concatenate these units, smoothing the vocal-tract parameters to create time trajectories.

Real-time speech synthesizers have been available for many years. Such speech is generally intelligible but lacks naturalness. Quality inferior to that of human speech is usually due to inadequate modelling of three aspects of human speech production: coarticulation, intonation, and excitation. Most synthesizers reproduce speech bandwidth in the range of 0-3 kHz (e.g. for telephone applications) or 0-5 kHz (for higher quality). Frequencies up to 3 kHz are sufficient for vowel perception, since vowels are adequately specified by the first three formants. The perception of some consonants, however, is highly impaired if energy in the 3-5 kHz range is omitted.
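As an illustration of the concatenation routines described above, the sketch below joins stored unit waveforms with a short linear cross-fade at each boundary. This is only a minimal stand-in for real smoothing rules; the function name and the fade length are our own assumptions, not part of the original paper.

```python
import numpy as np

def crossfade_concat(units, fade=64):
    """Concatenate speech-unit waveforms, linearly cross-fading
    `fade` samples at each join to smooth the transitions."""
    out = units[0].astype(float)
    ramp = np.linspace(0.0, 1.0, fade)
    for u in units[1:]:
        u = u.astype(float)
        # blend the tail of the previous unit with the head of the next
        out[-fade:] = out[-fade:] * (1 - ramp) + u[:fade] * ramp
        out = np.concatenate([out, u[fade:]])
    return out
```

A real concatenative synthesizer would additionally adjust duration and pitch of each unit; here only amplitude continuity at the joins is handled.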

SPEECH UNITS:

The choice of speech units determines the storage memory required and the quality of the synthetic speech. Some possible units are described below. The number in brackets after each quantity is the number of subword units sufficient to describe all English words.

Words (300,000; 50,000 sufficient) — The fundamental units of a sentence.
  Advantages: yield high quality speech; simple concatenation synthesis algorithm.
  Disadvantages: limited by the memory requirement; concatenation of isolated words degrades the intelligibility and naturalness of the synthetic speech; the duration of each word needs to be adjusted.

Syllables (20,000; 4,400 sufficient) — A syllable consists of a nucleus (either a vowel or a diphthong) plus some neighbouring consonants.
  Disadvantage: syllable boundaries are uncertain.

Demisyllables (4,500; 2,000 sufficient) — Obtained by dividing syllables in half, with the cut during the vowel, where the effects of coarticulation are minimal.
  Advantages: preserve the transition between adjacent phones; simple smoothing rules; produce smooth speech.

Diphones (1,500; 1,200 sufficient) — Obtained by dividing a speech waveform into phone-sized units, with the cuts in the middle of each phone.
  Advantages: preserve the transition between adjacent phones; simple smoothing rules; produce smooth speech.

Allophones (250) — Phonemic variants.
  Advantage: reduce the complexity of the interpolation algorithm compared to phonemes.

Phonemes (37) — The fundamental units of phonology.
  Advantage: the memory requirement is small.
  Disadvantages: require complex smoothing rules to represent the coarticulation effect; the intonation needs to be adjusted in each context.

Consider the English word "segmentation". Its representation according to each of the above subword unit sets is:
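The representation figure itself is not reproduced here, but the way a phoneme string is cut into diphone units, as described in the table above, can be sketched in a few lines. The `to_diphones` helper and the `#` silence padding are illustrative assumptions, not notation from the original paper.

```python
def to_diphones(phones):
    """Split a phoneme sequence into diphone units: each unit spans
    from the middle of one phone to the middle of the next, written
    here as 'p1_p2' pairs, with '#' marking silence at the edges."""
    padded = ["#"] + list(phones) + ["#"]
    return [f"{a}_{b}" for a, b in zip(padded, padded[1:])]
```

For a phoneme sequence of n phones this yields n + 1 diphones, which is why the diphone inventory (about 1,500 units) is so much smaller than the word inventory while still preserving the transitions between adjacent phones.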

The vertical lines are the segmentation lines with respect to the speech units. The diphone unit /s_eh/ is different from the demisyllable unit /s_eh/ even though we use the same symbols. The segmentation lines give us a picture of how the speech units are related to each other.

SPEECH PRODUCTION AND ACOUSTIC-PHONETICS:

The English phonemes can be classified according to how they are produced by the vocal organs. An understanding of the speech production mechanism will help us to analyse the speech sounds. Acoustic phonetics is the study of the physics of the speech signal. When sound travels through the air from the speaker's mouth to the hearer's ear, it does so in the form of vibrations in the air. It is possible to measure and analyse these vibrations by mathematical techniques studied in the physics of sound.

SPEECH UNITS:

The acoustic signals are converted by the listener into a sequence of words and sentences. The most familiar language units are words. They can be thought of as sequences of smaller linguistic units, phonemes, which are the fundamental units of phonology. We will use ARPABET symbols in the rest of this tutorial.

The easiest way to understand the nature of phonemes is to consider a group of words like "hid", "head", "hood", and "hod". All these words are made up of an initial, a middle, and a final element. In our examples, the initial and final elements are identical, but the middle elements are different. It is the difference in this middle element that distinguishes the four words. Similarly, we can find those sounds which differentiate one word from another for all the words of a language. Such distinguishing sounds are called phonemes. There are about fifty phoneme units in English.

A phoneme is not a single sound but a set of sounds sharing similar characteristics. The elements in the set are called allophones. Every phoneme consists of several allophones, i.e., several different ways of pronunciation. Square brackets are usually used to indicate allophonic symbols. For example, pre-vocalic stops have both a closure phase and a release phase. Stop releases are classified further as aspirated or unaspirated. Aspiration is a breathy noise generated as air passes through the partially closed vocal folds and into the pharynx. The allophonic symbol [ph] is used to represent a plosive sound /p/ with aspiration.

In the study of speech, a phone is a unit at the phonetic level. There is not always a one-to-one correspondence between the units at the phonetic level and those at the phonemic level. For example, the word "can't" is phonemically /k ae n t/ (4 phonemic units), but may be pronounced [k \~ae t] with the nasal consonant phoneme absorbed into the preceding vowel as nasalisation (3 phonetic units).

Phonemes can be combined into larger units called syllables. A syllable usually consists of a vowel surrounded by one or more consonants. In English there are approximately 10,000 syllables. An even larger linguistic unit is the word, which normally consists of a sequence of several phonemes that combine into one or more syllables. Words are combined into still longer linguistic units called sentences. The structure of sentences is described by the grammar of the language. Grammar includes phonology, morphology, syntax, and semantics.

The following figure shows a segment of the vowel /ix/. The quasi-periodicity (almost periodic) of voiced speech can be observed.

Fricative sounds are generated by constricting the vocal tract at some point and forcing the air stream to flow through at a high enough velocity to produce turbulence. This turbulent air sounds like a hiss, e.g. /hh/ or /s/.

Plosive or stop sounds result from blocking the vocal tract by closing the lips and nasal cavity, allowing the air pressure to build up behind the closure, followed by a sudden release. This mechanism produces sounds like /p/ and /g/. The following figure shows the stop /g/. The silence before the burst is the stop closure.

Affricates are a combination of stop and fricative sounds. Manner of articulation is concerned with airflow, i.e. the paths it takes and the degree to which it is impeded by vocal tract constrictions. Stops, fricatives, and affricates are collectively called obstruent phonemes, which are weak and aperiodic and are primarily excited at their major vocal tract constriction.

While manner of articulation and voicing partition phonemes into three broad classes, it is the place of articulation (the point of narrowest vocal tract constriction) that enables finer discrimination of phonemes. Vowels are primarily described in terms of tongue position and lip rounding. The significant places of articulation are the lips (bilabial), lips and teeth (labio-dental), teeth (dental), upper gums (alveolar), hard palate (palatal), soft palate (velar), and glottis (glottal). A table showing the places and manners of articulation for English consonants can be found in standard references.

ACOUSTIC-PHONETIC ANALYSIS OF SPEECH SIGNALS:

Based on the knowledge of the speech production mechanism and the study of acoustic phonetics, we are able to extract a set of features which can best represent a particular phoneme. One of the popular techniques is the spectrogram, which describes how the frequency content of a speech signal changes with time.

Suppose we have a signal which is sampled at 16 kHz; the typical steps to compute the spectrogram are as follows. The speech is blocked by a Hamming window with a duration of 256 samples, overlapped by 200 samples. A 256-point FFT (fast Fourier transform) is applied to each windowed frame. The power spectrum in dB is plotted. The formant frequencies are displayed by horizontal bars in the spectrogram, while the vertical lines indicate the pitch period (i.e. the inverse of the fundamental frequency).
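The windowing and FFT steps above can be sketched directly with NumPy. This is a minimal sketch of the stated recipe (256-sample Hamming window, 200-sample overlap, so a hop of 56 samples); production code would normally use a library routine such as `scipy.signal.spectrogram`.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=56):
    """Power spectrogram following the steps in the text:
    256-sample Hamming windows overlapped by 200 samples
    (hop = 56), a 256-point FFT per frame, power in dB."""
    win = np.hamming(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * win
        spectrum = np.fft.rfft(frame, n_fft)
        power_db = 10 * np.log10(np.abs(spectrum) ** 2 + 1e-12)
        frames.append(power_db)
    return np.array(frames)   # shape: (num_frames, n_fft // 2 + 1)
```

At a 16 kHz sampling rate each FFT bin is 16000/256 = 62.5 Hz wide, so a formant around 500 Hz shows up near bin 8; plotting the array with time on one axis and frequency on the other gives the horizontal formant bars described above.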
The formant frequency is the resonance frequency of the vocal tract. The formant frequencies of a vowel are determined by the parameters of the vocal tract configuration, such as the length of the vocal tract, the position of the tongue, and the shape of the lips. The first three formants are the primary cues for recognising English vowels. For unvoiced sounds, such as /hv/ and /jh/ in the spectrogram, the primary cue for recognition is the energy distribution along the frequency axis.

Synthesis Methods:

Synthesisers can be classified by how they parameterise the speech for storage and synthesis.

o Waveform synthesizers concatenate speech units using stored waveforms. This method requires large memories and a good algorithm to smooth the transitions, but it can yield good quality speech.
o Terminal analog synthesisers (e.g. the LPC vocoder) model the speech output of the vocal tract without explicitly taking account of articulator movements. This method generates lower quality speech but is more efficient.
o Articulatory synthesizers directly model the vibrating vocal cords and the movement of the tongue, lips, and other articulators. Due to the difficulty of obtaining accurate three-dimensional vocal tract representations and of modeling the system with a limited set of parameters, this method has yielded lower-quality speech.
o Formant synthesizers use a buzz generator to simulate the vocal cords and individual formant resonators to simulate the acoustic resonances of the vocal tract.

We will introduce some popular speech synthesis methods in the following sections.
Linear Predictive Coding Synthesis:

Linear predictive coding (LPC) synthesis is a very efficient method. All the vocal tract parameters are represented in a set of LPC coefficients, which are calculated automatically from the natural speech signals. The number of coefficients is typically 10 to 20, i.e. 2 for each formant in the 4-5 kHz bandwidth of the speech signal, plus a few for modelling spectral zeros and the source of vocal tract excitation. The speech synthesis model based on LPC is shown below.

In parametric synthesis, such as LPC, speech units are divided into short frames (10-30 ms) of samples; the speech signal is analysed through a "time window" to obtain a spectral representation of the unit. Parameterisation of successive frames adequately models the speech if the frames are short compared to vocal tract motion. For each frame, LPC analysis produces a set of p real-valued predictor coefficients ak, which represent an optimal estimate of the spectrum of a frame of speech using poles. Spectral and excitation parameters are fetched from the stored speech units, typically periodically (every 10-30 ms frame). Often the parameters from successive frames are linearly interpolated during a frame, to allow more frequent updates to the synthesiser.

The LPC synthesis filter is excited by either a periodic (typically impulsive) source or a noise source, depending on whether the analysed speech is estimated to be voiced or not. The excitation is specified by a gain factor, a voiced/unvoiced (periodicity) bit, and (if voiced) a fundamental frequency value.
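A minimal sketch of the LPC synthesis model described above, assuming one frame's predictor coefficients and excitation parameters are already given. The function name and default values are illustrative, not from the paper.

```python
import numpy as np

def lpc_synthesize(a, gain, voiced, f0=100, fs=8000, n=800):
    """One-frame LPC synthesis sketch: excite the all-pole filter
    1 / (1 - sum_k a_k z^-k) with an impulse train (voiced) or
    white noise (unvoiced), scaled by a gain factor."""
    if voiced:
        excitation = np.zeros(n)
        excitation[::int(fs / f0)] = 1.0    # one pulse per pitch period
    else:
        excitation = np.random.randn(n)
    out = np.zeros(n)
    p = len(a)
    for i in range(n):
        # difference equation: s[i] = g*e[i] + sum_k a[k] * s[i-1-k]
        out[i] = gain * excitation[i] + sum(
            a[k] * out[i - 1 - k] for k in range(p) if i - 1 - k >= 0)
    return out
```

A full synthesiser would run this frame by frame, switching the voiced/unvoiced bit and interpolating the coefficients between frames as described in the text.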
Formant Synthesisers:

A formant synthesiser employs formant resonances and bandwidths to represent the stored spectrum. The vocal tract filter is usually represented by 10-23 spectral parameters. Formant synthesisers have an advantage over LPC systems in that bandwidths can be more easily manipulated and that zeros can be directly introduced into the filter. However, locating the poles and zeros automatically in natural speech is a more difficult task than automatic LPC analysis. Most formant synthesisers use a model similar to that shown below.

fig: block diagram of formant synthesizer

The excitation is either a periodic train of impulses (for voiced speech), pseudo-random noise (for unvoiced speech), or periodically shaped noise (for voiced fricatives). These sources are multiplied by a gain (i.e. voice amplitude, aspiration amplitude, and frication amplitude) to adjust the amplitude of the synthetic speech. The vocal tract is usually modelled as a cascade of resonators, each representing either a formant or the spectral shape of the excitation source. A cascade of resonators is often used for vowel generation.

A parallel bank of resonators may also be used to generate both vowels and consonants. Compared with the cascaded approach, parallel synthesis requires calculation of each formant amplitude as well as an extra amplitude multiplication per formant in the synthesiser.

Synthesis of nasals usually requires an extra resonator, because the production of nasals involves the nasal cavity and hence the acoustic path for nasals is longer than for vowels. This increases the number of resonances in the speech bandwidth.
CONCLUSION:

Finally, an effective, interactive communication medium has been devised, with a glimpse of artificial intelligence, using Braille displays. Speaker-independent speech synthesizers may demand highly sophisticated algorithms that consume considerable time and space.

Thus, JAWS enables blind people to use the computer as readily as a sighted person.

PRESENTED BY:

R.S.V.N.Praveen Kmar (04761a1251), phone: 9849178896, Mailid: praveenrsvn@yahoo.co.in
S.K.Srinath (04761a1229), phone: 9440496675, Mailid: bannu5s@gmail.com
Lakireddy Balireddy College of Engineering

BIBLIOGRAPHY:

The following websites were consulted for reference:

1. www.wikipedia.com
2. www.asel.udel.edu
