
Towards Non-intrusive Speech Quality Assessment for Modern Telecommunications

D. Picovici and A. E. Mahdi
Department of Electronic and Computer Engineering, University of Limerick, Ireland

Abstract

In a competitive telecommunication market where price differentials have been minimized, quality of service (QoS) has become very important. Speech quality is a major dimension of perceived QoS, and the ability to monitor and design for this quality is a top priority. Traditionally, the perceived quality of speech has been measured by expensive and time-consuming subjective listening tests. Over the last decade, numerous attempts have been made to supplement subjective tests with objective estimators of perceived speech quality. This paper presents issues involved in modern objective speech quality assessment systems, in regard to their practicality, efficiency and accuracy. It also identifies the necessity for a non-intrusive, output-only based technique, and outlines some new ideas that could be utilised to develop an assessment method that uses only the output or the degraded speech signal.

1. Introduction

Customers are now able to select their telecommunications service provider according to price and quality of service. The decision is no longer restricted by limited technology or fixed by monopolies. A wide range of services is available at different costs and levels of quality. It is therefore very important that a service provider is able to predict customers' perception of quality in order to optimise and maintain its networks. Traditionally, networks have been characterised by applying linear assessment techniques and simple engineering metrics. As networks become more complex, a more appropriate assessment is required. This role has typically been filled by subjective tests [1], which can be used to collect information regarding perceived speech quality. Unfortunately, such tests are often very expensive, time-consuming and labour-intensive. In some situations, as in the case of codec development and optimisation work, there is no need to conduct subjective tests. In this case the techniques involved rely on objective estimators of perceived speech quality, along with informal listening tests.

The work presented here is an attempt to discuss issues involved in modern objective speech quality assessment systems, and to identify the need for a non-intrusive objective assessment approach. It also outlines some potential ideas that could be utilised to develop non-intrusive speech quality assessment methods. Following this introduction, Section 2 describes the standard methods used in telephony systems for measuring subjective speech quality. Various procedures and methods used in input-output based objective measures of speech quality, with more focus on the ITU-T recommended PSQM system, are presented in Section 3. Section 4 discusses the disadvantages of input-output based measures and highlights the need for a non-intrusive means of assessing speech quality. It then introduces the concept of such a measure, outlining two existing non-intrusive techniques. An outline of a new approach proposed by the authors for non-intrusive, output-based speech quality assessment is given in Section 5. The paper concludes with a review of all the techniques described, emphasising the need for non-intrusive speech quality assessment techniques that would fulfil the requirements of modern telecommunications services.

2. Subjective Speech Quality

Speech quality can be estimated using subjective tests in which human participants judge the performance of a system. During these tests the subjects rate the quality according to an opinion scale, which serves as the benchmark of quality. The most commonly used methods for measuring the subjective quality of speech transmission over telephony systems have been standardized and recommended by the International Telecommunication Union (ITU-T). The recommended speech quality rating scale for both listening-only and conversational tests is a 5-point category scale usually known as the quality scale [1]. Listening-only tests can also be assessed via the listening effort scale. In conversational tests, a binary difficulty scale follows the (connection) quality scale. These scales are shown in Table 1.
Listening-Quality Scale
  Quality of the speech/connection                        Score
  Excellent                                               5
  Good                                                    4
  Fair                                                    3
  Poor                                                    2
  Bad                                                     1

Listening-Effort Scale
  Effort required to understand the meaning of sentences  Score
  Complete relaxation possible; no effort required        5
  Attention necessary; no appreciable effort required     4
  Moderate effort required                                3
  Considerable effort required                            2
  No meaning understood with any feasible effort          1

Conversation Difficulty Scale
  Did you and your partner have any difficulty in hearing over the connection?
  Yes                                                     1
  No                                                      0

Table 1: ITU recommended speech quality measurement scales
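
As a small illustration of how votes on the 5-point scales of Table 1 are usually summarised, the sketch below averages a panel's votes into a Mean Opinion Score and reports how uncertain that average is. The vote values are hypothetical, and the 95% interval uses a normal approximation, which is an assumption made here for brevity rather than a procedure prescribed by the recommendation.

```python
import math
import statistics


def mean_opinion_score(votes):
    """Return (MOS, sample standard deviation, 95% CI half-width) for 1-5 votes."""
    if len(votes) < 2:
        raise ValueError("need at least two votes")
    if any(v < 1 or v > 5 for v in votes):
        raise ValueError("votes must lie on the 1-5 scale")
    mos = statistics.mean(votes)
    spread = statistics.stdev(votes)
    # Normal approximation (assumption): half-width shrinks with sqrt(panel size).
    half_width = 1.96 * spread / math.sqrt(len(votes))
    return mos, spread, half_width


# Hypothetical panel of eight listeners rating one recording.
votes = [4, 3, 5, 4, 2, 4, 3, 4]
mos, spread, half_width = mean_opinion_score(votes)
print(f"MOS = {mos:.2f} +/- {half_width:.2f} (std dev {spread:.2f})")
```

The half-width falls only with the square root of the number of subjects, which is one way to see why reliable subjective scores require large and therefore costly panels.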

A listening test is performed by having a number of subjects (people) hear recordings degraded by candidate networks, and provide their opinion of the quality of each recording, or the effort required to understand it, on a scale of 1-5 (see Table 1). In conversational tests, subjects talk over the network under test before voting on its quality. The Mean Opinion Score (MOS) is the average of the votes across subjects and serves as a measure of the subjective quality of the network under test.

A main drawback of subjective testing is cost. In order to obtain realistic variability in voting, a relatively large number of subjects must be employed. Even with such large panels of subjects, the variance of MOS can still be high. The quality expected by a customer will differ depending on whether, for example, the service is a cheap mobile telephone or a premium audio conference link. The constraints imposed by the need to limit cost and process size also limit the ability of subjective testing to account for other factors, such as expectation and environmental noise. Moreover, different aspects of performance, such as listening effort and listening quality, depend on the opinion scale used. Hence, it would be more practical to have an automatic assessment system from which measures of both classes of quality, as well as other engineering metrics, could be obtained.

3. Objective Measures of Speech Quality

With rapidly evolving voice communication systems, there is an increasing need for robust objective speech quality measures that correlate well with subjective speech quality. Such measures would be valuable tools for assessing codecs and communication systems during the design and validation stages.

Over the last decade, researchers and engineers in the field of objective measures of speech quality have developed different techniques based on various speech analysis models. Currently, the most popular techniques are those based on psychoacoustic models, referred to as perceptual domain measures [2]. In general, objective speech quality measures can be categorized according to the domain in which they operate: time domain, spectral domain or perceptual domain. Time domain measures are generally applicable to analogue or waveform coding systems in which the target is to reproduce the waveform. SNR and SNRseg are typical time domain measures [3]. Spectral domain measures are more reliable than time domain measures and are less susceptible to time misalignments and phase shifts between the original and the coded signals. Most spectral domain measures are closely related to speech codec design. Their performance is limited by the ability of the speech production models applied in the coding process to describe the listener's auditory response. Perceptual domain measures, based on models of human auditory perception, have the best chance of predicting the subjective quality of speech. In these measures, speech signals are transformed into a perceptually related domain using human auditory models. Theoretically, perceptually relevant information is both sufficient and necessary for a precise assessment of perceived speech quality. The quality estimate is thus independent of the type of coding and transmission; it is obtained from a distance measure between the perceptually transformed speech signals.
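
For contrast with the perceptual measures, the time domain measures named above are simple to state. The following is a minimal sketch of overall SNR and segmental SNR (SNRseg), assuming the reference and degraded signals are already time-aligned lists of samples. The 160-sample frame (20 ms at 8 kHz) and the clamping of frame values to the range -10 dB to 35 dB are common conventions adopted here as assumptions; the test signal is synthetic.

```python
import math


def snr_db(reference, degraded):
    """Overall SNR in dB, treating (reference - degraded) as the noise."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, degraded))
    if signal == 0:
        raise ValueError("reference signal has no energy")
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def segmental_snr_db(reference, degraded, frame_len=160,
                     floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNR values, each clamped to [floor_db, ceil_db]."""
    frame_values = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        deg = degraded[start:start + frame_len]
        if sum(x * x for x in ref) == 0:
            continue  # skip silent reference frames
        frame_values.append(min(max(snr_db(ref, deg), floor_db), ceil_db))
    if not frame_values:
        raise ValueError("no usable frames")
    return sum(frame_values) / len(frame_values)


# Synthetic example: a 440 Hz tone with a small alternating-sign error.
sample_rate = 8000
reference = [math.sin(2 * math.pi * 440 * n / sample_rate)
             for n in range(sample_rate)]
degraded = [x + 0.05 * (1 if n % 2 == 0 else -1)
            for n, x in enumerate(reference)]
print(f"SNR    = {snr_db(reference, degraded):.1f} dB")
print(f"SNRseg = {segmental_snr_db(reference, degraded):.1f} dB")
```

Because both functions compare samples one for one, even a delay of a single sample between the two signals changes the result, which illustrates the sensitivity to misalignment noted above.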
[Block diagram: Input Speech; Device Under Test; Perceptual Transformation (applied to both the input and the output); Distance Measure; Estimates of Perceived Speech Quality]

Figure 1. Perception-based Approach to Quality Estimation
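
The structure of Figure 1 can be sketched in a few lines: transform both signals into a perceptually motivated domain, then take a distance between the results. The example below is an illustrative assumption, not any of the standardised measures: it groups the power spectrum of one frame into Bark bands using the Zwicker-Terhardt approximation, applies a logarithmic compression as a crude stand-in for loudness, and uses a squared Euclidean distance. The function names and the test signals are hypothetical.

```python
import cmath
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark critical-band scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def perceptual_transform(frame, sample_rate=8000, n_bands=18):
    """Map one frame to log-compressed power per Bark band (illustrative)."""
    n = len(frame)
    band_power = [0.0] * n_bands
    for k in range(1, n // 2):
        # Direct DFT of bin k; adequate for a short illustrative frame.
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        band = min(int(hz_to_bark(k * sample_rate / n)), n_bands - 1)
        band_power[band] += abs(coeff) ** 2 / n
    return [math.log10(1.0 + p) for p in band_power]


def perceptual_distance(reference, degraded, sample_rate=8000):
    """Squared Euclidean distance between the two transformed frames."""
    ref = perceptual_transform(reference, sample_rate)
    deg = perceptual_transform(degraded, sample_rate)
    return sum((r - d) ** 2 for r, d in zip(ref, deg))


# Synthetic frame: a 500 Hz tone, degraded by an added 2 kHz tone.
n = 128
reference = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]


def add_tone(signal, amplitude):
    return [x + amplitude * math.sin(2 * math.pi * 32 * t / n)
            for t, x in enumerate(signal)]


mild = add_tone(reference, 0.1)
severe = add_tone(reference, 0.5)
print(perceptual_distance(reference, mild) < perceptual_distance(reference, severe))
```

A real perceptual measure would add steps omitted here, such as frame-by-frame processing of whole utterances, spreading across critical bands and a calibrated loudness model.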

Currently there are a number of measures that can be classified as perceptual domain measures. These include the Bark Spectral Distortion (BSD), Perceptual Speech Quality Measure (PSQM), Modified BSD (MBSD), Measuring Normalizing Blocks (MNB), PSQM+, Telecommunication Objective Speech Quality Assessment (TOSQA) and Perceptual Analysis Measurement System (PAMS) [3]. Among these, the PSQM was approved by ITU-T Study Group 12 and published by the ITU as Recommendation P.861 in 1996 [3]. PSQM is a mathematical process that provides a measurement of the subjective quality of speech. Its objective is to produce scores that correlate well with the results of subjective tests, in particular MOS. The PSQM testing process is shown in Figure 2. The output scale reflects a perceptual distance measure between the original and the degraded signal. An intermediate scale is used to transform the PSQM score into an estimated MOS score. PSQM was created for application to telephone-band signals and, hence, it measures the distortion introduced by speech codecs in relation to human perception factors. As a particular application, it is used for estimating speech quality when low bit-rate voice compression codecs or vocoders are implemented. The speech analysis model used in PSQM is depicted in Figure 3. In practice, the PSQM score ranges from 0 to a maximum of 20. As an example, a score of 0 suggests a perfect correlation between the input and output signals, which most of the time is classified as perfect clarity. Higher scores indicate increasing levels of distortion, often interpreted as lower clarity.
[Block diagram: Input speech; Speech Encoding/Decoding; Input signal (reference) and Output Signal (test) fed to PSQM; PSQM Score; Transform from PSQM objective scale to Subjective scale; Estimated MOS Score]

Figure 2. PSQM Testing Process


[Block diagram: Input signal and Time-synchronized output signal; Perceptual Modelling of each; Internal Representation of input signal and Internal Representation of output signal; Audible differences in internal representations; Cognitive Modelling; Quality Score]

Figure 3. The PSQM Model

4. Non-Intrusive Speech Quality Measure

All the objective assessment techniques presented in Section 3 are based on an input-to-output approach. In input-to-output objective quality assessment methods, the speech quality is estimated by measuring the distortion between an input and an output signal. Using a regression technique, the distortion values are then mapped into estimated quality. This means that, to use any of these measures, it is necessary to gain access to both ends of a network connection. In many situations, such intrusive quality assessment poses several problems. First, it is very difficult to achieve synchronisation between the input and the output. Secondly, the measurements can be seriously affected by background noise, as in the case of mobile networks, and hence would not provide a true measure of the network's quality of service. Furthermore, in some situations the original speech is not available, as in the case of mobile or satellite communications.

It is not always possible to have access to both ends of a network connection to perform intrusive speech quality prediction. Two main reasons for this are that too many connections must be monitored, or that the far-end locations are unknown. Specific distortions may appear only at times of peak traffic, when it is not possible to disconnect the clients and perform network tests. In these situations speech quality prediction requires a listening model which uses only the in-service signal. Such a non-intrusive assessment would be well suited to predicting quality in these circumstances. Another benefit of non-intrusive techniques is, for example, the possibility of monitoring the quality-of-service level agreements with customers who may want to buy connections of different quality at varying tariffs. The network infrastructure can be augmented with low-cost non-intrusive units that monitor quality continuously. A full intrusive assessment would then be performed to provide detailed network diagnostics once a problem is identified. One approach to achieving a non-intrusive speech quality measure is to base the measurement on the output (or received) speech only. Figure 4 shows how a non-intrusive assessment might be implemented.
[Block diagram: in-service speech stream to customer; extract talker characteristic; speech parameterization; quality prediction; outputs: good, acceptable, poor]

Figure 4. Non-intrusive assessment

Estimating speech quality using only the received speech, without access to the input record, is a relatively new area. Recently, a couple of attempts to develop more credible non-intrusive speech quality measurements based on perceptual analysis have been reported. An example is an output-based speech quality measure which uses only the visual features of a spectrogram of the received speech signal, reported in [4]. A spectrogram is a two-dimensional representation of time-dependent frequency analysis, and contains acoustic and phonetic information about the speech signal. By framing the spectrograms into blocks and using digital image processing, the method achieved a reported correlation factor of 0.65 with the subjective score. An algorithm which uses perceptual linear prediction (PLP) coefficients to compare the output speech with a set of vectors derived from a variety of undegraded source speech material is described in [5]. More recently, Gray et al. [6] reported a novel use of vocal-tract modelling techniques which enables the quality of a network-degraded speech stream to be predicted in a non-intrusive way.

5. New Output-based Assessment Approach

A new approach to developing a working model for a robust output-based objective speech quality measure, one which correlates well with subjective test scores, has been suggested by the authors. The approach, which is based on the model reported in [5], is depicted in Figure 5.
[Block diagram: Received Speech Signal; Perceptual Transformation & Extraction of Speaker-independent Parameters; Undegraded Source Speech Signals; Perceptual Transformation & Extraction of Speaker-independent Parameters; Data Mining: Clustering of Vectors; Reference Book; Data Mining: Classification (Determine Closest Test Vectors); Distance Measures; Auditory Distance (AD); Logistic Function; Estimated MOS Score]

Figure 5. New approach in output based speech quality assessment
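
The flow of Figure 5 can be sketched under several loudly stated assumptions. Plain k-means stands in for the unspecified data mining clustering step, Euclidean distance stands in for the distance measure, the two-dimensional feature vectors are hypothetical placeholders for PLP or mel-cepstrum parameters, and the logistic midpoint and slope are invented values that would in practice be fitted to subjective test data. None of this is claimed to be the authors' implementation.

```python
import math


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iterations=20):
    """Cluster clean-speech vectors into k code vectors (k-means stand-in)."""
    step = len(vectors) // k
    codebook = [list(vectors[i * step]) for i in range(k)]  # deterministic init
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old code vector if a cluster is empty
                codebook[i] = [sum(c) / len(members) for c in zip(*members)]
    return codebook


def auditory_distance(frames, codebook):
    """Mean distance from each received frame to its nearest code vector."""
    return sum(math.sqrt(min(sq_dist(f, c) for c in codebook))
               for f in frames) / len(frames)


def distance_to_mos(distance, midpoint=1.0, slope=3.0):
    """Logistic map from distance to the 1-5 scale (hypothetical parameters)."""
    return 1.0 + 4.0 / (1.0 + math.exp(slope * (distance - midpoint)))


# Hypothetical 2-D feature vectors from undegraded source speech.
clean = [(0.0, 0.0), (0.2, -0.1), (-0.1, 0.1),
         (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
codebook = train_codebook(clean, k=2)

good_frames = [(0.1, 0.0), (5.0, 5.0)]   # close to the clean code vectors
poor_frames = [(2.5, 2.0), (3.0, 1.0)]   # far from every code vector
mos_good = distance_to_mos(auditory_distance(good_frames, codebook))
mos_poor = distance_to_mos(auditory_distance(poor_frames, codebook))
print(f"estimated MOS: good {mos_good:.2f}, poor {mos_poor:.2f}")
```

The key property is that no input signal is needed at assessment time: the code-book, built once from clean speech, plays the role of the reference.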

The approach can be outlined as follows. Speaker-independent parameters are derived from the received speech signal using perceptual transformation techniques, such as mel-cepstrum and PLP [5]. Similar parameter vectors derived from a variety of undegraded source speech signals are used to produce a reference code-book corresponding to high quality speech. Clustering techniques, such as those offered by modern data mining tools [7], will be used to define such a code-book optimally. The vector derived from the received signal and the nearest test vector taken from the code-book will then be compared using distance measure techniques. Data mining classification techniques are again used here to determine the nearest test vectors accurately. Following this comparison, the auditory distance scale will be mapped to a subjective quality scale using an appropriate logistic function. Work on the development, implementation and verification of the approach is currently well under way.

6. Conclusions

We have attempted to discuss issues involved in speech quality assessment for modern telecommunication systems, emphasising the necessity for a non-intrusive, output-based technique. A new potential approach for such a technique, which is currently under investigation, has been outlined. There is no doubt that subjective tests remain the most accurate means of assessing the perceived speech quality in telephony systems. However, the subjective method is becoming increasingly inefficient, requiring too much time and too many resources to carry out. Hence, objective speech quality measures, which can provide an automatic means of assessment, are more suitable for a rapidly changing domain such as telecommunications. Currently, the established objective speech quality assessment systems are input-output based perceptual domain measures. Such intrusive assessment techniques require the withdrawal of the network under test from normal service, interrupting its availability to customers and causing loss of revenue to the service provider. Therefore the availability of a robust non-intrusive objective speech quality assessment method, whose scores correlate well with subjective tests, is of crucial importance for today's telecommunications service providers.

References:
[1] ITU-T Rec. P.800, Methods for Subjective Determination of Transmission Quality, August 1996.
[2] S. Voran, Objective Estimation of Perceived Speech Quality-Part I: Development of the Measuring Normalizing Block Technique, IEEE Trans. on Speech and Audio Process., Vol., No. 4, pp. 371-382, 1999.
[3] J. Anderson, Methods for Measuring Perceptual Speech Quality, Agilent Technologies-White Paper, USA, May 2001.
[4] O. C. Au and K. H. Lam, A Novel Output-Based Objective Speech Quality Measure for Wireless Communication, IEEE Proceedings of ICSP 98, Vol. 1, pp. 666-669, Beijing, China, Oct. 1998.
[5] C. Jin and R. Kubichek, Vector Quantization Techniques for Output-Based Objective Speech Quality, Proc. ICASSP-96, Vol. 1, pp. 491-494, Atlanta, May 1996.
[6] P. Gray, M. P. Hollier and R. E. Massara, Non-Intrusive Speech-Quality Assessment Using Vocal-Tract Models, IEE Proc. Vis. Image Signal Process., Vol. 147, No. 6, pp. 493-501, 2000.
[7] P. R. Limb and G. J. Meggs, Data Mining Tools and Techniques, BT Technol. J., Vol. 12, No. 4, pp. 32-41, 1994.
