Professional Documents
Culture Documents
i := i + 1 j := j + 1 pf PSNR
Image O i j
Decompression Tolerable
SSIM
values? yes
Image Oi Psychovisual
deviation no
no no
The non-critical spot identification is outlined in Figure This approach assumes an application which always
1. The imaging application is run with a set of M inputs calculates the same function. This is given in many real
I1 , I2 , . . . , IM in the absence of faults to obtain reference systems. For instance, the JPEG compressor used in this
images O1 , O2 , . . . , OM . In case of the JPEG compressor paper is assumed to perform JPEG compression only, al-
considered in this paper we perform JPEG compression though it theoretically could be possible to use the same
with M uncompressed images I1 , I2 , . . . , IM as inputs. hardware for different calculations for which the metrics
Let pf1 , pf2 , . . . , pfK be the set of all permanent errors from Section 3 would not be valid. The data produced
(stuck-at faults) in the flip-flops. For every input Ii and of the JPEG compression might also be processed by a
every permanent fault pfj , we run the application with machine based on, e.g., pattern-matching algorithms in a
input Ii assuming fault pfj to obtain the (possibly erro- biometric application, rather than a human. Metrics which
neous) compressed output image. This image is then de- assume human cognitive system may not be effective in
compressed (with no additional errors injected), resulting such a scenario.
pf
in image Oi j . We apply three metrics introduced in Sec- If the use of the system is not known in advance (as is
pf
tion 3 to image Oi j to determine whether a human user often the case with programmable or reconfigurable hard-
would distinguish it from the error-free image Oi . ware), the switchable protection feature can be used: if the
If a permanent stuck-at fault leads to a perceptible ef- block is used according to its ‘main’ specification (here:
fect (i.e., at least one of three metrics reports a value image compression), protection is switched off for non-
which exceeds a pre-defined threshold), the correspond- critical spots; if the block is used in an unknown appli-
ing soft error (i.e., the same stuck-at fault with duration cation, the complete protection is switched on resulting
of one cycle) is considered critical. The soft errors whose in larger power consumption. This technique imposes
corresponding permanent fault resulted in output images area cost for the protection even if it is not used; this
with imperceptible deviations for all M inputs are con- cost is believed to be less important compared to the ad-
sidered non-critical and excluded from hardening. This is ditional power consumption in state-of-the-art manufac-
done because a soft error with one cycle duration is intu- turing technologies. In safety-critical applications, full
itively less severe than a permanent fault which persists (rather than selective) hardening is preferable. In this pa-
during the whole run time of the application. per, we consider situations in which full hardening is not
an option due to power consumption or area constraints. on the FPGA [21, 22, 35, 36], however we did not have
The proposed approach is not complete in the sense access to the required FPGA type. We did not use any
that there could exist an additional input I ∗ for which specific injection software [37–40]. We forced the out-
an error on a spot identified as non-critical would lead put signal of a flip-flop to the desired erroneous value for
to an image with a severe rather than tolerable effect. It the duration of the simulation run (if permanent errors
would theoretically have been possible to employ com- were considered) or for one clock cycle (if soft errors were
plete methods such as model checking instead. We per- considered). We used images from the USC-SIPI Image
formed our analysis based on simulation for the following Database (http://sipi.usc.edu/database/) as
reasons: input images to be compressed.
We first performed the analysis outlined in Figure 1
• Complexity: while formal methods did develop in for M = 7 images. We classified an erroneous image
the last years, it is still unrealistic to formulate met- with the following characteristics as tolerable: PSNR ≥
rics from Section 3, which include complex math- 30 dB, SSIM ≥ 95% and psychovisual deviation ≤ 22. In
ematical functions, as properties which stretch over the beginning of the analysis, all possible error locations
millions of cycles necessary to calculate the entire were considered to be non-critical spots.
image and to perform the verification using state-of-
For the first considered image Obst, 5853 out of 10558
the-art tools. Formal methods could be applied for
permanent errors were found tolerable. The locations of
rather simple design and properties in [12].
the remaining errors were excluded from the list of non-
• Experiments: as will be demonstrated in Section 5, critical spots. For the second considered image Barbara,
even small number of inputs M leads to a selection 74 of these errors produced images violating the specifi-
of non-critical spots on which no severe soft errors cation. Their locations were also excluded from the list of
occur. It can be expected that a severe soft error on a non-critical spots. For five additional images Girl, Gold-
location on which a permanent error is most probably hill, Lena, Pens and Tiffany, none of the remaining errors
tolerable is extremely improbable. showed intolerable effect. Hence, we considered the total
of 5779 spots (54.7%) non-critical.
• Inherent incompleteness: there can be no absolute Table 1 reports the breakdown of non-critical spots for
protection against all possible soft errors, as there is sub-modules of the circuit. For each sub-module, the per-
a (very small) probability of an event not covered by centage of its flip-flops among all flip-flops of the cir-
the protection scheme such as an error in multiple cuit and the percentage of non-critical spots in that sub-
flip-flops or an error in a protected flip-flop. Even module are given. The largest fraction of non-critical
a complete method to select non-critical flip-flops spots are found in Q-ROM which stores values used in
would not rule out any chance of a severe error. quantization. This may be due to the fact that quantiza-
tion is inherently associated with information loss and ad-
ditional impact caused by the errors is limited. The largest
5 Experimental Results fraction of critical spots are located in blocks with rela-
tively large shares of combinational logic: compression
We applied our analysis on a JPEG Compressor available buffers and arithmetic parts of DCT-2D core.
in VHDL source code from www.opencores.org Some errors did not have any effect at all, i.e., the er-
[34]. The compressor reads an uncompressed image and roneous chip calculated exactly the same image as the
writes the compressed data into a memory. If synthesized
on FPGA XCV21000-4 by Xilinx with 40 MHz clock fre-
quency, it can compress up to 24 352 × 288-pixel images Sub-module # flip-flops Percentage of
per second. The compressor consumes 5018 slices, 8877 (% of total circ.) non-critical spots
look-up-tables, 3947 flip-flops, 77 I/Os, 13 16-bit RAMs Q-tables 2.7% 56.1%
Huffman ROM 4.8% 60.8%
and 2 48-bit DSP blocks. There are a total of 5279 flip-
Q-ROM 3.6% 71.3%
flops available for error injection and thus a total of 10558
Y compression buffer 4.1% 26.7%
permanent stuck-at-0 and stuck-at-1 errors. Cb compression buffer 3.9% 26.6%
Cr compression buffer 3.9% 28.8%
5.1 Characterization DCT-2D core (arithmetic) 4.9% 29.7%
DCT-2D core (registers) 72.1% 64.4%
We performed software error injection (both soft and per- Total 100% 54.7%
manent) on VHDL code. It would be possible to synthe-
size the circuit on an FPGA and perform error injection Table 1: Percentages of non-critical spots per sub-module
(a) (b)
(c) (d)
Figure 2: Image Barbara compressed in absence of errors (a), in presence of a tolerable permanent error (b) and in
presence of a severe permanent error (c), (d)
error-free chip. For such images, PSNR was ∞, SSIM nent fault injected and subsequently decompressed with-
was 100% and psychovisual deviation was 1. The per- out fault injection. The X-coordinate gives the value of
centage of permanent errors which did not have any effect PSNR, the Y-coordinate indicates the value of SSIM. Data
at all varied between 20.0% and 22.8% among the seven points according to images which exceeded the limit for
considered images, which is far below 54.7%. Hence, al- psychovisual deviation (22) are shown in red, other data
lowing a limited deviation indeed increases the number of points are shown in black.
non-critical spots significantly.
To evaluate the relevance of individual metrics for ac-
ceptability calculation, we repeated the characterization
of image Obst using only one of the three considered
metrics. When only the PSNR value was required not to
exceed the acceptability threshold and SSIM and psycho-
visual deviation were both ignored, the number of non-
critical spots was 6350. The same experiment using SSIM
only resulted in 5854 non-critical spots. Employing psy-
chovisual deviation as the only metric yielded 8773 non-
critical spots. It appears that SSIM with acceptability
threshold of 95% is the most challenging criterion within
the composite metric.
Figure 3 shows the PSNR, SSIM and psychovisual de-
viation values for images generated during characteriza-
tion of image Obst. Every data point corresponds to Figure 3: PSNR, SSIM and psychovisual deviation values
one image compressed by the JPEG circuit with a perma- in characterization of image Obst
Image # err # sim No effect Tolerable Threshold variations
# % # % psych. deviation PSNR SSIM
≤ 21 ≤ 23 ≥ 25 ≥ 35 ≥ 90 > 99
Yacht 1 300 300 100.0 300 100.0 300 300 300 300 300 300
Boats 1 300 300 100.0 300 100.0 300 300 300 300 300 300
Fruits 10 300 299 99.7 300 100.0 300 300 300 300 300 300
Flower 10 300 299 99.7 300 100.0 300 300 300 300 300 300
Flower 100 300 284 94.7 300 100.0 300 300 300 300 300 300
Yacht 100 300 289 96.3 300 100.0 300 300 300 300 300 300
Figure 3 suggests good, yet not perfect, correlation be- percentage of simulations in which an image fulfilling the
tween PSNR and SSIM. Psychovisual deviation does not specification was produced (‘Tolerable’). Images which
seem to track well with other metrics. This is probably were used to derive the non-critical spots are marked by
due to the fact that PSNR and SSIM are global metrics an asterisk (*). Most simulations resulted in the error-free
while psychovisual deviation concentrates on 8 × 8 pixel image. The remaining few simulations yielded tolerable
blocks. Still, images with an unacceptable psychovisual images. No severe error according to the used definition
deviation tend to have rather low PSNR and SSIM values. showed up in any of the simulations. Hence, the exper-
iments support that no hardening is needed on the spots
5.2 Validation identified as non-critical by the proposed analysis.
To validate the effectiveness that the selection of non- We repeated the experiment with soft errors injected
critical spots, we performed a number of simulations on spots not identified as non-critical. Figure 4 illustrates
where 1, 10 or 100 soft errors per simulation (a complete tolerable and severe error effects produced by simulations
image compression) were randomly injected on spots of image Cornfield. The results of the experiment are
identified as non-critical. The results are summarized in quoted in Table 3. In addition to the absolute percentages
Table 2. The table reports the number of errors injected (given in columns ‘%’), the normalized percentages ob-
per simulation run (‘# err’), the total number of performed tained assuming that soft errors occur on both critical and
simulations (‘# sim’), the number and the percentage of non-critical spots and that soft errors on non-critical spots
simulations in which an image identical to the reference do not result in severe effects are reported in columns
image was produced (‘No effect’) and the number and the ‘%n ’. Normalized percentage is calculated by multiply-
(a) (b) (c)
Figure 4: Image Cornfield compressed in absence of errors (a), in presence of a tolerable soft error (b) and in presence of
a severe soft error (c)
ing the actual percentage by 43.7% (the portion of critical variation. Most important, no errors on non-critical spots
spots in the circuit) and adding 100% (assumed probabil- resulted in intolerable images even if the acceptability
ity that a soft error on a non-critical spot does not result in thresholds were tightened.
a severe effect) multiplied by 54.7% (the portion of non-
critical spots in the circuit). It can be seen that a limited
yet significant number of severe errors do occur if criti- 6 Conclusions
cal spots are not hardened. Hence, if selective hardening
is done, it should concentrate on the critical spots rather Selective hardening is a promising direction to obtain soft
than all locations in the circuit. error rate improvement at realistic cost. We demonstrated
that by considering the inherent error tolerance of an ap-
plication a significant number of spots in a circuit can
5.3 Variations of the thresholds be classified as non-critical and excluded from harden-
ing. Taking high-level information normally ignored dur-
To evaluate whether the found solution is stable under
ing circuit robustness optimization into account the num-
variations of acceptability thresholds (22 for psychovisual
ber of circuit spots to be hardened could be reduced by
deviation, 30 for PSNR and 95 for SSIM), we repeated the
more than a factor of two at no cost. The concept of cog-
evaluation of the found solution with one of the thresholds
nitive resilience is not restricted to the JPEG compressor
increased or decreased (the other two thresholds were kept
studied in this paper or generally imaging circuits. Inves-
on the nominal values). The final six columns of Tables
tigating cognitive resilience in other application fields is
2 and 3 contain the numbers of experiments where effect
an interesting point of future research. Moreover, alter-
tolerable under the new definition were observed.
native methods to validate the non-criticality of a circuit
For errors injected on non-critical spots, all experi-
location not based on simulation are of interest.
ments resulted in a tolerable image. In contrast, the data
in Table 3 does show some sensitivity to the thresholds, in
particular for the highest error rate (100 errors injected).
Tightening the threshold for psychovisual deviation often
Acknowledgment
leads to many unacceptable outcomes of the experiment,
This work was supported in part by the DFG project Re-
though no effect at all is observed altogether for image
alTest (BE 1176/15-1).
Cablecar. It appears that some errors influence a limited
area of the image which is captured by the psychovisual
deviation but is averaged out when PSNR and SSIM is 7 References
calculated.
Loosening PSNR and SSIM thresholds have very sim- [1] P.E. Dodd and L.W. Massengill. Basic mechanisms and modeling
of single-event upset in digital microelectronics. IEEE Trans. on
ilar effects, whereas tightening the SSIM threshold seems Nuclear Science, 50(3):583–602, 2003.
to have larger influence on the results compared to tight-
[2] H.T. Nguyen and Y. Yagil. A systematic approach to SER esti-
ening the PSNR threshold. Overall, the outcome of the mation and solutions. In Int’l Reiliability Physics Symp., pages
experiment is quite stable under acceptability threshold 60–70, 2003.
[3] P. Shivakumar, M. Kistler, W. Keckler, D. Burger, and L. Alvisi. and Integrated Systems, page 137, 2005.
Modeling the effect of technology trends on the soft error rate of [23] N.J. Wang, J. Quek, T.M. Rafacz, and S.J. Patel. Characteriz-
combinational logic. In Int’l Conf. on Dependable Systems and ing the effects of transient faults on a high-performance proces-
Networks, pages 389–398, 2002. sor pipeline. In Int’l Conf. on Dependable Systems and Networks,
[4] N. Seifert, X. Zhu, and L.W. Massengill. Impact of scaling on 2004.
soft-error rates in commercial microprocessors. IEEE Trans. on [24] C. Weaver, J. Emer, S.S Mukherjee, and S.K. Reinhardt. Tech-
Nuclear Science, 49(6):3100–3106, 2002. niques to reduce the soft error rate of a high-performance micro-
[5] I. Polian, J.P. Hayes, S. Kundu, and B. Becker. Transient fault char- processor. In Int’l Symp. On Comp. Architecture, pages 264–275,
acterization in dynamic noisy environments. In Int’l Test Conf., 2004.
2005. [25] X. Li and D. Yeung. Application-level correctness and its impact
[6] R. Baumann. Soft errors in advanced computer systems. IEEE on fault tolerance. In Int’l Symp. on High Performance Computer
Design & Test, 22(3):258–266, 2005. Architecture, pages 181–192, 2007.
[7] K. Mohanram and N. Touba. Partial error masking to reduce soft [26] J.W.S. Liu, W.-K. Shin, K.-J. Lin, R. Bettati, and J.-Y. Chung. Im-
error failure rate in logic circuits. In Int’l Symp. on Defect and precise computations. Proc. of the IEEE, 82(1):83–94, 1994.
Fault Tolerance, pages 433–440, 2003. [27] I. Polian, B. Becker, M. Nakasato, S. Ohtake, and H. Fujiwara.
[8] K. Mohanram and N.A. Touba. Cost-effective approach for reduc- Period of grace: A new paradigm for efficient soft error harden-
ing soft error failure rate in logic circuits. In Int’l Test Conf., pages ing. In GI/ITG Workshop “Testmethoden und Zuverlässigkeit von
893–901, 2003. Schaltungen und Systemen”, 2006.
[9] C. Zhao, S. Dey, and X. Bai. Soft-spot analysis: Targeting com- [28] Y. Wang, S. Wenger, J. Wen, and A.K. Katsaggelos. Error resilient
pund noise effects in nanometer circuits. IEEE Design & Test of video coding techniques. IEEE Signal Process. Mag., 17(4):61–
Comp., 22(4):362–375, 2005. 82, 2000.
[10] A.K. Nieuwland, S. Jasarevic, and G. Jerin. Combinational logic [29] W.-Y. Kung, C.-S. Kim, and C.-C.J. Kuo. Spatial and temporal
soft error analysis and protection. In Int’l On-Line Test Symp., error concealment techniques for video transmission over noisy
2006. channels. Trans. Circuits and Systems for Video Tech., 16(7):789–
[11] J.P. Hayes, I. Polian, and B. Becker. An analysis framework for 802, 2006.
transient-error tolerance. In VLSI Test Symp., pages 249–255, [30] Z. Wang and A.C. Bovik. Modern Image Quality Assessment.
2007. Morgan and Claypool Publishers, 2006.
[12] S.A. Seshia, W. Li, and S. Mitra. Verification-guided soft error [31] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image
resilience. In Design, Automation and Test in Europe Conf., 2007. quality assessment: From error visibility to structural similarity. In
[13] M. Zhang, S. Mitra, T.M. Mak, N. Seifert, N.J. Wang, Q. Shi, IEEE Transactions on Image Processing, volume 13, pages 600–
K.S. Kim, N.R. Shanbhag, and S.J. Patel. Sequential element 612, April 2004.
design with built-in soft error resilience. IEEE Trans. on VLSI, [32] SSIM C++ implementation. http://mehdi.rabah.free.fr/SSIM/, Juni
14(12):1368–1378, 2006. 2007.
[14] M. Breuer. Multi-media applications and imprecise computation. [33] V. Joshi, R.R. Rao, D. Blaauw, and D. Sylvester. Logic SER reduc-
In EUROMICRO, pages 2–7, 2005. tion through flip flop redesign. In Int’l Symp. on Quality Electronic
[15] I. Chong and A. Ortega. Hardware testing for error tolerant multi- Design, pages 611–616, 2006.
media compression based on linear transforms. In Int’l Symp. on [34] JPEG hardware compressor - opencores.org. http://www.open-
Defect and Fault Tolerance in VLSI Systems, 2005. cores.org/projects.cgi/web/jpeg/overview, Februar 2007.
[16] H. Chung and A. Ortega. Analysis and testing for error tolerant [35] J. Arlat, M. Aguera, L. Amat, Y. Crouzet, J.-C. Fabre, E. Mar-
motion estimation. In Int’l Symp. on Defect and Fault Tolerance tins, and D. Powell. Fault injection for dependability validation:
in VLSI Systems, 2005. a methodology and some applications. In Trans. on Soft. Eng.,
[17] I. Chong, H. Cheong, and A. Ortega. New quality metric for mul- volume 16, pages 166–182, 1990.
timedia compression using faulty hardware. In Int’l Workshop on [36] H. Madeira, M. Rela, and J.G. Silva. RIFLE: A general purpose
Video Processing and Quality Metrics for Consumer Electronics, pin-level fault injector. In Proc. First European Dependable Com-
2006. puting Conference (EDCC-1), pages 199–216, Berlin, 1994.
[18] I. Polian, D. Nowroth, and B. Becker. Identification of critical [37] E. Czeck and D. Siewiorek. Effects on transient gate-level faults
errors in imaging applications. In Int’l On-Line Test Symp., pages on program behavior. In Proc. 20th Symp. on Fault Tolerant Com-
201–202, 2007. (Poster). puting (FTCS-20), pages 236–243, Newcastle Upon Tyne, 1990.
[19] V. Bhaskaran and K. Konstantinides. Image and Video Compres- [38] G. Choi, R. Iyer, and V. Carreno. Simulated fault injection: A
sion Standards: Algorithms and Architectures. Kluwer Academic methodology to evaluate fault tolerant microprocessor architec-
Publishers, kluwer international series in engineering and com- tures. In IEEE Trans. on Reliability, volume 39, pages 486–490,
puter science edition, 1995. 1990.
[20] D.P. Siewiorek and R.S. Swarz. Reliable Computer Systems – De- [39] E. Jenn, J. Arlat, M. Rimen, J. Ohlsson, and J. Karlsson. Fault
sign and Evaluation. Digital Press, 1992. injection into VHDL models: The MEFISTO tool. In Proc.
[21] P. Civera, L. Macchiarulo, M. Rebaudengo, M. Sonza Reorda, and 24th Symp. on Fault Tolerant Computing (FTCS-24), pages 66–75,
M. Violante. An FPGA-based approach for speeding-up fault in- Austin, USA, 1994.
jection campaigns on safety-critical circuits. Jour. of Electronic [40] V. Sieh, O. Tschache, and F. Balbach. VERIFY: Evaluation of re-
Testing: Theory and Applications, 18(3):261–271, 2002. liability using VHDL-models with embedded fault descriptions.
[22] S. Kundu, M.D.T. Lewis, I. Polian, and B. Becker. A soft error In 27th International Symposium on Fault-Tolerant Computing
emulation system for logic circuits. In Conf. on Design of Circuits (FTCS ’97), pages 32–36, Seattle, USA, 1997.