Los Alamos National Laboratory

Mitigation Working Group

Providing relevant benchmarks for testing mitigated circuits for field-programmable gate arrays and mitigated software for microprocessors.

Contact Us  

  • Point of Contact: Heather Quinn

Benchmarking Mitigated Circuits and Software for High-Reliability Applications

The Mitigation Working Group has been working together since January 2014 to provide relevant benchmarks for testing mitigated circuits for field-programmable gate arrays (FPGAs) and mitigated software for microprocessors. The intent of the project is to provide a standard set of codes, circuits, and input vectors that cover a range of realistic compute scenarios.

The working group provides information about the unmitigated circuits and codes so that there is a basis for determining whether mitigation methods effectively mask radiation- and reliability-induced errors, and so that mitigation methods can be compared for power, effectiveness, and overhead.

Finally, we accompany these results with information about the compilation/synthesis process and runtime environment. We believe all of this information provides a basis for repeatable test results and gives other researchers standards upon which to build.

FPGA Benchmark

The FPGA benchmark leverages the ITC'99 suite, which meets all of our requirements: a variety of realistic algorithms, defined inputs, scalability, and portability. ITC'99 was designed specifically for testing and includes a set of input vectors developed by the automated test pattern generation community.

The working group has been actively testing mitigated and unmitigated versions of the B13 circuit on several FPGAs to provide a baseline for the community. In the future, we would like to test the rest of the ITC'99 circuits.
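
For readers unfamiliar with circuit mitigation: a common technique for FPGA designs is triple modular redundancy (TMR), in which the circuit is instantiated three times and the outputs are majority-voted so that a fault in any single copy is masked. This page does not specify which mitigation schemes were applied to B13, so the following is only a conceptual sketch of a TMR voter's Boolean function, written in C for illustration rather than in an HDL:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Bitwise majority vote: each output bit takes the value held by at
     * least two of the three replicas, so an upset in any one copy is
     * masked. */
    static uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
    {
        return (a & b) | (b & c) | (a & c);
    }

    int main(void)
    {
        /* Replica b carries a single-bit upset in bit 4; the voter masks it. */
        uint32_t a = 0x5A5A5A5A, b = 0x5A5A5A4A, c = 0x5A5A5A5A;
        printf("voted = 0x%08" PRIX32 "\n", tmr_vote(a, b, c)); /* 0x5A5A5A5A */
        return 0;
    }

The same voting function also shows why mitigation methods are compared for power and overhead: the protected design carries three copies of the logic plus voters, which the unmitigated baseline does not.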

Software Benchmark

The software benchmark is more of a moving target, because no existing benchmark meets our needs. We are currently using these codes in the software benchmark:

  • Advanced Encryption Standard (AES) 128,
  • Cache test,
  • CoreMark,
  • Fast Fourier Transform (FFT),
  • Hotspot,
  • HPCCG,
  • Matrix multiply (MxM), and
  • Quicksort (Qsort)

A microcontroller implementation of these codes can be found on GitHub. We've been working on standards for implementing the software codes for different classes of microprocessors, but have not completed that process yet.
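
One software mitigation method the group has studied (see "Robust Duplication With Comparison Methods in Microcontrollers" in the publications below) is duplication with comparison (DWC), in which a computation is executed twice and the two results are compared to detect silent data corruption. The actual benchmark implementations are the ones in the GitHub repository; the following is only a minimal sketch of the idea, assuming a toy matrix-multiply kernel (the names mxm and mxm_dwc are illustrative):

    #include <stdint.h>
    #include <string.h>

    #define N 4

    /* Unmitigated kernel: C = A * B for N x N matrices. */
    static void mxm(const int32_t A[N][N], const int32_t B[N][N],
                    int32_t C[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                int32_t sum = 0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
    }

    /* DWC wrapper: run the kernel twice into separate buffers and
     * compare.  Returns 0 when the copies agree and nonzero when a
     * mismatch (a detected error) occurs. */
    static int mxm_dwc(const int32_t A[N][N], const int32_t B[N][N],
                       int32_t C[N][N])
    {
        int32_t C2[N][N];
        mxm(A, B, C);
        mxm(A, B, C2);
        return memcmp(C, C2, sizeof(C2)) != 0;
    }

DWC detects errors but does not correct them; recovery (for example, recomputation) is left to the caller. The roughly 2x runtime cost is exactly the kind of overhead the benchmark data lets researchers weigh against the added error coverage.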

Current Work

While the original benchmark was not designed for basic characterization of components, there has been growing interest in creating a benchmark to help researchers characterize FPGAs and microprocessors.  The current benchmark codes are specifically designed to highlight different types of circuits and codes that realistically mimic how systems use FPGAs and microprocessors.  The benchmark was never designed to measure the sensitivity of individual architectural features, such as mathematical units, clock trees, or control logic.

The ongoing issue is the inability to predict the cross section of untested software codes using the existing data from the benchmarks.  Because the current benchmarks are meant to let researchers and designers compare how the test codes perform on different architectures, this deficiency is a feature, not a problem, and at this point there is no discussion of changing the existing benchmarks to include characterization codes.  Currently, part of the team is developing a machine learning technique for extracting the cross sections of these finer architectural features from the existing data, and another part of the team is designing codes that would allow radiation testing of single architectural features.  These new characterization codes were tested on multiple microprocessors over Thanksgiving weekend 2017.  Once we have more information about the different architectural features, we can build a model that allows us to predict the cross section of untested code.
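
For context, the cross section referred to here is the standard figure of merit in radiation testing: the number of observed events divided by the particle fluence the device received,

    \sigma = \frac{N_{\mathrm{events}}}{\Phi}

where N_events is the error count observed during the beam exposure and Phi is the fluence in particles/cm^2, giving sigma in cm^2. Presumably, a prediction model of the kind described above would combine per-feature cross sections weighted by how heavily a given code exercises each feature.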

At the same time, the basic characterization of Xilinx FPGAs is the bailiwick of the Xilinx Radiation Test Consortium (XRTC).  We are cooperating with the XRTC to see if a standard set of characterization circuits for FPGAs could be developed.  It is likely that circuits used to characterize older FPGAs could be released openly to help researchers develop new test circuits for newer architectures.

Current and Past Members
  • Miguel Aguirre, Universidad de Sevilla
  • Arno Barnard, Stellenbosch University
  • Larry Clark, Arizona State University
  • Luis Entrena, Universidad Carlos III de Madrid
  • Steven Guertin, Jet Propulsion Laboratory
  • David Kaeli, Northeastern University
  • Fernanda Lima Kastensmidt, Universidade Federal do Rio Grande do Sul
  • Heather Quinn, Los Alamos National Laboratory
  • Paolo Rech, Universidade Federal do Rio Grande do Sul
  • Matteo Sonza Reorda, Politecnico di Torino
  • William H. Robinson, Vanderbilt University
  • Luca Sterpone, Politecnico di Torino
  • Gary Swift, Swift Engineering & Radiation Services, LLC
  • Michael Wirthlin, Brigham Young University

We have telecons twice a month to discuss progress and issues with the benchmark. If you want to join the telecons, please contact Heather Quinn.

Publications
  1. A. G. D. Oliveira et al., "Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators," 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), Austin, TX, 2017, pp. 577-588.
  2. G. Previlon et al., "Combining architectural fault-injection and neutron beam testing approaches toward better understanding of GPU soft-error resilience," 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, 2017, pp. 898-901.
  3. H. Quinn et al., "Robust Duplication With Comparison Methods in Microcontrollers," in IEEE Transactions on Nuclear Science, vol. 64, no. 1, pp. 338-345, Jan. 2017.
  4. Lunardi et al., "Experimental and Analytical Analysis of Sorting Algorithms Error Criticality for HPC and Large Servers Applications," in IEEE Transactions on Nuclear Science, vol. 64, no. 8, pp. 2169-2178, Aug. 2017.
  5. M. Lins et al., "Register file criticality on embedded microprocessor reliability," 2016 16th European Conference on Radiation and Its Effects on Components and Systems (RADECS), Bremen, 2016, pp. 1-5.
  6. A. Tambara et al., "Analyzing the Impact of Radiation-Induced Failures in Programmable SoCs," in IEEE Transactions on Nuclear Science, vol. 63, no. 4, pp. 2217-2224, Aug. 2016.
  7. Oliveira et al., "Input Size Effects on the Radiation-Sensitivity of Modern Parallel Processors," 2016 IEEE Radiation Effects Data Workshop (REDW), Portland, OR, USA, 2016, pp. 1-6.
  8. Santini et al., "Reliability Analysis of Operating Systems and Software Stack for Embedded Systems," in IEEE Transactions on Nuclear Science, vol. 63, no. 4, pp. 2225-2232, Aug. 2016.
  9. A. G. de Oliveira et al., "Evaluation and Mitigation of Radiation-Induced Soft Errors in Graphics Processing Units," in IEEE Transactions on Computers, vol. 65, no. 3, pp. 791-804, Mar. 2016.
  10. H. Quinn et al., "Software Resilience and the Effectiveness of Software Mitigation in Microcontrollers," in IEEE Transactions on Nuclear Science, vol. 62, no. 6, pp. 2532-2538, Dec. 2015.
  11. H. Quinn et al., "Using Benchmarks for Radiation Testing of Microprocessors and FPGAs," in IEEE Transactions on Nuclear Science, vol. 62, no. 6, pp. 2547-2554, Dec. 2015.
  12. H. Quinn et al., "The Use of Benchmarks for Radiation Testing," MAPLD 2015.
  13. H. Quinn et al., "The Use of Benchmarks for High-Reliability Systems," SELSE 2015.