
Supercomputer Benchmarking

By: John Dorfner, Wesley Jones, and Eric Ng

Pictured: Cray-1, CDC 1604, Origin 2000, RS/6000 SP

Overview

Definition of Benchmark
Introduction to Benchmark Suites
SPEChpc96 Suite
Livermore Loops
The Linpack Benchmark
The Top 8 Supercomputers
HPC Challenge Benchmark
Cray 1-A vs. IBM Cluster 1600
Inside the IBM Cluster 1600
Conclusion

Benchmark

Definition: A measurement or standard that serves as a point of reference by which process performance is measured. Benchmarking is a structured approach for identifying the best practices from industry and government, and comparing and adapting them to the organization's operations. Such an approach is aimed at identifying more efficient and effective processes for achieving intended results, and suggesting ambitious goals for program output, product/service quality, and process improvement.

Source: www.ichnet.org

Supercomputer Benchmarking
The number and type of benchmark suites used to study supercomputer performance has varied widely over the years. In early studies, an ad hoc collection of programs was typically used to measure the performance of a given system relative to a known performance benchmark. Eventually, this practice evolved into groups of programs explicitly designed as supercomputer benchmark suites. The most widely used benchmarks for performance on supercomputing clusters are: the SPEChpc96 suite; the Livermore Loops; and for scientific machines, the Linpack Kernels.
Some general examples of individual computer benchmarks:
Dhrystone - an integer benchmark for UNIX systems
Whetstone - a floating-point benchmark for minicomputers
I/O benchmarks
MIPS
Synthetic benchmarks
Kernel benchmarks
SPECint / SPECfp
Summarizing

SPEChpc96 Suite
In 1995, the Standard Performance Evaluation Corp. (SPEC) announced the release of SPEChpc96, the first standard benchmark suite specifically designed for measuring high-performance computing. SPEChpc96 was developed by SPEC's High Performance Group (HPG), which includes several leading high-performance computer vendors, systems integrators, and major universities and research institutes.
SPEChpc96 allows users and vendors of high-end computers to make objective performance comparisons across different hardware platforms.
Specific scientific and industrial applications are represented within the SPEChpc96 benchmark suite. The first two SPEChpc96 benchmarks are:
SPECseis96, a seismic processing application
SPECchem96, a computational chemistry application
Since SPECseis96 and SPECchem96 can be run in both serial and parallel modes, the SPEChpc96 suite can be used for general performance comparisons over a broad range of high-performance computing systems. This range includes multiprocessor systems, workstation clusters, distributed memory parallel systems, and traditional vector and vector-parallel supercomputers.

SPEChpc96 Suite: Metrics


The SPECseis96 and SPECchem96 suites each generate four metrics. Each metric corresponds to a different problem size and is used to characterize the scalability of the application as well as the entire system.
The SPEChpc96 metrics are as follows:
SPECseis96_SM, SPECseis96_MD, SPECseis96_LG, SPECseis96_XL
SPECchem96_SM, SPECchem96_MD, SPECchem96_LG, SPECchem96_XL
The metrics are unitless. They are derived as follows: metric = (86400 seconds) / (elapsed time of benchmark in seconds)

Since these benchmarks are both compute-intensive and data-intensive, the above metrics are used to reflect the performance of the entire system. This includes the processors, memory access, I/O bandwidth, interconnect topology, etc. For example, the SPECseis96_XL requires processing of 100GB of data.
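As a concrete illustration of the derivation above, the metric can be computed in a few lines of C. This is only a sketch of the arithmetic, not part of the official SPEC tools, and the elapsed time used here is a made-up example value.

#include <stdio.h>

/* SPEChpc96-style metric: 86400 seconds (one day) divided by the elapsed
   wall-clock time of the benchmark run; a larger value means a faster system. */
double spechpc96_metric(double elapsed_seconds)
{
    return 86400.0 / elapsed_seconds;
}

int main(void)
{
    double elapsed = 43200.0;   /* hypothetical elapsed time in seconds */
    printf("metric = %.2f\n", spechpc96_metric(elapsed));   /* prints 2.00 */
    return 0;
}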

Livermore Loops
Livermore Loops is a set of kernels consisting of loops taken from real Fortran programs. Introduced in 1970, this supercomputer benchmark initially comprised 14 kernels of numerically intensive applications written in Fortran; the number of kernels was increased to 24 in the 1980s. Performance measurements are taken in units of millions of floating-point operations per second (MFLOPS). The program also evaluates the results for computational accuracy. A main aim of the Livermore design was to avoid producing single-number performance comparisons. The 24 kernels can each be executed three times at a range of do-loop spans to produce short, medium, and long vector performance measurements. In this mode, if overall averages are quoted, the geometric mean may be interpreted as a characteristic rate of computation for the suite. However, it is more realistic to retain the full range of statistics: the geometric, harmonic, and arithmetic means together with the minimum and maximum.

Kernel 1: an excerpt from a hydrodynamic code.
Kernel 2: an excerpt from an Incomplete Cholesky-Conjugate Gradient code.
Kernel 3: the standard inner product function of linear algebra (see the sketch after this list).
Kernel 4: an excerpt from a Banded Linear Equations routine.
Kernel 5: an excerpt from a Tridiagonal Elimination routine.
Kernel 6: an example of a general linear recurrence equation.
Kernel 7: an Equation of State fragment.
Kernel 8: an excerpt of an Alternating Direction, Implicit Integration code.
Kernel 9: an Integrate Predictor code.
Kernel 10: a Difference Predictor code.
Kernel 11: a First Sum.
Kernel 12: a First Difference.
Kernel 13: an excerpt from a 2-D Particle-in-Cell code.
Kernel 14: an excerpt of a 1-D Particle-in-Cell code.
Kernel 15: a sample of how casually FORTRAN can be written.
Kernel 16: a search loop from a Monte Carlo code.
Kernel 17: an example of an implicit conditional computation.
Kernel 18: an excerpt from a 2-D Explicit Hydrodynamics code.
Kernel 19: a general Linear Recurrence Equation.
Kernel 20: an excerpt from a Discrete Ordinate Transport program.
Kernel 21: a matrix x matrix product calculation.
Kernel 22: a Planckian Distribution procedure.
Kernel 23: an excerpt from a 2-D Implicit Hydrodynamics code.
Kernel 24: finds the location of the first minimum in X.
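To show what these kernels look like in practice, below is a C rendering of Kernel 3, the standard inner product, with a simple MFLOPS calculation around it. The loop body mirrors the published kernel, but the vector length, repetition count, and timing code are illustrative choices rather than the official LFK harness.

#include <stdio.h>
#include <time.h>

#define N    1001     /* illustrative do-loop span */
#define REPS 100000   /* repetitions so the timer has something to measure */

int main(void)
{
    static double x[N], z[N];
    double q = 0.0;

    for (int k = 0; k < N; k++) {   /* arbitrary test data */
        x[k] = 0.5 + k * 1e-3;
        z[k] = 1.5 - k * 1e-3;
    }

    clock_t start = clock();
    for (int rep = 0; rep < REPS; rep++) {
        q = 0.0;
        for (int k = 0; k < N; k++)   /* Kernel 3: inner product */
            q += z[k] * x[k];
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* each pass performs N multiplications and N additions = 2*N flops */
    double mflops = (2.0 * N * (double)REPS) / (seconds * 1e6);
    printf("q = %g, rate = %.1f MFLOPS\n", q, mflops);
    return 0;
}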

Livermore Loops: Kernels

THE LIVERMORE FORTRAN KERNELS: SUMMARY
Computer: CRAY-YMP C90 (240 MHz)
System: UNICOS 7.C, loaded
Compiler: CFT77 5.0.1.17
Date: 92.02.18
Testor: Charles Grassl, CRI

MFLOPS RANGE: REPORT ALL RANGE STATISTICS
Mean DO Span = 167
Code Samples = 72
Maximum Rate = 826.0859 Mega-Flops/Sec.
Average Rate = 190.5636 Mega-Flops/Sec.
Geometric Mean = 86.2649 Mega-Flops/Sec.
Median (Q2) = 83.5138 Mega-Flops/Sec.
Harmonic Mean = 40.7302 Mega-Flops/Sec.
Minimum Rate = 6.7925 Mega-Flops/Sec.

Livermore Loops: Kernel Output

Mean Precision = 11.07 Decimal Digits
BOTTOM LINE: 72 samples, LFK test results summary.
Use the range statistics above for official quotations.
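The range statistics quoted in the summary (arithmetic, geometric, and harmonic means, plus the minimum and maximum) can be reproduced directly from the per-sample rates. The sketch below shows that arithmetic in C, using a short made-up list of MFLOPS figures rather than real LFK output.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* hypothetical per-sample rates in MFLOPS (stand-ins for real LFK results) */
    double rate[] = { 826.1, 190.6, 83.5, 40.7, 6.8 };
    int n = sizeof rate / sizeof rate[0];

    double sum = 0.0, log_sum = 0.0, inv_sum = 0.0;
    double min = rate[0], max = rate[0];

    for (int i = 0; i < n; i++) {
        sum     += rate[i];
        log_sum += log(rate[i]);     /* geometric mean via logarithms */
        inv_sum += 1.0 / rate[i];    /* harmonic mean via reciprocals */
        if (rate[i] < min) min = rate[i];
        if (rate[i] > max) max = rate[i];
    }

    printf("arithmetic mean = %.2f MFLOPS\n", sum / n);
    printf("geometric mean  = %.2f MFLOPS\n", exp(log_sum / n));
    printf("harmonic mean   = %.2f MFLOPS\n", n / inv_sum);
    printf("minimum = %.2f, maximum = %.2f MFLOPS\n", min, max);
    return 0;
}

As in the CRAY summary above, the harmonic mean comes out lowest and the arithmetic mean highest; the geometric mean sits between them and is the value usually quoted as the characteristic rate.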

The Linpack Benchmark


The Linpack Benchmark measures a computer's floating-point rate of execution, in Mflop/s, by running a mathematics application that solves a dense system of linear equations. Over the years, the characteristics of the benchmark have changed; today there are three benchmarks included in the Linpack Benchmark report. The Linpack Benchmark grew out of the Linpack software project. It was originally intended to give end users an indication of the length of time it would take to solve certain matrix problems.
The three benchmarks in the Linpack Benchmark report are:
the Linpack Fortran n = 100 benchmark
the Linpack n = 1000 benchmark
Linpack's Highly Parallel Computing benchmark
The Mflop/s execution rate, millions of floating-point operations per second, refers to 64-bit floating-point operations, either additions or multiplications. Gflop/s are billions of floating-point operations per second and Tflop/s are trillions of floating-point operations per second.
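The rate itself comes from a fixed operation count: solving an n x n dense system is conventionally charged 2/3 n^3 + 2 n^2 floating-point operations, which is divided by the measured solve time. The sketch below shows that bookkeeping around a naive Gaussian elimination in C. It illustrates how the Mflop/s figure is derived, not the actual Linpack driver; the synthetic matrix and the repetition count are made-up choices.

#include <stdio.h>
#include <time.h>

#define N    100   /* problem size of the classic Linpack 100x100 case */
#define REPS 200   /* repeat the solve so the timer has something to measure */

static double a[N][N], b[N];

static void init(void)   /* synthetic, diagonally dominant test system */
{
    for (int i = 0; i < N; i++) {
        b[i] = 1.0;
        for (int j = 0; j < N; j++)
            a[i][j] = (i == j) ? (double)N : 1.0 / (1.0 + i + j);
    }
}

/* naive Gaussian elimination with back substitution (no pivoting;
   safe here because the test matrix is diagonally dominant) */
static void solve(void)
{
    for (int k = 0; k < N - 1; k++)
        for (int i = k + 1; i < N; i++) {
            double m = a[i][k] / a[k][k];
            for (int j = k; j < N; j++)
                a[i][j] -= m * a[k][j];
            b[i] -= m * b[k];
        }
    for (int i = N - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < N; j++)
            s -= a[i][j] * b[j];
        b[i] = s / a[i][i];
    }
}

int main(void)
{
    clock_t start = clock();
    for (int r = 0; r < REPS; r++) {
        init();
        solve();
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* conventional Linpack operation count for one n x n solve */
    double flops_per_solve = 2.0 / 3.0 * (double)N * N * N + 2.0 * (double)N * N;
    printf("rate = %.2f Mflop/s (x[0] = %g)\n",
           flops_per_solve * REPS / (seconds * 1e6), b[0]);
    return 0;
}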

Linpack: Performance Example

Measured Gflop/s: the peak rate of execution achieved, in billions of floating-point operations per second.
Size of Problem: the matrix size at which the measured performance was observed.
Size of Perf: the problem size needed to achieve the measured peak performance.
Theoretical Peak Gflop/s: the theoretical peak performance of the computer.

The Top 8 Supercomputers

The Top 8 Supercomputers Table Key


Rank: position within the TOP500 ranking
Manufacturer: manufacturer or vendor
Computer: model type indicated by the manufacturer or vendor
Installation Site: customer
Location: location and country
Year: year of installation or last major update
Installation Area: field of application
Processors: number of processors
Rmax: maximum LINPACK performance achieved
Rpeak: theoretical peak performance
Nmax: problem size for achieving Rmax
N1/2: problem size for achieving half of Rmax
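A figure often derived from these columns is the LINPACK efficiency, the fraction of Rpeak that the machine actually achieves as Rmax. A minimal sketch in C, using made-up Rmax and Rpeak values rather than numbers from the TOP500 table:

#include <stdio.h>

int main(void)
{
    double rmax  = 30000.0;   /* hypothetical maximum LINPACK performance, Gflop/s */
    double rpeak = 40000.0;   /* hypothetical theoretical peak, Gflop/s */

    /* efficiency: share of theoretical peak realized on the LINPACK run */
    printf("efficiency = %.1f%%\n", 100.0 * rmax / rpeak);   /* prints 75.0% */
    return 0;
}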

HPC Challenge Benchmark


A group of 20 top researchers, working under the High Productivity Computing Systems program of the Defense Advanced Research Projects Agency (DARPA), has initiated an effort to redefine the benchmarks used to measure high-performance systems. It is designed to broaden the Linpack benchmark beyond raw floating-point operations per second (flops). They have established a target date of 2006 to release a new benchmark.
The HPC Challenge benchmark consists of 5 hardware performance metrics:
HPL - the Linpack TPP benchmark, which measures the floating-point rate of execution for solving a linear system of equations
STREAM - a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for simple vector kernels (see the sketch after this list)
RandomAccess - measures the rate of integer random updates of memory
PTRANS (parallel matrix transpose) - exercises communications where pairs of processors communicate with each other simultaneously; it is a useful test of the total communications capacity of the network
b_eff (effective bandwidth benchmark) - a set of tests to measure the latency and bandwidth of a number of simultaneous communication patterns
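To make the STREAM entry concrete, the sketch below times the well-known triad kernel, a[i] = b[i] + q*c[i], and converts the memory traffic into GB/s. It is a simplified stand-in for the official STREAM source, not a copy of it; the array length, repetition count, and use of clock() are illustrative choices.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    10000000L   /* illustrative array length (about 80 MB per array) */
#define REPS 10

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double q = 3.0;
    clock_t start = clock();
    for (int r = 0; r < REPS; r++)
        for (long i = 0; i < N; i++)   /* STREAM-style triad kernel */
            a[i] = b[i] + q * c[i];
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;

    /* each triad pass reads b and c and writes a: three arrays of doubles */
    double bytes = 3.0 * (double)N * sizeof(double) * REPS;
    printf("triad bandwidth ~ %.2f GB/s (a[0] = %g)\n",
           bytes / (seconds * 1e9), a[0]);

    free(a); free(b); free(c);
    return 0;
}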

Cray 1-A (1978) vs. IBM Cluster 1600 (2002)

Inside the IBM Cluster 1600

The diagram above shows a schematic view of the two-cluster configuration

The diagram above shows the configuration of a single cluster

Conclusion
Benchmarking refers to a measurement standard that serves as a point of reference by which process performance is measured

Three of the more popular suites for benchmarking supercomputers are the SPEChpc96 suite, the Livermore Loops, and for scientific machines, the Linpack Kernels

For important HPC features, the performance ratio between supercomputers of the past and those used today is vast

As the High Performance Computing industry grows, the benchmarks applied to supercomputers must also evolve in order to provide a yardstick by which these systems can be measured

For more information


www.top500.org
www.spec.org/hpg
www.llnl.gov
www.ecmwf.int/services/computing/overview/supercomputer_history.html
www.microsoft.com/windows2000/hpc/
www.ibm.com
www.sgi.com
www.hp.com
www.cray.com
