High Performance Computing refers to the practice of aggregating computing power in a way that delivers much higher performance than a typical desktop computer can provide, in order to solve large problems in science, engineering, or business. A supercomputer is a computer at the frontline of contemporary processing capacity.
Earlier supercomputers used only a few processors, but over time machines with thousands of processors began to appear. Systems with massive numbers of processors generally take one of two paths. In one approach (e.g., distributed computing), a large number of discrete computers (e.g., laptops) distributed across a network (e.g., the Internet) devote some or all of their time to solving a common problem; each individual computer (client) receives and completes many small tasks, reporting the results to a central server, which integrates the task results from all the clients into the overall solution. In the other approach, a large number of dedicated processors are placed in close proximity to each other (e.g., in a computer cluster); this saves considerable time moving data around and makes it possible for the processors to work together rather than on separate tasks. The use of multi-core processors combined with centralization is an emerging trend.
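The first approach can be pictured with a small sketch. The Python below is an illustration only and is not taken from any particular distributed-computing system: threads from the standard library's `concurrent.futures` stand in for networked clients, and the function names (`make_tasks`, `client_work`, `distributed_sum_of_squares`) and the sum-of-squares workload are hypothetical choices made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tasks(lo, hi, chunk):
    """Server side: split the range [lo, hi) into small independent tasks."""
    return [(start, min(start + chunk, hi)) for start in range(lo, hi, chunk)]


def client_work(task):
    """Client side: complete one small task and report its partial result."""
    start, stop = task
    return sum(i * i for i in range(start, stop))


def distributed_sum_of_squares(lo, hi, chunk=1000, clients=4):
    """Hand tasks to the clients, then integrate their results on the server."""
    tasks = make_tasks(lo, hi, chunk)
    # Assumption: worker threads stand in for remote machines on a network.
    with ThreadPoolExecutor(max_workers=clients) as pool:
        partials = list(pool.map(client_work, tasks))
    # Server side: combine the partial results into the overall solution.
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum_of_squares(0, 10_000))
```

The tasks here are fully independent, which is what makes the loosely coupled approach workable; problems whose pieces must exchange data frequently are the ones that benefit from the tightly coupled cluster approach.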
Some Benchmarks:
INTEL OPTIMIZED LINPACK BENCHMARK FOR OS X
The LINPACK Benchmarks are a measure of a system's floating-point computing power. The Intel Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark. It solves a dense (real) system of linear equations Ax = b (where A is the matrix of coefficients, x is the column vector of unknowns and b is the column vector of right-hand-side constants), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. It uses partial pivoting to assure the accuracy of the results.
The above image shows a test run on a MacBook with a 2.0 GHz quad-core Intel Core i7-4750HQ Crystal Well processor with 6 MB on-chip L3, 128 MB L4 cache and 8 GB of RAM.
The time complexity of the algorithm used is O(n^3), where n is the number of linear equations, i.e. the order of the matrix A.
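The factor, solve, time and verify cycle described above can be sketched in a few lines of pure Python. This is a toy illustration and not the Intel implementation, which relies on highly optimized library routines: the function names are hypothetical, the random test matrix is an assumption, and the operation count 2/3 n^3 + 2 n^2 is the figure conventionally used to convert LINPACK timings into a rate.

```python
import random
import time


def solve_partial_pivoting(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    Works on a copy, so the caller's matrix and vector are left untouched.
    """
    n = len(a)
    # Build the augmented matrix [A | b].
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: move the largest |entry| of column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        # Eliminate column k from the rows below the pivot.
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def max_residual(a, x, b):
    """Largest absolute entry of Ax - b, used as the accuracy check."""
    return max(
        abs(sum(aij * xj for aij, xj in zip(row, x)) - bi)
        for row, bi in zip(a, b)
    )


def run_benchmark(n, seed=0):
    """Time one solve of a random n-by-n system and convert it to a rate."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_partial_pivoting(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional LINPACK operation count
    return {
        "n": n,
        "seconds": elapsed,
        "mflops": flops / elapsed / 1e6,
        "residual": max_residual(a, x, b),
    }


if __name__ == "__main__":
    print(run_benchmark(100))
```

The triple-nested elimination loop is where the O(n^3) cost comes from: doubling n multiplies the work by roughly eight. An interpreted loop like this one will report a rate far below what an optimized benchmark achieves on the same hardware, so the numbers are only useful for seeing how the measurement is put together.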
THE FHOURSTONES BENCHMARK
Fhourstones is an integer benchmark that efficiently solves positions in the game of Connect-4, as played on a vertical 7x6 board. The measurements are reported as the number of game positions searched per second. The benchmark involves (after warming up on three easier positions) solving the entire game, which takes about ten minutes on contemporary PCs, scoring between 1,000 and 12,000 kpos/sec (kpos/sec means thousand game positions searched per second). The default input file features 4 positions of increasing complexity, the last one being the starting position. Completing this benchmark amounts to solving the game of Connect-4.
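To make the kpos/sec metric concrete, here is a minimal sketch that solves a much smaller connect-k game by exhaustive search and reports how many positions it visited per second. It is not the Fhourstones algorithm, which is far more optimized; it is a plain negamax search written for illustration, and the function names, the list-of-columns board representation and the tiny default board are all assumptions made for this example.

```python
import time


def wins(cols, col, width, k):
    """Return True if the top piece in column `col` completes a line of k."""
    row = len(cols[col]) - 1
    player = cols[col][row]

    def at(c, r):
        # True if square (c, r) is on the board, occupied, and owned by `player`.
        return 0 <= c < width and 0 <= r < len(cols[c]) and cols[c][r] == player

    # Horizontal, vertical and the two diagonal directions.
    for dc, dr in ((1, 0), (0, 1), (1, 1), (1, -1)):
        count = 1
        for sign in (1, -1):
            c, r = col + sign * dc, row + sign * dr
            while at(c, r):
                count += 1
                c, r = c + sign * dc, r + sign * dr
        if count >= k:
            return True
    return False


def negamax(cols, player, width, height, k, counter):
    """Return +1 (win), 0 (draw) or -1 (loss) for the side to move.

    counter[0] is incremented once per position visited.
    """
    counter[0] += 1
    if all(len(c) == height for c in cols):
        return 0  # board full with no winner: draw
    best = -1
    for col in range(width):
        if len(cols[col]) == height:
            continue  # column is full
        cols[col].append(player)
        if wins(cols, col, width, k):
            score = 1
        else:
            score = -negamax(cols, 3 - player, width, height, k, counter)
        cols[col].pop()
        if score > best:
            best = score
            if best == 1:
                break  # nothing beats a win, stop searching this position
    return best


def benchmark(width=3, height=3, k=3):
    """Solve the empty board and report the search rate in kpos/sec."""
    cols = [[] for _ in range(width)]
    counter = [0]
    start = time.perf_counter()
    value = negamax(cols, 1, width, height, k, counter)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return {
        "value": value,
        "positions": counter[0],
        "kpos_per_sec": counter[0] / elapsed / 1000.0,
    }


if __name__ == "__main__":
    print(benchmark())
```

The default 3x3 board keeps the run time negligible. Without pruning or position caching, the same search on the full 7x6 board would not finish in any reasonable time, which is why a practical solver needs a much more efficient search.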
The image below shows a test run on a MacBook with a 2.0 GHz quad-core Intel Core i7-4750HQ Crystal Well processor with 6 MB on-chip L3, 128 MB L4 cache and 8 GB of RAM.