Linpack C version on Solaris SPARC

by | April 10, 2011
SOURCE: linpackc
BUILD:
cc -DDP  -DUNROLL  -Xa -dalign linpackc.c -o linpackcdb -lm -xcache=64/64/2:5120/256/10 -m64

————————————————————————————————

A  0, 16 1800 32.0 US-IV+   2.4

./linpackcdb
Unrolled Double Precision Linpack

NTIMES= 1000
 norm. resid      resid           machep         x[0]-1        x[n-1]-1
 1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
 times are reported for matrices of order   100
 dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201
 0.00       0.00       0.00     215459       0.01       0.06
 0.00       0.00       0.00     217575       0.01       0.06
 0.00       0.00       0.00     218267       0.01       0.06
 0.00       0.00       0.00     217998       0.01       0.06
 times for array with leading dimension of 200
 0.00       0.00       0.00     218823       0.01       0.06
 0.00       0.00       0.00     221220       0.01       0.06
 0.00       0.00       0.00     221505       0.01       0.06
 0.00       0.00       0.00     220783       0.01       0.06
Unrolled Double  Precision 217998 Kflops ; 1000 Reps

217998 x 8 CPUs = 1,743,984 Kflops
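The machine totals in this post are one run's Kflops multiplied by the CPU or thread count. A minimal sketch of that arithmetic follows. The helper name `aggregate_kflops` is hypothetical, and the result assumes every CPU sustains the single-run rate at the same time, so it is best read as an optimistic upper bound rather than a measured figure.

```c
#include <assert.h>

/* Hypothetical helper: scale one linpackc result to a whole machine.
 * Assumption: perfect linear scaling, i.e. each CPU (or hardware thread)
 * delivers the same Kflops as the single measured run. */
long long aggregate_kflops(long long per_cpu_kflops, int cpus)
{
    assert(per_cpu_kflops >= 0 && cpus > 0);
    return per_cpu_kflops * (long long)cpus;
}
```

With the figures above, `aggregate_kflops(217998, 8)` gives 1743984, the same total as the line before this sketch.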

————————————————————————————————

SPARC64-VII mode

cc -DDP  -DUNROLL  -Xa -dalign linpackc.c -o linpackcdb -lm -xcache=64/64/2:5120/256/10 -m64

Unrolled Double Precision Linpack
NTIMES= 1000
times for array with leading dimension of 200
 0.00       0.00       0.00     321172       0.01       0.04
 0.00       0.00       0.00     320274       0.01       0.04
 0.00       0.00       0.00     319083       0.01       0.04
 0.00       0.00       0.00     323038       0.01       0.04
Unrolled Double  Precision 323038 Kflops ; 1000 Reps

 

16 CPUs * 323038 = 5,168,608 Kflops

———————————–

A    0    900  8.0 US-III+  2.3

Unrolled Double Precision Linpack

NTIMES= 1000
 norm. resid      resid           machep         x[0]-1        x[n-1]-1
 1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
 times are reported for matrices of order   100
 dgefa      dgesl      total       kflops     unit      ratio

Unrolled Double  Precision 110142 Kflops ; 1000 Reps
 times for array with leading dimension of 200
 0.01       0.00       0.01     111345       0.02       0.11
 0.01       0.00       0.01     107108       0.02       0.11
 0.01       0.00       0.01     106858       0.02       0.11
 0.01       0.00       0.01     110220       0.02       0.11
Unrolled Double  Precision 108042 Kflops ; 1000 Reps
8 CPUs * 108042 = 864,336 Kflops
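The machep value 2.22044605e-16 printed in these runs is the double-precision machine epsilon, 2^-52. The sketch below estimates it with the 4/3 trick, in the spirit of the benchmark's epsilon routine. The function name `estimate_machep` is an assumption for illustration, not a name taken from linpackc.c.

```c
#include <float.h>

/* Estimate double-precision machine epsilon.
 * 4/3 cannot be represented exactly in binary, so 3 * (4/3 - 1) misses 1.0
 * by exactly one unit in the last place. The volatile qualifiers force each
 * intermediate to be rounded to double (assumption: IEEE 754 doubles). */
double estimate_machep(void)
{
    volatile double a = 4.0 / 3.0;
    volatile double b = a - 1.0;
    volatile double c = b + b + b;
    double eps = c - 1.0;
    return eps < 0.0 ? -eps : eps;
}
```

On an IEEE 754 platform the result equals `DBL_EPSILON` from `<float.h>`, which prints as 2.22044605e-16 with the `%.8e` format.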

——————————————————-

0      1000 MHz  SUNW,UltraSPARC-T1     on-line

./linpackcdb 

Unrolled Double Precision Linpack

NTIMES= 1000
 norm. resid      resid           machep         x[0]-1        x[n-1]-1
 1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
 times are reported for matrices of order   100

times for array with leading dimension of 200
 0.16       0.00       0.16       4227       0.47       2.90
 0.16       0.00       0.16       4225       0.47       2.90
 0.16       0.00       0.16       4215       0.47       2.91
 0.15       0.00       0.16       4383       0.46       2.80
Unrolled Double  Precision  4303 Kflops ; 1000 Reps

24 Threads * 4303 Kflops = 103,272 Kflops
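The UltraSPARC-T1 run is slow enough that its timing columns are nonzero, so it shows how the columns relate. The sketch below assumes the formulas from the commonly distributed linpackc.c: an operation count of 2n^3/3 + 2n^2 for an n x n system, a "unit" column of 2000/kflops, and a "ratio" column of total/0.056. The function names are hypothetical. The constants are not stated in this post, but they reproduce the rows above.

```c
#include <assert.h>

/* Floating-point operations credited for one factor-and-solve
 * of an n x n system: 2n^3/3 + 2n^2. */
double linpack_ops(int n)
{
    double dn = (double)n;
    return (2.0 * dn * dn * dn) / 3.0 + 2.0 * dn * dn;
}

/* Seconds per solve implied by a reported Kflops rate ("total" column). */
double total_seconds(int n, double kflops)
{
    assert(kflops > 0.0);
    return linpack_ops(n) / (1.0e3 * kflops);
}

/* "unit" column, assumed to be 2000 / kflops. */
double unit_column(double kflops)
{
    assert(kflops > 0.0);
    return 2.0e3 / kflops;
}

/* "ratio" column, assumed to be total / 0.056 (reference time in seconds). */
double ratio_column(double total)
{
    return total / 0.056;
}
```

For n = 100 and 4227 Kflops this gives a total of about 0.162 s, a unit of about 0.47 and a ratio of about 2.90, matching the first T1 row. The same arithmetic at roughly 218,000 Kflops gives about 0.003 s per solve, which is why the faster machines print 0.00 in the time columns.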

 

_________________________________________________

gcc -DDP  -DUNROLL  -dalign linpackc.c -o linpackcdb -lm -m64

Quad-Core AMD Opteron(tm) Processor 2352
vendor: Hynix Semiconductor (Hyundai Electronics)
physical id: 4
bus info: cpu@0
slot: CPU socket #0
size: 2100MHz
capacity: 4230MHz
width: 64 bits

Unrolled Double Precision Linpack

 times are reported for matrices of order   100
 dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201
 0.00       0.00       0.00     228965       0.01       0.05
 times for array with leading dimension of 200
 0.00       0.00       0.00     228889       0.01       0.05
 0.00       0.00       0.00     343333       0.01       0.04
 0.00       0.00       0.00     343505       0.01       0.04
 0.00       0.00       0.00     269428       0.01       0.05
Unrolled Double  Precision 267330 Kflops ; 1000 Reps

8 threads * 267330 Kflops = 2,138,640 Kflops
---------------------------------------------

blade T6320

0      1415 MHz  SUNW,UltraSPARC-T2     on-line

Unrolled Double Precision Linpack

 times are reported for matrices of order   100
 dgefa      dgesl      total       kflops     unit      ratio
 times for array with leading dimension of  201

 0.01       0.00       0.02      45674       0.04       0.27
 0.01       0.00       0.01      45952       0.04       0.27
 0.01       0.00       0.02      45342       0.04       0.27
 0.01       0.00       0.01      49138       0.04       0.25
Unrolled Double  Precision 44945 Kflops ; 1000 Reps

64 threads x 45342 = 2,901,888 Kflops

————————————————————————————

Mac OS X 10.6.7
CPU: 2.16 GHz Intel Core 2 Duo

Download linpackcmac.c

gcc -DDP  -DUNROLL  -dalign linpackcmac.c -o linpackcdb -lm -fnested-functions -m64

Unrolled Double Precision Linpack

NTIMES= 1000 

 norm. resid      resid           machep         x[0]-1        x[n-1]-1
 1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
 times are reported for matrices of order   100
 times for array with leading dimension of 200
 0.00       0.00       0.00     319231       0.01       0.04
 0.00       0.00       0.00     325434       0.01       0.04
 0.00       0.00       0.00     318639       0.01       0.04
 0.00       0.00       0.00     329319       0.01       0.04
Unrolled Double  Precision 310696 Kflops ; 1000 Reps

318639 x 2 CPUs = 637,278 Kflops

————————————————————————————

Mac OS X 10.6.7
CPU: 2.0 GHz Intel Core 2 Duo

Cmini

Unrolled Double Precision Linpack

NTIMES= 1000
norm. resid      resid           machep         x[0]-1        x[n-1]-1
1.7        7.41628980e-14  2.22044605e-16 -1.49880108e-14 -1.89848137e-14
times are reported for matrices of order   100
dgefa      dgesl      total       kflops     unit      ratio
times for array with leading dimension of  201
0.00       0.00       0.00     282812       0.01       0.04
0.00       0.00       0.00     281767       0.01       0.04
0.00       0.00       0.00     282695       0.01       0.04
0.00       0.00       0.00     284255       0.01       0.04
times for array with leading dimension of 200
0.00       0.00       0.00     288273       0.01       0.04
0.00       0.00       0.00     290714       0.01       0.04
0.00       0.00       0.00     289001       0.01       0.04
0.00       0.00       0.00     290686       0.01       0.04
Unrolled Double  Precision 284255 Kflops ; 1000 Reps

2 cpus * 284255 = 568,510 Kflops