First value: performance based on the total time.
Second value (in parentheses, where given): performance based on the total time without the CPU-device-CPU transfers.
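As a sketch of how such figures can be obtained, the C code below converts a measured time into GFlops/s. It assumes the conventional count of 5·N·log2(N) floating-point operations for a complex FFT of N = 2^m points; the log does not state which count was used, and `fft_flops`/`fft_gflops` are hypothetical names, not functions from FFT_OpenCL_A.

```c
#include <math.h>

/* Assumed operation count for a complex FFT of N = 2^m points:
   5 * N * log2(N) = 5 * N * m floating-point operations. */
double fft_flops(int m)
{
    double n = ldexp(1.0, m);   /* N = 2^m */
    return 5.0 * n * (double)m;
}

/* Performance in GFlops/s of one FFT of size 2^m that took `seconds`.
   Pass the total time for the first value, or the time without the
   CPU-device-CPU transfers for the second (parenthesised) value. */
double fft_gflops(int m, double seconds)
{
    return fft_flops(m) / seconds / 1e9;
}
```

Since the time without transfers can only be shorter than the total time, the second value computed this way is the larger of the two.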
Original Apple OpenCL FFT as implemented through FFT_OpenCL_A with split/full-planar format (single FFTs, no batch):
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000022 GFlops/s
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000087 GFlops/s
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000245 GFlops/s
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000684 GFlops/s
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.001866 GFlops/s
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004529 GFlops/s
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.008900 GFlops/s
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.023252 GFlops/s
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.048779 GFlops/s
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.088464 GFlops/s
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.159153 GFlops/s
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.358765 GFlops/s
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.669117 GFlops/s
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.812885 GFlops/s
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 1.170028 GFlops/s
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 1.396435 GFlops/s
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 1.729546 GFlops/s
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 1.906802 GFlops/s
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 2.015303 GFlops/s
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 1.890400 GFlops/s
@@@ m == 22: [Device: GeForce 9400, Vendor: NVIDIA] 1.528171 GFlops/s
@@@ m == 23: [Device: GeForce 9400, Vendor: NVIDIA] 1.571116 GFlops/s
Original Apple OpenCL FFT as implemented through FFT_OpenCL_A with interleaved format (single FFTs, no batch):
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000033 GFlops/s
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000130 GFlops/s
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000496 GFlops/s
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.001264 GFlops/s
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.003064 GFlops/s
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.007752 GFlops/s
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.017438 GFlops/s
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.037332 GFlops/s
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.079789 GFlops/s
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.174006 GFlops/s
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.317982 GFlops/s
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.479505 GFlops/s
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.654511 GFlops/s
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 1.030134 GFlops/s
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 1.501273 GFlops/s
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 1.739314 GFlops/s
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 2.101596 GFlops/s
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 2.195668 GFlops/s
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 2.326745 GFlops/s
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 2.250411 GFlops/s
@@@ m == 22: [Device: GeForce 9400, Vendor: NVIDIA] 2.050385 GFlops/s
@@@ m == 23: [Device: GeForce 9400, Vendor: NVIDIA] 2.113647 GFlops/s
First full-custom OpenCL implementation:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000004 GFlops/s (0.000005 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000024 GFlops/s (0.000056 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000097 GFlops/s (0.000210 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000285 GFlops/s (0.000535 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000445 GFlops/s (0.001074 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.001827 GFlops/s (0.003816 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.003938 GFlops/s (0.008076 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.006346 GFlops/s (0.011361 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.013986 GFlops/s (0.024082 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.025545 GFlops/s (0.042517 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.044457 GFlops/s (0.074608 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.061823 GFlops/s (0.100784 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.118244 GFlops/s (0.191792 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.176999 GFlops/s (0.276071 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.132182 GFlops/s (0.162937 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.193469 GFlops/s (0.246788 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.263356 GFlops/s (0.346052 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.282603 GFlops/s (0.363516 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.281298 GFlops/s (0.353779 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.291675 GFlops/s (0.364987 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.271277 GFlops/s (0.335767 GFlops/s)
After making all inner-kernel FFT units constants in the code (requires unrolling the gL loop):
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000004 GFlops/s (0.000007 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000029 GFlops/s (0.000054 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000048 GFlops/s (0.000227 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000151 GFlops/s (0.000570 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000195 GFlops/s (0.000394 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.000530 GFlops/s (0.001577 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.001086 GFlops/s (0.002344 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.003620 GFlops/s (0.008494 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.006717 GFlops/s (0.023151 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.013081 GFlops/s (0.036142 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.015463 GFlops/s (0.017807 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.008954 GFlops/s (0.058254 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.035348 GFlops/s (0.105508 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.151101 GFlops/s (0.203411 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.218779 GFlops/s (0.298998 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.289123 GFlops/s (0.407930 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.347693 GFlops/s (0.499590 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.401554 GFlops/s (0.584377 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.425001 GFlops/s (0.612133 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.419331 GFlops/s (0.640696 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.451663 GFlops/s (0.660093 GFlops/s)
After simplifying the complex multiplications by simple constant units:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000004 GFlops/s (0.000007 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000025 GFlops/s (0.000051 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000102 GFlops/s (0.000234 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000281 GFlops/s (0.000599 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000773 GFlops/s (0.001671 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.001944 GFlops/s (0.004425 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004423 GFlops/s (0.009281 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.007775 GFlops/s (0.015025 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.015134 GFlops/s (0.025895 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.027158 GFlops/s (0.041447 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.041775 GFlops/s (0.058150 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.054048 GFlops/s (0.070199 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.087365 GFlops/s (0.115816 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.154490 GFlops/s (0.211807 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.228774 GFlops/s (0.319439 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.292965 GFlops/s (0.415397 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.352817 GFlops/s (0.513253 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.407117 GFlops/s (0.601817 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.430736 GFlops/s (0.630092 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.454898 GFlops/s (0.663256 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.464089 GFlops/s (0.685850 GFlops/s)
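A minimal sketch of this kind of simplification, assuming a split (re, im) representation: for units whose components are 0, ±1 or ±√½, the general four-multiplication product collapses to sign changes or two multiplications. Which units the kernels actually treated as simple is not stated in the log; the three cases below (0, 1/2 and 1/8 of a turn, forward sign convention exp(-2πi·f)) and all names are illustrative.

```c
typedef struct { float re, im; } cplx;

/* General product a * w: 4 multiplications and 2 additions. */
cplx cmul(cplx a, cplx w)
{
    cplx r = { a.re * w.re - a.im * w.im, a.re * w.im + a.im * w.re };
    return r;
}

/* Unit at 0 turns, exp(0) = 1: the product is the input itself. */
cplx cmul_unit0(cplx a)
{
    return a;
}

/* Unit at 1/2 turn, exp(-i*pi) = -1: only sign changes remain. */
cplx cmul_unit_half(cplx a)
{
    cplx r = { -a.re, -a.im };
    return r;
}

/* Unit at 1/8 turn, exp(-i*pi/4) = c - c i with c = sqrt(1/2):
   (x + y i)(c - c i) = c (x + y) + c (y - x) i, so 2 multiplications. */
cplx cmul_unit_eighth(cplx a)
{
    const float c = 0.70710678f;
    cplx r = { c * (a.re + a.im), c * (a.im - a.re) };
    return r;
}
```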
With intermediate buffers in registers and full unrolling:
Kernels take a long time to compile.
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000004 GFlops/s (0.000006 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000021 GFlops/s (0.000040 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000099 GFlops/s (0.000219 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000294 GFlops/s (0.000676 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000214 GFlops/s (0.001207 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.001594 GFlops/s (0.003183 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004485 GFlops/s (0.010451 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.008346 GFlops/s (0.017276 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.015804 GFlops/s (0.030277 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.027064 GFlops/s (0.047016 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.038138 GFlops/s (0.065131 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.048711 GFlops/s (0.075773 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.083401 GFlops/s (0.119948 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.149395 GFlops/s (0.218487 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.227228 GFlops/s (0.340410 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.317261 GFlops/s (0.486460 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.378422 GFlops/s (0.581966 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.437693 GFlops/s (0.689294 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.459146 GFlops/s (0.696421 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.454188 GFlops/s (0.758989 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.505621 GFlops/s (0.768168 GFlops/s)
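To illustrate in-register intermediates with full unrolling, here is a fully unrolled 4-point forward DFT in C where every intermediate is a named local variable rather than an element of an indexed scratch buffer; in a GPU kernel this lets the compiler keep them in registers. It is a sketch with hypothetical names, not the original kernel, and the product with W4^1 = -i is still a general complex multiplication at this stage.

```c
typedef struct { float re, im; } cplx;

/* General complex product a * w. */
cplx cmul(cplx a, cplx w)
{
    cplx r = { a.re * w.re - a.im * w.im, a.re * w.im + a.im * w.re };
    return r;
}

/* Fully unrolled 4-point forward DFT, W4 = exp(-2*pi*i/4) = -i.
   No indexed scratch buffer: all intermediates are named locals. */
void dft4_registers(const cplx *in, cplx *out)
{
    const cplx w1 = { 0.0f, -1.0f };   /* W4^1, still a full multiply */
    cplx s0 = { in[0].re + in[2].re, in[0].im + in[2].im };
    cplx d0 = { in[0].re - in[2].re, in[0].im - in[2].im };
    cplx s1 = { in[1].re + in[3].re, in[1].im + in[3].im };
    cplx d1 = cmul((cplx){ in[1].re - in[3].re, in[1].im - in[3].im }, w1);
    out[0] = (cplx){ s0.re + s1.re, s0.im + s1.im };
    out[2] = (cplx){ s0.re - s1.re, s0.im - s1.im };
    out[1] = (cplx){ d0.re + d1.re, d0.im + d1.im };
    out[3] = (cplx){ d0.re - d1.re, d0.im - d1.im };
}
```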
In-register intermediate buffers and simplification of 1/4 and 3/4 units:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000001 GFlops/s (0.000006 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000023 GFlops/s (0.000046 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000092 GFlops/s (0.000203 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000296 GFlops/s (0.000689 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000784 GFlops/s (0.001883 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.002068 GFlops/s (0.004803 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004421 GFlops/s (0.010417 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.008405 GFlops/s (0.017973 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.015304 GFlops/s (0.030348 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.027289 GFlops/s (0.047899 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.038973 GFlops/s (0.065759 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.052241 GFlops/s (0.079796 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.087419 GFlops/s (0.127218 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.150351 GFlops/s (0.221073 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.228365 GFlops/s (0.346198 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.315567 GFlops/s (0.488740 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.382473 GFlops/s (0.589061 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.446051 GFlops/s (0.697611 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.456072 GFlops/s (0.692670 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.478919 GFlops/s (0.762893 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.511682 GFlops/s (0.773384 GFlops/s)
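The 1/4 and 3/4 units reduce to swapping the two components and flipping one sign. The sketch below assumes the forward convention exp(-2πi·f), under which the 1/4 unit is -i and the 3/4 unit is +i (with the opposite sign convention the two cases swap); the function names are hypothetical.

```c
typedef struct { float re, im; } cplx;

/* Unit at 1/4 turn: exp(-2*pi*i/4) = -i, and (x + y i)(-i) = y - x i. */
cplx mul_unit_quarter(cplx a)
{
    cplx r = { a.im, -a.re };
    return r;
}

/* Unit at 3/4 turn: exp(-2*pi*i*3/4) = +i, and (x + y i)(i) = -y + x i. */
cplx mul_unit_three_quarter(cplx a)
{
    cplx r = { -a.im, a.re };
    return r;
}
```

No multiplications remain, only register moves and a negation.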
After optimising the cos/sin computation:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000004 GFlops/s (0.000006 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000025 GFlops/s (0.000056 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000107 GFlops/s (0.000275 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000319 GFlops/s (0.000932 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000851 GFlops/s (0.002487 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.000567 GFlops/s (0.001859 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004498 GFlops/s (0.013522 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.010866 GFlops/s (0.029620 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.018426 GFlops/s (0.043715 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.033485 GFlops/s (0.076410 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.052381 GFlops/s (0.105784 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.069763 GFlops/s (0.132394 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.119054 GFlops/s (0.213617 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.222720 GFlops/s (0.431283 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.307521 GFlops/s (0.582430 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.390416 GFlops/s (0.842552 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.522257 GFlops/s (1.009710 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.605076 GFlops/s (1.197365 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.522565 GFlops/s (1.152464 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.669581 GFlops/s (1.259379 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.706218 GFlops/s (1.325056 GFlops/s)
+ Intermediate buffers in registers:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000003 GFlops/s (0.000006 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000026 GFlops/s (0.000054 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000097 GFlops/s (0.000251 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000088 GFlops/s (0.000219 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000261 GFlops/s (0.000794 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.002260 GFlops/s (0.006781 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.005184 GFlops/s (0.015209 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.010365 GFlops/s (0.028823 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.019027 GFlops/s (0.047231 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.032918 GFlops/s (0.068333 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.052682 GFlops/s (0.106026 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.071983 GFlops/s (0.142857 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.132966 GFlops/s (0.260598 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.229345 GFlops/s (0.457017 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.336997 GFlops/s (0.644691 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.423021 GFlops/s (0.818721 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.522105 GFlops/s (1.015742 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.604717 GFlops/s (1.200804 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.622732 GFlops/s (1.169032 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 0.671758 GFlops/s (1.260063 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 0.709181 GFlops/s (1.329348 GFlops/s)
After improving measurements by computing more FFTs per compiled kernel (semi-batch):
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000018 GFlops/s (0.000049 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000047 GFlops/s (0.000247 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000131 GFlops/s (0.000857 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000287 GFlops/s (0.000439 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.001427 GFlops/s (0.003299 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004352 GFlops/s (0.016181 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.011305 GFlops/s (0.033452 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.019461 GFlops/s (0.039450 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.053697 GFlops/s (0.140548 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.100615 GFlops/s (0.247256 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.199186 GFlops/s (0.469585 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.297409 GFlops/s (0.659342 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.542798 GFlops/s (0.937474 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.653999 GFlops/s (1.001111 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.857477 GFlops/s (1.159987 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.827419 GFlops/s (1.212898 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 0.961815 GFlops/s (1.246846 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 0.895628 GFlops/s (1.083387 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 1.020551 GFlops/s (1.243378 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 1.170944 GFlops/s (1.388303 GFlops/s)
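Why computing more FFTs per kernel improves the measurements can be seen from a simple cost model, assumed here rather than measured: each launch pays a fixed overhead plus a per-FFT cost, and dividing by the number of FFTs per launch spreads the fixed part over all of them. The function and parameter names are hypothetical.

```c
/* Hypothetical cost model: every kernel launch pays a fixed overhead
   t_launch (launch, argument setup, synchronisation) plus t_fft for each
   FFT it computes. Returns the apparent time per FFT when `batch` FFTs
   are computed per launch. */
double measured_time_per_fft(double t_launch, double t_fft, int batch)
{
    return (t_launch + batch * t_fft) / batch;
}
```

With a hypothetical 1 ms overhead and 10 µs per FFT, one FFT per launch appears to take 1.01 ms, while 100 FFTs per launch bring the apparent cost down to 20 µs.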
After separating loads from pre-twiddles to improve pipelining:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA] 0.000001 GFlops/s (0.000005 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000020 GFlops/s (0.000053 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000080 GFlops/s (0.000240 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000277 GFlops/s (0.000727 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000795 GFlops/s (0.002069 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.001697 GFlops/s (0.004339 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.004734 GFlops/s (0.012605 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.010562 GFlops/s (0.033367 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.025756 GFlops/s (0.072380 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.054434 GFlops/s (0.152035 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.075640 GFlops/s (0.231259 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.201962 GFlops/s (0.476484 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.340909 GFlops/s (0.677831 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.479250 GFlops/s (0.900078 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.690213 GFlops/s (1.004320 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.796016 GFlops/s (1.152460 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 0.967621 GFlops/s (1.276302 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 1.031862 GFlops/s (1.381755 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 1.046191 GFlops/s (1.275783 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 1.106427 GFlops/s (1.350914 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 1.173458 GFlops/s (1.419609 GFlops/s)
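The reordering can be sketched in C as two separate phases for one work item: all loads first, then the pre-twiddle multiplications on the values already held in registers, so that the load latencies can overlap rather than each load being followed immediately by arithmetic that waits for it. The radix (4), the strided input layout, `tw[0] == 1` and all names are assumptions for illustration.

```c
typedef struct { float re, im; } cplx;

/* One work item of a radix-4 pass (hypothetical layout: its j-th input
   lives at src[base + j * stride]). Phase 1 issues all loads; phase 2
   applies the pre-twiddles tw[1..3] (tw[0] is assumed to be 1 and is
   skipped) to values that are already in registers. */
void load_then_twiddle(const cplx *src, int base, int stride,
                       const cplx *tw, cplx *v)
{
    /* Phase 1: loads only; no arithmetic depends on them yet. */
    for (int j = 0; j < 4; ++j)
        v[j] = src[base + j * stride];

    /* Phase 2: pre-twiddle multiplications v[j] *= tw[j]. */
    for (int j = 1; j < 4; ++j) {
        cplx a = v[j], w = tw[j];
        v[j].re = a.re * w.re - a.im * w.im;
        v[j].im = a.re * w.im + a.im * w.re;
    }
}
```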
After implementing a partially planar global memory layout to ensure coalesced global memory access (GlobalPlannarLevel = 5):
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000036 GFlops/s (0.004174 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000119 GFlops/s (0.007364 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000392 GFlops/s (0.011234 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000963 GFlops/s (0.014787 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.002003 GFlops/s (0.032723 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.005357 GFlops/s (0.069518 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.012580 GFlops/s (0.130641 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.028077 GFlops/s (0.220865 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.027615 GFlops/s (0.439983 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.113306 GFlops/s (0.807401 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.188319 GFlops/s (1.051482 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.369842 GFlops/s (1.180801 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.574538 GFlops/s (1.590294 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.738320 GFlops/s (1.850803 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.902228 GFlops/s (1.446292 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 1.196072 GFlops/s (1.931252 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 1.548157 GFlops/s (2.314845 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 1.567102 GFlops/s (2.200214 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 1.310384 GFlops/s (1.699793 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 1.627815 GFlops/s (2.172290 GFlops/s)
@@@ m == 22: [Device: GeForce 9400, Vendor: NVIDIA] 1.798014 GFlops/s (2.449925 GFlops/s)
@@@ m == 23: [Device: GeForce 9400, Vendor: NVIDIA] 1.605698 GFlops/s (2.166405 GFlops/s)
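The log does not define GlobalPlannarLevel. One plausible reading, sketched below as an assumption, is a layout in blocks of 2^L complex values where each block stores its 2^L real parts contiguously, followed by its 2^L imaginary parts: L = 0 is then fully interleaved and L = m is fully planar (split), while L = 5 lets 32 consecutive work items read 32 consecutive floats. The function names are hypothetical.

```c
#include <stddef.h>

/* Offset (in floats) of the real part of complex element j in a
   hypothetical partially planar layout of level L: blocks of 2^L real
   parts, each followed by the matching 2^L imaginary parts. */
size_t re_index(size_t j, unsigned level)
{
    size_t block = (size_t)1 << level;
    return (j >> level) * 2 * block + (j & (block - 1));
}

/* The imaginary part sits exactly one block (2^L floats) further on. */
size_t im_index(size_t j, unsigned level)
{
    return re_index(j, level) + ((size_t)1 << level);
}
```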
After some more minor improvements:
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA] 0.000033 GFlops/s (0.004266 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA] 0.000115 GFlops/s (0.007599 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA] 0.000432 GFlops/s (0.011489 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA] 0.000649 GFlops/s (0.015214 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA] 0.002461 GFlops/s (0.032931 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA] 0.005290 GFlops/s (0.070021 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA] 0.012717 GFlops/s (0.130064 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA] 0.028075 GFlops/s (0.206631 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA] 0.059880 GFlops/s (0.394273 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA] 0.130464 GFlops/s (0.805227 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA] 0.248165 GFlops/s (1.031155 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA] 0.369169 GFlops/s (1.174397 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA] 0.563298 GFlops/s (1.582955 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA] 0.780783 GFlops/s (1.807544 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA] 0.888784 GFlops/s (1.419580 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA] 1.234972 GFlops/s (1.927781 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA] 1.539932 GFlops/s (2.320529 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA] 1.578887 GFlops/s (2.230251 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA] 1.321207 GFlops/s (1.722921 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA] 1.620204 GFlops/s (2.173418 GFlops/s)
@@@ m == 22: [Device: GeForce 9400, Vendor: NVIDIA] 1.812446 GFlops/s (2.497036 GFlops/s)
@@@ m == 23: [Device: GeForce 9400, Vendor: NVIDIA] 1.736904 GFlops/s (2.351313 GFlops/s)
After correcting the timing, with timings per kernel:
@@@ m == 1: [Device: GeForce 9400, Vendor: NVIDIA][K1 (2): 0.00](Total: 0.00) 0.00 GFlops/s (0.00 GFlops/s)
@@@ m == 2: [Device: GeForce 9400, Vendor: NVIDIA][K1 (4): 0.00](Total: 0.00) 0.00 GFlops/s (0.00 GFlops/s)
@@@ m == 3: [Device: GeForce 9400, Vendor: NVIDIA][K1 (8): 0.01](Total: 0.01) 0.00 GFlops/s (0.01 GFlops/s)
@@@ m == 4: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.01](Total: 0.01) 0.00 GFlops/s (0.01 GFlops/s)
@@@ m == 5: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.02][K2 (2): 0.03](Total: 0.02) 0.00 GFlops/s (0.01 GFlops/s)
@@@ m == 6: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.04][K2 (4): 0.06](Total: 0.05) 0.00 GFlops/s (0.03 GFlops/s)
@@@ m == 7: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.09][K2 (8): 0.12](Total: 0.10) 0.01 GFlops/s (0.07 GFlops/s)
@@@ m == 8: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.17][K2 (16): 0.17](Total: 0.17) 0.01 GFlops/s (0.13 GFlops/s)
@@@ m == 9: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.33][K2 (16): 0.32][K3 (2): 0.35](Total: 0.33) 0.03 GFlops/s (0.22 GFlops/s)
@@@ m == 10: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 0.63][K2 (16): 0.65][K3 (4): 0.75](Total: 0.66) 0.06 GFlops/s (0.45 GFlops/s)
@@@ m == 11: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 1.20][K2 (16): 1.25][K3 (8): 0.87](Total: 1.10) 0.12 GFlops/s (0.80 GFlops/s)
@@@ m == 12: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 2.55][K2 (16): 2.72][K3 (16): 0.59](Total: 1.22) 0.23 GFlops/s (1.02 GFlops/s)
@@@ m == 13: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 2.66][K2 (16): 2.91][K3 (16): 0.84][K4 (2): 0.60](Total: 1.40) 0.33 GFlops/s (1.07 GFlops/s)
@@@ m == 14: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.62][K2 (16): 3.14][K3 (16): 1.46][K4 (4): 0.82](Total: 1.85) 0.57 GFlops/s (1.40 GFlops/s)
@@@ m == 15: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.52][K2 (16): 2.86][K3 (16): 2.62][K4 (8): 0.80](Total: 1.92) 0.70 GFlops/s (1.68 GFlops/s)
@@@ m == 16: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.21][K2 (16): 3.88][K3 (16): 3.72][K4 (16): 0.57](Total: 1.54) 0.88 GFlops/s (1.47 GFlops/s)
@@@ m == 17: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.45][K2 (16): 3.55][K3 (16): 3.53][K4 (16): 1.06][K5 (2): 0.77](Total: 2.00) 1.14 GFlops/s (1.93 GFlops/s)
@@@ m == 18: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.55][K2 (16): 3.58][K3 (16): 3.74][K4 (16): 1.80][K5 (4): 0.92](Total: 2.34) 1.50 GFlops/s (2.28 GFlops/s)
@@@ m == 19: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.52][K2 (16): 3.81][K3 (16): 3.73][K4 (16): 2.96][K5 (8): 0.77](Total: 2.23) 1.55 GFlops/s (2.21 GFlops/s)
@@@ m == 20: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.49][K2 (16): 3.76][K3 (16): 3.74][K4 (16): 3.77][K5 (16): 0.54](Total: 1.70) 1.30 GFlops/s (1.70 GFlops/s)
@@@ m == 21: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.55][K2 (16): 3.51][K3 (16): 3.65][K4 (16): 3.65][K5 (16): 1.03][K6 (2): 0.77](Total: 2.18) 1.62 GFlops/s (2.18 GFlops/s)
@@@ m == 22: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.54][K2 (16): 3.54][K3 (16): 3.65][K4 (16): 3.64][K5 (16): 1.80][K6 (4): 0.91](Total: 2.48) 1.80 GFlops/s (2.47 GFlops/s)
@@@ m == 23: [Device: GeForce 9400, Vendor: NVIDIA][K1 (16): 3.55][K2 (16): 3.60][K3 (16): 3.53][K4 (16): 3.55][K5 (16): 2.86][K6 (8): 0.77](Total: 2.34) 1.72 GFlops/s (2.34 GFlops/s)
Decomposition of timing results per kernel (Gfs = GFlops/s):
printf("[K%i (%i): %.2f Gfs (%.0f%%)]", qG, (int)log2(BG), performance of kernel, percentage of total kernel computation time spent in this kernel);
@@@ m == 1: [NVIDIA GeForce 9400] 0.00 GFlops/s (0.00 Gfs)
[K1 (1): 0.00 Gfs (100%)](Total: 0.00 Gfs)
@@@ m == 2: [NVIDIA GeForce 9400] 0.00 GFlops/s (0.00 Gfs)
[K1 (2): 0.00 Gfs (100%)](Total: 0.00 Gfs)
@@@ m == 3: [NVIDIA GeForce 9400] 0.00 GFlops/s (0.01 Gfs)
[K1 (3): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ m == 4: [NVIDIA GeForce 9400] 0.00 GFlops/s (0.01 Gfs)
[K1 (4): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ m == 5: [NVIDIA GeForce 9400] 0.00 GFlops/s (0.02 Gfs)
[K1 (4): 0.02 Gfs (84%)][K2 (1): 0.03 Gfs (16%)](Total: 0.02 Gfs)
@@@ m == 6: [NVIDIA GeForce 9400] 0.00 GFlops/s (0.04 Gfs)
[K1 (4): 0.05 Gfs (76%)][K2 (2): 0.07 Gfs (24%)](Total: 0.05 Gfs)
@@@ m == 7: [NVIDIA GeForce 9400] 0.01 GFlops/s (0.07 Gfs)
[K1 (4): 0.09 Gfs (62%)][K2 (3): 0.11 Gfs (38%)](Total: 0.09 Gfs)
@@@ m == 8: [NVIDIA GeForce 9400] 0.01 GFlops/s (0.13 Gfs)
[K1 (4): 0.18 Gfs (48%)][K2 (4): 0.16 Gfs (52%)](Total: 0.17 Gfs)
@@@ m == 9: [NVIDIA GeForce 9400] 0.03 GFlops/s (0.20 Gfs)
[K1 (4): 0.31 Gfs (42%)][K2 (4): 0.29 Gfs (45%)][K3 (1): 0.25 Gfs (13%)](Total: 0.29 Gfs)
@@@ m == 10: [NVIDIA GeForce 9400] 0.06 GFlops/s (0.36 Gfs)
[K1 (4): 0.57 Gfs (35%)][K2 (4): 0.53 Gfs (37%)][K3 (2): 0.35 Gfs (28%)](Total: 0.49 Gfs)
@@@ m == 11: [NVIDIA GeForce 9400] 0.12 GFlops/s (0.79 Gfs)
[K1 (4): 1.22 Gfs (33%)][K2 (4): 1.26 Gfs (31%)][K3 (3): 0.82 Gfs (36%)](Total: 1.09 Gfs)
@@@ m == 12: [NVIDIA GeForce 9400] 0.23 GFlops/s (1.03 Gfs)
[K1 (4): 2.58 Gfs (16%)][K2 (4): 2.76 Gfs (15%)][K3 (4): 0.59 Gfs (69%)](Total: 1.23 Gfs)
@@@ m == 13: [NVIDIA GeForce 9400] 0.34 GFlops/s (1.20 Gfs)
[K1 (4): 2.67 Gfs (16%)][K2 (4): 2.93 Gfs (15%)][K3 (4): 0.83 Gfs (51%)][K4 (1): 0.59 Gfs (18%)](Total: 1.39 Gfs)
@@@ m == 14: [NVIDIA GeForce 9400] 0.61 GFlops/s (1.62 Gfs)
[K1 (4): 3.65 Gfs (14%)][K2 (4): 3.03 Gfs (17%)][K3 (4): 1.47 Gfs (35%)][K4 (2): 0.76 Gfs (34%)](Total: 1.80 Gfs)
@@@ m == 15: [NVIDIA GeForce 9400] 0.73 GFlops/s (1.81 Gfs)
[K1 (4): 3.38 Gfs (15%)][K2 (4): 2.80 Gfs (18%)][K3 (4): 2.40 Gfs (21%)][K4 (3): 0.84 Gfs (46%)](Total: 1.91 Gfs)
@@@ m == 16: [NVIDIA GeForce 9400] 0.88 GFlops/s (1.43 Gfs)
[K1 (4): 3.02 Gfs (12%)][K2 (4): 3.62 Gfs (10%)][K3 (4): 3.79 Gfs (10%)][K4 (4): 0.53 Gfs (68%)](Total: 1.45 Gfs)
@@@ m == 17: [NVIDIA GeForce 9400] 1.22 GFlops/s (2.00 Gfs)
[K1 (4): 3.41 Gfs (14%)][K2 (4): 3.51 Gfs (14%)][K3 (4): 3.65 Gfs (13%)][K4 (4): 1.09 Gfs (44%)][K5 (1): 0.79 Gfs (15%)](Total: 2.04 Gfs)
@@@ m == 18: [NVIDIA GeForce 9400] 1.54 GFlops/s (2.35 Gfs)
[K1 (4): 3.45 Gfs (15%)][K2 (4): 3.63 Gfs (15%)][K3 (4): 3.73 Gfs (14%)][K4 (4): 1.88 Gfs (28%)][K5 (2): 0.94 Gfs (28%)](Total: 2.37 Gfs)
@@@ m == 19: [NVIDIA GeForce 9400] 1.53 GFlops/s (2.19 Gfs)
[K1 (4): 3.38 Gfs (14%)][K2 (4): 3.71 Gfs (12%)][K3 (4): 3.64 Gfs (13%)][K4 (4): 2.96 Gfs (16%)][K5 (3): 0.76 Gfs (46%)](Total: 2.20 Gfs)
@@@ m == 20: [NVIDIA GeForce 9400] 1.30 GFlops/s (1.70 Gfs)
[K1 (4): 3.32 Gfs (10%)][K2 (4): 3.64 Gfs (9%)][K3 (4): 3.67 Gfs (9%)][K4 (4): 3.81 Gfs (9%)][K5 (4): 0.55 Gfs (62%)](Total: 1.70 Gfs)
@@@ m == 21: [NVIDIA GeForce 9400] 1.59 GFlops/s (2.15 Gfs)
[K1 (4): 3.35 Gfs (12%)][K2 (4): 3.34 Gfs (12%)][K3 (4): 3.53 Gfs (12%)][K4 (4): 3.55 Gfs (12%)][K5 (4): 1.04 Gfs (40%)][K6 (1): 0.80 Gfs (13%)](Total: 2.15 Gfs)
@@@ m == 22: [NVIDIA GeForce 9400] 2.03 GFlops/s (2.88 Gfs)
[K1 (4): 3.65 Gfs (14%)][K2 (4): 3.84 Gfs (14%)][K3 (4): 4.03 Gfs (13%)][K4 (4): 3.99 Gfs (13%)][K5 (4): 2.19 Gfs (24%)][K6 (2): 1.19 Gfs (22%)](Total: 2.88 Gfs)
@@@ m == 23: [NVIDIA GeForce 9400] 1.75 GFlops/s (2.42 Gfs)
[K1 (4): 3.33 Gfs (13%)][K2 (4): 3.53 Gfs (12%)][K3 (4): 3.62 Gfs (12%)][K4 (4): 3.74 Gfs (11%)][K5 (4): 3.02 Gfs (14%)][K6 (3): 0.82 Gfs (39%)](Total: 2.42 Gfs)
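This sketch assumes each kernel's Gfs is its own flop count divided by its own run time, and that the percentage is its share of the summed kernel time. Under those assumptions, the Total is the time-weighted mean of the per-kernel figures: Total = F/T = Σ (F_i/T_i)(T_i/T) = Σ Gfs_i × share_i. The Python sketch below uses hypothetical helper names. It parses one breakdown line and checks this relation against the m == 22 breakdown above, where the weighted sum is about 2.879 against the printed 2.88. Because the log rounds the percentages, the shares are renormalized and the comparison uses a small tolerance.

```python
import re

# One bracketed kernel entry, e.g. "[K3 (4): 4.03 Gfs (13%)]"
KERNEL_RE = re.compile(r"\[K(\d+) \((\d+)\): ([\d.]+) Gfs \((\d+)%\)\]")
TOTAL_RE = re.compile(r"\(Total: ([\d.]+) Gfs\)")


def parse_breakdown(line):
    """Return ([(kernel, bits, gfs, percent), ...], printed_total)."""
    kernels = [
        (int(k), int(bits), float(gfs), int(pct))
        for k, bits, gfs, pct in KERNEL_RE.findall(line)
    ]
    match = TOTAL_RE.search(line)
    if match is None:
        raise ValueError("no (Total: ...) field in line")
    return kernels, float(match.group(1))


def time_weighted_total(kernels):
    """Sum of Gfs_i * share_i, with the rounded shares renormalized to 1."""
    pct_sum = sum(pct for _, _, _, pct in kernels)
    return sum(gfs * pct for _, _, gfs, pct in kernels) / pct_sum


line = ("[K1 (4): 3.65 Gfs (14%)][K2 (4): 3.84 Gfs (14%)][K3 (4): 4.03 Gfs (13%)]"
        "[K4 (4): 3.99 Gfs (13%)][K5 (4): 2.19 Gfs (24%)][K6 (2): 1.19 Gfs (22%)]"
        "(Total: 2.88 Gfs)")
kernels, printed = parse_breakdown(line)
print(f"recomputed {time_weighted_total(kernels):.3f} vs printed {printed:.2f}")
```

The relation also explains why the short last kernel (for example K5 at m == 20, 62% of the time at 0.55 Gfs) pulls the Total down so strongly.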
With the FFT arithmetic removed (results are incorrect):
@@@ m == 1: [NVIDIA GeForce 9400] 0.00 Gfs (0.00 Gfs)
[K1 (1): 0.00 Gfs (100%)](Total: 0.00 Gfs)
@@@ m == 2: [NVIDIA GeForce 9400] 0.00 Gfs (0.00 Gfs)
[K1 (2): 0.00 Gfs (100%)](Total: 0.00 Gfs)
@@@ m == 3: [NVIDIA GeForce 9400] 0.00 Gfs (0.01 Gfs)
[K1 (3): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ m == 4: [NVIDIA GeForce 9400] 0.00 Gfs (0.02 Gfs)
[K1 (4): 0.02 Gfs (100%)](Total: 0.02 Gfs)
@@@ m == 5: [NVIDIA GeForce 9400] 0.00 Gfs (0.02 Gfs)
[K1 (4): 0.03 Gfs (80%)][K2 (1): 0.04 Gfs (20%)](Total: 0.04 Gfs)
@@@ m == 6: [NVIDIA GeForce 9400] 0.00 Gfs (0.04 Gfs)
[K1 (4): 0.07 Gfs (75%)][K2 (2): 0.10 Gfs (25%)](Total: 0.08 Gfs)
@@@ m == 7: [NVIDIA GeForce 9400] 0.01 Gfs (0.10 Gfs)
[K1 (4): 0.13 Gfs (65%)][K2 (3): 0.19 Gfs (35%)](Total: 0.15 Gfs)
@@@ m == 8: [NVIDIA GeForce 9400] 0.01 Gfs (0.18 Gfs)
[K1 (4): 0.28 Gfs (46%)][K2 (4): 0.24 Gfs (54%)](Total: 0.26 Gfs)
@@@ m == 9: [NVIDIA GeForce 9400] 0.02 Gfs (0.27 Gfs)
[K1 (4): 0.51 Gfs (42%)][K2 (4): 0.50 Gfs (43%)][K3 (1): 0.38 Gfs (14%)](Total: 0.49 Gfs)
@@@ m == 10: [NVIDIA GeForce 9400] 0.06 Gfs (0.55 Gfs)
[K1 (4): 0.99 Gfs (36%)][K2 (4): 0.84 Gfs (43%)][K3 (2): 0.87 Gfs (21%)](Total: 0.90 Gfs)
@@@ m == 11: [NVIDIA GeForce 9400] 0.13 Gfs (0.99 Gfs)
[K1 (4): 1.88 Gfs (28%)][K2 (4): 1.79 Gfs (30%)][K3 (3): 0.96 Gfs (42%)](Total: 1.47 Gfs)
@@@ m == 12: [NVIDIA GeForce 9400] 0.19 Gfs (1.20 Gfs)
[K1 (4): 3.61 Gfs (13%)][K2 (4): 3.77 Gfs (13%)][K3 (4): 0.66 Gfs (74%)](Total: 1.46 Gfs)
@@@ m == 13: [NVIDIA GeForce 9400] 0.39 Gfs (1.23 Gfs)
[K1 (4): 3.71 Gfs (14%)][K2 (4): 4.20 Gfs (12%)][K3 (4): 0.97 Gfs (53%)][K4 (1): 0.59 Gfs (21%)](Total: 1.65 Gfs)
@@@ m == 14: [NVIDIA GeForce 9400] 0.61 Gfs (1.50 Gfs)
[K1 (4): 4.85 Gfs (12%)][K2 (4): 4.77 Gfs (12%)][K3 (4): 1.28 Gfs (44%)][K4 (2): 0.84 Gfs (33%)](Total: 1.96 Gfs)
@@@ m == 15: [NVIDIA GeForce 9400] 0.67 Gfs (1.79 Gfs)
[K1 (4): 5.06 Gfs (11%)][K2 (4): 3.09 Gfs (18%)][K3 (4): 2.54 Gfs (22%)][K4 (3): 0.83 Gfs (50%)](Total: 2.06 Gfs)
@@@ m == 16: [NVIDIA GeForce 9400] 0.91 Gfs (1.50 Gfs)
[K1 (4): 3.73 Gfs (11%)][K2 (4): 4.15 Gfs (9%)][K3 (4): 5.15 Gfs (8%)][K4 (4): 0.54 Gfs (72%)](Total: 1.57 Gfs)
@@@ m == 17: [NVIDIA GeForce 9400] 1.26 Gfs (2.04 Gfs)
[K1 (4): 4.31 Gfs (12%)][K2 (4): 4.10 Gfs (12%)][K3 (4): 4.84 Gfs (10%)][K4 (4): 1.00 Gfs (50%)][K5 (1): 0.77 Gfs (16%)](Total: 2.12 Gfs)
@@@ m == 18: [NVIDIA GeForce 9400] 1.62 Gfs (2.57 Gfs)
[K1 (4): 4.46 Gfs (13%)][K2 (4): 4.63 Gfs (13%)][K3 (4): 4.64 Gfs (13%)][K4 (4): 1.87 Gfs (31%)][K5 (2): 0.95 Gfs (31%)](Total: 2.62 Gfs)
@@@ m == 19: [NVIDIA GeForce 9400] 1.68 Gfs (2.46 Gfs)
[K1 (4): 4.46 Gfs (12%)][K2 (4): 4.72 Gfs (11%)][K3 (4): 4.73 Gfs (11%)][K4 (4): 3.18 Gfs (16%)][K5 (3): 0.79 Gfs (50%)](Total: 2.48 Gfs)
@@@ m == 20: [NVIDIA GeForce 9400] 1.45 Gfs (1.94 Gfs)
[K1 (4): 4.65 Gfs (8%)][K2 (4): 4.80 Gfs (8%)][K3 (4): 4.84 Gfs (8%)][K4 (4): 4.89 Gfs (8%)][K5 (4): 0.57 Gfs (68%)](Total: 1.94 Gfs)
@@@ m == 21: [NVIDIA GeForce 9400] 1.74 Gfs (2.38 Gfs)
[K1 (4): 4.52 Gfs (10%)][K2 (4): 4.45 Gfs (10%)][K3 (4): 4.50 Gfs (10%)][K4 (4): 4.46 Gfs (10%)][K5 (4): 1.01 Gfs (45%)][K6 (1): 0.77 Gfs (15%)](Total: 2.38 Gfs)
@@@ m == 22: [NVIDIA GeForce 9400] 2.00 Gfs (2.81 Gfs)
[K1 (4): 4.40 Gfs (12%)][K2 (4): 4.53 Gfs (11%)][K3 (4): 4.53 Gfs (11%)][K4 (4): 4.57 Gfs (11%)][K5 (4): 1.83 Gfs (28%)][K6 (2): 0.96 Gfs (27%)](Total: 2.81 Gfs)
@@@ m == 23: [NVIDIA GeForce 9400] 1.90 Gfs (2.65 Gfs)
[K1 (4): 4.46 Gfs (10%)][K2 (4): 4.53 Gfs (10%)][K3 (4): 4.45 Gfs (10%)][K4 (4): 4.54 Gfs (10%)][K5 (4): 3.11 Gfs (15%)][K6 (3): 0.78 Gfs (44%)](Total: 2.65 Gfs)
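Comparing this run with the per-kernel run that includes the arithmetic gives a rough estimate of how much kernel time the arithmetic costs. This assumes both runs are credited with the same nominal flop count, so run time is proportional to 1/Gfs. Under that assumption the share is 1 - G_with / G_without. With the m == 23 totals (2.42 Gfs with arithmetic, 2.65 Gfs without), that comes to about 8.7%. Below is a small Python sketch; the function name is hypothetical:

```python
def arithmetic_time_share(gfs_with, gfs_without):
    """Estimated fraction of kernel time attributable to FFT arithmetic.

    Assumes both runs are credited with the same nominal flop count, so
    run time is proportional to 1 / Gfs.
    """
    if gfs_with <= 0 or gfs_without <= 0:
        raise ValueError("Gfs values must be positive")
    return 1.0 - gfs_with / gfs_without


# Totals for m == 23: with arithmetic vs. arithmetic removed
share = arithmetic_time_share(2.42, 2.65)
print(f"about {share:.1%} of kernel time is arithmetic")
```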
LAC version, without improved locality:
@@@ m == 3: [NVIDIA GeForce 9400] 0.00 Gfs (0.01 Gfs)
[K1 (3): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ m == 4: [NVIDIA GeForce 9400] 0.00 Gfs (0.01 Gfs)
[K1 (4): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ m == 5: [NVIDIA GeForce 9400] 0.00 Gfs (0.02 Gfs)
[K1 (4): 0.03 Gfs (79%)][K2 (1): 0.03 Gfs (21%)](Total: 0.03 Gfs)
@@@ m == 6: [NVIDIA GeForce 9400] 0.00 Gfs (0.04 Gfs)
[K1 (4): 0.06 Gfs (71%)][K2 (2): 0.07 Gfs (29%)](Total: 0.06 Gfs)
@@@ m == 7: [NVIDIA GeForce 9400] 0.01 Gfs (0.08 Gfs)
[K1 (4): 0.12 Gfs (55%)][K2 (3): 0.11 Gfs (45%)](Total: 0.12 Gfs)
@@@ m == 8: [NVIDIA GeForce 9400] 0.01 Gfs (0.14 Gfs)
[K1 (4): 0.23 Gfs (42%)][K2 (4): 0.16 Gfs (58%)](Total: 0.19 Gfs)
@@@ m == 9: [NVIDIA GeForce 9400] 0.03 Gfs (0.23 Gfs)
[K1 (4): 0.43 Gfs (36%)][K2 (4): 0.29 Gfs (53%)][K3 (1): 0.36 Gfs (11%)](Total: 0.35 Gfs)
@@@ m == 10: [NVIDIA GeForce 9400] 0.06 Gfs (0.48 Gfs)
[K1 (4): 0.85 Gfs (34%)][K2 (4): 0.61 Gfs (47%)][K3 (2): 0.75 Gfs (19%)](Total: 0.72 Gfs)
@@@ m == 11: [NVIDIA GeForce 9400] 0.12 Gfs (0.86 Gfs)
[K1 (4): 1.56 Gfs (28%)][K2 (4): 1.27 Gfs (34%)][K3 (3): 0.87 Gfs (38%)](Total: 1.20 Gfs)
@@@ m == 12: [NVIDIA GeForce 9400] 0.15 Gfs (1.10 Gfs)
[K1 (4): 3.21 Gfs (14%)][K2 (4): 2.68 Gfs (17%)][K3 (4): 0.64 Gfs (70%)](Total: 1.33 Gfs)
@@@ m == 13: [NVIDIA GeForce 9400] 0.32 Gfs (1.11 Gfs)
[K1 (4): 3.24 Gfs (14%)][K2 (4): 2.84 Gfs (16%)][K3 (4): 0.88 Gfs (51%)][K4 (1): 0.58 Gfs (19%)](Total: 1.46 Gfs)
@@@ m == 14: [NVIDIA GeForce 9400] 0.53 Gfs (1.49 Gfs)
[K1 (4): 4.38 Gfs (13%)][K2 (4): 3.37 Gfs (17%)][K3 (4): 1.51 Gfs (37%)][K4 (2): 0.82 Gfs (34%)](Total: 1.95 Gfs)
@@@ m == 15: [NVIDIA GeForce 9400] 0.72 Gfs (1.70 Gfs)
[K1 (4): 4.39 Gfs (12%)][K2 (4): 2.82 Gfs (19%)][K3 (4): 2.53 Gfs (21%)][K4 (3): 0.81 Gfs (49%)](Total: 1.97 Gfs)
@@@ m == 16: [NVIDIA GeForce 9400] 0.85 Gfs (1.46 Gfs)
[K1 (4): 3.71 Gfs (10%)][K2 (4): 3.71 Gfs (10%)][K3 (4): 3.82 Gfs (10%)][K4 (4): 0.55 Gfs (69%)](Total: 1.53 Gfs)
@@@ m == 17: [NVIDIA GeForce 9400] 1.18 Gfs (2.06 Gfs)
[K1 (4): 4.33 Gfs (12%)][K2 (4): 3.63 Gfs (14%)][K3 (4): 3.83 Gfs (13%)][K4 (4): 1.09 Gfs (46%)][K5 (1): 0.80 Gfs (16%)](Total: 2.13 Gfs)
@@@ m == 18: [NVIDIA GeForce 9400] 1.55 Gfs (2.43 Gfs)
[K1 (4): 4.29 Gfs (13%)][K2 (4): 3.79 Gfs (15%)][K3 (4): 3.79 Gfs (15%)][K4 (4): 1.89 Gfs (29%)][K5 (2): 0.96 Gfs (29%)](Total: 2.49 Gfs)
@@@ m == 19: [NVIDIA GeForce 9400] 1.58 Gfs (2.30 Gfs)
[K1 (4): 4.25 Gfs (11%)][K2 (4): 3.80 Gfs (13%)][K3 (4): 3.80 Gfs (13%)][K4 (4): 3.02 Gfs (16%)][K5 (3): 0.79 Gfs (47%)](Total: 2.32 Gfs)
@@@ m == 20: [NVIDIA GeForce 9400] 1.33 Gfs (1.77 Gfs)
[K1 (4): 4.31 Gfs (8%)][K2 (4): 3.85 Gfs (9%)][K3 (4): 3.79 Gfs (9%)][K4 (4): 3.82 Gfs (9%)][K5 (4): 0.55 Gfs (64%)](Total: 1.77 Gfs)
@@@ m == 21: [NVIDIA GeForce 9400] 1.63 Gfs (2.20 Gfs)
[K1 (4): 4.14 Gfs (10%)][K2 (4): 3.51 Gfs (12%)][K3 (4): 3.67 Gfs (11%)][K4 (4): 3.66 Gfs (11%)][K5 (4): 1.02 Gfs (41%)][K6 (1): 0.75 Gfs (14%)](Total: 2.20 Gfs)
@@@ m == 22: [NVIDIA GeForce 9400] 1.86 Gfs (2.58 Gfs)
[K1 (4): 4.18 Gfs (11%)][K2 (4): 3.52 Gfs (13%)][K3 (4): 3.75 Gfs (13%)][K4 (4): 3.78 Gfs (12%)][K5 (4): 1.83 Gfs (26%)][K6 (2): 0.95 Gfs (25%)](Total: 2.59 Gfs)
@@@ m == 23: [NVIDIA GeForce 9400] 1.71 Gfs (2.33 Gfs)
[K1 (4): 4.21 Gfs (10%)][K2 (4): 3.47 Gfs (12%)][K3 (4): 3.31 Gfs (12%)][K4 (4): 3.35 Gfs (12%)][K5 (4): 2.96 Gfs (14%)][K6 (3): 0.75 Gfs (41%)](Total: 2.33 Gfs)
Contiguous version, CPU:
@@@ m == 3: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.00 Gfs (0.01 Gfs)
[K1 (3): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ m == 4: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.00 Gfs (0.04 Gfs)
[K1 (4): 0.04 Gfs (100%)](Total: 0.04 Gfs)
@@@ m == 5: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.01 Gfs (0.06 Gfs)
[K1 (4): 0.10 Gfs (85%)][K2 (1): 0.13 Gfs (15%)](Total: 0.10 Gfs)
@@@ m == 6: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.02 Gfs (0.14 Gfs)
[K1 (4): 0.19 Gfs (74%)][K2 (2): 0.27 Gfs (26%)](Total: 0.21 Gfs)
@@@ m == 7: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.05 Gfs (0.26 Gfs)
[K1 (4): 0.36 Gfs (60%)][K2 (3): 0.40 Gfs (40%)](Total: 0.38 Gfs)
@@@ m == 8: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.12 Gfs (0.46 Gfs)
[K1 (4): 0.54 Gfs (52%)][K2 (4): 0.59 Gfs (48%)](Total: 0.56 Gfs)
@@@ m == 9: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.24 Gfs (0.55 Gfs)
[K1 (4): 0.82 Gfs (36%)][K2 (4): 0.84 Gfs (35%)][K3 (1): 0.24 Gfs (30%)](Total: 0.65 Gfs)
@@@ m == 10: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.37 Gfs (0.70 Gfs)
[K1 (4): 0.87 Gfs (36%)][K2 (4): 0.89 Gfs (35%)][K3 (2): 0.53 Gfs (29%)](Total: 0.78 Gfs)
@@@ m == 11: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.52 Gfs (0.74 Gfs)
[K1 (4): 0.92 Gfs (30%)][K2 (4): 0.91 Gfs (31%)][K3 (3): 0.54 Gfs (39%)](Total: 0.77 Gfs)
@@@ m == 12: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 0.99 Gfs (1.38 Gfs)
[K1 (4): 1.44 Gfs (33%)][K2 (4): 1.45 Gfs (33%)][K3 (4): 1.44 Gfs (33%)](Total: 1.44 Gfs)
@@@ m == 13: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.14 Gfs (1.40 Gfs)
[K1 (4): 1.56 Gfs (29%)][K2 (4): 1.59 Gfs (28%)][K3 (4): 1.59 Gfs (28%)][K4 (1): 0.71 Gfs (16%)](Total: 1.44 Gfs)
@@@ m == 14: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.37 Gfs (1.55 Gfs)
[K1 (4): 1.71 Gfs (26%)][K2 (4): 1.74 Gfs (26%)][K3 (4): 1.68 Gfs (27%)][K4 (2): 1.07 Gfs (21%)](Total: 1.57 Gfs)
@@@ m == 15: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.42 Gfs (1.55 Gfs)
[K1 (4): 1.67 Gfs (25%)][K2 (4): 1.71 Gfs (24%)][K3 (4): 1.62 Gfs (26%)][K4 (3): 1.25 Gfs (25%)](Total: 1.56 Gfs)
@@@ m == 16: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.62 Gfs (1.76 Gfs)
[K1 (4): 1.74 Gfs (25%)][K2 (4): 1.81 Gfs (25%)][K3 (4): 1.77 Gfs (25%)][K4 (4): 1.77 Gfs (25%)](Total: 1.77 Gfs)
@@@ m == 17: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.33 Gfs (1.50 Gfs)
[K1 (4): 1.44 Gfs (25%)][K2 (4): 1.63 Gfs (22%)][K3 (4): 1.66 Gfs (21%)][K4 (4): 1.70 Gfs (21%)][K5 (1): 0.78 Gfs (11%)](Total: 1.51 Gfs)
@@@ m == 18: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.19 Gfs (1.38 Gfs)
[K1 (4): 1.38 Gfs (22%)][K2 (4): 1.46 Gfs (21%)][K3 (4): 1.47 Gfs (21%)][K4 (4): 1.48 Gfs (21%)][K5 (2): 1.03 Gfs (15%)](Total: 1.38 Gfs)
@@@ m == 19: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.20 Gfs (1.38 Gfs)
[K1 (4): 1.40 Gfs (21%)][K2 (4): 1.42 Gfs (20%)][K3 (4): 1.46 Gfs (20%)][K4 (4): 1.45 Gfs (20%)][K5 (3): 1.18 Gfs (19%)](Total: 1.39 Gfs)
@@@ m == 20: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.26 Gfs (1.46 Gfs)
[K1 (4): 1.42 Gfs (20%)][K2 (4): 1.48 Gfs (20%)][K3 (4): 1.49 Gfs (20%)][K4 (4): 1.42 Gfs (21%)][K5 (4): 1.48 Gfs (20%)](Total: 1.46 Gfs)
@@@ m == 21: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.20 Gfs (1.36 Gfs)
[K1 (4): 1.36 Gfs (19%)][K2 (4): 1.45 Gfs (18%)][K3 (4): 1.48 Gfs (18%)][K4 (4): 1.43 Gfs (18%)][K5 (4): 1.44 Gfs (18%)][K6 (1): 0.70 Gfs (9%)](Total: 1.36 Gfs)
@@@ m == 22: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.12 Gfs (1.26 Gfs)
[K1 (4): 1.22 Gfs (19%)][K2 (4): 1.33 Gfs (17%)][K3 (4): 1.39 Gfs (17%)][K4 (4): 1.33 Gfs (17%)][K5 (4): 1.28 Gfs (18%)][K6 (2): 0.93 Gfs (12%)](Total: 1.26 Gfs)
@@@ m == 23: [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] 1.19 Gfs (1.36 Gfs)
[K1 (4): 1.17 Gfs (20%)][K2 (4): 1.42 Gfs (17%)][K3 (4): 1.48 Gfs (16%)][K4 (4): 1.46 Gfs (16%)][K5 (4): 1.44 Gfs (16%)][K6 (3): 1.21 Gfs (15%)](Total: 1.36 Gfs)
FFTW3 on [Intel Intel(R) Core(TM)2 Duo CPU P7550 @ 2.26GHz] (the -1.00 in parentheses is a placeholder, not a measurement):
@@@ m == 3: 0.05 Gfs (-1.00 Gfs)
@@@ m == 4: 1.12 Gfs (-1.00 Gfs)
@@@ m == 5: 1.91 Gfs (-1.00 Gfs)
@@@ m == 6: 4.99 Gfs (-1.00 Gfs)
@@@ m == 7: 6.17 Gfs (-1.00 Gfs)
@@@ m == 8: 6.10 Gfs (-1.00 Gfs)
@@@ m == 9: 6.68 Gfs (-1.00 Gfs)
@@@ m == 10: 6.53 Gfs (-1.00 Gfs)
@@@ m == 11: 7.04 Gfs (-1.00 Gfs)
@@@ m == 12: 5.77 Gfs (-1.00 Gfs)
@@@ m == 13: 4.93 Gfs (-1.00 Gfs)
@@@ m == 14: 5.06 Gfs (-1.00 Gfs)
@@@ m == 15: 4.95 Gfs (-1.00 Gfs)
@@@ m == 16: 4.71 Gfs (-1.00 Gfs)
@@@ m == 17: 4.77 Gfs (-1.00 Gfs)
@@@ m == 18: 3.20 Gfs (-1.00 Gfs)
@@@ m == 19: 1.63 Gfs (-1.00 Gfs)
@@@ m == 20: 1.14 Gfs (-1.00 Gfs)
@@@ m == 21: 1.12 Gfs (-1.00 Gfs)
@@@ m == 22: 1.05 Gfs (-1.00 Gfs)
@@@ m == 23: 1.06 Gfs (-1.00 Gfs)
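The log does not say how the Gfs figures are derived from run time. A common convention for complex FFTs, used by FFTW's benchFFT, credits a transform of size N = 2^m with 5 N log2 N = 5 · 2^m · m flops, whatever algorithm actually runs. The minimal Python sketch below works under that assumption. `fft_gflops` is a hypothetical helper, and the convention may not match how the numbers above were computed:

```python
def fft_gflops(m, seconds):
    """Nominal Gflop/s for one complex FFT of size 2**m taking `seconds`.

    Assumes the 5 * N * log2(N) flop-count convention used by benchFFT.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    n = 2 ** m
    flops = 5.0 * n * m  # log2(n) == m
    return flops / seconds / 1e9


# Example: a 2**20-point transform taking 10 ms
print(f"{fft_gflops(20, 0.010):.2f} Gflop/s")
```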
Windows 7 on an AMD Mobility HD4500 (reported by OpenCL as ATI RV710). The Apple-based code (FFT A) fails an assertion for small values of m, so its results start at m == 13.
Interleaved format:
@@@ OpenCL FFT A (interleaved) @@@ m == 13: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.19 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 14: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.32 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 15: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.42 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 16: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.64 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 17: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.90 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 18: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.12 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 19: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.19 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 20: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.17 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 21: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.15 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 22: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.17 Gfs
@@@ OpenCL FFT A (interleaved) @@@ m == 23: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.91 Gfs
Non-interleaved (fully planar) format:
@@@ OpenCL FFT A (non-interleaved) @@@ m == 13: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.27 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 14: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.21 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 15: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.32 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 16: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.45 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 17: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.76 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 18: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.75 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 19: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.78 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 20: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.77 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 21: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.68 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 22: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.71 Gfs
@@@ OpenCL FFT A (non-interleaved) @@@ m == 23: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.76 Gfs
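FFT A was run with two memory layouts for the complex data. In the interleaved layout, real and imaginary parts alternate in one array (re0, im0, re1, im1, ...). In the non-interleaved (planar) layout, they are stored in two separate arrays. The Python sketch below converts between the two layouts. The helper names are hypothetical and not part of the Apple code:

```python
def interleaved_to_planar(data):
    """[re0, im0, re1, im1, ...] -> ([re0, re1, ...], [im0, im1, ...])."""
    if len(data) % 2:
        raise ValueError("interleaved data must have an even length")
    return list(data[0::2]), list(data[1::2])


def planar_to_interleaved(real, imag):
    """([re0, re1, ...], [im0, im1, ...]) -> [re0, im0, re1, im1, ...]."""
    if len(real) != len(imag):
        raise ValueError("real and imaginary parts must have the same length")
    out = [0.0] * (2 * len(real))
    out[0::2] = real  # even slots hold the real parts
    out[1::2] = imag  # odd slots hold the imaginary parts
    return out


real, imag = interleaved_to_planar([1.0, 2.0, 3.0, 4.0])
print(real, imag, planar_to_interleaved(real, imag))
```

On this RV710, the interleaved layout was faster from m == 14 upward (for example 1.19 vs. 0.78 Gfs at m == 19). The planar layout came out ahead only at m == 13.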
Our custom contiguous version:
@@@ Contiguous OpenCL FFT @@@ m == 2: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.00 Gfs)
[K1 (2): 0.00 Gfs (100%)](Total: 0.00 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 3: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.01 Gfs)
[K1 (3): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 4: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.01 Gfs)
[K1 (4): 0.01 Gfs (100%)](Total: 0.01 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 5: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.00 Gfs)
[K1 (4): 0.03 Gfs (71%)][K2 (1): 0.02 Gfs (29%)](Total: 0.02 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 6: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.00 Gfs)
[K1 (4): 0.05 Gfs (69%)][K2 (2): 0.06 Gfs (31%)](Total: 0.05 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 7: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.00 Gfs)
[K1 (4): 0.07 Gfs (60%)][K2 (3): 0.08 Gfs (40%)](Total: 0.07 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 8: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.00 Gfs (0.01 Gfs)
[K1 (4): 0.19 Gfs (50%)][K2 (4): 0.18 Gfs (50%)](Total: 0.18 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 9: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.01 Gfs (0.03 Gfs)
[K1 (4): 0.39 Gfs (39%)][K2 (4): 0.36 Gfs (43%)][K3 (1): 0.21 Gfs (19%)](Total: 0.34 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 10: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.03 Gfs (0.07 Gfs)
[K1 (4): 0.47 Gfs (39%)][K2 (4): 0.46 Gfs (40%)][K3 (2): 0.43 Gfs (21%)](Total: 0.45 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 11: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.10 Gfs (0.27 Gfs)
[K1 (4): 1.56 Gfs (34%)][K2 (4): 1.50 Gfs (35%)][K3 (3): 1.29 Gfs (31%)](Total: 1.46 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 12: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.08 Gfs (0.45 Gfs)
[K1 (4): 1.70 Gfs (27%)][K2 (4): 1.47 Gfs (32%)][K3 (4): 1.14 Gfs (41%)](Total: 1.40 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 13: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.16 Gfs (0.61 Gfs)
[K1 (4): 2.07 Gfs (22%)][K2 (4): 1.83 Gfs (24%)][K3 (4): 1.38 Gfs (32%)][K4 (1): 0.52 Gfs (21%)](Total: 1.45 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 14: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.30 Gfs (1.10 Gfs)
[K1 (4): 2.55 Gfs (24%)][K2 (4): 2.45 Gfs (25%)][K3 (4): 2.16 Gfs (28%)][K4 (2): 1.30 Gfs (23%)](Total: 2.12 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 15: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.47 Gfs (1.71 Gfs)
[K1 (4): 3.57 Gfs (24%)][K2 (4): 3.58 Gfs (24%)][K3 (4): 3.40 Gfs (25%)][K4 (3): 2.39 Gfs (27%)](Total: 3.21 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 16: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 0.70 Gfs (2.15 Gfs)
[K1 (4): 4.09 Gfs (21%)][K2 (4): 4.09 Gfs (21%)][K3 (4): 4.05 Gfs (22%)][K4 (4): 2.44 Gfs (36%)](Total: 3.49 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 17: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.08 Gfs (2.37 Gfs)
[K1 (4): 4.58 Gfs (18%)][K2 (4): 4.59 Gfs (18%)][K3 (4): 4.62 Gfs (18%)][K4 (4): 2.64 Gfs (31%)][K5 (1): 1.22 Gfs (17%)](Total: 3.44 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 18: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 1.69 Gfs (3.59 Gfs)
[K1 (4): 5.26 Gfs (19%)][K2 (4): 5.28 Gfs (19%)][K3 (4): 5.31 Gfs (19%)][K4 (4): 3.80 Gfs (26%)][K5 (2): 2.83 Gfs (18%)](Total: 4.46 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 19: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 2.36 Gfs (4.59 Gfs)
[K1 (4): 5.83 Gfs (19%)][K2 (4): 5.84 Gfs (19%)][K3 (4): 5.95 Gfs (18%)][K4 (4): 5.83 Gfs (19%)][K5 (3): 3.14 Gfs (26%)](Total: 5.16 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 20: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 2.55 Gfs (4.52 Gfs)
[K1 (4): 6.02 Gfs (16%)][K2 (4): 6.18 Gfs (16%)][K3 (4): 6.18 Gfs (16%)][K4 (4): 6.18 Gfs (16%)][K5 (4): 2.57 Gfs (37%)](Total: 4.81 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 21: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 2.55 Gfs (4.31 Gfs)
[K1 (4): 6.20 Gfs (14%)][K2 (4): 6.20 Gfs (14%)][K3 (4): 6.20 Gfs (14%)][K4 (4): 6.20 Gfs (14%)][K5 (4): 2.74 Gfs (31%)][K6 (1): 1.55 Gfs (14%)](Total: 4.48 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 22: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 3.01 Gfs (5.07 Gfs)
[K1 (4): 6.21 Gfs (15%)][K2 (4): 6.21 Gfs (15%)][K3 (4): 6.21 Gfs (15%)][K4 (4): 6.21 Gfs (15%)][K5 (4): 4.00 Gfs (24%)][K6 (2): 3.11 Gfs (15%)](Total: 5.22 Gfs)
@@@ Contiguous OpenCL FFT @@@ m == 23: [Advanced Micro Devices, Inc. OpenCL 1.1 AMD-APP-SDK-v2.5 (732.1): Advanced Micro Devices, Inc. ATI RV710] 3.25 Gfs (5.38 Gfs)
[K1 (4): 6.22 Gfs (15%)][K2 (4): 6.22 Gfs (15%)][K3 (4): 6.22 Gfs (15%)][K4 (4): 6.22 Gfs (15%)][K5 (4): 5.99 Gfs (16%)][K6 (3): 3.17 Gfs (23%)](Total: 5.49 Gfs)