-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathROADMAP
116 lines (86 loc) · 6.62 KB
/
ROADMAP
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
================================================================================
=== >>> === tasks below are planned for the upcoming 1.2.0 milestone === <<< ===
================================================================================
X) Task title: "implement predicated AVX-512/ARM-SVE backends (in *_RX slots)"
1) Add rtarch_***_***x*p*.h header files to core/config for predicated targets
2) Add predicate registers X1..X6 (merging) and Z1..Z6 (zeroing) as triplets
3) Add cmd**P** subset for "two-operand + predicate" instructions
4) Add cmd**4** subset for "three-operand + predicate" instructions
5) Predicate is placed 1st in cmp-ops, and right after dest-SIMD-reg otherwise
6) Predicated targets can be implemented as extension to current AVX-512/ARM-SVE
7) Use predicate registers X1..X6 where merging/zeroing is not applicable
8) Emulate zeroing and three-operand ops on ARM-SVE from fields in triplets
9) Paired predicated backends should expose half the predicates (and registers)
================================================================================
=== >>> === tasks below are planned for the upcoming 1.3.0 milestone === <<< ===
================================================================================
R) Task title: "implement basic runtime generation for existing ASM code-bases"
1) Rewrite ASM_ENTER macro to allocate temporary buffer with code-exec rights
2) Rewrite EMITB / EMITW emitters to write into a memory buffer at cur++ offset
3) Define M to (+/-) depending on static/dynamic code generation (+ clang check)
4) Rewrite j** to encode jump-label distances into binary form, track labels
5) Rewrite ASM_LEAVE to type-cast the buffer to a function-pointer, then call it
6) Implement proper buffer management for more advanced versions later
================================================================================
=== >>> === tasks below are planned for the forthcoming 1.x.0 series === <<< ===
================================================================================
K) Task title: "use configuration utils (autotools, CMake, etc) for building"
1) Use single build script for all host CPU architectures on Linux
2) Keep cross-compilation on x86-64 Linux hosts (targeting QEMU linux-user mode)
3) Consider adding continuous integration (CI) tests
================================================================================
E) Task title: "add 8 SIMD registers full-IEEE support for ARMv7 using VFP"
1) Implement 128-bit SIMD registers/instructions as 4x32-bit VFP (full-IEEE)
2) Emulate currently exposed NEON instructions using VFP variants/fallbacks
3) Use register-offloading to upper bank for 1 mem-arg in load-op instructions
4) Find place in SIMD target mask (RT_128=8), like legacy x86, ARMv7 is 8-regs
================================================================================
N) Task title: "implement new 128/256-bit 30-regs targets on top of current AVX"
1) Implement register-offloading to memory (SIMD structs) on top of current AVX
2) Add new SIMD compatiblity flag RT_SIMD_COMPAT_256=1/2 for 30-regs with AVX1/2
3) Find place in SIMD target mask (RT_128=1/RT_256=4) for custom 30-regs support
4) Improve mask-jump (mkj*x) instructions for 64-bit SIMD elements (optional)
5) Target 128-bit version to SSE, RT_SIMD_COMPAT_128=2/4/8/16/32 for 30-regs
6) Add tests to check defined immediate/displacement limits (for BASE/SIMD ops)
================================================================================
G) Task title: "consider 64-bit SIMD emulation with FPRs on PowerPC G5/POWER6"
1) Implement 64-bit SIMD registers/instructions as 2x64-bit FPRs (full-IEEE)
2) Emulate currently exposed SIMD instructions using FPU variants/fallbacks
3) Emulate 64-bit integer SIMD ops using 64-bit BASE registers where possible
4) New 64-bit SIMD backend would complement 32-bit targets in existing slots
5) Expose 16x128/8x256 on PowerPC VMX (v4) instead of 15x128/8x256 for 32-bit
6) Consider implementing 8x256 mode with register-offloading to mem for 64-bit
================================================================================
P) Task title: "use RT_REGS to unload SIMD target mask for 256-bit on POWER11"
(may require significant redesign of SIMD target mask handling in rtbase.h)
(better schedule this task for the next major update, also check rtzero.h)
(consider renaming SVE binaries to *.a*armSVE to match *.x*avx512 on x86)
================================================================================
O) Task title: "use 3-operand SIMD instructions in packed/scalar SIMD tests"
================================================================================
T) Task title: "improve SIMD test coverage, add tests for corner cases in ops"
================================================================================
C) Task title: "implement SIMD fp32/fp64 converters consistently across targets"
================================================================================
A) Task title: "implement SIMD fp16 converters as tier-1 extension, modern CPUs"
================================================================================
F) Task title: "implement scalar fp compare-to-flags, fp/fp & fp/int converters"
================================================================================
M) Task title: "add support for trigonometric/randomizer SIMD meta-instructions"
(consider sleef library as an example of elementary math functions with SIMD)
(https://github.com/shibatch/sleef) <- use this code snapshot as a base
================================================================================
L) Task title: "consider SoftFP library integration for full fp16/fp128 support"
================================================================================
V) Task title: "add support for various new and existing architectures"
1) Add support for RISC-V architecture with "vector extension proposal"
(search the Web for "RISC-V vector extension proposal" also standard SIMD)
2) Add support for Sunway SW26010 with custom Chinese BASE/SIMD ISAs (64-bit)
(https://en.wikipedia.org/wiki/SW26010)
3) Add support for Loongson 3 (GS464E) with LoongSIMD ops as well as MIPS64r3
(https://en.wikipedia.org/wiki/Loongson)
4) Add support for SPARC64 VIIIfx HPC-ACE SIMD extensions as well as BASE ops
(http://www.fujitsu.com/downloads/TC/sparc64viiifx-extensions.pdf)
5) Add support for ELBRUS architecture, emulate SIMD with VLIW (plus Itanium)
(https://en.wikipedia.org/wiki/Elbrus_2000)
================================================================================