-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathREADME
48 lines (44 loc) · 2.71 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
UniSIMD assembler is a high-level C/C++ macro assembler framework unified across
ARM, MIPS, POWER and x86 architectures. It establishes a subset of both BASE and
SIMD instruction sets with clearly defined common API, so that application logic
can be written and maintained in one place without code replication.
The assembler itself isn't a separate tool, but rather a collection of C/C++
header files, which applications need to include directly in order to use.
Initial documentation for the assembler is provided in core/config/rtdocs.h.
At present, Intel SSE/SSE2/SSE4 and AVX/AVX2/AVX-512 (32/64-bit x86 ISAs),
ARMv7 NEON/NEONv2, ARMv8 AArch32 and AArch64 NEON, SVE (32/64-bit ARM ISAs),
MIPS 32/64-bit r5/r6 MSA and POWER 32/64-bit VMX/VSX (little/big-endian ISAs)
are mostly implemented (w/ horizontal reductions and byte/half SIMD+BASE ops)
although scalar improvements, wider SIMD vectors with zeroing/merging predicates
in 3/4-operand instructions, cross-precision fp-converters on modern CPU targets
are planned as extensions to current 2/3-operand SPMD-driven vertical SIMD ISA.
The project has a test framework for Linux/GCC/Clang and Windows/VC++/TDM64-GCC.
Support for macOS is provided via Command Line Tools with GCC and Clang options.
Instructions for resolving dependencies and building the binaries
for supported platforms can be found in the accompanying INSTALL file.
UniSIMD core features:
- Unified, Universal, Portable, Compatible code
- Explicit register allocation, predictable performance
- Three register sets for code: 8, 16, 32 (free: 8, 15, 30)
- High-level SIMD registers/ops as singles, pairs and quads
- SIMD-aligned backend structures with offsets/factors
- Vector-length agnostic vertical SIMD ISA, configurable
- Simultaneous scalar + 128/256-bit + configurable SIMD ops
- ISA implementation for fp16/fp128 (half/quad) SIMD ops
- C/C++, Compute, SPMD on 4 major archs
- Intel SSE/SSE2/SSE4 and AVX/AVX2/AVX-512
- ARMv7 NEON/NEONv2, ARMv8 AArch32/AArch64 NEON, SVE
- MIPS r5/r6 MSA (Warrior P5600, I6400/P6600)
- POWER VMX/VSX (PowerPC G4/G5, POWER6/7/8/9)
- CISC, RISC, CISC on RISC, little/big-endian ISA
- Support for reg-reg, load/store, load-op instructions
- Plain, indexed and scaled-indexed addressing modes
- FMA3 support (native or higher-precision emulation)
- 32/64-bit hybrid mode for native 64-bit ABI
- 32/64-bit addressing for BASE and SIMD ops
- 32/64-bit configurable SIMD elements (fp+int)
- Simultaneous 32/64-bit BASE (bridges, rules) and SIMD ops
- ISA implementation for int8/int16 (byte/half) BASE ops
- Full control over code, compiler steps out of the way
- Potential for bit-exact fp-compute across modern targets
- Used in QuadRay engine