-
Notifications
You must be signed in to change notification settings - Fork 8
/
Copy pathVERSION
407 lines (383 loc) · 25.4 KB
/
VERSION
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
v1.1.0d: UniSIMD, code name "ENsed+1d": macOS M1, VS2022, Ubuntu 22.04
- switch to malloc for 64-bit pointer/address combo
- use assembler-local labels to build on M1 macOS
- add support for M1 macOS to makefiles
- add VS2022 support for SIMD test
- add notes for building on Windows with VS2022 and M1 macOS
- update documentation and main header (add braces to ASM_INIT)
- add double-precision logic/arithmetic to ARMv7, x86
- add workarounds for POWER8 and POWER9 targets on Ubuntu 22.04
- drop ppc64abi32 targets (since QEMU 5.2.0), also from QEMU build script
- add notes for VS2022, QEMU 6.2.0 and 7.2.0, Ubuntu 23.04
- swap 16-bit and SIMD integer compare test groups (30-37 <-> 38-44)
- update copyright year to 2023
v1.0.0g: UniSIMD, code name "ENsed+g", backports, VS2022, Ubuntu 22.04
- switch to malloc for 64-bit pointer/address combo
- require both SSE4.1 and SSE4.2 for SSE4 (v4) target slots
- add DAZ support for flush-to-zero mode on x86 (makes on par with RISCs)
- backport integer SIMD compare subset (min/max/ceq/cne/clt/cle/cgt/cge)
- backport tests for integer SIMD compare (30-36) (signed/unsigned)
- target slots AVX512DQ now include VL backends for 128/256-bit subsets
- optimize SIMD compare and mask-jump instructions for AVX-512
- add 64-bit sign/zero-extend bridges to existing 32/64-bit BASE subsets
- optimize standalone remainder instructions on ARM and POWER
- implement direct ASM section output comparison method (bypass C++ test)
- extended 30-reg 256-bit and 15-reg 512-bit POWER backends are deprecated
- extended POWER backends are still supported with v1.0.0f ASM feature set
- add VS2022 support for SIMD test
- update build scripts with TDM64-GCC 10.3.0-2 compiler reference
- update documentation and main header (add braces to ASM_INIT)
- add double-precision logic/arithmetic to ARMv7, x86
- add workarounds for POWER8 and POWER9 targets on Ubuntu 22.04
- drop ppc64abi32 targets (since QEMU 5.2.0), also from QEMU build script
- add notes for VS2022, QEMU 6.2.0 and 7.2.0, Ubuntu 23.04
- clean up comments in BASE and SIMD headers
- update copyright year to 2023
v1.1.0c: UniSIMD, code name "ENsed+1c": full-stack SIMD/BASE
- add 8-bit (byte) BASE instruction subset, redesign 16-bit BASE
- add 8-bit elements SIMD subset (native on RISCs, mostly emulated on x86)
- add 64-bit sign/zero-extend bridges to existing 32/64-bit BASE subsets
- add 32-bit sign/zero-extend bridges to new 8/16-bit BASE subsets
- add RT_BASE flag to limit addressing granularity, extend range on ARMv8
- add mask-jump (mkj) SIMD instructions for 8/16-bit SIMD subsets
- add DAZ support for flush-to-zero mode on x86 (makes on par with RISCs)
- add support for AVX-512 fp16 subset to match existing ARMv8.2 + SVE
- AVX-512 fp16 requires separate binary (no target slot or cap check)
- implement direct ASM section output comparison method (bypass C++ test)
- AVX-512 fp16 now provides validation for ARM's fp16 using above method
- target slots AVX512DQ now include VL backends for 128/256-bit subsets
- target slots AVX512DQ now require BW support to facilitate 8/16-bit SIMD
- optimize SIMD compare and mask-jump instructions for AVX-512
- optimize setting flags instructions in 8/16-bit BASE subsets (on RISCs)
- optimize standalone remainder instructions on ARM and POWER
- extended 30-reg 256-bit and 15-reg 512-bit POWER backends are deprecated
- extended POWER backends are still supported with v1.0.0f ASM feature set
- fix 16-bit (half-int) BASE addressing granularity on POWER
- add notes for VS2022, QEMU 6.2.0, Intel SDE 9.0
- clean up comments in BASE and SIMD headers
- update copyright year to 2022
v1.1.0b: UniSIMD, code name "ENsed+1b", second development release
- implement integer SIMD compare subset (signed/unsigned)
- add integer SIMD compare on MIPS32/64 (min/max/ceq/cne/clt/cle/cgt/cge)
- add integer SIMD compare on ARMv8 (64-bit min/max emulated) and SVE
- add integer SIMD compare on POWER (64-bit emulated on POWER7)
- add integer SIMD compare on x86+SSE2/4 (64-bit emulated)
- add integer SIMD compare on x86+AVX1/2 (emulated for full SIMD AVX1)
- add integer SIMD compare on x86+AVX512
- add integer SIMD compare on original legacy targets (ARMv7, x86, PPC G4)
- add integer SIMD compare for half-int SIMD backends (16-bit elements)
- require both SSE4.1 and SSE4.2 for SSE4 (v4) target slots
- add tests for integer SIMD compare (38-51)
v1.0.0f: UniSIMD, code name "ENsed+f", fixes and tests
- fix displacement encodings on MIPS
- add testing for displacement levels and types
- update makefiles to support ancient HW (SSE2, SSE1 has issue with cvzps)
- update SIMD test framework, add scripts for test automation
- update comments for QEMU 5.2.0 and QEMU 6.0.0 (require ninja-build)
v1.1.0a: UniSIMD, code name "ENsed+1a", first development release
- add tests for half-int SIMD/BASE ops (run level 30-37)
- drop extended POWER targets from SIMD testing (no half-int support)
- add half-int SIMD arithmetic with saturate (except original SSE1)
- add implementation for half-int BASE ops across modern targets
- add BASE half-int support on legacy ARMv7 and x86
- add SIMD half-int support on legacy ARMv7 and x86
- adjust displacement types for BASE half-int on legacy ARMv7 and x86
- adjust displacement types for BASE half-int on MIPS and POWER
- adjust displacement types for BASE half-int on x86_64
- adjust displacement types for scalar fp16 on ARMv8
- split SIMD half-int subset from fp16 on ARMv8
- add SIMD half-int support on x86_64, enable on ARMv8
- add SIMD half-int support on MIPS and POWER
- add preliminary support for POWER9 fp128 SIMD ops (not tested)
- add preliminary support for ARMv8.2 fp16 SIMD ops (not tested)
v1.0.0e: UniSIMD, code name "ENsed+e", 2021 extended support
- clarify instructions for POWER8 server, Raspberry Pi 3/4
- update links and comments in project files
- make comment for compiler swapping on MIPS more generic
- update mappings for byte/char SIMD ops
- update TDM64-GCC compiler reference to version 9.2.0
- update copyright year to 2021
- update comments for remainders and scaled addressing
- optimize remainder ops on POWER9
- add scaled-indexed addressing modes
v1.0.0d: UniSIMD, code name "ENsed+d", documentation edition
- clean up task descriptions in roadmap
- add notes for Ubuntu, QEMU, MIPS cross-compilers
- add Ubuntu (MATE) 20.04 LTS to makefile notes
- update standalone MIPS compiler to 2020.06-01
- change RUN_LEVEL to SUB_TEST for better wording
- clean up comment about displacement values
- add initial documentation for the assembler
- add sin/cos and log/exp math definitions to rtbase
v1.0.0c: UniSIMD, code name "ENsed++", celebration edition
- celebrating C++ and its various compilers
- add notes for Ubuntu Server on Raspberry Pi 4
- add -mcpu=power8 compiler option to makefiles on POWER
- fix RISC targets with clang after version 6.0
- update copyright year to 2020
v1.0.0b: UniSIMD, code name "ENsed+b", 2020-02-02 archive edition
- all releases after 2020-01-01 have 2nd naming from their baseline: (ENsed)
- letter from the update (b,c,..) appears concatenated after (+) in the name
- future minor releases (v1.X.0a) will have digit and letter (+1a, +2a, +3a)
- future major releases (v2.X.0a) will have the form: (2+, 2+1a, 2+2a, 2+3a)
- clean up and update comments related to recent compiler and QEMU versions
- fix comments for SIMD instructions in 3-operand forms, clarify for SIMD div
- add SIMD fma3 aliases as 3-operand forms: fma**3**
- fix SIMD fma3 emulation with fp32 elements on AVX1
v1.0.0a: UniSIMD, code name "ENsed+", 2020-edition ("ENsed" + 2019 updates)
- all new releases from now on will use *X.Y.Za(bc..) naming scheme
- all branches start with letter (b), all tags start with letter (v)
- first release (tag) on every new branch will be marked with letter (a)
- all subsequent minor updates will have letters (b,c,..), tags aren't moving
- add SIMD flag to replace VMX targets with VSX (on)
- add signed BASE ops to combined-arithmetic-jump (arithmetic shift right)
- add setting-flags BASE arithmetic shift right
- make setting-flags BASE ops orthogonal to size/type (cmd**Z**)
- add -mips64r6 compiler option to makefiles on MIPS
- optimize 64-bit SIMD shifts on POWER9, clean up mkj** formatting
- improve ARM/x86 compatibility in SIMD shifts
- add SIMD integer multiply instruction (for 32/64-bit elements)
- update copyright year to 2019
- fix 32-bit BASE compare-to-mem on 64-bit POWER (backported down)
- fix usage of non-persistent temp-register on POWER
- update build instructions and makefile notes
- add notes about QEMU 3.1.0 for SVE emulation
- add SIMD flag to replace VMX targets with VSX (off)
- fix and clean up SIMD target selection in headers
- fix/add comments for SIMD/BASE shift count value
- adjust build instructions for older HW compatibility
- adjust Win64 release build script for lower core-count
v1.0.0: UniSIMD assembler, code name "ENsed", base for future SIMD enhancements
- foundation for QuadRay engine 0.7.0 "GIzmo" with ARM-SVE, POWER9, new scheme
- renewed directory structure, move BASE and SIMD header files to core/config
- add new fp-compatibility and feature tasks, rename TASKS file to ROADMAP
- add support for 30 SIMD register pairs (2x128) backend on POWER7/8
- add support for 30 SIMD registers (scalar+128+256) backend on Skylake-X
- drop standalone SSE2 target from x64, reuse SSE4 (v4) slot, add compat flag
- add support for 128-bit AVX1+FMA3 (v16) and AVX2+FMA3 (v32) targets for AMD
- compactify POWER7/8 targets into one slot, add new RT_SIMD_COMPAT_PW8 flag
- swap legacy PowerPC G4/POWER6 VMX (now v4) with POWER7/8 VSX1/2 (now v1)
- 64-bit POWER6 now matches 64-bit Nehalem target (both v4), 15x128/8x256-bit
- add support for POWER9 backend (v2) with immediate vector loads/stores
- move 128-bit 30 SIMD registers Skylake-X target from v1 to v2, match POWER9
- reserve 128-bit v1 and 256-bit v4 for 30 SIMD registers emulation on AVX1/2
- implement plain ARM-SVE backend (v4) for 256/512/1K4/2K8-bit vector lengths
- implement paired ARM-SVE backend (v1) for 512/1K4/2K8-bit SIMD target slots
- new scheme: RT_128=4+8, RT_256=1+2, RT_512=1+2, RT_1K4=1+2 are 15 registers
- new scheme: RT_128=1+2, RT_256=4+8, RT_512=4+8, RT_1K4=4+8 are 30 registers
- add elm*x_st instruction to detach scalar subset from vectors (via mem)
- add support for horizontal pairwise/reductive add/mul/min/max instructions
- patch system allocators to compile on macOS, widen OS support in makefiles
- clean up SIMD tests to support PIE (also macOS)
- separate 64-bit Linux from multilib build scripts, add for macOS
- add VMX-compatible scalar SIMD subset on PPC G4 and POWER family of CPUs
- add MSA/scalar compatibility on big-endian MIPS, support for fp32 11-bit DP
- rename sections in target-specific headers to BASE, SIMD, ELEM (for scalar)
- optimize long displacements for BASE, SIMD, ELEM on RISCs where applicable
- implement proper SIMD-scaling for displacement types (as sliding in rtbase)
- move common internal x87 FPU sections to BASE headers on x86
- dedicate rtconf header for configurable instruction subsets on all targets
- allow target-specific headers to redefine common instructions from rtbase
- improve SIMD target reporting in tests, add -c n option to reduce test time
- update notes for MIPS cross-compiler location, add -mnan=2008 to makefiles
- update notes for AArch64 Linux, QEMU 3.0.0, Intel SDE, add ARM IE reference
- add test for SIMD mask-move (mmv), run level 27
- add test for 8/15/30 BASE/SIMD registers, run level 28
- warning-free building with GCC/Clang and MSVC++
- fix BASE shifts with zero immediate arg on legacy ARMv7 (backported down)
- convert all text files with unix2dos
- always reserve maximum space for SIMD register file
- save/restore temp predicate register on AVX512
- fix SIMD registers save/restore for 15x128x2 on POWER7
- fix temporary FPRs save/restore on POWER
- fix scalar SIMD min/max on POWER7
- fix BASE compare immediate encodings on POWER
- fix location for 128/256-bit common SIMD instructions
- fix for scalar SIMD alignment on ARMv7, POWER8
- fix compilation in C++11 mode with RT_DEBUG=2
- add comment for NaNs handling in floating point piepline
- clarify comments about SIMD fp round instructions
- fix comment for SIMD shifts with count in memory
- add comment for scalar/vector compatibility
v0.9.1: Unified SIMD Assembler, 3-operand + basic scalar SIMD, extra backends
- expose 128/256-bit SIMD subsets (cmd[i/j/l]*, cmd[c/d/f]*) simultaneously
- add 3-operand SIMD instructions to all targets, emulate where not present
- implement basic scalar SIMD support (arithmetic + compare-to-mask-elem)
- implement additional paired/quaded 8-register SIMD backends on x86_64
- add 8-register makefile flags RT_256_R8, RT_512_R8, RT_1K4_R8, RT_2K8_R8
- original 15-register makefile flags RT_128, RT_256, RT_512 remain
- add new makefile flag RT_1K4 for 15-register code-bases on paired AVX-512
- expose 30 registers as an extension to common baseline of 15 where present
- each major architecture has at least one SIMD target with 30 registers
- add new RT_SIMD selector flag to remap vector-length-agnostic subsets
- add new RT_REGS selector flag to choose targets within given RT_SIMD width
- rename SIMD target headers to reflect size-factor/sub-variant, move legacy
- add new internal flags RT_128X*, RT_256X*, RT_512X* to match SIMD headers
- new internal flags keep SIMD sub-variant value in format for native width
- implement SIMD flags compatibility layer in rtzero to map makefile flags
- rtarch main header selects appropriate BASE/SIMD target from flags above
- implement SIMD target format converters in rtbase for runtime selection
- change SIMD target reporting to native-size x size-factor v version format
- reserve _RX slots in SIMD target mask for predicated backends (30+8 regs)
- clean up (drop) legacy SSE(1) support from x32 headers/makefiles
- move BASE sub-target selection to rtarch main header (ARM, x86)
- add notes for AArch64 Linux on Raspberry Pi 3 to INSTALL file
- add new TASKS file with description for future tasks
- enforce full ARMv7 instruction set (32-bit words) in makefiles
- fix LLVM's condition evaluation sign on all targets, define M -/+
- fix SIMD registers save/restore for 128-bit AVX targets (backported down)
- fix buffer allocation in SIMD tests (for 64-bit elems)
- fix stack alignment (now 16 bytes) on ARMv8/AArch64 (hardware) targets
- allow external override (from makefiles) for SIMD compatibility modes
- minor fixes in rtarch, accelerate release builds on multi-core machines
v0.9.0: Unified SIMD Assembler, 256-bit SIMD on RISCs, basic AVX-512 support
- adjust root rt_SIMD_INFO struct to contain both 32-bit and 64-bit constants
- add new sign-mask and full-mask general purpose constants to rt_SIMD_INFO
- expose 32/64-bit SIMD-element-size subsets (cmdo*, cmdq*) simultaneously
- element size in existing cmdp* subset remains configurable with RT_ELEMENT
- all three SIMD subsets (cmdo*, cmdp*, cmdq*) are still SIMD-width-agnostic
- expose fixed 64-bit BASE subset cmdz* for 64-bit targets only
- existing address-size cmdx*, element-size cmdy* and 32-bit cmdw* remain
- add BASE move instructions for 64-bit immediates as pairs of 32-bit types
- add new rotate-right and inverse-logic BASE instructions (ror, ann, orn)
- add new BMI1/BMI2 implementations for existing BASE instructions on x86
- implement non-portable x87 ISA subset for x86 targets internally
- implement fused-multiply-accumulate (fma/fms) on all SIMD targets
- add new mask-move SIMD instructions to common SIMD ISA (was x86 only)
- add new fp-negate and inverse-logic SIMD instructions (neg, orn, not)
- add new variable SIMD shifts with per-element count to all targets
- implement 256-bit SIMD support (2x128-bit, 15 regs) on modern RISC targets
- implement 512-bit SIMD support (4x128-bit, 15 regs) on modern POWER targets
- implement 512-bit SIMD support (1x512-bit, 16 regs) on future x86 targets
- AVX1/AVX2 256-bit SIMD for x86 (1x256-bit, 16 regs) remains supported
- 256-bit SIMD with 15 regs becomes new common baseline for modern hardware
- improve test coverage for BASE and SIMD load-op instructions
- add tests for new rotate, logic, shifts, fma/fms instructions, run level 24
- add rtzero header file to clean up assembler definitions after use
- rename instruction parameters to better reflect their use as source/dest
- add formulas for all BASE and SIMD instructions for better clarity
- reserve the whole alphabet for future BASE and SIMD instruction subsets
- add new SIMD compatibility flags for 128-bit AVX1/2, FMA/FMS/FMR, XMM regs
- add wrappers for 64-bit literals to better support legacy 32-bit compilers
- fix label_ld/label_st range on ARMv7/AArch64 to be on par with other targets
- fix discrepancy in VMX/VSX vector-loads on POWER (from here backported down)
- fix AVX-version of mmvpx_ld from zeroing to merging on x86
v0.8.1: Unified SIMD Assembler, full 64-bit fp/int SIMD compute elements
- add element-sized BASE ISA subset to fixed-32-bit and address-sized subsets
- new instruction mnemonics introduced for element-sized BASE subset (cmdy*)
- add new rtarch headers to house element-sized SIMD subset for 64-bit targets
- support for 64-bit SIMD elements currently requires 64-bit addresses as well
- enable full-precision SIMD rcpps/rsqps and rceps/rseps instructions
- add new offset corrections for endianness related to element-sized subset
- add new SIMD width short names for fixed and element-sized SIMD fields
- add new custom-sized integer types (address, element) with printf mods
- make current adjustable fp types follow SIMD element size (RT_ELEMENT)
- adjust math macros and definitions to support double-precision arithmetic
- add build/clean scripts, update makefiles with extra targets, MIPS notes
- remove unnecessary limitation on SIMD masks (add AVX-512/ARM-SVE notes)
- distinguish SIMD NEONv1/v2 vanilla ARM builds (cortex-a8/cortex-a15)
- distinguish SIMD v2/v4 64-bit POWER builds (POWER7+VSX/POWER8+VSX2)
- fix non-setting-flags instructions to not interfere with cmp on MIPS, POWER
- fix full-precision IEEE-compat divps_ld on ARMv7 targets (backported down)
v0.8.0: Unified SIMD Assembler, full 64-bit addressing for BASE and SIMD
- double original 32-bit BASE ISA to fixed-32-bit and address-sized subsets
- original instruction mnemonics follow in-heap/code-segment address size
- new instruction mnemonics introduced for fixed-32-bit subset (cmdw*)
- setting-flags instruction mnemonics remapped from (cmdz*) to (cmd*z)
- add combined-arithmetic-jump wrapper for better API stability/efficiency
- add new rtarch headers to house address-sized subset for 64-bit targets
- move original (now address-sized) mappings to rtbase for 32-bit targets
- add canonical forms for BASE div/rem and shifts (not always efficient)
- add setting-flags versions for BASE orr/xor and unsigned shifts
- remap one-operand instructions from cmd**_rr/mm to rx/mx and xr/xm
- move stack instructions to their own section at the end of rtarch headers
- move sregs instructions to their own section at the end of rtarch headers
- add config flags for full-precision SIMD rcpps/rsqps instructions
- add master flags for SIMD compatibility modes to rtarch main header
- add new offset corrections for endianness (from here backported down)
- add Win64 support via TDM64-GCC toolchain (tdm64-gcc-5.1.0-2.exe)
- add NULL-ptr checks to custom allocators (Linux/mmap, Win64/VirtualAlloc)
- fix setting-flags instructions for 64-bit POWER running 32-bit ISA
- fix non-setting-flags instructions (neg*x) to not set flags on MIPS
v0.7.1: Unified SIMD Assembler, 64/32-bit hybrid mode for native 64-bit ABI
- use fixed-sized and adjustable integer types in rtbase and SIMD test
- add a64 (AArch64 native ABI) and x64 (x86_64 native ABI) targets/makefiles
- add m64 (MIPS64 native ABI) and p64 (Power64 native ABI) targets/makefiles
- most of the current ISA remains 32-bit for BASE and SIMD with few exceptions
- adjust backend structures to support 64-bit pointer types in select places
- move sys_alloc/sys_free to platform-specific sections in SIMD test
- implement custom allocators (mmap) to limit address range to 32-bit (Linux)
- limit address range to 2GB boundary as MIPS64 sign-extends 32-bit mem-loads
- treat code labels as 64-bit in label_ld/st and jmpxx_mm instructions
- implement 64-bit versions of stack_sa/la instructions on MIPS and POWER
- fix variable SIMD shifts to support little-endian on POWER targets
- fix ASM blocks to only use SIMD registers within VRSAVE segment on POWER
- remove ASM block's zeroing of r15 as unnecessary on x32/x64 targets
- reformat/rework ASM blocks to better respect internal register mapping
- explicitly save/load SIMD registers in ASM blocks across all targets
- drop ASM clobber lists for lack of consistency across targets/SIMD-widths
- fix clang's ASM block l-value errors and other warnings, official support
- add build instructions to makefiles for Ubuntu 16.04 LTS 64-bit Live CD
- fix divps_ld instruction's encoding on ARM
- use IEEE-compatible div/sqr on legacy ARM and POWER
v0.7: Unified SIMD Assembler, additional 32-bit CPU architectures
- add a32 (AArch64:ILP32 ABI) and x32 (x86_64:mx32 ABI) targets/makefiles
- add m32 (MIPS32r5/r6 + MSA) and p32 (POWER + VMX/VSX) targets/makefiles
- add yet another SIMD variant (v4) for x86/SSE4.1 and ARMv8/AArch32
- separate ARMv7/ASIMDv2 (v2) and ARMv8/AArch32 (v4) SIMD variants on ARM
- add ARM builds for Raspberry Pi 2 and 3 in addition to Nokia N900
- use static linking in SIMD tests for QEMU emulation
- add mmv (blendvps) to x86/x32 SSE4.1 for fast conditional loads
- add combined-compare-jumps to rtarch for better efficiency (MIPS, POWER)
- remove limitation for BASE instructions to only accept DP offsets
- add new immediate/displacement types, add comment that they are unsigned
- add comments throughout rtarch about instructions' set-flags behavior
- implement full-range 32-bit integer divide on ARMv7 (v1) as 64-bit fp-div
- add widening versions of integer multiply instructions to rtarch definitions
- add remainder wrappers for integer divide instructions to rtarch definitions
- add IEEE-compatible versions of fp div & sqr for ARMv7 and POWER targets
- add "residual correction" to non-IEEE fp div on ARMv7 and POWER targets
- add SIMD tests for fp-to-int round and int-div remainder, run level 18
v0.6: Unified SIMD Assembler, additional SIMD targets
- rename SIMD target files to reflect SIMD width
- enable SIMD instructions definitions only if RT_SIMD_CODE is defined
- add new SIMD targets for SSE1, AVX1, AVX2 with corresponding build flags
- add float-to-integer convert with explicit mode parameter (x86, AArch32)
- add signed-integer-divide native instruction for ARM's AArch32 mode
- add SIMD test for shifts by runtime value & BASE register, run level 16
- add ver (cpuid) instruction for runtime SIMD target selection (x86 only)
- add mmv (vmaskmov) to AVX backend for fast conditional loads/stores
- add BASE instructions sub-tests to SIMD test if RT_BASE_TEST is defined
- drop set-flags bit (slow) from BASE mul instructions on ARM
- add RT_SIMD_FAST_FCTRL to save 1 instruction on FCTRL blocks entry
- clarify current and future targets in rtarch (from here backported down)
- add xor & neg BASE instructions to rtarch
- add shifts by fixed BASE register instructions
- add register versions of BASE mul/div, remainder instructions
- add SIMD cvzps instruction for fp-to-int round-towards-zero conversion
- add ASM_ENTER_F/ASM_LEAVE_F/ROUND*_F for non-IEEE flush-to-zero SIMD mode
- add RT_SIMD_FLUSH_ZERO to enable faster non-IEEE flush-to-zero SIMD mode
- add ASM_INIT/ASM_DONE to manage root info structure
- make stack pointer register architecturally invisible
- replace non-standard malloc.h with stdlib.h for malloc/free
- clean up rtarch whitespace formatting
v0.5: Unified SIMD Assembler, API freeze for the engine
- instruction naming scheme finalized
- change ARM instructions to set flags
- added framework for internal constants (used by reciprocals)
- added SIMD instruction for cube root, reciprocal steps redesigned
- additional SIMD tests, run level 15
v0.4: SIMD test framework, macro assembler overhaul
- macro expansion reworked for better compiler compatibility
- immediate/displacement parameters handling redesigned
- added reciprocal support for SSE, MPE support refined
v0.3: SIMD test framework, run level 9
- tests for integer mul, div, jmp instructions
- SIMD tests for integer add, shl, shr instructions
- SIMD tests for cvt, sqr, rsq instructions
v0.2: SIMD test framework, run level 5
- SIMD tests for mul, div, cmp instructions
v0.1: SIMD test framework, run level 1
- SIMD tests for add, sub instructions
v0.0: Empty project
- initial file set and directory structure