path: root/c/blake3_sse2_x86-64_windows_gnu.S
Age | Commit message | Author
2023-01-23 | Correct section names on Windows GNU assembly | namazso
2021-11-05 | fix incorrect output / undefined behavior in Windows SSE2 assembly | Jack O'Connor
The SSE2 patch introduced xmm10 as a temporary register for one of the rotations, but xmm6-xmm15 are callee-save registers on Windows, and SSE4.1 was only saving the registers it used. The minimal fix is to use one of the saved registers instead of xmm10. See https://github.com/BLAKE3-team/BLAKE3/issues/206.
2021-02-06 | More movd/movq discrepancies. Fixes #149. (#150) | Samuel Neves
This should be irrelevant, but some toolchains will not accept movd with 64-bit arguments.
2020-08-31 | remove avoidable spill | Samuel Neves
2020-08-31 | C: asm: simplify pblendw emulation | Matthew Krupcale
Use statically calculated ~mask. This reduces the number of moves and registers necessary at the expense of an extra memory load. This is probably a good trade-off since we are not bound by memory uops in this loop.
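As a rough illustration of the trade-off described above, the sketch below keeps both the mask and its complement as constants in memory, so the blend needs only two `pand` operations and one `por`, with no `pandn` and no register copy. It is written with C intrinsics rather than the assembly in this file. The names `BLEND_MASK`, `BLEND_NOT_MASK`, `blend_cc_static`, and `lane` are hypothetical, and the lane selection (equivalent to `pblendw` with immediate 0xCC) is chosen only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical constants: take 32-bit lanes 1 and 3 from b, lanes 0 and 2 from a.
   This matches a pblendw immediate of 0xCC (16-bit words 2, 3, 6, 7 from b). */
static const uint32_t BLEND_MASK[4] =
    {0x00000000u, 0xFFFFFFFFu, 0x00000000u, 0xFFFFFFFFu};
/* The complement is precomputed, which costs one extra memory load. */
static const uint32_t BLEND_NOT_MASK[4] =
    {0xFFFFFFFFu, 0x00000000u, 0xFFFFFFFFu, 0x00000000u};

/* (mask & b) | (~mask & a), with ~mask loaded instead of computed. */
static __m128i blend_cc_static(__m128i a, __m128i b) {
    __m128i mask = _mm_loadu_si128((const __m128i *)BLEND_MASK);
    __m128i not_mask = _mm_loadu_si128((const __m128i *)BLEND_NOT_MASK);
    return _mm_or_si128(_mm_and_si128(b, mask), _mm_and_si128(a, not_mask));
}

/* Helper for inspecting results: return 32-bit lane i of v. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

Because neither source operand is overwritten before it is used, no temporary copy of `a` or `b` is needed, which is one plausible reading of "reduces the number of moves and registers".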
2020-08-31 | C: asm: simplify pinsrd emulation | Matthew Krupcale
Use punpckl{,q}dq instead of pinsrw.
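One way to picture the unpack-based approach is to assemble a vector from four scalars: `movd` places each value in lane 0 of a register, `punpckldq` pairs two of them, and `punpcklqdq` joins the two pairs. The sketch below uses C intrinsics and is not the exact instruction sequence from the assembly file. The function names `pack4_punpck` and `lane` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Build the vector [a, b, c, d] (lane 0 first) without pinsrd or pinsrw. */
static __m128i pack4_punpck(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    /* movd + punpckldq: interleave lane 0 of each operand -> [a, b, 0, 0] */
    __m128i lo = _mm_unpacklo_epi32(_mm_cvtsi32_si128((int)a),
                                    _mm_cvtsi32_si128((int)b));
    /* Same for the upper pair -> [c, d, 0, 0] */
    __m128i hi = _mm_unpacklo_epi32(_mm_cvtsi32_si128((int)c),
                                    _mm_cvtsi32_si128((int)d));
    /* punpcklqdq: join the low 64 bits of each -> [a, b, c, d] */
    return _mm_unpacklo_epi64(lo, hi);
}

/* Helper for inspecting results: return 32-bit lane i of v. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```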
2020-08-30 | C: asm: remove blendvps usage altogether | Matthew Krupcale
This simplifies the operation by removing the need to use blendvps at all.
2020-08-24 | C: asm: emulate pshufb ROT8 using SSE2 instructions | Matthew Krupcale
Use a simple shift for the rotation.
* c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
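A shift-based rotation can be sketched with C intrinsics as follows, assuming ROT8 means rotating each 32-bit lane right by 8 bits, as in the BLAKE3 mixing function. The two shifts and the OR correspond to `psrld`, `pslld`, and `por`. The function names `rot8_sse2` and `lane` are hypothetical, and the register allocation in the assembly file is not reproduced here.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate each 32-bit lane right by 8 bits: (x >> 8) | (x << 24). */
static __m128i rot8_sse2(__m128i x) {
    return _mm_or_si128(_mm_srli_epi32(x, 8), _mm_slli_epi32(x, 24));
}

/* Helper for inspecting results: return 32-bit lane i of v. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

With SSSE3 the same rotation is a single `pshufb` with a byte-permutation constant; the SSE2 form trades that for three cheap instructions.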
2020-08-24 | C: asm: emulate pshufb ROT16 using SSE2 instructions | Matthew Krupcale
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.
* c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
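Rotating a 32-bit lane by 16 bits is the same as swapping its two 16-bit halves, which is why two word shuffles suffice. The sketch below shows the idea with C intrinsics, where `_mm_shufflelo_epi16` and `_mm_shufflehi_epi16` correspond to `pshuflw` and `pshufhw`. The function names `rot16_sse2` and `lane` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate each 32-bit lane by 16 bits by swapping adjacent 16-bit words.
   Immediate 0xB1 = 0b10110001 selects source words 1, 0, 3, 2. */
static __m128i rot16_sse2(__m128i x) {
    x = _mm_shufflelo_epi16(x, 0xB1);     /* pshuflw: low 64 bits  */
    return _mm_shufflehi_epi16(x, 0xB1);  /* pshufhw: high 64 bits */
}

/* Helper for inspecting results: return 32-bit lane i of v. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```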
2020-08-24 | C: asm: emulate pinsrd using SSE2 instructions | Matthew Krupcale
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.
* c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
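The two-`pinsrw` technique can be sketched with C intrinsics as below: the low 16 bits of the value go into the even word of the target lane, and the value shifted right by 16 goes into the odd word. The example fixes the target at 32-bit lane 3 (words 6 and 7) because the word index must be a compile-time constant. The function names `insert_lane3_pinsrw` and `lane` are hypothetical, and the lane choice is for illustration only.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Emulate pinsrd into 32-bit lane 3 using two 16-bit inserts. */
static __m128i insert_lane3_pinsrw(__m128i v, uint32_t x) {
    v = _mm_insert_epi16(v, (int)(x & 0xFFFFu), 6);  /* low half  -> word 6 */
    return _mm_insert_epi16(v, (int)(x >> 16), 7);   /* high half -> word 7 */
}

/* Helper for inspecting results: return 32-bit lane i of v. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```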
2020-08-24 | C: asm: emulate blendvps using SSE2 instructions | Matthew Krupcale
Blend according to (mask & b) | ((~mask) & a).
* c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
2020-08-24 | C: asm: emulate pblendw using SSE2 instructions | Matthew Krupcale
Use a constant mask to blend according to (mask & b) | ((~mask) & a).
* c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
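The blend formula maps directly onto `pand`, `pandn`, and `por`. The sketch below expresses it with C intrinsics, assuming the mask holds all ones in every bit position that should come from `b` and all zeros elsewhere. The function names `blend_sse2` and `lane` are hypothetical, and the mask used in the usage note is an illustrative choice.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* (mask & b) | (~mask & a): bits set in mask come from b, the rest from a. */
static __m128i blend_sse2(__m128i a, __m128i b, __m128i mask) {
    __m128i from_b = _mm_and_si128(mask, b);     /* pand  */
    __m128i from_a = _mm_andnot_si128(mask, a);  /* pandn: (~mask) & a */
    return _mm_or_si128(from_b, from_a);         /* por   */
}

/* Helper for inspecting results: return 32-bit lane i of v. */
static uint32_t lane(__m128i v, int i) {
    uint32_t out[4];
    _mm_storeu_si128((__m128i *)out, v);
    return out[i];
}
```

For example, a mask of `_mm_setr_epi32(0, -1, 0, -1)` takes lanes 1 and 3 from `b`, which reproduces `pblendw` with immediate 0xCC.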
2020-08-24 | Start SSE2 implementation based on SSE4.1 version | Matthew Krupcale
Wire up basic functions and features for SSE2 support using the SSE4.1 version as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2 assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module configurations
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection