Changes since 0.3.6:
- BUGFIX: The C implementation was incorrect on big endian systems for
inputs longer than 1024 bytes. This bug affected all previous versions
of the C implementation. Little endian platforms like x86 were
unaffected. The Rust implementation was also unaffected.
@jakub-zwolakowski and @pascal-cuoq from TrustInSoft reported this
bug: https://github.com/BLAKE3-team/BLAKE3/pull/118
- BUGFIX: The C build on x86-64 was producing binaries with an
executable stack. @tristanheaven reported this bug:
https://github.com/BLAKE3-team/BLAKE3/issues/109
- @mkrupcale added optimized implementations for SSE2. This improves
performance on older x86 processors that don't support SSE4.1.
- The C implementation now exposes the
`blake3_hasher_init_derive_key_raw` function, to make it easier to
implement language bindings. Added by @k0001.
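The big-endian bug in the first item is a byte-order issue. BLAKE3 is defined in terms of little-endian 32-bit words, so code that copies words to bytes in host order is only correct by accident on little-endian CPUs. Below is a minimal sketch of the usual portable pattern. It is not the actual fix from PR 118, and the helper names `load32_le`, `store32_le`, and `store_words_le` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit word from bytes,
 * giving the same result regardless of the host's byte order. */
static uint32_t load32_le(const uint8_t *src) {
  return ((uint32_t)src[0]) |
         ((uint32_t)src[1] << 8) |
         ((uint32_t)src[2] << 16) |
         ((uint32_t)src[3] << 24);
}

/* Hypothetical helper: write a 32-bit word as little-endian bytes. */
static void store32_le(uint8_t *dst, uint32_t w) {
  dst[0] = (uint8_t)(w);
  dst[1] = (uint8_t)(w >> 8);
  dst[2] = (uint8_t)(w >> 16);
  dst[3] = (uint8_t)(w >> 24);
}

/* Serialize n words (e.g. a chaining value) to bytes. A plain memcpy of
 * the uint32_t array would emit host-order bytes instead, which are
 * big-endian on big-endian machines. */
static void store_words_le(uint8_t *dst, const uint32_t *words, size_t n) {
  for (size_t i = 0; i < n; i++) {
    store32_le(&dst[4 * i], words[i]);
  }
}
```

Because every byte position is computed with shifts, the compiler can still turn these into single loads and stores on little-endian targets.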
|
|
This will let us add big endian testing to CI for our C code. (We were
already doing it for our Rust code.)
This is adapted from test_vectors/cross_test.sh. It works around the
limitation that the `cross` tool can't reach parent directories. It's an
unfortunate hack, but at least it's only for testing. It might've been
less hacky to use symlinks for this somehow, but I worry that would
break things on Windows, and I don't want to have to add workarounds for
my workarounds.
|
|
Kudos to @pascal-cuoq and @jakub-zwolakowski from TrustInSoft for
catching these bugs.
Original report: https://github.com/BLAKE3-team/BLAKE3/pull/118
|
|
https://github.com/BLAKE3-team/BLAKE3/blob/master/b3sum/what_does_check_do.md
|
|
C: Add blake3_hasher_init_derive_key_len
|
|
Samuel noticed that rustc seems to assume (incorrectly?) that all i686
targets support SSE2, but it doesn't make that assumption for i586.
|
|
It will be very rare that this actually executes, but we should include
it for completeness.
|
|
This is quite hard to trigger, because SSE2 has been guaranteed for a
long time. But you could trigger it this way:

    rustup target add i686-unknown-linux-musl
    RUSTFLAGS="-C target-cpu=i386" cargo build --target i686-unknown-linux-musl

Note a relevant gotcha, though: The `cross` tool will not forward
environment variables like RUSTFLAGS to the container by default, so if
you're testing with `cross` you'll need to use the `rustc` command to
explicitly pass the flag, as I've done here in ci.yml. (Or you could
create a `Cross.toml` file, but I don't want to commit one of those if I
can avoid it.)
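The same compile-time-versus-runtime distinction exists in C. As a rough analogue, not code from this repository, the sketch below trusts the `__SSE2__` macro when the compiler defines it and only falls back to runtime detection with the GCC/Clang builtin `__builtin_cpu_supports` on 32-bit x86 builds where SSE2 was not enabled. The function name `sse2_available` is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: may SSE2 code paths be used? GCC/Clang specific. */
static bool sse2_available(void) {
#if defined(__SSE2__)
  /* The compiler already assumes SSE2 (the default for x86-64 targets),
   * so no runtime check is needed. */
  return true;
#elif defined(__i386__) && defined(__GNUC__)
  /* Rare path: a 32-bit x86 build with SSE2 disabled, for example one
   * targeting an i386- or i586-class CPU. Ask the CPU at runtime. */
  __builtin_cpu_init();
  return __builtin_cpu_supports("sse2") != 0;
#else
  /* Not x86, or an unsupported compiler: assume no SSE2. */
  return false;
#endif
}
```

On a typical x86-64 build only the first branch is ever compiled, which is why the runtime path is so hard to exercise.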
|
|
Add SSE2 implementations
|
|
Use statically calculated ~mask. This reduces the number of moves and registers necessary at the expense of an extra memory load. This is probably a good trade-off since we are not bound by memory uops in this loop.
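As an intrinsics-level illustration of precomputing both the mask and its complement, the hypothetical helper below performs a fixed 16-bit lane blend (lanes 2, 3, 6, and 7 from `b`, as `pblendw` with an immediate of 0xCC would) using only AND and OR. In assembly these constants would be loaded from memory, which corresponds to the extra load mentioned above; with intrinsics, the compiler decides how to materialize them.

```c
#include <emmintrin.h>

/* Hypothetical helper: take 16-bit lanes 2, 3, 6, 7 from b and the rest
 * from a. Both the mask and ~mask are constants, so no pandn is needed. */
static __m128i blend_cc_sse2(__m128i a, __m128i b) {
  /* _mm_set_epi16 lists lanes from 7 down to 0. */
  const __m128i mask = _mm_set_epi16(-1, -1, 0, 0, -1, -1, 0, 0);
  const __m128i not_mask = _mm_set_epi16(0, 0, -1, -1, 0, 0, -1, -1);
  /* (mask & b) | (~mask & a) */
  return _mm_or_si128(_mm_and_si128(mask, b), _mm_and_si128(not_mask, a));
}
```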
|
|
This enables pretty printing via `{:#?}`. The normal style for `{:?}` is
kept exactly the same.
|
|
Use punpckl{,q}dq instead of pinsrw.
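For context, a common SSE2 way to assemble four 32-bit scalars into one vector without `pinsrd` or `pinsrw` is a `movd` for each value followed by `punpckldq` and `punpcklqdq`. The sketch below expresses that sequence with intrinsics; the helper name `pack4_epi32` is hypothetical, and the actual register usage in the BLAKE3 assembly may differ.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: build {w0, w1, w2, w3} (lane 0 first) using only
 * movd and the unpack instructions. */
static __m128i pack4_epi32(uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3) {
  __m128i v0 = _mm_cvtsi32_si128((int)w0);  /* movd: {w0, 0, 0, 0} */
  __m128i v1 = _mm_cvtsi32_si128((int)w1);
  __m128i v2 = _mm_cvtsi32_si128((int)w2);
  __m128i v3 = _mm_cvtsi32_si128((int)w3);
  __m128i lo = _mm_unpacklo_epi32(v0, v1);  /* punpckldq: {w0, w1, 0, 0} */
  __m128i hi = _mm_unpacklo_epi32(v2, v3);  /* punpckldq: {w2, w3, 0, 0} */
  return _mm_unpacklo_epi64(lo, hi);        /* punpcklqdq: {w0, w1, w2, w3} */
}
```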
|
|
This simplifies the operation by removing the need to use blendvps at all.
|
|
blake3_hasher_init_derive_key_len is an alternative version of
blake3_hasher_init_derive_key that takes the context and its length
as separate parameters, rather than together as a NUL-terminated C
string.
The motivation for this addition is to make it easier for bindings
to this C library to call this function without first having to
copy the context bytes just to append a single 0x00 byte.
Note that, unlike blake3_hasher_init_derive_key,
blake3_hasher_init_derive_key_len allows the context to contain a
0x00 byte. Given the rules for choosing context strings, this byte
is unlikely to appear in practice. But if it is ever given, it will
be included in the context and processed like any other
non-alphanumeric byte. For compatibility with
blake3_hasher_init_derive_key, bindings should still check that the
context contains no 0x00 bytes.
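A minimal sketch of the check recommended in the last sentence, as a binding might run it before calling the length-based function. The helper name is hypothetical and is not part of the BLAKE3 C API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical binding-side check: true if the context could also have
 * been passed as a C string, i.e. it contains no 0x00 byte. */
static bool context_is_c_string_compatible(const void *context, size_t len) {
  if (len == 0) {
    return true;  /* avoid calling memchr with a possibly NULL pointer */
  }
  return memchr(context, 0, len) == NULL;
}
```

Rejecting contexts that fail this check keeps a binding's results identical whichever of the two C entry points it ends up calling.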
|
|
Use _mm_and_si128 and _mm_cmpeq_epi16 rather than an expensive _mm_mullo_epi16 multiplication followed by _mm_srai_epi16, which the compiler may not be able to optimize.
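The lane-mask construction described here can be sketched with SSE2 intrinsics as an emulation of `_mm_blend_epi16`. This illustrates the technique rather than copying the BLAKE3 source, and `blend_epi16_sse2` is a hypothetical name.

```c
#include <emmintrin.h>

/* Hypothetical SSE2 stand-in for _mm_blend_epi16(a, b, imm8): 16-bit lane i
 * comes from b if bit i of imm8 is set, otherwise from a. */
static __m128i blend_epi16_sse2(__m128i a, __m128i b, int imm8) {
  /* Lane i holds the single bit (1 << i); _mm_set_epi16 lists lane 7 first. */
  const __m128i bits = _mm_set_epi16(0x80, 0x40, 0x20, 0x10,
                                     0x08, 0x04, 0x02, 0x01);
  __m128i mask = _mm_set1_epi16((short)imm8);  /* broadcast imm8 */
  mask = _mm_and_si128(mask, bits);   /* keep only bit i in lane i */
  mask = _mm_cmpeq_epi16(mask, bits); /* 0xFFFF where set, 0 elsewhere */
  /* (mask & b) | (~mask & a) */
  return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}
```

Compare-for-equality turns each isolated bit directly into an all-ones or all-zeros lane, which is exactly what the AND/ANDNOT blend needs.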
|
|
MSVC returns "error A2006:undefined symbol : FFFFFFFFH", so use 0FFFFFFFFH instead. Also use 0 prefix for 0H to align things.
|
|
Previously, these masks were undefined because they were outside of the RDATA section.
|
|
MSVC returns "error A2006:undefined symbol : B1H", so use 0B1H instead.
|
|
SSE2 target_feature appears to always be present for x86_64.
|
|
Use a simple shift for the rotation.
* c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
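BLAKE3's G function rotates 32-bit words right by 8, which SSSE3 code can do with a `pshufb` byte shuffle. The shift-based replacement described above looks like this in intrinsics form; the helper name is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical helper: rotate each 32-bit lane right by 8 bits using only
 * SSE2 shifts and OR (psrld, pslld, por) instead of pshufb. */
static __m128i rotr8_epi32_sse2(__m128i x) {
  return _mm_or_si128(_mm_srli_epi32(x, 8), _mm_slli_epi32(x, 24));
}
```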
|
|
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.
* c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
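A 16-bit rotation of a 32-bit lane just swaps its two halves, so two 16-bit shuffles suffice. Below is an intrinsics sketch with a hypothetical helper name; the immediate 0xB1 encodes `_MM_SHUFFLE(2, 3, 0, 1)`, which swaps each adjacent pair of 16-bit words.

```c
#include <emmintrin.h>

/* Hypothetical helper: rotate each 32-bit lane by 16 bits.
 * pshuflw (_mm_shufflelo_epi16) handles the low 64 bits and
 * pshufhw (_mm_shufflehi_epi16) handles the high 64 bits. */
static __m128i rot16_epi32_sse2(__m128i x) {
  return _mm_shufflehi_epi16(_mm_shufflelo_epi16(x, 0xB1), 0xB1);
}
```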
|
|
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.
* c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
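The pinsrd emulation can be sketched with `_mm_insert_epi16`, which compiles to `pinsrw`. One insert writes the low 16 bits of the value, and the value shifted right by 16 supplies the high half. Because the lane index must be an immediate, the sketch uses a macro, hypothetically named `INSERT_EPI32_SSE2`; note that it evaluates its arguments more than once.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical SSE2 stand-in for _mm_insert_epi32(x, v, lane) (pinsrd).
 * 32-bit lane `lane` spans 16-bit words 2*lane (low half) and 2*lane+1
 * (high half); `lane` must be a compile-time constant in 0..3. */
#define INSERT_EPI32_SSE2(x, v, lane)                                   \
  _mm_insert_epi16(                                                     \
      _mm_insert_epi16((x), (int)((uint32_t)(v) & 0xFFFFu), 2 * (lane)), \
      (int)((uint32_t)(v) >> 16), 2 * (lane) + 1)
```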
|
|
Blend according to (mask & b) | ((~mask) & a).
* c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
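`blendvps` selects each 32-bit lane by the sign bit of the mask, so a faithful SSE2 emulation first spreads that bit across the lane with an arithmetic shift and then applies (mask & b) | ((~mask) & a). The helper below is a hypothetical sketch on integer vectors. When the mask already comes from a comparison (all ones or all zeros per lane), the shift is redundant and can be dropped.

```c
#include <emmintrin.h>

/* Hypothetical SSE2 stand-in for blendvps: 32-bit lane i comes from b if
 * the sign bit of mask lane i is set, otherwise from a. */
static __m128i blendv_epi32_sse2(__m128i a, __m128i b, __m128i mask) {
  /* Arithmetic shift copies the sign bit into every bit of the lane. */
  __m128i m = _mm_srai_epi32(mask, 31);
  /* (mask & b) | (~mask & a) */
  return _mm_or_si128(_mm_and_si128(m, b), _mm_andnot_si128(m, a));
}
```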
|
|
Use a constant mask to blend according to (mask & b) | ((~mask) & a).
* c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
|
|
intrinsics
Use a simple shift version for the 8-bit rotation.
* c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot8 using SSE2 intrinsics
|
|
intrinsics
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.
* c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot16 using SSE2 intrinsics
|
|
Use a constant mask to blend according to (mask & b) | ((~mask) & a).
* src/rust_sse2.rs: emulate _mm_blend_epi16 using SSE2 intrinsics
* c/blake3_sse2.c: Likewise.
|
|
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
configurations
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection
|
|
The default executable stack setting on Linux can be fixed in two different ways:
- By adding the `.section .note.GNU-stack,"",%progbits` special incantation
- By passing the `--noexecstack` flag to the assembler
This patch implements both, but only one of them is strictly necessary.
I've also added some hardening flags to the Makefile. These may not be portable.