| Age | Commit message (Collapse) | Author |
|
Apart from being pretty ambiguous in general, the term "context string"
has the specific problem that it isn't clear whether it should be
describing the input or the output. In fact, it's quite important that
it describes the output, because the whole point is to domain-separate
different outputs that derive from the *same* input. To make that
clearer, rename the "context string" to the "purpose string" in
documentation.
|
|
- Visual Studio <= 2015 does not support AVX-512 either way;
- Visual Studio 2017 does not tolerate vmovd with 64-bit operands;
- Visual Studio 2019 does not care.
|
|
|
|
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/152.
|
|
Some of the SIMD code is still unformatted, so for now I'm only touching
the files that just have a couple small changes.
|
|
|
|
This should be irrelevant, but some toolchains will not accept movd with 64-bit arguments.
|
|
|
|
|
|
Related discussion: https://github.com/BLAKE3-team/BLAKE3/issues/130
|
|
|
|
fix disabled-optimization -Wall -Werror
|
|
|
|
Patch by Samuel Neves ( https://github.com/sneves ).
If you tried to compile blake3_dispatch.c with
-Wall -Werror -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 -DBLAKE3_NO_AVX512,
something like this would happen:
hans@xDevAd:~/projects/BLAKE3/c$ gcc -O0 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S blake3_avx512_x86-64_unix.S -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 -DBLAKE3_NO_AVX512 -Wall -Wextra -Wpedantic -Werror
blake3_dispatch.c: In function ‘blake3_compress_in_place’:
blake3_dispatch.c:139:26: error: unused variable ‘features’ [-Werror=unused-variable]
  139 |   const enum cpu_feature features = get_cpu_features();
      |                          ^~~~~~~~
blake3_dispatch.c: In function ‘blake3_compress_xof’:
blake3_dispatch.c:167:26: error: unused variable ‘features’ [-Werror=unused-variable]
  167 |   const enum cpu_feature features = get_cpu_features();
      |                          ^~~~~~~~
blake3_dispatch.c: In function ‘blake3_hash_many’:
blake3_dispatch.c:195:26: error: unused variable ‘features’ [-Werror=unused-variable]
  195 |   const enum cpu_feature features = get_cpu_features();
      |                          ^~~~~~~~
blake3_dispatch.c: In function ‘blake3_simd_degree’:
blake3_dispatch.c:244:26: error: unused variable ‘features’ [-Werror=unused-variable]
  244 |   const enum cpu_feature features = get_cpu_features();
      |                          ^~~~~~~~
cc1: all warnings being treated as errors
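The errors above arise because nothing reads `features` once every SIMD backend is compiled out. The sketch below shows one common way to keep such a variable from tripping -Wunused-variable. All names in it (`demo_cpu_feature`, `demo_get_cpu_features`, `demo_simd_degree`, `DEMO_NO_SIMD`, `MAYBE_UNUSED`) are illustrative assumptions, and it is not necessarily how the patch itself resolves the problem.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real dispatch code. */
enum demo_cpu_feature { DEMO_SSE2 = 1 << 0, DEMO_AVX2 = 1 << 1 };

/* Casting to void counts as a use, so the compiler stops warning. */
#define MAYBE_UNUSED(x) (void)(x)

static enum demo_cpu_feature demo_get_cpu_features(void) {
  return DEMO_SSE2; /* pretend runtime detection found SSE2 only */
}

size_t demo_simd_degree(void) {
  const enum demo_cpu_feature features = demo_get_cpu_features();
  /* Still a use of `features` when the block below is compiled out. */
  MAYBE_UNUSED(features);
#if !defined(DEMO_NO_SIMD)
  if (features & DEMO_AVX2) return 8;
  if (features & DEMO_SSE2) return 4;
#endif
  return 1; /* portable fallback */
}
```

Building this sketch with `-Wall -Werror -DDEMO_NO_SIMD` should compile cleanly, whereas removing the `MAYBE_UNUSED` line would be expected to bring the unused-variable error back.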
|
|
This will let us add big endian testing to CI for our C code. (We were
already doing it for our Rust code.)
This is adapted from test_vectors/cross_test.sh. It works around the
limitation that the `cross` tool can't reach parent directories. It's an
unfortunate hack, but at least it's only for testing. It might've been
less hacky to use symlinks for this somehow, but I worry that would
break things on Windows, and I don't want to have to add workarounds for
my workarounds.
|
|
Kudos to @pascal-cuoq and @jakub-zwolakowski from TrustInSoft for
catching these bugs.
Original report: https://github.com/BLAKE3-team/BLAKE3/pull/118
|
|
|
|
|
|
|
|
|
|
|
|
C: Add blake3_hasher_init_derive_key_len
|
|
|
|
|
|
|
|
Add SSE2 implementations
|
|
Use statically calculated ~mask. This reduces the number of moves and registers necessary at the expense of an extra memory load. This is probably a good trade-off since we are not bound by memory uops in this loop.
|
|
Use punpckl{,q}dq instead of pinsrw.
|
|
This simplifies the operation by removing the need to use blendvps at all.
|
|
blake3_hasher_init_derive_key_len is an alternative version of
blake3_hasher_init_derive_key which takes the context and its
length as separate parameters, and not together as a C string.
The motivation for this addition is making it easier for
bindings to this C library to call this function without
having to first copy over the context bytes just to add
one 0x00 byte at the end.
Note that, unlike blake3_hasher_init_derive_key,
blake3_hasher_init_derive_key_len allows a 0x00 byte in the
context. Given the rules about context string selection, this
byte is unlikely to appear in a context string. If one is given
anyway, it is included in the context string and processed like
any other non-alphanumeric byte. For compatibility with
blake3_hasher_init_derive_key, bindings should still check that
the context contains no 0x00 bytes.
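The difference between the two entry points can be illustrated with a toy model. In the sketch below, `demo_hasher`, `demo_init_derive_key_len`, and `demo_init_derive_key` are hypothetical stand-ins that only record the context bytes; they are not the library's functions and do no hashing. It assumes the C-string variant derives its length with `strlen`, which is why it can never see a 0x00 byte.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hasher: it only records the context bytes. */
typedef struct {
  unsigned char context[64];
  size_t context_len;
} demo_hasher;

/* Length-taking variant: accepts arbitrary bytes, including 0x00. */
void demo_init_derive_key_len(demo_hasher *self, const void *context,
                              size_t context_len) {
  if (context_len > sizeof self->context) {
    context_len = sizeof self->context; /* cap exists only in this demo */
  }
  memcpy(self->context, context, context_len);
  self->context_len = context_len;
}

/* C-string variant: strlen stops at the first 0x00 byte. */
void demo_init_derive_key(demo_hasher *self, const char *context) {
  demo_init_derive_key_len(self, context, strlen(context));
}
```

A binding that already holds a pointer and a length can call the length-taking form directly, without first copying the bytes into a NUL-terminated buffer.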
|
|
|
|
Use _mm_and_si128 and _mm_cmpeq_epi16 rather than an expensive _mm_mullo_epi16 multiplication followed by _mm_srai_epi16, which the compiler may not be able to optimize.
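The message does not show the code, so the following is only a sketch of how the two named intrinsics could expand an 8-bit immediate into a per-lane mask for a 16-bit blend. The function names `demo_blend_epi16` and `demo_store_u16` are assumptions made for illustration.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Take 16-bit lane i from b when bit i of imm8 is set, otherwise from a. */
__m128i demo_blend_epi16(__m128i a, __m128i b, int16_t imm8) {
  /* Lane i holds the single bit (1 << i); lane 0 is the last argument. */
  const __m128i bits =
      _mm_set_epi16(0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01);
  __m128i mask = _mm_set1_epi16(imm8); /* broadcast imm8 to every lane */
  mask = _mm_and_si128(mask, bits);    /* keep only each lane's own bit */
  mask = _mm_cmpeq_epi16(mask, bits);  /* 0xFFFF where the bit was set */
  /* (mask & b) | (~mask & a) */
  return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Helper for inspecting results: copy the eight 16-bit lanes out. */
void demo_store_u16(uint16_t out[8], __m128i v) {
  _mm_storeu_si128((__m128i *)out, v);
}
```

The AND and compare each operate on all eight lanes at once, so no multiply or arithmetic shift is needed to turn a bit into an all-ones lane.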
|
|
MSVC returns "error A2006:undefined symbol : FFFFFFFFH", because MASM requires hexadecimal literals to begin with a decimal digit, so use 0FFFFFFFFH instead. Also use a 0 prefix for 0H to align things.
|
|
Previously, these masks were undefined because they were outside of the RDATA section.
|
|
MSVC returns "error A2006:undefined symbol : B1H", so use 0B1H instead.
|
|
Use a simple shift for the rotation.
* c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
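In intrinsics form, a shift-based 8-bit rotation might look like the sketch below. It assumes a right rotation of each 32-bit lane; `demo_rot8` and `demo_lane0` are illustrative names, not identifiers taken from the assembly files.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate each 32-bit lane right by 8 bits using two shifts and an OR. */
__m128i demo_rot8(__m128i x) {
  return _mm_or_si128(_mm_srli_epi32(x, 8), _mm_slli_epi32(x, 24));
}

/* Helper for inspecting results: read back 32-bit lane 0. */
uint32_t demo_lane0(__m128i v) {
  return (uint32_t)_mm_cvtsi128_si32(v);
}
```

This replaces the single pshufb byte shuffle with three SSE2 instructions per rotation.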
|
|
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.
* c/blake3_sse2_x86-64_unix.S: emulate pshufb using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
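A 16-bit rotation of a 32-bit lane is the same as swapping its two 16-bit halves, which is what the pair of shuffles achieves. The sketch below expresses that with intrinsics; `demo_rot16` and `demo_store_u32` are illustrative names, and the shuffle constant 0xB1 is an assumption about the lane order used.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Rotate each 32-bit lane by 16 bits by swapping adjacent 16-bit words. */
__m128i demo_rot16(__m128i x) {
  /* 0xB1 selects words (1, 0, 3, 2), i.e. it swaps each adjacent pair. */
  x = _mm_shufflelo_epi16(x, 0xB1); /* low 64 bits */
  x = _mm_shufflehi_epi16(x, 0xB1); /* high 64 bits */
  return x;
}

/* Helper for inspecting results: copy the four 32-bit lanes out. */
void demo_store_u32(uint32_t out[4], __m128i v) {
  _mm_storeu_si128((__m128i *)out, v);
}
```

Two shuffles are needed because each SSE2 16-bit shuffle only rearranges one 64-bit half of the register and passes the other half through unchanged.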
|
|
Use two pinsrw and a 16-bit shift to insert the 32-bit integer at the desired location.
* c/blake3_sse2_x86-64_unix.S: emulate pinsrd using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
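The same idea can be sketched with intrinsics: a 32-bit insert becomes two 16-bit inserts, with a 16-bit shift exposing the upper half of the value. The example fixes the destination at 32-bit lane 1 because `_mm_insert_epi16` needs a compile-time index; `demo_insert_lane1` and `demo_store_u32` are illustrative names.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Replace 32-bit lane 1 of v with x, as an SSE4.1 pinsrd with index 1 would. */
__m128i demo_insert_lane1(__m128i v, uint32_t x) {
  v = _mm_insert_epi16(v, (int)(x & 0xFFFF), 2); /* low half, 16-bit lane 2 */
  v = _mm_insert_epi16(v, (int)(x >> 16), 3);    /* high half, 16-bit lane 3 */
  return v;
}

/* Helper for inspecting results: copy the four 32-bit lanes out. */
void demo_store_u32(uint32_t out[4], __m128i v) {
  _mm_storeu_si128((__m128i *)out, v);
}
```

32-bit lane n corresponds to 16-bit lanes 2n and 2n + 1, so other destinations follow by changing the two index constants.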
|
|
Blend according to (mask & b) | ((~mask) & a).
* c/blake3_sse2_x86-64_unix.S: emulate blendvps using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
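The blend formula maps directly onto three SSE2 bitwise operations, as in the sketch below. One assumption matters: blendvps looks only at the sign bit of each 32-bit mask lane, while this emulation requires each mask lane to be all ones or all zeros. `demo_blend` and `demo_store_u32` are illustrative names.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Per lane: take b where mask is all ones, a where mask is all zeros. */
__m128i demo_blend(__m128i a, __m128i b, __m128i mask) {
  /* _mm_andnot_si128(mask, a) computes (~mask) & a. */
  return _mm_or_si128(_mm_and_si128(mask, b), _mm_andnot_si128(mask, a));
}

/* Helper for inspecting results: copy the four 32-bit lanes out. */
void demo_store_u32(uint32_t out[4], __m128i v) {
  _mm_storeu_si128((__m128i *)out, v);
}
```

A mask produced by a comparison intrinsic such as `_mm_cmpeq_epi32` already satisfies the all-ones or all-zeros requirement.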
|
|
Use a constant mask to blend according to (mask & b) | ((~mask) & a).
* c/blake3_sse2_x86-64_unix.S: emulate pblendw using SSE2 instructions for x86_64 unix
* c/blake3_sse2_x86-64_windows_gnu.S: Likewise for x86_64 Windows GNU.
* c/blake3_sse2_x86-64_windows_msvc.asm: Likewise for x86_64 Windows MSVC.
|
|
intrinsics
Use a simple shift version for the 8-bit rotation.
* c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot8 using SSE2 intrinsics
|
|
intrinsics
Use two 16-bit shuffles: one for the low 64 bits and one for the high 64 bits.
* c/blake3_sse2.c: emulate _mm_shuffle_epi8 rot16 using SSE2 intrinsics
|
|
Use a constant mask to blend according to (mask & b) | ((~mask) & a).
* src/rust_sse2.rs: emulate _mm_blend_epi16 using SSE2 intrinsics
* c/blake3_sse2.c: Likewise.
|
|
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
configurations
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection
|
|
The default executable stack setting on Linux can be fixed in two different ways:
- By adding the `.section .note.GNU-stack,"",%progbits` special incantation
- By passing the `--noexecstack` flag to the assembler
This patch implements both, but only one of them is strictly necessary.
I've also added some additional hardening flags to the Makefile. These may not be portable.
|
|
|
|
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/99.
|
|
|
|
|
|
Assembly: enable CET
|