| Age | Commit message (Collapse) | Author |
|
|
|
|
|
It will be very rare that this actually executes, but we should include
it for completeness.
|
|
This is quite hard to trigger, because SSE2 has been guaranteed for a
long time. But you could trigger it this way:
rustup target add i686-unknown-linux-musl
RUSTFLAGS="-C target-cpu=i386" cargo build --target i686-unknown-linux-musl
Note a relevant gotcha though: The `cross` tool will not forward
environment variables like RUSTFLAGS to the container by default, so if
you're testing with `cross` you'll need to use the `rustc` command to
explicitly pass the flag, as I've done here in ci.yml. (Or you could
create a `Cross.toml` file, but I don't want to commit one of those if I
can avoid it.)
|
|
Add SSE2 implementations
|
|
This enables pretty printing via `{:#?}`. The normal style for `{:?}` is
kept exactly the same.
|
|
Use _mm_and_si128 and _mm_cmpeq_epi16 rather than expensive multiplication _mm_mullo_epi16 with _mm_srai_epi16 that compiler may not be able to optimize.
|
|
SSE2 target_feature appears to always be present for x86_64.
|
|
Use a constant mask to blend according to (mask & b) | ((~mask) & a).
* src/rust_sse2.rs: emulate _mm_blend_epi16 using SSE2 intrinsics
* c/blake3_sse2.c: Likewise.
|
|
Wire up basic functions and features for SSE2 support using the SSE4.1 version
as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2
assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module
configurations
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection
|
|
|
|
It looks like I originally made this mistake when I was copying code
from the baokeshed prototype (a274a9b0faa444dd842a0584483eae6e97dbf21e),
and then it got replicated into the C implementation later.
|
|
|
|
|
|
|
|
|
|
|
|
A helper function was incorrectly restricted to x86-only. CI doesn't
currently cover this, because benchmarks are nightly-only, and it's kind
of inconvenient to set RUSTC_BOOTSTRAP=1 through `cross` (which normally
doesn't propagate env vars). But maybe we should start...
|
|
There are two scenarios where compiling AVX-512 C or assembly code might
not work:
1. There might not be a C compiler installed at all. Most commonly this
is either in cross-compiling situations, or with the Windows GNU
target.
2. The installed C compiler might not support e.g. -mavx512f, because
it's too old.
In both of these cases, print a relevant warning, and then automatically
fall back to using the pure Rust intrinsics build.
Note that this only affects x86 targets. Other targets always use pure
Rust, unless the "neon" feature is enabled.
|
|
This lets CI test a wider range of possible SIMD support settings.
|
|
The biggest change here is that assembly implementations are enabled by
default.
Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)
Removed features:
- "c" (Now basically the default.)
Renamed features;
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"
Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
|
|
|
|
Suggested by @zaynetro:
https://github.com/BLAKE3-team/BLAKE3/pull/24#issuecomment-594369061
|
|
|
|
If the total number of chunks hashed so far is e.g. 1, and update() is
called with e.g. 8 more chunks, we can't compress all 8 together. We
have to break the input up, to make sure that that 1 lone chunk CV gets
merged with its proper sibling, and that in general the correct layout
of the tree is preserved. What we should do is hash 1-2-4-1 chunks of
input, using increasing powers of 2 (with some cleanup at the end). What
we were doing was 2-2-2-2 chunks. This was the result of a mistaken
optimization that got us stuck with an always-odd number of chunks so
far.
Fixes https://github.com/BLAKE3-team/BLAKE3/issues/69.
|
|
|
|
|
|
|
|
|
|
We use a counter value that's very close to wrapping the lower word,
when we're testing the hash_many chunks case. It turns out that this is
a useful thing to do with parents too, even though parents 1) are
teeechnically supposed to always use a counter of 0, and 2) aren't going
to increment the counter at all. We caught a bug in the assembly
implementations this way (where we accidentally did increment the
counter, but only the higher word), because the equivalent test in
rust_c_bindings uses this eccentric parents counter value.
|
|
https://github.com/rust-lang/rust/issues/68905 is currently causing
nightly builds to fail, unless `--no-default-features` is used. This
change means that the default build will succeed, and the failure will
only happen when the "c_avx512" is enabled. The `b3sum` crate will still
fail to build on nightly, because it enables that feature, but most
callers should start succeeding on nightly.
|
|
This is a new interface that allows the caller to provide a
multi-threading implementation. It's defined in terms of a new `Join`
trait, for which we provide two implementations, `SerialJoin` and
`RayonJoin`. This lets the caller control when multi-threading is used,
rather than the previous all-or-nothing design of the "rayon" feature.
Although existing callers should keep working, this is a compatibility
break, because callers who were relying on automatic multi-threading
before will now be single-threaded. Thus the next release of this crate
will need to be version 0.2.
See https://github.com/BLAKE3-team/BLAKE3/issues/25 and
https://github.com/BLAKE3-team/BLAKE3/issues/54.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Closes https://github.com/BLAKE3-team/BLAKE3/issues/41.
|
|
One thing I like to test is that, if I hack simd_degree to be higher
than MAX_SIMD_DEGREE, assertions fire. This requires a test case long
enough to exceed that number of chunks.
|
|
|
|
|
|
Because compress_subtree_to_parent_node effectively cuts its input in
half, we can give it an input that's twice as big, without violating the
CV stack invariant.
|
|
|
|
|
|
|
|
For the Read and Write traits, this also allows the compiler to see that
the return value is always Ok, allowing it to remove the Err case from
the caller as dead code.
|
|
The generic constant_time_eq has several branches on the slice length,
which are not necessary when the slice length is known. However, the
optimizer is not allowed to look into the core of constant_time_eq, so
these branches cannot be elided.
Use instead a fixed-size variant of constant_time_eq, which has no
branches since the length is known.
|
|
We can't change the context used in test_vectors.json without breaking
people, but we can change the one in unit tests.
|
|
|