path: root/src/lib.rs
Age | Commit message | Author
2020-09-10 | cargo fmt | Jack O'Connor
2020-08-31 | Merge pull request #110 from mkrupcale/sse2 | Samuel Neves
Add SSE2 implementations
2020-08-31 | Implement `fmt::Debug` using builders | Nikolai Vazquez
This enables pretty printing via `{:#?}`. The normal style for `{:?}` is kept exactly the same.
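As a rough illustration of the builder approach described above, here is a minimal sketch. `Digest` is a hypothetical stand-in type, not the crate's actual `Hash` definition; only the `debug_tuple` builder from the standard library is assumed.

```rust
use std::fmt;

// Hypothetical stand-in for a hash type; the real crate's type differs.
pub struct Digest(pub [u8; 4]);

impl fmt::Debug for Digest {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Render the bytes as a lowercase hex string.
        let hex: String = self.0.iter().map(|b| format!("{:02x}", b)).collect();
        // The builder handles both `{:?}` (compact) and `{:#?}` (pretty)
        // without any extra branching in this impl.
        f.debug_tuple("Digest").field(&hex).finish()
    }
}
```

With this impl, `{:?}` prints the value on one line, while `{:#?}` spreads the field over indented lines.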
2020-08-24 | Start SSE2 implementation based on SSE4.1 version | Matthew Krupcale
Wire up basic functions and features for SSE2 support using the SSE4.1 version as a basis without implementing the SSE2 instructions yet.
* Cargo.toml: add no_sse2 feature
* benches/bench.rs: wire SSE2 benchmarks
* build.rs: add SSE2 rust intrinsics and assembly builds
* c/Makefile.testing: add SSE2 C and assembly targets
* c/README.md: add SSE2 to C build instructions
* c/blake3_c_rust_bindings/build.rs: add SSE2 C rust binding builds
* c/blake3_c_rust_bindings/src/lib.rs: add SSE2 C rust bindings
* c/blake3_dispatch.c: add SSE2 C dispatch
* c/blake3_impl.h: add SSE2 C function prototypes
* c/blake3_sse2.c: add SSE2 C intrinsic file starting with SSE4.1 version
* c/blake3_sse2_x86-64_{unix.S,windows_gnu.S,windows_msvc.asm}: add SSE2 assembly files starting with SSE4.1 version
* src/ffi_sse2.rs: add rust implementation using SSE2 C rust bindings
* src/lib.rs: add SSE2 rust intrinsics and SSE2 C rust binding rust SSE2 module configurations
* src/platform.rs: add SSE2 rust platform detection and dispatch
* src/rust_sse2.rs: add SSE2 rust intrinsic file starting with SSE4.1 version
* tools/instruction_set_support/src/main.rs: add SSE2 feature detection
2020-08-14 | the same hex example for rustdocs | Jack O'Connor
2020-06-26 | shrink a stack array that's twice as big as it needs to be | Jack O'Connor
It looks like I originally made this mistake when I was copying code from the baokeshed prototype (a274a9b0faa444dd842a0584483eae6e97dbf21e), and then it got replicated into the C implementation later.
2020-05-23 | fix another small mistake in the docs | Jack O'Connor
2020-04-01 | automatically fall back to the pure Rust build | Jack O'Connor
There are two scenarios where compiling AVX-512 C or assembly code might not work:
1. There might not be a C compiler installed at all. Most commonly this is either in cross-compiling situations, or with the Windows GNU target.
2. The installed C compiler might not support e.g. -mavx512f, because it's too old.
In both of these cases, print a relevant warning, and then automatically fall back to using the pure Rust intrinsics build. Note that this only affects x86 targets. Other targets always use pure Rust, unless the "neon" feature is enabled.
2020-03-29 | refactor the Cargo feature set | Jack O'Connor
The biggest change here is that assembly implementations are enabled by default.
Added features:
- "pure" (Pure Rust, with no C or assembly implementations.)
Removed features:
- "c" (Now basically the default.)
Renamed features:
- "c_prefer_intrinsics" -> "prefer_intrinsics"
- "c_neon" -> "neon"
Unchanged:
- "rayon"
- "std" (Still the only feature on by default.)
2020-03-05 | add an example of parsing a Hash from a hex string | Jack O'Connor
Suggested by @zaynetro: https://github.com/BLAKE3-team/BLAKE3/pull/24#issuecomment-594369061
2020-02-27 | some comment typos | Jack O'Connor
2020-02-25 | remove a mis-optimization that hurt performance for uneven updates | Jack O'Connor
If the total number of chunks hashed so far is e.g. 1, and update() is called with e.g. 8 more chunks, we can't compress all 8 together. We have to break the input up, to make sure that that 1 lone chunk CV gets merged with its proper sibling, and that in general the correct layout of the tree is preserved. What we should do is hash 1-2-4-1 chunks of input, using increasing powers of 2 (with some cleanup at the end). What we were doing was 2-2-2-2 chunks. This was the result of a mistaken optimization that got us stuck with an always-odd number of chunks so far. Fixes https://github.com/BLAKE3-team/BLAKE3/issues/69.
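The 1-2-4-1 split described above can be sketched as follows. This is a simplified model that counts whole chunks only; the function names are hypothetical, and the real `update()` works on byte lengths and treats the final chunk separately.

```rust
// Largest power of two that is <= n. Requires n > 0.
fn largest_power_of_two_leq(n: u64) -> u64 {
    1u64 << (63 - n.leading_zeros())
}

// Hypothetical helper: given the number of chunks hashed so far and the
// number of new chunks, return the subtree sizes (in chunks) to hash next.
pub fn subtree_sizes(mut chunks_so_far: u64, mut new_chunks: u64) -> Vec<u64> {
    let mut sizes = Vec::new();
    while new_chunks > 0 {
        let mut size = largest_power_of_two_leq(new_chunks);
        // Shrink the subtree until it evenly divides the chunk count so far,
        // so that every earlier lone CV gets merged with its proper sibling.
        while (size - 1) & chunks_so_far != 0 {
            size /= 2;
        }
        sizes.push(size);
        chunks_so_far += size;
        new_chunks -= size;
    }
    sizes
}
```

In this model, starting from 1 chunk with 8 more to hash gives subtrees of 1, 2, 4 and 1 chunks, whereas starting from 0 chunks lets all 8 be compressed together.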
2020-02-12 | add a performance note and a usage example for Hasher | Jack O'Connor
2020-02-12 | document optional Cargo features on docs.rs | Jack O'Connor
2020-02-12 | integrate assembly implementations into the blake3 crate | Jack O'Connor
2020-02-06 | Hasher::update_with_join | Jack O'Connor
This is a new interface that allows the caller to provide a multi-threading implementation. It's defined in terms of a new `Join` trait, for which we provide two implementations, `SerialJoin` and `RayonJoin`. This lets the caller control when multi-threading is used, rather than the previous all-or-nothing design of the "rayon" feature. Although existing callers should keep working, this is a compatibility break, because callers who were relying on automatic multi-threading before will now be single-threaded. Thus the next release of this crate will need to be version 0.2. See https://github.com/BLAKE3-team/BLAKE3/issues/25 and https://github.com/BLAKE3-team/BLAKE3/issues/54.
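To illustrate the idea of a caller-supplied join strategy, here is a minimal sketch. The trait shape below is an assumption made for illustration and may not match the crate's actual `Join` definition; `sum_with_join` is a hypothetical example function, not part of the crate. Only the serial implementation is shown, since a Rayon-backed one would need an external dependency.

```rust
// Assumed trait shape: run two closures and return both results.
pub trait Join {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send;
}

// Serial strategy: simply runs the two closures one after the other.
pub enum SerialJoin {}

impl Join for SerialJoin {
    fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
    where
        A: FnOnce() -> RA + Send,
        B: FnOnce() -> RB + Send,
        RA: Send,
        RB: Send,
    {
        (oper_a(), oper_b())
    }
}

// Hypothetical divide-and-conquer routine that is generic over the strategy.
pub fn sum_with_join<J: Join>(data: &[u64]) -> u64 {
    if data.len() <= 2 {
        return data.iter().sum();
    }
    let (left, right) = data.split_at(data.len() / 2);
    let (l, r) = J::join(|| sum_with_join::<J>(left), || sum_with_join::<J>(right));
    l + r
}
```

The caller picks the strategy with a type parameter, for example `sum_with_join::<SerialJoin>(&data)`, which is what lets multi-threading be opted into per call rather than globally.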
2020-02-04 | re-export digest and crypto_mac | Jack O'Connor
2020-02-03 | Inline wrapper methods | Cesar Eduardo Barros
2020-02-03 | make the inherent reset() method return &mut self | Jack O'Connor
2020-02-03 | implement crypto_mac::Mac | Jack O'Connor
2020-02-02 | mention the digest traits in the docs | Jack O'Connor
2020-02-02 | implement traits from the digest crate | Jack O'Connor
2020-02-02 | add Hasher::reset | Jack O'Connor
Closes https://github.com/BLAKE3-team/BLAKE3/issues/41.
2020-01-21 | expand comments about lazy merging | Jack O'Connor
2020-01-21 | stack size in the optimized impl should be MAX_DEPTH + 1 | Jack O'Connor
2020-01-20 | double the maximum incremental subtree size | Jack O'Connor
Because compress_subtree_to_parent_node effectively cuts its input in half, we can give it an input that's twice as big, without violating the CV stack invariant.
2020-01-18 | comment about parallelism | Jack O'Connor
2020-01-12 | Inline trivial functions | Cesar Eduardo Barros
For the Read and Write traits, this also allows the compiler to see that the return value is always Ok, allowing it to remove the Err case from the caller as dead code.
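A minimal sketch of the pattern, using a hypothetical `ByteCounter` type rather than the crate's `Hasher`: when an infallible `write` is marked `#[inline]`, the compiler can see at the call site that the result is always `Ok`.

```rust
use std::io::{self, Write};

// Hypothetical infallible sink that only counts the bytes written to it.
pub struct ByteCounter {
    pub total: u64,
}

impl Write for ByteCounter {
    // Inlining exposes the unconditional `Ok` to callers, so their
    // error-handling branch may be eliminated as dead code.
    #[inline]
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.total += buf.len() as u64;
        Ok(buf.len())
    }

    #[inline]
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Whether the dead branch is actually removed depends on the optimizer; `#[inline]` only makes the function body available for it to inspect.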
2020-01-12 | Use fixed-size constant_time_eq | Cesar Eduardo Barros
The generic constant_time_eq has several branches on the slice length, which are not necessary when the slice length is known. However, the optimizer is not allowed to look into the core of constant_time_eq, so these branches cannot be elided. Use instead a fixed-size variant of constant_time_eq, which has no branches since the length is known.
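The difference can be sketched with a fixed-size comparison like the one below. This is an illustrative stand-in, not the code of the `constant_time_eq` crate; the function name is hypothetical, and a production version would also need to prevent the optimizer from introducing an early exit.

```rust
// Hypothetical fixed-size comparison: the length is part of the type, so
// there are no length checks or length-dependent branches.
pub fn fixed_size_eq_32(a: &[u8; 32], b: &[u8; 32]) -> bool {
    let mut diff = 0u8;
    for i in 0..32 {
        // Accumulate every differing bit instead of returning early.
        diff |= a[i] ^ b[i];
    }
    diff == 0
}
```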
2020-01-09 | test_msg_schedule_permutation | Jack O'Connor
2020-01-08 | code comment | JP Aumasson
2020-01-07 | simplify the docs example | Jack O'Connor
2020-01-05 | switch to the new permutations | Jack O'Connor
2020-01-05 | warn not to use derive_key with passwords | Jack O'Connor
2019-12-29 | add the guts module to share code with Bao | Jack O'Connor
2019-12-28 | make derive_key take a key of any length | Jack O'Connor
The previous version of this API called for a key of exactly 256 bits. That's good for optimal performance, but it would mean losing the use-with-other-algorithms property for applications whose input keys are a different size. There's no way for an abstraction over the previous version to provide reliable domain separation for the "extract" step.
2019-12-14 | docs tweaks | Jack O'Connor
2019-12-13 | fix the doc tests build | Jack O'Connor
2019-12-13 | expand the docs | Jack O'Connor
2019-12-12 | update MAX_DEPTH | Jack O'Connor
2019-12-12 | rename "offset" to "counter" and always increment it by 1 | Jack O'Connor
This is simpler than sometimes incrementing by CHUNK_LEN and other times incrementing by BLOCK_LEN.
2019-12-12 | reduce the CHUNK_LEN from 2048 bytes to 1024 bytes | Jack O'Connor
Smaller chunk sizes are a big benefit for parallelism at shorter input lengths, and recent benchmarks show that this reduction has a relatively small cost in terms of peak throughput. It's also a nice round number.
2019-12-12 | make the "c_avx512" feature a no-op on non-x86 | Jack O'Connor
This lets us enable it by default in b3sum.
2019-12-12 | struct OutputReader | Jack O'Connor
2019-12-11 | switch to representing CVs as words for the compression function | Jack O'Connor
The portable implementation was getting slowed down by converting back and forth between words and bytes. I made the corresponding change on the C side first (https://github.com/veorq/BLAKE3-c/commit/12a37be8b50922a358c016ba07f46816a3da4a31), and as part of this commit I'm re-vendoring the C code. I'm also exposing a small FFI interface to C so that blake3_neon.c can link against portable.rs rather than blake3_portable.c, see c_neon.rs.
2019-12-11 | test against test_vectors.json in CI | Jack O'Connor
2019-12-08 | add Rust FFI wrappers for AVX-512 and NEON | Jack O'Connor
2019-12-07 | fix a bad assert | Jack O'Connor
This would fire (incorrectly) on platforms where MAX_SIMD_DEGREE=1.
2019-12-07 | add the OffsetDeltas type alias | Jack O'Connor
I'm about to add C integration for AVX-512 and NEON, and this matches better what the C code is doing.
2019-12-06 | add bench.rs | Jack O'Connor