| Age | Commit message | Author | |
|---|---|---|---|
| 2022-12-23 | try doing 512-bit loads (kernel2) | Jack O'Connor | |
| 2022-12-17 | kernel2::parents_16 | Jack O'Connor | |
| 2022-12-17 | kernel2::chunks_16 | Jack O'Connor | |
| 2022-12-17 | bench_just_kernel2 | Jack O'Connor | |
| 2022-11-23 | try full transposition | Jack O'Connor | |
| 2022-11-23 | missing inlines | Jack O'Connor | |
| 2022-11-23 | xor_xof_16 | Jack O'Connor | |
| 2022-11-23 | correct the counter values | Jack O'Connor | |
| 2022-11-21 | WIP i don't remember what this is | Jack O'Connor | |
| 2022-10-10 | WIP kernel2 | Jack O'Connor | |
| 2022-04-09 | kernel_3d_16 and xof functions (kernel) | Jack O'Connor | |
| 2022-03-26 | xor_xof variants for the 2d kernel | Jack O'Connor | |
| 2022-03-20 | blake3_avx512_xof_stream_4 | Jack O'Connor | |
| 2022-03-20 | blake3_avx2_xof_stream_2 | Jack O'Connor | |
| 2022-03-20 | blake3_avx512_xof_stream_2 | Jack O'Connor | |
| 2022-03-20 | initial xof_stream functions | Jack O'Connor | |
| 2022-03-16 | rename kernel_1 to kernel2d_1 and add degree args | Jack O'Connor | |
| 2022-03-15 | generate blake3_{avx512,sse41,sse2}_compress with asm.py | Jack O'Connor | |
| 2022-03-11 | replace tail calls with jumps | Jack O'Connor | |
| 2022-03-11 | blake3_avx512_chunks_8 and blake3_avx512_parents_8 | Jack O'Connor | |
| 2022-03-09 | blake3_avx512_xof_xor_16 | Jack O'Connor | |
| 2022-03-09 | test unaligned writes | Jack O'Connor | |
| 2022-03-09 | broadcast the block length and domain flags inside blake3_avx512_kernel_16 | Jack O'Connor | |
| blake3_avx512_xof_stream_16 was also incorrectly hardcoding a block length of 64. The block length parameter is the *input* block length, which is independent of the output block length. (The output block length is not a compression function parameter.) | |||
| 2022-03-09 | move third row initialization into blake3_avx512_kernel_16 | Jack O'Connor | |
| 2022-03-09 | interleave the write ops in blake3_avx512_xor_stream_16 | Jack O'Connor | |
| This seems to give a small but consistent performance boost. | |||
| 2022-03-09 | blake3_avx512_xof_stream_16 | Jack O'Connor | |
| 2022-03-08 | split the left and right child CVs for blake3_avx512_parents_16 | Jack O'Connor | |
| There's no reason to force the caller to allocate them together. | |||
| 2022-03-08 | blake3_avx512_parents_16 | Jack O'Connor | |
| 2022-03-08 | use a memory argument for vpbroadcastd | Jack O'Connor | |
| 2022-03-08 | describe the transposition in comments | Jack O'Connor | |
| 2022-03-08 | now using only 3 scratch zmm registers | Jack O'Connor | |
| 2022-03-08 | interleave the first pass -- good performance | Jack O'Connor | |
| 2022-03-08 | try it with 4 times as many loads | Jack O'Connor | |
| 2022-03-08 | add a benchmark | Jack O'Connor | |
| 2022-03-08 | blake3_avx512_chunks_16 | Jack O'Connor | |
| 2022-03-08 | unroll the block loop and load the key | Jack O'Connor | |
| 2022-03-08 | correct the last two transposition passes | Jack O'Connor | |
| 2022-03-08 | nonzero message | Jack O'Connor | |
| 2022-03-08 | start working on a refactored assembly implementation | Jack O'Connor | |
| The main goal is to eventually have extended outputs benefit from the same SIMD optimizations as inputs. To make this easier, I want to factor out a shared "kernel" routine that can be shared among several different interfaces: compressing chunks, compressing parents, producing XOF output, and xor'ing XOF output. The timing here partly coincides with Rust stabilizing inline asm. That's certainly not necessary for any of this to work, but it gives me the confidence to try this without needing to master the rules of three different calling conventions. | |||
| 2022-03-04 | add "(if any)" regarding keying in the security notes | Jack O'Connor | |
| 2022-03-03 | simplify a bit more | Jack O'Connor | |
| 2022-03-02 | simplify the security notes, avoid referring to entropy | Jack O'Connor | |
| 2022-03-02 | document the extended output security issue found by Aldo Gunsing | Jack O'Connor | |
| https://eprint.iacr.org/2022/283 | |||
| 2022-01-24 | check the HMAC output bytes | Jack O'Connor | |
| 2022-01-24 | Adds test | jbis9051 | |
| 2022-01-23 | Add blocksize trait | jbis9051 | |
| 2021-12-30 | a few more comment tweaks | Jack O'Connor | |
| 2021-12-30 | Update digest crate to 0.10 for traits-preview feature | Matthias Schiffer | |
| Adjust to the following changes that happened in digest: the crypto-mac crate has been merged into digest (with "mac" feature enabled); various traits have been split up; the Digest and Mac traits now share their update/finalize/reset implementations; the BlockInput trait was dropped without replacement apparently (as long as the low-level core API is not used). | |||
| 2021-11-05 | fix incorrect output / undefined behavior in Windows SSE2 assembly | Jack O'Connor | |
| The SSE2 patch introduced xmm10 as a temporary register for one of the rotations, but xmm6-xmm15 are callee-save registers on Windows, and SSE4.1 was only saving the registers it used. The minimal fix is to use one of the saved registers instead of xmm10. See https://github.com/BLAKE3-team/BLAKE3/issues/206. | |||
| 2021-11-04 | add Hasher::count | Jack O'Connor | |
