| author | Samuel Neves <[email protected]> | 2020-08-31 18:56:55 +0100 |
|---|---|---|
| committer | GitHub <[email protected]> | 2020-08-31 18:56:55 +0100 |
| commit | 3340e32c7f4d89f7544b34b2b2924c5dbe5ee258 (patch) | |
| tree | eed1563c7d24e27423dbdf42c37aac123d003ace /c/README.md | |
| parent | 324090b2c30b88463917e133f456af0d45ccbea9 (diff) | |
| parent | be2da69b6b293764867c42fcbc278627271d9710 (diff) | |
Merge pull request #110 from mkrupcale/sse2
Add SSE2 implementations
Diffstat (limited to 'c/README.md')
| -rw-r--r-- | c/README.md | 17 |
1 files changed, 10 insertions, 7 deletions
diff --git a/c/README.md b/c/README.md
index b1b18a4..c66c66a 100644
--- a/c/README.md
+++ b/c/README.md
@@ -40,7 +40,8 @@ with a Unix-like OS, you can compile a working binary like this:
 
 ```bash
 gcc -O3 -o example example.c blake3.c blake3_dispatch.c blake3_portable.c \
-    blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S blake3_avx512_x86-64_unix.S
+    blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
+    blake3_avx512_x86-64_unix.S
 ```
 
 # API
@@ -143,8 +144,8 @@ by hand. Note that these steps may change in future versions.
 Dynamic dispatch is enabled by default on x86. The implementation will
 query the CPU at runtime to detect SIMD support, and it will use the
 widest instruction set available. By default, `blake3_dispatch.c`
-expects to be linked with code for four different instruction sets:
-portable C, SSE4.1, AVX2, and AVX-512.
+expects to be linked with code for five different instruction sets:
+portable C, SSE2, SSE4.1, AVX2, and AVX-512.
 
 For each of the x86 SIMD instruction sets, two versions are available,
 one in assembly (with three flavors: Unix, Windows MSVC, and Windows
@@ -159,7 +160,8 @@ the assembly implementations:
 
 ```bash
 gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
-    blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S blake3_avx512_x86-64_unix.S
+    blake3_sse2_x86-64_unix.S blake3_sse41_x86-64_unix.S blake3_avx2_x86-64_unix.S \
+    blake3_avx512_x86-64_unix.S
 ```
 
 When building the intrinsics-based implementations, you need to build
@@ -168,11 +170,12 @@ explicitly enabled in the compiler. Here's the same shared library using
 the intrinsics-based implementations:
 
 ```bash
+gcc -c -fPIC -O3 -msse2 blake3_sse2.c -o blake3_sse2.o
 gcc -c -fPIC -O3 -msse4.1 blake3_sse41.c -o blake3_sse41.o
 gcc -c -fPIC -O3 -mavx2 blake3_avx2.c -o blake3_avx2.o
 gcc -c -fPIC -O3 -mavx512f -mavx512vl blake3_avx512.c -o blake3_avx512.o
 gcc -shared -O3 -o libblake3.so blake3.c blake3_dispatch.c blake3_portable.c \
-    blake3_avx2.o blake3_avx512.o blake3_sse41.o
+    blake3_avx2.o blake3_avx512.o blake3_sse41.o blake3_sse2.o
 ```
 
 Note above that building `blake3_avx512.c` requires both `-mavx512f` and
@@ -186,8 +189,8 @@ each instruction set. Here's an example of building a shared library on
 x86 with only portable code:
 
 ```bash
-gcc -shared -O3 -o libblake3.so -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 -DBLAKE3_NO_AVX512 \
-    blake3.c blake3_dispatch.c blake3_portable.c
+gcc -shared -O3 -o libblake3.so -DBLAKE3_NO_SSE2 -DBLAKE3_NO_SSE41 -DBLAKE3_NO_AVX2 \
+    -DBLAKE3_NO_AVX512 blake3.c blake3_dispatch.c blake3_portable.c
 ```
 
 ## ARM NEON
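
The README text in this diff says the implementation queries the CPU at runtime and uses the widest instruction set available, now with SSE2 as an extra tier between portable C and SSE4.1. The following is a minimal sketch of that idea, not the code in `blake3_dispatch.c`. The names `FEAT_*`, `backend_t`, `detect_features`, `pick_backend`, and `backend_name` are hypothetical and exist only for illustration. The `BLAKE3_NO_*` macros are the ones shown in the diff, and `__builtin_cpu_supports` is a GCC/Clang builtin that is assumed to be available only on x86 targets.

```c
/* Hypothetical feature bits for illustration; not BLAKE3's internal names. */
enum {
    FEAT_SSE2   = 1 << 0,
    FEAT_SSE41  = 1 << 1,
    FEAT_AVX2   = 1 << 2,
    FEAT_AVX512 = 1 << 3
};

/* Hypothetical backend identifiers, ordered from narrowest to widest. */
typedef enum {
    BACKEND_PORTABLE,
    BACKEND_SSE2,
    BACKEND_SSE41,
    BACKEND_AVX2,
    BACKEND_AVX512
} backend_t;

/* Ask the CPU which SIMD extensions it supports. On non-x86 targets, or
 * with a compiler other than GCC/Clang, this sketch reports no features. */
unsigned detect_features(void) {
    unsigned f = 0;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))   f |= FEAT_SSE2;
    if (__builtin_cpu_supports("sse4.1")) f |= FEAT_SSE41;
    if (__builtin_cpu_supports("avx2"))   f |= FEAT_AVX2;
    /* The AVX-512 build above needs both AVX-512F and AVX-512VL. */
    if (__builtin_cpu_supports("avx512f") &&
        __builtin_cpu_supports("avx512vl")) f |= FEAT_AVX512;
#endif
    return f;
}

/* Pick the widest backend that is both supported by the CPU and not
 * compiled out with the corresponding BLAKE3_NO_* macro. */
backend_t pick_backend(unsigned features) {
#ifndef BLAKE3_NO_AVX512
    if (features & FEAT_AVX512) return BACKEND_AVX512;
#endif
#ifndef BLAKE3_NO_AVX2
    if (features & FEAT_AVX2) return BACKEND_AVX2;
#endif
#ifndef BLAKE3_NO_SSE41
    if (features & FEAT_SSE41) return BACKEND_SSE41;
#endif
#ifndef BLAKE3_NO_SSE2
    if (features & FEAT_SSE2) return BACKEND_SSE2;
#endif
    (void)features; /* silence unused warnings when every tier is disabled */
    return BACKEND_PORTABLE;
}

/* Human-readable name for a backend, for logging or debugging. */
const char *backend_name(backend_t b) {
    switch (b) {
    case BACKEND_AVX512: return "avx512";
    case BACKEND_AVX2:   return "avx2";
    case BACKEND_SSE41:  return "sse41";
    case BACKEND_SSE2:   return "sse2";
    default:             return "portable";
    }
}
```

In this sketch, `backend_name(pick_backend(detect_features()))` names the tier that would be chosen on the current machine. Compiling it with all four `-DBLAKE3_NO_*` flags from the last hunk makes `pick_backend` always fall through to the portable tier, which mirrors the intent of the portable-only build command.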
