Age | Commit message | Author
2025-03-21 | Simple Inner Loop Optimization [loopopt] | Roland Paterson-Jones
Two simple loop optimizations:
1. Strength reduction of mul[tiplication] by the loop induction variable.
2. Hoisting of the (address) base into a phi where the loop induction variable is used only as a base (address) offset.
Limited to loops with a single body block, which happily are always innermost loops. This restriction would not be very hard to lift; it would require detecting the set of loop blocks (and ensuring reducibility?).
Limited to loop induction variables with an initial value of 0 and an increment of 1 (for mul strength reduction). This limitation is trivial to lift; however, all of the cproc/hare[c]/coremark opportunity is with 0/1 loops for mul reduction, and with a 0 initial value for the base-offset optimization.
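A C-level sketch of the two rewrites, applied by hand to source code rather than to QBE IL. The function names are hypothetical. It assumes, as the commit does, an induction variable that starts at 0 and steps by 1. Unsigned arithmetic is used for the multiply so that the carried product wraps the way machine arithmetic does. In the compiler, the carried values `ik` and `p` would be phis in the loop header.

```c
#include <assert.h>

/* Hypothetical example functions: hand-applied versions of the two
 * rewrites, assuming an induction variable i that starts at 0 and
 * steps by 1. */

/* Before: i * k is recomputed on every iteration. */
unsigned long sum_scaled_before(unsigned long n, unsigned long k)
{
	unsigned long s = 0;
	for (unsigned long i = 0; i < n; i++)
		s += i * k;
	return s;
}

/* After mul strength reduction: ik tracks i * k and grows by k,
 * so the multiply becomes an add. */
unsigned long sum_scaled_after(unsigned long n, unsigned long k)
{
	unsigned long s = 0, ik = 0; /* invariant: ik == i * k */
	for (unsigned long i = 0; i < n; i++) {
		s += ik;
		ik += k;
	}
	return s;
}

/* Before: each access computes base + i * sizeof(long). */
long sum_array_before(const long *base, unsigned long n)
{
	long s = 0;
	for (unsigned long i = 0; i < n; i++)
		s += base[i];
	return s;
}

/* After base hoisting: i is only an offset from base, so the loop
 * carries the address itself, starting at base. */
long sum_array_after(const long *base, unsigned long n)
{
	long s = 0;
	const long *p = base; /* invariant: p == base + i */
	for (unsigned long i = 0; i < n; i++, p++)
		s += *p;
	return s;
}
```

Each pair computes the same result. The rewritten loops replace a per-iteration multiply, or a base-plus-scaled-index address computation, with a single add.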
2025-03-16 | minic: C23 doesn't allow bool as identifier | Horst H. von Brand
Signed-off-by: Horst H. von Brand <[email protected]>
2025-03-15 | tools/test.sh: test the native architecture without QEMU | Antonio Terceiro
While at it, extract most duplicated code across targets into a function.
2025-03-15 | tools/test.sh: allow running against installed package | Antonio Terceiro
If $bin is set in the environment, use it instead of `qbe` from the source tree. The same goes for $binref. This supports the following use cases:
- I have a qbe package installed, and I want to test my local changes with the installed package as a reference:
  $ binref=/usr/bin/qbe ./tools/test.sh all
- I want to test the installed qbe against new tests that I have written, to reproduce a bug:
  $ bin=/usr/bin/qbe ./tools/test.sh test/newtest.ssa
In Debian, we also run tests against the installed package when dependencies change, etc. We will also run on several architectures where the necessary cross compilers might not be available. So make tests that cannot be run because of a missing compiler exit with 77, signaling to Debian's autopkgtest that the test is skipped.
2025-03-15 | Makefile: add explicit target to test the x86_64 backend | Antonio Terceiro
When developing on an arm64 machine, it's useful to be able to test the x86_64 target.
2025-03-15 | arm64: use IP1 as scratch register | Quentin Carbonneaux
On Apple platforms, x18 is not guaranteed to be preserved across context switches, so we now use IP1 as the scratch register. En passant, one dubious use of IP0 in arm64/emit.c fixarg() was transitioned to IP1; I believe the previous code could clobber a user value if IP0 was live.
2025-03-14 | 10 years of qbe! | Quentin Carbonneaux
2025-03-14 | gvn/gcm review | Quentin Carbonneaux
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink": a pass that moves trivial ops into their target block, with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14 | Get rid of movins() infra. | Roland Paterson-Jones
2025-03-14 | Global Value Numbering / Global Code Motion | Roland Paterson-Jones
More or less as proposed in its ninth iteration, with the addition of a gcmmove() functionality to restore coherent local schedules.
Changes since RFC 8:
Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies the if-graph detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from the dominating blk jnz; eliminates redundant cmp eq/ne 0 and the associated jnz/blocks, for example redundant null pointer checks (the hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly of cmps
- merge %addr =l add %addr1, N sequences; reduces tmp count and register pressure
- squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t1, M -> t2 = add t, N+M
Bug Fixes:
- remove "cmp eq/ne of non-identical RCons" in copyref(). RCons are not guaranteed to be dedup'ed, and symbols can alias.
Codebase:
- moved some stuff into cfg.c, including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c
- always reassoc all cmp ops and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins()
- use vins
Testing:
- standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland]
Benchmark:
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" is ~8% faster (crypto::sha1::sha1_1gb is the biggest obvious win, ~25% faster)
Changes since RFC 7:
Bug fixes:
- remove isbad4gcm() in GVN/GCM: it is unsound due to different state at GVN vs GCM time; replace with a "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions, because GCM will not move them
- remove cmp eq/ne identical arg copy detection for floating point; it is not valid for NaN
- fix cges/cged being flagged as commutative in ops.h instead of cnes/cned respectively; just a typo
Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by a non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary, but enables interim global state sanity checking
- "reassociation" of trivial constant-offset load/store addresses, and of cmp ops with point-of-use, in a pass after GCM
- normalise commutative op arg order, e.g. op con, tmp -> op tmp, con, to simplify copy detection and GVN instruction dedup
Codebase:
- split core copy detection and constant folding (back) out into copy.c and fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c, used by GCM and reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor
Changes since RFC 6:
- more ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- incorporation of Song's patch
Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM: trivial address offset calculations, and comparisons
- more copy detection, mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation; it was missing removal of unused instructions in some cases
ifelim() between GVN and GCM; deeper nopunused()
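Two items in this log can be checked directly in C. The first is squashing consecutive constant adds. The second is dropping identical-arg `cmp eq/ne` folding for floating point. The functions below are hypothetical helpers, not QBE code. Unsigned 64-bit types stand in for machine-level wrapping integer arithmetic.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* t1 = add t, N ; t2 = add t1, M   (two instructions) */
uint64_t add_twice(uint64_t t, uint64_t n, uint64_t m)
{
	uint64_t t1 = t + n;
	return t1 + m;
}

/* t2 = add t, N+M   (one instruction; N+M is folded at compile time).
 * Valid because unsigned addition wraps modulo 2^64 and is associative,
 * so the result matches add_twice() even when the sums overflow. */
uint64_t add_squashed(uint64_t t, uint64_t n, uint64_t m)
{
	return t + (n + m);
}

/* Folding "x == x" to 1 is only sound for integers: when x is a NaN,
 * the IEEE 754 comparison is false (and "x != x" is true). */
int self_equal(double x)
{
	return x == x;
}
```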
2025-03-14 | Combine fillrpo() and fillpreds() into fillcfg(). | Roland Paterson-Jones
Remove edgedel() calls from fillrpo(). Call the new prunephis() from fillpreds(). [Curiously, this never seems to do anything, even though edgedel() is no longer called from fillrpo().] One remaining fillpreds() call, in the parse.c typecheck, seems like it will still work the same.
- defensive; fillcfg() combining fillrpo() and fillpreds()
- problem after simpljmp(); I think it is because fillrpo() is still doing edgedel(), which should now be covered by fillpreds()
- comment out edgedel() in fillrpo(); fillcfg() no longer asserts after simpljmp(), but it seems like prunephis() never triggers???
- static fillrpo(); remove edgedel() from fillrpo()
- replace fillrpo() and/or fillpreds() with fillcfg()
2025-03-14 | Simplify fillpreds() | Roland Paterson-Jones
Now that b->pred is a vector, we can remove the counting pass.
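The following is a minimal sketch of filling predecessor lists in a single pass. It assumes a cut-down block type with at most two successors. `Blk`, `addpred()` and `fillpreds_sketch()` here are hypothetical and use plain `realloc()` rather than QBE's own vector helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down block type: at most two successors and a
 * growable predecessor array (a stand-in for QBE's vector helpers). */
typedef struct Blk Blk;
struct Blk {
	Blk *s1, *s2;
	Blk **pred;
	unsigned npred, cap;
};

/* Append p to b->pred, doubling the capacity when it is full. */
static void addpred(Blk *b, Blk *p)
{
	if (b->npred == b->cap) {
		unsigned cap = b->cap ? 2 * b->cap : 2;
		Blk **np = realloc(b->pred, cap * sizeof *np);
		if (!np)
			abort();
		b->pred = np;
		b->cap = cap;
	}
	b->pred[b->npred++] = p;
}

/* One pass over the edges: because each pred array grows on demand,
 * there is no need for a first pass that counts predecessors. */
void fillpreds_sketch(Blk *blk, unsigned nblk)
{
	for (unsigned i = 0; i < nblk; i++)
		blk[i].npred = 0; /* keep capacity from earlier calls */
	for (unsigned i = 0; i < nblk; i++) {
		Blk *b = &blk[i];
		if (b->s1)
			addpred(b->s1, b);
		if (b->s2 && b->s2 != b->s1)
			addpred(b->s2, b);
	}
}
```

Resetting `npred` but keeping `cap` means repeated calls reuse the existing arrays. This is the kind of re-allocation the vector changes below aim to avoid.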
2025-03-14 | Simplify fillrpo() | Roland Paterson-Jones
Essentially, use the post-order number as the id, then reverse it to get the rpo. This avoids needing f->nblk initially and makes the logic slightly simpler.
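Below is a sketch of the "number in post-order, then reverse" idea. It uses a hypothetical fixed-size graph type rather than QBE's `Fn`/`Blk`. Nodes are numbered as the depth-first search finishes them, so the number of reachable nodes is only known at the end. Reversing that order then yields the RPO.

```c
#include <assert.h>

/* Hypothetical graph: each node has up to two successors (-1 = none). */
enum { MAXN = 64 };

typedef struct {
	int nnode;
	int succ[MAXN][2];
} Graph;

/* Depth-first search; append n to post[] once all of its successors
 * are finished. Returns the updated post-order count. */
static int postorder(const Graph *g, int n, int *visited, int *post, int np)
{
	visited[n] = 1;
	for (int k = 0; k < 2; k++) {
		int s = g->succ[n][k];
		if (s >= 0 && !visited[s])
			np = postorder(g, s, visited, post, np);
	}
	post[np++] = n;
	return np;
}

/* Fill rpo[] with the nodes reachable from entry, in reverse
 * post-order, and return how many there are. Unreachable nodes are
 * never numbered, so no node count is needed up front. */
int fillrpo_sketch(const Graph *g, int entry, int *rpo)
{
	int visited[MAXN] = {0};
	int post[MAXN];
	int np = postorder(g, entry, visited, post, 0);
	for (int i = 0; i < np; i++)
		rpo[i] = post[np - 1 - i];
	return np;
}
```

For the diamond 0 -> {1, 2} -> 3 plus an unreachable node 4, this produces the RPO 0, 2, 1, 3 and drops node 4.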
2025-03-14 | Re-use (vgrow) b->ins vector in backend xxx_abi() fn's. | Roland Paterson-Jones
Removes the last re-allocation of b->ins.
2025-03-14 | idup(Ins **, Ins *, ulong) -> idup(Blk *, Ins *, ulong) | Roland Paterson-Jones
Always used this way, and it factors out setting b->nins. Makes the b->ins vector contract more obvious.
2025-03-14 | Blk::ins is a vector | Roland Paterson-Jones
Scratching an itch: avoid unnecessary re-allocation in idup(), which is called often in the optimisation chain. Blk::ins is still reallocated in xxx_abi(); that needs further fiddling.
2025-03-14 | Blk::pred is a vector | Roland Paterson-Jones
Scratching an itch: avoid unnecessary re-allocation in fillpreds(), which is called often in the optimisation chain.
2025-03-14 | Fn::rpo is a vector | Roland Paterson-Jones
Scratching an itch: avoid unnecessary re-allocation in fillrpo(), which is called multiple times in the optimisation chain.
2024-12-19 | handle large hfas correctly on arm64 | Quentin Carbonneaux
2024-10-01 | fix various codegen bugs on arm64 | Quentin Carbonneaux
- dynamic allocations could generate bad 'and' instructions (for the and with -16 in salloc()).
- symbols used in w context would generate adrp and add instructions on wN registers, while they seem to only work on xN registers.
Thanks to Rosie for reporting them.
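For reference, the `and` with -16 is part of the usual round-up-to-16 idiom that keeps the stack 16-byte aligned after a dynamic allocation. The helper name is hypothetical. The sketch only shows the arithmetic, not the arm64 instruction-encoding problem that was fixed.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 16 ("add 15, and -16").
 * In two's complement, -16 is ...11110000, so the AND clears the low
 * four bits after the add has carried into them. */
uint64_t align16(uint64_t n)
{
	return (n + 15) & -(uint64_t)16;
}
```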
2024-08-23 | skip preludes for some leaf fns | Quentin Carbonneaux
When rbp is not necessary to compile a leaf function, we skip saving and restoring it.
2024-08-15 | arm64/isel: Avoid signed overflow when handling immediates | Alexey Yerin
Negating INT64_MIN is signed overflow, which is undefined behavior in C; Clang at -O2 optimizes this negation in a way that makes QBE emit 0 in place of INT64_MIN.
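This is a minimal sketch of the general technique, not the actual patch. Performing the negation on an unsigned type is well defined for every input, including INT64_MIN. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Negating an int64_t that holds INT64_MIN overflows, which is
 * undefined behavior, so an optimizer may assume it never happens.
 * Negating a uint64_t is always defined: it wraps modulo 2^64. */
uint64_t neg_bits(int64_t v)
{
	return -(uint64_t)v;
}

/* Magnitude of an immediate as an unsigned value; well defined even
 * for INT64_MIN, whose magnitude is 2^63. */
uint64_t imm_magnitude(int64_t v)
{
	return v < 0 ? neg_bits(v) : (uint64_t)v;
}
```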
2024-08-15 | align emitted code | Quentin Carbonneaux
Functions are now aligned on 16-byte boundaries. This mimics gcc and should help reduce the maximum perf impact of cosmetic code changes. Previously, any change in the output of qbe could have far reaching implications on alignment. Thanks to Roland Paterson-Jones for pointing out the variability issue.
2024-06-19 | drop imul rewriting | Quentin Carbonneaux
This was cute to do, but it is largely inconsequential, as shown by the rough timings below:
  benchmarking mul8_lea    3.9 ticks ± 0.88 (min: 3)
  benchmarking mul8_imul   3.3 ticks ± 0.27 (min: 3)
  benchmarking div8_udiv   6.5 ticks ± 0.52 (min: 6)
  benchmarking div8_shr    3.3 ticks ± 0.34 (min: 3)
2024-06-19 | no mul->shl as it confuses address matching | Quentin Carbonneaux
Additionally, the strength reduction for small powers of two is now handled by amd64/emit.c.
2024-06-18 | cheaper mul by small constants on amd64 | Quentin Carbonneaux
2024-06-18 | simplify 8*x as well as x*8 | Quentin Carbonneaux
2024-06-17 | prevent bogus simplifications | Quentin Carbonneaux
2024-06-17 | qbe has its own magic | Quentin Carbonneaux
2024-06-16 | fix unintended assignment | Quentin Carbonneaux
2024-06-16 | revert 4bc4c958 | Quentin Carbonneaux
Hopefully the right time now!
2024-06-16 | Simplify int mul/udiv/urem of 2^N into shl/shr/and. | Roland Paterson-Jones
Passes the "standard" test suite (cproc bootstrap, hare[c] make test, roland units, linpack/coremark run). However, the linpack benchmark is now notably slower, while coremark is ~2% faster. As noticed before, linmark timing is dubious, and maybe my cheap (AMD) laptop prefers mul to shl.
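The identities behind this rewrite can be checked in C for unsigned 64-bit values (the helper is hypothetical). They also explain why only udiv/urem are rewritten. Signed division rounds toward zero, so it cannot be replaced by a plain shift when the operand is negative.

```c
#include <assert.h>
#include <stdint.h>

/* For unsigned x and k = 2^n (0 <= n < 64):
 *   x * k == x << n
 *   x / k == x >> n
 *   x % k == x & (k - 1)
 * Returns 1 if all three hold for the given x and n. */
int pow2_identities_hold(uint64_t x, unsigned n)
{
	assert(n < 64); /* shifting a uint64_t by 64 is undefined */
	uint64_t k = UINT64_C(1) << n;
	return x * k == x << n
	    && x / k == x >> n
	    && x % k == (x & (k - 1));
}
```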
2024-06-09 | Optab-driven copy detection | Roland Paterson-Jones
2024-06-05 | relax one assert | Quentin Carbonneaux
In this branch we only need br[b->loop].b to be defined, which is the case if b->loop >= n.
2024-05-28 | replace asm keyword | Erica Z
When applying a custom set of CFLAGS under clang that does not include -std=c99, asm is treated as a keyword and so cannot be used as an identifier. This prevents the issue by renaming the offending variables.
2024-05-03 | add width info for comparisons | Quentin Carbonneaux
Comparisons return a 1-bit value; in theory we could add a Wu1 width for them, but I did not bother and just used Wub. This simply means that if a frontend generates an extsb of a comparison result (silly), we will not generate good code.
2024-04-27 | function params must be unique | Quentin Carbonneaux
2024-04-22 | revert 1b7770e271 | Quentin Carbonneaux
Quotes are used on Apple target variants to flag that we must not add the _ symbol prefix.
2024-04-13 | parse: use dynamically sized hashtable for temporaries | Michael Forney
This significantly improves parsing performance for massive functions with a huge number of temporaries. Parsing the 86MiB IL produced by cproc during zig bootstrap drops from 17m15s to 2.5s (over 400x speedup). The speedup is much smaller for IL produced from normal non-autogenerated C code. Parsing the sqlite3 amalgamation drops from 0.40s to 0.33s.
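The sketch below is only a generic illustration of why a dynamically sized table helps. It is not the code from QBE's parser. A table of fixed size degrades as it fills, while one that grows keeps its load factor bounded. `Map`, `map_init()` and `map_intern()` are hypothetical. The sketch uses open addressing with linear probing, the FNV-1a hash, and doubling at half full, all chosen for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name -> value map. Doubles its capacity once it is half
 * full, so lookups stay O(1) on average however many temporaries a
 * function has. Keys are borrowed, not copied. */
typedef struct {
	const char **key;
	int *val;
	size_t cap, len; /* cap is always a power of two */
} Map;

static uint32_t hash(const char *s)
{
	uint32_t h = 2166136261u; /* 32-bit FNV-1a */
	for (; *s; s++)
		h = (h ^ (unsigned char)*s) * 16777619u;
	return h;
}

/* Index of k's slot, or of the empty slot where k would go. */
static size_t slot(const Map *m, const char *k)
{
	size_t i = hash(k) & (m->cap - 1);
	while (m->key[i] && strcmp(m->key[i], k) != 0)
		i = (i + 1) & (m->cap - 1);
	return i;
}

static void alloctab(Map *m, size_t cap)
{
	m->cap = cap;
	m->key = calloc(cap, sizeof *m->key);
	m->val = calloc(cap, sizeof *m->val);
	if (!m->key || !m->val)
		abort();
}

void map_init(Map *m)
{
	m->len = 0;
	alloctab(m, 8);
}

/* Rehash every entry into a table twice as large. */
static void grow(Map *m)
{
	Map old = *m;
	alloctab(m, 2 * old.cap);
	for (size_t i = 0; i < old.cap; i++)
		if (old.key[i]) {
			size_t j = slot(m, old.key[i]);
			m->key[j] = old.key[i];
			m->val[j] = old.val[i];
		}
	free(old.key);
	free(old.val);
}

/* Return the value for k, inserting k -> v first if k is absent. */
int map_intern(Map *m, const char *k, int v)
{
	if (2 * (m->len + 1) > m->cap)
		grow(m);
	size_t i = slot(m, k);
	if (!m->key[i]) {
		m->key[i] = k;
		m->val[i] = v;
		m->len++;
	}
	return m->val[i];
}
```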
2024-04-12 | add "make wc" | Quentin Carbonneaux
2024-04-12 | drop unnecessary check | Quentin Carbonneaux
2024-04-12 | add common linkage for data | Quentin Carbonneaux
2024-04-11 | fold scaled offsets in addresses | Quentin Carbonneaux
2024-04-11 | drop over-zealous offset accumulation | Quentin Carbonneaux
2024-04-09 | use mgen in amd64/isel.c | Quentin Carbonneaux
2024-04-09 | mgen: match automatons and C generation | Quentin Carbonneaux
The algorithm to generate matchers took a long time to be discovered and refined to its present version. The rest of mgen is mostly boring engineering. Extensive fuzzing ensures that the two core components of mgen (tables and matchers generation) are correct on specific problem instances.
2024-04-09 | fuse ac rules in ins-tree matching | Quentin Carbonneaux
The initial plan was to have one matcher per ac-variant, but that leads to way too much generated code. Instead, we can fuse ac variants of the rules and have a smarter matching algorithm to recover bound variables.
2024-04-09 | does not look too good | Quentin Carbonneaux
2024-04-09 | modulo ac matching and more tests | Quentin Carbonneaux
2024-04-09 | wip ins-tree matcher | Quentin Carbonneaux