Age | Commit message | Author
13 days | Modify amd64 fixarg to fix calling constant addresses [HEAD, master] | Richard McCormack
On x86_64, direct calls are always PC-relative. This means that in order to call an absolute address, the call must be indirect. To accomplish this, update fixarg to introduce a temporary before emitting.
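A minimal sketch of why the temporary is needed, not code from QBE: a direct `call rel32` stores a signed 32-bit displacement measured from the end of the 5-byte instruction, so its encoding depends on where the call site ends up. The helper name `call_rel32` and the explicit `pc` argument are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { CALL_REL32_LEN = 5 }; /* opcode E8 + 4-byte displacement */

/*
 * Hypothetical helper: compute the rel32 field of a direct call placed
 * at address pc that targets address target. Returns 1 and stores the
 * 32-bit encoding in *enc when the displacement fits in a signed
 * 32-bit value, 0 otherwise.
 */
int call_rel32(uint64_t pc, uint64_t target, uint32_t *enc)
{
    /* displacement relative to the next instruction, modulo 2^64 */
    uint64_t d = target - (pc + CALL_REL32_LEN);

    assert(enc != 0);
    if (d > UINT64_C(0x7fffffff) && d < UINT64_C(0xffffffff80000000))
        return 0; /* out of range for rel32 */
    *enc = (uint32_t)d;
    return 1;
}
```

The compiler does not know `pc` when it emits assembly, which is presumably why the patch loads the constant into a temporary and calls through it instead.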
13 days | fix typo in simplcfg | Quentin Carbonneaux
13 days | drop dead preds in fixphis [dev] | Quentin Carbonneaux
GVN may remove some dead blocks, which could lead to odd (but probably harmless) phi args appearing in the IL. This patch cleans things up during fillcfg().
13 days | new simplcfg pass | Quentin Carbonneaux
Useful for ifopt to match more often. Empty blocks are fused and conditional jumps on empty blocks with the same successor (and no phis in the successor) are collapsed.
13 days | ifopt simplifications | Quentin Carbonneaux
13 days | If-conversion RFC 4 - x86 only (for now), use cmovXX | Roland Paterson-Jones
Replacement of tiny conditional jump graphlets with conditional move instructions. Currently enabled only for x86. Arm64 support using cselXX will be essentially identical. Adds (internal) frontend sel0/sel1 ops with flag-specific backend xselXX following jnz implementation pattern. Testing: standard QBE, cproc, harec, hare, roland
13 days | update copyright years | Quentin Carbonneaux
13 days | rv64: handle slots in jnz | Quentin Carbonneaux
13 days | fix jmp arg spilling | Quentin Carbonneaux
In case we need to spill to accommodate the jump argument, piggyback the reloads from slots to regalloc so that they can be correctly inserted on edges.
2026-01-06 | please as with truncated constants | Quentin Carbonneaux
Apple's assembler actually hard crashed on overflows.
2026-01-06 | arm64_apple: fix argxbh support | Quentin Carbonneaux
2026-01-06 | arm64: prevent bogus IP1 clobbers | Quentin Carbonneaux
2026-01-05 | rv64: fix invalid float immediates | Quentin Carbonneaux
Thanks to Luke Graham for reporting and fixing this issue.
2025-05-30 | skip deleted phis in use width scan | Quentin Carbonneaux
2025-04-16 | fix fp constants on big endian hosts | Quentin Carbonneaux
2025-03-16 | minic: C23 doesn't allow bool as identifier | Horst H. von Brand
Signed-off-by: Horst H. von Brand <[email protected]>
2025-03-15 | tools/test.sh: test the native architecture without QEMU | Antonio Terceiro
While at it, extract most duplicated code across targets into a function.
2025-03-15 | tools/test.sh: allow running against installed package | Antonio Terceiro
If $bin is set in the environment, use it instead of using `qbe` from the source tree. The same for $binref.

This supports the following use cases:

- I have a qbe package installed, and I want to test my local changes with the installed packages as a reference:

    $ binref=/usr/bin/qbe ./tools/test.sh all

- I want to test the installed qbe against new tests that I have written, to reproduce a bug:

    $ bin=/usr/bin/qbe ./tools/test.sh test/newtest.ssa

In Debian, we also run tests against the installed package when dependencies change, etc. We will also run on several architectures where the necessary cross compilers might not be available. So make tests that cannot be run because of a missing compiler exit with 77, signaling to Debian's autopkgtest that the test is skipped.
2025-03-15 | Makefile: add explicit target to test the x86_64 backend | Antonio Terceiro
When developing on an arm64 machine, it's useful to be able to test the x86_64 target.
2025-03-15 | arm64: use IP1 as scratch register | Quentin Carbonneaux
On Apple platforms x18 is not guaranteed to be preserved across context switches. So we now use IP1 as scratch register. En passant, one dubious use of IP0 in arm64/emit.c fixarg() was transitioned to IP1. I believe the previous code could clobber a user value if IP0 was live.
2025-03-14 | 10 years of qbe! | Quentin Carbonneaux
2025-03-14 | gvn/gcm review | Quentin Carbonneaux
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink"; a pass that moves trivial ops in their target block with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14 | Get rid of movins() infra. | Roland Paterson-Jones
2025-03-14 | Global Value Numbering / Global Code Motion | Roland Paterson-Jones
More or less as proposed in its ninth iteration with the addition of a gcmmove() functionality to restore coherent local schedules.

Changes since RFC 8:

Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates redundant cmp eq/ne 0 and associated jnz/blocks, for example redundant null pointer checks (hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count, register pressure.
- squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t1, M -> t2 = add t, N+M

Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's" in copyref(). RCon's are not guaranteed to be dedup'ed, and symbols can alias.

Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins

Testing - standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland]

Benchmark:
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster (crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)

Changes since RFC 7:

Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions cos GCM will not move them
- remove cmp eq/ne identical arg copy detection for floating point, it is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned respectively; just a typo

Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary but enables interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con to simplify copy detection and GVN instruction dedup

Codebase:
- split out core copy detection and constant folding (back) out into copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor

Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch

Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation - it was missing removal of unused instructions in some cases

ifelim() between GVN and GCM; deeper nopunused()
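The RFC 7 bug fix about floating-point cmp eq/ne with identical args can be illustrated in plain C. This is a sketch of the underlying IEEE 754 behaviour only, not QBE code; the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* For integers, "x == x" is always 1, so folding it to a constant is safe. */
int int_self_eq(long x)
{
    return x == x;
}

/* For floats, "x == x" is 0 when x is NaN, so it cannot be folded to 1. */
int flt_self_eq(double x)
{
    return x == x;
}

/* Likewise, "x != x" is 1 for NaN and cannot be folded to 0. */
int flt_self_ne(double x)
{
    return x != x;
}

/* Small self-check showing the three cases side by side. */
void self_cmp_check(void)
{
    assert(int_self_eq(7) == 1);
    assert(flt_self_eq(1.5) == 1);
    assert(flt_self_eq(NAN) == 0);
    assert(flt_self_ne(NAN) == 1);
}
```

The sketch assumes the compiler follows IEEE 754 comparison semantics; options such as `-ffast-math` deliberately relax them.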
2025-03-14 | Combine fillrpo() and fillpreds() into fillcfg(). | Roland Paterson-Jones
Remove edgedel() calls from fillrpo(). Call new prunephis() from fillpreds(). [Curiously this never seems to do anything even tho edgedel() is no longer called from fillrpo()]

One remaining fillpreds() call in parse.c typecheck - seems like it will still work the same.

defensive; fillcfg() combining fillrpo() and fillpreds() - problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds()

comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers???

static fillrpo(); remove edgedel() from fillrpo()

replace fillrpo() and/or fillpreds() with fillcfg()
2025-03-14 | Simplify fillpreds() | Roland Paterson-Jones
Now that b->pred is a vector we can remove the counting pass.
2025-03-14 | Simplify fillrpo() | Roland Paterson-Jones
Essentially use post-order as id, then reverse to rpo. Avoids needing f->nblk initially; slightly simpler logic.
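The idea can be sketched on a toy CFG: number blocks in post-order during a depth-first walk, then flip the numbers once the count of reachable blocks is known. The `Node` type and `fill_rpo` below are hypothetical stand-ins, not QBE's `Blk` or `fillrpo()`.

```c
#include <assert.h>

enum { MAXSUCC = 2 };

/* Hypothetical, simplified CFG node; not QBE's actual Blk. */
typedef struct Node {
    int succ[MAXSUCC]; /* successor indices, -1 when absent */
    int id;            /* -1 if unvisited, then post-order, finally RPO */
} Node;

/* Depth-first walk that hands out post-order numbers as ids. */
static int postorder(Node *g, int b, int next)
{
    int i;

    if (b < 0 || g[b].id != -1)
        return next;  /* no edge, or already seen */
    g[b].id = -2;     /* mark as in progress */
    for (i = 0; i < MAXSUCC; i++)
        next = postorder(g, g[b].succ[i], next);
    g[b].id = next;
    return next + 1;
}

/*
 * Number the nodes reachable from start in reverse post-order,
 * fill rpo[] with node indices in that order, and return the count.
 * Unreachable nodes keep id == -1.
 */
int fill_rpo(Node *g, int n, int start, int *rpo)
{
    int i, cnt;

    assert(n > 0 && start >= 0 && start < n);
    for (i = 0; i < n; i++)
        g[i].id = -1;
    cnt = postorder(g, start, 0);
    for (i = 0; i < n; i++)
        if (g[i].id >= 0) {
            g[i].id = cnt - 1 - g[i].id; /* reverse the post-order */
            rpo[g[i].id] = i;
        }
    return cnt;
}
```

Because the reversal happens after the walk, the total number of blocks does not have to be known before numbering starts, which matches the motivation given in the commit message.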
2025-03-14 | Re-use (vgrow) b->ins vector in backend xxx_abi() fn's. | Roland Paterson-Jones
Removes last re-allocation of b->ins.
2025-03-14 | idup(Ins **, Ins *, ulong) -> idup(Blk *, Ins *, ulong) | Roland Paterson-Jones
Always used this way and factors setting b->nins. Makes b->ins vector contract more obvious.
2025-03-14 | Blk::ins is a vector | Roland Paterson-Jones
Scratching an itch - avoid unnecessary re-allocation in idup() which is called often in the optimisation chain. Blk::ins is reallocated in xxx_abi() - needs further fiddling.
2025-03-14 | Blk::pred is a vector | Roland Paterson-Jones
Scratching an itch - avoid unnecessary re-allocation in fillpreds() which is called often in the optimisation chain.
2025-03-14 | Fn::rpo is a vector | Roland Paterson-Jones
Scratching an itch - avoid unnecessary re-allocation in fillrpo() which is called multiple times in the optimisation chain.
2024-12-19 | handle large hfas correctly on arm64 | Quentin Carbonneaux
2024-10-01 | fix various codegen bugs on arm64 | Quentin Carbonneaux
- dynamic allocations could generate bad 'and' instructions (for the and with -16 in salloc()).
- symbols used in w context would generate adrp and add instructions on wN registers while they seem to only work on xN registers.

Thanks to Rosie for reporting them.
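For context, an "and with -16" is the usual way to round a dynamic allocation size to a 16-byte multiple. A minimal C sketch of that arithmetic, assuming the common add-15-then-mask form; the function name `align16` is hypothetical and this is not the arm64 emitter code:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round n up to the next multiple of 16. In two's complement, -16 has
 * every bit set except the low four, so the mask clears n's low bits.
 */
uint64_t align16(uint64_t n)
{
    assert(n <= UINT64_MAX - 15); /* the addition must not wrap */
    return (n + 15) & (uint64_t)-16;
}
```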
2024-08-23 | skip preludes for some leaf fns | Quentin Carbonneaux
When rbp is not necessary to compile a leaf function, we skip saving and restoring it.
2024-08-15 | arm64/isel: Avoid signed overflow when handling immediates | Alexey Yerin
Clang incorrectly optimizes this negation with -O2 and causes QBE to emit 0 in place of INT64_MIN.
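The hazard behind this fix is that negating INT64_MIN as a signed value is undefined behaviour in C, so an optimizer is free to produce anything. A minimal sketch of one way to avoid it, by doing the negation in unsigned arithmetic; the helper names are hypothetical and this is not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: two's-complement bit pattern of -v.
 * Writing -v directly overflows when v == INT64_MIN; unsigned
 * arithmetic wraps modulo 2^64 and is always defined.
 */
uint64_t neg_bits(int64_t v)
{
    return UINT64_C(0) - (uint64_t)v;
}

/* Self-check: INT64_MIN negates to the same bit pattern, 1 << 63. */
void neg_bits_check(void)
{
    assert(neg_bits(INT64_MIN) == UINT64_C(1) << 63);
    assert(neg_bits(-7) == 7);
}
```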
2024-08-15 | align emitted code | Quentin Carbonneaux
Functions are now aligned on 16-byte boundaries. This mimics gcc and should help reduce the maximum perf impact of cosmetic code changes. Previously, any change in the output of qbe could have far reaching implications on alignment. Thanks to Roland Paterson-Jones for pointing out the variability issue.
2024-06-19 | drop imul rewriting | Quentin Carbonneaux
This was cute to do, but it is largely inconsequential, as shown by the rough timings below:

    benchmarking mul8_lea   3.9 ticks ± 0.88 (min: 3)
    benchmarking mul8_imul  3.3 ticks ± 0.27 (min: 3)
    benchmarking div8_udiv  6.5 ticks ± 0.52 (min: 6)
    benchmarking div8_shr   3.3 ticks ± 0.34 (min: 3)
2024-06-19 | no mul->shl as it confuses address matching | Quentin Carbonneaux
Additionally, the strength-reduction for small powers of two is handled by amd64/emit.c now.
2024-06-18 | cheaper mul by small constants on amd64 | Quentin Carbonneaux
2024-06-18 | simplify 8*x as well as x*8 | Quentin Carbonneaux
2024-06-17 | prevent bogus simplifications | Quentin Carbonneaux
2024-06-17 | qbe has its own magic | Quentin Carbonneaux
2024-06-16 | fix unintended assignment | Quentin Carbonneaux
2024-06-16 | revert 4bc4c958 | Quentin Carbonneaux
Hopefully the right time now!
2024-06-16 | Simplify int mul/udiv/urem of 2^N into shl/shr/and. | Roland Paterson-Jones
Passes the "standard" test suite (cproc bootstrap, hare[c] make test, roland units, linpack/coremark run). However, the linpack benchmark is now notably slower. Coremark is ~2% faster. As noticed before, linpack timing is dubious, and maybe my cheap (AMD) laptop prefers mul to shl.
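The three rewrites rest on simple identities for unsigned 64-bit values, sketched here in C. The helper names are hypothetical and the code only checks the arithmetic; it does not reproduce the QBE pass. Note that the division and remainder forms hold for unsigned operands only.

```c
#include <assert.h>
#include <stdint.h>

/* Return N when c == 2^N, or -1 when c is not a power of two. */
int log2_exact(uint64_t c)
{
    int n;

    if (c == 0 || (c & (c - 1)) != 0)
        return -1;
    for (n = 0; (c >>= 1) != 0; n++)
        ;
    return n;
}

/* x * 2^N  ==  x << N  (modulo 2^64) */
uint64_t mul_pow2(uint64_t x, uint64_t c)
{
    int n = log2_exact(c);

    assert(n >= 0);
    return x << n;
}

/* x / 2^N  ==  x >> N  for unsigned x */
uint64_t udiv_pow2(uint64_t x, uint64_t c)
{
    int n = log2_exact(c);

    assert(n >= 0);
    return x >> n;
}

/* x % 2^N  ==  x & (2^N - 1)  for unsigned x */
uint64_t urem_pow2(uint64_t x, uint64_t c)
{
    assert(log2_exact(c) >= 0);
    return x & (c - 1);
}
```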
2024-06-09 | Optab-driven copy detection | Roland Paterson-Jones
2024-06-05 | relax one assert | Quentin Carbonneaux
In this branch we only need that br[b->loop].b is defined. This is the case if b->loop >= n.
2024-05-28 | replace asm keyword | Erica Z
When applying a custom set of CFLAGS under clang that does not include -std=c99, asm is treated as a keyword and as such cannot be used as an identifier. This prevents the issue by renaming the offending variables.
2024-05-03 | add width info for comparisons | Quentin Carbonneaux
Comparisons return a 1-bit value. In theory we could add a Wu1 width for them, but I did not bother and just used Wub. This simply means that if a frontend generates an extsb of a comparison result (silly), we will not generate good code.