c9x.me/qbe.git - QBE - Backend Compiler

Age	Commit message	Author
13 days	Modify amd64 fixarg to fix calling constant addresses (HEAD, master)	Richard McCormack
	On x86_64, direct calls are always PC-relative. This means that in order to call an absolute address, the call must be indirect. To accomplish this, update fixarg to move the constant address into a temporary before the call is emitted.
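On x86-64 the direct `call` form encodes a 32-bit displacement relative to the next instruction, so an arbitrary absolute address has to go through a register (or memory) operand. The C sketch below shows the two shapes of emitted assembly. `CallTarget` and `emit_call()` are hypothetical, not QBE's API, and the sketch hard-codes `%r11` as the scratch register, whereas the commit introduces a temporary in fixarg and leaves the register choice to register allocation.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of a call target (not QBE's Ref). */
typedef struct {
	int is_const;     /* 1: absolute address, 0: symbol */
	uint64_t addr;    /* absolute address when is_const */
	const char *sym;  /* symbol name when !is_const */
} CallTarget;

/* Write AT&T-syntax assembly for a call into buf and return the
 * snprintf() result. A symbol can use the direct form, which is
 * resolved as a PC-relative displacement. An absolute address is
 * first loaded into a scratch register (%r11 is an assumption of
 * this sketch) and then called indirectly. */
static int emit_call(const CallTarget *t, char *buf, size_t len)
{
	assert(t->is_const || (t->sym != NULL && strlen(t->sym) > 0));
	if (t->is_const)
		return snprintf(buf, len,
			"\tmovabsq $%" PRIu64 ", %%r11\n"
			"\tcallq *%%r11\n", t->addr);
	return snprintf(buf, len, "\tcallq %s\n", t->sym);
}
```

For example, a target with `is_const = 1` and `addr = 4096` yields a `movabsq`/`callq *%r11` pair, while a symbol target yields a single direct `callq`.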
13 days	fix typo in simplcfg	Quentin Carbonneaux

13 days	drop dead preds in fixphis (dev)	Quentin Carbonneaux
	It is possible that GVN removes some dead blocks; this could lead to odd (but probably harmless) phi args appearing in the IL. This patch cleans things up during fillcfg().
13 days	new simplcfg pass	Quentin Carbonneaux
	Useful for ifopt to match more often. Empty blocks are fused and conditional jumps on empty blocks with the same successor (and no phis in the successor) are collapsed.
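The second rule, collapsing a conditional jump whose two targets are empty blocks that share one phi-free successor, can be sketched on a toy CFG. The `Blk` layout and the function names below are hypothetical and much simpler than QBE's; fusing empty blocks is not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { JMP, JNZ, RET } JmpKind;

/* Toy basic block, not QBE's Blk. */
typedef struct Blk Blk;
struct Blk {
	int nins;      /* instructions other than the jump */
	int nphi;      /* phi nodes at block entry */
	JmpKind jmp;   /* block terminator */
	Blk *s1, *s2;  /* successors; s2 is only used by JNZ */
};

/* If b is empty (no phis, no instructions) and ends in an
 * unconditional jump, return its target; otherwise NULL. */
static Blk *empty_target(Blk *b)
{
	if (b->nins == 0 && b->nphi == 0 && b->jmp == JMP)
		return b->s1;
	return NULL;
}

/* Rewrite `jnz %c, @a, @b` into `jmp @s` when @a and @b are both
 * empty, both jump to @s, and @s has no phis (with phis, the
 * arguments coming from @a and @b could differ). Returns 1 if b
 * was changed; the now-dead @a and @b are left for a later cleanup. */
static int collapse_jnz(Blk *b)
{
	Blk *t1, *t2;

	assert(b != NULL);
	if (b->jmp != JNZ)
		return 0;
	t1 = empty_target(b->s1);
	t2 = empty_target(b->s2);
	if (t1 == NULL || t1 != t2 || t1->nphi != 0)
		return 0;
	b->jmp = JMP;
	b->s1 = t1;
	b->s2 = NULL;
	return 1;
}
```

The phi check is what separates this rewrite from if-conversion: when the successor does have a phi, the two paths carry different values and the jump cannot simply be dropped.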
13 days	ifopt simplifications	Quentin Carbonneaux

13 days	If-conversion RFC 4 - x86 only (for now), use cmovXX	Roland Paterson-Jones
	Replacement of tiny conditional jump graphlets with conditional move instructions. Currently enabled only for x86. Arm64 support using cselXX will be essentially identical. Adds (internal) frontend sel0/sel1 ops with flag-specific backend xselXX following jnz implementation pattern. Testing: standard QBE, cproc, harec, hare, roland
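As a rough illustration of the graphlet being replaced, the sketch below recognizes the diamond `jnz %c, @t, @f` where `@t` and `@f` are empty and both jump to a join block holding a two-argument phi, and extracts the operands of an equivalent select (`x = c ? a : b`, which a cmovXX implements). The types and names (`Blk`, `Phi`, `Sel`, `match_diamond()`) are hypothetical; the actual pass works on QBE's IL, emits sel0/sel1 ops, and needs further checks (for instance, that the join block has no other predecessors).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { JMP, JNZ, RET } JmpKind;

/* Toy basic block, not QBE's Blk. */
typedef struct Blk Blk;
struct Blk {
	int nins;      /* instructions other than the jump */
	JmpKind jmp;
	int cond;      /* condition temporary for JNZ */
	Blk *s1, *s2;  /* JNZ: s1 taken when cond != 0, s2 otherwise */
};

/* x = phi @pred[0] arg[0], @pred[1] arg[1] */
typedef struct { Blk *pred[2]; int arg[2]; } Phi;

/* x = cond ? iftrue : iffalse */
typedef struct { int cond, iftrue, iffalse; } Sel;

static int is_empty_jmp_to(const Blk *b, const Blk *join)
{
	return b->nins == 0 && b->jmp == JMP && b->s1 == join;
}

/* Return 1 and fill *out if b heads a diamond with empty arms whose
 * join block carries phi p; return 0 otherwise. */
static int match_diamond(const Blk *b, const Blk *join, const Phi *p, Sel *out)
{
	const Blk *t, *f;
	int i, ft = 0, ff = 0;

	assert(b && join && p && out);
	if (b->jmp != JNZ || b->s1 == b->s2)
		return 0;
	t = b->s1;
	f = b->s2;
	if (!is_empty_jmp_to(t, join) || !is_empty_jmp_to(f, join))
		return 0;
	out->cond = b->cond;
	for (i = 0; i < 2; i++) {
		if (p->pred[i] == t) {
			out->iftrue = p->arg[i];
			ft = 1;
		} else if (p->pred[i] == f) {
			out->iffalse = p->arg[i];
			ff = 1;
		}
	}
	return ft && ff;
}
```

The phi's predecessor order is arbitrary, so the match maps each argument back to the arm it flows from rather than assuming `pred[0]` is the taken branch.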
13 days	update copyright years	Quentin Carbonneaux

13 days	rv64: handle slots in jnz	Quentin Carbonneaux

13 days	fix jmp arg spilling	Quentin Carbonneaux
	In case we need to spill to accommodate the jump argument, piggyback the reloads from slots on regalloc so that they can be correctly inserted on edges.
2026-01-06	please as with truncated constants	Quentin Carbonneaux
	Apple's assembler actually hard crashed on overflows.
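One way to keep an immediate within the range an assembler accepts is to wrap it to the operand width before printing it. The helper below is a hypothetical sketch, not the code from this commit: it wraps a 64-bit constant to a signed value of `bits` bits (1 to 64), using only well-defined unsigned arithmetic and avoiding implementation-defined unsigned-to-signed casts.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap v to a signed `bits`-bit value, as an instruction of that
 * width would see it; e.g. 0xffffffff on a 32-bit operand is -1. */
static int64_t wrap_signed(int64_t v, int bits)
{
	uint64_t u, sign;

	assert(bits >= 1 && bits <= 64);
	u = (uint64_t)v;
	if (bits < 64) {
		sign = UINT64_C(1) << (bits - 1);
		u &= (sign << 1) - 1;   /* keep the low `bits` bits */
		u = (u ^ sign) - sign;  /* sign-extend from bit bits-1 */
	}
	/* convert back to int64_t without an out-of-range cast */
	if (u <= (uint64_t)INT64_MAX)
		return (int64_t)u;
	return -(int64_t)~u - 1;
}
```

With this, `wrap_signed(0x100000005, 32)` gives 5 and `wrap_signed(200, 8)` gives -56, so the printed immediate always fits the operand.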
2026-01-06	arm64_apple: fix argxbh support	Quentin Carbonneaux

2026-01-06	arm64: prevent bogus IP1 clobbers	Quentin Carbonneaux

2026-01-05	rv64: fix invalid float immediates	Quentin Carbonneaux
	Thanks to Luke Graham for reporting and fixing this issue.
2025-05-30	skip deleted phis in use width scan	Quentin Carbonneaux

2025-04-16	fix fp constants on big endian hosts	Quentin Carbonneaux

2025-03-16	minic: C23 doesn't allow bool as identifier	Horst H. von Brand
	Signed-off-by: Horst H. von Brand
2025-03-15	tools/test.sh: test the native architecture without QEMU	Antonio Terceiro
	While at it, extract most duplicated code across targets into a function.
2025-03-15	tools/test.sh: allow running against installed package	Antonio Terceiro
	If $bin is set in the environment, use it instead of using `qbe` from the source tree. The same for $binref. This supports the following use cases:
	- I have a qbe package installed, and I want to test my local changes with the installed packages as a reference:
	  $ binref=/usr/bin/qbe ./tools/test.sh all
	- I want to test the installed qbe against new tests that I have written, to reproduce a bug:
	  $ bin=/usr/bin/qbe ./tools/test.sh test/newtest.ssa
	In Debian, we also run tests against the installed package when dependencies change, etc. We will also run on several architectures where the necessary cross compilers might not be available. So make tests that cannot be run because of a missing compiler exit with 77, signaling to Debian's autopkgtest that the test is skipped.
2025-03-15	Makefile: add explicit target to test the x86_64 backend	Antonio Terceiro
	When developing on an arm64 machine, it's useful to be able to test the x86_64 target.
2025-03-15	arm64: use IP1 as scratch register	Quentin Carbonneaux
	On Apple platforms x18 is not guaranteed to be preserved across context switches. So we now use IP1 as scratch register. En passant, one dubious use of IP0 in arm64/emit.c fixarg() was transitioned to IP1. I believe the previous code could clobber a user value if IP0 was live.
2025-03-14	10 years of qbe!	Quentin Carbonneaux

2025-03-14	gvn/gcm review	Quentin Carbonneaux
	- Many stylistic nits.
	- Removed blkmerge().
	- Some minor bug fixes.
	- GCM reassoc is now "sink"; a pass that moves trivial ops into their target block with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14	Get rid of movins() infra.	Roland Paterson-Jones

2025-03-14	Global Value Numbering / Global Code Motion	Roland Paterson-Jones
	More or less as proposed in its ninth iteration, with the addition of a gcmmove() functionality to restore coherent local schedules.
	Changes since RFC 8:
	Features:
	- generalization of phi 1/0 detection
	- collapse linear jmp chains before GVN; simplifies if-graph detection used in 0/non-0 value inference and if-elim...
	- infer 0/non-0 values from dominating blk jnz; eliminates redundant cmp eq/ne 0 and associated jnz/blocks, for example redundant null pointer checks (hare codebase likes this)
	- remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly cmps
	- merge %addr =l add %addr1, N sequences; reduces tmp count and register pressure
	- squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t1, M -> t2 = add t, N+M
	Bug Fixes:
	- remove "cmp eq/ne of non-identical RCon's" in copyref(); RCon's are not guaranteed to be dedup'ed, and symbols can alias
	Codebase:
	- moved some stuff into cfg.c including blkmerge()
	- some refactoring in gvn.c
	- simplification of reassoc.c
	- always reassoc all cmp ops and Kl add %t, N; better on coremark, smaller codebase
	- minor simplification of movins() - use vins
	Testing:
	- standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland]
	Benchmark:
	- coremark is ~15%+ faster than master
	- hare "HARETEST_INCLUDE='slow' make check" ~8% faster (crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
	Changes since RFC 7:
	Bug fixes:
	- remove isbad4gcm() in GVN/GCM - it is unsound due to different state at GVN vs GCM time; replace with "reassociation" pass after GCM
	- fix intra-blk use-before-def after GCM
	- prevent GVN from deduping trapping instructions cos GCM will not move them
	- remove cmp eq/ne identical arg copy detection for floating point; it is not valid for NaN
	- fix cges/cged flagged as commutative in ops.h instead of cnes/cned respectively; just a typo
	Minor features:
	- copy detection handles cmp le/lt/ge/gt with identical args
	- treat (integer) div/rem by non-zero constant as non-trapping
	- eliminate add N/sub N pairs in copy detection
	- maintain accurate tmp use in GVN; not strictly necessary but enables interim global state sanity checking
	- "reassociation" of trivial constant offset load/store addresses, and cmp ops with point-of-use in pass after GCM
	- normalise commutative op arg order, e.g. op con, tmp -> op tmp, con, to simplify copy detection and GVN instruction dedup
	Codebase:
	- split core copy detection and constant folding (back) out into copy.c and fold.c respectively; gvn.c was getting monolithic
	- generic support for instruction moving in ins.c, used by GCM and reassoc
	- new reassociation pass in reassoc.c
	- other minor clean-up/refactor
	Changes since RFC 6:
	- more ext elimination in GVN by examination of def and use bit width
	- elimination of redundant and mask by bit width examination
	- incorporation of Song's patch
	Changes since RFC 5:
	- avoidance of "bad" candidates for GVN/GCM: trivial address offset calculations, and comparisons
	- more copy detection mostly around boolean values
	- allow elimination of unused load, alloc, trapping instructions
	- detection of trivial boolean v ? 1 : 0 phi patterns
	- bug fix for (removal of) "chg" optimisation in ins recreation - it was missing removal of unused instructions in some cases
	- ifelim() between GVN and GCM; deeper nopunused()
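Two of the listed rewrites, normalising commutative argument order and squashing consecutive adds with constant arguments, are easy to show on a toy instruction form, and the NaN bug fix follows from IEEE 754 comparison semantics. Everything below (`Arg`, `Ins`, `normalize()`, `squash_add()`, `self_equal()`) is a hypothetical sketch, not QBE's gvn.c or reassoc.c.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Toy operand: a temporary (tmp) or an integer constant (con).
 * Constants are unsigned so that wrap-around is well defined. */
typedef struct { int is_con; int tmp; uint64_t con; } Arg;

typedef enum { ADD, SUB, MUL } Op;

/* Toy instruction: %to = op arg[0], arg[1] */
typedef struct { Op op; int to; Arg arg[2]; } Ins;

static int commutative(Op op) { return op == ADD || op == MUL; }

/* `op con, tmp` -> `op tmp, con` for commutative ops, so that
 * equivalent instructions are spelled identically. */
static void normalize(Ins *i)
{
	Arg a;

	if (commutative(i->op) && i->arg[0].is_con && !i->arg[1].is_con) {
		a = i->arg[0];
		i->arg[0] = i->arg[1];
		i->arg[1] = a;
	}
}

/* i1: %t1 = add %t, N   and   i2: %t2 = add %t1, M
 * become                      i2: %t2 = add %t, N+M
 * Both must already be normalized. Returns 1 if i2 was rewritten;
 * i1 may then be dead if %t1 has no other uses. */
static int squash_add(const Ins *i1, Ins *i2)
{
	assert(i1 && i2);
	if (i1->op != ADD || i2->op != ADD)
		return 0;
	if (i1->arg[0].is_con || !i1->arg[1].is_con)
		return 0;
	if (i2->arg[0].is_con || i2->arg[0].tmp != i1->to || !i2->arg[1].is_con)
		return 0;
	i2->arg[0] = i1->arg[0];
	i2->arg[1].con = i1->arg[1].con + i2->arg[1].con;
	return 1;
}

/* Folding `ceq x, x` to 1 is only valid for integers: under IEEE
 * floating point, a NaN compares unequal to itself. */
static int self_equal(double x)
{
	return x == x;
}
```

Normalising first is what lets `squash_add()` look for the constant in only one position; without it, every pattern would have to be matched twice.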
2025-03-14	Combine fillrpo() and fillpreds() into fillcfg().	Roland Paterson-Jones
	Remove edgedel() calls from fillrpo(). Call new prunephis() from fillpreds(). [Curiously this never seems to do anything even tho edgedel() is no longer called from fillrpo()]
	One remaining fillpreds() call in parse.c typecheck - seems like it will still work the same.
	defensive; fillcfg() combining fillrpo() and fillpreds()
	problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds()
	comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers???
	static fillrpo(); remove edgedel() from fillrpo()
	replace fillrpo() and/or fillpreds() with fillcfg()
2025-03-14	Simplify fillpreds()	Roland Paterson-Jones
	Now that b->pred is a vector we can remove the counting pass.
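A minimal sketch of the single-pass idea, assuming a hypothetical growable array (`Vec`, `vpush()`) in place of QBE's own vector helpers: each successor edge appends the source block to the target's predecessor list directly, so no first pass is needed to count predecessors and size the arrays. Resetting the length instead of freeing also keeps the storage across repeated calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array of block ids (not QBE's vector API). */
typedef struct { int *v; int n, cap; } Vec;

static void vpush(Vec *a, int x)
{
	int cap, *p;

	if (a->n == a->cap) {
		cap = a->cap ? 2 * a->cap : 4;
		p = realloc(a->v, cap * sizeof *p);
		if (p == NULL)
			abort();
		a->v = p;
		a->cap = cap;
	}
	a->v[a->n++] = x;
}

/* succ[b][0..1] are successor ids (-1 when absent). Fill pred[] in
 * one pass over the edges. In this sketch, a second successor equal
 * to the first is recorded only once. */
static void fillpreds_sketch(int nblk, int succ[][2], Vec *pred)
{
	int b, s0, s1;

	assert(nblk >= 0);
	for (b = 0; b < nblk; b++)
		pred[b].n = 0;  /* reuse existing storage */
	for (b = 0; b < nblk; b++) {
		s0 = succ[b][0];
		s1 = succ[b][1];
		if (s0 >= 0)
			vpush(&pred[s0], b);
		if (s1 >= 0 && s1 != s0)
			vpush(&pred[s1], b);
	}
}
```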
2025-03-14	Simplify fillrpo()	Roland Paterson-Jones
	Essentially use post-order as id, then reverse to rpo. Avoids needing f->nblk initially; slightly simpler logic.
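The numbering trick can be sketched as follows: a depth-first search gives each block its post-order index when it finishes, and the reverse post-order id is then `n - 1 - post`, where `n` (the number of reachable blocks) is only known once the search completes. The adjacency representation and the names are hypothetical; QBE's fillrpo() works on its own Blk structures.

```c
#include <assert.h>

/* succ[b][0..1]: successor ids, -1 when absent.
 * id[b] == -1: unvisited, -2: on the DFS stack, >= 0: post-order index. */
static int dfs(int b, int succ[][2], int *id, int post)
{
	int k, s;

	id[b] = -2;
	for (k = 0; k < 2; k++) {
		s = succ[b][k];
		if (s >= 0 && id[s] == -1)
			post = dfs(s, succ, id, post);
	}
	id[b] = post;
	return post + 1;
}

/* Fill rpo[] with block ids in reverse post-order from entry block 0
 * and return the number of reachable blocks; id[b] ends up as b's
 * rpo index, or -1 if b is unreachable. */
static int fillrpo_sketch(int nblk, int succ[][2], int *id, int *rpo)
{
	int b, n;

	assert(nblk > 0);
	for (b = 0; b < nblk; b++)
		id[b] = -1;
	n = dfs(0, succ, id, 0);
	for (b = 0; b < nblk; b++)
		if (id[b] >= 0) {
			id[b] = n - 1 - id[b];
			rpo[id[b]] = b;
		}
	return n;
}
```

For the diamond 0 -> {1, 2} -> 3, this yields the order 0, 2, 1, 3: every block comes after all of its forward-edge predecessors, which is the property later passes rely on.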
2025-03-14	Re-use (vgrow) b->ins vector in backend xxx_abi() fn's.	Roland Paterson-Jones
	Removes last re-allocation of b->ins.
2025-03-14	idup(Ins **, Ins *, ulong) -> idup(Blk *, Ins *, ulong)	Roland Paterson-Jones
	Always used this way and factors setting b->nins. Makes b->ins vector contract more obvious.
2025-03-14	Blk::ins is a vector	Roland Paterson-Jones
	Scratching an itch - avoid unnecessary re-allocation in idup() which is called often in the optimisation chain. Blk::ins is reallocated in xxx_abi() - needs further fiddling.
2025-03-14	Blk::pred is a vector	Roland Paterson-Jones
	Scratching an itch - avoid unnecessary re-allocation in fillpreds() which is called often in the optimisation chain.
2025-03-14	Fn::rpo is a vector	Roland Paterson-Jones
	Scratching an itch - avoid unnecessary re-allocation in fillrpo() which is called multiple times in the optimisation chain.
2024-12-19	handle large hfas correctly on arm64	Quentin Carbonneaux

2024-10-01	fix various codegen bugs on arm64	Quentin Carbonneaux
	- dynamic allocations could generate bad 'and' instructions (for the and with -16 in salloc());
	- symbols used in w context would generate adrp and add instructions on wN registers while they seem to only work on xN registers.
	Thanks to Rosie for reporting them.
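For reference, the `and` with -16 is the usual idiom for 16-byte alignment: in two's complement, -16 is a mask with the low four bits clear, so and-ing an address with it rounds the address down to a multiple of 16. The C sketch below shows only that arithmetic for a downward-growing stack; the function names are hypothetical, and the arm64 instruction encoding that was actually fixed is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Round an address down to a multiple of 16. (uint64_t)-16 is
 * 0xfffffffffffffff0, so the and clears the low four bits. */
static uint64_t align_down16(uint64_t x)
{
	return x & (uint64_t)-16;
}

/* Carve `size` bytes below a downward-growing stack pointer and
 * keep the new stack pointer 16-byte aligned; returns the new sp. */
static uint64_t stack_alloc(uint64_t sp, uint64_t size)
{
	assert(size <= sp);
	return align_down16(sp - size);
}
```

For example, allocating 24 bytes below `sp = 0x2000` gives `0x1fe8`, which is rounded down to `0x1fe0`.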
2024-08-23	skip preludes for some leaf fns	Quentin Carbonneaux
	When rbp is not necessary to compile a leaf function, we skip saving and restoring it.
2024-08-15	arm64/isel: Avoid signed overflow when handling immediates	Alexey Yerin
	Clang incorrectly optimizes this negation with -O2 and causes QBE to emit 0 in place of INT64_MIN.
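The underlying C pitfall: negating an `int64_t` that holds `INT64_MIN` overflows, which is undefined behaviour, so an optimizing compiler is entitled to assume it never happens. A common way to negate safely, sketched below (not necessarily the exact change made in arm64/isel.c; `magnitude()` is a hypothetical name), is to do the arithmetic on `uint64_t`, where wrap-around is defined.

```c
#include <assert.h>
#include <stdint.h>

/* Magnitude of v as an unsigned value. For v == INT64_MIN the result
 * is 2^63, which does not fit in int64_t but fits in uint64_t.
 * Writing -v on int64_t would be undefined for INT64_MIN; the
 * unsigned negation 0 - (uint64_t)v is always defined. */
static uint64_t magnitude(int64_t v)
{
	return v < 0 ? 0 - (uint64_t)v : (uint64_t)v;
}
```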
2024-08-15	align emitted code	Quentin Carbonneaux
	Functions are now aligned on 16-byte boundaries. This mimics gcc and should help reduce the maximum perf impact of cosmetic code changes. Previously, any change in the output of qbe could have far reaching implications on alignment. Thanks to Roland Paterson-Jones for pointing out the variability issue.
2024-06-19	drop imul rewriting	Quentin Carbonneaux
	This was cute to do, but it is largely inconsequential, as shown by the rough timings below:
	benchmarking mul8_lea   3.9 ticks ± 0.88 (min: 3)
	benchmarking mul8_imul  3.3 ticks ± 0.27 (min: 3)
	benchmarking div8_udiv  6.5 ticks ± 0.52 (min: 6)
	benchmarking div8_shr   3.3 ticks ± 0.34 (min: 3)
2024-06-19	no mul->shl as it confuses address matching	Quentin Carbonneaux
	Additionally, the strength-reduction for small powers of two is handled by amd64/emit.c now.
2024-06-18	cheaper mul by small constants on amd64	Quentin Carbonneaux

2024-06-18	simplify 8x as well as x8	Quentin Carbonneaux

2024-06-17	prevent bogus simplifications	Quentin Carbonneaux

2024-06-17	qbe has its own magic	Quentin Carbonneaux

2024-06-16	fix unintended assignment	Quentin Carbonneaux

2024-06-16	revert 4bc4c958	Quentin Carbonneaux
	Hopefully the right time now!
2024-06-16	Simplify int mul/udiv/urem of 2^N into shl/shr/and.	Roland Paterson-Jones
	Passes the "standard" test suite (cproc bootstrap, hare[c] make test, roland units, linpack/coremark run). However, the linpack benchmark is now notably slower. Coremark is ~2% faster. As noticed before, linpack timing is dubious, and maybe my cheap (AMD) laptop prefers mul to shl.
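The identities behind this rewrite hold for unsigned 64-bit arithmetic: multiplying by 2^N is a left shift by N, and unsigned division and remainder by 2^N are a right shift by N and an and with 2^N - 1. The sketch below checks them in C (the helper names are hypothetical). Multiplication works the same for signed values because it wraps modulo 2^64; signed division is deliberately left out, since an arithmetic right shift rounds toward negative infinity while division truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of x if x is a power of two, else -1 */
static int ilog2_exact(uint64_t x)
{
	int n = 0;

	if (x == 0 || (x & (x - 1)) != 0)
		return -1;
	while (x >>= 1)
		n++;
	return n;
}

/* x * 2^n */
static uint64_t mul_pow2(uint64_t x, int n)
{
	assert(n >= 0 && n < 64);
	return x << n;
}

/* x / 2^n (unsigned) */
static uint64_t udiv_pow2(uint64_t x, int n)
{
	assert(n >= 0 && n < 64);
	return x >> n;
}

/* x % 2^n (unsigned) */
static uint64_t urem_pow2(uint64_t x, int n)
{
	assert(n >= 0 && n < 64);
	return x & ((UINT64_C(1) << n) - 1);
}
```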
2024-06-09	Optab-driven copy detection	Roland Paterson-Jones

2024-06-05	relax one assert	Quentin Carbonneaux
	In this branch we only need that br[b->loop].b is defined. This is the case if b->loop >= n.
2024-05-28	replace asm keyword	Erica Z
	When applying a custom set of CFLAGS under clang that does not include -std=c99, asm is treated as a keyword and as such cannot be used as an identifier. This prevents the issue by renaming the offending variables.
2024-05-03	add width info for comparisons	Quentin Carbonneaux
	Comparisons return a 1-bit value, in theory we could add a Wu1 width for them but I did not bother and just used Wub. This simply means that if a frontend generates an extsb of a comparison result (silly), we will not generate good code.
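The reasoning can be sketched with a small width lattice. The enum and `ext_redundant()` below are hypothetical, not QBE's actual width representation. A value known to be zero-extended from 8 bits (Wub) makes an extub a no-op, but an extsb is only a no-op if bit 7 is also known to be clear, which Wub does not guarantee even though a comparison result is always 0 or 1. A dedicated 1-bit width would cover both.

```c
#include <assert.h>

/* What is known about a value's upper bits (hypothetical subset). */
typedef enum {
	WFULL,  /* nothing known */
	WU1,    /* value is 0 or 1 */
	WUB,    /* value is zero-extended from 8 bits */
	WSB,    /* value is sign-extended from 8 bits */
} Width;

typedef enum { EXTUB, EXTSB } Ext;

/* Return 1 if applying ext to a value of width w cannot change it. */
static int ext_redundant(Width w, Ext ext)
{
	switch (ext) {
	case EXTUB:
		return w == WU1 || w == WUB;
	case EXTSB:
		return w == WU1 || w == WSB;
	}
	assert(0 && "unknown extension");
	return 0;
}
```

With only Wub recorded for comparisons, `ext_redundant(WUB, EXTSB)` is 0, which is exactly the case the commit message calls out as generating less-than-ideal code.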