c9x.me/qbe.git - QBE - Backend Compiler

Age	Commit message	Author
2025-03-21	Simple Inner Loop Optimization [loopopt]	Roland Paterson-Jones
	Two simple loop optimizations:
	1. Strength reduction of mul[tiplication] by the loop induction variable.
	2. Hoisting of the (address) base into the phi where the loop induction variable is used only as a base (address) offset.
	Limited to loops with a single body block, which happily are always innermost loops. This restriction would not be very hard to lift: it would require detecting the set of loop blocks (and ensuring reducibility?).
	Limited to loop induction variables with an initial value of 0 and an increment of 1 (for mul strength reduction). This limitation is trivial to lift; however, all of the cproc/hare[c]/coremark opportunity is with 0/1 loops for mul reduction, and with a 0 initial value for the base-offset opt.
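	A minimal sketch of the first optimization, written in C rather than QBE IL. The function names are hypothetical and the code only illustrates the general idea of strength reduction for an induction variable that starts at 0 and steps by 1; it is not the pass itself.

```c
#include <assert.h>
#include <stddef.h>

/* Before: one multiplication by the induction variable per iteration. */
long sum_strided_mul(const long *a, size_t n, size_t stride)
{
	long s = 0;
	for (size_t i = 0; i < n; i++)
		s += a[i * stride];
	return s;
}

/* After: the product i * stride is replaced by a running sum that
 * starts at 0 and grows by stride on every iteration. */
long sum_strided_add(const long *a, size_t n, size_t stride)
{
	long s = 0;
	size_t off = 0;
	for (size_t i = 0; i < n; i++) {
		assert(off == i * stride); /* invariant kept by the rewrite */
		s += a[off];
		off += stride;
	}
	return s;
}
```

	Both functions read the same elements, so they return the same sum; the second one only needs an addition in the loop body.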
2025-03-16	minic: C23 doesn't allow bool as identifier	Horst H. von Brand
	Signed-off-by: Horst H. von Brand
2025-03-15	tools/test.sh: test the native architecture without QEMU	Antonio Terceiro
	While at it, extract most duplicated code across targets into a function.
2025-03-15	tools/test.sh: allow running against installed package	Antonio Terceiro
	If $bin is set in the environment, use it instead of using `qbe` from the source tree. The same for $binref.
	This supports the following use cases:
	- I have a qbe package installed, and I want to test my local changes with the installed packages as a reference:
	  $ binref=/usr/bin/qbe ./tools/test.sh all
	- I want to test the installed qbe against new tests that I have written, to reproduce a bug:
	  $ bin=/usr/bin/qbe ./tools/test.sh test/newtest.ssa
	In Debian, we also run tests against the installed package when dependencies change, etc. We will also run on several architectures where the necessary cross compilers might not be available. So make tests that cannot be run because of a missing compiler exit with 77, signaling to Debian's autopkgtest that the test is skipped.
2025-03-15	Makefile: add explicit target to test the x86_64 backend	Antonio Terceiro
	When developing on an arm64 machine, it's useful to be able to test the x86_64 target.
2025-03-15	arm64: use IP1 as scratch register	Quentin Carbonneaux
	On Apple platforms x18 is not guaranteed to be preserved across context switches. So we now use IP1 as scratch register. En passant, one dubious use of IP0 in arm64/emit.c fixarg() was transitioned to IP1. I believe the previous code could clobber a user value if IP0 was live.
2025-03-14	10 years of qbe!	Quentin Carbonneaux

2025-03-14	gvn/gcm review	Quentin Carbonneaux
	- Many stylistic nits.
	- Removed blkmerge().
	- Some minor bug fixes.
	- GCM reassoc is now "sink"; a pass that moves trivial ops in their target block with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14	Get rid of movins() infra.	Roland Paterson-Jones

2025-03-14	Global Value Numbering / Global Code Motion	Roland Paterson-Jones
	More or less as proposed in its ninth iteration with the addition of a gcmmove() functionality to restore coherent local schedules.
	Changes since RFC 8:
	Features:
	- generalization of phi 1/0 detection
	- collapse linear jmp chains before GVN; simplifies if-graph detection used in 0/non-0 value inference and if-elim...
	- infer 0/non-0 values from dominating blk jnz; eliminates redundant cmp eq/ne 0 and associated jnz/blocks, for example redundant null pointer checks (hare codebase likes this)
	- remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly cmps.
	- merge %addr =l add %addr1, N sequences - reduces tmp count, register pressure.
	- squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t1, M -> t2 = add t, N+M
	Bug Fixes:
	- remove "cmp eq/ne of non-identical RCon's" in copyref(). RCon's are not guaranteed to be dedup'ed, and symbols can alias.
	Codebase:
	- moved some stuff into cfg.c including blkmerge()
	- some refactoring in gvn.c
	- simplification of reassoc.c - always reassoc all cmp ops and Kl add %t, N. Better on coremark, smaller codebase.
	- minor simplification of movins() - use vins
	Testing - standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland]
	Benchmark
	- coremark is ~15%+ faster than master
	- hare "HARETEST_INCLUDE='slow' make check" ~8% faster (crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
	Changes since RFC 7:
	Bug fixes:
	- remove isbad4gcm() in GVN/GCM - it is unsound due to different state at GVN vs GCM time; replace with "reassociation" pass after GCM
	- fix intra-blk use-before-def after GCM
	- prevent GVN from deduping trapping instructions cos GCM will not move them
	- remove cmp eq/ne identical arg copy detection for floating point, it is not valid for NaN
	- fix cges/cged flagged as commutative in ops.h instead of cnes/cned respectively; just a typo
	Minor features:
	- copy detection handles cmp le/lt/ge/gt with identical args
	- treat (integer) div/rem by non-zero constant as non-trapping
	- eliminate add N/sub N pairs in copy detection
	- maintain accurate tmp use in GVN; not strictly necessary but enables interim global state sanity checking
	- "reassociation" of trivial constant offset load/store addresses, and cmp ops with point-of-use in pass after GCM
	- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con to simplify copy detection and GVN instruction dedup
	Codebase:
	- split out core copy detection and constant folding (back) out into copy.c, fold.c respectively; gvn.c was getting monolithic
	- generic support for instruction moving in ins.c - used by GCM and reassoc
	- new reassociation pass in reassoc.c
	- other minor clean-up/refactor
	Changes since RFC 6:
	- More ext elimination in GVN by examination of def and use bit width
	- elimination of redundant and mask by bit width examination
	- Incorporation of Song's patch
	Changes since RFC 5:
	- avoidance of "bad" candidates for GVN/GCM - trivial address offset calculations, and comparisons
	- more copy detection mostly around boolean values
	- allow elimination of unused load, alloc, trapping instructions
	- detection of trivial boolean v ? 1 : 0 phi patterns
	- bug fix for (removal of) "chg" optimisation in ins recreation - it was missing removal of unused instructions in some cases
	ifelim() between GVN and GCM; deeper nopunused()
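	The NaN bug fix listed under "Changes since RFC 7" can be illustrated with a small C sketch. The helper names are hypothetical; the code only shows why an equality compare with identical arguments may be folded to 1 for integers but not for floating point.

```c
#include <assert.h>
#include <math.h>

/* Integer equality: with identical arguments the result is always 1. */
int eq_long(long a, long b)
{
	return a == b;
}

/* Floating-point equality: with identical arguments the result is 0
 * when the value is NaN, so it cannot be folded to a constant. */
int eq_double(double a, double b)
{
	return a == b;
}

/* Spot checks of both claims; aborts if one of them fails. */
void check_self_compare(void)
{
	volatile double nan = NAN; /* volatile keeps the compare at run time */
	double x = nan;

	assert(eq_long(42, 42) == 1);
	assert(eq_double(1.5, 1.5) == 1);
	assert(eq_double(x, x) == 0);
}
```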
2025-03-14	Combine fillrpo() and fillpreds() into fillcfg().	Roland Paterson-Jones
	Remove edgedel() calls from fillrpo(). Call new prunephis() from fillpreds(). [Curiously this never seems to do anything even tho edgedel() is no longer called from fillrpo()]
	One remaining fillpreds() call in parse.c typecheck - seems like it will still work the same.
	defensive; fillcfg() combining fillrpo() and fillpreds() - problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds()
	comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers???
	static fillrpo(); remove edgedel() from fillrpo()
	replace fillrpo() and/or fillpreds() with fillcfg()
2025-03-14	Simplify fillpreds()	Roland Paterson-Jones
	Now that b->pred is a vector we can remove the counting pass.
2025-03-14	Simplify fillrpo()	Roland Paterson-Jones
	Essentially use post-order as id, then reverse to rpo. Avoids needing f->nblk initially; slightly simpler logic.
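	A minimal sketch of the idea, assuming a simplified, hypothetical block type (QBE's real Blk and Fn structures differ). Blocks are first numbered in post-order by a depth-first walk; the count of reachable blocks is only known after the walk, and each id is then flipped to obtain the reverse post-order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified block type. */
typedef struct Blk Blk;
struct Blk {
	Blk *s1, *s2; /* successors, NULL if absent */
	int id;       /* -1 while unvisited */
};

/* Number the blocks reachable from b in post-order, starting at n.
 * Returns the next unused number. */
static int postorder(Blk *b, int n)
{
	if (!b || b->id != -1)
		return n;
	b->id = 0; /* mark as visited before descending */
	n = postorder(b->s1, n);
	n = postorder(b->s2, n);
	b->id = n;
	return n + 1;
}

/* Fill rpo[] with the blocks reachable from start, in reverse
 * post-order, and return how many there are. all[] lists every block. */
int fill_rpo(Blk *start, Blk **all, int nall, Blk **rpo)
{
	int i, n;

	for (i = 0; i < nall; i++)
		all[i]->id = -1;
	n = postorder(start, 0);
	for (i = 0; i < nall; i++) {
		Blk *b = all[i];
		if (b->id == -1)
			continue; /* unreachable block */
		b->id = n - 1 - b->id; /* post-order id -> rpo id */
		assert(b->id >= 0 && b->id < n);
		rpo[b->id] = b;
	}
	return n;
}
```

	In this sketch the entry block always ends up at index 0, because it is the last block to receive a post-order number.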
2025-03-14	Re-use (vgrow) b->ins vector in backend xxx_abi() fn's.	Roland Paterson-Jones
	Removes last re-allocation of b->ins.
2025-03-14	idup(Ins **, Ins *, ulong) -> idup(Blk *, Ins *, ulong)	Roland Paterson-Jones
	Always used this way and factors setting b->nins. Makes b->ins vector contract more obvious.
2025-03-14	Blk::ins is a vector	Roland Paterson-Jones
	Scratching an itch - avoid unnecessary re-allocation in idup() which is called often in the optimisation chain. Blk::ins is reallocated in xxx_abi() - needs further fiddling.
2025-03-14	Blk::pred is a vector	Roland Paterson-Jones
	Scratching an itch - avoid unnecessary re-allocation in fillpreds() which is called often in the optimisation chain.
2025-03-14	Fn::rpo is a vector	Roland Paterson-Jones
	Scratching an itch - avoid unnecessary re-allocation in fillrpo() which is called multiple times in the optimisation chain.
2024-12-19	handle large hfas correctly on arm64	Quentin Carbonneaux

2024-10-01	fix various codegen bugs on arm64	Quentin Carbonneaux
	- dynamic allocations could generate bad 'and' instructions (for the and with -16 in salloc()).
	- symbols used in w context would generate adrp and add instructions on wN registers while they seem to only work on xN registers.
	Thanks to Rosie for reporting them.
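	For context, an 'and' with -16 is the usual way to keep a stack allocation size a multiple of 16 bytes. The sketch below shows the arithmetic in C; the function name is hypothetical and the code is not taken from salloc().

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of 16. The mask ~15 has the same
 * bit pattern as the two's-complement constant -16. The addition
 * wraps around for n within 15 of UINT64_MAX. */
uint64_t round_up_16(uint64_t n)
{
	uint64_t r = (n + 15) & ~(uint64_t)15;

	assert(r % 16 == 0);
	return r;
}
```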
2024-08-23	skip preludes for some leaf fns	Quentin Carbonneaux
	When rbp is not necessary to compile a leaf function, we skip saving and restoring it.
2024-08-15	arm64/isel: Avoid signed overflow when handling immediates	Alexey Yerin
	Clang incorrectly optimizes this negation with -O2 and causes QBE to emit 0 in place of INT64_MIN.
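	In C, negating INT64_MIN in a signed type overflows, which is undefined behaviour. One common way to avoid it is to negate in an unsigned type, where arithmetic wraps. The sketch below uses a hypothetical helper and is not necessarily what the commit does.

```c
#include <assert.h>
#include <stdint.h>

/* Negate v using unsigned arithmetic, which wraps instead of
 * overflowing. Converting the result back to int64_t is
 * implementation-defined for INT64_MIN; GCC and Clang wrap, so
 * INT64_MIN maps to itself there. */
int64_t neg_wrap(int64_t v)
{
	int64_t r = (int64_t)(0 - (uint64_t)v);

	/* -v is only evaluated when it cannot overflow. */
	assert(v == INT64_MIN || r == -v);
	return r;
}
```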
2024-08-15	align emitted code	Quentin Carbonneaux
	Functions are now aligned on 16-byte boundaries. This mimics gcc and should help reduce the maximum perf impact of cosmetic code changes. Previously, any change in the output of qbe could have far reaching implications on alignment. Thanks to Roland Paterson-Jones for pointing out the variability issue.
2024-06-19	drop imul rewriting	Quentin Carbonneaux
	This was cute to do, but it is largely inconsequential, as shown by the rough timings below:
	benchmarking mul8_lea	3.9 ticks ± 0.88 (min: 3)
	benchmarking mul8_imul	3.3 ticks ± 0.27 (min: 3)
	benchmarking div8_udiv	6.5 ticks ± 0.52 (min: 6)
	benchmarking div8_shr	3.3 ticks ± 0.34 (min: 3)
2024-06-19	no mul->shl as it confuses address matching	Quentin Carbonneaux
	Additionally, the strength-reduction for small powers of two is handled by amd64/emit.c now.
2024-06-18	cheaper mul by small constants on amd64	Quentin Carbonneaux

2024-06-18	simplify 8x as well as x8	Quentin Carbonneaux

2024-06-17	prevent bogus simplifications	Quentin Carbonneaux

2024-06-17	qbe has its own magic	Quentin Carbonneaux

2024-06-16	fix unintended assignment	Quentin Carbonneaux

2024-06-16	revert 4bc4c958	Quentin Carbonneaux
	Hopefully the right time now!
2024-06-16	Simplify int mul/udiv/urem of 2^N into shl/shr/and.	Roland Paterson-Jones
	Passes the "standard" test suite (cproc bootstrap, hare[c] make test, roland units, linpack/coremark run). However, the linpack benchmark is now notably slower. Coremark is ~2% faster. As noticed before, linpack timing is dubious, and maybe my cheap (AMD) laptop prefers mul to shl.
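	The identities behind this simplification can be sketched in C for unsigned 64-bit values. The helper names are hypothetical and the code only checks the arithmetic; it does not show how the pass rewrites instructions.

```c
#include <assert.h>
#include <stdint.h>

/* True if c is a power of two (exactly one bit set). */
static int is_pow2(uint64_t c)
{
	return c != 0 && (c & (c - 1)) == 0;
}

/* For c == 2^n, return n. */
static unsigned log2_pow2(uint64_t c)
{
	unsigned n = 0;

	while (c >>= 1)
		n++;
	return n;
}

/* x * 2^n == x << n (modulo 2^64) */
uint64_t mul_pow2(uint64_t x, uint64_t c)
{
	assert(is_pow2(c));
	return x << log2_pow2(c);
}

/* x / 2^n == x >> n for unsigned x */
uint64_t udiv_pow2(uint64_t x, uint64_t c)
{
	assert(is_pow2(c));
	return x >> log2_pow2(c);
}

/* x % 2^n == x & (2^n - 1) for unsigned x */
uint64_t urem_pow2(uint64_t x, uint64_t c)
{
	assert(is_pow2(c));
	return x & (c - 1);
}
```

	The shift and mask forms only match division and remainder for unsigned operands, which is why the commit names udiv and urem.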
2024-06-09	Optab-driven copy detection	Roland Paterson-Jones

2024-06-05	relax one assert	Quentin Carbonneaux
	In this branch we only need that br[b->loop].b is defined. This is the case if b->loop >= n.
2024-05-28	replace asm keyword	Erica Z
	When applying a custom set of CFLAGS under clang that does not include -std=c99, asm is treated as a keyword and as such cannot be used as an identifier. This prevents the issue by renaming the offending variables.
2024-05-03	add width info for comparisons	Quentin Carbonneaux
	Comparisons return a 1-bit value; in theory we could add a Wu1 width for them, but I did not bother and just used Wub. This simply means that if a frontend generates an extsb of a comparison result (silly), we will not generate good code.
2024-04-27	function params must be unique	Quentin Carbonneaux

2024-04-22	revert 1b7770e271	Quentin Carbonneaux
	Quotes are used on Apple target variants to flag that we must not add the _ symbol prefix.
2024-04-13	parse: use dynamically sized hashtable for temporaries	Michael Forney
	This significantly improves parsing performance for massive functions with a huge number of temporaries. Parsing the 86MiB IL produced by cproc during zig bootstrap drops from 17m15s to 2.5s (over 400x speedup). The speedup is much smaller for IL produced from normal non-autogenerated C code. Parsing the sqlite3 amalgamation drops from 0.40s to 0.33s.
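	A minimal sketch of a dynamically sized hash table, assuming open addressing with linear probing and capacity doubling. The type and function names are hypothetical and this is not the code in parse.c; it only illustrates why lookups stay cheap as the number of temporaries grows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table mapping names to indices. Keys are borrowed
 * pointers and must outlive the table. */
typedef struct {
	const char **key; /* NULL marks an empty slot */
	int *val;
	size_t cap;       /* 0 or a power of two */
	size_t len;
} Tab;

void tab_put(Tab *t, const char *k, int v);

static size_t hash(const char *s)
{
	size_t h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Double the capacity and reinsert every entry. */
static void tab_grow(Tab *t)
{
	Tab old = *t;
	size_t i;

	t->cap = old.cap ? old.cap * 2 : 16;
	t->len = 0;
	t->key = calloc(t->cap, sizeof *t->key);
	t->val = calloc(t->cap, sizeof *t->val);
	assert(t->key && t->val);
	for (i = 0; i < old.cap; i++)
		if (old.key[i])
			tab_put(t, old.key[i], old.val[i]);
	free(old.key);
	free(old.val);
}

/* Insert k or update its value; grows to keep the load under 1/2. */
void tab_put(Tab *t, const char *k, int v)
{
	size_t i;

	if (t->len * 2 >= t->cap)
		tab_grow(t);
	for (i = hash(k) & (t->cap - 1); t->key[i]; i = (i + 1) & (t->cap - 1))
		if (strcmp(t->key[i], k) == 0) {
			t->val[i] = v;
			return;
		}
	t->key[i] = k;
	t->val[i] = v;
	t->len++;
}

/* Return the value stored for k, or -1 if k is absent. */
int tab_get(const Tab *t, const char *k)
{
	size_t i;

	if (t->cap == 0)
		return -1;
	for (i = hash(k) & (t->cap - 1); t->key[i]; i = (i + 1) & (t->cap - 1))
		if (strcmp(t->key[i], k) == 0)
			return t->val[i];
	return -1;
}
```

	A table starts out zero-initialized (Tab t = {0};) and allocates on the first tab_put(). Because the load factor stays below one half, probe sequences stay short regardless of how many names are inserted.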
2024-04-12	add "make wc"	Quentin Carbonneaux

2024-04-12	drop unnecessary check	Quentin Carbonneaux

2024-04-12	add common linkage for data	Quentin Carbonneaux

2024-04-11	fold scaled offsets in addresses	Quentin Carbonneaux

2024-04-11	drop over-zealous offset accumulation	Quentin Carbonneaux

2024-04-09	use mgen in amd64/isel.c	Quentin Carbonneaux

2024-04-09	mgen: match automatons and C generation	Quentin Carbonneaux
	The algorithm to generate matchers took a long time to be discovered and refined to its present version. The rest of mgen is mostly boring engineering. Extensive fuzzing ensures that the two core components of mgen (tables and matchers generation) are correct on specific problem instances.
2024-04-09	fuse ac rules in ins-tree matching	Quentin Carbonneaux
	The initial plan was to have one matcher per ac-variant, but that leads to way too much generated code. Instead, we can fuse ac variants of the rules and have a smarter matching algorithm to recover bound variables.
2024-04-09	does not look too good	Quentin Carbonneaux

2024-04-09	modulo ac matching and more tests	Quentin Carbonneaux

2024-04-09	wip ins-tree matcher	Quentin Carbonneaux