- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink": a pass that
moves trivial ops into their target block
with the same goal of reducing register
pressure, but starting from instructions
that benefit from having their inputs
close (see the sketch below).
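A minimal hand-written IL sketch (names made up, not from the
test suite) of the kind of placement sink aims for: %t is only
used in @yes, so moving the add there shortens its live range
and reduces pressure in @start.

  function w $f(w %a, w %b, w %c) {
  @start
          # %t is computed here but only used in @yes;
          # sink moves "%t =w add %a, %b" down into @yes
          %t =w add %a, %b
          jnz %c, @yes, @no
  @yes
          ret %t
  @no
          ret 0
  }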
|
|
More or less as proposed in its ninth iteration, with the
addition of gcmmove() to restore coherent local schedules.
Changes since RFC 8:
Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph
detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates
redundant cmp eq/ne 0 and associated jnz/blocks, for example
redundant null pointer checks (hare codebase likes this);
sketched after this list
- remove (emergent) empty if-then-else graphlets between GVN and
GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count,
register pressure.
- squash consecutive associative ops with constant args, e.g.
t1 = add t, N ... t2 = add t1, M -> t2 = add t, N+M
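A hand-written sketch of the 0/non-0 inference above
(illustrative only, not a test case from the patch): the first
jnz means %p is non-0 whenever @use runs, so the second
cmp/jnz pair is redundant, much like a repeated null pointer
check.

  function w $f(l %p) {
  @start
          %nz =w cnel %p, 0
          jnz %nz, @use, @null
  @use
          # %p is non-0 on this path, so %z is always 0 and
          # this cmp and the jnz below it can be removed
          %z =w ceql %p, 0
          jnz %z, @null, @load
  @load
          %v =w loadw %p
          ret %v
  @null
          ret 0
  }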
Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's " in copyref().
RCon's are not guaranteed to be dedup'ed, and symbols can
alias.
Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops
and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins
Testing - standard QBE, cproc, hare, harec, coremark
[still have Rust build issues with latest roland]
Benchmark
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster
(crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
Changes since RFC 7:
Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state
at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions because GCM
will not move them
- remove cmp eq/ne identical arg copy detection for floating
point; it is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned
respectively; just a typo
Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection (sketched after
this list)
- maintain accurate tmp use in GVN; not strictly necessary but enables
interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and
cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con
to simplify copy detection and GVN instruction dedup
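For the add N/sub N pairs above, a small hand-written example
of what copy detection now catches (illustrative only):

  function w $f(w %x) {
  @start
          %t1 =w add %x, 8
          # %t2 is a copy of %x, so uses of %t2 can be
          # rewritten to %x and %t1 may then become dead
          %t2 =w sub %t1, 8
          ret %t2
  }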
Codebase:
- split core copy detection and constant folding back out into
copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and
reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor
Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch
Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset
calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns (see the
sketch at the end of this list)
- bug fix for (removal of) "chg" optimisation in ins recreation - it
was missing removal of unused instructions in some cases
- ifelim() between GVN and GCM; deeper nopunused()
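A hand-written illustration of the trivial boolean phi pattern
(assuming %v is already a 0/1 value such as a comparison
result): the phi reproduces %v and can be treated as a copy.

  function w $f(w %a, w %b) {
  @start
          %v =w csltw %a, %b
          jnz %v, @t, @f
  @t
          jmp @join
  @f
          jmp @join
  @join
          # %r is "v ? 1 : 0", i.e. a copy of %v here
          %r =w phi @t 1, @f 0
          ret %r
  }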
|
|
Comparisons return a 1-bit value; in theory
we could add a Wu1 width for them, but I did
not bother and just used Wub. This simply
means that if a frontend generates an extsb
of a comparison result (silly), we will not
generate good code, as illustrated below.
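For illustration (hand-written): the comparison result is
already 0 or 1, so the extension below changes nothing, but
with the result tracked only as Wub the signed extsb is not
recognised as removable.

  function w $f(w %a, w %b) {
  @start
          %c =w ceqw %a, %b
          # semantically a no-op since %c is 0 or 1, but not
          # eliminated because %c is only known to be Wub
          %s =w extsb %c
          ret %s
  }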
|
|
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
|
|
Previously, we would skip ssa construction when
a temporary has a single definition. This is
only part of the ssa invariant: we must also
check that all uses are dominated by the single
definition. The new code does this.
In fact, qbe does not store all the dominators
for a block, so instead of walking the idom
linked list we use a rough heuristic and declare
conservatively that B0 dominates B1 when one of
the two conditions is true:
a. B0 is the start block
b. B0 is B1
Some measurements on a big file from Michael
Forney show that the code is still as fast as
before this patch.
|
|
If a temporary is assigned exactly
once (most are), there is no need
to do any work to put it in ssa
form.
On an input file of ~35k loc, this
makes the processing time go from
2.9 secs to 1.2 secs.
|
|
Compiler warned about comparison between signed and unsigned values.
|
|
Before this commit, I tried to make sure that
two interfering temporaries never ended up in
the same phi class.
This was to make sure that their register hints
were not counterproductively stepping on each
other's toes. The idea is fine, but:
* the implementation is clumsy because it
mixes the orthogonal concepts of
(i) interference and (ii) phi classes;
* the hinting process in the register
allocator is hard to understand because
the disjoint-set data structure used for
phi classes is cut in arbitrary places.
After this commit, the phi classes *really* are
phi classes represented with a proper disjoint-set
data structure.
|