aboutsummaryrefslogtreecommitdiff
path: root/ssa.c
AgeCommit message (Collapse)Author
2025-03-14gvn/gcm reviewQuentin Carbonneaux
- Many stylistic nits. - Removed blkmerge(). - Some minor bug fixes. - GCM reassoc is now "sink"; a pass that moves trivial ops in their target block with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14Global Value Numbering / Global Code MotionRoland Paterson-Jones
More or less as proposed in its ninth iteration with the addition of a gcmmove() functionality to restore coherent local schedules. Changes since RFC 8: Features: - generalization of phi 1/0 detection - collapse linear jmp chains before GVN; simplifies if-graph detection used in 0/non-0 value inference and if-elim... - infer 0/non-0 values from dominating blk jnz; eliminates redundant cmp eq/ne 0 and associated jnz/blocks, for example redundant null pointer checks (hare codebase likes this) - remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly cmps. - merge %addr =l add %addr1, N sequences - reduces tmp count, register pressure. - squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t2, M -> t2 = add t, N+M Bug Fixes: - remove "cmp eq/ne of non-identical RCon's " in copyref(). RCon's are not guaranteed to be dedup'ed, and symbols can alias. Codebase: - moved some stuff into cfg.c including blkmerge() - some refactoring in gvn.c - simplification of reassoc.c - always reassoc all cmp ops and Kl add %t, N. Better on coremark, smaller codebase. - minor simplification of movins() - use vins Testing - standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland] Benchmark - coremark is ~15%+ faster than master - hare "HARETEST_INCLUDE='slow' make check" ~8% faster (crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster) Changes since RFC 7: Bug fixes: - remove isbad4gcm() in GVN/GCM - it is unsound due to different state at GVN vs GCM time; replace with "reassociation" pass after GCM - fix intra-blk use-before-def after GCM - prevent GVN from deduping trapping instructions cos GCM will not move them - remove cmp eq/ne identical arg copy detection for floating point, it is not valid for NaN - fix cges/cged flagged as commutative in ops.h instead of cnes/cned respectively; just a typo Minor features: - copy detection handles cmp le/lt/ge/gt with identical args - treat (integer) div/rem by non-zero constant as non-trapping - eliminate add N/sub N pairs in copy detection - maintain accurate tmp use in GVN; not strictly necessary but enables interim global state sanity checking - "reassociation" of trivial constant offset load/store addresses, and cmp ops with point-of-use in pass after GCM - normalise commutative op arg order - e.g. op con, tmp -> op tmp, con to simplify copy detection and GVN instruction dedup Codebase: - split out core copy detection and constant folding (back) out into copy.c, fold.c respectively; gvn.c was getting monolithic - generic support for instruction moving in ins.c - used by GCM and reassoc - new reassociation pass in reassoc.c - other minor clean-up/refactor Changes since RFC 6: - More ext elimination in GVN by examination of def and use bit width - elimination of redundant and mask by bit width examination - Incorporation of Song's patch Changes since RFC 5: - avoidance of "bad" candidates for GVN/GCM - trivial address offset calculations, and comparisons - more copy detection mostly around boolean values - allow elimination of unused load, alloc, trapping instructions - detection of trivial boolean v ? 1 : 0 phi patterns - bug fix for (removal of) "chg" optimisation in ins recreation - it was missing removal of unused instructions in some cases ifelim() between GVN and GCM; deeper nopunused()
2024-05-03add width info for comparisonsQuentin Carbonneaux
Comparisons return a 1-bit value, in theory we could add a Wu1 width for them but I did not bother and just used Wub. This simply means that if a frontend generates an extsb of a comparison result (silly), we will not generate good code.
2022-12-25new UNDEF RefQuentin Carbonneaux
Crashing loads of uninitialized memory proved to be a problem when implementing unions using qbe. This patch introduces a new UNDEF Ref to represent data that is known to be uninitialized. Optimization passes can make use of it to eliminate some code. In the last compilation stages, UNDEF is treated as the constant 0xdeaddead.
2022-11-22rename Tmp.ins to be more descriptiveQuentin Carbonneaux
2022-11-20fill definition site in filluse()Quentin Carbonneaux
2022-10-03fix case of Pool constantsQuentin Carbonneaux
2022-10-03refine width of parsb/ub/sh/uh opsQuentin Carbonneaux
2020-08-06Use a dynamic array for phi argumentsMichael Forney
2019-04-11properly detect ssa formQuentin Carbonneaux
Previously, we would skip ssa construction when a temporary has a single definition. This is only part of the ssa invariant: we must also check that all uses are dominated by the single definition. The new code does this. In fact, qbe does not store all the dominators for a block, so instead of walking the idom linked list we use a rough heuristic and declare conservatively that B0 dominates B1 when one of the two conditions is true: a. B0 is the start block b. B0 is B1 Some measurements on a big file from Michael Forney show that the code is still as fast as before this patch.
2019-03-01skip expensive ssa-building loop when possibleQuentin Carbonneaux
If a temporary is assigned exactly once (most are), there is no need to do any work to put it in ssa form. On an input file of ~35k loc, this makes the processing time go from 2.9 secs to 1.2 secs.
2018-04-26Fix compiler warnings.Emil Skoeldberg
Compiler warned about comparison between signed and unsigned values.
2017-05-16do not account for interferences in phi classesQuentin Carbonneaux
Before this commit, I tried to make sure that two interfering temporaries never ended up in the same phi class. This was to make sure that their register hints were not counterproductively stepping on each other's toes. The idea is fine, but: * the implementation is clumsy because it mixes the orthogonal concepts of (i) interference and (ii) phi classes; * the hinting process in the register allocator is hard to understand because the disjoint-set data structure used for phi classes is cut in arbitrary places. After this commit, the phi classes *really* are phi classes represented with a proper disjoint-set data structure.
2017-02-25do sign/zero extensions removal in copy.cQuentin Carbonneaux
2017-02-06use uint for block idsQuentin Carbonneaux
2017-01-12use a less obtuse api for vnew()Quentin Carbonneaux
2016-12-12create cfg.c for cfg-related functionsQuentin Carbonneaux
2016-08-15specify the allocation function in vnewQuentin Carbonneaux
2016-04-19check for trivial undefined uses in ssacheckQuentin Carbonneaux
2016-04-18factor some subtyping logic in clsmerge()Quentin Carbonneaux
2016-04-13handle odd jumps in blkdel() an renblk()Quentin Carbonneaux
2016-04-13do not compute def-use links for regsQuentin Carbonneaux
2016-04-13hack an ssa validator (likely buggy)Quentin Carbonneaux
2016-04-12fix bug in predecessors filling codeQuentin Carbonneaux
2016-04-12simplify fillpreds() codeQuentin Carbonneaux
2016-04-09add a proper block deletion routineQuentin Carbonneaux
2016-03-31cleanup error handlingQuentin Carbonneaux
2016-03-29new layout, put LICENSE in rootQuentin Carbonneaux