aboutsummaryrefslogtreecommitdiff
path: root/main.c
AgeCommit message (Collapse)Author
13 daysnew simplcfg passQuentin Carbonneaux
Useful for ifopt to match more often. Empty blocks are fused and conditional jumps on empty blocks with the same successor (and no phis in the successor) are collapsed.
13 daysIf-conversion RFC 4 - x86 only (for now), use cmovXXRoland Paterson-Jones
Replacement of tiny conditional jump graphlets with conditional move instructions. Currently enabled only for x86. Arm64 support using cselXX will be essentially identical. Adds (internal) frontend sel0/sel1 ops with flag-specific backend xselXX following jnz implementation pattern. Testing: standard QBE, cproc, harec, hare, roland
2025-03-14gvn/gcm reviewQuentin Carbonneaux
- Many stylistic nits. - Removed blkmerge(). - Some minor bug fixes. - GCM reassoc is now "sink"; a pass that moves trivial ops in their target block with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14Global Value Numbering / Global Code MotionRoland Paterson-Jones
More or less as proposed in its ninth iteration with the addition of a gcmmove() functionality to restore coherent local schedules. Changes since RFC 8: Features: - generalization of phi 1/0 detection - collapse linear jmp chains before GVN; simplifies if-graph detection used in 0/non-0 value inference and if-elim... - infer 0/non-0 values from dominating blk jnz; eliminates redundant cmp eq/ne 0 and associated jnz/blocks, for example redundant null pointer checks (hare codebase likes this) - remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly cmps. - merge %addr =l add %addr1, N sequences - reduces tmp count, register pressure. - squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t2, M -> t2 = add t, N+M Bug Fixes: - remove "cmp eq/ne of non-identical RCon's " in copyref(). RCon's are not guaranteed to be dedup'ed, and symbols can alias. Codebase: - moved some stuff into cfg.c including blkmerge() - some refactoring in gvn.c - simplification of reassoc.c - always reassoc all cmp ops and Kl add %t, N. Better on coremark, smaller codebase. - minor simplification of movins() - use vins Testing - standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland] Benchmark - coremark is ~15%+ faster than master - hare "HARETEST_INCLUDE='slow' make check" ~8% faster (crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster) Changes since RFC 7: Bug fixes: - remove isbad4gcm() in GVN/GCM - it is unsound due to different state at GVN vs GCM time; replace with "reassociation" pass after GCM - fix intra-blk use-before-def after GCM - prevent GVN from deduping trapping instructions cos GCM will not move them - remove cmp eq/ne identical arg copy detection for floating point, it is not valid for NaN - fix cges/cged flagged as commutative in ops.h instead of cnes/cned respectively; just a typo Minor features: - copy detection handles cmp le/lt/ge/gt with identical args - treat (integer) div/rem by non-zero constant as non-trapping - eliminate add N/sub N pairs in copy detection - maintain accurate tmp use in GVN; not strictly necessary but enables interim global state sanity checking - "reassociation" of trivial constant offset load/store addresses, and cmp ops with point-of-use in pass after GCM - normalise commutative op arg order - e.g. op con, tmp -> op tmp, con to simplify copy detection and GVN instruction dedup Codebase: - split out core copy detection and constant folding (back) out into copy.c, fold.c respectively; gvn.c was getting monolithic - generic support for instruction moving in ins.c - used by GCM and reassoc - new reassociation pass in reassoc.c - other minor clean-up/refactor Changes since RFC 6: - More ext elimination in GVN by examination of def and use bit width - elimination of redundant and mask by bit width examination - Incorporation of Song's patch Changes since RFC 5: - avoidance of "bad" candidates for GVN/GCM - trivial address offset calculations, and comparisons - more copy detection mostly around boolean values - allow elimination of unused load, alloc, trapping instructions - detection of trivial boolean v ? 1 : 0 phi patterns - bug fix for (removal of) "chg" optimisation in ins recreation - it was missing removal of unused instructions in some cases ifelim() between GVN and GCM; deeper nopunused()
2025-03-14Combine fillrpo() and fillpreds() into fillcfg().Roland Paterson-Jones
Remove edgedel() calls from fillrpo(). Call new prunephis() from fillpreds(). [Curiously this never seems to do anything even tho edgedel() is no longer called from fillrpo()] One remaining fillpreds() call in parse.c typecheck - seems like it will still work the same. defensive; fillcfg() combining fillrpo() and fillpreds() - problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds() comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers??? static fillrpo(); remove edgedel() from fillrpo() replace fillrpo() and/or fillpreds() with fillcfg()
2023-06-06implement line number info trackingThomas Bracht Laumann Jespersen
Support "file" and "loc" directives. "file" takes a string (a file name) assigns it a number, sets the current file to that number and records the string for later. "loc" takes a single number and outputs location information with a reference to the current file.
2022-12-14new blit instructionQuentin Carbonneaux
2022-11-20new slot coalescing passQuentin Carbonneaux
This pass limits stack usage when many small aggregates are allocated on the stack. A fast liveness analysis figures out which slots interfere and the pass then fuses slots that do not interfere. The pass also kills stack slots that are only ever assigned. On the hare stdlib test suite, this fusion pass managed to reduce the total eligible slot bytes count by 84%. The slots considered for fusion must not escape and not exceed 64 bytes in size.
2022-10-08fix asm comment positionQuentin Carbonneaux
When emitting data detected as zero the comment appeared before the data directives were output.
2022-10-03new arm64_apple targetQuentin Carbonneaux
Should make qbe work on apple arm-based hardware.
2022-10-03add new target-specific abi0 passQuentin Carbonneaux
The general idea is to give abis a chance to talk before we've done all the optimizations. Currently, all targets eliminate {par,arg,ret}{sb,ub,...} during this pass. The forthcoming arm64_apple will, however, insert proper extensions during abi0. Moving forward abis can, for example, lower small-aggregates passing there so that memory optimizations can interact better with function calls.
2022-08-31drop -G flag and add target amd64_appleQuentin Carbonneaux
apple support is more than assembly syntax in case of arm64 machines, and apple syntax is currently useless in all cases but amd64; rather than having a -G option that only makes sense with amd64, we add a new target amd64_apple
2022-08-31flag the default target in "qbe -h"Quentin Carbonneaux
2022-04-11move nx stack annotation to gas.cQuentin Carbonneaux
2022-04-11Close input file after done readingDaniel Xu
Leaks resources to not close. Signed-off-by: Daniel Xu <[email protected]>
2022-03-15new -t? flag to print default targetQuentin Carbonneaux
2022-03-14output symbol type and sizeQuentin Carbonneaux
That is not available on osx so I tweaked the gas.c api a little to conditionally output the two directives.
2022-02-17add rv64 backendMichael Forney
It is mostly complete, but still has a few ABI bugs when passing floats in structs, or when structs are passed partly in register, and partly on stack.
2022-01-31Do not use the asm keyword as a local variableDetlef Riekenberg
Signed-off-by: Detlef Riekenberg <[email protected]>
2022-01-23check for fopen() errors for output fileBor Grošelj Simić
2021-09-09skip nx stack annotation on osxQuentin Carbonneaux
2021-03-12Arrange debug flag table to match pass orderMichael Forney
This makes it easier to determine which flag to pass to show the desired debug info.
2019-05-02move fillloop() after fold()Quentin Carbonneaux
SCCP is currently the one and only pass which seriously affects control flow; so we must compute loop costs afterwards.
2019-04-11properly detect ssa formQuentin Carbonneaux
Previously, we would skip ssa construction when a temporary has a single definition. This is only part of the ssa invariant: we must also check that all uses are dominated by the single definition. The new code does this. In fact, qbe does not store all the dominators for a block, so instead of walking the idom linked list we use a rough heuristic and declare conservatively that B0 dominates B1 when one of the two conditions is true: a. B0 is the start block b. B0 is B1 Some measurements on a big file from Michael Forney show that the code is still as fast as before this patch.
2017-04-08new arm64 backend, yeepeeQuentin Carbonneaux
2017-04-08prepare for multi-targetQuentin Carbonneaux
This big diff does multiple changes to allow the addition of new targets to qbe. The changes are listed below in decreasing order of impact. 1. Add a new Target structure. To add support for a given target, one has to implement all the members of the Target structure. All the source files where changed to use this interface where needed. 2. Single out amd64-specific code. In this commit, the amd64 target T_amd64_sysv is the only target available, it is implemented in the amd64/ directory. All the non-static items in this directory are prefixed with either amd64_ or amd64_sysv (for items that are specific to the System V ABI). 3. Centralize Ops information. There is now a file 'ops.h' that must be used to store all the available operations together with their metadata. The various targets will only select what they need; but it is beneficial that there is only *one* place to change to add a new instruction. One good side effect of this change is that any operation 'xyz' in the IL now as a corresponding 'Oxyz' in the code. 4. Misc fixes. One notable change is that instruction selection now generates generic comparison operations and the lowering to the target's comparisons is done in the emitter. GAS directives for data are the same for many targets, so data emission was extracted in a file 'gas.c'. 5. Modularize the Makefile. The Makefile now has a list of C files that are target-independent (SRC), and one list of C files per target. Each target can also use its own 'all.h' header (for example to define registers).
2017-02-27scrub assembly outputQuentin Carbonneaux
Notably, this adds a new pass to get rid of jumps on jumps.
2017-02-25do sign/zero extensions removal in copy.cQuentin Carbonneaux
2017-02-24start a new simplification passQuentin Carbonneaux
2017-02-24reenable and fix a bug in memoptQuentin Carbonneaux
While a minimal dead store elimination is not implemented, the generated code looks quite a bit better with them enabled. It also is quite cheap.
2017-02-10support variable argument listsQuentin Carbonneaux
This change is backward compatible, calls to "variadic" functions (like printf) must now be annotated (with ...).
2017-02-06use uint for block idsQuentin Carbonneaux
2017-01-12use a less obtuse api for vnew()Quentin Carbonneaux
2016-12-21schedule loop nesting computations earlierQuentin Carbonneaux
2016-12-12use the new load optimizationQuentin Carbonneaux
2016-04-13hack an ssa validator (likely buggy)Quentin Carbonneaux
2016-04-11improve help message slightlyQuentin Carbonneaux
2016-04-09rebuild rpo after foldQuentin Carbonneaux
2016-04-09enable constant foldingQuentin Carbonneaux
2016-04-01don't try to keep use counts in abi()Quentin Carbonneaux
Abi lowering does not need use counts, but they are needed for instruction selection. I changed main to call filluse() between these two passes.
2016-03-31move abi code in a new fileQuentin Carbonneaux
2016-03-29new layout, put LICENSE in rootQuentin Carbonneaux