path: root/all.h
Age  Commit message  Author
2025-03-21  Simple Inner Loop Optimization [loopopt]  Roland Paterson-Jones
Two simple loop optimizations:
1. Strength reduction of mul[tiplication] by loop induction variable.
2. Hoisting of (address) base into phi where loop induction variable is used only as a base (address) offset.
Limited to loops with a single body block, which happily are always innermost loops. This restriction would not be very hard to lift - it would require detecting the set of loop blocks (and ensuring reducibility?).
Limited to loop induction variables with 0 initial value and increment of 1 (for mul strength reduction). This limitation is trivial to lift; however all of the cproc/hare[c]/coremark opportunity is with 0/1 loops for mul reduction, and 0 initial value for base-offset opt.
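The mul strength reduction described above can be pictured at the C level. This is only an analogy: the pass works on QBE's intermediate language, not on C source, and the function names `sum_mul` and `sum_reduced` are hypothetical. The sketch assumes the restricted case the message names, an induction variable starting at 0 and stepping by 1.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the induction variable i is multiplied by the loop-invariant
 * stride on every iteration. */
long sum_mul(const long *a, size_t n, size_t stride)
{
	long s = 0;
	for (size_t i = 0; i < n; i++)
		s += a[i * stride];
	return s;
}

/* After: the product is carried in a second variable that starts at 0
 * and grows by stride each iteration, so the loop body only adds. */
long sum_reduced(const long *a, size_t n, size_t stride)
{
	long s = 0;
	size_t off = 0;
	for (size_t i = 0; i < n; i++) {
		assert(off == i * stride); /* invariant kept by the rewrite */
		s += a[off];
		off += stride;
	}
	return s;
}
```

Both functions read the same elements in the same order; the second simply trades the per-iteration multiply for an add.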
2025-03-14  gvn/gcm review  Quentin Carbonneaux
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink"; a pass that moves trivial ops in their target block with the same goal of reducing register pressure, but starting from instructions that benefit from having their inputs close.
2025-03-14  Get rid of movins() infra.  Roland Paterson-Jones
2025-03-14  Global Value Numbering / Global Code Motion  Roland Paterson-Jones
More or less as proposed in its ninth iteration with the addition of a gcmmove() functionality to restore coherent local schedules.

Changes since RFC 8:

Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates redundant cmp eq/ne 0 and associated jnz/blocks, for example redundant null pointer checks (hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count, register pressure.
- squash consecutive associative ops with constant args, e.g. t1 = add t, N ... t2 = add t1, M -> t2 = add t, N+M

Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's" in copyref(). RCon's are not guaranteed to be dedup'ed, and symbols can alias.

Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins

Testing:
- standard QBE, cproc, hare, harec, coremark [still have Rust build issues with latest roland]

Benchmark:
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster (crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)

Changes since RFC 7:

Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions because GCM will not move them
- remove cmp eq/ne identical arg copy detection for floating point, it is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned respectively; just a typo

Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary but enables interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con to simplify copy detection and GVN instruction dedup

Codebase:
- split out core copy detection and constant folding (back) out into copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor

Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch

Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation - it was missing removal of unused instructions in some cases

ifelim() between GVN and GCM; deeper nopunused()
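One of the RFC 7 bug fixes above removes the "cmp eq/ne with identical args" copy detection for floating point because it is not valid for NaN. A minimal C sketch of why, assuming IEEE 754 semantics and no fast-math style compiler flags; the function names are hypothetical and unrelated to QBE's code:

```c
#include <assert.h>
#include <math.h>

/* Integer self-comparison is always true, so it can be folded to 1. */
int int_self_eq(int x)
{
	return x == x;
}

/* Floating-point self-comparison is false for NaN, so folding it to 1
 * would change program behavior. */
int dbl_self_eq(double x)
{
	return x == x;
}

/* Sanity check of the claim: self-equality fails exactly when x is NaN. */
void check_self_eq(double x)
{
	assert(dbl_self_eq(x) == !isnan(x));
}
```

Calling `dbl_self_eq(NAN)` yields 0 while `int_self_eq` yields 1 for every input, which is why the rewrite is only safe for integer classes.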
2025-03-14  Combine fillrpo() and fillpreds() into fillcfg().  Roland Paterson-Jones
Remove edgedel() calls from fillrpo(). Call new prunephis() from fillpreds(). [Curiously this never seems to do anything even tho edgedel() is no longer called from fillrpo()]
One remaining fillpreds() call in parse.c typecheck - seems like it will still work the same.
defensive; fillcfg() combining fillrpo() and fillpreds() - problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds()
comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers???
static fillrpo(); remove edgedel() from fillrpo()
replace fillrpo() and/or fillpreds() with fillcfg()
2025-03-14  idup(Ins **, Ins *, ulong) -> idup(Blk *, Ins *, ulong)  Roland Paterson-Jones
Always used this way and factors setting b->nins. Makes b->ins vector contract more obvious.
2024-08-23  skip preludes for some leaf fns  Quentin Carbonneaux
When rbp is not necessary to compile a leaf function, we skip saving and restoring it.
2024-06-09  Optab-driven copy detection  Roland Paterson-Jones
2024-04-12  add common linkage for data  Quentin Carbonneaux
2024-04-11  fold scaled offsets in addresses  Quentin Carbonneaux
2024-04-09  use mgen in amd64/isel.c  Quentin Carbonneaux
2024-04-07  mcf's improved rsval()  Quentin Carbonneaux
It is semantically the same but does not rely on implementation-defined behavior.
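The message does not show the new body of rsval(), so the following is only a sketch of one common way to sign-extend a narrow unsigned field without implementation-defined behavior. The 29-bit width (`FIELD_BITS`) and the name `sext_field` are assumptions made for illustration. The point is that converting an out-of-range unsigned value to a signed type, or right-shifting a negative value, is implementation-defined in C, whereas flipping the sign bit and subtracting its weight uses only in-range arithmetic.

```c
#include <assert.h>
#include <stdint.h>

enum { FIELD_BITS = 29 }; /* hypothetical field width */

/* Sign-extend the low FIELD_BITS bits of v. Every intermediate value
 * fits in int32_t, so no implementation-defined conversion occurs. */
int32_t sext_field(uint32_t v)
{
	const uint32_t mask = ((uint32_t)1 << FIELD_BITS) - 1;
	const int32_t sign = (int32_t)1 << (FIELD_BITS - 1);
	int32_t x = (int32_t)(v & mask); /* in range: less than 2^29 */
	int32_t r = (x ^ sign) - sign;   /* flip sign bit, subtract its weight */

	/* Converting back to unsigned is well-defined (modular) and must
	 * reproduce the original field bits. */
	assert(((uint32_t)r & mask) == (v & mask));
	return r;
}
```

With these assumptions, a field holding all ones decodes to -1 and a field with only the top bit set decodes to the most negative 29-bit value.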
2024-03-07  remove duplicated function declarations  Nguyễn Gia Phong
References: 12f9d16c7b000 ("create cfg.c for cfg-related functions")
2024-01-02  dbgloc: add column argument  Drew DeVault
dbgloc line [col]
This is implemented in a backwards-compatible manner.
2023-06-06  implement line number info tracking  Thomas Bracht Laumann Jespersen
Support "file" and "loc" directives. "file" takes a string (a file name), assigns it a number, sets the current file to that number and records the string for later. "loc" takes a single number and outputs location information with a reference to the current file.
2023-05-31  Bump NString  Alexey Yerin
2023-04-03  Fix 1 C UB  Locria Cyber
2023-03-22  rename blknew() to newblk()  Quentin Carbonneaux
This is consistent with newtmp() and newcon().
2023-03-19  naming nit  Quentin Carbonneaux
2023-03-16  silence format warning more reliably  Quentin Carbonneaux
2022-12-25  new UNDEF Ref  Quentin Carbonneaux
Crashing loads of uninitialized memory proved to be a problem when implementing unions using qbe. This patch introduces a new UNDEF Ref to represent data that is known to be uninitialized. Optimization passes can make use of it to eliminate some code. In the last compilation stages, UNDEF is treated as the constant 0xdeaddead.
2022-12-14  new blit instruction  Quentin Carbonneaux
2022-12-12  new rsval() helper for signed Refs  Quentin Carbonneaux
The .val field is signed in RSlot. Add a new dedicated function to fetch it as a signed int.
2022-11-27  new hlt block terminator  Quentin Carbonneaux
It is handy to express when the end of a block cannot be reached. If a hlt terminator is executed, it traps the program. We don't go the llvm way and specify execution semantics as undefined behavior.
2022-11-22  use a new struct for symbols  Quentin Carbonneaux
Symbols are a useful abstraction that occurs in both Con and Alias. In this patch they get their own struct. This new struct packages a symbol name and a type; the type tells us where the symbol name must be interpreted (currently, in global memory or in thread-local storage). The refactor fixed a bug in addcon(), proving the value of packaging symbol names with their type.
2022-11-22  rename Tmp.ins to be more descriptive  Quentin Carbonneaux
2022-11-20  new slot coalescing pass  Quentin Carbonneaux
This pass limits stack usage when many small aggregates are allocated on the stack. A fast liveness analysis figures out which slots interfere and the pass then fuses slots that do not interfere. The pass also kills stack slots that are only ever assigned. On the hare stdlib test suite, this fusion pass managed to reduce the total eligible slot bytes count by 84%. The slots considered for fusion must not escape and not exceed 64 bytes in size.
2022-11-20  export getalias()  Quentin Carbonneaux
We will be using it in the new coalesce() pass.
2022-11-20  stored bytes in Alias information  Quentin Carbonneaux
Stack slots may have padding bytes, and if we want to have precise liveness information it's important that we are able to tell them apart. This patch extends fillalias() to remember for every slot what bytes were ever assigned. In case the slot address does not escape we know that only these bytes matter. To save space, we only store this information if the slot size is less than or equal to NBit. The Alias struct was reworked a bit to save some space. I am still not very satisfied with its layout though.
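The idea of remembering which bytes of a slot were ever assigned can be sketched as a bitmask with one bit per byte. This is an illustration only, not the layout of the Alias struct: `NBITS`, `mark_stored` and `byte_stored` are hypothetical names, and the 64-bit mask width is an assumption standing in for NBit.

```c
#include <assert.h>
#include <stdint.h>

enum { NBITS = 64 }; /* hypothetical stand-in for NBit */

/* Record a store of len bytes at byte offset off in mask m.
 * The stored range must lie entirely inside the first NBITS bytes. */
uint64_t mark_stored(uint64_t m, unsigned off, unsigned len)
{
	assert(len > 0 && off < NBITS && len <= NBITS - off);
	/* Build a run of len one-bits; a full-width run is special-cased
	 * because shifting a 64-bit value by 64 is undefined. */
	uint64_t run = len == NBITS ? ~(uint64_t)0 : (((uint64_t)1 << len) - 1);
	return m | (run << off);
}

/* Return 1 if the byte at offset off was ever stored to, else 0. */
int byte_stored(uint64_t m, unsigned off)
{
	assert(off < NBITS);
	return (int)((m >> off) & 1);
}
```

For a slot holding a one-byte field at offset 0 and a four-byte field at offset 4, the mask has bits 0 and 4 through 7 set, and bytes 1 through 3 show up as padding that a liveness analysis can ignore.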
2022-11-20  make Alias.base an int  Quentin Carbonneaux
We had the invariant that it'd always be a temporary.
2022-11-20  fill definition site in filluse()  Quentin Carbonneaux
2022-10-08  mark apple targets with a boolean  Quentin Carbonneaux
It is more natural to branch on a flag than have different function pointers for high-level passes.
2022-10-08  "rel" fields become "reloc"  Quentin Carbonneaux
2022-10-08  do not drop relocation kind in alias analysis  Quentin Carbonneaux
2022-10-08  add support for thread-local storage  Quentin Carbonneaux
The apple targets are not done yet.
2022-10-03  fix case of Pool constants  Quentin Carbonneaux
2022-10-03  add new target-specific abi0 pass  Quentin Carbonneaux
The general idea is to give abis a chance to talk before we've done all the optimizations. Currently, all targets eliminate {par,arg,ret}{sb,ub,...} during this pass. The forthcoming arm64_apple will, however, insert proper extensions during abi0. Moving forward abis can, for example, lower small-aggregates passing there so that memory optimizations can interact better with function calls.
2022-10-03  parse sb,ub,sh,uh abi types  Quentin Carbonneaux
2022-08-31  drop -G flag and add target amd64_apple  Quentin Carbonneaux
apple support is more than assembly syntax in case of arm64 machines, and apple syntax is currently useless in all cases but amd64; rather than having a -G option that only makes sense with amd64, we add a new target amd64_apple
2022-03-17  amd64: restore previous name of amd64_sysv target  Michael Forney
2022-03-15  new -t? flag to print default target  Quentin Carbonneaux
2022-03-14  dynamic stack allocs for arm64  Quentin Carbonneaux
I also moved some isel logic that would have been repeated a third time in util.c.
2022-03-14  output symbol type and size  Quentin Carbonneaux
That is not available on osx so I tweaked the gas.c api a little to conditionally output the two directives.
2022-03-08  flag types defined as unions  Quentin Carbonneaux
The risc-v abi needs to know if a type is defined as a union or not. We cannot use nunion to obtain this information because the risc-v abi made the unfortunate decision of treating union { int i; } differently from int i; So, instead, I introduce a single bit flag 'isunion'.
2022-03-08  cosmetics  Quentin Carbonneaux
2022-02-17  add rv64 backend  Michael Forney
It is mostly complete, but still has a few ABI bugs when passing floats in structs, or when structs are passed partly in register, and partly on stack.
2022-02-11  gas: put zero data into .bss by default  Michael Forney
This allows frontends to use BSS generically, without knowledge of platform-dependent details.
2022-02-02  shared linkage logic for func/data  Quentin Carbonneaux
2021-11-22  reuse previous address constants in fold()  Michael Forney
parseref() has code to reuse address constants, but this is not done in other passes such as fold or isel. Introduce a new function newcon() which takes a Con and returns a Ref for that constant, and use this whenever creating address constants. This is necessary to fix folding of address constants when one operand is already folded. For example, in
    %a =l add $x, 1
    %b =l add %a, 2
    %c =w loadw %b
%a and %b were folded to $x+1 and $x+3 respectively, but then the second add is visited again since it uses %a. This gets folded to $x+3 as well, but as a new distinct constant. This results in %b getting labeled as bottom instead of either constant, disabling the replacement of %b by a constant in subsequent instructions (such as the loadw).
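The reuse of address constants amounts to interning: look for an equal constant before creating a new one, so that folding $x+3 twice gives the same reference both times. The sketch below shows that idea in isolation. `AddrCon`, `pool`, `MAXCON` and `intern_con` are hypothetical names and do not reflect the actual Con struct or the signature of newcon().

```c
#include <assert.h>
#include <string.h>

enum { MAXCON = 64 }; /* hypothetical pool capacity */

/* Simplified address constant: a symbol name plus a byte offset. */
typedef struct {
	const char *sym;
	long off;
} AddrCon;

static AddrCon pool[MAXCON];
static int npool;

/* Return the index of an existing equal constant, or append a new one.
 * Equal inputs always map to the same index. */
int intern_con(const char *sym, long off)
{
	for (int i = 0; i < npool; i++)
		if (pool[i].off == off && strcmp(pool[i].sym, sym) == 0)
			return i;
	assert(npool < MAXCON);
	pool[npool].sym = sym;
	pool[npool].off = off;
	return npool++;
}
```

With interning in place, two requests for the symbol x at offset 3 compare equal by index, so a lattice-based folder sees one constant rather than two distinct ones and does not fall to bottom.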
2021-11-10  bump NString  Quentin Carbonneaux