| Age | Commit message (Collapse) | Author |
|
Two simple loop optimizations.
1. Strength reduction of mul[tiplication] by loop induction
variable.
2. Hoisting of (address) base into phi where loop induction
variable is used only as a base (address) offset.
Limited to loops with a single body block, which happily
is always innermost loops. This restriction would not be
very hard to lift - it would require detecting the set of
loop blocks (and ensuring reducibility?)
Limited to loop induction variables with 0 initial value
and increment of 1 (for mul strength reduction). This
limitation is trivial to lift; however all of the
cproc/hare[c]/coremark opportunity is with 0/1 loops for
mul reduction, and 0 initial value for base-offset opt.
|
|
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink"; a pass that
moves trivial ops in their target block
with the same goal of reducing register
pressure, but starting from instructions
that benefit from having their inputs
close.
|
|
|
|
More or less as proposed in its ninth iteration with the
addition of a gcmmove() functionality to restore coherent
local schedules.
Changes since RFC 8:
Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph
detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates
redundant cmp eq/ne 0 and associated jnz/blocks, for example
redundant null pointer checks (hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and
GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count,
register pressure.
- squash consecutive associative ops with constant args, e.g.
t1 = add t, N ... t2 = add t2, M -> t2 = add t, N+M
Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's " in copyref().
RCon's are not guaranteed to be dedup'ed, and symbols can
alias.
Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops
and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins
Testing - standard QBE, cproc, hare, harec, coremark
[still have Rust build issues with latest roland]
Benchmark
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster
(crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
Changes since RFC 7:
Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state
at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions cos GCM will not
move them
- remove cmp eq/ne identical arg copy detection for floating point, it
is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned
respectively; just a typo
Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary but enables
interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and
cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con
to simplify copy detection and GVN instruction dedup
Codebase:
- split out core copy detection and constant folding (back) out into
copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and
reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor
Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch
Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset
calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation - it
was missing removal of unused instructions in some cases
ifelim() between GVN and GCM; deeper nopunused()
|
|
Remove edgedel() calls from fillrpo().
Call new prunephis() from fillpreds().
[Curiously this never seems to do anything even tho edgedel()
is no longer called from fillrpo()]
One remaining fillpreds() call in parse.c typecheck - seems
like it will still work the same.
defensive; fillcfg() combining fillrpo() and fillpreds() - problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds()
comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers???
static fillrpo(); remove edgedel() from fillrpo()
replace fillrpo() and/or fillpreds() with fillcfg()
|
|
Always used this way and factors setting b->nins.
Makes b->ins vector contract more obvious.
|
|
When rbp is not necessary to compile
a leaf function, we skip saving and
restoring it.
|
|
|
|
|
|
|
|
|
|
It is semantically the same but
does not rely on implementation-
defined behavior.
|
|
References: 12f9d16c7b000 ("create cfg.c for cfg-related functions")
|
|
dbgloc line [col]
This is implemented in a backwards-compatible manner.
|
|
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
|
|
|
|
|
|
This is consistent with newtmp()
and newcon().
|
|
|
|
|
|
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
|
|
|
|
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
|
|
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
gobal memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
|
|
|
|
This pass limits stack usage when
many small aggregates are allocated
on the stack. A fast liveness
analysis figures out which slots
interfere and the pass then fuses
slots that do not interfere. The
pass also kills stack slots that
are only ever assigned.
On the hare stdlib test suite, this
fusion pass managed to reduce the
total eligible slot bytes count
by 84%.
The slots considered for fusion
must not escape and not exceed
64 bytes in size.
|
|
We will be using it in the new
coalesce() pass.
|
|
Stack slots may have padding
bytes, and if we want to have
precise liveness information
it's important that we are able
to tell them apart.
This patch extends fillalias()
to remember for every slot
what bytes were ever assigned.
In case the slot address does
not escape we know that only
these bytes matter.
To save space, we only store
this information if the slot
size is less than or equal to
NBit.
The Alias struct was reworked
a bit to save some space. I am
still not very satisfied with
its layout though.
|
|
We had the invariant that it'd
always be a temporary.
|
|
|
|
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
|
|
|
|
|
|
The apple targets are not done yet.
|
|
|
|
The general idea is to give abis a
chance to talk before we've done all
the optimizations. Currently, all
targets eliminate {par,arg,ret}{sb,ub,...}
during this pass. The forthcoming
arm64_apple will, however, insert
proper extensions during abi0.
Moving forward abis can, for example,
lower small-aggregates passing there
so that memory optimizations can
interact better with function calls.
|
|
|
|
apple support is more than assembly syntax
in case of arm64 machines, and apple syntax
is currently useless in all cases but amd64;
rather than having a -G option that only
makes sense with amd64, we add a new target
amd64_apple
|
|
|
|
|
|
I also moved some isel logic
that would have been repeated
a third time in util.c.
|
|
That is not available on osx
so I tweaked the gas.c api
a little to conditionally
output the two directives.
|
|
The risc-v abi needs to know if a
type is defined as a union or not.
We cannot use nunion to obtain this
information because the risc-v abi
made the unfortunate decision of
treating
union { int i; }
differently from
int i;
So, instead, I introduce a single
bit flag 'isunion'.
|
|
|
|
It is mostly complete, but still has a few ABI bugs when passing
floats in structs, or when structs are passed partly in register,
and partly on stack.
|
|
This allows frontends to use BSS generically, without knowledge of
platform-dependent details.
|
|
|
|
parseref() has code to reuse address constants, but this is not
done in other passes such as fold or isel. Introduce a new function
newcon() which takes a Con and returns a Ref for that constant, and
use this whenever creating address constants.
This is necessary to fix folding of address constants when one
operand is already folded. For example, in
%a =l add $x, 1
%b =l add %a, 2
%c =w loadw %b
%a and %b were folded to $x+1 and $x+3 respectively, but then the
second add is visited again since it uses %a. This gets folded to
$x+3 as well, but as a new distinct constant. This results in %b
getting labeled as bottom instead of either constant, disabling the
replacement of %b by a constant in subsequent instructions (such
as the loadw).
|
|
|