| Age | Commit message (Collapse) | Author |
|
Useful for ifopt to match more
often. Empty blocks are fused
and conditional jumps on empty
blocks with the same successor
(and no phis in the successor)
are collapsed.
|
|
Replacement of tiny conditional jump graphlets with
conditional move instructions.
Currently enabled only for x86. Arm64 support using cselXX
will be essentially identical.
Adds (internal) frontend sel0/sel1 ops with flag-specific
backend xselXX following jnz implementation pattern.
Testing: standard QBE, cproc, harec, hare, roland
|
|
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink"; a pass that
moves trivial ops in their target block
with the same goal of reducing register
pressure, but starting from instructions
that benefit from having their inputs
close.
|
|
More or less as proposed in its ninth iteration with the
addition of a gcmmove() functionality to restore coherent
local schedules.
Changes since RFC 8:
Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph
detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates
redundant cmp eq/ne 0 and associated jnz/blocks, for example
redundant null pointer checks (hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and
GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count,
register pressure.
- squash consecutive associative ops with constant args, e.g.
t1 = add t, N ... t2 = add t2, M -> t2 = add t, N+M
Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's " in copyref().
RCon's are not guaranteed to be dedup'ed, and symbols can
alias.
Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops
and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins
Testing - standard QBE, cproc, hare, harec, coremark
[still have Rust build issues with latest roland]
Benchmark
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster
(crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
Changes since RFC 7:
Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state
at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions cos GCM will not
move them
- remove cmp eq/ne identical arg copy detection for floating point, it
is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned
respectively; just a typo
Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary but enables
interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and
cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con
to simplify copy detection and GVN instruction dedup
Codebase:
- split out core copy detection and constant folding (back) out into
copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and
reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor
Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch
Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset
calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation - it
was missing removal of unused instructions in some cases
ifelim() between GVN and GCM; deeper nopunused()
|
|
Remove edgedel() calls from fillrpo().
Call new prunephis() from fillpreds().
[Curiously this never seems to do anything even tho edgedel()
is no longer called from fillrpo()]
One remaining fillpreds() call in parse.c typecheck - seems
like it will still work the same.
defensive; fillcfg() combining fillrpo() and fillpreds() - problem after simpljmp() - think it is cos fillrpo() is still doing edgedel() which should now be covered by fillpreds()
comment out edgedel() in fillrpo() - fillcfg() no longer asserts after simpljmp() but seems like prunephis() never triggers???
static fillrpo(); remove edgedel() from fillrpo()
replace fillrpo() and/or fillpreds() with fillcfg()
|
|
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
|
|
|
|
This pass limits stack usage when
many small aggregates are allocated
on the stack. A fast liveness
analysis figures out which slots
interfere and the pass then fuses
slots that do not interfere. The
pass also kills stack slots that
are only ever assigned.
On the hare stdlib test suite, this
fusion pass managed to reduce the
total eligible slot bytes count
by 84%.
The slots considered for fusion
must not escape and not exceed
64 bytes in size.
|
|
When emitting data detected as zero
the comment appeared before the data
directives were output.
|
|
Should make qbe work on apple
arm-based hardware.
|
|
The general idea is to give abis a
chance to talk before we've done all
the optimizations. Currently, all
targets eliminate {par,arg,ret}{sb,ub,...}
during this pass. The forthcoming
arm64_apple will, however, insert
proper extensions during abi0.
Moving forward abis can, for example,
lower small-aggregates passing there
so that memory optimizations can
interact better with function calls.
|
|
apple support is more than assembly syntax
in case of arm64 machines, and apple syntax
is currently useless in all cases but amd64;
rather than having a -G option that only
makes sense with amd64, we add a new target
amd64_apple
|
|
|
|
|
|
Leaks resources to not close.
Signed-off-by: Daniel Xu <[email protected]>
|
|
|
|
That is not available on osx
so I tweaked the gas.c api
a little to conditionally
output the two directives.
|
|
It is mostly complete, but still has a few ABI bugs when passing
floats in structs, or when structs are passed partly in register,
and partly on stack.
|
|
Signed-off-by: Detlef Riekenberg <[email protected]>
|
|
|
|
|
|
This makes it easier to determine which flag to pass to show the
desired debug info.
|
|
SCCP is currently the one and only
pass which seriously affects control
flow; so we must compute loop costs
afterwards.
|
|
Previously, we would skip ssa construction when
a temporary has a single definition. This is
only part of the ssa invariant: we must also
check that all uses are dominated by the single
definition. The new code does this.
In fact, qbe does not store all the dominators
for a block, so instead of walking the idom
linked list we use a rough heuristic and declare
conservatively that B0 dominates B1 when one of
the two conditions is true:
a. B0 is the start block
b. B0 is B1
Some measurements on a big file from Michael
Forney show that the code is still as fast as
before this patch.
|
|
|
|
This big diff does multiple changes to allow
the addition of new targets to qbe. The
changes are listed below in decreasing order
of impact.
1. Add a new Target structure.
To add support for a given target, one has to
implement all the members of the Target
structure. All the source files where changed
to use this interface where needed.
2. Single out amd64-specific code.
In this commit, the amd64 target T_amd64_sysv
is the only target available, it is implemented
in the amd64/ directory. All the non-static
items in this directory are prefixed with either
amd64_ or amd64_sysv (for items that are
specific to the System V ABI).
3. Centralize Ops information.
There is now a file 'ops.h' that must be used to
store all the available operations together with
their metadata. The various targets will only
select what they need; but it is beneficial that
there is only *one* place to change to add a new
instruction.
One good side effect of this change is that any
operation 'xyz' in the IL now as a corresponding
'Oxyz' in the code.
4. Misc fixes.
One notable change is that instruction selection
now generates generic comparison operations and
the lowering to the target's comparisons is done
in the emitter.
GAS directives for data are the same for many
targets, so data emission was extracted in a
file 'gas.c'.
5. Modularize the Makefile.
The Makefile now has a list of C files that
are target-independent (SRC), and one list
of C files per target. Each target can also
use its own 'all.h' header (for example to
define registers).
|
|
Notably, this adds a new pass to get rid of
jumps on jumps.
|
|
|
|
|
|
While a minimal dead store elimination is not
implemented, the generated code looks quite a
bit better with them enabled. It also is quite
cheap.
|
|
This change is backward compatible, calls to
"variadic" functions (like printf) must now be
annotated (with ...).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Abi lowering does not need use counts, but
they are needed for instruction selection.
I changed main to call filluse() between
these two passes.
|
|
|
|
|