| Age | Commit message (Collapse) | Author |
|
Replacement of tiny conditional jump graphlets with
conditional move instructions.
Currently enabled only for x86. Arm64 support using cselXX
will be essentially identical.
Adds (internal) frontend sel0/sel1 ops with flag-specific
backend xselXX following jnz implementation pattern.
Testing: standard QBE, cproc, harec, hare, roland
|
|
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink"; a pass that
moves trivial ops in their target block
with the same goal of reducing register
pressure, but starting from instructions
that benefit from having their inputs
close.
|
|
|
|
More or less as proposed in its ninth iteration with the
addition of a gcmmove() functionality to restore coherent
local schedules.
Changes since RFC 8:
Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph
detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates
redundant cmp eq/ne 0 and associated jnz/blocks, for example
redundant null pointer checks (hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and
GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count,
register pressure.
- squash consecutive associative ops with constant args, e.g.
t1 = add t, N ... t2 = add t2, M -> t2 = add t, N+M
Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's " in copyref().
RCon's are not guaranteed to be dedup'ed, and symbols can
alias.
Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops
and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins
Testing - standard QBE, cproc, hare, harec, coremark
[still have Rust build issues with latest roland]
Benchmark
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster
(crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
Changes since RFC 7:
Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state
at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions cos GCM will not
move them
- remove cmp eq/ne identical arg copy detection for floating point, it
is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned
respectively; just a typo
Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary but enables
interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and
cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con
to simplify copy detection and GVN instruction dedup
Codebase:
- split out core copy detection and constant folding (back) out into
copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and
reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor
Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch
Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset
calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation - it
was missing removal of unused instructions in some cases
ifelim() between GVN and GCM; deeper nopunused()
|
|
Removes last re-allocation of b->ins.
|
|
Always used this way and factors setting b->nins.
Makes b->ins vector contract more obvious.
|
|
Scratching an itch - avoid unnecesary re-allocation in idup()
which is called often in the optimisation chain.
Blk::ins is reallocated in xxx_abi() - needs further fiddling.
|
|
|
|
|
|
|
|
|
|
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
|
|
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
gobal memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
|
|
|
|
The apple targets are not done yet.
|
|
|
|
I also moved some isel logic
that would have been repeated
a third time in util.c.
|
|
|
|
parseref() has code to reuse address constants, but this is not
done in other passes such as fold or isel. Introduce a new function
newcon() which takes a Con and returns a Ref for that constant, and
use this whenever creating address constants.
This is necessary to fix folding of address constants when one
operand is already folded. For example, in
%a =l add $x, 1
%b =l add %a, 2
%c =w loadw %b
%a and %b were folded to $x+1 and $x+3 respectively, but then the
second add is visited again since it uses %a. This gets folded to
$x+3 as well, but as a new distinct constant. This results in %b
getting labeled as bottom instead of either constant, disabling the
replacement of %b by a constant in subsequent instructions (such
as the loadw).
|
|
|
|
Reported by Alessandro Mantovani.
These addresses are likely bogus, but
they triggered an unwarranted assertion
failure. We now raise a civilized error.
|
|
|
|
|
|
Symbols in the source file are still limited in
length because the rest of the code assumes that
strings always fit in NString bytes.
Regardless, there is already a benefit because
comparing/copying symbol names does not require
using strcmp()/strcpy() anymore.
|
|
Before this commit, I tried to make sure that
two interfering temporaries never ended up in
the same phi class.
This was to make sure that their register hints
were not counterproductively stepping on each
other's toes. The idea is fine, but:
* the implementation is clumsy because it
mixes the orthogonal concepts of
(i) interference and (ii) phi classes;
* the hinting process in the register
allocator is hard to understand because
the disjoint-set data structure used for
phi classes is cut in arbitrary places.
After this commit, the phi classes *really* are
phi classes represented with a proper disjoint-set
data structure.
|
|
The arm64 ABI needs to know precisely what
floating point types are being used, so we
need to store that information.
I also made typ[] a dynamic array.
|
|
This big diff does multiple changes to allow
the addition of new targets to qbe. The
changes are listed below in decreasing order
of impact.
1. Add a new Target structure.
To add support for a given target, one has to
implement all the members of the Target
structure. All the source files where changed
to use this interface where needed.
2. Single out amd64-specific code.
In this commit, the amd64 target T_amd64_sysv
is the only target available, it is implemented
in the amd64/ directory. All the non-static
items in this directory are prefixed with either
amd64_ or amd64_sysv (for items that are
specific to the System V ABI).
3. Centralize Ops information.
There is now a file 'ops.h' that must be used to
store all the available operations together with
their metadata. The various targets will only
select what they need; but it is beneficial that
there is only *one* place to change to add a new
instruction.
One good side effect of this change is that any
operation 'xyz' in the IL now as a corresponding
'Oxyz' in the code.
4. Misc fixes.
One notable change is that instruction selection
now generates generic comparison operations and
the lowering to the target's comparisons is done
in the emitter.
GAS directives for data are the same for many
targets, so data emission was extracted in a
file 'gas.c'.
5. Modularize the Makefile.
The Makefile now has a list of C files that
are target-independent (SRC), and one list
of C files per target. Each target can also
use its own 'all.h' header (for example to
define registers).
|
|
|
|
|
|
|
|
This was not necessary as temporaries were never freed
and returned from an array zero initialized. But in the
coming load optimization, we sometimes free temporaries
by resetting fn->ntmp.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
int is used all over the place for temporaries,
maybe this should be changed, I don't know.
Another thing to consider is that temporaries
are currently on 12 bits (and will be on 29
or 30 bits in the future), so int will always be
safe to store them. We just loose the free
invariant of non-negativity.
|
|
|
|
|
|
|