| Age | Commit message (Collapse) | Author |
|
- Many stylistic nits.
- Removed blkmerge().
- Some minor bug fixes.
- GCM reassoc is now "sink"; a pass that
moves trivial ops in their target block
with the same goal of reducing register
pressure, but starting from instructions
that benefit from having their inputs
close.
|
|
More or less as proposed in its ninth iteration with the
addition of a gcmmove() functionality to restore coherent
local schedules.
Changes since RFC 8:
Features:
- generalization of phi 1/0 detection
- collapse linear jmp chains before GVN; simplifies if-graph
detection used in 0/non-0 value inference and if-elim...
- infer 0/non-0 values from dominating blk jnz; eliminates
redundant cmp eq/ne 0 and associated jnz/blocks, for example
redundant null pointer checks (hare codebase likes this)
- remove (emergent) empty if-then-else graphlets between GVN and
GCM; improves GCM instruction placement, particularly cmps.
- merge %addr =l add %addr1, N sequences - reduces tmp count,
register pressure.
- squash consecutive associative ops with constant args, e.g.
t1 = add t, N ... t2 = add t2, M -> t2 = add t, N+M
Bug Fixes:
- remove "cmp eq/ne of non-identical RCon's " in copyref().
RCon's are not guaranteed to be dedup'ed, and symbols can
alias.
Codebase:
- moved some stuff into cfg.c including blkmerge()
- some refactoring in gvn.c
- simplification of reassoc.c - always reassoc all cmp ops
and Kl add %t, N. Better on coremark, smaller codebase.
- minor simplification of movins() - use vins
Testing - standard QBE, cproc, hare, harec, coremark
[still have Rust build issues with latest roland]
Benchmark
- coremark is ~15%+ faster than master
- hare "HARETEST_INCLUDE='slow' make check" ~8% faster
(crypto::sha1::sha1_1gb is biggest obvious win - ~25% faster)
Changes since RFC 7:
Bug fixes:
- remove isbad4gcm() in GVN/GCM - it is unsound due to different state
at GVN vs GCM time; replace with "reassociation" pass after GCM
- fix intra-blk use-before-def after GCM
- prevent GVN from deduping trapping instructions cos GCM will not
move them
- remove cmp eq/ne identical arg copy detection for floating point, it
is not valid for NaN
- fix cges/cged flagged as commutative in ops.h instead of cnes/cned
respectively; just a typo
Minor features:
- copy detection handles cmp le/lt/ge/gt with identical args
- treat (integer) div/rem by non-zero constant as non-trapping
- eliminate add N/sub N pairs in copy detection
- maintain accurate tmp use in GVN; not strictly necessary but enables
interim global state sanity checking
- "reassociation" of trivial constant offset load/store addresses, and
cmp ops with point-of-use in pass after GCM
- normalise commutative op arg order - e.g. op con, tmp -> op tmp, con
to simplify copy detection and GVN instruction dedup
Codebase:
- split out core copy detection and constant folding (back) out into
copy.c, fold.c respectively; gvn.c was getting monolithic
- generic support for instruction moving in ins.c - used by GCM and
reassoc
- new reassociation pass in reassoc.c
- other minor clean-up/refactor
Changes since RFC 6:
- More ext elimination in GVN by examination of def and use bit width
- elimination of redundant and mask by bit width examination
- Incorporation of Song's patch
Changes since RFC 5:
- avoidance of "bad" candidates for GVN/GCM - trivial address offset
calculations, and comparisons
- more copy detection mostly around boolean values
- allow elimination of unused load, alloc, trapping instructions
- detection of trivial boolean v ? 1 : 0 phi patterns
- bug fix for (removal of) "chg" optimisation in ins recreation - it
was missing removal of unused instructions in some cases
ifelim() between GVN and GCM; deeper nopunused()
|
|
Always used this way and factors setting b->nins.
Makes b->ins vector contract more obvious.
|
|
When rbp is not necessary to compile
a leaf function, we skip saving and
restoring it.
|
|
Functions are now aligned on 16-byte
boundaries. This mimics gcc and should
help reduce the maximum perf impact of
cosmetic code changes. Previously, any
change in the output of qbe could have
far reaching implications on alignment.
Thanks to Roland Paterson-Jones for
pointing out the variability issue.
|
|
|
|
This significantly improves parsing performance for massive functions
with a huge number of temporaries. Parsing the 86MiB IL produced
by cproc during zig bootstrap drops from 17m15s to 2.5s (over 400x
speedup).
The speedup is much smaller for IL produced from normal non-autogenerated
C code. Parsing the sqlite3 amalgamation drops from 0.40s to 0.33s.
|
|
|
|
|
|
Credit goes to Roland Paterson-Jones
for spotting this bug.
|
|
The parsing code for these constants
conflicts with the Tplus token.
|
|
Otherwise, the alignment gets truncated to fit in char, so
`align 256` is handled as no alignment requirement.
|
|
dbgloc line [col]
This is implemented in a backwards-compatible manner.
|
|
|
|
|
|
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
|
|
|
|
This is consistent with newtmp()
and newcon().
|
|
Crashing loads of uninitialized memory
proved to be a problem when implementing
unions using qbe. This patch introduces
a new UNDEF Ref to represent data that is
known to be uninitialized. Optimization
passes can make use of it to eliminate
some code. In the last compilation stages,
UNDEF is treated as the constant 0xdeaddead.
|
|
|
|
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
|
|
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
gobal memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
|
|
|
|
The apple targets are not done yet.
|
|
|
|
|
|
|
|
Eg. data $a = { w $b $c }
|
|
|
|
When qbe is used with other tools is a bit hard to identify
what is the tool that is generating the error. Adding an
identifier at the beginning of the line makes much easier
to identify the tool generating the error.
|
|
Thanks to Daniel Xu for reporting.
|
|
The risc-v abi needs to know if a
type is defined as a union or not.
We cannot use nunion to obtain this
information because the risc-v abi
made the unfortunate decision of
treating
union { int i; }
differently from
int i;
So, instead, I introduce a single
bit flag 'isunion'.
|
|
|
|
This allows frontends to use BSS generically, without knowledge of
platform-dependent details.
|
|
|
|
|
|
|
|
Necessary for floating-point negation, because
`%result = sub 0, %operand` doesn't give the correct sign for 0/-0.
|
|
parseref() has code to reuse address constants, but this is not
done in other passes such as fold or isel. Introduce a new function
newcon() which takes a Con and returns a Ref for that constant, and
use this whenever creating address constants.
This is necessary to fix folding of address constants when one
operand is already folded. For example, in
%a =l add $x, 1
%b =l add %a, 2
%c =w loadw %b
%a and %b were folded to $x+1 and $x+3 respectively, but then the
second add is visited again since it uses %a. This gets folded to
$x+3 as well, but as a new distinct constant. This results in %b
getting labeled as bottom instead of either constant, disabling the
replacement of %b by a constant in subsequent instructions (such
as the loadw).
|
|
Some abis, like the riscv one, treat
arguments differently depending on
whether they are variadic or not.
To prepare for the upcomming riscv
target, we change the variadic call
syntax and give meaning to the
location of the '...' marker.
# new syntax
%ret =w call $f(w %regular, ..., w %variadic)
By nature of their abis, the change
is backwards compatible for existing
targets.
|
|
The documentation states that loadw is syntactic sugar for loadsw,
but it actually got parsed as Oload. If the result is an l temporary,
Oload behaves like Oloadl, not Oloadsw.
To fix this, parse Tloadw as Oloadsw explicitly.
|
|
This was causing issues with aggregate types. A simple reproduction is:
type :type.1 = align 8 { 24 }
type :type.2 = align 8 { w 1, :type.1 1 }
The size of type.2 should be 32, adding only 4 bytes of padding between
the first and second field. Prior to this patch, 20 bytes of padding was
added instead, causing the type to have a size of 48.
Signed-off-by: Drew DeVault <[email protected]>
|
|
Reported by Alessandro Mantovani.
Overly long function names would
trigger out-of-bounds accesses.
|
|
This allows you to explicitly specify the section to emit the data
directive for, allowing for sections other than .data: for example, .bss
or .init_array.
|
|
This now only limits the number of arguments when parsing the input SSA,
which is usually a small fixed size (depending on the frontend).
|
|
|
|
|
|
|
|
|