| Age | Commit message | Author |
|
Replacement of tiny conditional jump graphlets with
conditional move instructions.
Currently enabled only for x86. Arm64 support using cselXX
will be essentially identical.
Adds (internal) frontend sel0/sel1 ops with flag-specific
backend xselXX following jnz implementation pattern.
Testing: standard QBE, cproc, harec, hare, roland
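
To illustrate the idea, here is a minimal C sketch of a tiny
conditional-jump graphlet and the equivalent select form. It is not
taken from QBE; the function names are hypothetical, and whether a
compiler actually emits cmovXX or cselXX for the second form depends
on the target and optimization level.

```c
/* Hypothetical illustration, not QBE code.
 * Branchy form: a small diamond in the control-flow graph. */
int
pick_branch(int c, int a, int b)
{
	int r;

	if (c != 0)
		r = a;
	else
		r = b;
	return r;
}

/* Select form: both values are already available, so the result
 * can be chosen without a jump (cmovXX on x86, cselXX on arm64). */
int
pick_select(int c, int a, int b)
{
	return c != 0 ? a : b;
}
```

Both functions return the same value for every input; only the shape
of the control flow differs.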
|
|
|
|
|
|
|
|
On Apple platforms x18 is not guaranteed
to be preserved across context switches.
So we now use IP1 as the scratch register.
En passant, one dubious use of IP0 in
arm64/emit.c fixarg() was transitioned
to IP1. I believe the previous code could
clobber a user value if IP0 was live.
|
|
Removes last re-allocation of b->ins.
|
|
Always used this way and factors setting b->nins.
Makes b->ins vector contract more obvious.
|
|
Scratching an itch - avoid unnecessary re-allocation in idup()
which is called often in the optimisation chain.
Blk::ins is reallocated in xxx_abi() - needs further fiddling.
|
|
|
|
- dynamic allocations could generate
  bad 'and' instructions (for the
  and with -16 in salloc()).
- symbols used in w context would
  generate adrp and add instructions
  on wN registers while they seem to
  only work on xN registers.
Thanks to Rosie for reporting them.
|
|
Clang incorrectly optimizes this negation with -O2 and causes QBE to
emit 0 in place of INT64_MIN.
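
The code involved is not shown here. As a general illustration, and
assuming the value is held in an unsigned 64-bit integer, one way to
negate without relying on signed arithmetic is the two's-complement
identity below. The helper name neg64 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: two's-complement negation done entirely in
 * unsigned arithmetic, where wraparound is well defined. */
uint64_t
neg64(uint64_t n)
{
	return 1 + ~n;
}
```

Applied to 2^63, the magnitude of INT64_MIN, this yields 2^63 again,
which is the bit pattern of INT64_MIN.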
|
|
Hopefully the right time now!
|
|
When applying a custom set of CFLAGS under clang that does not include
-std=c99, asm is treated as a keyword and as such cannot be used as an
identifier. This prevents the issue by renaming the offending variables.
|
|
Quotes are used on Apple target
variants to flag that we must
not add the _ symbol prefix.
|
|
This is incompatible with binutils gas older than 2.26.
|
|
dbgloc line [col]
This is implemented in a backwards-compatible manner.
|
|
Causes errors with stock toolchain
on OpenBSD.
|
|
|
|
|
|
Support "file" and "loc" directives. "file" takes a string (a file name)
assigns it a number, sets the current file to that number and records
the string for later. "loc" takes a single number and outputs location
information with a reference to the current file.
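
A minimal sketch of this bookkeeping follows. It is not the actual
implementation: the names setfile and fmtloc, the fixed-size table,
the reuse of numbers for repeated names, and the use of the gas .loc
directive for output are all assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

enum { MaxFiles = 64 };   /* hypothetical fixed limit */

static const char *files[MaxFiles];
static int nfiles;
static int curfile;       /* 0 means no file seen yet */

/* "file": number the name (starting at 1), record it, and make it
 * the current file. The caller must keep the string alive. */
int
setfile(const char *name)
{
	int i;

	for (i = 0; i < nfiles; i++)
		if (strcmp(files[i], name) == 0)
			return curfile = i + 1;
	if (nfiles == MaxFiles)
		return -1;
	files[nfiles++] = name;
	return curfile = nfiles;
}

/* "loc": format a location that refers to the current file. */
int
fmtloc(char *buf, size_t n, int line)
{
	return snprintf(buf, n, "\t.loc %d %d\n", curfile, line);
}
```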
|
|
|
|
This is consistent with newtmp()
and newcon().
|
|
|
|
|
|
|
|
To match x86
|
|
|
|
The .val field is signed in RSlot.
Add a new dedicated function to
fetch it as a signed int.
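
As an illustration of the technique, here is a sketch that reads a
signed value out of an unsigned bit-field. The type Ref29, its 3/29
bit split, and the name signedval are assumptions for this example.

```c
/* Hypothetical reference type: a 3-bit tag and a 29-bit payload
 * stored in an unsigned bit-field. */
typedef struct {
	unsigned type : 3;
	unsigned val : 29;
} Ref29;

/* Sign-extend the 29-bit payload: flip bit 28, then subtract it. */
int
signedval(Ref29 r)
{
	return ((int)r.val ^ 0x10000000) - 0x10000000;
}
```

Reading r.val directly would give a large positive number for a
negative slot; the helper recovers the intended sign.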
|
|
It is handy to express when
the end of a block cannot be
reached. If a hlt terminator
is executed, it traps the
program.
We don't go the llvm way and
specify execution semantics as
undefined behavior.
|
|
Symbols are a useful abstraction
that occurs in both Con and Alias.
In this patch they get their own
struct. This new struct packages
a symbol name and a type; the type
tells us where the symbol name
must be interpreted (currently, in
global memory or in thread-local
storage).
The refactor fixed a bug in
addcon(), proving the value of
packaging symbol names with their
type.
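
A rough sketch of such a struct, under the assumption that a symbol
is a name plus a two-valued type. The names SymSketch, SymGlobal,
SymThread, and symeq are hypothetical and not taken from the patch.

```c
#include <string.h>

/* Hypothetical packaging of a symbol name with its type. */
typedef struct {
	enum { SymGlobal, SymThread } type;
	const char *name;
} SymSketch;

/* Two symbols are the same only if both the type and the name
 * match; comparing names alone would confuse a global with a
 * thread-local symbol of the same name. */
int
symeq(SymSketch a, SymSketch b)
{
	return a.type == b.type && strcmp(a.name, b.name) == 0;
}
```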
|
|
It is quite similar to arm64_apple.
Probably, the call that needs to be
generated also provides extra
invariants on top of the regular
abi, but I have not checked that.
Clang generates code that is a bit
neater than qbe's because, on x86,
a load can be fused in a call
instruction! We do not bother with
supporting these since we expect
only sporadic use of the feature.
For reference, here is what clang
might output for a store to the
second entry of a thread-local
array of ints:
    movq _x@TLVP(%rip), %rdi
    callq *(%rdi)
    movl %ecx, 4(%rax)
|
|
It is documented nowhere how this is
supposed to work. It is also quite easy
to have assertion failures pop up in the
linker when generating asm slightly
different from clang's!
The best source of information is found
in LLVM's source code (AArch64ISelLowering.cpp).
I paste it here for future reference:
/// Darwin only has one TLS scheme which must be capable of dealing with the
/// fully general situation, in the worst case. This means:
/// + "extern __thread" declaration.
/// + Defined in a possibly unknown dynamic library.
///
/// The general system is that each __thread variable has a [3 x i64] descriptor
/// which contains information used by the runtime to calculate the address. The
/// only part of this the compiler needs to know about is the first xword, which
/// contains a function pointer that must be called with the address of the
/// entire descriptor in "x0".
///
/// Since this descriptor may be in a different unit, in general even the
/// descriptor must be accessed via an indirect load. The "ideal" code sequence
/// is:
/// adrp x0, _var@TLVPPAGE
/// ldr x0, [x0, _var@TLVPPAGEOFF] ; x0 now contains address of descriptor
/// ldr x1, [x0] ; x1 contains 1st entry of descriptor,
/// ; the function pointer
/// blr x1 ; Uses descriptor address in x0
/// ; Address of _var is now in x0.
///
/// If the address of _var's descriptor *is* known to the linker, then it can
/// change the first "ldr" instruction to an appropriate "add x0, x0, #imm" for
/// a slight efficiency gain.
The call 'blr x1' above is actually
special in that it trashes fewer registers
than what the abi would normally permit.
In qbe, I don't take advantage of this
and lower the call like a regular call.
We can revise this later on. Again, the
source for this information is LLVM's
source code:
// TLS calls preserve all registers except those that absolutely must be
// trashed: X0 (it takes an argument), LR (it's a call) and NZCV (let's not be
// silly).
|
|
It is more natural to branch on a
flag than have different function
pointers for high-level passes.
|
|
|
|
The apple targets are not done yet.
|
|
|
|
Should make qbe work on apple
arm-based hardware.
|
|
The general idea is to give abis a
chance to talk before we've done all
the optimizations. Currently, all
targets eliminate {par,arg,ret}{sb,ub,...}
during this pass. The forthcoming
arm64_apple will, however, insert
proper extensions during abi0.
Moving forward abis can, for example,
lower small-aggregates passing there
so that memory optimizations can
interact better with function calls.
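
To picture what an extension of a sub-word argument means, here is a
small C sketch of sign and zero extension of the low byte of a word.
The function names merely echo the sb/ub suffixes and are
hypothetical; this is not code from any abi0 pass.

```c
#include <stdint.h>

/* Sign-extend the low 8 bits of w to 32 bits. */
int32_t
extsb(uint32_t w)
{
	return (int32_t)((w & 0xff) ^ 0x80) - 0x80;
}

/* Zero-extend the low 8 bits of w to 32 bits. */
uint32_t
extub(uint32_t w)
{
	return w & 0xff;
}
```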
|
|
We have a uint alias that we use
everywhere else. I also added a
todo about unhandled large offsets
in arm64/emit.
|
|
This generates tidier code and is pic
friendly because it lets the linker
trampoline calls to dynlinked libs.
|
|
Apple support is more than assembly syntax
in the case of arm64 machines, and Apple syntax
is currently useless in all cases but amd64.
Rather than having a -G option that only
makes sense with amd64, we add a new target,
amd64_apple.
|
|
The maximum immediate offset for 1, 2, 4, and 8 byte loads/stores is
4095, 8190, 16380, and 32760 respectively[0][1][2].
[0] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRB--immediate-
[1] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDRH--immediate-
[2] https://developer.arm.com/documentation/dui0802/a/A64-Data-Transfer-Instructions/LDR--immediate-
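
These limits follow from the unsigned-offset form encoding a 12-bit
immediate scaled by the access size (4095 times 1, 2, 4, or 8). A
sketch of the corresponding check is given below; the name fitsimm
is hypothetical and is not the function used in arm64/emit.c.

```c
#include <stdint.h>

/* Return 1 if off can be encoded as the scaled unsigned 12-bit
 * immediate of an A64 load/store of the given size in bytes
 * (1, 2, 4, or 8), and 0 otherwise. */
int
fitsimm(int64_t off, int size)
{
	if (off < 0 || off % size != 0)
		return 0;
	return off / size <= 4095;
}
```

Offsets that fail the check need another addressing strategy, such
as computing the address into a scratch register first.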
|
|
The recent changes in arm and riscv
typclass() set ngp to 1 when a struct
is returned via a caller-provided
buffer. This interacts bogusly with
selret() that ends up declaring a gp
register live when none is set in
the returning sequence.
The fix is simply to set cty to zero
(all registers dead) in case a caller-
provided buffer is used.
|
|
|
|
The x9 register is used for
the env parameter.
|
|
I also moved into util.c some isel
logic that would otherwise have been
repeated a third time.
|
|
|
|
The riscv test abi8.ssa caught a bug
in the arm backend. It turns out we
were using the wrong class when loading
pointers to aggregates from the stack.
The fix is simple and mirrors what is
done in the riscv abi.
|
|
The risc-v abi needs to know if a
type is defined as a union or not.
We cannot use nunion to obtain this
information because the risc-v abi
made the unfortunate decision of
treating
union { int i; }
differently from
int i;
So, instead, I introduce a single
bit flag 'isunion'.
|
|
|