Post by Sebastian Biallas
> Thanks, this works. But I guess it's Linux-only.
I believe any other x86_64 OS that is reasonable enough implements a
similar feature.
Post by Sebastian Biallas
> Post by Gwenole Beauchesne
>> or build with -fPIE + normal mmap() and use RIP addressing.
> Hmm, how does this help? Should I relocate all other code once I know
> where my translation cache is?
With PIE (Position Independent Executable), the code and data sections
are relocated above 4 GB, possibly at a randomized address. "Normal
mmap()" was a little vague. Here are a few ideas to increase the
likelihood that the mapped area lands next to the relocated .text
section (so that branches from JIT code to non-JIT code fit into 32-bit
displacements).
a) Set the mmap() start argument to &main rounded up to the next 128 MB
segment, for example. That is (void *)((((uintptr_t)&main) + MB_128)
& -MB_128) with const uintptr_t MB_128 = 128 * 1024 * 1024. Note,
however, that nowadays (with kernel 2.6) the brk space can grow at
will, so the typical 32 MB limit for brk (used for small allocations,
< 128 KB by default) is gone. A 128 MB gap therefore assumes you know
that your global malloc() usage fits in it and that you don't have
memory leaks. Otherwise, the heap might run into your mmap()'ed region.
b) Allocate your translation cache statically as uint8_t
translation_cache[translation_cache_size]
__attribute__((aligned(4096))); then, on startup, mprotect() that
region with PROT_EXEC added. Although POSIX doesn't guarantee that this
works, it does on Linux. IMHO, that solution is more predictable.
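Both ideas can be sketched in C as follows, assuming Linux on x86-64 with 4 KB pages. The function names, the 16 MB TRANSLATION_CACHE_SIZE, and the use of the hint function's own address instead of &main (any symbol in .text serves the same purpose, and this keeps the block self-contained) are illustrative assumptions, not PearPC code.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define MB_128 ((uintptr_t)128 * 1024 * 1024)
/* Hypothetical cache size; a multiple of the 4 KB x86-64 page size. */
#define TRANSLATION_CACHE_SIZE ((size_t)16 * 1024 * 1024)

/* Option a): hint = start of the next 128 MB segment above .text.
   The post rounds &main; this sketch rounds its own address instead. */
static uintptr_t tc_hint(void) {
    return ((uintptr_t)&tc_hint + MB_128) & -MB_128;
}

/* Without MAP_FIXED the address is only a hint: the kernel may place the
   mapping elsewhere, so check the result before relying on rel32 branches. */
static void *alloc_translation_cache_near_text(size_t size) {
    void *p = mmap((void *)tc_hint(), size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Option b): the cache lives in .bss, inside the same ELF image as .text,
   so it is relocated together with the code. */
static uint8_t translation_cache[TRANSLATION_CACHE_SIZE]
    __attribute__((aligned(4096)));

/* POSIX leaves mprotect() on non-mmap()ed memory unspecified;
   Linux accepts it on static storage. */
static int enable_static_translation_cache(void) {
    return mprotect(translation_cache, sizeof translation_cache,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Option b) needs no placement check at run time: the executable image is far smaller than 2 GB, so code in translation_cache can always reach .text with rel32 branches. Option a) only improves the odds, which is why its result still has to be verified.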
With PIE, data is relocated above 4 GB too, so you can use RIP-relative
addressing to access your program's global data from the JIT-generated
code. This encodes one byte shorter than an absolute 32-bit address,
which needs a SIB byte. You can't use the absolute form anyway once the
program is relocated above 4 GB, because its 32-bit displacement is
sign-extended and only reaches the low 2 GB of user address space.
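To make the size difference concrete, here is a minimal encoder sketch for one instruction, mov eax, [mem], in both forms. The function names are hypothetical, and the code assumes a 64-bit, little-endian host. In the RIP-relative form the displacement is measured from the end of the instruction, so the emitter needs the address the code will run at.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* mov eax, [rip + disp32]: opcode 8B, ModRM 05 (mod=00, rm=101 means
   RIP-relative in 64-bit mode), then disp32 measured from the END of the
   instruction. 6 bytes. `pc` is the address the code will execute at. */
static size_t emit_mov_eax_rip(uint8_t *code, uintptr_t pc, uintptr_t target) {
    const size_t len = 6;
    int64_t disp = (int64_t)target - (int64_t)(pc + len);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;                          /* target out of +/-2 GB reach */
    int32_t d32 = (int32_t)disp;
    code[0] = 0x8B;
    code[1] = 0x05;
    memcpy(code + 2, &d32, sizeof d32);    /* little-endian host assumed */
    return len;
}

/* mov eax, [disp32]: ModRM 04 selects a SIB byte; SIB 25 means no base and
   no index, i.e. an absolute disp32 that is sign-extended to 64 bits, so
   only the low 2 GB of user address space is reachable. 7 bytes. */
static size_t emit_mov_eax_abs32(uint8_t *code, uintptr_t target) {
    if (target > (uintptr_t)INT32_MAX)
        return 0;
    int32_t d32 = (int32_t)target;
    code[0] = 0x8B;
    code[1] = 0x04;
    code[2] = 0x25;
    memcpy(code + 3, &d32, sizeof d32);
    return 7;
}
```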
Now that your PearPC executable is relocated above 4 GB, you have up to
2^32 - 4 KB (IIRC) of free address space below 4 GB to implement hwmmu
correctly. The following approach may work: create a shared memory
object with shm_open() and ftruncate() its fd to the desired Mac RAM
size. Then mmap() individual 4 KB pages of that fd, at the offset of
the desired Mac physical page, with MAP_FIXED at the address that
corresponds to the Mac-side VA (and MAP_SHARED, so that all mappings
alias the same memory). A separate, normal mmap() of the whole fd gives
you the Mac physical RAM.
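A minimal sketch of this shm_open() approach follows, assuming Linux and 4 KB pages. The object name, the helper names and the guest_base parameter are hypothetical. With guest_base set to 0, Mac VAs are used directly as host addresses below 4 GB, as the post suggests for a PIE build. On glibc older than 2.34, shm_open() requires linking with -lrt.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define MAC_PAGE_SIZE 4096u

/* Create the POSIX shared memory object backing Mac physical RAM.
   The name is hypothetical; it is unlinked right away so the object
   lives only as long as the fd and its mappings. Returns fd or -1. */
static int create_mac_ram(size_t size) {
    char name[64];
    snprintf(name, sizeof name, "/pearpc-ram-%ld", (long)getpid());
    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    shm_unlink(name);
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the 4 KB Mac physical page at offset `phys` to Mac virtual address
   `va` (both page-aligned). MAP_SHARED makes every view alias the same RAM;
   MAP_FIXED silently replaces whatever was mapped there before. */
static int map_mac_page(int fd, uintptr_t guest_base, uint32_t va, uint32_t phys) {
    void *want = (void *)(guest_base + va);
    void *p = mmap(want, MAC_PAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, (off_t)phys);
    return p == want ? 0 : -1;
}

/* The "physical" view: one normal mmap() of the whole object. */
static uint8_t *map_mac_phys(int fd, size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}
```

To experiment without clobbering existing mappings, one can reserve a window with mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0) and pass its address as guest_base. A byte written through a mapped Mac VA then shows up in the physical view at the page's offset, and two Mac VAs mapped to the same physical page see the same data.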