[Pearpc-devel] amd64 in cvs

Discussion:

Jens von der Heydt

2006-10-16 15:13:49 UTC

Hi,

I noticed several AMD64 - Tags in CVS today. Is anybody working on
this already?

Jens

Sebastian Biallas

2006-10-16 15:46:57 UTC

Post by Jens von der Heydt
Hi,
I noticed several AMD64 - Tags in CVS today. Is anybody working on
this already?

Yes, me :)
I have bought an Athlon X2, so I need a pearpc port :)

Unfortunately this is Linux-only, since Windows has a completely different
ABI, which makes platform-independence pretty much impossible.

Sebastian

Jens von der Heydt

2006-10-16 15:51:43 UTC

Post by Jens von der Heydt
Hi,
I noticed several AMD64 - Tags in CVS today. Is anybody working on
this already?

Yes, me :)
I have bought an Athlon X2, so I need a pearpc port :)
Unfortunately this is Linux-only, since Windows has a completely different
ABI, which makes platform-independence pretty much impossible.
Sebastian

Yes, Windows... that's what I found when I looked at the possibilities.
But can't we overcome those different calling conventions by using
macros?

Jens

Sebastian Biallas

2006-10-16 15:59:27 UTC

Post by Jens von der Heydt
Yes, windows... that's what I found when I looked at the possibilities.
but can't we overcome those different calling conventions by using
macros?

The best way would be a gcc __attribute__((elf_abi)) or something like
this. I'll ask on gcc-help whether there are any plans in this direction.

Sebastian
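
For reference, later GCC releases (and Clang) document per-function `ms_abi` and `sysv_abi` attributes on x86-64, which is close to what is being asked for here. A minimal sketch, assuming an x86-64 GCC or Clang toolchain; `JIT_SYSV_ABI` and `jit_helper_add` are hypothetical names for illustration, not PearPC code:

```c
#include <stdint.h>

/* Assumption: x86-64 GCC or Clang. On other targets the attribute is
 * left out so the example still compiles. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define JIT_SYSV_ABI __attribute__((sysv_abi))
#else
#define JIT_SYSV_ABI
#endif

/* Hypothetical helper that JIT-generated code would call. With sysv_abi
 * the three arguments arrive in rdi, rsi and rdx even when the platform
 * default is the Windows x64 convention, so the emitted call sequence
 * can stay the same on both systems. */
JIT_SYSV_ABI uint64_t jit_helper_add(uint64_t a, uint64_t b, uint64_t c)
{
    return a + b + c;
}
```

Only the functions that generated code calls into would need the attribute; the rest of the program can keep the platform's default convention.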

Jens von der Heydt

2006-10-16 16:17:58 UTC

Post by Jens von der Heydt
Yes, windows... that's what I found when I looked at the
possibilities.
but can't we overcome those different calling conventions by using
macros?

The best way would be a gcc __attribute__((elf_abi)) or something like
this. I'll ask on gcc-help whether there are any plans in this
direction.
Sebastian

Sounds good. What's the status of the jitc_64 branch anyway?

Jens

Sebastian Biallas

2006-10-16 16:50:09 UTC

Post by Jens von der Heydt

Post by Jens von der Heydt
Yes, windows... that's what I found when I looked at the
possibilities.
but can't we overcome those different calling conventions by using
macros?

The best way would be a gcc __attribute__((elf_abi)) or something like
this. I'll ask on gcc-help whether there are any plans in this direction.
Sebastian

Sounds good. what's the status of the jitc_64 branch anyways?

Just started. Due to some x86_64 considerations (and making SMP
possible), the conversion might take a little bit.

Sebastian

Ryan Hennessee

2006-10-16 17:05:04 UTC

Post by Sebastian Biallas
Just started. Due to some x86_64 considerations (and making SMP
possible), the conversion might take a little bit.
Sebastian

So with the proliferation of dual core CPUs, and quad core CPUs in the
near future, what are your plans for PearPC? Would it be possible to
make a "dual CPU" version of PearPC? Or would it just be a single CPU
emulated but have it highly multithreaded to take advantage of the
multiple processors? Just curious.

Jens von der Heydt

2006-10-16 17:21:28 UTC

Post by Ryan Hennessee

Post by Sebastian Biallas
Just started. Due to some x86_64 considerations (and making SMP
possible), the conversion might take a little bit.
Sebastian

PearPC is already multithreaded. It does not emulate SMP,
but there are several concurrent threads, though it does
not really generate a workload of 100% for each core / CPU.

I think an additional runtime optimizer that analyzed code execution
paths (for example) would gain more than SMP.

Jens

Sebastian Biallas

2006-10-16 17:22:26 UTC

Post by Ryan Hennessee

Post by Sebastian Biallas
Just started. Due to some x86_64 considerations (and making SMP
possible), the conversion might take a little bit.
Sebastian

So with the proliferation of dual core CPUs, and quad core CPUs in the
near future, what are your plans for PearPC? Would it be possible to
make a "dual CPU" version of PearPC?

I hope so. I don't see any real problems (at least yet).

Post by Ryan Hennessee
Or would it just be a single CPU emulated but have it highly
multithreaded to take advantage of the multiple processors?

I don't think that this is possible.

Sebastian

Sebastian Biallas

2006-10-26 14:36:06 UTC

Current status of AMD64:

- it compiles :)
- float and vector jit disabled
- no longer gcpu and gjitc (copied to the stack)

And a problem:
Since near calls can only reach addresses in the +- 2 GiB range, the
translation cache should be that near to the normal code. Unfortunately
mmap reserves memory far, far away from the normal code.

And I don't know how to tell mmap where I want to alloc. The hint is
only used when MAP_FIXED is used (it seems) and when using MAP_FIXED
mmap destroys already mapped and important addresses.

Any idea how to get around these mmap limitations?

Sebastian
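
The +- 2 GiB limit comes from the signed 32-bit displacement of a near `call`, measured from the end of the 5-byte instruction. A minimal sketch of a reachability check, assuming a 64-bit process; `call_rel32_reaches` is a hypothetical helper name, not something from the PearPC source:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a 5-byte `call rel32` placed at call_site can reach
 * target. The displacement is relative to the address of the byte that
 * follows the instruction, hence the +5. Assumes user-space addresses
 * that fit in a signed 64-bit integer. */
static bool call_rel32_reaches(uintptr_t call_site, uintptr_t target)
{
    int64_t disp = (int64_t)target - (int64_t)(call_site + 5);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

A translation cache mapped high in the address space, next to the shared libraries, fails this check against helper code that is linked at low addresses.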

Cassy "SnowGirl"

2006-10-26 15:17:19 UTC

Post by Sebastian Biallas
- it compiles :)
- float and vector jit disabled
- no longer gcpu and gjitc (copied to the stack)
Since near calls can only reach addresses in the +- 2 GiB range, the
translation cache should be that near to the normal code. Unfortunately
mmap reserve memory far far away from the normal code.
And I don't know how to tell mmap where I want to alloc. The hint is
only used when MAP_FIXED is used (it seems) and when using MAP_FIXED
mmap destroys already mapped and important addresses.
Any idea how to get around these mmap limitations?

Wouldn't it be possible to move an address into a register and jump to that
address? So, rather than using a near call to get to the translation cache
use a form of far call. Is there a reason why we need the near calls to
access the translation cache?

--
Cassy

Sebastian Biallas

2006-10-26 15:24:44 UTC

Post by Cassy "SnowGirl"
Wouldn't it be possible to move an address into a register and jump to
that address?

Yes, of course, but..

Post by Cassy "SnowGirl"
So, rather than using a near call to get to the
translation cache use a form of far call. Is there a reason why we need
the near calls to access the translation cache?

.. speed and size. Currently we are calling a lot of things quite often.

Indirect call (mov reg, imm; call reg == 12 bytes)
Direct call (call imm == 5 bytes)

Sebastian
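
The two byte counts can be checked by writing the encodings out. A minimal sketch, assuming a little-endian x86-64 host and `rax` as the scratch register; `emit_call_direct` and `emit_call_indirect` are hypothetical names, not the JIT's actual emitters:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emit `call rel32`: opcode E8 followed by a 4-byte displacement.
 * Returns the number of bytes written. */
static size_t emit_call_direct(uint8_t *buf, int32_t rel32)
{
    buf[0] = 0xE8;
    memcpy(buf + 1, &rel32, 4);
    return 5;
}

/* Emit `mov rax, imm64` (REX.W, B8, 8-byte immediate) followed by
 * `call rax` (FF D0). Returns the number of bytes written. */
static size_t emit_call_indirect(uint8_t *buf, uint64_t target)
{
    buf[0] = 0x48;                /* REX.W prefix */
    buf[1] = 0xB8;                /* mov rax, imm64 */
    memcpy(buf + 2, &target, 8);
    buf[10] = 0xFF;               /* call r/m64 */
    buf[11] = 0xD0;               /* ModRM: register-direct, rax */
    return 12;
}
```

The indirect form also ties up a register at every call site, in addition to the seven extra bytes.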

Alex Smith

2006-10-27 15:19:31 UTC

- it compiles :)
- float and vector jit disabled
- no longer gcpu and gjitc (copied to the stack)
Since near calls can only reach addresses in the +- 2 GiB range, the
translation cache should be that near to the normal code. Unfortunately
mmap reserve memory far far away from the normal code.
And I don't know how to tell mmap where I want to alloc. The hint is
only used when MAP_FIXED is used (it seems) and when using MAP_FIXED
mmap destroys already mapped and important addresses.
Any idea how to get around these mmap limitations?
Sebastian

Hi!

First, sorry for the late reply :)

Second, I tested this last night and got a segmentation fault when it
tried to boot. Is this known?

( alex ) ~/Source/amd64_branch $ src/ppc pearpc.cfg
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by
the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111 USA

AuthenticAMD
[CPU/MMU] new pagetable: sdr1 = 0x00300003
[CPU/MMU] new pagetable: sdr1 accepted
[CPU/MMU] number of pages: 2^15 pagetable_start: 0x00300000 size: 2^18
start: 0
Loading XCOFF...
[CPU/CPU] execution started at 05600adc
*** &gCPU: 5ea160, &gJITC: 5ea760
0
Segmentation fault

Third, great work! It's good to see that you are still working on PearPC.

If you need any extra help with testing on AMD64, I'd be happy to test.

Thanks,
Alex

--
Alex Smith
Frugalware Linux developer - http://www.frugalware.org
PearPC.net - http://www.pearpc.net

Gwenole Beauchesne

2006-10-26 18:36:39 UTC

Post by Sebastian Biallas
Any idea how to get around these mmap limitations?

MAP_32BIT or build with -fPIE + normal mmap() and use RIP addressing.

Sebastian Biallas

2006-10-26 19:06:14 UTC

Post by Gwenole Beauchesne

Post by Sebastian Biallas
Any idea how to get around these mmap limitations?

MAP_32BIT

Thanks, this works. But I guess it's linux-only.

Post by Gwenole Beauchesne
or build with -fPIE + normal mmap() and use RIP addressing.

Hmm, how does this help? Should I relocate all other code once I know
where my translation cache is?

BTW: Thanks for NSPluginWrapper :)

Sebastian
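
A minimal sketch of the MAP_32BIT variant, assuming Linux on x86-64, where the flag asks the kernel to place the mapping in the first 2 GiB of the address space; `alloc_low_region` is a hypothetical name, and the function simply reports failure where the flag does not exist:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Anonymous mapping placed low in the address space. Returns NULL on
 * failure or on platforms that do not define MAP_32BIT. */
static void *alloc_low_region(size_t size, int prot)
{
#ifdef MAP_32BIT
    void *p = mmap(NULL, size, prot,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)size;
    (void)prot;
    return NULL;
#endif
}
```

For a translation cache the caller would pass `PROT_READ | PROT_WRITE | PROT_EXEC`. The result is only within near-call range if the executable itself is also loaded at low addresses, which is the usual case for a non-PIE build.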

Jens von der Heydt

2006-10-26 21:13:25 UTC

Post by Gwenole Beauchesne

Post by Sebastian Biallas
Any idea how to get around these mmap limitations?

MAP_32BIT

Thanks, this works. But I guess it's linux-only.

Post by Gwenole Beauchesne
or build with -fPIE + normal mmap() and use RIP addressing.

Hmm, how does this help? Should I relocate all other code once I know
where my translation cache is?
BTW: Thanks for NSPluginWrapper :)
Sebastian

I'd say that -fPIE / -pie is also Linux-only:
http://209.85.135.104/search?q=cache:qK2wuhR0dNsJ:www.topside.org/~ashes/hlfs/hlfs-book-SVN-20060717/glibc/chapter02/pie.html+%22-fPIE%22+gcc&hl=de&ct=clnk&cd=20

Jens

Gwenole Beauchesne

2006-10-26 22:05:42 UTC

Post by Sebastian Biallas
Thanks, this works. But I guess it's linux-only.

I believe any other x86_64 OS that is reasonable enough implements a
similar feature.

Post by Sebastian Biallas

Post by Gwenole Beauchesne
or build with -fPIE + normal mmap() and use RIP addressing.

Hmm, how does this help? Should I relocate all other code once I know
where my translation cache is?

With PIE (Position Independent Executable), the code + data sections
are relocated above 32-bit, possibly randomized. "Normal mmap()" was a
little vague. Here are a few ideas to increase the likelihood of
the resulting area landing next to the relocated .text (so that branches
to non-JIT code can fit into 32-bit offsets).

a) Set the mmap() start arg so that it rounds up &main to the next 128
MB segment, for example. That is (void *)((((uintptr_t)&main) + MB_128)
& -MB_128) with const uintptr_t MB_128 = 128 * 1024 * 1024. Note
however that, nowadays (with kernel 2.6), brk space can grow at will.
So, the typical limit of 32 MB for brk is gone (used for small
allocations < 128 KB, by default). So, 128 MB is assuming you know your
global use of malloc() fits that and you don't have memory leaks.
Otherwise, it might overlap your mmap()'ed region.

b) Allocate your translation cache as uint8
translation_cache[translation_cache_size]
__attribute__((aligned(4096))). And then mprotect(+PROT_EXEC) that
region on startup. Although POSIX doesn't guarantee it, this works on
Linux. That solution is IMHO, more predictable.

With PIE, data is relocated above 32-bit too. So, you can use
RIP-relative addressing to access global data of your program from the
JIT generated code. This generally encodes one byte shorter than
using a SIB byte to get an absolute 32-bit address, which you can't
use anyway if you relocated your program above 32-bit.

Now that you have your pearpc executable relocated above 32-bit, you
have up to 2^32 - 4 KB (IIRC) full address space left to implement
hwmmu correctly. The following approach may work: shm_open() +
ftruncate() the fd to the desired Mac RAM size. Then, mmap 4K pages at
the desired offset from this fd + MAP_FIXED at a specific address which
corresponds to the Mac-side VA. You can have a normal mmap() of the
whole fd to get the physical Mac RAM.