[Pearpc-devel] another small change in jitc

Discussion:

Jens von der Heydt

2006-04-26 12:37:06 UTC

Jitc produces code like this:

mov [r3], edx
mov eax, [current_code_base]
add eax, 00000064
mov [lr], eax
mov eax, 00000078
call ppc_heartbeat_ext_rel_asm
mov eax, 00000078
call ppc_new_pc_this_page_asm
nop

in some branch conditions.
Notice that EAX is loaded twice. This makes sense if we assume
ppc_heartbeat_ext_rel_asm destroys EAX.
From its sources I gather that it does in some cases, but then it never
returns. Correct me if I'm wrong!

Therefore I think that the second load of EAX is not needed here.
Below is the corrected version from ppc_opc.cc; I commented out the
second generation of the EAX load.

I tested this and benchmarked again. Combined with my previous post
to this list, I get an XBench estimate of 18.424 points (tested with 5 test runs).
Not bad, considering the score of 7.484 without the two patches I did.

static inline void ppc_opc_gen_set_pc_rel(uint32 li)
{
    li += gJITC.pc;
    if (li < 4096) {
        /*
         * We assure here 7+6+5+5 bytes, to have enough space for
         * four instructions (since we want to modify them)
         */
        jitcEmitAssure(7+6+5+5);

        asmMOVRegImm_NoFlags(EAX, li);
        asmCALL((NativeAddress)ppc_heartbeat_ext_rel_asm);
        // asmMOVRegImm_NoFlags(EAX, li);
        asmCALL((NativeAddress)ppc_new_pc_this_page_asm);
        asmNOP(3);
    } else {
        asmALURegImm(X86_MOV, EAX, li);
        asmJMP((NativeAddress)ppc_new_pc_rel_asm);
    }
}

The only change I made to the above function is commenting out that one
line, so pardon me for not creating a patch.

Please comment.

Jens

Stefan Reinauer

2006-04-26 12:55:22 UTC

Post by Jens von der Heydt

I tested this and benchmarked again. Combined with my previous post to
this list, I get an XBench estimate of 18.424 points (tested with 5
test runs). Not bad, considering the score of 7.484 without the two
patches I did.

Does this imply a speedup by a factor of 2.46 with those two patches?

You're the man! If you continue like that, we'll probably end up
running OSX/ppc on the Intel Macs because it's faster ;-)))

Stefan

Jens von der Heydt

2006-04-26 13:19:12 UTC

Post by Stefan Reinauer

Does this imply a speedup by a factor of 2.46 with those two patches?
You're the man! If you continue like that, we'll probably end up
running OSX/ppc on the Intel Macs because it's faster ;-)))
Stefan

Well, the calculation would say you're right, but I only benchmarked the
CPU and memory functions.
I did not run the disk I/O and graphics tests, though I would tend to say
that they benefit too. But still,
with those two small patches, PearPC does not fly :) ... if they don't
break anything.

Jens

Sebastian Biallas

2006-04-27 00:33:14 UTC

Post by Jens von der Heydt
static inline void ppc_opc_gen_set_pc_rel(uint32 li)
{
    li += gJITC.pc;
    if (li < 4096) {
        /*
         * We assure here 7+6+5+5 bytes, to have enough space for
         * four instructions (since we want to modify them)
         */
        jitcEmitAssure(7+6+5+5);
        asmMOVRegImm_NoFlags(EAX, li);
        asmCALL((NativeAddress)ppc_heartbeat_ext_rel_asm);
        // asmMOVRegImm_NoFlags(EAX, li);
        asmCALL((NativeAddress)ppc_new_pc_this_page_asm);
        asmNOP(3);
    } else {
        asmALURegImm(X86_MOV, EAX, li);
        asmJMP((NativeAddress)ppc_new_pc_rel_asm);
    }
}

The logic is much more complicated here: this is self-modifying code.
When the branch is taken, ppc_new_pc_this_page_asm modifies the caller's
code into a direct jump to the intended destination (since it's an
intra-page jump, it's independent of the MMU, so we can patch it to a
native jump once we know where the jump should go). But I guess we could
get rid of the mov (though it's not that simple: take a look at
ppc_new_pc_this_page_asm, it relies on the above layout).

One might argue that this code is very fragile :)
But this self-modifying stuff does in fact bring a huge performance gain,
since it speeds up inner loops.

Also take note of the "#if 0"'ed code in
ppc_new_pc_this_page_asm. I planned to inline ppc_heartbeat_ext_rel_asm,
but writing such hackish code is a huge pain in the ass, especially
because gdb sucks. Well, I could use td32.

--
Sebastian

Jens von der Heydt

2006-04-27 07:14:50 UTC

Post by Sebastian Biallas

Post by Jens von der Heydt
         * We assure here 7+6+5+5 bytes, to have enough space for
         * four instructions (since we want to modify them)
         */
        jitcEmitAssure(7+6+5+5);
        asmMOVRegImm_NoFlags(EAX, li);
        asmCALL((NativeAddress)ppc_heartbeat_ext_rel_asm);
        // asmMOVRegImm_NoFlags(EAX, li);
        asmCALL((NativeAddress)ppc_new_pc_this_page_asm);
        asmNOP(3);

Now that you mention it, I did think this jitcEmitAssure was
pretty important :), but I did not recognize it as a seat belt for
later patching.

Post by Sebastian Biallas
One might argue that this code is very fragile :)
But this self-modifying stuff does in fact bring a huge performance gain,
since it speeds up inner loops.

Of course one might say that, but as long as we document these
important parts (and you should have done so more clearly, at least in
the above C function), I'm completely OK with this. Branches and the MMU
are the most expensive things for PearPC. Whatever we can do to speed
them up...

Post by Sebastian Biallas
Also take note of the "#if 0"'ed code in
ppc_new_pc_this_page_asm. I planned to inline ppc_heartbeat_ext_rel_asm,
but writing such hackish code is a huge pain in the ass, especially
because gdb sucks. Well, I could use td32.

Yes, I had a look. Looks like you had some GDB fun (tm) and lost the
battle.
What if we replaced the mov %eax, %%% with asmNOP(5)?
You would still have enough space to patch, and the extra mov is gone.
I tested it, and the speed in XBench is still better.

Jens

PS: What about the other small fix I mentioned in that other mail?