Discussion:
[Pearpc-devel] small jitc speedup. please comment
Jens von der Heydt
2006-04-26 11:31:53 UTC
Hi folks,

I was reading through jitc.log when I found some
small assembler parts that really could be easily optimized.

for example something like this:


mflr r0
mov eax, [ lr ]
mov ecd, eax
.


That's pretty easy and ok. But what I asked myself was why it did
something like the above even when another register (only 3-4 opcodes
earlier) already contained [ lr ] and wasn't reused. So there should be
cases where we can actually save one of the above moves.


So I changed the function move_reg (creg1, creg2) in x86asm.c:

Original:

static void inline move_reg(PPC_Register creg1, PPC_Register creg2)
{
    NativeReg reg2 = jitcGetClientRegister(creg2);
    NativeReg reg1 = jitcMapClientRegisterDirty(creg1);
    asmALURegReg(X86_MOV, reg1, reg2);
}

to this:

static void inline move_reg(PPC_Register creg1, PPC_Register creg2)
{
    NativeReg reg = jitcGetClientRegisterMapping(creg2);
    if (reg == REG_NO) {
        NativeReg reg2 = jitcGetClientRegister(creg2);
        NativeReg reg1 = jitcMapClientRegisterDirty(creg1);
        asmALURegReg(X86_MOV, reg1, reg2);
    } else {
        NativeReg reg1 = jitcMapClientRegisterDirty(creg1);
        asmALURegReg(X86_MOV, reg1, reg);
    }
}
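
To illustrate the idea being discussed (reuse a native register that already holds a client register's value instead of loading it from memory again), here is a minimal toy model in C++. It is not PearPC code: ToyJit, its members and the pseudo-assembly strings are all hypothetical, and the model assumes that a "get" call loads only when no mapping exists and that a "dirty" mapping never loads. It also never evicts a register, which a real register allocator would have to handle.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model of a JIT register cache; none of this is PearPC code.
enum ClientReg { R0, R1, LR, CLIENT_COUNT };  // guest-side registers
const int REG_NONE = -1;                      // "not mapped" marker
const int NATIVE_COUNT = 4;                   // pretend the host has 4 free registers

struct ToyJit {
    std::array<int, CLIENT_COUNT> mapping;    // client register -> native register index
    std::vector<std::string> emitted;         // pseudo-assembly emitted so far
    int nextFree = 0;                         // next unused native register
    int loads = 0;                            // number of memory loads emitted

    ToyJit() { mapping.fill(REG_NONE); }

    int allocNative() {
        assert(nextFree < NATIVE_COUNT);      // the toy model never evicts
        return nextFree++;
    }

    // Return the native register holding creg, loading it only if unmapped.
    int getClientRegister(ClientReg creg) {
        if (mapping[creg] == REG_NONE) {
            mapping[creg] = allocNative();
            emitted.push_back("load n" + std::to_string(mapping[creg]) +
                              ", [c" + std::to_string(creg) + "]");
            ++loads;
        }
        return mapping[creg];
    }

    // Map creg for writing; its old value is not needed, so no load is emitted.
    int mapClientRegisterDirty(ClientReg creg) {
        if (mapping[creg] == REG_NONE) mapping[creg] = allocNative();
        return mapping[creg];
    }

    // Emit a register-to-register move from src to dst.
    void moveReg(ClientReg dst, ClientReg src) {
        int s = getClientRegister(src);
        int d = mapClientRegisterDirty(dst);
        emitted.push_back("mov n" + std::to_string(d) + ", n" + std::to_string(s));
    }
};
```

In this model, calling moveReg(R0, LR) and then moveReg(R1, LR) on a fresh ToyJit emits a single load for LR; the second move reuses the native register that is already mapped.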



This works for me and produces more optimized assembler where possible.
Booting OSX generated optimized versions of register moves about
23,000 times.

The interesting bit is that when running XBench (5 times per test) I get
around 7.484 points without the change, and 13.176 points with it.

That's quite a difference, I would say. I would not have expected such a
small assembler change to have such a dramatic effect. I wonder if the
above function is right at all, since my jitc practice is very limited.
Please have a look and comment.


Jens
Alex Smith
2006-05-07 06:31:08 UTC
It was mentioned on the PearPC.net forums that this change is not in
x86_asm.cc, so I checked and found that it should be in ppc_opc.cc.

Alex
Jens von der Heydt
2006-05-09 06:11:04 UTC
Alex, though I think this patch is correct, I would like people to
actually comment on it and report whether it gave any speed improvement.
Sebastian did not comment on this part, so it could also be a buggy
change :)

Jens
Hugh McMaster
2006-05-09 07:56:07 UTC
I tried the compiled build on PearPC.net. Contrary to what a user on the
forums said, I got a speed improvement. The standard 0.4 build booted (to
the Finder being ready) in 1 minute 26 seconds. With the two patches
compiled in, PearPC booted in 1 minute 9 seconds.

Hope this helps.
Jens von der Heydt
2006-05-09 08:32:56 UTC
Hi,

the time PearPC takes to boot is not an exact measure in this case; you
would have to run a benchmark. Still, if you can reproduce the behaviour
and every boot cycle is faster for you, that is an indication.

Jens
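
For comparing the two kinds of measurement mentioned in this thread (a benchmark score where higher is better, and a boot time where lower is better), a small sketch of the arithmetic may help. The function names below are hypothetical and only serve this illustration; the inputs are the figures quoted in the posts above.

```cpp
#include <cassert>

// Ratio of a "higher is better" score after a change to the score before it.
double scoreSpeedup(double before, double after) {
    assert(before > 0.0);
    return after / before;
}

// Fraction of time saved for a "lower is better" duration, in the range 0..1.
double timeReduction(double beforeSeconds, double afterSeconds) {
    assert(beforeSeconds > 0.0);
    return (beforeSeconds - afterSeconds) / beforeSeconds;
}
```

With the numbers from the thread, scoreSpeedup(7.484, 13.176) is about 1.76, and timeReduction(86.0, 69.0) is about 0.198, i.e. roughly 20% less boot time. The two figures come from different people and different kinds of test, so they are not directly comparable.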
