Cassy "SnowGirl"
2006-10-06 00:24:44 UTC
So, someone mentioned to me that Intel has come out with a new set of SSE
instructions (SSSE3), and finally there's one that could let us do VPERM more
efficiently than dumping everything out to memory and doing it scalar-wise.
The instruction is PSHUFB, and it takes two byte vectors as arguments: A =
(a_0, a_1, ..., a_15) and B = (b_0, b_1, ..., b_15). It replaces A with
(a_(b_0), a_(b_1), ..., a_(b_15)), using only the low four bits of each b_n
as the index; if the top bit of b_n is set, that result byte is zeroed
instead.
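For reference, here's a tiny intrinsics demo of those semantics (the values
are arbitrary; compile with something like gcc -mssse3):

#include <stdio.h>
#include <tmmintrin.h>   /* _mm_shuffle_epi8 (SSSE3) */

int main(void)
{
    __m128i a = _mm_setr_epi8(10, 11, 12, 13, 14, 15, 16, 17,
                              18, 19, 20, 21, 22, 23, 24, 25);
    /* Reverse the bytes; the last index has its top bit set, so that
       lane is zeroed rather than fetched from a. */
    __m128i b = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                              7, 6, 5, 4, 3, 2, 1, (char)0x80);
    unsigned char out[16];
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(a, b));
    for (int i = 0; i < 16; i++)
        printf("%u ", out[i]);  /* prints: 25 24 ... 11 0 */
    printf("\n");
    return 0;
}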
With this, we could translate vperm vrD, vrA, vrB, vrC to:
MOVDQA XMMA, [vrA]    # map XMMA -> vrA
MOVDQA XMMB, [vrB]    # map XMMB -> vrB
MOVDQA XMMC, [vrC]    # map XMMC -> vrC
PSHUFB XMMA, XMMC     # invalidate XMMA mapping
PXOR XMMC, [vrSIGN]   # vrSIGN = 16 bytes of 0x80; invalidate XMMC mapping
PSHUFB XMMB, XMMC     # invalidate XMMB mapping
POR XMMA, XMMB        # map XMMA -> vrD
I think that would run a bit faster than having to dump the registers out to
memory, load offsets byte by byte, and so on. Of course, there's still the
problem that we're accessing in the wrong order (AltiVec numbers bytes from
the most significant end, so with the registers stored byte-reversed on the
little-endian host the indices point the wrong way), and the raw indices
don't put the "select vrB" condition into PSHUFB's zeroing bit either. Both
should just take a fixup step after loading XMMC: subtract each index byte
from zero, add 15, and AND with 0x8F. That maps index b to (15 - b) & 0x8F:
for b in 0..15 it selects byte 15 - b of vrA, and for b in 16..31 it wraps
into 0x80..0x8F, which zeroes the lane from vrA and, after the XOR below,
selects byte 31 - b of vrB. So, the final solution would look something like
this:
MOVDQA XMMA, [vrA]        # map XMMA -> vrA
MOVDQA XMMB, [vrB]        # map XMMB -> vrB
PXOR XMMC, XMMC
PSUBB XMMC, [vrC]         # XMMC = -vrC (unless someone knows a faster way to
                          # negate; MOVDQA XMMC, [FIFTEENS] / PSUBB XMMC, [vrC]
                          # would fold the negate and the add into two ops)
PADDB XMMC, [FIFTEENS]    # FIFTEENS = 16 bytes of 0x0F; XMMC = 15 - vrC
PAND XMMC, [PERM_MASKS]   # PERM_MASKS = 16 bytes of 0x8F
PSHUFB XMMA, XMMC         # invalidate XMMA mapping
PXOR XMMC, [vrSIGN]       # flip the zeroing bit to pick from the other source
PSHUFB XMMB, XMMC         # invalidate XMMB mapping
POR XMMA, XMMB            # map XMMA -> vrD
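For testing the index arithmetic outside the recompiler, the same sequence is
easy to write with intrinsics; this is just a sketch, assuming the emulated
registers are kept byte-reversed in memory as above (the names are
placeholders):

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <tmmintrin.h>   /* _mm_shuffle_epi8 (SSSE3) */

static __m128i emulate_vperm(__m128i vrA, __m128i vrB, __m128i vrC)
{
    const __m128i fifteens  = _mm_set1_epi8(0x0F);
    const __m128i perm_mask = _mm_set1_epi8((char)0x8F);
    const __m128i sign      = _mm_set1_epi8((char)0x80);

    /* idx = (15 - c) & 0x8F, same as the PXOR/PSUBB/PADDB/PAND above */
    __m128i idx = _mm_and_si128(_mm_sub_epi8(fifteens, vrC), perm_mask);

    __m128i from_a = _mm_shuffle_epi8(vrA, idx);            /* indices 0..15  */
    __m128i from_b = _mm_shuffle_epi8(vrB,
                         _mm_xor_si128(idx, sign));         /* indices 16..31 */
    return _mm_or_si128(from_a, from_b);
}

(One caveat either way: vperm only looks at the low five bits of each index
byte, so if vrC can carry garbage in bits 5 and 6, it would need an extra
PAND with 0x1F first.)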
Of course, we would also need CPUID support to detect SSSE3, and it will only
be available on very new processors (though the same argument put off many
SSE2 implementations for a long time, too).
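For what it's worth, SSSE3 is reported in bit 9 of ECX for CPUID leaf 1, so
the check is simple; here's a sketch using GCC's cpuid.h (the helper name is
just an example):

#include <cpuid.h>

/* Returns nonzero if the CPU reports SSSE3 (CPUID.01H:ECX[9]). */
static int have_ssse3(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* CPUID leaf 1 not supported */
    return (ecx >> 9) & 1;
}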
It also looks like there are a few interesting instructions that could help
with VMHRADDSHS, which actually does get called often under OS X.
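One obvious candidate is PMULHRSW, which computes (a * b + 0x4000) >> 15 per
signed halfword; that is exactly the multiply-high-with-rounding step of
VMHRADDSHS, and a saturating PADDSW supplies the "+ vC" part. A rough
intrinsics sketch (the function name is made up; note the one divergence:
when both inputs are 0x8000, PMULHRSW wraps to -0x8000 where the PPC
intermediate would be +0x8000 before the add):

#include <emmintrin.h>   /* _mm_adds_epi16 (SSE2) */
#include <tmmintrin.h>   /* _mm_mulhrs_epi16 (SSSE3) */

/* Approximate vmhraddshs vD,vA,vB,vC: per signed halfword,
   saturate(((a * b + 0x4000) >> 15) + c). */
static __m128i emulate_vmhraddshs(__m128i a, __m128i b, __m128i c)
{
    __m128i prod = _mm_mulhrs_epi16(a, b);  /* (a*b + 0x4000) >> 15 */
    return _mm_adds_epi16(prod, c);         /* saturating add of c */
}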
--
Cassy