Post by p***@coreytabaka.com
This may be a little OT, but I've noticed that there is a lot of overkill in
the byte swapping: there are numerous places where a value is byte-swapped
only to be swapped back in the function it gets passed to. This is a
considerable performance degradation, especially since it occurs most often
in the IO pipe.
Theoretically, byte swapping should only occur on reads/writes to memory
(except perhaps in very specific circumstances). On big-endian machines, byte
swapping should rarely occur at all. Does anyone else agree with this?
This kind of cleanup could make a nice addition to the release.

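A minimal sketch of the "swap only at the memory boundary" idea, assuming big-endian guest memory (as on PowerPC) and host-order values everywhere else. The helper names `guest_load32`, `guest_store32` and `guest_add_imm32` are hypothetical and not taken from the emulator's source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a big-endian 32-bit word from guest memory
   into a host-order value. Assembling the bytes explicitly works on any
   host; on a big-endian host it is a plain load, and on a little-endian
   host compilers typically emit a load plus a byte swap. */
static uint32_t guest_load32(const uint8_t *mem, size_t addr)
{
    return ((uint32_t)mem[addr]     << 24) |
           ((uint32_t)mem[addr + 1] << 16) |
           ((uint32_t)mem[addr + 2] <<  8) |
            (uint32_t)mem[addr + 3];
}

/* Hypothetical helper: write a host-order value back as big-endian. */
static void guest_store32(uint8_t *mem, size_t addr, uint32_t v)
{
    mem[addr]     = (uint8_t)(v >> 24);
    mem[addr + 1] = (uint8_t)(v >> 16);
    mem[addr + 2] = (uint8_t)(v >>  8);
    mem[addr + 3] = (uint8_t)v;
}

/* Hypothetical emulated "load, add immediate, store" sequence. The value
   stays in host order between the load and the store, so it is converted
   exactly once in each direction, not once per function it passes through. */
static void guest_add_imm32(uint8_t *mem, size_t addr, uint32_t imm)
{
    uint32_t r = guest_load32(mem, addr);
    r += imm;
    guest_store32(mem, addr, r);
}
```

With this layout, any swap that happens between `guest_load32` and `guest_store32` is the redundant kind the post complains about.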
It's a very good idea, and I agree in many ways. Of course, there are
issues here where, for instance, when doing memcpy with AltiVec, we
byte-swap an entire 128-bit vector just to byte-swap it back when
writing it to memory. Why do this? Because it's very difficult to
track this sort of thing and prevent it.
I made a preliminary patch that would not byte-swap unless it had to.
Unfortunately, since there's no in-vector way to byte-swap a vector on
the x86 (thanks, Intel/AMD), if I just load the vector into a vector
register and later fix its byte order, I have to write the vector out
to memory, fix its byte order by copying it to a different piece of
memory, and then load it back into the vector register.
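The store/reverse/reload round trip described above can be sketched portably as follows. The `vec128` struct is a hypothetical stand-in for a 128-bit host vector register (an SSE register in the x86 case); plain bytes are used so the sketch does not depend on any particular intrinsics.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit host vector register. */
typedef struct { uint8_t b[16]; } vec128;

/* Byte-reverse a 128-bit value the way the post describes for a host with
   no in-register byte shuffle: spill it to memory, copy it in reverse
   order into a second buffer, then reload it into the "register". */
static vec128 vec128_swap_via_memory(vec128 reg)
{
    uint8_t spill[16];
    uint8_t reversed[16];
    vec128 out;

    memcpy(spill, reg.b, sizeof spill);      /* 1. store the register to memory */
    for (int i = 0; i < 16; i++)             /* 2. reverse into another buffer  */
        reversed[i] = spill[15 - i];
    memcpy(out.b, reversed, sizeof out.b);   /* 3. load the result back         */
    return out;
}
```

Every deferred fix-up pays for two extra memory copies on top of the reversal itself, which is part of why deferring the swap did not pay off.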
This looks fine until you run into a case like loading two vectors at
a time. The second vector read might page fault, so I have to write the
first vector out to the register file beforehand. Now, say that the
second read does page fault: we'll run some other code to load in the
page, then go back to executing where we left off, at the second vector
read. How do we know what byte order the first vector was stored in?
So now we have to fix the byte order on every register write to our
emulated register bank.
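A minimal sketch of that conclusion: every write to the emulated register bank normalizes to one canonical byte order (assumed here to be guest big-endian, with a little-endian host), so code resuming after a page fault never has to ask which order a register was left in. The names `vreg_bank`, `vreg_write_host_order` and `vreg_read` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_VREGS 32   /* AltiVec has 32 vector registers */

typedef struct { uint8_t b[16]; } vreg;

/* Hypothetical emulated vector register bank; every entry is kept in
   canonical (guest big-endian) byte order at all times. */
typedef struct {
    vreg v[NUM_VREGS];
} vreg_bank;

/* Write a 128-bit value that is currently in host (little-endian) order.
   It is byte-reversed on every write: this is the per-write cost the post
   describes, paid so that no per-register order tag is needed. */
static void vreg_write_host_order(vreg_bank *bank, int idx, const uint8_t host[16])
{
    assert(idx >= 0 && idx < NUM_VREGS);
    for (int i = 0; i < 16; i++)
        bank->v[idx].b[i] = host[15 - i];
}

/* Reads return the canonical bytes unchanged, whether or not a page fault
   happened between the write and the read. */
static const vreg *vreg_read(const vreg_bank *bank, int idx)
{
    assert(idx >= 0 && idx < NUM_VREGS);
    return &bank->v[idx];
}
```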
It turns out that the overhead of tracking all this results in a
slow-down and a net loss of performance.
Byte swapping on the x86 is relatively low impact. Either we're just
executing a BSWAP (a very low-cost instruction), or we're
reading/writing 16 bytes in reverse order, which, due to the way
cache lines work, is no slower than reading/writing 16 bytes in
forward order.
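For the scalar case, a portable swap is short. This is a sketch, not the emulator's code: `swap32` and `swap64` are hypothetical names, and whether a given compiler turns the shift/mask pattern into a single BSWAP depends on the compiler and its flags (GCC and Clang commonly do when optimizing).

```c
#include <stdint.h>

/* Portable 32-bit byte swap using shifts and masks. */
static uint32_t swap32(uint32_t v)
{
    return  (v >> 24)
         | ((v >>  8) & 0x0000FF00u)
         | ((v <<  8) & 0x00FF0000u)
         |  (v << 24);
}

/* 64-bit byte swap built from two 32-bit swaps: the swapped low half
   becomes the high half and vice versa. */
static uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}
```

For example, `swap32(0x12345678)` yields `0x78563412`, and swapping twice returns the original value, which is exactly the wasted round trip the original post objects to.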
Anyway, if you can point out specific places where it's absolutely
useless, we can look into it after the 0.4 release. Right now, we're
trying to actually get a release out, and we need to focus on things
that are important for stability, not add features.
(The second IDE controller patch is hardly a "new feature" anymore, as
so many people have been "testing" it for us.)
--
Daniel Foesch