vDSP on iPod 3G, ARM11

Thanks to the generous creator of Bowmaster, I now have an iPod Touch 3G at my disposal to test the hardware for one week.

Here I’m gonna compare the fill rate of an interleaved array using sequential loops vs. vectorized operations with vDSP. I did this for my iPod 2G in another post.

The Hardware:

The operation I want to compare is done by the CPU. On the iPod Touch 3G that’s an ARM Cortex-A8 at 833 MHz (underclocked to 600 MHz). The iPod Touch 2G only had an ARM11 at 620 MHz (underclocked to 533 MHz), and in WWDC Session 202, “The Accelerate framework for iPhone” (available on iTunes for registered developers), they said the Cortex has superior vector-operation capabilities.

So let’s test them and see for ourselves 🙂

Again I’m putting 10000 scrub-geometries into the interleaved array, each consisting of 486 vertices. The code is pretty much the same as in the old post, except that I don’t move the normals around anymore.

I changed the normals’ datatype to “short” to decrease the size of the interleaved array, and the scrubs don’t really look worse if their normals aren’t rotated along with them. Because of that, the times on the old iPod 2G changed a little, so I’m going to test there again, too.
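To make the size argument concrete, here is a rough sketch of what the two vertex layouts might look like. The struct and field names are my assumptions for illustration (the real struct isn’t shown in this post); only the idea of float positions plus short normals comes from the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, for illustration only. */
typedef struct { float   x, y, z; } Vec3f;   /* 12 bytes */
typedef struct { int16_t x, y, z; } Vec3s;   /*  6 bytes */

/* Position and normal both stored as floats. */
typedef struct { Vec3f v; Vec3f n; } VertexFloatNormals;

/* Position as floats, normal as shorts. The compiler pads the struct
   up to a multiple of the float alignment. */
typedef struct { Vec3f v; Vec3s n; } VertexShortNormals;

/* Bytes saved per vertex by switching the normals to short. */
static size_t bytesSavedPerVertex(void) {
    return sizeof(VertexFloatNormals) - sizeof(VertexShortNormals);
}
```

On a typical ABI with 4-byte floats this gives 24 bytes vs. 20 bytes per vertex, because the 18 bytes of payload are padded up to the next multiple of 4.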

So the sequential code looks very straightforward now:

// rotate the template vertex by (co, si) and translate it to (xc, yc)
for (int k=0; k<scrubVertexCount; k++) {
 _interleavedVerts[_vertexCount+k].v.x = xc + co*scrubX[k] - si*scrubY[k];
 _interleavedVerts[_vertexCount+k].v.y = yc + si*scrubX[k] + co*scrubY[k];
}
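For anyone who wants to try the loop outside of my project, here is a minimal self-contained version. The Vertex struct and the function name placeSequential are assumptions made up for this sketch; co and si are the precomputed cosine and sine of the rotation angle, as in the snippet above.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: float position, short normal. */
typedef struct {
    struct { float x, y, z; } v;
    short n[3];
} Vertex;

/* Rotate each template point (srcX[k], srcY[k]) by the angle whose
   cosine/sine are co/si, translate it to (xc, yc), and write the
   result into the interleaved array starting at dst. */
static void placeSequential(Vertex *dst,
                            const float *srcX, const float *srcY,
                            size_t count,
                            float co, float si, float xc, float yc) {
    for (size_t k = 0; k < count; k++) {
        dst[k].v.x = xc + co * srcX[k] - si * srcY[k];
        dst[k].v.y = yc + si * srcX[k] + co * srcY[k];
    }
}
```

In the benchmark, dst would be &_interleavedVerts[_vertexCount] and count would be scrubVertexCount.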

The vDSP-Code was:

// msi holds -si; stride is the size of one interleaved vertex, counted in floats
// tempVecX = co*x + xc
vDSP_vsmsa(scrubX, 1, &co, &xc, tempVecX, 1, scrubVertexCount);
// tempVecY = si*x + yc
vDSP_vsmsa(scrubX, 1, &si, &yc, tempVecY, 1, scrubVertexCount);
// x = -si*y + tempVecX
vDSP_vsma(scrubY, 1, &msi, tempVecX, 1, &_interleavedVerts[_vertexCount].v.x, stride, scrubVertexCount);
// y = co*y + tempVecY
vDSP_vsma(scrubY, 1, &co, tempVecY, 1, &_interleavedVerts[_vertexCount].v.y, stride, scrubVertexCount);
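If you don’t have the Accelerate framework at hand, the four calls can be followed with plain C stand-ins. This is only a sketch: ref_vsmsa and ref_vsma are names I made up here, they mimic what vDSP_vsmsa (vector times scalar plus scalar) and vDSP_vsma (vector times scalar plus vector) compute but none of their speed, and the Vertex struct is again an assumption.

```c
#include <stddef.h>

/* Hypothetical interleaved vertex: float position, short normal. */
typedef struct {
    struct { float x, y, z; } v;
    short n[3];
} Vertex;

/* Stand-in for vDSP_vsmsa: D[n*ID] = A[n*IA] * (*B) + (*C). */
static void ref_vsmsa(const float *A, ptrdiff_t IA, const float *B,
                      const float *C, float *D, ptrdiff_t ID, size_t N) {
    for (size_t n = 0; n < N; n++)
        D[(ptrdiff_t)n * ID] = A[(ptrdiff_t)n * IA] * (*B) + (*C);
}

/* Stand-in for vDSP_vsma: D[n*ID] = A[n*IA] * (*B) + C[n*IC]. */
static void ref_vsma(const float *A, ptrdiff_t IA, const float *B,
                     const float *C, ptrdiff_t IC,
                     float *D, ptrdiff_t ID, size_t N) {
    for (size_t n = 0; n < N; n++)
        D[(ptrdiff_t)n * ID] = A[(ptrdiff_t)n * IA] * (*B) + C[(ptrdiff_t)n * IC];
}

/* Same four steps as the vDSP version: two passes into temporaries,
   two strided passes into the interleaved array. */
static void placeVectorized(Vertex *dst,
                            const float *srcX, const float *srcY,
                            float *tmpX, float *tmpY, size_t count,
                            float co, float si, float xc, float yc) {
    /* Strides are counted in floats, so sizeof(Vertex) must be a
       multiple of sizeof(float) for the strided write to line up. */
    const ptrdiff_t stride = (ptrdiff_t)(sizeof(Vertex) / sizeof(float));
    const float msi = -si;

    ref_vsmsa(srcX, 1, &co, &xc, tmpX, 1, count);               /* tmpX = co*x + xc  */
    ref_vsmsa(srcX, 1, &si, &yc, tmpY, 1, count);               /* tmpY = si*x + yc  */
    ref_vsma(srcY, 1, &msi, tmpX, 1, &dst[0].v.x, stride, count); /* x = -si*y + tmpX */
    ref_vsma(srcY, 1, &co,  tmpY, 1, &dst[0].v.y, stride, count); /* y =  co*y + tmpY */
}
```

The result matches the sequential loop element by element; the difference on the device is that the real vDSP routines process several floats per instruction.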


So, what do we make of this data?

On 2G, the advantage of using vDSP stays roughly the same, at a factor of about 2.5 as before (remember we had more complicated sequential code back then).

Things that spring to mind are:

  • even though the ARM Cortex-A8 is clocked only about 67 MHz higher (600 vs. 533 MHz), for vector operations it seems to be 5 times as fast. I suspect that some pipeline that gets the data from memory to the processor is bigger (or there is just more cache)
  • on 3G, the advantage of using vDSP is still about a factor of 2, so compared to the advantage on 2G we lost a little.

I also suspect that the sequential code gets optimized a little on the 3G, maybe even with some vectorization of its own!
