Tweaking the samplesCallbackEvent loop
Playing around with the new audio processing functionality in Flash Player 10, I found that the need for optimization becomes very apparent as soon as you want to apply effects to several tracks of audio.
With a sample rate of 44100 Hz and a dozen stereo tracks we are talking about over a million samples to be processed per second, and each effect you apply will probably need at least some 30 operations per sample. All of a sudden the great performance of AVM2 becomes quite limiting.
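As a quick check of those figures, here is the arithmetic written out. It is in TypeScript rather than ActionScript so it can be run outside Flash Player; the variable names are mine, and the 30 operations per sample is only the rough estimate from above.

```typescript
// Figures taken from the paragraph above.
const sampleRate = 44100;   // sample frames per second
const channels = 2;         // stereo
const tracks = 12;          // "a dozen"
const opsPerSample = 30;    // rough lower bound per effect (an estimate)

// Individual float samples that must be touched every second.
const samplesPerSecond = sampleRate * channels * tracks; // 1058400

// Operations per second for ONE effect applied to every track.
const opsPerSecondPerEffect = samplesPerSecond * opsPerSample; // 31752000
```

So a single effect on every track already means roughly 32 million operations per second, and each extra effect in the chain adds the same again.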
So it's important to squeeze out every drop of performance you can by optimizing the code.
First of all I have benchmarked the performance of running code directly inside the processing loop, in a function, in an external class and inside a loaded swf (which would have been a neat way to plug in effects without recompiling the main swf).
The code I used for testing processes a value and returns it like this (obviously without the enclosing function when doing the processing in the local scope):
    public function calculate(num:Number):Number
    {
        return num * 1.01;
    }
The time needed in ms when calling the function 10 000 000 times:
- Locally: 46
- Calling a function in the same class: 213
- Calling a function in a separate class: 213
- Calling a function in an externally loaded swf: 2347
Not so surprising results.
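For anyone who wants to repeat this kind of comparison, a minimal harness could look roughly like the sketch below. It is TypeScript, not the AS3 I actually ran, and timeIt, runInline and runWithCalls are hypothetical names made up for illustration. It only covers the "locally" and "function call" cases.

```typescript
// Same arithmetic as calculate() in the post.
function calculate(num: number): number {
  return num * 1.01;
}

// Hypothetical helper: runs fn once and returns [elapsed ms, fn's result].
function timeIt(fn: () => number): [number, number] {
  const start = Date.now();
  const result = fn();
  return [Date.now() - start, result];
}

const ITERATIONS = 10000000; // same count as in the post

// Variant 1: the arithmetic written directly in the loop body.
function runInline(n: number): number {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc += i * 1.01;
  }
  return acc;
}

// Variant 2: one function call per iteration.
function runWithCalls(n: number): number {
  let acc = 0;
  for (let i = 0; i < n; i++) {
    acc += calculate(i);
  }
  return acc;
}

// Accumulating into acc keeps the loop bodies from being trivially dead code.
const [inlineMs, inlineSum] = timeIt(() => runInline(ITERATIONS));
const [callMs, callSum] = timeIt(() => runWithCalls(ITERATIONS));
```

Don't expect the timings from this sketch to resemble the AVM2 numbers above: a modern JavaScript engine may well inline calculate(), which would make the two variants come out nearly identical.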
Having processing code in an external swf is obviously not an option. I tried both simply sticking a function in the swf and putting it in a class which I retrieved with applicationDomain.getDefinition, and both methods performed equally badly.
Doing the processing locally instead of in a separate function or class is a lot faster, but obviously that could easily become very cumbersome and ugly.
At least there is nothing lost by having the function in a separate class compared to having it in the same class.
Something that does surprise me a bit is that when calling the function just once and putting the loop inside the function instead, the resulting time was 75 ms.
That's about 30 ms more than the 46 ms of the local loop for just one function call, so it seems like the first call is a lot more expensive.
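To put the measurements in per-call terms, the arithmetic can be spelled out like this (TypeScript; the timings are the ones listed above, the variable and function names are mine):

```typescript
// Timings from the benchmark above, in ms for 10 000 000 iterations.
const iterations = 10000000;
const localMs = 46;
const sameClassMs = 213;
const externalSwfMs = 2347;
const loopInsideFunctionMs = 75;

// Extra cost per call compared to the local loop, in nanoseconds.
function overheadNsPerCall(totalMs: number): number {
  return ((totalMs - localMs) * 1e6) / iterations;
}

const sameClassNs = overheadNsPerCall(sameClassMs);     // about 16.7 ns per call
const externalSwfNs = overheadNsPerCall(externalSwfMs); // about 230.1 ns per call

// One call with the loop inside, compared to the purely local loop.
const singleCallDeltaMs = loopInsideFunctionMs - localMs; // 29 ms
```

Seen this way an ordinary function call costs somewhere around 17 ns extra, while the single call with the loop inside costs 29 ms extra in total, which is why that result looks odd.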
One would think the conclusion is that, if you want to avoid placing the code inside the samplesCallbackEvent loop, the best approach when processing audio is to call the processing code once and then iterate over the buffer inside the effect class.
This is exactly what Spender suggested to me when I posted a 3-band EQ example.
The problem, and the reason my attempt at implementing his suggestion failed to bring an improvement, is that reading and writing floats in a ByteArray is slower than the function calls.
Calling writeFloat 10 000 000 times and then looping through the values to read them back again takes 1727 ms. Compared with the 213 ms for the same number of function calls, it's clear that function calls are actually comparatively cheap. A Vector fares a bit better than the ByteArray at 1239 ms.
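The shape of that write-then-read test is roughly the following. This is a TypeScript sketch where a DataView stands in for the AS3 ByteArray (setFloat32/getFloat32 play the role of writeFloat/readFloat); roundTripFloats is a hypothetical name, and the count is a parameter so the sketch can be run with fewer than 10 000 000 floats.

```typescript
// Writes `count` 32-bit floats into a byte buffer, then reads them all back.
// Returns the elapsed time and the sum of the values read (as a sanity check).
function roundTripFloats(count: number): { elapsedMs: number; sum: number } {
  const view = new DataView(new ArrayBuffer(count * 4)); // 4 bytes per float
  const start = Date.now();

  // Write pass: one setFloat32 per sample.
  for (let i = 0; i < count; i++) {
    view.setFloat32(i * 4, i * 0.5);
  }

  // Read pass: one getFloat32 per sample.
  let sum = 0;
  for (let i = 0; i < count; i++) {
    sum += view.getFloat32(i * 4);
  }

  return { elapsedMs: Date.now() - start, sum: sum };
}

// Assumption: 1 000 000 floats to keep the run short; the post used 10 000 000.
const roundTrip = roundTripFloats(1000000);
```

The timings will of course differ from what AVM2 gives; the sketch only shows what was being measured.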
So the optimal approach seems to be to call samples.readFloat only once, then pass the returned value through a function call for each process you'd like to apply before you call samplesCallbackData.writeFloat.
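In outline, that per-sample approach could look like the sketch below. It is TypeScript with a DataView standing in for the ByteArray, and the Effect type and the gain and hardClip effects are hypothetical examples, not code from my player.

```typescript
// Hypothetical effect shape: a plain function from one sample to one sample.
type Effect = (sample: number) => number;

// Two made-up effects for illustration.
const gain: Effect = (s) => s * 0.5;
const hardClip: Effect = (s) => Math.max(-1, Math.min(1, s));

// Reads each float once, runs it through every effect, writes it once.
function processPerSample(input: DataView, output: DataView, effects: Effect[]): void {
  const count = input.byteLength / 4; // 4 bytes per 32-bit float
  for (let i = 0; i < count; i++) {
    let sample = input.getFloat32(i * 4);   // the only read
    for (let e = 0; e < effects.length; e++) {
      sample = effects[e](sample);          // one function call per effect
    }
    output.setFloat32(i * 4, sample);       // the only write
  }
}

// Usage: four samples through gain, then hardClip.
const samplesIn = [0.5, -4, 1, 3];
const inputView = new DataView(new ArrayBuffer(samplesIn.length * 4));
for (let i = 0; i < samplesIn.length; i++) {
  inputView.setFloat32(i * 4, samplesIn[i]);
}
const outputView = new DataView(new ArrayBuffer(samplesIn.length * 4));
processPerSample(inputView, outputView, [gain, hardClip]);
// outputView now holds 0.25, -1, 0.5, 1
```

The point of the design is that the number of buffer reads and writes stays fixed at one each per sample, no matter how many effects are in the chain.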




hey leo,
i’ve been benchmarking too! it’s important to get this right from the outset. have you done any investigation into the Vector class? I’m seeing good results with this, and I am now thinking that the fastest method (in the case of several chained fx) is to copy any ByteArray data into a Vector, use the Vector to pass data between processors, then copy it to the output ByteArray when finished.
I also wonder if some of the results seen here are due to internal (player) casting between single and double precision floating point numbers. Every time you read/write a float to a ByteArray, there must somewhere be an implicit conversion between double precision (Number type) and the single precision stored in the ByteArray. I’m starting to suspect that there’s quite a performance hit here.
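The conversion itself is easy to see in a small sketch (TypeScript here, with a Float32Array standing in for the single precision storage of a ByteArray; throughSingle is a hypothetical helper). It only shows that a value is narrowed and widened on the way through, and says nothing about how much time that costs in the player.

```typescript
// One-element single precision scratch buffer.
const scratch = new Float32Array(1);

// Stores a double as a 32-bit float and reads it back as a double.
function throughSingle(value: number): number {
  scratch[0] = value;   // narrowed to single precision on write
  return scratch[0];    // widened back to double precision on read
}

const original = 1.01;
const roundTripped = throughSingle(original);

// 1.01 has no exact single precision representation, so the value changes slightly.
const roundingError = Math.abs(roundTripped - original); // on the order of 1e-8

// A value like 0.5 is exactly representable and comes back unchanged.
const halfSurvives = throughSingle(0.5) === 0.5; // true
```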
hmm.. just read your post again. so what you are saying is that it’s best to pull data through a chain of processors a sample at a time, and not bother passing it through in buffers if each processor needs to both read and write to any buffer?
good post btw.
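For comparison, the buffer-passing scheme described in the comment above could be sketched like this. It is TypeScript with a plain number array standing in for the Vector and a DataView for the ByteArray; BufferProcessor, halveAll and clipAll are hypothetical names for illustration only.

```typescript
// Hypothetical processor shape: works on the whole buffer in place.
type BufferProcessor = (buffer: number[]) => void;

// Two made-up processors for illustration.
const halveAll: BufferProcessor = (buffer) => {
  for (let i = 0; i < buffer.length; i++) buffer[i] = buffer[i] * 0.5;
};
const clipAll: BufferProcessor = (buffer) => {
  for (let i = 0; i < buffer.length; i++) {
    buffer[i] = Math.max(-1, Math.min(1, buffer[i]));
  }
};

// Copies the bytes into a working array, lets each processor loop over it,
// then copies the result to the output bytes.
function processViaBuffer(input: DataView, output: DataView, processors: BufferProcessor[]): void {
  const count = input.byteLength / 4; // 4 bytes per 32-bit float
  const work: number[] = new Array(count);
  for (let i = 0; i < count; i++) {
    work[i] = input.getFloat32(i * 4);     // copy in
  }
  for (let p = 0; p < processors.length; p++) {
    processors[p](work);                   // one call per processor, not per sample
  }
  for (let i = 0; i < count; i++) {
    output.setFloat32(i * 4, work[i]);     // copy out
  }
}

// Usage: four samples through halveAll, then clipAll.
const rawIn = [0.5, -4, 1, 3];
const bufIn = new DataView(new ArrayBuffer(rawIn.length * 4));
for (let i = 0; i < rawIn.length; i++) {
  bufIn.setFloat32(i * 4, rawIn[i]);
}
const bufOut = new DataView(new ArrayBuffer(rawIn.length * 4));
processViaBuffer(bufIn, bufOut, [halveAll, clipAll]);
// bufOut now holds 0.25, -1, 0.5, 1
```

The trade-off is visible in the loops: there is only one function call per processor, but every processor reads and writes each element of the working array once more.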
That’s the conclusion I have reached, but I do find it very surprising.
It does contradict what I would have expected and what seems to be the usual procedure, for example in VST plugins or in popforge, which I would think is well optimized.
I tried using a Vector, which is a bit faster than the ByteArray, but as I mentioned in my test, reading and writing a Vector 10 000 000 times takes 1239 ms while passing a value to a function and returning it takes 213 ms.
So it seems like in AS3 the optimal approach must be to make sure you only read from and write to the ByteArray once and simply pass the sample through a function for each effect you'd like to apply.