Audio Library for FX

Now that @dxb is on board we have the expert for this library, which is good. To unleash the full potential of musically creative people and to enable them to work in parallel with graphics and code, we need to get the conversion tools done soon. Otherwise I feel they will have to fiddle too much with things they do not care about, and that will also keep developers busy.

Great! Hopefully he can help with adding fx support to ATMlib2. I’ve been staring at ATMlib2 for a while but it’s not easy to visualize the data flow from the synth part. Optimizing the osc part is doable, but that doesn’t require any fx code.

I meant to convert at least part of the interrupt handler to assembler and thought I would ask you to double check whatever I came up with but I don’t believe switching to assembler is a high priority task. I suspect there are (a lot?) more CPU cycles to gain by processing and buffering samples in blocks of 16 for instance. The advantage being that the phase accumulator and phase increment values can be kept in registers during the whole batch update reducing per-sample overhead.
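To make the batching idea concrete, here is a minimal sketch of rendering one block of 16 samples for a single square-wave channel. The `Osc` struct, `kBlockSize` and `render_block` are names made up for this illustration and are not taken from osc.c. The point is only that the accumulator and increment are loaded once, used for the whole block, and written back once.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical oscillator state, for illustration only (not the osc.c layout).
struct Osc {
    uint16_t phase;      // phase accumulator
    uint16_t phase_inc;  // added to the accumulator once per sample
    int8_t   volume;     // peak amplitude, expected in 0..127
};

constexpr size_t kBlockSize = 16;

// Render kBlockSize square-wave samples in one call. The accumulator and the
// increment are copied into locals so the compiler can keep them in registers
// for the whole loop instead of reloading them for every sample.
inline void render_block(Osc &osc, int8_t (&out)[kBlockSize]) {
    uint16_t phase = osc.phase;
    const uint16_t inc = osc.phase_inc;
    const int8_t vol = osc.volume;
    for (size_t i = 0; i < kBlockSize; ++i) {
        phase = static_cast<uint16_t>(phase + inc);
        // The top bit of the accumulator selects the half of the square wave.
        out[i] = (phase & 0x8000u) ? vol : static_cast<int8_t>(-vol);
    }
    osc.phase = phase;  // write the accumulator back once per block
}
```

The interrupt would then only copy samples out of the buffer, and the block renderer could run less often.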

Which repo/branch are you looking at? I am hoping we would build on the version I am currently working on (

Which fx?

I wasn’t sure where to start so I started looking at the master branch. Seems I picked the wrong one :slight_smile:

The external serial flash cart / chip has been named the ‘fx’ chip.

What we want to do is offload data that is normally stored in PROGMEM to the fx chip. We also want to do this with music scores, and that is where ATMlib2 came up.

But reading data from the fx chip randomly is much slower than reading from PROGMEM (90 cycle setup time). Reading sequential data can be sped up considerably (one-time setup, then 18 cycles per byte, down to 2 cycles per byte when the wait cycles are used for useful work).
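A rough back-of-the-envelope comparison using the figures above. This is only a sketch: it assumes a random read pays the full setup for every single byte plus the plain 18-cycle transfer, and that the 2-cycle figure is the overhead left once the wait time is spent on other work. The function and constant names are made up for the example.

```cpp
#include <cstdint>

// Figures quoted in the post above.
constexpr uint32_t kSetupCycles = 90;             // per read command
constexpr uint32_t kPlainCyclesPerByte = 18;      // sequential, CPU just waits
constexpr uint32_t kOverlappedCyclesPerByte = 2;  // sequential, waits used for other work

// Assumption: every randomly addressed byte pays the setup again and is then
// transferred at the plain rate.
constexpr uint32_t random_read_cycles(uint32_t bytes) {
    return bytes * (kSetupCycles + kPlainCyclesPerByte);
}

// One setup, then a fixed cost per sequential byte.
constexpr uint32_t sequential_read_cycles(uint32_t bytes, uint32_t cycles_per_byte) {
    return kSetupCycles + bytes * cycles_per_byte;
}

// 64 bytes: 6912 cycles random, 1242 sequential, 218 sequential with overlap.
static_assert(random_read_cycles(64) == 6912, "random");
static_assert(sequential_read_cycles(64, kPlainCyclesPerByte) == 1242, "plain");
static_assert(sequential_read_cycles(64, kOverlappedCyclesPerByte) == 218, "overlapped");
```

Under those assumptions, laying data out so it can be read in one sequential run matters far more than the raw per-byte cost.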

Another issue is that the fx chip cannot be accessed from interrupts.

So ideally we would need an interrupt-driven PSG that is fed data/commands by a play sound function invoked by the main program.

Whoops, so did I. :smile:

I see. I was aware that may be a possibility but I wasn’t sure. Thank you for the explanation.

@bateske mentioned something about streaming sprite data from it. But I am left wondering if it will be practical to fetch game data during gameplay from SPI flash given the scarcity of RAM on this MCU, transfer setup latency and overhead (which you just mentioned).

Another question is: isn’t the number of instructions needed to deal with game objects stored in RAM going to increase? For instance for a sprite in RAM compared to having a sprite in a fixed progmem location?

I am thinking out loud and I realize I just showed up (late) and I am committing a bit of a faux pas when raising doubts about the fundamentals.

Currently the osc update interrupt calls a synth handler every 16 samples (1 kHz), and the handler processes score commands and advances score pointers. While it is possible to feed commands fetched from elsewhere, that would currently defeat the ability of the synth to call/return from patterns (which is kind of the basic mechanism the synth was intended to be based on), because the synth needs the address of the score command to return to and that address is supposed to be in PROGMEM.

Also, scores are very small, so I see very little point in moving them in and out of external SPI flash. The main problem with ATMLib is the size of its codebase.

Still thinking out loud, if sequences of synth commands are to be fetched from external flash I would suggest to keep osc.c and implement a much simpler synth that can consume something similar to FTII pattern commands. Such a synth would not need to care as much about command size and would lack the complexity added by pattern call/return. i.e. trade score size for code size. As a bonus the synth may be able to consume a subset of a common tracker format making it easier for people to contribute.

Note that trackers assume wavetable-like instruments so the hypothetical synth would have to provide a mechanism to define non-wavetable instruments based on osc.c capabilities.

Probably meaning drawing sprite graphics from SPI. I’ve added a drawBitmap function to the current library that draws (masked) bitmaps very fast. Here’s a demo

(There’s also a newer, faster version running at 125 fps but it only works on real hardware so far)

I got far enough to understand that.

Yes, the synth would be like a sound chip which gets fed primitive commands in sequence from an fx memory stream once a frame.

Thinking out loud, maybe we could use something based on AY8910 player/trackers? Just found this

Migrated to a new thread for audio discussion specifically.

Thank you.

External flash to framebuffer blitting, makes sense and nice work by the way. Now this is what I’d like to try (I cannot do it myself right now): how much CPU is left for game logic/music when blitting, say 128+20 8x8 tiles/sprites at 30 fps, compared to blitting from internal flash? (128 tiles to fill the screen + 20 tiles for game characters) I think that would be closer to a benchmark I’d like to see as a game developer. Does that make sense or am I off mark?

Sounds good. I am guessing a buffer for command sequences will be needed because the synth must be interrupt driven (divided down sample rate) and during interrupts you cannot easily take over SPI transfers from external flash which may be in progress and restore them before exiting the interrupt. So some task (I suspect the main task if this has to be in keeping with Arduino style) must fill a synth command buffer when not rendering from external flash and the synth will pick up commands from the buffer when the interrupt calls it.

I am going to look into those. I didn’t know about them, nice find! My gut feeling right now is that we should try to support a subset of popular tracker effects/formats, if we can, because that way creative people can use modern tools they are used to (e.g. MilkyTracker, Renoise) and then put the resulting score through a converter script.

“How much performance does reading from external memory cost” is totally a good question. I’m curious too, and it is something that should be published. From the looks of things, not very much. I’d venture a guess somewhere between 5 and 20% more CPU? I think a lot of this is being handled by the SPI hardware. It’s a shame we don’t have DMA because I think a majority of this could be completely handed off.

The tricky thing is coming up with benchmarks that result in workloads similar to that of real games. That’s why I mentioned trying to render many small tiles (instead of large ones) and pin FPS to a value that results in good gameplay without going overboard then look at how much CPU is left instead of trying to render as many as possible.

So drawing many smaller ‘tiles’ is more expensive than drawing few large ones.

I spent some time mulling this over and my current thinking is that, unless scores are massive, fetching them from SPI flash will increase SRAM usage (say 64 bytes per channel) on top of what ATMLib normally uses and will likely increase PROGMEM usage to accommodate the added complexity.

The extra SRAM usage is for ring buffers for spooling commands between SPI flash and ATMLib. They are needed because commands need to be available when interrupts occur, so they have to be pre-fetched from SPI flash. This pre-fetching also makes correctness of playback dependent on these ring buffers never overflowing, which in turn means main loop execution time becomes a factor.

So basically I would warn against implementing “streaming” of ATMLib commands, since score data is usually very small anyway (the largest song in Arduventure is ~200 bytes). Having said that, I’m happy to modify ATMLib so that it would be possible to do it if people really wanted to and wrote the “glue”.
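For reference, a minimal sketch of the kind of per-channel ring buffer I have in mind: the main loop pushes bytes it has read from SPI flash and the synth handler pops them from the interrupt. `CommandRing` is a made-up name and none of this is existing ATMLib code. It assumes one producer and one consumer, and it relies on single-byte index reads and writes being atomic, which holds on an 8-bit AVR.

```cpp
#include <cstdint>

// Hypothetical single-producer/single-consumer byte queue (not part of ATMLib).
// One slot is always left empty, so it holds at most Capacity - 1 bytes.
template <uint8_t Capacity = 64>
class CommandRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Main loop side: returns false when the buffer is full.
    bool push(uint8_t byte) {
        const uint8_t head = head_;
        const uint8_t next = static_cast<uint8_t>((head + 1) & (Capacity - 1));
        if (next == tail_) return false;
        data_[head] = byte;
        head_ = next;  // publish only after the byte has been stored
        return true;
    }

    // Interrupt side: returns false when no byte is available, which is the
    // case where the main loop did not refill the buffer in time.
    bool pop(uint8_t &byte) {
        const uint8_t tail = tail_;
        if (tail == head_) return false;
        byte = data_[tail];
        tail_ = static_cast<uint8_t>((tail + 1) & (Capacity - 1));
        return true;
    }

    // Number of bytes currently queued.
    uint8_t size() const {
        return static_cast<uint8_t>((head_ - tail_) & (Capacity - 1));
    }

private:
    volatile uint8_t data_[Capacity];
    volatile uint8_t head_ = 0;  // written by the producer only
    volatile uint8_t tail_ = 0;  // written by the consumer only
};
```

With the default capacity this costs 66 bytes per channel (64 for data plus the two indices), and playback is only correct as long as the main loop keeps every channel's buffer topped up.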

IMO the priorities should be: making it easy for people to create music and reducing the size of the library. I need to sit down and write some code to see how much PROGMEM I can free up, but the basic idea is to switch from the stacked pattern (call pattern/return) approach to one that maps very closely to well-known trackers (pattern repeat/jump). Not having to deal with nested patterns should simplify the code. I am also looking at swapping current effects out for a simpler approach inspired by the envelope generator of the AY8910 (thank you @Mr.Blinky for reminding me of its existence).

So I have taken what’s in osc.c and added basic building blocks for creating effects. Blocks framed by a colored background may be removed. The arpeggiator for instance could be substituted by explicit note changes defined in instruments.
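To illustrate why dropping call/return simplifies things, here is a toy sequencer that only understands notes, jumps and an end marker. The opcodes, `Cmd` and `FlatPlayer` are invented for this example and are not the ATMLib command set or any tracker format. The only playback state is a single position, so there is no return stack to store or bound.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical opcodes, for illustration only.
enum class Op : uint8_t { Note, Jump, End };

struct Cmd {
    Op op;
    uint8_t arg;  // note number for Note, target index for Jump, unused for End
};

class FlatPlayer {
public:
    FlatPlayer(const Cmd *score, size_t length) : score_(score), length_(length) {}

    // Returns the next note number, or -1 if the score has ended or if no
    // note was found within `length` commands (e.g. a loop made of jumps only).
    int next_note() {
        for (size_t guard = 0; guard < length_ && pc_ < length_; ++guard) {
            const Cmd c = score_[pc_];
            switch (c.op) {
            case Op::Note:
                ++pc_;
                return c.arg;
            case Op::Jump:
                pc_ = c.arg;  // plain jump: nothing is pushed anywhere
                break;
            case Op::End:
                pc_ = length_;
                break;
            }
        }
        return -1;
    }

private:
    const Cmd *score_;
    size_t length_;
    size_t pc_ = 0;  // the only playback state: no return stack
};
```

A score such as `{Note 60, Note 62, Jump 0}` loops forever, which is roughly what a tracker-style position jump gives you.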

None of this stuff is remotely final or decided but I wanted to share what I have been thinking.


I think this is actually less of an issue than it would be without the FX chip.
In pre-FX Arduboy games, the asset data (images, sounds) was always a big memory hog.
With the FX chip, that’s no longer a problem, so there’s more space for code.

Wow thanks for the very detailed answer. Using standard tracker tools is really something that we should look for.
Regarding the additional SRAM usage. You said it is 64 bytes per channel. How many channels do we have?

I am only afraid that this might become a factor when having many songs plus some sfx. Also, by allowing tracker tools to create music, the number of scores that end up in games and their size might soon matter. Maybe not and you are right; it is more of a gut feeling. If you have time after the rework, it would be good to see if playing from the flash is feasible and how much it costs us in a real scenario.

Thanks for looking into it so thoroughly.

I understand that with 256 bytes additional RAM for 4 channels and the overhead of code and time to fill those buffers.

Are those ring buffers for ATM tracker commands or ‘expanded’ primitive osc commands?

ATM tracker commands. However, if we go ahead with the plan I outlined above, commands in patterns that implement (complex) instruments would end up being more primitive than they are now.

Good point.

I agree. We should probably pick a subset of a popular tracker’s commands and change ATMlib so that those commands can be mapped to ATMLib ones (easily).

4 by default, configured at compile time. We could go up to 8 with minor modifications to osc.c.

8 * 64 = 512 bytes, which is a sizable chunk of RAM.

If you aren’t opposed to it I’d recommend using template parameters instead of defines for selecting the number of channels used.


#include <stdint.h>
template<uint8_t channels = 4>
class ATMLib
{
	// ...
};

The only thing I can see this being an issue with is if you’re writing an interrupt and that has to be different depending on the number of channels being used, in which case you’d probably have to use conditional compilation.

Those 64 bytes per channel are needed only if commands are fetched from SPI flash but I don’t think it’s worth doing. osc.c needs 4 bytes per channel.

Timer register setup and some constants depend on the channel count. The interrupt handler needs to iterate over an array of channel_count elements.
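A small sketch of how the channel count could be a template parameter while the per-sample loop still iterates over a fixed-size array. `ChannelState` and `ChannelBank` are hypothetical names. The 4-byte layout is just one way to arrive at the "4 bytes per channel" figure and is not the real osc.c struct, and the timer setup is left out entirely.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 4-byte per-channel state (assumed layout, not osc.c's).
struct ChannelState {
    uint16_t phase;      // phase accumulator
    uint16_t phase_inc;  // per-sample increment
};
static_assert(sizeof(ChannelState) == 4, "expected 4 bytes per channel");

template <uint8_t Channels = 4>
struct ChannelBank {
    static_assert(Channels >= 1 && Channels <= 8, "1 to 8 channels");

    ChannelState ch[Channels] = {};

    // What the body of the sample interrupt might do: advance every channel
    // and return how many of them are in the high half of their square wave.
    uint8_t tick() {
        uint8_t high = 0;
        for (uint8_t i = 0; i < Channels; ++i) {
            ch[i].phase = static_cast<uint16_t>(ch[i].phase + ch[i].phase_inc);
            if (ch[i].phase & 0x8000u) ++high;
        }
        return high;
    }
};

// RAM cost scales with the template argument: 16 bytes for 4 channels,
// 32 bytes for 8.
static_assert(sizeof(ChannelBank<4>) == 16, "4 channels");
static_assert(sizeof(ChannelBank<8>) == 32, "8 channels");
```

Since `Channels` is a compile-time constant, the compiler is free to unroll the loop. The ISR itself cannot be a template, so it would have to call into one specific instantiation.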