Arduboy FX library

Balls!

55 of them.

as masked sprites with 60 moving background tiles (of which 45 are visible) running smoothly at 60fps (with the USB cable disconnected)

You’ll need to run this on hardware because of the “AR” detection (and maybe the glitch)

drawballs-test.arduboy (11.8 KB)

Edit:

Here’s a video @bateske

I’m sure I can get to 60 with some more optimizing :slight_smile:

:heavy_heart_exclamation:

This is great, can the balls be animated with frames??

Are the balls being streamed from external memory? Since they are all the same, are you using the same memory pointer in RAM or anything?

Could all of the sprites be different and animated and still have this same level of performance?

Is the music coming from the Arduboy? How did you manage this demo so fast? You are a magician?

Looks awesome.

Sure

All the sprites (masked bitmaps) and background tiles (self masked) are drawn from the flash cart

Yes

void Cart::drawBitmap(
    int16_t x, 
    int16_t y,
    uint24_t address,
    uint8_t frame,
    uint8_t mode
    )

You can have 256 frames for a sprite. You only need to specify a different address if the width or height is different (or 256 frames aren’t enough).

Sprites and tiles can have any width and height from 1 up to 32767 pixels
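To illustrate how the frame parameter could drive an animation, here is a minimal sketch. The helper animationFrame and the constants frameCount and updatesPerFrame are hypothetical names made up for this example, not part of the library; only the drawBitmap signature comes from the post above.

```cpp
#include <cstdint>

// Hypothetical values for illustration: a sprite stored with 4 frames,
// advancing one frame every 8 display updates.
constexpr uint8_t frameCount = 4;
constexpr uint8_t updatesPerFrame = 8;

// Hypothetical helper (not a library function): maps a running update
// counter to the frame index to pass to drawBitmap.
constexpr uint8_t animationFrame(uint16_t updateCounter)
{
    return static_cast<uint8_t>((updateCounter / updatesPerFrame) % frameCount);
}

// On hardware the result would be passed as the `frame` argument, e.g.
// Cart::drawBitmap(x, y, spriteAddress, animationFrame(counter), mode);
```

Since all frames share one address, 55 sprites animated this way would only differ in the frame value (and in the address, if they use different images).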

No Sorry. Just playing some AY-8910 Speccy demo tunes in the background

So good! Can you share the code? Could you help me turn it into a demo where each sprite is unique animated?

It would be a great demo to have 4 frame animations of 55 different kind of monsters that fit in your pocket. :slight_smile:

Showing that to other game developers I feel would get them psyched about the hardware.

I’ll share the code after tidying it up a bit. @bateske It’s on Github now

No problem. Got the sprites?

Edit:

Added link to source.

HELP WANTED: finding enough 16x16 animated sprites :grinning:

Hey, I just saw that Cart::wait() expands to:

in      r0, 0x2d
sbrs    r0, 7
rjmp    .-6

and in the datasheet there is an example to do it like that:

wait:
  sbis  SPSR, SPIF
  rjmp wait

Do you think we can do the same or is it related to my toolchain that it does expand differently?

Hmmm, SPSR is 0x2d and therefore not in the lower IO address space, thus sbis cannot be used. Strange to see this example in the datasheet for the 16u4 and 32u4. There is a sloppy note under “Code Examples”… :angry:
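The range limit can be written down as a compile-time check. This is only a sketch: the function names are made up for illustration, and the ranges reflect the AVR instruction encodings (sbis/sbic take a 5-bit I/O address, in/out take a 6-bit one).

```cpp
#include <cstdint>

// sbis/sbic/sbi/cbi encode the I/O address in 5 bits: 0x00 to 0x1F only.
constexpr bool reachableBySbis(uint8_t ioAddress)
{
    return ioAddress <= 0x1F;
}

// in/out encode the I/O address in 6 bits: 0x00 to 0x3F.
constexpr bool reachableByIn(uint8_t ioAddress)
{
    return ioAddress <= 0x3F;
}

// SPSR's I/O address on the 32u4, as seen in the disassembly above.
constexpr uint8_t spsrIoAddress = 0x2D;

static_assert(!reachableBySbis(spsrIoAddress), "SPSR is out of range for sbis");
static_assert(reachableByIn(spsrIoAddress), "SPSR can be read with in");
```

That is why the compiler emits the three-instruction in / sbrs / rjmp loop instead of the two-instruction sbis / rjmp loop.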

They probably copy and paste the examples across multiple datasheets.


Edit:

Actually it could just be an oversight.
I checked the ATmega328P datasheet (picked a chip at random) and they give:

Wait_Transmit:
; Wait for transmission complete
in r16, SPSR
sbrs r16, SPIF
rjmp Wait_Transmit

So it’s possible that it’s a mistake on the datasheet.

Though the 328P’s datasheet is more recent (after Atmel was acquired by Microchip).

I was just about to write that. I saw the examples in the datasheet too. It’s sloppy of them.
At one time I actually changed it to sbis, then facepalmed when I remembered that this is why I used 3 instructions :smile:

I’ve been tinkering around with a few things recently and came across the seekData function, which I used for reading map data. Random access is not very fast because of the addressing that needs to be done each time. However, I modified the seekData function to be slightly more efficient.

void Cart::seekData(uint24_t address)
{
  enable();
  SPDR = SFC_READ;
 #ifdef ARDUINO_ARCH_AVR
  asm volatile( // assembly optimizer for AVR platform
    "lds  r0, %[page]+0 \n"
    "add  %B[addr], r0  \n"
    "lds  r0, %[page]+1 \n"
    "adc  %C[addr], r0  \n"
    :[addr] "+&r" (address)
    :[page] ""    (&programDataPage)
    :
  );
 #else // C++ version for non AVR platforms
  address += (uint24_t)programDataPage << 8;
 #endif
  uint8_t t = address >> 16;
  wait();
  SPDR = t;
  t = address >> 8;
  wait();
  SPDR = t;
  t = address;
  asm volatile("nop\n");
  wait();
  SPDR = t;
  asm volatile("nop\n");
  wait();
  SPDR = 0;
}

So the shifting and address calculation are done while the read command and the first address byte are already leaving the SPI FIFO. It saves a few cycles. It’s not saving the world, but I thought you might be interested. Maybe I should even put this under the AVR compile switch…
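For anyone comparing the two branches of the #ifdef, here is a host-side sketch of what the add/adc pair does next to the portable one-liner. It assumes programDataPage is a 16-bit value (the assembly loads two bytes from it) and uses uint32_t masked to 24 bits as a stand-in for uint24_t; the function names are made up for illustration.

```cpp
#include <cstdint>

// Portable form: the page is shifted up by 8 bits, i.e. it is added to the
// middle and high bytes of the address only.
constexpr uint32_t addPagePortable(uint32_t address, uint16_t page)
{
    return (address + (static_cast<uint32_t>(page) << 8)) & 0xFFFFFF;
}

// Byte-wise form mirroring the assembly:
//   add %B[addr], low byte of page
//   adc %C[addr], high byte of page (plus carry)
inline uint32_t addPageBytewise(uint32_t address, uint16_t page)
{
    const uint8_t low = static_cast<uint8_t>(address);  // %A[addr] is untouched
    const uint16_t middleSum = ((address >> 8) & 0xFF) + (page & 0xFF);
    const uint8_t carry = static_cast<uint8_t>(middleSum >> 8);
    const uint8_t middle = static_cast<uint8_t>(middleSum);
    const uint8_t high =
        static_cast<uint8_t>(((address >> 16) & 0xFF) + (page >> 8) + carry);

    return (static_cast<uint32_t>(high) << 16)
         | (static_cast<uint32_t>(middle) << 8)
         | low;
}
```

Both forms give the same 24-bit result; the assembly version just avoids a generic 24-bit shift and add.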

It could even be written fully in assembly to make it more efficient, with the C++ code kept alongside as a reference, e.g. by putting the assembly under a compile switch.

Thanks for that, and for bringing this to my attention again. seekData will be used a lot, so it may save more cycles than it looks at first glance.

I’ll do a full assembly version and hook up my logic analyzer to see if the nop will shorten the wait loops or not. (depends on the number of even/odd cycles passed)

I’m also planning to make a proper library so the cart files don’t have to be added to the project anymore. As part of this I’m planning to rename the cart class to fx.

You are welcome!

Another thing that is bothering me all the time and maybe it is worth discussing this here. I am always thinking about some kind of “read ahead” feature. Not sure if it makes sense but it would be like that:

Actually there could be two flavours:

  1. a library function will install an interrupt handler for the SPI and continues to read X bytes through it into a user defined buffer whenever the application instructs it to do so.

  2. There is a library function that allows to read ahead but with a new address, same as 1 but with additional address sequence.

I do not know the ISR overhead for AVR chips. The idea is in my head for a while and maybe one can tell me if it is nonsense or worth a try.

Glad to hear that Cart is going to be renamed to FX,
it makes a lot of sense.

I’ll have to try to find some time to try out Cart and add some utility functions.


uint8_t t = address >> 16;
wait();
SPDR = t;
t = address >> 8;
wait();
SPDR = t;
t = address;

Some thoughts about this code:

  • I think the C-style cast should be changed to static_cast, because it’s more specific
  • I’m not sure what the situation currently is regarding uint24_t, but if portability is a concern then it would be best to have a fallback for when uint24_t isn’t available
  • I don’t think it’s a good idea to reuse t, I think it would be better to declare a new (preferably const) variable each time. It’s possible that doing so might even save a few bytes.
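A minimal sketch of those three suggestions together, assuming a plain uint32_t fallback is acceptable when the 24-bit type isn’t available. AddressBytes and splitAddress are hypothetical names used only for this example.

```cpp
#include <cstdint>

#if defined(__AVR__)
using uint24_t = __uint24;  // avr-gcc's built-in 24-bit integer type
#else
using uint24_t = uint32_t;  // fallback: wider type, only the low 24 bits are used
#endif

// Hypothetical holder for the three address bytes, most significant first.
struct AddressBytes
{
    uint8_t high;
    uint8_t middle;
    uint8_t low;
};

// Each byte gets its own value instead of reusing one temporary,
// and every narrowing conversion is spelled out with static_cast.
constexpr AddressBytes splitAddress(uint24_t address)
{
    return AddressBytes {
        static_cast<uint8_t>(address >> 16),
        static_cast<uint8_t>(address >> 8),
        static_cast<uint8_t>(address),
    };
}
```

Whether this actually saves bytes compared with reusing t would have to be checked against the compiler output.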

I vaguely remember @Mr.Blinky saying that this wouldn’t be possible because the screen and the FX chip share the SPI connection, but I could be misremembering.

I think it would theoretically be possible if the interrupt could be disabled before the screen is updated and re-enabled afterwards,
but depending on the speed/timescale I don’t think it would be particularly fast.

I totally get that. But an interrupt service isn’t really economical. When the interrupt is triggered, calling the service routine takes 5 cycles, a bare-minimum interrupt service wrapper takes 10 cycles, and the return from interrupt takes another. That’s already 20 cycles wasted without doing anything yet. If the service code is written carefully so that it doesn’t affect the status register, you could take 6 cycles off that, reducing the bare interrupt wrapper to 14 cycles. A bare-minimum (unsafe) routine that reads the SPI data and stores it in an incrementing buffer would still take 11 cycles.

In addition to this there will be more overhead. The display also uses SPI, and the flash needs to be disabled while data is copied to the display. Disabling the flash terminates the flash read command, so after the display function has completed a new read command must be issued (at the cost of another 100-ish cycles).

TL;DR
A bare-minimum SPI service routine would take at least 25 cycles per transfer. Reading the data back out of the buffer isn’t included; with that, a read would probably take 40 cycles or more. Effectively twice as slow.

Apart from this, the real issue is that random reads slow the reading down. Using interrupts doesn’t change that.

so are you saying the class should be capitalized to FX ?

Correct, I said that. If you really wanted to use interrupts, it’s possible, but it’s more trouble than it’s worth (disabling interrupts during display, keeping track of the flash address, etc.).

correct and costly in bytes

I was assuming it was going to be.

Using capitalised initials for acronyms is the convention the Arduino library set
(e.g. IPAddress, PluggableUSBModule),
so I’m assuming that’s what most libraries follow (or at least probably should follow).

I probably should have said “… saying that this wouldn’t be practical because …”.

Thanks for your detailed answer. My goal was mainly to ease the “interweaving” of the SPI commands with main loop code. But looking at your numbers it would make no sense at all, as the overhead is already more than the time it takes to transfer one byte via SPI.
I’m starting to wonder why the AVR has an interrupt for the SPI at all (maybe for the super slow SPI speeds…).

It will be now :slight_smile:

Besides the super low speeds, it’s handy for when the AVR is an SPI slave.

I was just thinking, it’s a shame that the people responsible for the Arduino library chose to use Print as their solution for text printing.

Being a class with a limited pool of print and println functions means it’s hard to introduce functionality for printing new kinds of objects (i.e. new types).
Granted there’s the Printable class, but that won’t work for types like enum class,
and even on class types, inheriting Printable forces the implicit sizeof(void*) overhead (for the pointer to the virtual table) onto the class.

Instead they could have used something more along the lines of <iostream>'s overloading of the << operator.
That decision often gets a lot of flak because << isn’t the most obvious operator for printing text,
but the decision to allow printing via free functions means that it’s more flexible and easy to extend.
For example it would be possible to print an enum class simply by providing an overload of the << operator.
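A rough sketch of what that could look like. The Print class here is a tiny host-side stand-in written for this example (it only mimics write and print(const char *) from the Arduino class so the code can be tried on a PC), and Direction is a made-up enum.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Host-side stand-in for Arduino's Print, just enough to show the idea.
class Print
{
public:
    std::string buffer; // collects output so it can be inspected on a PC

    std::size_t write(std::uint8_t value)
    {
        buffer += static_cast<char>(value);
        return 1;
    }

    std::size_t print(const char * text)
    {
        std::size_t count = 0;
        while (*text != '\0')
            count += write(static_cast<std::uint8_t>(*text++));
        return count;
    }
};

// Hypothetical enum used only for this example.
enum class Direction { North, East, South, West };

// A free function: no changes to Print and no base class on Direction.
inline Print & operator <<(Print & print, Direction direction)
{
    switch (direction)
    {
        case Direction::North: print.print("North"); break;
        case Direction::East:  print.print("East");  break;
        case Direction::South: print.print("South"); break;
        case Direction::West:  print.print("West");  break;
    }
    return print;
}
```

Nothing stops a sketch from defining an operator like this against the real Print today; the point is that the library itself doesn’t work this way, so every type has to opt in separately.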

If anyone actually read this far and is wondering what the hell this has to do with the FX library, I was thinking of ways to print strings read from the FX chip.

As it stands the only possibility I can think of is:

class FXString : public Printable
{
private:
	uint24_t address;

public:
	FXString() = default;

	constexpr FXString(uint24_t address) :
		address(address)
	{
	}

	size_t printTo(Print & print) const override
	{
		// Set FX address.
		// Read bytes from FX,
		// calling print.write(value),
		// until '\0' is encountered.
		// Return the number of bytes written
		// (placeholder until the loop is filled in).
		return 0;
	}
	
	// Other stuff ???
};

Which should end up having a size of 5 bytes on AVR: 3 for the address plus 2 for the pointer to the virtual table.

If they had gone for the << route then it could have just been:

class FXString
{
private:
	uint24_t address;

public:
	FXString() = default;

	constexpr FXString(uint24_t address) :
		address(address)
	{
	}

	constexpr uint24_t getAddress() const
	{
		return this->address;
	}
};

// Returns a `Print &` to allow 'method chaining'
inline Print & operator <<(Print & print, const FXString & string)
{
	// Set FX address.
	// Read bytes from FX,
	// calling print.write(value),
	// until '\0' is encountered.
	return print;
}

And then FXString would only have been 3 bytes and there would have been no virtual table involved.
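The size difference can be checked with sizeof. This is a host-friendly sketch using stand-in types: PrintableLike imitates a base class with one virtual function, and a 3-byte array stands in for the uint24_t member. On AVR the two sizes should come out as 3 and 5; on a PC the virtual version will be larger still because pointers are wider and padding is added.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a Printable-style base class: one virtual function.
struct PrintableLike
{
    virtual std::size_t printTo() const = 0;
};

// Stand-in for the free-function version: just the 3 address bytes.
struct PlainAddress
{
    std::uint8_t bytes[3];
};

// Stand-in for the Printable version: same data plus a vtable pointer.
struct VirtualAddress : PrintableLike
{
    std::uint8_t bytes[3];

    std::size_t printTo() const override
    {
        return 0;
    }
};

static_assert(sizeof(PlainAddress) == 3, "no hidden overhead");
static_assert(sizeof(VirtualAddress) >= sizeof(PlainAddress) + sizeof(void *),
    "the vtable pointer is paid for in every object");
```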