Combining Arduboy2's drawExternalMask and ArdBitmap's drawBitmap into one

I like Arduboy2’s masking, and I like ArdBitmap’s mirroring, so I combined the two into one method. I find this one method meets all of my sprite needs, and the mirroring saves a lot of PROGMEM space. Thanks to the authors of these great libraries!

Combined method is here:

Oh, I also had an issue with ArdBitmap’s vertical mirroring (created an issue here). My combined method in the gist fixes this bug.



How does this compare to the original in terms of size and performance?

Performance-wise it should be just about identical to drawExternalMask if you are not mirroring, as it only needs to check if mirroring is on a handful of times. If you are horizontal mirroring, I can’t imagine the perf difference is noticeable as it’s just doing a handful of extra calculations. Vertical mirroring requires flipping the data in all the bytes in the bitmap and the mask, so I suspect it is a bit slower. I haven’t noticed any difference in my game.
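To make the vertical mirroring cost concrete: each bitmap byte holds an 8-pixel vertical strip, so flipping vertically means reversing the bit order inside every byte (as well as reading the rows of bytes in the opposite order). Below is a minimal sketch of one common way to reverse a byte. The name reverseBits is made up for illustration, and this is not necessarily how the combined method does it.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the bit order of one byte
// (bit 0 <-> bit 7, bit 1 <-> bit 6, and so on).
static inline uint8_t reverseBits(uint8_t data) {
  data = (uint8_t)(((data & 0xF0) >> 4) | ((data & 0x0F) << 4)); // swap nibbles
  data = (uint8_t)(((data & 0xCC) >> 2) | ((data & 0x33) << 2)); // swap bit pairs
  data = (uint8_t)(((data & 0xAA) >> 1) | ((data & 0x55) << 1)); // swap neighbouring bits
  return data;
}
```

This has to run once per bitmap byte and once per mask byte, which is where the extra time for vertical mirroring would come from.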

Size-wise it's hard to say, because it depends on how well you can take advantage of mirroring. The function itself seems to be smaller than drawExternalMask: a test sketch I just built using SpritesB::drawExternalMask came out to 7924 bytes, but with this drawBitmap it was 7472 bytes.

For my game, sometimes I can choose to mirror instead of choosing a different sprite frame at the same cost, and in those cases I get really nice savings. In other parts of my game I have to add more logic to decide when to mirror, and that logic eats into the savings. I’d estimate that so far mirroring has saved me about 2k of space.

I also added invert as an option. It draws the sprite with white as black and black as white. This is again a nice space saving if you ever need inverted versions of your sprites.

When inverting, it does data = data ^ 0xFF for every byte in the bitmap, so possibly a slight perf hit.

Why data = data ^ 0xFF and not data = ~data?
The latter should be shorter (assuming the compiler doesn’t realise that it can convert the former into the latter).

Ah yeah, ~ would work too.

Actually it needs to be data = ~data & mask_data to get the proper effect.
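Here's a minimal sketch of that masked invert, in case it helps. The function name invertMasked is made up, and it assumes that 1 bits in the mask mark pixels belonging to the sprite. The AND with the mask keeps the inverted data from lighting up pixels outside the sprite's shape.

```cpp
#include <cstdint>

// Hypothetical helper: invert the sprite's pixels, but only inside the mask.
// Assumption: a 1 bit in maskData means "this pixel is part of the sprite".
static inline uint8_t invertMasked(uint8_t data, uint8_t maskData) {
  // ~data is promoted to int, so cast back down to 8 bits after masking.
  return (uint8_t)(~data & maskData);
}
```

For a single byte, data ^ 0xFF and ~data give the same result once truncated back to 8 bits, so the choice between them only matters for code size.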

If someone benched this you’d likely find horizontal a little slower (that extra check inside the innermost loop will cost you something) and vertical WAY slower. Bit shifting is SUPER expensive on this platform. It might actually even be faster to write a bunch of IF/AND/OR code that just manually moved every single bit.
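For illustration, the "manually move every single bit" idea could look roughly like the sketch below. The name reverseBitsNoShift is made up, and whether this is actually faster than shifting on the AVR is untested here; it would need benchmarking.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the bits of a byte using only tests and ORs,
// with no shift instructions.
static inline uint8_t reverseBitsNoShift(uint8_t data) {
  uint8_t result = 0;
  if (data & 0x01) result |= 0x80;
  if (data & 0x02) result |= 0x40;
  if (data & 0x04) result |= 0x20;
  if (data & 0x08) result |= 0x10;
  if (data & 0x10) result |= 0x08;
  if (data & 0x20) result |= 0x04;
  if (data & 0x40) result |= 0x02;
  if (data & 0x80) result |= 0x01;
  return result;
}
```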

Of course if you have plenty of CPU to spare you wouldn’t actually notice any difference in your game - since you never actually “see” the rendering speed. In that case you’re paying with slightly less battery life since you’re using more of the CPU to get your rendering done. You would start to see a slowdown once you hit 100% CPU.

data = (data & 0xF0) >> 4 | (data & 0x0F) << 4;

I dunno if the compiler is smart enough, but in AVR assembly you can do this in a single cycle with SWAP, which just swaps the two nibbles of a byte. If the compiler isn’t doing this, you’d improve this a LOT by dropping to inline assembler just for that single instruction.
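A rough sketch of what that could look like with GCC-style inline assembler. The function name swapNibbles is made up, and the non-AVR fallback is only there so the same code builds off-device. It's worth checking the compiler's generated assembly first, since it may already emit SWAP for the plain C expression.

```cpp
#include <cstdint>

// Hypothetical helper: exchange the high and low nibbles of a byte.
static inline uint8_t swapNibbles(uint8_t value) {
#if defined(__AVR__)
  // SWAP Rd exchanges the two nibbles of a register in one instruction.
  __asm__("swap %0" : "+r"(value));
  return value;
#else
  // Portable fallback for non-AVR builds (e.g. testing on a PC).
  return (uint8_t)((value >> 4) | (value << 4));
#endif
}
```

With something like this, the first step of the byte flip becomes data = swapNibbles(data);.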

Bit shifting on this platform at the assembler level becomes a for loop. So at the lowest level the CPU is doing:

for (i = 4; i > 0; i--)
  value <<= 1;

So that’s two for loops, two ANDs and one OR… so easily 20+ CPU cycles vs a single cycle for a SWAP.

Thanks Josh, that’s good info. If I start to see a CPU hit I’ll investigate dropping to assembler. For my game, flash storage is more precious than CPU, so it has proven to be a good trade-off so far.

I’d imagine if someone is making a CPU intensive game, having actual flipped sprites stored in flash would be better than using this method.
