I like Arduboy2’s masking, and I like ArdBitmap’s mirroring, so I combined the two into one method. I find this one method meets all of my sprite needs, and the mirroring saves a lot of PROGMEM space. Thanks to the authors of these great libraries!
Performance-wise it should be just about identical to drawExternalMask if you are not mirroring, as it only needs to check if mirroring is on a handful of times. If you are horizontal mirroring, I can’t imagine the perf difference is noticeable as it’s just doing a handful of extra calculations. Vertical mirroring requires flipping the data in all the bytes in the bitmap and the mask, so I suspect it is a bit slower. I haven’t noticed any difference in my game.
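To make the difference between the two mirror directions concrete, here is a rough sketch. It is not the code from the method above. It assumes the usual Arduboy bitmap layout, where each byte holds 8 vertically stacked pixels, and a sprite that is exactly 8 pixels tall. The names `reverseBits`, `mirrorHorizontal` and `mirrorVertical` are made up for illustration.

```cpp
#include <stdint.h>
#include <stddef.h>

// Reverse the bit order of one byte (bit 0 <-> bit 7, bit 1 <-> bit 6, ...).
static uint8_t reverseBits(uint8_t b) {
  b = (uint8_t)((b << 4) | (b >> 4));                    // swap the two nibbles
  b = (uint8_t)(((b & 0x33) << 2) | ((b & 0xCC) >> 2));  // swap bit pairs
  b = (uint8_t)(((b & 0x55) << 1) | ((b & 0xAA) >> 1));  // swap neighbouring bits
  return b;
}

// Horizontal mirror: the bytes are untouched, only the column order changes.
static void mirrorHorizontal(const uint8_t *src, uint8_t *dst, size_t width) {
  for (size_t x = 0; x < width; x++) {
    dst[x] = src[width - 1 - x];
  }
}

// Vertical mirror (8-pixel-tall sprite assumed): every byte must be bit-reversed.
static void mirrorVertical(const uint8_t *src, uint8_t *dst, size_t width) {
  for (size_t x = 0; x < width; x++) {
    dst[x] = reverseBits(src[x]);
  }
}
```

The horizontal case is only an index calculation, while the vertical case does extra work on every byte of both the bitmap and the mask, which is why it is the one more likely to be slower.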
Size-wise, it is hard to say. It depends on how well you can take advantage of mirroring. The function itself seems to be smaller than drawExternalMask: a test sketch I just built using SpritesB::drawExternalMask came out to 7924 bytes, but with this drawBitmap it was 7472 bytes.
For my game, sometimes I can choose to mirror instead of choosing a different sprite frame at the same cost, and in those cases I get really nice savings. In other parts of my game I have to add more logic to decide when to mirror, and that logic eats into the savings. I’d estimate that so far mirroring has saved me about 2 KB of space.
If someone benchmarked this, they’d likely find horizontal a little slower (that extra check inside the innermost loop will cost you something) and vertical WAY slower. Bit shifting is SUPER expensive on this platform. It might actually even be faster to write a bunch of IF/AND/OR code that just manually moves every single bit.
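For reference, the "manually move every single bit" idea could look something like the sketch below. The function name `reverseBitsBranchy` is hypothetical, and whether this actually beats a shift-based version on AVR is an assumption that would need a benchmark.

```cpp
#include <stdint.h>

// Reverse the bits of a byte using only constant-mask tests and ORs,
// so no multi-bit shift is ever needed.
static uint8_t reverseBitsBranchy(uint8_t b) {
  uint8_t r = 0;
  if (b & 0x01) r |= 0x80;  // bit 0 -> bit 7
  if (b & 0x02) r |= 0x40;  // bit 1 -> bit 6
  if (b & 0x04) r |= 0x20;  // bit 2 -> bit 5
  if (b & 0x08) r |= 0x10;  // bit 3 -> bit 4
  if (b & 0x10) r |= 0x08;  // bit 4 -> bit 3
  if (b & 0x20) r |= 0x04;  // bit 5 -> bit 2
  if (b & 0x40) r |= 0x02;  // bit 6 -> bit 1
  if (b & 0x80) r |= 0x01;  // bit 7 -> bit 0
  return r;
}
```

Each line tests one fixed bit and sets one fixed bit, which is the kind of code a compiler can usually turn into a short skip-and-set pair of instructions.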
Of course if you have plenty of CPU to spare you wouldn’t actually notice any difference in your game - since you never actually “see” the rendering speed. In that case you’re paying with slightly less battery life since you’re using more of the CPU to get your rendering done. You would start to see a slowdown once you hit 100% CPU.
I dunno if the compiler is smart enough, but in AVR assembly you can do the nibble part of this in a single cycle with SWAP, which just swaps the two nibbles of a byte. If the compiler isn’t doing this, you’d improve this a LOT by dropping to inline assembler just for that single instruction.
AVR has no multi-bit shift instruction, so at the assembler level a shift by more than one bit becomes a loop (or a run) of single-bit shifts. So at the lowest level the CPU is effectively doing:
for (i = 4; i > 0; i--)
    value <<= 1;
So that’s two for loops, two ANDs and one OR… so easily 20+ CPU cycles vs. a single cycle for a SWAP.
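Here is a small sketch of the two ways to swap nibbles being compared. The helper name `swapNibbles` is made up. The AVR branch assumes the toolchain is avr-gcc, which documents a `__builtin_avr_swap` built-in for the SWAP instruction. The other branch is the plain shift/AND/OR version, which the compiler may or may not reduce to a SWAP on its own.

```cpp
#include <stdint.h>

// Swap the high and low nibbles of a byte, e.g. 0xA5 -> 0x5A.
static inline uint8_t swapNibbles(uint8_t value) {
#if defined(__AVR__)
  // Assumption: avr-gcc, where this built-in maps to the SWAP instruction.
  return __builtin_avr_swap(value);
#else
  // Portable fallback: two ANDs, two shifts by 4, one OR.
  return (uint8_t)(((value & 0x0F) << 4) | ((value & 0xF0) >> 4));
#endif
}
```

Using the built-in avoids hand-written inline assembler while still asking for the single instruction. Checking the generated assembly is the only way to be sure what the plain C version turns into.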