An Optimisation Story

I thought I’d share this story because other people might find it useful…

Today I was working on the virtual machine for the ‘monster collecting game’ project.

While compiling, I noticed something odd:
Global variables use 2086 bytes (81%) of dynamic memory

I was not using anywhere near enough statically-allocated memory to explain this (I checked - my entire Game object was 333 bytes), so I went digging around to find out where it was all going.

The answer: a function called toString.

Basically, to debug the opcodes I’m using for the virtual machine I need to be able to print them, and by default C++ doesn’t provide an ‘opcode (i.e. enum/enum class) to string’ function, so I had to write my own.

I was implementing this function like so:

inline FlashStringHelper toString(Opcode opcode)
{
	switch (opcode)
	{
		case Opcode::NoOperation:
			return F("NoOperation");
		case Opcode::EndScript:
			return F("EndScript");

		// ...

		default:
			return F("Invalid");
	}
}

To check that it was this function causing the issue, I simply commented out the switch and used a return nullptr; so the code would compile.
(I didn’t need to run the code to know if this was the source of the problem, so it doesn’t matter that the return nullptr would have made everything explode at runtime.)

Doing that got me this:
Global variables use 1574 bytes (61%) of dynamic memory

So where was the RAM going?
The clue is in the size difference: 2086 - 1574 = 512
512 happens to be 256 * 2.

If that isn’t enough to give it away, here are a few more facts:

  • My Opcode type is an enum class that uses uint8_t as its underlying type
  • There are 256 possible values an Opcode can have, including all the invalid opcodes
  • On Arduboy, sizeof(char *) (or indeed any non-member pointer) is 2
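
The arithmetic behind that clue can be checked at compile time. This is only a sketch: the names below are made up for illustration, and the 2-byte pointer is modelled as uint16_t (an assumption) so the check also holds on a desktop compiler, where real pointers are larger.

```cpp
#include <stddef.h>
#include <stdint.h>

// Assumption: stands in for a 2-byte AVR data pointer.
using AvrPointer = uint16_t;

// A uint8_t-backed enum class can hold 256 distinct values (0 to 255).
constexpr size_t opcodeValueCount = static_cast<size_t>(UINT8_MAX) + 1;

// One pointer per possible opcode value.
constexpr size_t tableBytes = opcodeValueCount * sizeof(AvrPointer);

// The RAM difference reported by the two builds.
constexpr size_t observedDifference = 2086 - 1574;

static_assert(opcodeValueCount == 256, "uint8_t has 256 possible values");
static_assert(tableBytes == observedDifference, "the table accounts for the missing RAM");
```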

So basically the switch is creating a lookup table - it’s generating a 256-element array of pointers and using the opcode value to index them. That’s actually a sensible thing to do. The problem is that it’s putting them in RAM when it ought to be putting them in progmem.
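
To make the idea concrete, here is a rough sketch of the kind of transformation being described. It is not the compiler's actual output: the three-value Opcode, nameFromSwitch, nameFromTable, nameTable and versionsAgree are all hypothetical, and the table is kept to three entries plus a range check rather than the full 256.

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical three-value opcode set, just to keep the sketch short.
enum class Opcode : uint8_t { NoOperation, EndScript, Jump };

// The switch as it might be written in source.
inline const char * nameFromSwitch(Opcode opcode)
{
	switch (opcode)
	{
		case Opcode::NoOperation: return "NoOperation";
		case Opcode::EndScript: return "EndScript";
		case Opcode::Jump: return "Jump";
		default: return "Invalid";
	}
}

// Roughly the shape an optimiser may reduce it to: an array of pointers
// indexed by the opcode value. Nothing here is marked as progmem, so on AVR
// an array like this would be ordinary data held in RAM.
static const char * const nameTable[] = { "NoOperation", "EndScript", "Jump" };
constexpr uint8_t nameCount = sizeof(nameTable) / sizeof(nameTable[0]);

inline const char * nameFromTable(Opcode opcode)
{
	const uint8_t index = static_cast<uint8_t>(opcode);
	return (index < nameCount) ? nameTable[index] : "Invalid";
}

// True when both versions give the same name for every possible 8-bit value.
inline bool versionsAgree()
{
	for (int value = 0; value < 256; ++value)
	{
		const Opcode opcode = static_cast<Opcode>(value);
		if (strcmp(nameFromSwitch(opcode), nameFromTable(opcode)) != 0)
			return false;
	}
	return true;
}
```

The point of the comparison is that the two functions behave identically, so the only thing that changes is where the data ends up.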

Ultimately this is because the compiler used by Arduino (GCC’s C++ compiler) was designed primarily for conventional computers, where code and data share a single address space. AVR keeps flash and RAM in separate address spaces, so there are times when the compiler doesn’t realise that it should be putting something in progmem.

The solution was to manually create a new 256-element lookup table in progmem.
(With a lot of help from VSCodium’s regex-replace tool.)

inline FlashStringHelper toString(Opcode opcode)
{
	return readFlashStringPointer(&opcodeLookup[static_cast<uint8_t>(opcode)]);
}
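
For anyone who hasn't built a progmem lookup table before, here is a minimal sketch of the general technique. It is not the actual code from the project: opcodeNames, opcodeName and the two-value Opcode are hypothetical stand-ins, the table is cut down to two entries with a range check instead of the full 256, and it returns a plain const char * rather than FlashStringHelper. The #else branch is an assumption that lets the sketch compile on a desktop machine too.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
#else
// Assumed desktop fallback: there is no separate flash address space,
// so PROGMEM does nothing and a progmem read is an ordinary read.
// assert.h and string.h are only here for desktop-side checks.
#include <assert.h>
#include <string.h>
#define PROGMEM
#define pgm_read_ptr(address) (*(address))
#endif

// Hypothetical cut-down opcode set.
enum class Opcode : uint8_t { NoOperation, EndScript };

// The strings themselves are placed in progmem...
const char nameNoOperation[] PROGMEM = "NoOperation";
const char nameEndScript[] PROGMEM = "EndScript";
const char nameInvalid[] PROGMEM = "Invalid";

// ...and so is the table of pointers to them, which is the part that
// would otherwise cost 2 bytes of RAM per entry on AVR.
const char * const opcodeNames[] PROGMEM = { nameNoOperation, nameEndScript };
constexpr uint8_t opcodeNameCount = sizeof(opcodeNames) / sizeof(opcodeNames[0]);

static_assert(opcodeNameCount == 2, "one table entry per named opcode");

// Returns a pointer to a string stored in progmem.
// On AVR the result must be read with progmem-aware functions.
inline const char * opcodeName(Opcode opcode)
{
	const uint8_t index = static_cast<uint8_t>(opcode);

	if (index >= opcodeNameCount)
		return nameInvalid;

	// The table lives in progmem, so the pointer has to be fetched from there.
	return static_cast<const char *>(pgm_read_ptr(&opcodeNames[index]));
}
```

A full 256-entry table avoids the range check at the cost of 512 bytes of progmem, while a short table with a check trades a few instructions for a smaller table. Which is better depends on how many of the 256 values are actually valid opcodes.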

The end result had the smaller 1574-byte RAM size, with only a tiny increase in progmem size (about 14 bytes).

The moral of this story: the compiler is smart, but it wasn’t built with AVR in mind.
Sometimes it needs a bit of help. (In this case it needed a lot of help.)