Amusing byte reductions

Hi, just wondering if someone more versed in Arduboy/Arduino can help me out here. I have noticed that changing my code to include a reference to a global variable often results in a reduction in the size of my code.

For example, I have a bunch of code for item use; with the following line included, the sketch is 22470 bytes total.

 buttons_this_frame |= BTN_Lock;

All this does is cause buttons_last_frame in the next frame to have that flag set. Other parts of the code already do this in certain spots, so it shouldn't be activating dead code paths (which should have the opposite effect on size anyway). Also, I realised that this line was redundant, because other parts of the code mean the semantics of the game don't require button locking. In terms of how the game plays, it is basically a NOOP. When I remove the line, the sketch compiles at 22474 bytes. So somehow removing a bitwise OR on a global variable has cost me 4 bytes!
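For anyone unfamiliar with this kind of one-frame button lock, here is a minimal sketch of how it might work, assuming buttons_this_frame is copied into buttons_last_frame at the start of each frame. Only the names buttons_this_frame, buttons_last_frame and BTN_Lock come from the post; the bit values and the helper functions begin_frame() and just_pressed() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values (assumptions): only the variable names and
   BTN_Lock come from the post, not this mapping of bits to buttons. */
#define BTN_A    0x01u
#define BTN_B    0x02u
#define BTN_Lock 0x80u  /* assumed spare bit used as a "lock" flag */

uint8_t buttons_this_frame = 0;
uint8_t buttons_last_frame = 0;

/* Hypothetical per-frame update: move the current state into the
   previous-frame slot, then latch the new raw button reading. */
void begin_frame(uint8_t raw_buttons) {
    buttons_last_frame = buttons_this_frame;
    buttons_this_frame = raw_buttons;
}

/* Returns 1 only on the frame a button goes down, and never on the
   frame after one that set BTN_Lock. */
int just_pressed(uint8_t mask) {
    assert((mask & BTN_Lock) == 0);  /* the lock bit is not a real button */
    if (buttons_last_frame & BTN_Lock) {
        return 0;  /* previous frame requested a lock: ignore input */
    }
    return (buttons_this_frame & mask) != 0 &&
           (buttons_last_frame & mask) == 0;
}
```

Using an item would then do buttons_this_frame |= BTN_Lock; so that just_pressed() reports nothing on the following frame, even if a new button goes down.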

I am going to guess that this is because the extra reference to the global variable causes the compiler to make different decisions when compiling, and that is where I get the gains from. In this case it is 4 bytes, which is next to nothing; in another case, though, a similar change (decrementing a counter in the global 'backpack') saved me 22 bytes.

Am I going crazy? Has anyone else seen this happening? Should I be liberally sprinkling my code with effective NOOPs, just in case they save me bytes?

The compiler is configured to do link-time optimisation (LTO), which can result in some pretty strange code size changes, as you've observed. Sometimes it seems to get things wrong.

To figure out what's going on, you can compare the compiled output of the two versions, but you have to understand ATmega assembler.

Using avr-objdump -S will produce an assembly listing. I can provide specific details if you're interested.

Hard to tell anything from just one line, but I will say that using global variables does tend to produce more code than using local ones.

For example, from Atmel's "AVR4027: Tips and Tricks to Optimize Your C Code for 8-bit AVR Microcontrollers":

In most cases, the use of global variables is not recommended. Use local variables whenever possible. If a variable is used only in a function, then it should be declared inside the function as a local variable.

In theory, the choice of whether to declare a variable as a global or local variable should be decided by how it is used.

If a global variable is declared, a unique address in the SRAM will be assigned to this variable at program link time. Also accessing to a global variable will typically need extra bytes (usually two bytes for a 16 bits long address) to get its address.

Local variables are preferably assigned to a register or allocated to stack if supported when they are declared. As the function becomes active, the function's local variables become active as well. Once the function exits, the function's local variables can be removed.

They give this as an example:

#include <avr/io.h>
uint8_t global_1;
int main(void)
{
    global_1 = 0xAA;
    PORTB = global_1;
}

Program: 104 bytes (1.3% full)
(.text + .data + .bootloader)
Data: 1 byte (0.1% full)
(.data + .bss + .noinit)

#include <avr/io.h>
int main(void)
{
    uint8_t local_1;
    local_1 = 0xAA;
    PORTB = local_1;
}

Program: 84 bytes (1.0% full)
(.text + .data + .bootloader)
Data: 0 bytes (0.0% full)
(.data + .bss + .noinit)
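Along the lines of the app note's advice, one common pattern is to copy a global into a local, work on the local, and write it back once. This is a host-runnable sketch under assumptions: backpack_count, count_item_use() and the use_items_* functions are hypothetical names (the "backpack" counter is borrowed from the original post), and the comments about LDS/STS describe typical avr-gcc behaviour rather than a measured result.

```c
#include <stdint.h>

/* Hypothetical global, standing in for a counter in the game's "backpack". */
uint8_t backpack_count = 0;

/* Hypothetical per-item hook; it only counts how often it is called. */
unsigned item_use_calls = 0;
void count_item_use(uint8_t remaining) {
    (void)remaining;
    item_use_calls++;
}

/* Global version: on_use() is called through a pointer, so in general the
   compiler has to assume it might read or write backpack_count. That means
   storing the global before each call and reloading it afterwards (on AVR,
   typically an STS/LDS pair, 4 bytes per instruction). */
void use_items_global(uint8_t n, void (*on_use)(uint8_t)) {
    for (uint8_t i = 0; i < n && backpack_count > 0; i++) {
        backpack_count--;
        on_use(backpack_count);
    }
}

/* Local version: one load before the loop and one store after it; the
   working value can typically stay in a call-saved register. */
void use_items_local(uint8_t n, void (*on_use)(uint8_t)) {
    uint8_t count = backpack_count;  /* single load from SRAM */
    for (uint8_t i = 0; i < n && count > 0; i++) {
        count--;
        on_use(count);
    }
    backpack_count = count;          /* single store back to SRAM */
}
```

The local version is only equivalent when on_use() never reads or writes backpack_count itself; if a callback can modify the global, the cached copy goes stale and the two functions diverge.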

No idea if that's related to what you're finding or not.
It could be related to the SRAM operations: reading might be cheaper than writing or vice versa, or the compiler could be caching values or using specialised instructions for certain things.

I remember reading about that. I've actually taken that on board with my current game (my first one made poor use of global variables). In this case, though, the variable was global for good reason. It was just bizarre that adding it reduced code size :stuck_out_tongue:

Hrm, now I think about it, it could have been something to do with memory alignment or something.

If you found a way to replicate the issue it would be interesting to poke it with a stick to find out why it happens.

Here's one you can poke at. You'll have to revert the Arduboy2 library to version 3.0.0 using the Library Manager (Sketch > Include Library > Manage Libraries).

Compile the following:

#include <Arduboy2.h>

Arduboy2 arduboy;

void setup() {
  arduboy.begin();
  arduboy.display(CLEAR_BUFFER);
//  arduboy.display(CLEAR_BUFFER);
}

void loop() {
}

Then uncomment the second, commented-out arduboy.display(CLEAR_BUFFER); line.

The compiled sketch will now be 10 bytes smaller.

It works on 3.1.1 as well.

I'm assuming it must be some kind of weird code folding or caching going on within display().

If you pass false instead of true (CLEAR_BUFFER) as the argument, a single call results in 7630 bytes instead of 7670, and two calls result in 7634 bytes instead of 7660.

If I could find out how to get the assembly code, it would probably be clearer what was going on, but the fact that the phenomenon only happens with clear = true is interesting.

I am relieved that this can be reproduced so easily; I was worried I had mangled something in the structure of my game. But damn, 40 bytes is significant space. I could store 5 more item sprites in that sort of real estate.

I mentioned previously that I could give details on producing an assembly listing. Since you’re interested, here goes:

(Note that I use Linux, but the procedure should be similar for Windows or Mac. For Windows, there will be backslashes instead of forward slashes as separators in the paths, and paths will probably begin with a drive letter.)

  • In the Arduino IDE, select File > Preferences and select Show verbose output during: compilation, then click on OK.

  • Open and compile (verify) the sketch you’re interested in.

  • In the output window, look for a line that is running an avr-gcc command. It will start with something similar to:
    ".../hardware/tools/avr/bin/avr-gcc" -c -g -Os -Wall ...

  • This will give you the path to the bin directory that contains the compiler, linker and other tools. The path is everything up to and including bin, without the leading quote if present. Going forward, I'll refer to this as <binPath>.

  • In the <binPath> directory, there should be an executable named avr-objdump. This is the tool that can produce an assembly listing.

  • The next thing you need to determine is the “temporary working directory” that was used to build and compile the sketch. This will be given as the path at the end of the same avr-gcc command above. On my Linux system, it’s something like:
    /tmp/arduino_build_486332/

  • Open a command prompt window and switch to this working directory. In it, there should be a file named the same as your sketch and ending with a .ino.elf extension. This is the file that the assembly code listing is extracted from.

  • To produce an assembly listing, run avr-objdump -S on the .elf file and redirect its output to a file name of your choice:
    <binPath>/avr-objdump -S sketchname.ino.elf > assembly.txt

  • assembly.txt will contain the assembly code with the source code intermixed.

Note that because of the optimisation that the compiler and linker perform, the source code will often only vaguely correspond to the surrounding assembly code.

This is great. I saw you offered this to me on a different thread, but I forgot to take you up on the offer. Thanks so much for this.

Luckily I can skip the first few steps, because I know where all my Arduino stuff is and where the temp files get put.
It's the command-line stuff that I really needed.

Thanks for that, though. I've now made a batch file so I can just drag and drop files to disassemble them instead of worrying about command-line flags. Obviously it won't work on Linux, but if you run into any Windows users asking about disassembly, I've uploaded it to GitHub for all to use.

Did anyone else notice the bug in the left-hand column of example 4.4 of AVR4027? It passed quietly through gcc 5.4.0, even with -Wall -Wextra, but clang 3.6 caught it with -Wall.

Which bug?

There’s more than one?

If ad_result > 240, it outputs an uninitialized variable. The right-hand version isn't clearly correct, but at least it doesn't generate warnings from clang.
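The thread doesn't reproduce example 4.4, so what follows is only a hypothetical function of the same shape: an if/else-if chain over ad_result whose last condition is <= 240. Without a final else, a variable assigned only inside the branches is read uninitialised for 241 to 255, which is the kind of path clang typically reports under -Wsometimes-uninitialized. A final else closes the gap. The thresholds and output codes are assumptions, including the value chosen for the previously unhandled range, and choosing that value necessarily changes the function's behaviour for those inputs.

```c
#include <stdint.h>

/* Hypothetical classifier of the same shape as the app note example; the
   thresholds and return codes are invented for illustration. */
uint8_t classify(uint8_t ad_result) {
    uint8_t output;
    if (ad_result <= 60) {
        output = 1;
    } else if (ad_result <= 120) {
        output = 2;
    } else if (ad_result <= 240) {
        output = 3;
    } else {
        /* Without this branch, 'output' would be returned uninitialised
           for ad_result 241..255. The value 4 is an assumption. */
        output = 4;
    }
    return output;
}
```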

That’s a known issue.

The right-hand version doesn't have that issue because it behaves differently.

It’s easy to shrink code if you’re allowed to change the behavior :grinning:
