Image corruption when using drawPlusMask() in some cases [register allocation issue]

I was merely experimenting to see what effect it would have.
It is typically easier to figure out why something works than why something doesn’t work.

Also I’m pretty confident the two test cases we have at the moment are suitable.


I’ve found something that seems to work, but I don’t know why.

I changed [buffer_ofs_jump] and [sprite_ofs_jump] to the "l" constraint (lower register). It very slightly increased the code size, but the code ran and worked properly and it seems to scale. But then it broke in the ‘box’ example.

Even more oddly, if I make just one of them a lower register I get a warning about an “undefined combination of operands” coming from an .s file in the temp directory, which rather annoyingly appears to get deleted not long after. I get the same message by removing the “r24”. The sketch doesn’t work when this occurs.

Several other changes (such as removing [y_count], which doesn’t appear to be used in the assembly code) result in this same error but I have no clue as to what’s causing it. I suspect there’s a way to stop the compiler deleting that file or turn the warning into an error though.

Doesn’t work here. I get a different kind of corruption. But I’m using the code with less register pressure to begin with… something is funny here and if you change the requirements enough you’ll get the compiler to make the right choice for your samples, but that doesn’t mean we’ve really truly fixed anything. By forcing “l” you’re probably just increasing the pressure further for your sample cases and the choices the compiler is making for registers somehow fix things.

It’s possible disassembling the compiled output would help, but I’m really not sure I have the time to dig into this that deeply. If never inlining solves this (for all cases) then I think perhaps we just make the core library never inline.

The temp directory should stick around quite a while if Arduino is still open. Just find it’s path and go there. At least that’s been my experience.

Well in some cases you can just read the code and find the bug… or at least make educated guesses… but for me it usually starts with an idea of why it doesn’t work… otherwise I’m just randomly changing stuff and crossing my fingers. But one by one my ideas are getting cancelled out on this one. :frowning:

The code I’m working with right now… with extra push/pop, less register use, etc. (test branch)

I’m beginning to think this is an issue with the compiler’s register allocation heuristic.

The double “l” constraint is the only thing other than noinline that I’ve found to work, but I’m beginning to distrust some of the results I’m getting. Sometimes the exact same code produces different results. I suspect this is due to some sort of caching mechanism or multiple IDE instances interfering with each other.

So far it’s the only thing I’ve tried that doesn’t seem to break. The original code works fine as long as it’s not inlined, there’s something about inlining it that corrupts it (and I suspect that ‘something’ is the compiler picking the wrong registers).

The directory does, that particular file doesn’t.
The name is obviously a hash, so it’s probably being generated by Windows’ ‘get temp directory’ function and being deleted part way through the build process.

Precisely.
Normally I don’t resort to throwing things at the wall to see what sticks, but I’m just not seeing any patterns here. Everything I’ve tried has been in an attempt to isolate the problem, but the only pattern seems to be ‘the compiler can’t find the right registers’.


If I get time I’ll have another go tomorrow, but I do have other things to be doing.

Frankly my faith in finding a solution isn’t very high.
I think the only way to find a solution is going to be to look at the generated assembly and see where it’s differing, but even then I suspect any solution is just going to be fighting the compiler’s register choices.

I think maybe the best thing for now would be to run a few more tests on the noinline solution and if nothing breaks then use that as the official solution. Then maybe come back to looking for a better solution further down the line when someone’s got the time to sit down and work through the code. At the end of the day a less than ideal but still functional solution is better than leaving it broken until a better solution is found.

I found the problem. More to follow when I figure out what the fuck is going on. :slight_smile:

Latest on my test branch. Basically the fix was to declare r28 and r29 as clobbered (as they are). The compiler (in my disassembly) would sometimes decide to use Y for it’s own evil purposes (for other of our variables)… so Y was being “double used” depending on how it was compiled, which could produce all sorts of insane effects.

I was doing the right thing to PUSH/POP Y but I wasn’t taking into account the fact that the compiler might use it for my OTHER variables… I thought that the fact it wouldn’t give it to me DIRECTLY meant it wouldn’t give it to me indirectly.

After adding r28 and r29 the original sample now works perfectly. I haven’t tried any others.

@Pharap If you can confirm the fix I can make a clean PR that fixes this and also a few other of the small issues we’ve found while looking into this.

It seems that this issue can also manifest itself to such a degree that the resulting hex file is corrupted bad enough to brick the device. Poor @filmote seems to run into this situation as all he did to go from a functioning flash to a bricked unit was make some modifications to the graphics.

I did upgrade Arduino 1.8.3 to 1.8.4 … not sure if they are related as I had some good program uploads before the units froze. Yes that is units, plural.

I wonder it if is this problem. I added a new drawOverwrite of a large image (128 x 48) then tried to add some animations using 18 x 20 images using drawExternalMask. I am not sure I understand what the issue described above is but could it be related?

Your ‘fix’ breaks on the box example:

I:\Documents\Arduino\libraries\Arduboy2\src\Sprites.cpp: In function 'drawBitmap':

I:\Documents\Arduino\libraries\Arduboy2\src\Sprites.cpp:368:1: error: r28 cannot be used in asm here

 }

 ^

I:\Documents\Arduino\libraries\Arduboy2\src\Sprites.cpp:368:1: error: r29 cannot be used in asm here

I:\Documents\Arduino\libraries\Arduboy2\src\Sprites.cpp:368:1: error: r28 cannot be used in asm here

I:\Documents\Arduino\libraries\Arduboy2\src\Sprites.cpp:368:1: error: r29 cannot be used in asm here

lto-wrapper: I:\AppData\Local\Arduino15\packages\arduino\tools\avr-gcc\4.9.2-atmel3.5.4-arduino2/bin/avr-gcc returned 1 exit status

i:/appdata/local/arduino15/packages/arduino/tools/avr-gcc/4.9.2-atmel3.5.4-arduino2/bin/../lib/gcc/avr/4.9.2/../../../../avr/bin/ld.exe: error: lto-wrapper failed

collect2.exe: error: ld returned 1 exit status

exit status 1
Error compiling for board Arduboy.

As a disclaimer: I’m still using IDE version 1.8.1.

I’ll try upgrading later to see if that’s why it worked for you and not me, but I did try clobbering r28 and r29 once before and ran into the exact same problem.

Which example is the “Box” example again? And is this a case of inlining vs not inlining you think? If we can only reserve Y when non-inlined then we just need to tell the compiler that and be done with this.

Also were you trying with my other changes or just the r28, r29 change? Freeing up the other 3 registers could make a difference here.

“Questor” is the original case with the skeleton, made by oldmantom.

“Box” is the example I posted that demonstrates the other issue that crops up as a side effect of some of the solutions to the “Questor” example:

The first time I tried was a few days ago before you arrived, the time just now (from which I grabbed that error message) was using the test branch on your github that you linked to earlier.

I managed to get the code into the state where it could draw this:

By changing [buffer_ofs] from "x" to "w" and removing clobbers r26 through to r31.

Leaving r28 and r29 in fixes it completely for the “questor” case but breaks it for the “box” case.

I also removed the push and pop pairs for the X register.
(This change means it’s no longer being specifically used, so saving it like this is no longer necessary.)

I took all this, forked your code and committed to my fork:

https://github.com/Pharap/Arduboy2/commits/test

In case you want to give it a run or can see something that helps explain why this causes that particular change.


I’ve also updated the “box” testcase to help study the behaviour of Sprites::drawPlusMask and not just its side effects on the other functions:

#include <Arduboy2.h>
Arduboy2 arduboy;

const unsigned char PROGMEM image[] =
{
    8, 8,
    0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF
};

const unsigned char PROGMEM imagePlus[] =
{
    8, 8,
    0xFF, 0xFF,
    0x81, 0x81,
    0x81, 0x81,
    0x81, 0x81,
    0x81, 0x81,
    0x81, 0x81,
    0x81, 0x81,
    0xFF, 0xFF
};

void setup()
{
	arduboy.begin();
}

void loop()
{
	if (!(arduboy.nextFrame()))
		return;
    
	arduboy.clear();
	
	Sprites::drawSelfMasked(8, 8, image, 0);
	Sprites::drawSelfMasked(8, 24, image, 0);
	
	Sprites::drawOverwrite(24, 8, image, 0);
	Sprites::drawOverwrite(24, 24, image, 0);
	
	Sprites::drawPlusMask(40, 8, imagePlus, 0);	
	Sprites::drawPlusMask(40, 24, imagePlus, 0);
	
	Sprites::drawErase(64, 8, image, 0);
	Sprites::drawErase(64, 24, image, 0);
	
	arduboy.display();
}

This should display as 6 identical boxes lined up neatly in a 3x2 formation (assuming I’m understanding the ‘plus mask’ format correctly - 1 byte of image followed by 1 byte of mask, making the image and mask interlaced).

I had purposely removed it from the input to ease register pressure. :slight_smile: The less registers going in the easier job the compiler has to allocate them. I would have written some nicer assembly before it ended up in a PR.

I reverted that an hour later when I realised the mistake, but forgot to update the link. I’ll update it now, but here it is again to make sure:

https://github.com/Pharap/Arduboy2/commits/test

Also don’t worry, I’m not attempting to PR this yet or anything, I just forked your copy it’s easier for you to access the code, to see what changes I’ve made and to be sure that I’m basing my changes on your copy…

Try this:

https://github.com/MLXXXp/Arduboy2/pull/15

I left the

        adiw r28, 63 
        adiw r28, 63
        adiw r28, 2

in.

It’s just as many bytes as loading a temp variable (you can’t use tmp_reg because you can’t LDI with it, and doesn’t requite me to find a temp register (you could use xi and front-load the reset)… or you could push/pop a temp register… but then you’d be throwing again more bytes. So unless someone knows a smaller way to add 128, I think this stays.

Like I said, I put it back in no less than an hour later when I realised there was a side effect I hadn’t spotted beforehand.


I can confirm that branch compiles and works successfully.
The register juggling approach has proven successful.
Now to await @MLXXXp’s approval.


Although you don’t need to get rid of the 3 adiws, I decided to see if it was possible for the sake of curiosity.

I discovered that there is in fact a way you could get rid of them:

If you loaded [buffer_ofs_jump] with just (WIDTH) (128), you could subtract that from Y and then use [x_count] which is set to (rendered_width) to subtract from [buffer_ofs_jump], making it equivalent to the value that was originally being fed in.

i.e.
[buffer_ofs_jump] "a" (WIDTH-rendered_width)
becomes
[buffer_ofs_jump] "a" (WIDTH)

and

"push r28\n" // save Y
"push r29\n"
"movw r28, %[buffer_ofs]\n" // Y = buffer_ofs_2
"adiw r28, 63\n" // buffer_ofs_2 = buffer_ofs + 128
"adiw r28, 63\n"
"adiw r28, 2\n"

becomes

"push r28\n" // save Y
"push r29\n"
"movw r28, %[buffer_ofs]\n" // Y = buffer_ofs_2
"add r28, %A[buffer_ofs_jump]\n"
"adc r29, %B[buffer_ofs_jump]\n"
"sub %[buffer_ofs_jump], %[x_count]\n"

It doesn’t seem to save any bytes though.

Of course it relies on WIDTH being 128 forevermore, so it’s possibly not an ideal solution. (Unless of course the reason 128 is being subtracted from Y is because that’s the width of the screen, in which case it’s probably a better solution because it will scale.

I’ve applied @Dreamer3’s pull request to the Arduboy2 master branch but I haven’t done a release yet. I’d like to do some regression testing on as many sketches that use Sprites as possible.

@oldmantom, Are you still working on your game? If so, could you use the latest Arduboy2 library with @Dreamer3’s fix and let us know if there are any problems? You’ll have to manually install the library from a clone or downloaded zip. Or, you could just replace Sprites.cpp.

1 Like

My only potential area of concern would be some sketch that makes the compiler juggle registers differently and run out of them and therefore refuse to compile because of excessive register pressure.

Well, a compiler error would be better than getting a clean compile but then having the sketch crash or run strangely. But that’s why I’d like try to do some regression testing on existing stable games.

@MLXXXp I am still working on the game and should be able to try compiling it with new library one evening this week.

It is quite something to see everyone come together to resolve the issue - even if I can’t make much sense of the technical details of the problem!

1 Like

It’s a great community.

1 Like