drawPlusMask artifacts, frequent bricking after upload

(Thomas) #24

In the call of .drawExternalmask it doesn’t to my knowledge use the assembly code in drawbitmap

  204     case SPRITE_MASKED:
  205       uint8_t *mask_ofs;
  206       mask_ofs = (uint8_t *)mask + (start_h * w) + xOffset;
  207       for (uint8_t a = 0; a < loop_h; a++) {
  208         for (uint8_t iCol = 0; iCol < rendered_width; iCol++) {
  209           // NOTE: you might think in the yOffset==0 case that this results
  210           // in more effort, but in all my testing the compiler was forcing
  211           // 16-bit math to happen here anyways, so this isn't actually
  212           // compiling to more code than it otherwise would. If the offset
  213           // is 0 the high part of the word will just never be used.
  214 
  215           // load data and bit shift
  216           // mask needs to be bit flipped
  217           mask_data = ~(pgm_read_byte(mask_ofs) * mul_amt);
  218           bitmap_data = pgm_read_byte(bofs) * mul_amt;
  219 
  220           if (sRow >= 0) {
  221             data = Arduboy2Base::sBuffer[ofs];
  222             data &= (uint8_t)(mask_data);
  223             data |= (uint8_t)(bitmap_data);
  224             Arduboy2Base::sBuffer[ofs] = data;
  225           }
  226           if (yOffset != 0 && sRow < 7) {
  227             data = Arduboy2Base::sBuffer[ofs + WIDTH];
  228             data &= (*((unsigned char *) (&mask_data) + 1));
  229             data |= (*((unsigned char *) (&bitmap_data) + 1));
  230             Arduboy2Base::sBuffer[ofs + WIDTH] = data;
  231           }
  232           ofs++;
  233           mask_ofs++;
  234           bofs++;
  235         }
  236         sRow++;
  237         bofs += w - rendered_width;
  238         mask_ofs += w - rendered_width;
  239         ofs += WIDTH - rendered_width;
  240       }
  241       break;
  242 
0 Likes

(Andrew Livesay) #25

Ah, okay. Do you know of a tool that will generate for use with drawExternalMask?

1 Like

(Thomas) #26

I think @pharap Has one

1 Like

#27

I’m definetly not seeing that here. Interesting. I’ll have a look at the code.

2 Likes

(Simon) #28

You can use the TeamARG tool to create masks. Simply remove the first two bytes (the dimensions) from the output.

1 Like

(Pharap) #29

One of these:

Though if you used something to generate a plus-mask sprite the same tool probably has a setting for separating the mask.

I second @Mr.Blinky about not seeing it.

That said, from the looks of the commit history v4.1.0 might not have the fix.
Try getting a copy of the Arduboy2 library straight from the master branch (using download zip) and then dumping the contents of the zip in your Arduino/libraries folder (replacing whatever copy of Arduboy2 you had installed).

Unfortunately it’s all part of the same function, but there’s some odd rules about inlining that mean the assembly code might not be used depending on the circumstances.

1 Like

#30

@Pharap he used 4.1.0 I looked at the reversed dissassembly I made from the .elf and found the bug

;--case SPRITE_PLUS_MASK:-------------------------------------------------------
     b6e:	cf 93       	push	r28         
     b70:	df 93       	push	r29
     b72:	eb 01       	movw	r28, r22    ;buffer_page2_ofs

00000baa <next_loop_y>:
     baa:	2a 95       	dec	r18             ;yi
     bac:	51 f0       	breq	.+20     	; 0xbc2 <finished>
     
     bae:	99 2d       	mov	r25, r9         ;xi = x_count
     bb0:	33 95       	inc	r19             ;sRow
     bb2:	11 24       	eor	r1, r1
     bb4:	ec 0f       	add	r30, r28        ;!!!spriteofs + lsb(buffer_ofs_page_2) instead of sprite_ofs_jump !!!
     bb6:	f1 1d       	adc	r31, r1
     bb8:	a0 0f       	add	r26, r16        ;buffer_ofs += buffer_ofs_jump 
     bba:	b1 1d       	adc	r27, r1
     bbc:	c0 0f       	add	r28, r16        ;buffer_ofs_page_2 += buffer_ofs_jump
     bbe:	d1 1d       	adc	r29, r1
     bc0:	d9 cf       	rjmp	.-78     	; 0xb74 <loop_x>

The problem is that the r28 and r29 registers (which make up the Y register) are not declared as clobbers and the Y register is not declared either. so the compiler thinks it can use r28 and r29 in the other C code of that function. So it decides to put sprite_ofs_jump in r28 (see line bb4:). But the assembly code at the beginning overwrites it by putting buffer_ofs_page_2 into r28,r29 (see line b72:)

Fixing takes a bit more then just clobbering R28,r29 or declaring Y as pointer as the compiler generates an error when doing so. probably because it runs out of registers pointer registers.

2 Likes

(Thomas) #31

And thus the Assembly Rabbit hole has opened

1 Like

(Pharap) #32

Yeah, the last attempt to fix was this:

But that’s only in the master branch, it hasn’t made it into a release yet.

The easiest thing would be to just swap to a C++ code version, but I know for a fact that increases the code size substantially so it would break games that are desperately near the progmem limit, so that’s a bad idea too.

I’ve got a feeling that __attribute__ ((noinline)) might fix it since that worked before the assembly rewrite.

Otherwise there’s going to have to be some more register juggling put in place.

0 Likes

(Scott) #33

Yes, I haven’t had time to test the fix thoroughly and (until now) nobody else has complained about problems.

@alivesay, Could you please test some code that causes the artifacts using the unreleased Arduboy2 library on the head on GitHub?

0 Likes

#34

And I’ve added a few changes for multiple display and resolution support. I’ll have a go at the sprite plus mask assembly code.

1 Like

(Andrew Livesay) #35

Sorry for the late reply – I hit my new user first day post limit.

I am able to use drawExternalMask() without artifacts.

@Pharap FWIW I did try the attribute((noinline)) fix on 4.1.0 but it didn’t fix the issue with drawPlusMask().

@MLXXXp Will test that ASAP.

2 Likes

(Scott) #36

That makes sense, since drawPlusMask() is the only one that uses assembly code.

0 Likes

(Andrew Livesay) #37

I tested the latest git version of Arduboy2 using my old code that uses drawPlusMask and the artifacts do still appear.

Thanks everyone! Let me know if I can do any more testing.

1 Like

(Scott) #38

That’s strange. I tried your latest code on GitHub (commit number 6739e84563823fb08675b7a88d2325e562404329).

With the latest release, 4.1.0, of the Arduboy2 library, I get the artifacts as you described. Using the latest Arduboy2 library from GitHub, there are no artifacts.

Here are the details of my testing:

  • Tested with both a real Arduboy and my breadboarded system based on an Arduino Micro.
  • Sketch program and global variable sizes given below are with Board: Arduboy selected.
  • Arduino IDE version 1.8.4
  • Arduino AVR Boards version 1.6.20
    (as displayed using Tools > Board: > Boards Manager...)
  • To be safe, flashlight mode was used for all uploads.
  • With the 4.1.0 Arduboy2 release (installed using the Arduino Library Manager):
    Sketch uses 11776 bytes (41%) of program storage space.
    Global variables use 1888 bytes (73%) of dynamic memory
  • With the latest Arduboy2 code from the GitHub master branch
    (commit number 6360eb2b37502538ebde4e58e6ccb840b0a5f463):
    Sketch uses 11792 bytes (41%) of program storage space.
    Global variables use 1888 bytes (73%) of dynamic memory
0 Likes

#39

And the differences are in the sprite draw mask assembly bit. buffer_page2_ofs is not passed on through registers but calculated in assembly by pasing on WIDTH as an immediate value, saving register(s). also the register declaration is different. But the bug is still there. The compiler may decide to use r28, r29 to pass on values to the assembly in other situations.

0 Likes

(Scott) #40

I haven’t had much to do with the assembly code for drawPlusMask(). I’m not sure if @Dreamer3 (yyyc514 on GitHub) did the original code but he’s the one who worked out the latest changes.

@Mr.Blinky, If you think there’s still problems and have any fixes, please let us know.

0 Likes

(Andrew Livesay) #41

@MLXXXp Thanks again for testing. I’m guessing I didn’t download the zip from github master correctly, or didn’t get the IDE to pick it up.

0 Likes

(Josh Goebel) #42

AFAIK my fix completely resolves the r28/r29 issue as it forces the use of lower registers by all the incoming/outgoing variables… then we steal r28/r29 for ourselves by pushing/popping it… but always possible there is some OTHER problem. If someone is using the PATCHED library and still seeing an issue it’d be helpful to know for sure.

3 Likes

(Andrew Livesay) #43

I just tried testing again with the latest github version of Arduboy2 and this version does seem to work correctly.

1 Like