Bricked help needed [Solved-ish]


(Kevin) #19

They stick it at the end of RAM using a macro called RAMEND.

This kind of all came about from this thread almost 2 years ago:

And you can see the change they made here:

So from what I can tell we should be storing 0x7777 at RAMEND-1?

Or is it possible it is using location 0x0000? If we are actually using an older bootloader, this is where Arduino would look, and not at 0x0800?

// Backup ram value if its not a newer bootloader. // This should avoid memory corruption at least a bit, not fully

Are we in the “not fully” category? I swear we are using the newest bootloader from after this fix…


(Scott) #20

@bateske,
I did a bit more digging and discovered that RAM addresses don’t start at 0. RAM starts at address 0x100. Therefore, if the bootloader is storing a magic number at address 0x800 then this is really only 0x700 bytes into RAM space.

0x700 is 1792 in decimal. This would explain why Sirene is OK when compiled with Arduboy2 V3.0.0 because it only uses 1788 bytes of global variables. However, Sirene has problems when compiled with Arduboy2 V3.1.0 because it uses 1796 bytes of global variables, which when initialised would overwrite the contents of address offset 1792.

So, this is good evidence that the Arduboy bootloader is storing the magic number at RAM address 0x800.
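
The arithmetic above can be checked with a few lines of host-side C++. This is only a sketch under stated assumptions: the constants are the addresses quoted in this thread, the helper names are made up for illustration, and it assumes global variables are laid out contiguously from the start of RAM.

```cpp
#include <cstdint>

// Addresses as quoted in this thread for the Arduboy's ATmega32U4.
constexpr std::uint16_t kRamStart = 0x0100;     // first RAM address
constexpr std::uint16_t kMagicKeyPos = 0x0800;  // where the bootloader looks

// Offset of the magic number from the start of RAM (hypothetical helper).
constexpr std::uint16_t magicKeyOffset() {
  return kMagicKeyPos - kRamStart;
}

// True if `globalsBytes` of global variables, assumed to be placed
// contiguously from kRamStart, reach at least the first byte of the
// magic number location (hypothetical helper).
constexpr bool globalsOverlapMagicKey(std::uint16_t globalsBytes) {
  return kRamStart + globalsBytes > kMagicKeyPos;
}

static_assert(magicKeyOffset() == 1792, "0x700 is 1792 decimal");
static_assert(!globalsOverlapMagicKey(1788), "Sirene with Arduboy2 V3.0.0");
static_assert(globalsOverlapMagicKey(1796), "Sirene with Arduboy2 V3.1.0");
```

With 1796 bytes of globals the last variable ends at 0x803, so it covers both bytes of the magic number at 0x800 and 0x801.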


(Kevin) #21

So I went in and looked, yeah for sure:

volatile uint16_t *const bootKeyPtr = (volatile uint16_t *)0x0800;

This has never been changed in their bootloader code. So chalk this up to me not reading the source code of the bootloader before deploying it. I figured their most recent one would implement this.

So this is an issue that was open for 2 years with Arduino. The change was accepted and merged, but the follow-through into the bootloader never materialized. We clearly would only have been able to make this improvement on the second batch of units anyway, since this change happened after we shipped pre-orders.

So, if we change flashlight mode to write the magic number, this should resolve the issue? (At least to allow flashlight mode to always work.)


(Kevin) #22

So, long story short, reset button is your friend. Be gentle with it. It seriously almost didn’t make it into production; the original didn’t have one.


(Scott) #23

Hopefully, yes. I plan to work on this tomorrow. If things go well, there will be a new version of Arduboy2 available in a day or two.


(ET) #24

NICE! I’m loving this “find a problem, let’s work on a solution” approach. You all are great. Just wish I did more than break my device, lol, but I guess we all have to start somewhere.

In my first 6 days with Arduboy I have learned more about coding and writing programs than I would have ever imagined.

Yes, I use Linux, and yes, I have multiple Raspberry Pis, but I’ve never had an Arduino device, let alone coded in any language.
You guys and this lil device are all top notch - TY - I feel like I have a newfound hobby and it’s all thanks to Arduboy!

P.S. Good luck on the bootloader code, because I am WAY lost on that stuff still.


(Scott) #25

So I’ve spent some time looking at adding a save and restore of the bootloader magic number to flashlight mode, as discussed above. I’m confused with what I’m seeing.

Before actually modifying the library, I decided to write a sketch that reads the magic number, to confirm the location and value and make sure it can be read and written.

As I understand it, RAM address 0x800 is the magic number location, which should contain a value of 0x7777 unless it’s been overwritten by a sketch that uses a large amount of RAM. Newer versions of the bootloader and USB code have been modified to put the magic number in the last two bytes of RAM, at 0xAFE and 0xAFF. From what I’m seeing this is not the case.

Instead, at the end of RAM there is a value that changes depending on the sketch code. Based on my research, I believe this is the return address for a call to function main(). At address 0x800 I get either 0x0000 or the same value as at the end of RAM, depending on if the USB port is connected to the PC. Each time the Arduboy is attached to the PC the RAM end value will be copied to the magic number location. In no case have I ever seen either location be 0x7777.

Can anyone shed some light on what’s happening here?

The following is my latest test sketch. It just continually displays the values at 0x800 (MAGIC_KEY_POS) and the end of RAM (RAMEND - 1). You can press the A button or B button to change their values.

#include <Arduboy2.h>

Arduboy2 arduboy;

void setup() {
  arduboy.begin();
  arduboy.setFrameRate(15);
}

void loop() {
  if (!arduboy.nextFrame()) {
    return;
  }
  arduboy.pollButtons();

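  // A button: nudge the two bytes at the bootloader's magic number location.
  if (arduboy.justPressed(A_BUTTON)) {
    (*(uint8_t*) MAGIC_KEY_POS)++;
    (*(uint8_t*) (MAGIC_KEY_POS + 1))--;
  }

  // B button: nudge the last two bytes of RAM (RAMEND - 1 and RAMEND).
  if (arduboy.justPressed(B_BUTTON)) {
    (*(uint8_t*) (RAMEND - 1))--;
    (*(uint8_t*) RAMEND)++;
  }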

  showValues();
}

void showValues() {
  arduboy.clear();

  arduboy.print("MAGIC_KEY_POS: 0x");
  hexPrint4(MAGIC_KEY_POS);
  arduboy.print("\nValue: 0x");
  hexPrint4(*(uint16_t*) MAGIC_KEY_POS);

  arduboy.print("\n\nRAMEND - 1: 0x");
  hexPrint4(RAMEND - 1);
  arduboy.print("\nValue: 0x");
  hexPrint4(*(uint16_t*) (RAMEND - 1));

  arduboy.display();
}

// Print a 16-bit value as four hex digits, padding with leading zeros.
void hexPrint4(uint16_t val) {
  if (val < 0x1000) {
    arduboy.print('0');
  }
  if (val < 0x100) {
    arduboy.print('0');
  }
  if (val < 0x10) {
    arduboy.print('0');
  }

  arduboy.print(val, HEX);
}

I haven’t found a case where I’ve actually “bricked” the Arduboy by changing the values.

The code I’ve found so far that defines, tests or manipulates the magic number is in files Caterina.c, USBCore.h, USBCore.cpp and CDC.cpp.

I’ll continue to look at this tomorrow, but any help would be appreciated.

My plan is to temporarily save whatever is at the magic number location when flashlight mode is entered, replace it with the magic number, then restore the saved value before exiting flashlight mode. To do this I need to know the magic number location (which appears to be 0x800) and the value to put there (which should be 0x7777 but doesn’t appear to be), with the intent that the bootloader will see the proper magic number and remain active to allow sketch upload.


(Kevin) #26

Well, the auto reset is triggered when the host reinitializes the serial connection?

Does the USB stack then set the magic number before doing the reset, or does it just reset trusting that the magic number is already entered?


(Scott) #27

If it was set just before doing the reset, then we wouldn’t have any problems, would we? I’m assuming the issue is that the sketch is overwriting the magic number due to the fact that the number is hard coded to a location in an area that the sketch is using for a variable. If the magic number were set to the proper value just before a reset, then it wouldn’t matter that the sketch had changed it.

I’m just starting to trace through the bootloader and USB code to try and gain an understanding of exactly how the magic number is intended to work. I should have a more precise answer in a few hours.


(Scott) #28

ARRRGH! :tired_face: I now know more than I would have liked to about the bootloader magic number, and the reason that Sirene won’t allow a new sketch to be loaded using flashlight mode. For all the gory details, read on…

It turns out that I was incorrect in answering @bateske’s question in my previous post. The USB code does indeed set the magic number just before initiating the sequence to enter the bootloader. The problem is that the bootloader isn’t started immediately after the magic number is set.

The steps to enter the bootloader, and the reason for needing a magic number, are as follows:

  1. Address 0x800 in RAM is where the bootloader looks for the magic number but it isn’t touched until a PC connected via USB needs to use the bootloader. Until then, the magic number location is free for use by the sketch for any purpose. Note that this location is hard coded in the bootloader, and the bootloader can’t be changed without a hardware programmer connected to the Arduboy circuit board.
  2. The USB interface determines that the PC wants to enter the bootloader. (The details on how aren’t important.)
  3. The USB code saves the current value stored at the magic number location. This is because there’s a possibility that the detection is a false alarm. If the sketch was using this location, we want to be able to restore the value when the false alarm is realised later in the sequence. (What constitutes a false alarm and when/how the saved value is restored is not important.)
  4. The USB code sets the magic number value, 0x7777, at RAM address 0x800 but it doesn’t go directly to the bootloader at this time. Instead, it sets the “watchdog timer” for a 120 millisecond timeout. This timeout is to allow time for any USB interface cleanup that needs to occur before entering the bootloader.
  5. When the watchdog timer times out, it generates an interrupt which starts the bootloader.
  6. The bootloader looks to see if the magic number location contains 0x7777. If so, the bootloader enters command mode and processes the commands from the PC to upload a new sketch or perform other actions.
  7. If the magic number is not 0x7777 it assumes that the watchdog timer expired for some other reason; possibly because the sketch was actually using it for its intended purpose. In this case, the bootloader just restarts the sketch instead of entering command mode.
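
The sequence above can be modelled on a PC to make the timing hazard easier to see. The following is a minimal host-side simulation, not the Caterina or Arduino core source: every name in it (Device, requestBootloader, runWatchdogWindow, bootloaderCheck, simulateUpload) is hypothetical, and the 120 ms window is approximated as 120 calls to a callback standing in for whatever the sketch and its interrupt routines do.

```cpp
#include <cstdint>
#include <functional>

constexpr std::uint16_t kMagicKey = 0x7777;

enum class BootResult { CommandMode, RestartSketch };

// Hypothetical stand-in for the device state relevant to the handshake.
struct Device {
  std::uint16_t magicSlot = 0;   // models the 16-bit word at RAM 0x800
  std::uint16_t savedValue = 0;  // value stashed by the USB code
};

// The USB code saves the current value, then writes the magic number.
void requestBootloader(Device& dev) {
  dev.savedValue = dev.magicSlot;
  dev.magicSlot = kMagicKey;
}

// The watchdog window: the sketch keeps running. `sketchActivity` is
// called once per simulated millisecond and may modify the slot.
void runWatchdogWindow(Device& dev,
                       const std::function<void(Device&)>& sketchActivity) {
  for (int ms = 0; ms < 120; ++ms) {
    sketchActivity(dev);
  }
}

// The bootloader's decision once the watchdog has expired.
BootResult bootloaderCheck(const Device& dev) {
  return dev.magicSlot == kMagicKey ? BootResult::CommandMode
                                    : BootResult::RestartSketch;
}

// Runs the whole sequence in order.
BootResult simulateUpload(Device& dev,
                          const std::function<void(Device&)>& sketchActivity) {
  requestBootloader(dev);
  runWatchdogWindow(dev, sketchActivity);
  return bootloaderCheck(dev);
}
```

Passing a callback that leaves the slot alone yields CommandMode, while a callback that increments magicSlot (as a millisecond counter living at that address would) yields RestartSketch.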

The problem with this sequence is that for the 120ms window while the watchdog timer is counting down, the sketch continues to execute as normal. If during that time the sketch happens to be using the magic number location, and overwrites it with something other than 0x7777, then when the timer expires and the bootloader starts it will just restart the sketch instead of processing commands from the PC.

Normally, using flashlight mode will put the sketch in a tight loop, which prevents the magic number location from being overwritten, thus allowing the bootloader to enter command mode when the watchdog expires.

However, there’s an exception. Flashlight mode doesn’t stop interrupt service routines from running. If an interrupt occurs during the 120ms watchdog window, and its service routine has a variable located at the magic number address, and it changes that variable so the magic number location becomes something other than 0x7777, then the bootloader will just restart the sketch when the watchdog expires.

Although the chances of this happening are very slim, this is what happens with Sirene when compiled with version 3.1.0 of the Arduboy2 library. A variable used by the interrupt service routine that counts milliseconds just happened to end up at the magic number location, so it changes many times during the 120ms watchdog window. :cry:

So is there something that can be done to alleviate or reduce the problem? That’s my job for tomorrow. It may be possible to disable most interrupts while in flashlight mode (but not the watchdog timer interrupt). Otherwise, writing 0x7777 to the magic number location rapidly and continuously may reverse any changes made by interrupt service routines most of the time.

To be continued…


(Kevin) #29

Yeah, just cli(); should work?

Excellent work.

Oh by the way, is it for sure that it’s always checking 0x0800? Did they not actually implement the RAMEND-1 thing?


(Scott) #30

No, because that would disable all interrupts, including the watchdog interrupt, so you would never enter the bootloader. Interrupts are also used for the timers used for the delay(), delayMicroseconds(), millis(), etc. functions. If any of the USB cleanup or other background stuff that executes in the 120ms window requires interrupts in order to work properly, then we can’t disable those interrupts.

To play it safe, I think I’ll try leaving interrupts alone and just hammer a 0x7777 to the magic number location as fast as possible. Hopefully this will replace any overwrites done by interrupts often enough that the bootloader will be entered a very high percentage of the time. If it doesn’t, you just try again. Holding the UP button down while trying to upload in flashlight mode would possibly help with this. We’ll see after I’ve experimented a bit.
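
The "hammer" idea amounts to rewriting the magic number in a tight loop. Below is a rough sketch of that idea, not the Arduboy2 library's actual code. The function name hammerMagicKey and its parameters are hypothetical, and the slot is passed as a pointer so the same code can be tried on a PC. On the device, the pointer would presumably be `(volatile uint16_t*)0x0800` and the loop would run forever instead of for a fixed number of passes.

```cpp
#include <cstdint>

constexpr std::uint16_t kMagicKey = 0x7777;

// Rewrites the magic number `passes` times. `volatile` keeps the compiler
// from collapsing the repeated stores into a single write.
// Caveat: on an 8-bit AVR a 16-bit store takes two byte writes, so an
// interrupt can still land between them; the next pass repairs the slot.
void hammerMagicKey(volatile std::uint16_t* keySlot, std::uint32_t passes) {
  for (std::uint32_t i = 0; i < passes; ++i) {
    *keySlot = kMagicKey;
  }
}
```

This does not close the window completely. It only shrinks the time during which an overwritten value can survive until the watchdog expires.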

Yes, the standard Caterina/Leonardo bootloader, which the Arduboy uses (at least mine does), checks location 0x800. It hasn’t been updated to use RAMEND-1. The USB code included in every sketch, which invokes the bootloader, has been updated to support the magic number at either 0x800 or RAMEND-1.

To change the bootloader you would need to change the magic number location to RAMEND-1 and also put a “LUFA class bootloader” signature at the proper location in the code (I believe it’s 0xDCFB at the end of program memory). I’m not sure if there are any other requirements to actually make it a LUFA class bootloader.

The commit that added the changes to the USB code indicates that NicoHood’s Hoodloader2 bootloader does this.


(Kevin) #31

Making these changes will probably increase the library size, pushing some games over the limit again, huh?

Thanks for the help by the way it’s awesome!


(Scott) #32

I suspect that there are very few games that are that close to the limit.


(Kevin) #33

Fair enough, but my suggestion still stands.


(Scott) #34

I’m going to make the safeMode() function public and modify it with the same fix as flashlight(). Any sketch that goes over the program size limit due to increases in the size of flashlight() should be able to use safeMode() instead, to get the size back down. (I don’t know what the savings will be yet.)


(Josh Goebel) #35

Just to add a comment really fast: the watchdog reset will ALWAYS happen (unless code in between alters the watchdog sequence itself). The question then becomes whether, at the time the watchdog triggers, the memory location in question contains the magic number or not. Setting it manually inside a tight loop wouldn’t completely eliminate the problem, but it would likely make it 1000 times less likely to happen.

Timer fires every 1ms and we can run 16,000 instructions in that same amount of time. Of course it would be easier to just stop timer1 (or is it 0) for flashlight mode and turn it back on after exiting flashlight mode. The core Arduino stuff already has methods to power down the timer signals (which would accomplish the same thing IIRC).
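
For a rough sense of scale, the numbers can be worked out as compile-time constants. This is a back-of-the-envelope sketch that assumes a 16 MHz clock, roughly one instruction per clock cycle, and the Arduino core's default timer 0 setup (prescaler of 64 on an 8-bit counter). With those defaults the overflow interrupt fires every 1.024 ms rather than exactly 1 ms. The helper names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kCpuHz = 16000000;      // assumed 16 MHz clock
constexpr std::uint32_t kTimer0Prescaler = 64;  // assumed Arduino core default
constexpr std::uint32_t kTimer0Counts = 256;    // 8-bit timer overflows at 256
constexpr std::uint32_t kWatchdogWindowMs = 120;

// Clock cycles available in one millisecond.
constexpr std::uint32_t cyclesPerMillisecond() {
  return kCpuHz / 1000;
}

// Clock cycles between two timer 0 overflow interrupts.
constexpr std::uint32_t cyclesPerTimer0Overflow() {
  return kTimer0Prescaler * kTimer0Counts;
}

// Whole timer 0 overflows that fit in the watchdog window.
constexpr std::uint32_t overflowsInWatchdogWindow() {
  return (kWatchdogWindowMs * cyclesPerMillisecond()) /
         cyclesPerTimer0Overflow();
}

static_assert(cyclesPerMillisecond() == 16000, "16,000 cycles per ms");
static_assert(cyclesPerTimer0Overflow() == 16384, "1.024 ms per overflow");
static_assert(overflowsInWatchdogWindow() == 117, "about 117 per window");
```

So the timer interrupt gets on the order of a hundred chances to touch the location during the window, with thousands of cycles between each one for a tight loop to put the magic number back.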


(Scott) #36

I think doing this is good enough. In my testing, I haven’t had a problem since adding code to just hammer the magic number in flashlight(). I’ve verified that the timer overflow counter is located at 0x800 during these tests and that the sketch still bricks if I don’t use flashlight mode.

Since the chances of an ISR variable overlapping the magic number are going to be extremely slim, I don’t think adding extra code to shut down the timer is necessary. And I don’t feel comfortable messing with timers and interrupts while waiting for the watchdog to kick in. You never know what background code may need them, now or in the future.


(Josh Goebel) #37

None if we control the full “boot” sequence and this is considered pre-boot functionality.

Really that should only happen if reset has been triggered though, no? Shouldn’t this be PRESERVING the magic number, not SETTING it if it’s not already set?


(Josh Goebel) #38

Correct, because the top of memory is the top of the stack. When main is called the location to return to is pushed onto the stack just like with any other call… I suggested using RAMEND long ago because in Arduino sketches main will never return since loop is called infinitely - hence you have 2 bytes of RAM that will never be used for anything useful and a perfect place to store a magic flag like this.