Brainstorm how to prevent issues with huge RAM usage

Hi,

I want to be able to use the maximum amount of RAM and still upload sketches without entering flashlight mode. So I browsed through the Arduino CDC code and came up with the following idea.
In the Arduboy library we could place the following code (being Dr. Evil… muahahahahaaa :smiling_imp:):

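#include <Arduino.h>  // assumed to pull in USBSetup (USBAPI.h) and WDTCSR/WDE

// Provided by the linker when building with -Wl,--wrap=CDC_Setup:
// __real_CDC_Setup is the core's original implementation.
bool __real_CDC_Setup(USBSetup& setup);

// Calls to CDC_Setup are redirected here by the linker.
bool __wrap_CDC_Setup(USBSetup& setup)
{
        bool ret = __real_CDC_Setup(setup);
        // If the core has just enabled the watchdog (WDE is bit 3 of WDTCSR),
        // spin here until the watchdog resets the board.
        if (WDTCSR & _BV(WDE))
                for (;;)
                        ;
        return ret;
}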

To the linker flags we add the following:

-Wl,--wrap=CDC_Setup

The idea is to have the linker wrap this function and replace it with our own. Inside our version we check whether the watchdog has been enabled. If so, we just loop and wait for the watchdog to reset the board.

Any brave people here to try it out?

If you want to try it out, make sure you are able to unbrick the board with an external programmer!

Just in case the idea is too stupid.

  1. How do you plan on adding those flags?
  2. I haven’t taken the time to learn exactly how high RAM usage causes the upload problem, but I was under the impression that the sketch clobbers a flag at a certain location which the bootloader is relying on. Are you sure that whatever the problem is will affect the watchdog?

I just tried it but it has no effect.

According to the code, the CDC_Setup function writes a magic number to 0x800 and enables the watchdog (with a 120 ms timeout) when the serial port is opened at 1200 baud. This should be enough time to handle all the USB traffic until the serial port is closed. Then the board resets, and the bootloader sees the magic value at 0x800 (I think) and waits to receive a new sketch.

My idea was that whenever this function enables the watchdog, we could just wait until it fires, so the current sketch has no chance to overwrite the magic value.
While writing this, I realize it could lead to issues with the CDC code… hmmm
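The handshake and the wrapper idea can be modelled on a PC to see why heavy RAM use breaks uploads. This is a minimal sketch under stated assumptions, not the core's code: the key value 0x7777 is assumed, the running sketch is reduced to a loop that fills RAM from the bottom up (real sketches also grow the stack down from the top), and SimRam and keySurvivesReset() are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host-side model of the 1200 baud "touch" reset handshake.
// Assumptions, not taken from the thread: the key value, and that the
// bootloader stays in upload mode only if the key survives until it runs.
constexpr uint16_t MAGIC_KEY     = 0x7777;  // assumed key value
constexpr uint16_t MAGIC_KEY_POS = 0x0800;  // address mentioned in the thread
constexpr uint16_t RAM_START     = 0x0100;  // ATmega32U4 SRAM: 0x0100..0x0AFF
constexpr uint16_t RAM_END       = 0x0AFF;

struct SimRam {
    std::array<uint8_t, RAM_END - RAM_START + 1> bytes{};

    uint8_t& at(uint16_t addr) {
        assert(addr >= RAM_START && addr <= RAM_END);
        return bytes[addr - RAM_START];
    }
    void write16(uint16_t addr, uint16_t v) {  // AVR is little-endian
        at(addr) = static_cast<uint8_t>(v & 0xFF);
        at(addr + 1) = static_cast<uint8_t>(v >> 8);
    }
    uint16_t read16(uint16_t addr) {
        return static_cast<uint16_t>(at(addr) | (at(addr + 1) << 8));
    }
};

// One touch reset. The core stores the key and arms the watchdog; during the
// ~120 ms window the sketch keeps running and (simplified here) writes every
// byte from RAM_START up to sketchRamTop. spinAfterArm models the wrapper
// idea: nothing else runs once the watchdog is armed.
bool keySurvivesReset(uint16_t sketchRamTop, bool spinAfterArm) {
    SimRam ram;
    ram.write16(MAGIC_KEY_POS, MAGIC_KEY);
    if (!spinAfterArm) {
        for (uint32_t a = RAM_START; a <= sketchRamTop; ++a)
            ram.at(static_cast<uint16_t>(a)) = 0xAA;
    }
    return ram.read16(MAGIC_KEY_POS) == MAGIC_KEY;
}
```

In this model the key survives when the sketch's RAM stops at 0x07FF, is destroyed as soon as the sketch writes 0x0800, and survives a completely full RAM when the wrapper spins after the watchdog is armed.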

About the flags: I use PlatformIO, where I can easily add them to the build flags. For the Arduino IDE, I think there are already board files for the Arduboy. Maybe it would be possible to add them there as custom build flags.

Sadly, the approach above had no effect, which seems a bit strange to me. There is something I do not understand yet.
Maybe some other interrupt comes in and clobbers the magic value? Or the wrapping prevents the CDC code from doing the cleanup that leads to the new serial device for flashing. I think I need to dig deeper.

Many people select Arduino Leonardo as the board type when uploading to the Arduboy. Anything added to the Arduboy board files wouldn’t work in this case.

I think just thoroughly documenting the technique of holding the UP button while booting, and pointing it out as often as possible, will make it become common knowledge.

The next version of the Arduboy library will allow sketches that need more code space to use boot() instead of begin(). This removes the flashlight feature unless the sketch calls flashlight() after boot().

For sketches that use boot() and don’t call flashlight() but use enough RAM to cause the problem, we would recommend including a small recovery code segment after boot():

  arduboy.boot();

  if (arduboy.buttonsState() == UP_BUTTON) {
    while(true) { }
  }

I see. Basically I am fine with the flashlight mode. Sometimes I just like to find out why things are not working. Most people wouldn’t care much about what my approach gains.
Maybe I will continue tinkering around a little with this.

I like the idea of just using boot(). It will free up some space.
Along the same lines, I found some macros like digitalWriteFast that we could use instead of the regular digitalWrite(). There are also replacements for pinMode() and the like. These can free a fair amount of memory too.
I’ll try to find them again.
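The trick behind such macros is that the pin number is a compile-time constant, so the compiler can resolve the port and bit itself. A minimal host-side sketch of that idea, using a template rather than the actual digitalWriteFast macros; PORTD_sim, portdBitForPin() and fastWrite() are hypothetical names, and the pin map assumes Leonardo-style numbering for the display pins.

```cpp
#include <cassert>
#include <cstdint>

uint8_t PORTD_sim = 0;  // host stand-in for the AVR PORTD register

// Hypothetical compile-time pin map, assuming D4 -> PD4, D6 -> PD7,
// D12 -> PD6. 0xFF means "not mapped".
constexpr uint8_t portdBitForPin(uint8_t pin) {
    return pin == 4 ? 4 : pin == 6 ? 7 : pin == 12 ? 6 : 0xFF;
}

// The pin is a template parameter, so the mask is a compile-time constant.
// On AVR (with PORTD instead of PORTD_sim), when the value is also known at
// compile time, each call can become a single sbi or cbi instead of a
// digitalWrite() call with table lookups.
template <uint8_t Pin>
inline void fastWrite(bool high) {
    constexpr uint8_t bit = portdBitForPin(Pin);
    static_assert(bit != 0xFF, "pin is not mapped to PORTD in this sketch");
    if (high)
        PORTD_sim |= static_cast<uint8_t>(1u << bit);
    else
        PORTD_sim &= static_cast<uint8_t>(~(1u << bit));
}
```

Usage looks like `fastWrite<4>(true);`. An unmapped pin fails at compile time instead of silently falling back to the slow path.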

There’s been some discussion similar to this

If you’re interested in my efforts with regards to improving the library you can look at my repository. Basically, I forked the Arduboy V1.2 development code and called it Arduboy2. It can co-exist in the IDE with the current Arduboy V1.1 library, so existing sketches don’t break. I haven’t gotten to updating the README.md, other documentation or example sketches, but you can look at the commit logs and comments in the .h files for some information. It’s still a work in progress but close to what I feel is releasable.

Thank you. I will look into this repository. The digitalWriteFast macros I found here:

http://code.google.com/p/digitalwritefast/

We will need to find out how many bytes we will gain. The macros impose some restrictions and are only effective with the right parameters. I’ll put that on my list and see what I can find out.
Generally I like to see some flash freed up. The Leonardo core already eats up quite a lot of space.

As soon as a pin operation is called even once somewhere with a variable as a pin number, it will pull in all the code for the regular version of the function. If this is the case, using the fast functions elsewhere may be faster but probably wouldn’t save much code.

The Arduboy library currently uses a variable for pin numbers in function bootPins() in core/core.cpp so you’d have to consider that, at least.

Also using any additional libraries which use variables for pin numbers, such as the new ArduboyPlaytune, will cause the problem.
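The fallback described above can be sketched with GCC/Clang's `__builtin_constant_p`, which, as far as I know, is the same mechanism the digitalWriteFast macros use. This is a host-side model under stated assumptions: WRITE_FAST, slowWrite(), PORTD_sim and the pin map are hypothetical, and slowWrite() merely stands in for digitalWrite(). Only a pin the compiler can prove constant takes the fast path; any variable pin goes through the generic function, which then has to be linked in.

```cpp
#include <cassert>
#include <cstdint>

uint8_t PORTD_sim = 0;  // host stand-in for PORTD
int slowCalls = 0;      // counts calls that go through the generic path

// Hypothetical pin map (assumed: D4 -> PD4, D6 -> PD7, D12 -> PD6).
constexpr uint8_t portdBit(uint8_t pin) {
    return pin == 4 ? 4 : pin == 6 ? 7 : pin == 12 ? 6 : 0xFF;
}

// Generic path: stands in for digitalWrite() and its table lookups, which
// must be kept as soon as one call uses a variable pin.
void slowWrite(uint8_t pin, bool high) {
    ++slowCalls;
    uint8_t bit = portdBit(pin);
    if (bit == 0xFF) return;  // not a PORTD pin in this model
    if (high) PORTD_sim |= static_cast<uint8_t>(1u << bit);
    else      PORTD_sim &= static_cast<uint8_t>(~(1u << bit));
}

// Fast path only when the compiler can prove the pin is a constant;
// otherwise fall back to the generic function.
#define WRITE_FAST(pin, high)                                                  \
    do {                                                                       \
        if (__builtin_constant_p(pin) && portdBit(pin) != 0xFF) {              \
            if (high) PORTD_sim |= static_cast<uint8_t>(1u << portdBit(pin));  \
            else PORTD_sim &= static_cast<uint8_t>(~(1u << portdBit(pin)));    \
        } else {                                                               \
            slowWrite((pin), (high));                                          \
        }                                                                      \
    } while (0)
```

With a literal like `WRITE_FAST(4, true)` the generic function is never called. With a pin held in a variable (for example one read from a table at runtime) it is, which is why a single such call elsewhere keeps all of the regular code in the binary.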

Yes, you are right, and in any case all the usual Arduino pin functions rely on the PROGMEM lookup tables in pins_arduino.h.
What I meant is that we can write such macros ourselves for the 32u4 and use them in the Arduboy core library.
I just did a quick test on my code:

Before:

Program:   28022 bytes (85.5% Full)

After:

Program:   27988 bytes (85.4% Full)
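For reference, here is the difference between the two builds. This assumes, as the numbers suggest, that the percentages are taken against the full 32,768 bytes of ATmega32U4 flash rather than the 28,672 bytes left over by the 4 KB bootloader:

$$28022 - 27988 = 34\ \text{bytes}$$

$$\frac{28022}{32768} \approx 85.5\,\%, \qquad \frac{27988}{32768} \approx 85.4\,\%$$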

I only modified:

void ArduboyCore::LCDDataMode()
{
  *dcport |= dcpinmask;
  *csport &= ~cspinmask;
}

to this:

void ArduboyCore::LCDDataMode()
{
  bitWrite(PORTD, 4, 1);  // DC high (PD4): data mode
  bitWrite(PORTD, 6, 0);  // CS low (PD6): select the display
}

This is just for testing. We can neatly pack the bitWrite calls into digitalWriteFast-style macros for the Arduboy.
Some other functions can be written the same way, and then we can also remove dcport and the like to free some RAM.
I haven’t looked into the linker output, but in the end we might even remove the function entirely and call these directly where needed. The cost should be very low, as the two instructions together take only about 4 bytes of code.
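The macros mentioned above might look roughly like this. It is a host-side sketch, not the library's code: the macro names, lcdDataMode() and PORTD_sim are hypothetical, and the bit numbers follow the bitWrite() test (DC on PD4, CS on PD6). On the Arduboy, PORTD_sim would be PORTD.

```cpp
#include <cassert>
#include <cstdint>

uint8_t PORTD_sim = 0;  // host stand-in; on the Arduboy this would be PORTD

// Hypothetical Arduboy-specific macros; bit numbers as in the bitWrite() test.
#define LCD_DC_BIT 4
#define LCD_CS_BIT 6
#define LCD_DC_DATA()    (PORTD_sim |= static_cast<uint8_t>(1u << LCD_DC_BIT))
#define LCD_DC_COMMAND() (PORTD_sim &= static_cast<uint8_t>(~(1u << LCD_DC_BIT)))
#define LCD_CS_SELECT()  (PORTD_sim &= static_cast<uint8_t>(~(1u << LCD_CS_BIT)))
#define LCD_CS_RELEASE() (PORTD_sim |= static_cast<uint8_t>(1u << LCD_CS_BIT))

// Same effect as LCDDataMode(), but without the dcport/csport pointers and
// masks, so the RAM they occupy is no longer needed. Callers could also use
// the macros directly and drop the function altogether.
inline void lcdDataMode() {
    LCD_DC_DATA();
    LCD_CS_SELECT();
}
```

Because every mask is a constant, each macro on the AVR side should map to a single sbi or cbi on PORTD.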

This also improves performance as there is just one instruction per bit change now.

Maybe for the next release.

You guys need to be using objdump (I’m pretty sure there are examples elsewhere on the forums) to see what’s actually taking up space in your programs. Version 1.2 of the library has killed most of the easy junk though… so any remaining space savings are going to be harder to win.

It’s probably easier to optimize your program code than to find bloat in the core library.


Yes, I already use it. The digitalWriteFast macros are just some low-hanging fruit. As you can see in my example above, it easily saves 34 bytes. There are also more places where we can do this (e.g. we could also remove the pin table). Besides the size, a single sbi instruction is faster than a function call and the use of pgm_read_…

I agree it is not a huge amount of memory, and of course optimizing the app code is more effective, but on this platform every byte counts.

Here is a good starting point I found:

I think it’s worth a look.

Sure but you’re fighting an uphill battle… optimizing core lib might give you 32-64 bytes here and there… optimizing your sketch itself could give you 1-2k back once you truly understand things NOT to do.

I ran around and got lots of the little wins… if someone else wants to get the rest I suppose it can’t hurt, but it’s not going to make a BIG difference like teaching people how to optimize their own code would.

Actually (with 1.2) I took most of the big wins too (removal of tunes, optional use of fonts/printing, etc.)… but sure, there is still a little more fat…

For sure it will not be a big win in size. Also, we need to make sure the library doesn’t get too obfuscated, so that less experienced users can still learn from it easily.
Btw where can I find version 1.2? Got a bit lost.