How much memory can my application use?

In another thread, I revealed that I had bricked a couple of Arduboys that I have and the feedback I have from @bateske is that there may be a bug in the compiler (I am using 1.8.4) that combined with my code has caused the ‘perfect storm’ to brick my machines. It has been linked with the graphics issues people are having with drawPlusMask().

This could all be true, but is the problem simpler than that?

My programme uses about 88% of Progmem and 75% of Dynamic RAM. I see Circuit Dude uses 97% Progmem so that does not appear to be the issue.

How much DRAM does the ‘environment’ need for things like call stacks, local variables and anything else it needs? I understand this will change based on the programme it is running, but what is a realistic working number?

Am I the only one who programs this way … counting bytes with each change?

For those playing along at home … note the settings at the top of page two, 25,788 (Progmem) / 1,965 (Dynamic RAM), and the comment ‘Before Changes’. The final reading is 24,760 / 1,903, and in there I have managed to add a splash screen with some basic animations, a theme song and two other sound snippets in ArduboyTones format, plus some other features.

Saving memory on one hand and immediately consuming it on the other!

As far as I know, the 75% of dynamic RAM should already include everything the environment needs: a blank sketch still uses 5% when compiled, and simply invoking arduboy.begin uses 47%.

I don’t keep an entire log, but I do dump the size output to notepad and compare changes.

I do make todo lists and work logic out on paper though, I’m glad to see other people also still use paper.

It’s actually even simpler than that.
Just declaring a global Arduboy2 variable will cause that leap because that’s the size of the Arduboy2 being stored in the global data section.
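As a rough illustration of where that leap comes from: as far as I know, most of it is the library's screen buffer, which stores one bit per pixel. The sketch below assumes a 128x64 display and the 2560 bytes of RAM the IDE reports; the names are made up for this example and are not part of Arduboy2.

```cpp
#include <stdint.h>

// Assumed figures: 128x64 monochrome display, 2560 bytes of RAM in total.
constexpr uint16_t kScreenWidth = 128;
constexpr uint16_t kScreenHeight = 64;
constexpr uint16_t kTotalRam = 2560;

// One bit per pixel, so eight pixels fit in each byte.
constexpr uint16_t screenBufferBytes() {
  return static_cast<uint16_t>((kScreenWidth * kScreenHeight) / 8);
}

// Whole-number percentage of RAM taken up by `bytes` (rounded down).
constexpr uint16_t percentOfRam(uint16_t bytes) {
  return static_cast<uint16_t>((static_cast<uint32_t>(bytes) * 100) / kTotalRam);
}

static_assert(screenBufferBytes() == 1024, "128 * 64 / 8");
static_assert(percentOfRam(screenBufferBytes()) == 40, "1024 of 2560 bytes");
```

Under those assumptions the buffer alone explains about 40 of the 42 percentage points between the blank sketch and the 47% figure.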

It includes all the statically declared stuff, but it doesn’t account for stack usage, which grows with allocation of local variables and function calls.

It is sometimes possible to know (at compile time) exactly what the maximum stack usage will be, but those times are rare because certain commonly used things (like indirect function calls) make it practically impossible. It’s a similar issue to the halting problem in that there are constructs that make it impossible without actually running/simulating the code.

Reference? I’m not aware of any issues with the core library that would brick devices. Typically the Arduino IDE will show you a warning (after compiling) when your % of memory usage becomes so high that you could possibly get tripped up by the stack, etc.

Bricking the device is (intentionally) very difficult. Even for some of the Arduboys that shipped without protection for the bootloader… still the flash instructions only run from the bootloader area of flash… and the flash instructions are the only way I know of to “brick” it in software. So to flash it “accidentally” you’d have to JMP directly into the bootloader area and perhaps even set up the state exactly as necessary beforehand. Of course it’d be possible to purposely compile a program to do this… but much harder to do so accidentally.

You’d have to push the jump address into the stack and then “return”. I suppose if you had a memory corruption bug you could accidentally push EXACTLY that address into EXACTLY the right memory location - and the bug be such that normal execution continues until you hit a RETURN… and I’m not saying that’s not possible… just that it’s pretty unlikely.

It is much simpler to write a sketch that locks up the device (requiring a reboot) or even a sketch that messes up the “usb auto reset to flash” functionality (requiring the reset pin or safe mode). But accidentally flashing the device (in software) is a stretch.

Now if you pick the wrong option from the Arduino IDE it may be possible to reflash the bootloader area (once) and hence mess things up pretty good. Far more likely someone would do this than to accidentally compile a sketch that corrupts flash.

@Dreamer3 you are right, it shows a warning at an arbitrary 75% and my code had just clipped over that (hence the OP). The problem occurred when I added ‘just one more’ graphic to my program - of course this might just have been me crossing the memory line.

@bateske suggested the link to the graphics drawing issue, both in the thread “Image corruption when using drawPlusMask() in some cases [register allocation issue]” and in private correspondence. I am not convinced, as I am not using the drawPlusMask() function at all.

The issue is if the reset button is not working, the bootloader is getting blown away, which isn’t going to be the result of overflowing the ram. Linking it to the drawplusmask issue was just simply anecdotal based on timing and the similarity of being image related, but I suppose if the draw method you are using isn’t going through inline assembly (I haven’t checked) then maybe not.

Is this actually possible? And in this instance he had successfully flashed the unit prior and had only changed image data.

There is a menu option to reflash the bootloader (or whole flash), is there not? If so picking it and having a unit with unlocked flash and then flashing the WRONG file (not a working bootloader) would essentially brick it. The bootloader itself has the capability to overwrite itself (unless the lock bits are set) if told to.

I can confirm that I did not do this.

You can’t reflash the usb bootloader over usb, you need an external programmer.

If the bootloader area protection fuses haven’t been set, then you can “clobber” the bootloader over USB. If you don’t mess up the code in the bootloader that’s actually doing the writing, then in theory you can reflash the rest of the bootloader.

Unfortunately, I cannot run the diagnostic tool that I have seen @eried using to see if the fuse is set or not. I am planning on sending the units back to Kevin for a looksee.

I never did get an answer to my original questions:

How much DRAM does the ‘environment’ need for things like call stacks, local variables and anything else it needs? I understand this will change based on the programme it is running, but what is a realistic working number?

I can’t say for sure, but for dynamically allocated RAM, beyond the global variables shown when you compile, e.g.:
Global variables use XXXX bytes
I’d guess the Arduino environment itself would use less than a dozen bytes, once the sketch is executing loop(). The Arduboy2 object you create will allocate a few dozen bytes for its object variables.

Everything else will depend on what function calls you make, how deeply they’re nested, variables allocated by any other objects you create, and what local variables the functions have allocated at any given time. Interrupt service routines will also use some RAM when invoked. So, it’s pretty hard to give a generic “realistic” number.

The Arduino code that calls loop() is just:

	for (;;) {
		loop();
		if (serialEventRun) serialEventRun();
	}

and serialEventRun will never be true for the Arduboy.

I’ll give my best shot at answering this.
I’m not an expert on this stuff but I’m reasonably well read so hopefully my interpretation is close to the truth.

I’ve done a bit of research into this before, so I’ll link to some useful resources:

I’ll make an attempt to summarise these and explain what goes on, though I might be a bit inaccurate.
Excuse the formality, I am attempting to give this as a more general/technical answer rather than aiming it specifically at one person or at beginners.


Registers

AVR has 32 directly addressable 8-bit general-purpose registers.
These are named R0 through to R31.
All 32 registers can be used for 8-bit arithmetic operations.

The upper 16 registers (R16 to R31) can be used with ‘immediate’ instructions, whilst the lower 16 registers (R0 to R15) cannot.
(An ‘immediate’ instruction is an instruction in which the second operand is a constant value embedded in the instruction)

In addition, some of the upper registers may be treated as ‘register pairs’ such that two consecutive 8-bit registers act as a single 16-bit register. In this case, the higher-indexed odd numbered register is the high byte and the lower-indexed even numbered register is the low byte. For example, R31:R30 is the highest pair.

Three of the register pairs are special in that they have additional features that can only be used with them. To this end, the pairs are referred to by special names: X (R27:R26), Y (R29:R28) and Z (R31:R30).
All three support special addressing modes called ‘postincrement’ and ‘predecrement’, which can occur alongside another instruction. For example, the instruction ST X+, r0 stores the content of R0 into the memory address pointed to by X (i.e. X acts as a pointer), after which X is incremented (hence post-increment). The instruction ST -X, r0 decrements X and then stores the content of R0 into the memory address pointed to by X (hence pre-decrement).
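The register rules above can be written down as a few small helpers. This is only a sketch for illustration: the function and constant names are invented here, and LDI is used as one example of an ‘immediate’ instruction.

```cpp
#include <stdint.h>

// Only R16..R31 can be the target of 'immediate' instructions such as LDI.
constexpr bool supportsImmediate(uint8_t reg) {
  return reg >= 16 && reg <= 31;
}

// Combine the contents of a register pair into the 16-bit value it holds:
// the odd (higher-indexed) register supplies the high byte.
constexpr uint16_t pairValue(uint8_t highByte, uint8_t lowByte) {
  return static_cast<uint16_t>((highByte << 8) | lowByte);
}

// Index of the low register of each pointer pair: X = R27:R26 and so on.
constexpr uint8_t kXLow = 26;
constexpr uint8_t kYLow = 28;
constexpr uint8_t kZLow = 30;

static_assert(supportsImmediate(16) && !supportsImmediate(15), "upper half only");
static_assert(pairValue(0x12, 0x34) == 0x1234, "high byte first");
static_assert(kZLow + 1 == 31, "Z is R31:R30, the highest pair");
```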

R0 and R1 are the implicit output registers of the multiplication operations MUL, MULS, MULSU, FMUL, FMULS and FMULSU.

R0 is treated as a ‘scratch register’. I.e. it is expected to be ‘clobbered’ and doesn’t need to be restored to its previous value after use.
R1 is intended to be treated as an always zero register. It is not strictly always zero as some instructions (such as MUL) will ‘clobber’ it and it must be restored to zero after use.
These two factors are convention only, beyond their use as MUL operation outputs there is otherwise nothing special about R0 or R1.

In addition to the basic 32 registers, there are some non-addressable special purpose registers:

  • PC - the 16/22-bit program counter - this points to the next instruction to be executed. This is either implicitly incremented after every instruction or explicitly changed via jump or branch instructions.
  • SP - the 8/16-bit stack pointer - this points to the top of the stack. Note that the stack ‘grows’ down, so decrementing SP increases the stack and incrementing it decreases the stack.
  • SREG - the 8-bit status register - this holds meta information, e.g. Did the last addition just overflow? Was the last operation result 0? Are interrupts enabled?

(There’s more to it than just this, but this is all that’s really relevant for the scope of the question. Factoring in things like EEPROM and PROGMEM will complicate things.)
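To make the SREG questions a little more concrete, here is a small sketch of how two of the flags would come out after an 8-bit addition. It models only the carry (C) and zero (Z) flags, and the helper names are hypothetical.

```cpp
#include <stdint.h>

// Carry flag (C): set when the sum of two 8-bit values does not fit in 8 bits.
constexpr bool carryAfterAdd(uint8_t a, uint8_t b) {
  return (static_cast<uint16_t>(a) + b) > 0xFF;
}

// Zero flag (Z): set when the truncated 8-bit result is zero.
constexpr bool zeroAfterAdd(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>(a + b) == 0;
}

static_assert(carryAfterAdd(200, 100), "300 does not fit in 8 bits");
static_assert(!carryAfterAdd(100, 100), "200 fits in 8 bits");
static_assert(zeroAfterAdd(0x80, 0x80), "0x100 truncates to 0x00");
```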


Opcodes

All opcodes are either 1 word (2 bytes) or 2 words (4 bytes) long.

Note that:

  • ‘r’ = R0 to R31 acceptable
  • k = constant value

PUSH r - 1 word - Pushes the contents of r onto the stack. Decrements SP by 1.
POP r - 1 word - Pops the contents of the top of the stack off the stack and into r. Increments SP by 1.
CALL k - 2 words - k is 22-bits wide. Pushes PC onto the stack. Decrements SP by 2/3 (2 on 16-bit address processors, 3 on 22-bit address processors). Moves k into PC. (Execution continues from address k.)
RCALL k - 1 word - k is 12-bits wide. Pushes PC onto the stack. Decrements SP by 2/3 (2 on 16-bit address processors, 3 on 22-bit address processors). Adds k to PC. (Execution continues from address PC + k.)
RET - 1 word - Increments SP by 2/3 (2 on 16-bit address processors, 3 on 22-bit address processors). Retrieves PC from the stack. (Execution continues from the address formerly on the stack.)
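The effect of these instructions on SP can be modelled in a few lines. This sketch assumes a device with a 16-bit program counter (so 2-byte return addresses) and RAM ending at 0x0AFF, which is what I believe the ATmega32U4 in the Arduboy uses; the names are illustrative only.

```cpp
#include <stdint.h>

// Assumptions: 2-byte return addresses and RAM ending at 0x0AFF.
constexpr uint16_t kRamEnd = 0x0AFF;
constexpr uint16_t kReturnAddressBytes = 2;

// The stack grows downwards, so pushing lowers SP and popping raises it.
constexpr uint16_t afterPush(uint16_t sp) { return static_cast<uint16_t>(sp - 1); }
constexpr uint16_t afterPop(uint16_t sp) { return static_cast<uint16_t>(sp + 1); }
constexpr uint16_t afterCall(uint16_t sp) { return static_cast<uint16_t>(sp - kReturnAddressBytes); }
constexpr uint16_t afterRet(uint16_t sp) { return static_cast<uint16_t>(sp + kReturnAddressBytes); }

// Bytes of stack in use, assuming SP started out at kRamEnd.
constexpr uint16_t stackBytesUsed(uint16_t sp) { return static_cast<uint16_t>(kRamEnd - sp); }

// A call followed by two pushes uses 4 bytes of stack.
static_assert(stackBytesUsed(afterPush(afterPush(afterCall(kRamEnd)))) == 4, "2 + 1 + 1");

// Matching pops and a RET put SP back where it started.
static_assert(afterRet(afterPop(afterPop(afterPush(afterPush(afterCall(kRamEnd)))))) == kRamEnd,
              "balanced call");
```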


Calling

To establish some terminology:
(Note that I am assuming familiarity with the terms stack/call stack, function and function call/calling a function.)

  • The caller is the function/code that is calling another function.
  • The callee is the function being called by a caller.
  • To save a register means to record its value (on the stack) so that it may later be restored.
  • To restore a register means to take its saved value (off the stack) and place it back into the register.
  • To call-clobber a register means for the callee to alter the contents of the register and to not restore them to what they were prior to returning control of execution to the caller. I.e. the caller must expect the callee to alter the contents of the register and if the register’s contents are important it is up to the caller to save and restore them.
  • To call-save a register means for the callee to either not use the register or to alter the contents of the register and then restore the contents of the register before returning control of execution back to the caller. I.e. the caller may expect the contents to appear unaltered when the callee has completed and the caller does not have to worry about saving and restoring them.
  • A return address is the address that the callee should return control to when finishing. The address will be (by convention) the address of an instruction within the caller.
  • A calling convention is a specific protocol agreeing how a function call is to occur. It covers which registers are call-clobbered, which registers are call-saved, how arguments should be passed around, how the return address should be found et cetera.

In the AVR calling convention, registers R18 to R27, R30 and R31 are call-clobbered and thus are saved by the caller. Registers R2 to R17, R28 and R29 are call-saved and thus are saved by the callee.
R0 and R1 are different because of the protocol that already exists around them.
R0 is implicitly call clobbered, but this is due to the usual protocol for working with it, which is classed separately to the calling convention.
R1 is implicitly call saved, but this is again due to the protocol of setting it back to zero after working with it, which is again a separate matter to the AVR calling convention.
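The register split described above can be expressed as two predicates, which also makes it easy to count how many registers fall into each group. This is a sketch with invented helper names; R0 and R1 are deliberately left out of both groups, as explained above.

```cpp
// Call-saved (the callee must preserve them): R2..R17, R28 and R29.
constexpr bool isCallSaved(int reg) {
  return (reg >= 2 && reg <= 17) || reg == 28 || reg == 29;
}

// Call-clobbered (the caller must save them if it needs them): R18..R27, R30 and R31.
constexpr bool isCallClobbered(int reg) {
  return (reg >= 18 && reg <= 27) || reg == 30 || reg == 31;
}

// Count matching registers from `reg` up to R31.
// Written recursively so that it remains a valid C++11 constexpr function.
constexpr int countCallSaved(int reg = 0) {
  return reg > 31 ? 0 : (isCallSaved(reg) ? 1 : 0) + countCallSaved(reg + 1);
}

constexpr int countCallClobbered(int reg = 0) {
  return reg > 31 ? 0 : (isCallClobbered(reg) ? 1 : 0) + countCallClobbered(reg + 1);
}

static_assert(countCallSaved() == 18, "R2..R17 plus R28 and R29");
static_assert(countCallClobbered() == 12, "R18..R27 plus R30 and R31");
static_assert(!isCallSaved(0) && !isCallClobbered(1), "R0 and R1 follow their own protocol");
```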

The AVR stack frame layout is:

Stack Frame
Arguments
Return Address
Saved Registers
Local Variables

Firstly, if the return type is greater than 8 bytes (e.g. 9 bytes), a block of memory is allocated by the caller (on the stack) to hold the data of said type. (This means that any type larger than 8 bytes is effectively being implicitly passed as a pointer.)

Then any call-clobbered registers that the caller needs to save are pushed to the stack.

Initial arguments will be passed by register.
Any remaining arguments beyond a certain threshold will be pushed to the stack as part of the stack frame.
(How the argument registers are decided is a somewhat complicated process that I won’t go into. However it’s worth noting that 1-byte arguments effectively consume two registers, and the maximum number of 1-byte arguments passable in registers is about 9 arguments.)
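The ‘about 9’ figure can be reproduced with a little arithmetic. The sketch below assumes, as I understand the avr-gcc convention, that 18 registers (R25 down to R8) are available for arguments and that each argument is rounded up to an even number of registers; it ignores the finer points of the real allocation rules, and the names are made up.

```cpp
// Assumption: R25 down to R8 are available for passing arguments.
constexpr int kArgumentRegisters = 18;

// Each argument occupies an even number of registers, so odd sizes round up.
constexpr int registersForArgument(int sizeBytes) {
  return (sizeBytes + 1) & ~1;
}

// How many same-sized arguments fit in registers before the stack is needed.
constexpr int maxArgumentsInRegisters(int sizeBytes) {
  return kArgumentRegisters / registersForArgument(sizeBytes);
}

static_assert(registersForArgument(1) == 2, "a 1-byte argument consumes two registers");
static_assert(maxArgumentsInRegisters(1) == 9, "nine 1-byte arguments");
static_assert(maxArgumentsInRegisters(4) == 4, "four 4-byte arguments");
```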

Then the return address is pushed onto the stack as the call is made.
This is the point where the context switches from caller to callee.

Then any call-saved registers that the callee needs to overwrite are pushed to the stack.

Then, if there aren’t enough registers to handle all the locals, some space will be allocated on the stack to hold the rest. (This is most likely allocated by subtracting a constant value from SP rather than using multiple PUSH instructions.)


The function performs all the operations that it needs to.

When execution of the function completes, the return value is saved either to a subset of the group of registers used to pass arguments or to the block of memory allocated by the caller to store the return value.

The space required for the local variables is then returned to the stack. (Again, it is most likely that this is deallocated by adding a constant value to SP rather than using POP instructions.)

Call-saved registers are restored from the stack (in reverse order to which they were saved since it’s a call stack).

The RET instruction is called to return execution back to the caller.
This is the point where the context switches back from callee to caller.

Any non-register arguments are deallocated.

Call-clobbered registers are restored from the stack.
The return result is either juggled into a different set of registers or some of the call-clobbered registers are restored into different registers from what they were before.

Execution continues.


So basically it really does depend on what you’re doing.
At absolute minimum, assuming a call is not inlined, a no-argument no-return value does-nothing function will require the space for the return address (2-3 bytes). After that the amount of stack space required varies wildly and depends on whether the function is inlined or not.

Take the function uint8_t add(uint8_t l, uint8_t r). Assuming there is no inlining (in a normal situation there obviously would be, but pretend there isn’t any) and the argument registers are in use, this would take about 6-7 bytes of stack space. 4 to save the argument registers, 2-3 for the return address. So basically, in general each call won’t take that much. It would be the same for uint16_t add(uint16_t l, uint16_t r). Upgrading to uint32_t add(uint32_t l, uint32_t r) would need 8 bytes to save argument registers, making it take 10-11 bytes of stack total, et cetera.
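The figures in that paragraph can be reproduced with a small calculation. This is a sketch under the same assumptions as the text (no inlining, every argument register has to be saved, each argument rounded up to an even number of registers); the function names are hypothetical.

```cpp
// Bytes needed to save the argument registers for `argCount` arguments of
// `argSize` bytes each, with every argument rounded up to an even size.
constexpr int savedArgumentBytes(int argSize, int argCount) {
  return ((argSize + 1) & ~1) * argCount;
}

// Stack cost of one call: saved argument registers plus the return address
// (2 bytes with a 16-bit program counter, 3 bytes with a 22-bit one).
constexpr int callStackBytes(int argSize, int argCount, int returnAddressBytes) {
  return savedArgumentBytes(argSize, argCount) + returnAddressBytes;
}

static_assert(callStackBytes(1, 2, 2) == 6, "uint8_t add, 2-byte return address");
static_assert(callStackBytes(1, 2, 3) == 7, "uint8_t add, 3-byte return address");
static_assert(callStackBytes(2, 2, 2) == 6, "uint16_t add costs the same");
static_assert(callStackBytes(4, 2, 2) == 10, "uint32_t add, 2-byte return address");
static_assert(callStackBytes(4, 2, 3) == 11, "uint32_t add, 3-byte return address");
```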

If you take the worst case scenario you could have 9 bytes allocated for a return value, all 12 call-clobbered registers saved, N bytes of arguments, the return address (2-3 bytes), all 18 call-saved registers (R2 to R17, R28 and R29) and M bytes of local variables. Your function would then take 9 + 12 + 18 + 2 + N + M bytes of RAM through stack space, which is a minimum of 41. All in all, it’s not that much still. It would have to be quite heavily compounded (i.e. lots of nested calls) to topple the stack, depending on what the stack maximum is.

Note however that use of malloc adds pressure to the stack because the end of the stack is also the end of malloc's address space.
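To picture the squeeze: the heap grows upwards from the end of the global data while the stack grows downwards from the end of RAM, and trouble starts when they meet. The sketch below assumes RAM runs from 0x0100 to 0x0AFF (2560 bytes), borrows the 1921 bytes of globals quoted in the IDE message elsewhere in this thread, and ignores the allocator's own bookkeeping overhead. The names are invented for illustration.

```cpp
// Assumed memory map: RAM runs from 0x0100 to 0x0AFF inclusive (2560 bytes).
constexpr int kRamStart = 0x0100;
constexpr int kRamEnd = 0x0AFF;

// The heap begins directly after the statically allocated globals.
constexpr int heapStart(int globalBytes) {
  return kRamStart + globalBytes;
}

// Bytes left between the top of the heap and the bottom of the stack.
// A negative result means the two have collided.
constexpr int remainingGap(int globalBytes, int heapBytes, int stackBytes) {
  return (kRamEnd + 1) - heapStart(globalBytes) - heapBytes - stackBytes;
}

static_assert(remainingGap(1921, 0, 0) == 639, "matches the IDE's figure");
static_assert(remainingGap(1921, 400, 200) == 39, "heap and stack nearly touching");
static_assert(remainingGap(1921, 400, 300) < 0, "heap and stack have collided");
```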

Simple experiments have shown that overflowing the stack seems to only cause the Arduboy to reset, but the tests I have tried are certainly not extensive.


All in all, the real risks are from:

  • nesting too many calls
  • using too many local variables (especially ones with a large scope)
  • non-tail-called recursive functions
  • allocating particularly large objects on the stack

If you’re following best practices you should be fine.
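For a feel of how much nesting it takes, here is a back-of-the-envelope estimate. It assumes 639 bytes are free for the stack (the figure from the IDE message quoted elsewhere in this thread) and 11 bytes per call (the uint32_t add example above); both numbers are only illustrative, and real code will vary from call to call.

```cpp
// How many nested calls fit if every call costs the same number of bytes.
constexpr int maxNestedCalls(int freeBytes, int bytesPerCall) {
  return freeBytes / bytesPerCall;
}

// Whether a chain of calls (or a recursion) of the given depth stays within the free space.
constexpr bool fitsInStack(int freeBytes, int bytesPerCall, int depth) {
  return bytesPerCall * depth <= freeBytes;
}

static_assert(maxNestedCalls(639, 11) == 58, "roughly 58 levels of nesting");
static_assert(fitsInStack(639, 11, 58), "58 levels use 638 bytes");
static_assert(!fitsInStack(639, 11, 59), "59 levels would need 649 bytes");
```

A recursive function with an unbounded depth is the easiest way to exceed an estimate like this, which is why it appears in the list of risks.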


(Also there’s a delay in me posting this because the server sent me a 500 error and then wouldn’t let me post again because it was “too similar to what you recently posted” so I had to wait for the duplicate post recogniser to cool down. That took around 15-30 minutes.)

Just an F.Y.I.:

The IDE will give a Low memory available warning if less than 640 bytes are available for local variables.

Global variables use 1921 bytes (75%) of dynamic memory, leaving 639 bytes for local variables. Maximum is 2560 bytes.
Low memory available, stability problems may occur.
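The arithmetic behind that message can be checked directly. This sketch takes the 2560-byte total and the 640-byte threshold from the figures above; the function names are hypothetical and this is not the IDE's actual code.

```cpp
// Figures taken from the message above.
constexpr int kTotalRam = 2560;
constexpr int kLowMemoryThreshold = 640;

// Bytes left over for local variables (i.e. the stack).
constexpr int freeForLocals(int globalBytes) {
  return kTotalRam - globalBytes;
}

// Whole-number percentage of RAM used by globals (rounded down).
// The cast to long avoids overflow where int is only 16 bits wide.
constexpr int percentUsed(int globalBytes) {
  return static_cast<int>(static_cast<long>(globalBytes) * 100 / kTotalRam);
}

// The warning appears once fewer than 640 bytes remain.
constexpr bool lowMemoryWarning(int globalBytes) {
  return freeForLocals(globalBytes) < kLowMemoryThreshold;
}

static_assert(freeForLocals(1921) == 639, "leaving 639 bytes");
static_assert(percentUsed(1921) == 75, "reported as 75%");
static_assert(lowMemoryWarning(1921), "one byte over the line");
static_assert(!lowMemoryWarning(1920), "exactly 640 bytes free gives no warning");
```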

Yes … the 75% / 640 bytes seems a bit arbitrary, hence my question as to what the environment really needs. I see from your response that the complexity of the code will directly impact on performance - I had already guessed that - so there is probably no ‘rule of thumb’ that covers most scenarios.

Wow … I am going to have to absorb what this all means. Thank you for taking the time to piece together all of the research though, it will give me no excuse not to actually roll my sleeves up and understand. The last time I got this dirty with a CPU, registers and machine code was with a 6502 / 6510 processor.

I had to refresh my memory and found this on the old wikipedia - the 6502’s registers include one 8-bit accumulator register (A), two 8-bit index registers (X and Y), 7 processor status flag bits (P), an 8-bit stack pointer (S), and a 16-bit program counter. Seems really simple compared to the 32 the AVR processor has.

Thanks to both for your answers.

No problem. Hopefully it will benefit other people as well.

Oddly enough, prior to looking at the AVR ISA the last ISA I looked at was the 6502 (for a NES emulator I started making). Nice little ISA. Not very powerful but a nice introduction into the world of CPUs and their instruction sets.