Strange memory problem

Hi!

If my code looks like this:

//    case Class::Directive::Close: //reset sleep
//      if (arg) {
//        _bit_mask = Class::setBits(_bit_mask, 7, 15, 4);
//      } else {
//        _bit_mask = Class::setBits(_bit_mask, 7, 0, 4);
//      }
//      return this;
//      break;
    case Class::Directive::Block: //set block path or bind
      if (arg) {
        _bit_mask = Class::setBits(_bit_mask, 3, 1, 1);
      } else {
        _bit_mask = Class::setBits(_bit_mask, 3, 0, 1);
      }

…here is my compile message:

Sketch uses 27732 bytes (96%) of program storage space. Maximum is 28672 bytes.
Global variables use 1438 bytes (56%) of dynamic memory, leaving 1122 bytes for local variables. Maximum is 2560 bytes.

setBits is defined as a static function.

When I uncomment my code:

    case Class::Directive::Close: //reset sleep
      if (arg) {
        _bit_mask = Class::setBits(_bit_mask, 7, 15, 4);
      } else {
        _bit_mask = Class::setBits(_bit_mask, 7, 0, 4);
      }
      return this;
      break;
    case Class::Directive::Block: //set block path or bind
      if (arg) {
        _bit_mask = Class::setBits(_bit_mask, 3, 1, 1);
      } else {
        _bit_mask = Class::setBits(_bit_mask, 3, 0, 1);
      }
      return this;
      break;

…then it doesn’t compile:



text section exceeds available space in board
Sketch uses 29780 bytes (103%) of program storage space. Maximum is 28672 bytes.

Global variables use 1438 bytes (56%) of dynamic memory, leaving 1122 bytes for local variables. Maximum is 2560 bytes.
Sketch too big; see https://support.arduino.cc/hc/en-us/articles/360013825179 for tips on reducing it.
Error compiling for board Arduino Leonardo.

What happened? Why does 96% become 103% with these lines? Maybe I need to clear a cache or something like that. Please help me.

There are at least half a dozen different possibilities.

E.g.

  • If you aren’t using setBits much before this then it could be because setBits is no longer being inlined.
  • Or perhaps the opposite, that it’s a large function and it is being inlined, producing more code than if it wasn’t inlined.
  • It could be due to how the compiler is implementing the switch statement.
  • It could be a knock-on effect, in which the change to this code changes how the compiler chooses to compile the surrounding code as well.
  • If setBits is doing some bit shifting then AVR’s lack of a barrel shifter might be involved.

(By the way, if you return this; then the break; is redundant because it will never be reached - it’s dead code, and the compiler should warn you about that if you have warnings set to ‘all’.)

Whatever the reason, there’s only one solution: you must find a way to reduce memory usage.

Without more context and/or information I can’t really suggest much.
With the available information the best I can suggest is to try:

  • Getting rid of setBits, by writing its contents manually.
  • Only calling setBits at the end of the switch (and using variables to store the arguments) - there’s a rough sketch of this idea just after this list.
  • Turning your switch statement into two switch statements, with one running when arg is true and the other when it’s false. (A bit like loop fission.)
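
For illustration, here’s a rough sketch of the second idea, based on the switch fragment you posted. The three locals and the directive variable are just placeholders (I don’t know what you’re actually switching on), and setBits is assumed to keep the signature your calls imply:

      byte position = 0;
      byte bits = 0;
      byte count = 0;

      switch (directive) { // 'directive' stands in for whatever you're switching on
        case Class::Directive::Close: //reset sleep
          position = 7;
          bits = (arg ? 15 : 0);
          count = 4;
          break;
        case Class::Directive::Block: //set block path or bind
          position = 3;
          bits = (arg ? 1 : 0);
          count = 1;
          break;
        // ...the remaining cases set the same three variables...
      }

      // setBits is now called from exactly one place,
      // so the call sequence only has to be generated once.
      _bit_mask = Class::setBits(_bit_mask, position, bits, count);
      return this;

Whether that actually saves anything depends on what the compiler was doing with the original switch, so only keep it if the numbers improve.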

If you share the code then someone might be able to suggest something else.

As it is there’s just not enough information to provide anything other than general suggestions, let alone a concrete answer. The compiler is a complex beast, and context really matters.


Thanks! Here is my setBits definition:

byte Class::setBits(byte x, byte pos_new_bits, byte new_bits, byte num_new_bits) {
  return (x & (~(((byte)(pow(2, num_new_bits)) - 1) << (pos_new_bits + 1 - num_new_bits)))) | (new_bits << (pos_new_bits + 1 - num_new_bits));
}

It sets the N bits from the given position downwards.

Maybe it’s because of pow().

I will try your methods and reply.

Yes, that pow is going to be horrendously expensive.
It’s forcing the values to be promoted to double.


I suspect what’s going on is this:
When you were only using setBits a few times the compiler was inlining it, and in the process was computing pow(2, /*integer literal*/) at compile time.

After the function is no longer being inlined, the compiler would have to actually call pow, because it can no longer compute the result at compile time, which in turn means that pow’s function body has to be generated, as well as everything pow uses.

AVR doesn’t have hardware floating point, so all floating point operations have to be simulated in software, which means all the operations pow uses (even simple things like multiplication and addition) have to be implemented as software routines.

Consequently that one call to pow is probably causing a large number of functions to be generated, which takes you over the memory limit.


The first thing you could try is replacing pow(2, num_new_bits) with 1 << num_new_bits. I suspect there might be a more optimal way to implement setBits though. It’s hard to tell from that code what you’re actually trying to calculate.
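
For reference, here’s a minimal sketch of what that replacement could look like if everything else about the function stays the same (untested, but the bit logic should match your original for the sizes you’re passing in):

byte Class::setBits(byte x, byte pos_new_bits, byte new_bits, byte num_new_bits) {
  // (1 << num_new_bits) - 1 produces num_new_bits consecutive 1 bits
  // using plain integer arithmetic, so no floating point code is pulled in.
  byte mask = (1 << num_new_bits) - 1;
  byte shift = pos_new_bits + 1 - num_new_bits;
  return (x & ~(mask << shift)) | (new_bits << shift);
}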


Yes, it works! Now it’s only 96% of memory and I have 4% left to finish my game. I forgot that doubles are so hard for the Arduboy. Thanks again!


I have a strange memory problem: I keep forgetting how weird I am.


I’m trying to replace bits like this:

Suppose I have:

10101010
76543210

I want to replace bits at arbitrary positions, for example setting the bits at positions 3, 2 and 1 to 111, like this:

10101110
76543210

It helps me use less memory.

The problem was that I was already using this function. I can’t understand why the compiler added more functions to memory just because of another call.

They’re expensive even for desktop computers.
Even if this were a desktop program, the 1 << num_new_bits solution would be much cheaper.

On x86, 1 << num_new_bits would probably be only 1-3 machine code instructions, whereas pow would either be a full function call (i.e. lots of instructions) or, even if it were implemented with a machine code pow, there would still be the overhead of converting from integer to double and back again, as well as moving the data into the floating point registers to be operated on.

Granted, when a CPU is running on the order of gigahertz that’s not a big deal, but in principle the bit shift operation will nearly always be faster. Especially if it’s something you’re doing a lot.

(A lot of games actually use float instead of double because the added speed outweighs the loss of precision.)

You mean something like this?

uint8_t setBits(uint8_t base_value, uint8_t value, uint8_t position, uint8_t size)
{
	const uint8_t mask = ((1 << size) - 1);
	const uint8_t shifted_mask = ~(mask << position);
	return ((base_value & shifted_mask) | (value << position));
}

Thus setBits(0xAA, 0x07, 1, 3) == 0xAE.
(In C++14: setBits(0b10101010, 0b111, 1, 3) == 0b10101110)

That will work, but if you’re always using literal values for the later parameters then it’s cheaper to either do it by hand or make use of some template functions because they can move some of the calculations to compile time.

E.g.

template<uint8_t position, uint8_t size>
uint8_t setBits(uint8_t base_value, uint8_t value)
{
	constexpr uint8_t mask = ((1 << size) - 1);
	constexpr uint8_t shifted_mask = ~(mask << position);
	return ((base_value & shifted_mask) | (value << position));
}

template<uint8_t value, uint8_t position, uint8_t size>
uint8_t setBits(uint8_t base_value)
{
	constexpr uint8_t mask = ((1 << size) - 1);
	constexpr uint8_t shifted_mask = ~(mask << position);
	constexpr uint8_t shifted_value = (value << position);
	return ((base_value & shifted_mask) | shifted_value);
}

(The constexpr variables will be calculated at compile time, leaving only the return expression to be done at runtime.)

Edit:
You’d use the template versions as setBits<1, 3>(0xAA, 0x07) and setBits<0x7, 1, 3>(0xAA), and the result should be a lot cheaper because the compiler will precalculate the masks. In the latter case it also eliminates all the shifts, so the result should just be a few small operations (potentially just two instructions: & and |).
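
For example, a rough usage sketch (assuming the two templates above are in scope; the wrapper function is just a placeholder, and the values are the ones from the earlier 0xAA example):

void example()
{
	uint8_t bit_mask = 0xAA; // 0b10101010

	// Only the value is a runtime argument: the masks are precomputed,
	// leaving a shift, an AND and an OR.
	bit_mask = setBits<1, 3>(bit_mask, 0x07);

	// Everything except the target byte is known at compile time:
	// the shifts are precomputed too, leaving roughly just the AND and the OR.
	bit_mask = setBits<0x07, 1, 3>(bit_mask); // bit_mask == 0xAE, i.e. 0b10101110
}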