Tips for saving/reducing SRAM usage?

(Andrew Crawford) #1

Hey everyone, I’ve been working on a couple of different games/ideas, partly to get a better feel for the limitations as I go (since I’m fairly new to this). I started out trying to keep my sprites as simple and small as possible, and thanks to @Pharap I’ve cut down on a lot of RAM and ROM that would otherwise have been wasted, using things like structs and functions. I still have a decent bit of space to work with to turn either game into something (and can hopefully find and implement more ways to reduce memory usage), but all that being said…

Are there any tips for saving RAM or reducing its usage? I know it may be a bit of a loaded question because it depends on what you’re doing. I’m just curious, since everything I’ve found only says to put constants into PROGMEM and cut down on local variables, and that’s about it. I know there are ways to reduce ROM usage, such as compression (although that requires more RAM, of course). I was wondering whether there was anything out there for the opposite trade (less RAM, more ROM), and for what applications. I couldn’t find any similar threads, so I figured it was worth asking. Thanks!

(Simon) #2

As you pointed out … it really depends on what you are doing.

If you are building a game with a large world or multiple levels, you will want to look at compression. In its simplest form, you can pack multiple objects into a single byte. Run Length Encoding (RLE) is also really simple to implement, and it is great if you have a bit of RAM to expand the level into, because you only need to decode it once, as the player enters the level.
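A minimal sketch of RLE decoding into a RAM buffer, assuming a hypothetical format where the data is a list of (run length, tile) byte pairs. It's written as plain C++ so it can be tried on a desktop; on the Arduboy the encoded array would live in PROGMEM and each byte would be fetched with pgm_read_byte instead of a plain array read. The name decodeRle is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical RLE format: pairs of (run length, tile value),
// so { 3, 0x01 } expands to 0x01, 0x01, 0x01.
// Decodes `encodedSize` bytes of `encoded` into `out`, which has room for
// `outCapacity` tiles. Returns the number of tiles written.
size_t decodeRle(const uint8_t * encoded, size_t encodedSize,
                 uint8_t * out, size_t outCapacity)
{
    // The data must consist of whole (count, value) pairs
    assert((encodedSize % 2) == 0);

    size_t written = 0;

    for (size_t i = 0; (i + 1) < encodedSize; i += 2)
    {
        const uint8_t count = encoded[i + 0];
        const uint8_t value = encoded[i + 1];

        // Repeat the tile `count` times, stopping early if the buffer is full
        for (uint8_t n = 0; (n < count) && (written < outCapacity); ++n)
        {
            out[written] = value;
            ++written;
        }
    }

    return written;
}
```

For example, the top row of the world below followed by its solid bottom row (12 tiles) could be encoded as { 1, 0x01, 4, 0x00, 1, 0x01, 6, 0x01 }, which is 8 bytes. RLE pays off most when levels contain long runs of the same tile.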

A simple example of packing objects into an array:

Before:

const uint8_t world[] PROGMEM = {
0x01, 0x00, 0x00, 0x00, 0x00, 0x01,
0x01, 0x00, 0x00, 0x00, 0x00, 0x01,
0x01, 0x00, 0x02, 0x00, 0x00, 0x01,
0x01, 0x01, 0x01, 0x01, 0x01, 0x01 };  // 24 bytes

uint8_t getWorldElement(uint8_t x, uint8_t y) {
  return pgm_read_byte( &world[x + (y * 6)] );
}

After:

const uint8_t world[] PROGMEM = {
0x10, 0x00, 0x01,
0x10, 0x00, 0x01,
0x10, 0x20, 0x01,
0x11, 0x11, 0x11 };  // 12 bytes

uint8_t getWorldElement(uint8_t x, uint8_t y) {

  if (x % 2 == 1) {
    return pgm_read_byte( &world[(x / 2) + (y * 3)] ) & 0x0F;
  } 
  else {
    return pgm_read_byte( &world[(x / 2) + (y * 3)] ) >> 4;
  }

}

As you can see, it does not use any RAM to unpack the world up front; it simply unpacks each element on the fly.

(untested code but should work).

(Andrew Crawford) #3

Oh alright, thanks! I was under the impression that compression was typically used to save ROM rather than RAM (just from what I’d gathered), but I can see it being useful for doing more with less, so to speak.

I don’t quite understand the math behind your example, but I think I understand the concept, if I’m not mistaken… is it essentially a way of cutting down on the data stored when the same value appears several times in a row? Sorry, I just want to be sure I follow. Thanks again!

(Simon) #4

Right … I should have been more specific.

Taking the third row, in the before example:

0x01, 0x00,    0x02, 0x00,    0x00, 0x01,

which is the same as:

1, 0,     2, 0,    0, 1

and reduced this to:

0x10, 0x20, 0x01,

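The two bytes are packed into one. The first byte of the original data becomes the ‘top half’ (the high four bits) of the output byte and the second byte becomes the ‘bottom half’ (the low four bits). This works because every tile value fits in four bits (0 to 15), and unlike run-length encoding it doesn’t rely on values repeating.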

Hope this makes sense.
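The packing step can be written as a small helper, sketched below. The names packNibbles and packRow are hypothetical, and you'd more likely run something like this on a desktop (or do it by hand) to generate the PROGMEM array than on the Arduboy itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs two 4-bit values (0 to 15) into one byte:
// `high` goes in the top half (bits 4-7), `low` in the bottom half (bits 0-3).
constexpr uint8_t packNibbles(uint8_t high, uint8_t low)
{
    return static_cast<uint8_t>(((high & 0x0F) << 4) | (low & 0x0F));
}

// Packs `count` unpacked tiles (count must be even) into `out`,
// which must have room for count / 2 bytes.
void packRow(const uint8_t * tiles, size_t count, uint8_t * out)
{
    assert((count % 2) == 0);

    for (size_t i = 0; i < count; i += 2)
        out[i / 2] = packNibbles(tiles[i], tiles[i + 1]);
}
```

Feeding it the third row (1, 0, 2, 0, 0, 1) gives 0x10, 0x20, 0x01, and the solid bottom row of six 0x01 tiles gives 0x11, 0x11, 0x11.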

(Pharap) #5

Local variables don’t necessarily use RAM.

It depends on how many you have and how many are being used.

For example, loop counters almost always spend their entire lifetime in registers rather than RAM.

Global variables on the other hand will always use RAM.

Identify what data you have that doesn’t need to change and make it constexpr.
A common one is sizes (i.e. width and height), because objects in games rarely change size.
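As a sketch of that suggestion, here is a hypothetical Player type (the names and the 8×8 size are made up) with the size stored in every object versus shared as compile-time constants. The static_asserts just confirm the per-object cost.

```cpp
#include <cstdint>

// Width and height stored in every object: 4 bytes each
struct PlayerWithSizeMembers
{
    uint8_t x;
    uint8_t y;
    uint8_t width;
    uint8_t height;
};

// Width and height as compile-time constants shared by the whole type:
// only the position is stored, so 2 bytes each
struct Player
{
    static constexpr uint8_t width = 8;   // example size
    static constexpr uint8_t height = 8;  // example size

    uint8_t x;
    uint8_t y;
};

static_assert(sizeof(PlayerWithSizeMembers) == 4, "size stored per object");
static_assert(sizeof(Player) == 2, "only the position is stored");
```

Code then refers to Player::width and Player::height wherever the size is needed, and the saving is multiplied by however many objects you keep in arrays.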

Also look for redundancies.

(I’m mentioning these because I know these both apply to your game.)


Actually it’s 12 bytes.
(6 * 4) == 24
((6 / 2) * 4) == (3 * 4) == 12

You’re better off using ((x % 2) != 0),
because if x were signed, ((x % 2) == 1) would be a bug for negative values: a negative odd number leaves a remainder of -1, not 1.
It won’t matter in this case because the integer is unsigned, but it’s good to make a habit of using != 0,
because then you don’t need to worry about whether the number is signed or unsigned.

Alternatively you could just use ((x % 2) == 0) and swap the two branches.
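A quick way to see the signed pitfall, using the made-up function names isOddBuggy and isOdd. Since C++11, integer division truncates toward zero, so the remainder takes the sign of the left operand.

```cpp
#include <cstdint>

// -3 % 2 == -1 in C++11 and later, not 1.

// Misses negative odd numbers, because their remainder is -1
constexpr bool isOddBuggy(int16_t x)
{
    return (x % 2) == 1;
}

// Works for both signs, because any odd number leaves a non-zero remainder
constexpr bool isOdd(int16_t x)
{
    return (x % 2) != 0;
}

static_assert(isOddBuggy(3) && !isOddBuggy(-3), "-3 is wrongly treated as even");
static_assert(isOdd(3) && isOdd(-3), "both odd numbers are detected");
```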

E.g.

constexpr size_t worldWidth = 6;
constexpr size_t worldHeight = 4;

const uint8_t world[] PROGMEM
{
	0x10, 0x00, 0x01,
	0x10, 0x00, 0x01,
	0x10, 0x20, 0x01,
	0x11, 0x11, 0x11,
};

uint8_t getWorldElement(uint8_t x, uint8_t y)
{
	// 'Flatten' the coordinates to get an index
	const size_t logicalIndex = ((y * worldWidth) + x);

	// Account for packing
	const size_t physicalIndex = (logicalIndex / 2);
	
	// Get the byte
	const uint8_t value = pgm_read_byte(&world[physicalIndex]);

	// Unpack the byte: even indices are in the top half, odd indices in the bottom half
	return ((logicalIndex % 2) == 0) ? ((value >> 4) & 0x0F) : ((value >> 0) & 0x0F);
}
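To check the packed lookup on a desktop before putting it on the device, here's a sketch that compares every cell against the unpacked 'Before' data. The one substitution is an assumption made for testing: PROGMEM and pgm_read_byte are AVR-specific, so the packed array is read directly. The names packedWorld and unpackedWorld are made up for the test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t worldWidth = 6;
constexpr size_t worldHeight = 4;

// Reference data: one tile per byte (24 bytes)
const uint8_t unpackedWorld[worldWidth * worldHeight] =
{
    0x01, 0x00, 0x00, 0x00, 0x00, 0x01,
    0x01, 0x00, 0x00, 0x00, 0x00, 0x01,
    0x01, 0x00, 0x02, 0x00, 0x00, 0x01,
    0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
};

// Packed data: two tiles per byte (12 bytes)
const uint8_t packedWorld[(worldWidth * worldHeight) / 2] =
{
    0x10, 0x00, 0x01,
    0x10, 0x00, 0x01,
    0x10, 0x20, 0x01,
    0x11, 0x11, 0x11,
};

uint8_t getWorldElement(uint8_t x, uint8_t y)
{
    assert((x < worldWidth) && (y < worldHeight));

    // 'Flatten' the coordinates, then account for packing
    const size_t logicalIndex = ((y * worldWidth) + x);
    const size_t physicalIndex = (logicalIndex / 2);

    // On the Arduboy this would be pgm_read_byte(&packedWorld[physicalIndex])
    const uint8_t value = packedWorld[physicalIndex];

    // Even index: top half, odd index: bottom half
    return ((logicalIndex % 2) == 0) ? ((value >> 4) & 0x0F) : (value & 0x0F);
}
```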
(Boti Kis) #6

I want to mention some obvious things:

  • Use the smallest primitive types you need.
  • Plan ahead for your own datatypes to also use the smallest primitives.
  • Think about which of your datatypes you will store in arrays.
  • Optimize the shit out of those. See an example below.
    As @filmote suggested, you can compress data in different ways.

Let's take a simple enemy that stores its health, its position and the damage it deals.

class Enemy
{
   public:
   uint16_t healthpoints;
   uint16_t damage;
   Point position;
};

An object of this class is going to take 8 bytes (2 + 2 + 4, since a Point holds two 16-bit coordinates).
The smallest integer type is 1 byte, which can store an unsigned value up to 255.
The health and the damage probably don’t need to be that large,
so we can use uint8_t for them instead.

So we can reduce to:

class Enemy
{
   public:
   uint8_t healthpoints;
   uint8_t damage;
   Point position;
};

By doing that we already save 2 bytes, which results in a total size of 6 bytes per instance.
There are also probably going to be several enemies stored in an array,
and that array is now going to be smaller :smiley:

But we don’t stop here.
The damage will probably be the same for all enemies (depending on your game's design),
so we can make it a class variable instead of an instance variable:

class Enemy
{
   public:
   static constexpr uint8_t damage = 2;  // example value; a constexpr member needs an initialiser

   uint8_t healthpoints;
   Point position;
};

By doing this, the damage becomes part of the class instead of every instance, which saves RAM if we have a lot of instances of Enemy. It can be accessed as Enemy::damage.

And again we don’t stop here.
The Point is convenient for storing the position, but do we really need 65,536 possible vertical and horizontal positions? The map probably won’t be that big, so one byte with a max value of 255 could be enough. By unwrapping the Point into two separate uint8_ts we can save 2 more bytes:

class Enemy
{
   public:
   static constexpr uint8_t damage = 2;  // example value; a constexpr member needs an initialiser

   uint8_t healthpoints;
   uint8_t posX;
   uint8_t posY;
};

That saves another 2 bytes and leaves us with an instance size of 3 bytes.
One could argue for making a custom Point2 class that holds the two uint8_ts.
But we can compress even further by not doing that!

If the enemy's HP doesn’t need to be big, probably only around 10, we only need 4 bits of the byte (4 bits hold 2^4 = 16 values, 0 to 15). The map can also be smaller than 256x256: with 6 bits per coordinate it can be up to 64x64.
Let me introduce you to bit fields.
With bit fields you can map individual bits of an integer to a member.
With that we can do:

class Enemy
{
   public:
   static constexpr uint8_t damage = 2;  // example value; a constexpr member needs an initialiser

   // All three fields share one uint16_t (4 + 6 + 6 = 16 bits).
   // With uint8_t as the field type, posX would not fit in the 4 bits left over,
   // so the compiler could start a new byte and the class would grow to 3 bytes.
   uint16_t healthpoints : 4;  // The : 4 says only use 4 bits for this member.
   uint16_t posX : 6;          // The : 6 says use the next 6 bits.
   uint16_t posY : 6;          // The : 6 says use the next 6 bits.
};

This takes up 16 bits, which results in only 2 bytes for a whole instance.
That’s a reduction to only 25% of our first try!

Note: These reductions mostly matter for objects stored in collections.
Note: Bit fields also have some limitations: you shouldn't mix types, you can't take references to them, etc.

  • Avoid using dynamic memory.
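The sizes quoted in the Enemy sample can be checked at compile time. Below is a sketch with the versions written as plain structs under made-up names (Enemy16, Enemy8, EnemyBytes, EnemyBits). It assumes Point holds two int16_t coordinates like Arduboy2's Point, gives damage a made-up value because a constexpr member needs an initialiser, and uses uint16_t as the bit field type so all 16 bits fit in one unit. The step with a static damage but a full Point is left out because its size can depend on alignment padding, which differs between a desktop and the AVR.

```cpp
#include <cstdint>

// Assumption: two 16-bit coordinates, like Arduboy2's Point
struct Point
{
    int16_t x;
    int16_t y;
};

// 16-bit health and damage plus a Point
struct Enemy16 { uint16_t healthpoints; uint16_t damage; Point position; };

// 8-bit health and damage plus a Point
struct Enemy8 { uint8_t healthpoints; uint8_t damage; Point position; };

// Damage shared by the class, position split into two bytes
struct EnemyBytes
{
    static constexpr uint8_t damage = 2;  // example value
    uint8_t healthpoints;
    uint8_t posX;
    uint8_t posY;
};

// Bit fields sharing a single uint16_t
struct EnemyBits
{
    static constexpr uint8_t damage = 2;  // example value
    uint16_t healthpoints : 4;
    uint16_t posX : 6;
    uint16_t posY : 6;
};

static_assert(sizeof(Enemy16) == 8, "2 + 2 + 4 bytes");
static_assert(sizeof(Enemy8) == 6, "1 + 1 + 4 bytes");
static_assert(sizeof(EnemyBytes) == 3, "three single bytes");
static_assert(sizeof(EnemyBits) == 2, "4 + 6 + 6 bits");
```

If a static_assert fails on your compiler, that is exactly the kind of layout difference the later replies warn about.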
#7

PROGMEM saving:

  • Try to avoid using if/else for simple assignments.
    For example:
if (condition)
{
  x = 1;
}
else
{
  x = -1;
}

could be written as:

x = -1;
if (condition)
{
  x = 1;
}
  • When if/else is needed, try to minimize redundant code, as in @filmote’s example:

could be written as

uint8_t getWorldElement(uint8_t x, uint8_t y) {
  uint8_t element = pgm_read_byte( &world[(x / 2) + (y * 3)] );
  if ((x & 1) == 0) { // even x is stored in the top half; (x & 1) is x % 2 without the % operator
    element >>= 4;
  }
  return element & 0x0F;
}

As for RAM saving:

  • for small number ranges use uint8_t (aka byte aka unsigned char)
  • use PROGMEM macro with read only datastructures.
  • use the F() macro for text strings: F("text")
  • try to use local variables over global variables
  • try to initialize global variables with zeros as much as possible (use 0 as default state for your code)
  • use constexpr instead of const for constant expressions.
(Simon) #8

Mmmm … 24 / 2 = 14? Huh, oh right I failed math in year 8.

(Pharap) #9

This is the only piece of advice I’m not entirely happy about.

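In general people should avoid bit fields, because they behave differently on different systems: the order of the bits, the padding, and whether a field can straddle a byte boundary are all left up to the compiler.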

If you’re not at all worried about the code working on another system or how the data is actually represented then by all means use bit fields,
but if those things do matter to you then you should do the shifting and masking yourself.
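Here's a sketch of the do-it-yourself alternative, packing the same three fields as the final Enemy in the bit field example (4-bit health, 6-bit x and y) into a single uint16_t. The class name PackedEnemy, the bit layout and the accessor names are all made up for this example. The bit positions are fixed by the masks and shifts in the code rather than chosen by the compiler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: bits 0-3 = health (0-15),
// bits 4-9 = x (0-63), bits 10-15 = y (0-63)
constexpr uint16_t healthMask = 0x0F;
constexpr uint8_t healthShift = 0;
constexpr uint16_t positionMask = 0x3F;
constexpr uint8_t xShift = 4;
constexpr uint8_t yShift = 10;

class PackedEnemy
{
private:
    uint16_t bits = 0;

    // Extract the field of width `mask` that starts at bit `shift`
    uint8_t getField(uint16_t mask, uint8_t shift) const
    {
        return static_cast<uint8_t>((bits >> shift) & mask);
    }

    // Clear the field, then OR in the new value
    void setField(uint16_t mask, uint8_t shift, uint8_t value)
    {
        assert(value <= mask);
        const uint16_t clearMask = static_cast<uint16_t>(~(mask << shift));
        bits = static_cast<uint16_t>((bits & clearMask) | ((value & mask) << shift));
    }

public:
    uint8_t getHealth() const { return getField(healthMask, healthShift); }
    uint8_t getX() const { return getField(positionMask, xShift); }
    uint8_t getY() const { return getField(positionMask, yShift); }

    void setHealth(uint8_t value) { setField(healthMask, healthShift, value); }
    void setX(uint8_t value) { setField(positionMask, xShift, value); }
    void setY(uint8_t value) { setField(positionMask, yShift, value); }
};

static_assert(sizeof(PackedEnemy) == 2, "three fields packed into 16 bits");
```

Every get and set costs a few shift and mask instructions, which is the progmem trade-off discussed further down the thread.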


I’d suggest profiling before resorting to this.
In most simple cases you’ll probably only save about 2-3 bytes of progmem,
assuming the compiler hasn’t already pre-empted you.

In general you shouldn’t worry about this;
the compiler will optimise a modulo of an unsigned value by a power of two into an &.

If it doesn’t, file a bug report immediately.

Unless a global has a user-defined constructor or has been explicitly initialised to another value,
it will be zero-initialised anyway.

It’s in the rules.

Zero initialization is performed in the following situations:

  1. For every named variable with static or thread-local storage duration that is not subject to constant initialization (since C++14), before any other initialization.

(Globals have static storage duration.)

Otherwise, good advice.


Spelling too. You mean maths.

(Simon) #10

No … I mean math as in the shortened version of mathematics. I do not spell the full word mathsematics :slight_smile: Or maybe this is the one US version of a word I am comfortable with.

(Pharap) #11

Yeah, that’s maths. Mathematics is plural, thus so is maths.
(Technically it’s uncountable, but historically it was treated as a plural,
up until at least the early half of the 19th century.)

Just like:

  • abdominals becomes abs
  • quadriceps becomes quads

Likely.

(Simon) #12

We have one abdomen … but we have multiple abdominal muscles. Unless you are fat and lazy like me, I seem to only have one abdominal muscle. The rest are on holidays somewhere.

But this has nothing to do with saving memory. Back on topic.

(Pharap) #13

I’m a skeleton…

Agreed.

(Boti Kis) #14

Agree.
I wanted to cover that with my note about the limitations at the very end, but it's good that you mentioned it. :v:

(Pharap) #15

I’m glad you replied again, I suddenly realised there’s another caveat I forgot to mention!


One of the golden rules of programming:
Everything is a trade-off.

Using bit fields and using shifting & masking are both trade-offs.

They reduce your RAM usage, but there will be a penalty to your performance and progmem.
The performance is less of an issue, because the Arduboy’s processor is pretty powerful.

The progmem on the other hand is an issue.

On their own, bit shifting and bit fields will use more progmem, because they need additional shifting and masking instructions every time a variable is read from or written to.
These instructions are relatively small, but they can soon mount up.

However, there’s an additional complicating factor on Arduboy.
The Arduboy’s CPU (like all AVR chips) doesn’t have a barrel shifter,
which is a hardware component designed to do bit shifting in constant time.

As a consequence, shifting more than once means the compiler either:

  • generates multiple single-shift instructions
  • performs the shifting in a loop
  • generates a shift function and calls that function

Whichever it does, it’ll eat more progmem than a system that does have a barrel shifter.

So be aware that your progmem will take a hit if you choose to do bit packing,
and think carefully about what you’re using bit packing for and whether the trade-off is worth it.

In @filmote’s example where the map is packed the trade-off isn’t a problem because the benefit of halving the size of all your maps greatly outweighs the cost of the additional shifting & masking code,
but if you’re doing shifting & masking on frequently accessed member variables then the cost could soon mount up.

(Andrew Crawford) #16

Ah, thank you all so much, that’s incredibly helpful!

(Kevin) #17

I think the biggest piece of RAM-saving advice in general, which @Mr.Blinky mentioned, is simply being careful about the scope of your variables.

I’m extremely guilty of declaring global variables en masse, just because I’m lazy and have never written anything complicated enough for it to be a concern.

I’m not sure if it’s been covered or considered, but the ultimate RAM saving is to use a direct-draw method so you don’t need to store the screen buffer in memory (on the Arduboy that buffer is 128 × 64 / 8 = 1024 bytes, out of 2.5 KB of RAM). This is heavier on the code side, though, so you are trading flash for RAM.

(Pharap) #18

Your mention of the screen buffer reminded me…

If the screen is going to be cleared or drawn over anyway then it’s possible to use the screen buffer as extra memory for calculations.

But you have to be very careful and make sure you know what you’re doing,
otherwise you could end up with severely corrupted objects,
which in turn can lead to any number of bugs.

It’s also only useful for data that doesn’t need to persist between frames,
which makes it a very niche optimisation.
