Using display(CLEAR_BUFFER) instead of clear() and display()

If you call clear() at the start of loop(), before rendering, and display() at the very end of loop(), after rendering, you can simplify and speed things up a bit by eliminating the clear() and using display(CLEAR_BUFFER) instead.

This may result in one (and only one) “scrambled” frame, containing both whatever was in the screen buffer when setup() exited (such as the logo) and the first screen rendered after entering loop() for the first time. However, even if it’s noticeable, the advantages may be worth it.
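
To make the trade-off concrete, here is a minimal host-side sketch of the two patterns. MockArduboy, render(), classicFrame() and combinedFrame() are hypothetical names invented for this illustration, and the boolean argument merely stands in for CLEAR_BUFFER; none of this is the library's actual code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for the library object; NOT the real Arduboy2 class.
struct MockArduboy {
    static constexpr std::size_t kBufferSize = 1024;  // 128 x 64 pixels, 1 bit each
    std::array<std::uint8_t, kBufferSize> buffer{};   // plays the role of the screen buffer
    std::array<std::uint8_t, kBufferSize> panel{};    // what the "display" currently shows

    void clear() { buffer.fill(0); }

    // Copy the buffer to the panel; optionally zero the buffer afterwards.
    void display(bool clearBuffer = false) {
        panel = buffer;
        if (clearBuffer) buffer.fill(0);
    }
};

// Hypothetical renderer: ORs a frame-dependent bit into the first byte.
void render(MockArduboy& a, std::uint8_t frame) {
    a.buffer[0] |= static_cast<std::uint8_t>(1u << (frame % 8));
}

// Classic pattern: clear(), render, display().
std::uint8_t classicFrame(MockArduboy& a, std::uint8_t frame) {
    a.clear();
    render(a, frame);
    a.display();
    return a.panel[0];
}

// Combined pattern: render, then display and clear in a single call.
std::uint8_t combinedFrame(MockArduboy& a, std::uint8_t frame) {
    render(a, frame);
    a.display(true);
    return a.panel[0];
}
```

If the buffer still holds leftover content when the first frame is rendered, combinedFrame() shows that content mixed with the new frame once; from the second frame onwards both patterns put the same bytes on the panel.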

Huh, I’ve been here over two years now and I’ve never seen that mentioned before.

Maybe you ought to write a “most underused Arduboy2 features” article or something?

So, I gather from reading the code that it simply does the equivalent of a clear() immediately after the display(). Could the single “scrambled” frame be avoided by performing a clear() as you transition from setup() to loop()? Obviously, this would not result in space savings, but it would still give you the performance advantage.

Of course, the logo had to be displayed somehow, probably also with display(CLEAR_BUFFER), so wouldn’t this issue occur when the logo is displayed rather than on the first frame of the main loop()? If you used clear() at the start of setup(), would the issue vanish altogether? (Again, clear() is then being used and has a memory impact, however small.)

Correct.

No, bootLogo() uses clear(), not display(CLEAR_BUFFER). This is because the Unit Name is overlaid on the logo screen by a separate function. Also, the logo is left in the screen buffer in case someone wants to do something with it before starting the sketch, such as fading it out by writing random black pixels.

So, if you keep the logo, then as you say, you’re just gaining the advantage of not spending the time on a separate clear loop. (In the next release of the library, doing the clear using display(CLEAR_BUFFER) won’t take any more time than using just display(). The clearing is squeezed in while waiting for each byte to be sent.)
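
The idea of clearing while the bytes are being sent can be pictured with a small host-side model. This is only an illustration of the single-pass idea, not the library's implementation: sendAndClear() is a hypothetical function, and a std::vector stands in for the SPI hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: hand each byte to the "display", then zero it in the
// same pass instead of running a separate clear loop afterwards.
std::vector<std::uint8_t> sendAndClear(std::uint8_t* buffer, std::size_t length) {
    std::vector<std::uint8_t> sent;
    sent.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        sent.push_back(buffer[i]);  // stands in for starting the transfer of this byte
        buffer[i] = 0;              // done while real hardware would still be shifting bits out
    }
    return sent;
}
```

On real hardware the processor has to wait for each transfer to finish anyway, so the extra store can hide inside that wait, which is why the combined call need not cost more than a plain display().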

However, people looking for code space will probably eliminate the logo entirely. Since that’s the only place in the library where clear() is called, if they never use clear() anywhere else then they will gain the code space that clear() would have taken.

Clever!

Dare I bring it up … or they don’t like the logo or they remove it when testing and don’t put it back (err, like me).

I wonder, if you do not use the logo will the buffer be ‘empty’ on startup or will it contain junk? Something for me to test when I get home.

@MLXXXp you might want to split this conversation into its own thread.

I have tested the compilation size and can confirm that clear() uses 20 bytes. Not much - so you can either leave it in when using arduboy.begin() (with logo) or take it out when using arduboy.boot() without too much of a memory impact.

Yes, it’s more the time savings rather than the code size. Even though clear() is only about 20 bytes, it’s a loop that runs 256 times (clearing 4 bytes per loop, for 1024 total), which takes over 2800 instruction cycles to execute (even after having been optimised in assembler).
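
As a rough picture of that loop shape, here is a plain C++ sketch that zeroes four bytes per iteration. The real routine is hand-written assembler, so this only mirrors the iteration count; clearFourPerIteration() and kScreenBytes are names made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kScreenBytes = 1024;  // size of the screen buffer in bytes

// Zero the buffer four bytes per iteration and report how many iterations ran.
std::size_t clearFourPerIteration(std::uint8_t (&buffer)[kScreenBytes]) {
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < kScreenBytes; i += 4) {
        buffer[i] = 0;
        buffer[i + 1] = 0;
        buffer[i + 2] = 0;
        buffer[i + 3] = 0;
        ++iterations;
    }
    return iterations;  // 1024 / 4 iterations
}
```

Unrolling further would reduce the loop overhead per byte at the cost of more code, which is the size versus speed trade-off being discussed.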

The screen buffer, sBuffer[], is declared as a static variable in class Arduboy2Base. I think by being static, it will be initialised to zeros by the compiler. This is true for global variables outside of any function or object, but I’m not entirely sure if it applies to static class or struct variables.

http://en.cppreference.com/w/cpp/language/zero_initialization
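
For what it is worth, the C++ rule covers every object with static storage duration, and that includes static data members of a class or struct. A minimal sketch to check it on a desktop compiler follows; ScreenOwner and bufferIsAllZero() are hypothetical names that only mirror the shape of the declaration described above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical class mirroring the shape of the declaration described above.
struct ScreenOwner {
    static constexpr std::size_t kSize = 1024;
    static std::uint8_t sBuffer[kSize];  // declaration only
};

// Definition without an initializer: static storage duration, so the
// array is zero-initialized before any other code runs.
std::uint8_t ScreenOwner::sBuffer[ScreenOwner::kSize];

// Returns true if every byte of the static buffer is zero.
bool bufferIsAllZero() {
    for (std::size_t i = 0; i < ScreenOwner::kSize; ++i) {
        if (ScreenOwner::sBuffer[i] != 0) return false;
    }
    return true;
}
```

This shows the language guarantee only; it is still worth confirming on the device itself that nothing writes to the buffer before your sketch first draws.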

Something to test later. I like the idea of saving 20 bytes and 2800 instruction cycles (per frame) !

Really?
That doesn’t sound efficient.
I don’t really care about RAM and ROM (program data), since the largest of my games uses about 55% of them, but 2,800 cycles seems a bit long just to clear the screen.
What about drawing a big black rectangle to cover up the entire screen?
I think that’s one of the reasons the example draws black things just before arduboy.display(); is called.
P.S.: How is it possible to split a post, or do other things to it (like unlisting or closing it)? That’s cool stuff.

Essentially, that’s what it does. Clearing the screen involves writing to a large amount of memory. The number might sound scary, but when you tally up the number of cycles the rest of your game takes, it’s really not that much.

It can probably be done in a few fewer cycles (by further unrolling), but that would mean more code for not much extra speed. Instead, the best that can be done is clearing while writing to the screen, as @MLXXXp said.

This doesn’t apply to all systems, but not clearing (if the game is going to redraw the entire screen anyway) is a common optimization in games. The exception being mobile phones, recent nVidias, and, if I remember correctly, the Dreamcast.

Mod powers.

That’s alright for you, but some of us have games that chew up over 90% of progmem and would be grateful for the saving.

(I get the feeling @filmote’s already planning a PR to Dark & Under, and an update to many of his other games.)

I guess that really depends on the code.
display(CLEAR_BUFFER) sounds good in this case since, as I understand it, it basically updates the screen from the previous frame to the current one, and if nothing has changed it does nothing, which seems like the best logic for the job.
Cool stuff, but I guess it’s not really necessary when writing, say, a Snake game.
It would help keep everything fast and cool, though.

The screen buffer is 1024 bytes, so even if you had a single “fill 1024 bytes with 0” instruction it would still take a minimum of 1024 cycles to complete. (You need at least one cycle per memory byte written.) 2800 cycles is 2.73 cycles per byte written, which is pretty darn good for a RISC processor.

And from an overall view of things:

Let’s say our game runs at the default 60 frames per second. That’s 16.67ms per frame.

The processor runs at 16MHz. An instruction cycle takes one clock cycle, so an instruction cycle is 62.5ns (0.0000625ms). 2800 cycles will take 0.175ms to execute.

The display() command writes the entire 1024 byte screen buffer to the display. It does this serially over the SPI interface at the maximum rate the processor can do, which is 8Mbps. At that rate, due to display requirements, we need a minimum of 18 processor cycles per byte, plus a bit of set up overhead. So, the display() command takes about 1.16ms to execute (at least it will on the next Arduboy2 library release. It’s a bit more than that right now).

Subtracting the display() time from the total frame time we get
16.67ms - 1.16ms = 15.51ms
This is the amount of processing time we have to render a frame at 60FPS.
15.51ms is time for 248160 processor cycles

If we do a clear() command once per frame:
The 2800 cycles that a clear() takes to execute, as a percentage of the 248160 cycles per frame that we have available, is 1.13% of the frame time. The other 98.87% is available for rendering the frame.
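
The arithmetic above can be reproduced with a few constants. This is just a worked check of the figures quoted in this post; the constant and function names are made up for the example, and the 1.16ms display() time is taken as given.

```cpp
// Figures quoted in the post above.
constexpr double kClockHz = 16000000.0;  // 16MHz processor clock
constexpr double kFrameMs = 16.67;       // one frame at 60 frames per second
constexpr double kDisplayMs = 1.16;      // quoted time for display()
constexpr double kClearCycles = 2800.0;  // quoted cost of clear()
constexpr double kScreenBytes = 1024.0;  // size of the screen buffer

// Processor cycles available per millisecond.
constexpr double cyclesPerMs() { return kClockHz / 1000.0; }

// Time taken by one clear(), in milliseconds (about 0.175).
constexpr double clearMs() { return kClearCycles / cyclesPerMs(); }

// Time left for rendering once display() is accounted for (about 15.51).
constexpr double renderBudgetMs() { return kFrameMs - kDisplayMs; }

// The same budget expressed in processor cycles (about 248160).
constexpr double renderBudgetCycles() { return renderBudgetMs() * cyclesPerMs(); }

// Share of the render budget spent on clear(), in percent (about 1.13).
constexpr double clearPercentOfBudget() { return 100.0 * kClearCycles / renderBudgetCycles(); }

// Average cost per byte cleared (about 2.73 cycles).
constexpr double clearCyclesPerByte() { return kClearCycles / kScreenBytes; }
```

Changing kFrameMs to 33.33 gives the figures for a 30 frames per second game, where the share taken by clear() is roughly halved.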

I know that has absolutely no effect on how the game runs, especially for a game that only uses 55% of its resources.
That is good to know.

I would love to give you the “Most amazing answer”-award. :1st_place_medal::trophy:

If a clear is still required to remove visible junk, you can avoid calling clear() by using display(CLEAR_BUFFER) twice, like this:

arduboy.display(CLEAR_BUFFER);
arduboy.display(CLEAR_BUFFER);

This idea is not mine. I saw it in one of the games but can’t recall which one it was (Let the rightful bright mind stand up now :slight_smile: )

And I’m just thinking now: to avoid a third display(CLEAR_BUFFER) for the actual displaying, you can put one occurrence at the end of your setup() function and one at the beginning of your main loop(), like this:

void setup()
{
  //do your initialisation stuff here

  arduboy.display(CLEAR_BUFFER);
}

void loop()
{
  if (!arduboy.nextFrame()) return;

  arduboy.display(CLEAR_BUFFER);

  //do your main stuff here
}

Huh, why would you want this at the start of loop()?

The first call (in setup()) and the one in loop() will be executed shortly after each other, so if there is any junk in the display buffer it will not be shown for longer than one frame. The call in loop() also displays the last rendered buffer and clears the buffer for the next render.

On second thought, the display(CLEAR_BUFFER) in setup() may not even be necessary: if the first loop stays within the frame rate, any junk will only be visible for a single frame. There will be a lag of one frame, of course, but it will not be noticeable at higher frame rates.

I understand you might want two arduboy.display(CLEAR_BUFFER); calls as you transition from setup() to loop(), or at any other transition where you might have garbage on the screen. However, arduboy.display(CLEAR_BUFFER); is a relatively heavy operation compared to the original clear(). For the 20 bytes, I would use clear() in this example.
