Are there any guides for direct pixel manipulation (via getBuffer?)

Just got my new Arduboy and I’ve been working on a first little game. I want to be able to directly manipulate the current state of the screen so I can create a sort of “shader” effect. In this case all I want to do is quickly invert every white pixel to black and vice versa. From the documentation I can see that there’s a getBuffer() function, but I don’t think I fully understand what it returns. Here’s some example code of what I’ve done:

unsigned char* p = arduboy.getBuffer();
for(int i = 0; i < 1024; i++){
  if(p[i] == 0x00){
    p[i] = 0xFF;
  } else {
    p[i] = 0x00;
  }
}

This mostly works, but there are some strange effects where not every pixel is correctly inverted. When I print the buffer it appears to be much larger than the number of pixels, and the array contains a range of values that don’t match what I would expect (just white or black).

In summary, my questions are:

  1. Is there an established method of manipulating the screen directly?

  2. What kind of data does getBuffer() return?

Thanks in advance! I’m really enjoying my Arduboy so far, and I’m excited to contribute to this community.


For reference, this is the game I’m working on:

If you look at the code you can see there’s a Screen class I’ve created with a method called invert() (Although it’s not currently being used).

In the whole screen?

If so you don’t even need to mess with the buffer, you can just give the screen a command to invert its pixels and then uninvert them.

Just do Arduboy2::invert(true) to invert and Arduboy2::invert(false) to uninvert.

The reason your loop doesn’t work is that each byte you’re comparing holds 8 pixels, not one.
The Arduboy display format uses 1 bit per pixel, not 1 byte per pixel.

If you really want to invert the display buffer rather than the screen itself then you can do:

unsigned char * buffer = arduboy.getBuffer();
for(size_t index = 0; index < 1024; ++index)
	buffer[index] ^= ~0;

^ is ‘exclusive or’ and ~ is ‘bitwise not’,
both of which are bitwise operations.

~0 gives you a value where all the bits are 1,
thus x ^ ~0 toggles all the bits in x.
^= is a compound-assignment version of ^, so x ^= y is equivalent to x = x ^ y.
(The same applies to +=, -= etc)

If you want to get really technical you can read the screen’s technical documentation to understand what commands it accepts and use Arduboy2::SPItransfer(command) to send it command data, but that’s getting a bit low level.
A number of the commands already have premade functions, like Arduboy2::displayOff(), Arduboy2::displayOn(), Arduboy2::flipHorizontal(true/false) etc

A picture is worth a thousand words in this case:

Emuty's Image Format Diagram

The 1s correspond to the lighter ‘pixels’,
the 0s correspond to the darker ‘pixels’.
So basically the high bit ends up at the bottom of the column.
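To put the diagram into code, here is a minimal sketch of how a pixel coordinate maps onto the buffer, assuming the standard 128x64 layout where each byte is a column of 8 vertically stacked pixels. The helper names (byteIndex, bitMask, readPixel, togglePixel) are made up for illustration and are not part of Arduboy2; on the device you would pass in the pointer from arduboy.getBuffer().

```cpp
#include <stdint.h>

// Arduboy screen width in pixels (assumed: 128x64, 1 bit per pixel).
constexpr uint8_t screenWidth = 128;

// Hypothetical helper: index of the buffer byte that contains pixel (x, y).
// Each byte covers a column of 8 vertically stacked pixels.
constexpr uint16_t byteIndex(uint8_t x, uint8_t y)
{
    return static_cast<uint16_t>((y / 8) * screenWidth + x);
}

// Hypothetical helper: mask selecting row y within its byte.
// Bit 0 is the top pixel of the column, bit 7 the bottom one.
constexpr uint8_t bitMask(uint8_t y)
{
    return static_cast<uint8_t>(1 << (y % 8));
}

// Reads one pixel: true means the pixel is lit.
inline bool readPixel(const uint8_t * buffer, uint8_t x, uint8_t y)
{
    return (buffer[byteIndex(x, y)] & bitMask(y)) != 0;
}

// Toggles one pixel without touching the other 7 in the same byte.
inline void togglePixel(uint8_t * buffer, uint8_t x, uint8_t y)
{
    buffer[byteIndex(x, y)] ^= bitMask(y);
}
```

This is also why comparing a whole byte against 0x00 or 0xFF only catches columns of 8 pixels that are all dark or all lit.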


Thanks for the great response! That invert function looks like exactly what I need in this case. In the future, if I wanted to say, create a shader on a portion of the screen that would take in each pixel and return a pixel value, is this function what I would want to use to do so? Maybe if I wanted to do something like create a shader that “waves” an image, or dissolves a portion of the screen, or something like that.

If you need to get and set the value of a single pixel you can use the getPixel(uint8_t x, uint8_t y) and drawPixel(uint8_t x, uint8_t y, uint8_t color) functions.

Of course, altering large sections of the screen this way will be really slow. If you can restrict yourself to updating blocks that are multiples of 8 pixels high, then you can use a for loop like @Pharap suggested but restricted to the section needed.
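As a rough sketch of that restricted loop: the function below inverts a rectangle whose top edge and height are given in 8-pixel-high "pages", so every byte is toggled whole. The name invertPages and its parameters are hypothetical, and there is no bounds checking, so the caller has to keep x + width within 128 and firstPage + pageCount within 8.

```cpp
#include <stdint.h>

// Arduboy screen width in pixels (assumed: 128).
constexpr uint8_t screenWidth = 128;

// Hypothetical helper: inverts `width` columns starting at `x`,
// across `pageCount` pages (8-pixel-high bands) starting at `firstPage`.
void invertPages(uint8_t * buffer, uint8_t x, uint8_t firstPage, uint8_t width, uint8_t pageCount)
{
    for (uint8_t page = firstPage; page < firstPage + pageCount; ++page)
    {
        // Start of this page's 128 bytes within the buffer.
        uint8_t * row = &buffer[page * screenWidth];

        // Each byte is 8 vertical pixels, so one XOR flips all 8.
        for (uint8_t column = x; column < x + width; ++column)
            row[column] ^= 0xFF;
    }
}
```

For example, invertPages(arduboy.getBuffer(), 16, 2, 32, 3) would invert a 32x24 pixel area whose top-left corner is at (16, 16).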


As @filmote says, getPixel and drawPixel will work for altering individual pixels, but it will be expensive for a number of reasons.

Firstly because the CPU allows you to operate on a whole byte at a time, so operating on a whole byte is cheaper than wasting time breaking that byte down into 8 bits.
(Comparable to SSE/vector extensions for x86 CPUs.)

And secondly, AVR chips don’t have a barrel shifter, so they can’t shift multiple bits at a time, they can only shift 1 bit per instruction.
(Shift left by 1 is actually implemented as x += x.)

So if you have to manipulate the screen then you need to try to find ways of manipulating 8 pixels (i.e. a whole byte) at a time.

In general any kind of fancy graphics effects will start eating into your processing power anyway, so try to avoid doing it on games that already need a lot of processing power.

Remember that on computers/consoles, shaders are written to be processed by the GPU, so they’ll only eat into rendering time, not logic time, but the Arduboy doesn’t have a GPU, so any rendering will start to eat into logic time.

If you only want to manipulate a part of the screen then I’m fairly certain that you’d have to manipulate the frame buffer rather than the screen.

There is a way to only write to parts of the screen,
but it’s a bit complicated and I doubt it would be faster than manipulating the screen buffer because the screen is off-board and you have to communicate over SPI, which is fast but probably not as fast as an on-board RAM chip.

It might be possible to manipulate the data just before it’s sent to the screen by writing custom arduboy.display/arduboy.nextFrame functions, in which case if the effect is simple it should just about work, but it would be relatively complicated.

It’s probably worth mentioning that these sorts of effects are much easier if you don’t have any extra logic or animation going on since then you can spend the entire frame just manipulating the frame buffer, but that’s probably unlikely.

Thanks again for the reply @Pharap! I feel like I’m starting to get a sense of the limitations imposed by the Arduboy. I’m used to working with HTML Canvas which obviously has way more memory to work with. I’ll probably work up some demos to see if any of what I’m envisioning is even possible.


This is what I did in my Fire Panic game … there are spare processing cycles as the buffer is pushed to the screen and you can take advantage of that to put some simple logic in there.

Usually it’s actually memory that’s the main limitation on the Arduboy because for most games the logic doesn’t get complex enough to start slowing down the rendering.

But graphics-related logic tends to be more processor intensive than general game logic (excluding maybe AI) purely because of the nature of the beast.
When you manipulate graphics, you’re actually manipulating a large array of bytes.

That’s why GPUs are built to do everything in a super-parallel way - each tiny GPU ‘logic unit’ processes a single pixel, because having a single processor step through each pixel takes too much time.
On a GPU, a (pixel) shader isn’t run once for the whole image; a copy of it runs on each ‘logic unit’, of which there could be thousands.

(I think there’s a proper name for the little micro processors inside a GPU but I can’t remember it. I think Nvidia calls them ‘graphics cores’ or something like that, but that might not be the general term, so I’m calling them ‘logic units’.)

JavaScript is a scripting language, and scripting languages have behind-the-scenes overhead that you don’t tend to think about (e.g. typically each object is actually implemented as a hash table, with each property/attribute name stored as a string in memory) so often you’re very far removed from your script’s actual memory usage.

C++ gets you up close and personal with the memory usage, more so than even languages like Java and C#.
(And about as much as C, but possibly not as much as assembly.)

That’s a good plan.

If you get something working then make sure to add a few busy loops or delays to stress test how much you can get away with.

(Be wary of compiler optimisations. If a compiler realises that the code within a loop doesn’t have any side effects then it will get rid of the loop entirely, so sometimes you have to do some stuff to force it to not optimise the loop away.)
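One common way to keep such a loop alive is to make the counter volatile, since the compiler has to perform every read and write of a volatile object. A minimal sketch, with busyLoop being a made-up name:

```cpp
#include <stdint.h>

// Hypothetical stress-test helper: burns `iterations` loop passes.
// The counter is volatile, so each read and write must really happen
// and the compiler cannot delete the loop as dead code.
uint16_t busyLoop(uint16_t iterations)
{
    volatile uint16_t counter = 0;
    for (uint16_t i = 0; i < iterations; ++i)
        counter = counter + 1;
    return counter;
}
```

Calling this once per frame and raising the iteration count until the frame rate drops gives a rough idea of how much headroom is left.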

Shame the magazine isn’t still running, that would make a good article.

What sort of logic did you add?

Very simple clearing to black (the default), white or a top half white and bottom half black (for the night scenes so I can use graphics without masks). So I am not manipulating the memory on the way through but instead manipulating the clearing of memory after the current block has been sent.

You could easily manipulate memory before it is sent to the screen though.
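A rough sketch of the idea, not the actual Fire Panic code: each byte is handed to the display and then immediately overwritten with the background for the next frame. Here `send` is a stand-in for whatever really transfers a byte over SPI, and the name pushAndClear and the half-white, half-black pattern are only illustrative.

```cpp
#include <stdint.h>
#include <stddef.h>

// Hypothetical sketch: push the buffer out and reset it in one pass.
// `send` is any callable taking a uint8_t (a placeholder for the SPI transfer).
template <typename Send>
void pushAndClear(uint8_t * buffer, Send send)
{
    for (size_t index = 0; index < 1024; ++index)
    {
        // Transfer the byte for the current frame.
        send(buffer[index]);

        // Bytes 0-511 are pages 0-3 (top half), bytes 512-1023 are pages 4-7.
        // Top half cleared to white, bottom half to black.
        buffer[index] = (index < 512) ? 0xFF : 0x00;
    }
}
```

To alter the data on its way out instead, the byte would be transformed before the send call rather than after it.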


I directly access the screen buffer in Arduwars to create a transition effect.
By no means is that a guide on how to do it or a best practice, but here’s the code:

void AWGame::makeScreenTransition(){
  // get buffer
  uint8_t *dbuff = arduboy.getBuffer();

  for (uint8_t i = 0; i < arduboy.width(); i++) {
    for (uint8_t x = 0; x < arduboy.width(); x++) {
      for (uint8_t y = 0; y < 8; y++) {
        // move even rows
        if (y%2==0) {
          if (x == arduboy.width()-1)
            dbuff[x+y*arduboy.width()] = 0; // fill black
          else
            dbuff[x+y*arduboy.width()] = dbuff[x+y*arduboy.width()+1]; // move
        }
        // move odd rows
        else {
          uint8_t helperX = arduboy.width()-1-x;
          if (helperX == 0)
            dbuff[helperX+y*arduboy.width()] = 0;  // fill black
          else
            dbuff[helperX+y*arduboy.width()] = dbuff[helperX+y*arduboy.width()-1]; // move
        }
      }
    }

    // draw buffer
    arduboy.display();
  }
}

And here’s what it looks like:


The only negative part of that code is that you’re using delay() and calling arduboy.display() outside of the main game loop (and not using arduboy.nextFrame()), which are generally bad things.

Though to do this following the common ‘game loop’ pattern you’d need a byte of RAM to track the i variable, so there’s a trade off either way.

Rambling thoughts:

I’ve got a feeling there’s a way to create a variable that you can use 2D array syntax on, but I’m not sure.

It’s a shame the Arduboy buffer isn’t already a 2D array, that would be a lot more useful because it’s easier to cast a 2D array to a pointer than it is to cast a pointer to a 2D array.

At any rate, I’m glad you brought this code up,
because I found a way to knock 48 bytes of progmem off it.


void AWGame::makeScreenTransition()
{
	// Get buffer
	uint8_t * displayBuffer = arduboy.getBuffer();

	// Cache all constants, just in case
	const uint8_t width = arduboy.width();
	const uint8_t firstX = 0;
	const uint8_t lastX = (width - 1);

	const uint8_t height = arduboy.height();
	const uint8_t dataHeight = (height / 8);
	const uint8_t ySteps = (dataHeight / 2);

	for (uint8_t step = 0; step < width; ++step)
	{
		for (uint8_t y = 0; y < ySteps; ++y)
		{
			const uint8_t evenStep = (y * 2);
			const uint8_t oddStep = (evenStep + 1);

			const uint16_t evenOffset = (evenStep * width);
			const uint16_t oddOffset = (oddStep * width);

			for (uint8_t x = 0; x < width; ++x)
			{
				// Move even rows
				const uint16_t evenOffsetX = (evenOffset + x);
				displayBuffer[evenOffsetX] = (x < lastX) ? displayBuffer[evenOffsetX + 1] : 0;

				// Move odd rows
				const uint16_t oddOffsetX = (oddOffset + (lastX - x));
				displayBuffer[oddOffsetX] = (x > firstX) ? displayBuffer[oddOffsetX - 1] : 0;
			}
		}

		// draw buffer
		arduboy.display();
	}
}

(The compiler was probably caching all the variables it needed to cache anyway,
but if it wasn’t then it should be now,
and either way this code is hopefully slightly clearer.)


You are amazing as always :smiley:

That’s great, thanks <3

Kinda disagree.
From my point of view it’s a simple way to show a blocking user state.

There is no logic running where cpu time is needed in background nor is there anything frame limited.
You could argue it being hacky since it’s not in a clean pattern (main game loop) hence redundantly calling display().

But that’s a compromise that, imo, is OK to make on such a limited system.


Don’t tell me that, I’ll get lazy (or lazier) and complacent. :P

Using the typical switch-based state machine allows every state to be a ‘blocking’ state by simply not changing the state.
States only carry on to the next state if the code chooses to do so.

It’s not just that, it’s the fact that it leads to a deeper stack as well.
E.g. you have one blocking loop that calls another function that ends up in another blocking loop and so on until you potentially get a stack overflow (i.e. you have to be careful to make sure that you exit the loops at some point).
The switch-based state machine forces each state to make sure it returns when the update step is done, so the stack always drops back down to a low level.
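For anyone who hasn’t seen the pattern, here is a stripped-down sketch of what I mean. The names (GameState, Game, transitionStep) are invented for the example, and the actual drawing is left out; the point is that update() runs once per frame and always returns.

```cpp
#include <stdint.h>

// Hypothetical set of states for the example.
enum class GameState : uint8_t { Playing, Transition };

struct Game
{
    GameState state = GameState::Playing;

    // The extra byte of RAM needed to remember how far the transition has got.
    uint8_t transitionStep = 0;

    // Called once per frame. Every case returns promptly,
    // so the stack drops back down after each call.
    void update(bool startTransition)
    {
        switch (state)
        {
        case GameState::Playing:
            if (startTransition)
            {
                transitionStep = 0;
                state = GameState::Transition;
            }
            break;

        case GameState::Transition:
            // One step of the effect per frame would be drawn here.
            ++transitionStep;

            // The state 'blocks' simply by not changing until it is done.
            if (transitionStep == 128)
                state = GameState::Playing;
            break;
        }
    }
};
```

The transition still takes 128 frames, but nothing else has to know it is ‘blocking’; the main loop just keeps calling update().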

I have yet to see a comparison between the two approaches to determine what the actual memory impact is for each approach, so until then I’m going to err towards the tried-and-tested/cleaner approach.

If I get chance someday I’ll write a comparison to try to settle the debate.


I think my query is related to this old post… (mods please move if not).

@MLXXXp is there anything in Arduboy2 (or an established method) to get more than 1 pixel (getPixel) at a time, but in the physical ‘screen’ format (i.e. the first 16 bytes would be the first row of 128 pixels)? I believe getBuffer() returns the actual buffer array, which doesn’t match the expected screen layout…

Ideally I’d like to read across the screen, reading pixels from left to right, then dropping to the next row and reading across. I’m interested in passing 3 bytes (24 pixels) at a time to another function. Speed is no issue but the method has to have a very small compiled size.

I’m guessing I just have to loop through with getPixel… but hoping some of the great minds here will illuminate me! :sweat_smile:

There’s nothing in the library to do this. Since pixels are arranged in vertical bytes, for each horizontal pixel you would have to read a byte and mask off or test the bit for the horizontal row you’re interested in, then read the next byte and mask/test again for the next horizontal pixel.

You could probably come up with something smaller and faster than using getPixel(), though.

You can use getBuffer() to get a pointer to the RAM screen buffer array but this array, sBuffer, is public, so you can also manipulate it directly if you wish.
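As an illustration of the read-and-mask approach, here is a small sketch that gathers 8 horizontally adjacent pixels into one byte. The function name readHorizontalByte is made up, and putting the leftmost pixel in the most significant bit is an assumption about the format you want, not something the library defines.

```cpp
#include <stdint.h>

// Arduboy screen width in pixels (assumed: 128).
constexpr uint8_t screenWidth = 128;

// Hypothetical helper: collects the 8 pixels starting at (x, y) and running
// to the right into one byte, leftmost pixel in the most significant bit.
uint8_t readHorizontalByte(const uint8_t * buffer, uint8_t x, uint8_t y)
{
    // First of the 8 vertical bytes that cross row y.
    const uint8_t * source = &buffer[(y / 8) * screenWidth + x];

    // The same bit position is tested in every one of those bytes.
    const uint8_t mask = static_cast<uint8_t>(1 << (y % 8));

    uint8_t result = 0;
    for (uint8_t i = 0; i < 8; ++i)
    {
        result <<= 1;
        if ((source[i] & mask) != 0)
            result |= 1;
    }
    return result;
}
```

Calling it three times with x, x + 8 and x + 16 would give the 3 bytes (24 pixels) mentioned above.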


If you’re doing this frequently and/or randomly, there might be an advantage to writing transform functions to convert the buffer into “horizontal” format and back to native “vertical” format. At the start of each frame you would convert once to “horizontal” then do all the manipulations for the frame, then convert back to native before calling display().

You could do the transformations in 8 bit by 8 bit (8 byte) blocks if you wanted to do it “in place”. Or, if you can spare another 1024 bytes of RAM (leaving only 512 bytes for stack, heap, system and other sketch needs), you could have a separate “horizontal” buffer, which might improve the transform algorithms somewhat.
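A minimal sketch of one such 8 by 8 block transform, going from native “vertical” bytes to “horizontal” bytes. It uses an 8 byte scratch array rather than being strictly in place, the name transposeBlock is hypothetical, and the choice of the leftmost column landing in the most significant bit is an assumption. It favours compactness over speed.

```cpp
#include <stdint.h>

// Hypothetical helper: converts one 8x8 pixel block between formats.
// On entry, block[c] is column c with bit r holding row r (native format).
// On exit, block[r] is row r with column 0 in the most significant bit.
void transposeBlock(uint8_t block[8])
{
    uint8_t rows[8] = { 0 };

    for (uint8_t column = 0; column < 8; ++column)
        for (uint8_t row = 0; row < 8; ++row)
            if ((block[column] >> row) & 1)
                rows[row] |= static_cast<uint8_t>(0x80 >> column);

    // Copy the result back over the original 8 bytes.
    for (uint8_t row = 0; row < 8; ++row)
        block[row] = rows[row];
}
```

Note that with this bit ordering the function is not its own inverse, so converting back to native format would need the reverse mapping.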


Another option is to only maintain your ideal “horizontal” format in RAM. Then instead of using display() you would write an equivalent function that converts “on the fly” while writing to the display. This might work best if you could allocate a 128 byte array to transform 8 rows at a time to write to the display.


What I’m hoping to do is craft some really minimal code, to take a ‘screenshot’ by dumping a base64 encoded bitmap over serial. This will be used once per game (/program) to grab the title screen. Hopefully it will be of use in development too.

As this is to retrofit into existing projects, many near the 32kb limit (!), the priority is compactness over speed.
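For the encoding side, a compact sketch of turning one group of 3 bytes into 4 base64 characters might look like the following. The name encodeTriple is made up, it handles only complete 3 byte groups (no ‘=’ padding for a short final group), and on the device the alphabet table would probably want to live in PROGMEM, which is left out here to keep the sketch portable.

```cpp
#include <stdint.h>

// Standard base64 alphabet (RFC 4648).
const char base64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encodes exactly 3 bytes into 4 base64 characters.
// The output is not null-terminated.
void encodeTriple(const uint8_t input[3], char output[4])
{
    // Split the 24 input bits into four 6-bit indices.
    output[0] = base64Alphabet[input[0] >> 2];
    output[1] = base64Alphabet[((input[0] & 0x03) << 4) | (input[1] >> 4)];
    output[2] = base64Alphabet[((input[1] & 0x0F) << 2) | (input[2] >> 6)];
    output[3] = base64Alphabet[input[2] & 0x3F];
}
```

Each of the 4 output characters could then be written straight to serial, so no extra output buffer is needed.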

Couldn’t you just use the emulator for this? (For any sketch that runs under it, which is most). The emulator has built in screen grab capabilities.
