Quadtree for binary image compression

So I’ve been glancing at quadtree image compression…
I’m trying to wrap my head around the resulting bitstream, and looking for anyone who has played with this (@Pharap ?)…
As far as I can see, I’d need a bitstream holding essentially the topology (branch or leaf for each quadrant), and then another bitstream (or more bits anyway) to hold the colour (0/1) of each ‘leaf’ node… or have I totally missed something? Because you need to hold 3 states (branch / white leaf / black leaf) you can’t encode each node in 1 bit; spilling over into 2 bits per node feels kinda… wrong(?)…

I’ve been thinking of a 2D area based compression technique highly specific for the Arduboy, but interested in quadtrees that may be more generic.


I don’t think I’ve done quadtree image compression before (or at least if I have, it would have been years ago), but I know what a quadtree is, so I may be able to help.

You’ll definitely need more than 2 bits per node because a branch has 4 sub-nodes. That’s pretty much the definition of a quadtree.

The other problem is that if a sub-node is present you have to somehow know where it’s located, and most of the ways to do that are going to start costing decent chunks of memory.

This is probably going to get very tricky, particularly on a microcontroller, and particularly when your images are already 1 bit per pixel.

Having thought about it for a bit, I’ve thought of several possible ideas. It’s hard to know what will work out best without some example data to test with, but I’ll throw some thoughts at you anyway…

If you start with a branch node structured like this:

struct Branch
{
	uint8_t colour;
	uint8_t subnodes[4];
};

Where each subnode is an offset in nodes, and thus must be multiplied by sizeof(Branch) to locate the node it refers to. (This is done purposely to give you a wider range for fewer bits, and to avoid the problems that come with variably-sized nodes.)
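To make the offset arithmetic concrete, here’s a rough Python sketch (the 5-byte layout and all names are just my assumptions for illustration, not a real implementation):

```python
NODE_SIZE = 5  # sizeof(Branch): 1 colour byte + 4 subnode offset bytes

def read_node(data, offset):
    """Read the node stored at `offset` (counted in nodes, not bytes)."""
    base = offset * NODE_SIZE  # scale by node size to get a byte address
    colour = data[base]
    subnodes = list(data[base + 1:base + 5])
    return colour, subnodes

# Two-node example: the root (offset 0) points all four quadrants
# at a white leaf stored at offset 1.
data = bytes([0x00, 1, 1, 1, 1,   # node 0: black, four children at offset 1
              0x01, 0, 0, 0, 0])  # node 1: white, children unused
colour, subnodes = read_node(data, 0)
child_colour, _ = read_node(data, subnodes[0])
```

Because offsets count whole nodes rather than bytes, one offset byte can address 256 nodes (1280 bytes of node data) instead of only 256 bytes.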

Obviously you only really need 1 bit to identify the colour, so if you really wanted you could drop that byte and steal a bit from one of the subnodes, but that could get awkward.

However, instead of treating that byte as a simple ‘black’ or ‘white’, you could also include common dither patterns, an ‘inverse’ bit, or a draw mode indicator. That way, if you’re forced to spend a whole byte, you don’t have to waste a whole byte. The downside is that you’ll end up chewing up CPU cycles doing fancy patterns, and it’ll be more progmem to implement the patterns, but nothing in programming is free - everything’s a tradeoff.

Since a branch node already represents a block of colour, representing a leaf node as a colour and four null offsets seems a bit wasteful considering only one of those ~40 bits would actually be meaningful.

So here’s the clever part: instead of wasting those 4 subnode bytes, you can reuse them as image data. Unfortunately that only gives you 4 rows of data, so you may have to double-up the data somehow. Or again, you could use dither patterns, or you could somehow use those bytes as indices into an array of small sprites (more like tiles) included with the data.
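As a rough illustration of reusing those 4 bytes as pixel data (here doubling each byte into two rows of an 8x8 block - just one way you might ‘double up’; the function name and layout are hypothetical):

```python
def leaf_bitmap_8x8(subnode_bytes):
    """Expand a leaf's 4 spare bytes into an 8x8 block by doubling
    each byte into two identical rows (one way to 'double up')."""
    rows = []
    for b in subnode_bytes:  # 4 bytes -> 4 source rows, MSB = leftmost pixel
        row = [(b >> (7 - x)) & 1 for x in range(8)]
        rows.append(row)
        rows.append(row[:])  # repeat the row to stretch 4 rows into 8
    return rows

block = leaf_bitmap_8x8([0b11110000, 0b00001111, 0xFF, 0x00])
```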

The interesting thing about that is that it should help to mitigate one of the problems with compression - detailed images don’t compress as well. This way even if you end up with a few small detailed sections, you can include that detail verbatim without too much extra cost.

Obviously this deviates from a pure quadtree system, but real-world image formats are rarely ‘pure’. The inclusion of bitmap data puts me in mind of the 64 discrete cosine transform patterns used in JPEG.

I’m still not sure quite how well this will actually work, particularly without sample data to work with.


Oh, and in regards to what the colours mean in that twitter image: I’m presuming that the colours relate to the depth of the node in the tree. It starts white (representing the root node - the whole image), shifts towards blue, then to red as it gets deeper, and finally to black at the deepest.

That’s why the top left is the brightest area - it requires the least node depth of any region because it’s a large empty space, and why the black areas are directly around the most detailed/noisy parts of the image.

It’s also possible that blue represents white areas and red represents dark areas. Best wait to see if the guy replies to you to find out the truth.



I managed to find the code for this - it’s in python via PIL.

This program loads the image (which I think was originally from thispersondoesnotexist, but I’m not 100% sure…), turns it into a quad tree representation (in this case just a big array of True or False, representing a stream of bits), then back into an image.

The colours were debug info to help me figure out why it wasn’t working, and 'cos it’s cool to see the different quad sizes.

A nice thing about this particular format is it doesn’t have any concept of node structs or addresses. Each successive bit just tells you the next thing you would want to know. The very first bit tells you whether the entire image is a solid colour or not. If so, then the next bit tells you which colour and that’s it, file ends. If it’s not then the next bit tells you whether the top left quadrant is solid or not, recursing if it’s not and telling you what solid colour to use if it is. Then you will get the same treatment for the top right, bottom left, bottom right. Does that make sense?
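In sketch form, a decoder for that description might look like this (Python; I’m assuming the polarity - 1 = solid quadrant followed by its colour bit, 0 = subdivide - and the quadrant order top left, top right, bottom left, bottom right, as described):

```python
def decode(bits, size):
    """Decode the stream into a size x size image (size a power of two).
    Assumed polarity: 1 = solid (next bit is the colour), 0 = subdivide."""
    image = [[0] * size for _ in range(size)]
    pos = 0

    def walk(x, y, s):
        nonlocal pos
        solid = bits[pos]
        pos += 1
        if solid:
            colour = bits[pos]
            pos += 1
            for yy in range(y, y + s):  # fill the whole quadrant
                for xx in range(x, x + s):
                    image[yy][xx] = colour
        else:
            h = s // 2
            walk(x, y, h)          # top left
            walk(x + h, y, h)      # top right
            walk(x, y + h, h)      # bottom left
            walk(x + h, y + h, h)  # bottom right

    walk(0, 0, size)
    return image

# 2x2 example: subdivide, then four solid 1x1 quadrants (1, 0, 0, 1).
image = decode([0, 1, 1, 1, 0, 1, 0, 1, 1], 2)
```

Note how there are no offsets or node records anywhere - the stream position is the only bookkeeping.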

For very thin linework it might be better to omit the colour bits for the larger nodes and only stop on solid background squares or when you hit 1x1 pixel nodes. You’ll probably get down to that level of detail there anyway and it’d save you having to write out a background colour bit for all the large nodes.

The reason I looked into this is I was keen to try some form of area based compression, but everything I came up with burned a ton of data describing where the areas of solid colour were. So this approach was designed to do as little of that as possible :)

The black and white image in this post averages out at ~0.23 bits per pixel. Not astounding, but it gets better at higher resolutions.

I grabbed a 64x64 of castpixel’s linework from your project and only managed to compress it to 70% using this technique, compared to just storing one bit per pixel. Using the linework variant I described above only brought it down to 64%. Compressing small images is hard!


Thanks @pharap and @Farbs for the detailed replies. The posted code will be super useful!

I should have been more precise in my opening post; I’m focused on finding the most compact bitstream first, and then dialling that back based on implementation details. I was stuck on a fixed-bit-length symbol scheme that intuitively felt wasteful, but thanks to @Farbs’ tweets I sketched out this variable-bit-length scheme. Does it make sense? (Obviously there’s no efficiency gain for such small 8x8px images!)

I don’t think there’s a more compact representation for the tree (1 = branch, 00 = white terminal node/leaf, 01 = black terminal node/leaf, and, as standard, the 1x1px level is a literal bitmap). Of course other implementations may then use statistical models or just RLE to further compress the bitstream.
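Sketched in Python (just my reading of the scheme - leaf colours assumed 0 = white, 1 = black, with bare pixel bits once nodes reach 1x1, and quadrants in top left, top right, bottom left, bottom right order):

```python
def encode(image, x=0, y=0, size=None):
    """Emit bits for the scheme above: 1 = branch, 0 + colour bit = leaf
    (00 = white, 01 = black), and bare colour bits at the 1x1 level."""
    if size is None:
        size = len(image)
    if size == 1:
        return [image[y][x]]  # 1x1 level: literal pixel bit, no prefix
    first = image[y][x]
    solid = all(image[yy][xx] == first
                for yy in range(y, y + size)
                for xx in range(x, x + size))
    if solid:
        return [0, first]     # leaf: 0 then the colour bit
    h = size // 2
    bits = [1]                # branch marker, then the four quadrants
    for dy in (0, h):
        for dx in (0, h):
            bits += encode(image, x + dx, y + dy, h)
    return bits
```

A solid image of any size costs 2 bits, while a 2x2 with one black pixel costs 5 (one branch bit plus four literal pixels).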

@Pharap an obvious difficulty of the variable-length symbol encoding is the overhead of effectively streaming data from PROGMEM, as well as the RAM usage, I’m sure.

The real world results are very interesting. @Farbs it’d be great to test some more typical screens. I suspect my original idea for a very specific, area based compressor, tuned to this kind of artwork (128x64px with dithering) will win out… but intellectually there’s something really nice about the quad tree approach.

It does make sense, and compression-wise that’s a good scheme.

This is one of the reasons I didn’t start off thinking about a variable length scheme.

The problem with a variable length scheme is that you more or less have to read the entire stream to draw even part of the image (depending on which part you’re trying to draw).

That’s one of the reasons I went for a fixed-length scheme. I was thinking about the ease of skipping over data that you don’t need. If you have offsets stored as part of the data then you can skip large chunks of image data by just adding the offset, and that’s a very cheap operation compared to having to still interpret the branches you don’t really need (which will inevitably involve a lot of shifting).

If you’ve only got a small number of images that you want to render at once, or you’re happy to go for a ‘dirty rect’ approach instead of clearing the screen then using the VLE approach could be viable, but if you’ve got a lot of images then it might chew up too much processing time, leaving you with nothing left for game logic.

The other reasons I defaulted to a fixed-length scheme are ease of representation, and the fact that with quadtrees as used for space partitioning in video games (the application I’m more familiar with), the ability to easily seek down into a branch to extract data at a leaf is an important property.

So the core of my answer remains the same: the scheme that works best will depend on your intended application of the technique, and we can’t be certain about what’s viable without actually implementing some schemes and benchmarking them.

If you’re going to venture into the realm of variable length compression schemes, you may wish to look at the compression scheme already offered by Arduboy2:

I believe I was told that someone in Team ARG originally devised it, and I think the compression tool was on Team ARG’s website (which means it’s probably in that archive I made, which I really need to tidy up properly at some point).

(I have a vague idea of how it works because I actually tidied it up a few years ago.)

Don’t even get me started on trees, I might turn you into a Haskell programmer. :P


I’m imagining just doing 128x64px (fairly static) screens, so you’d want to decode all the data in one go anyway. (As opposed to using quad trees for collisions, etc. which is more dynamic and you’d want a ‘skippable’ / seek-able stream - as you pointed out).

If I were to write the decoder, I was thinking to just use fillRect initially. I think it’d be possible to construct the stream to minimise how much state information needs to be held (as per my diagram), i.e. you traverse each branch all the way down to the lowest level before moving to the next branch.
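As a sketch (Python rather than the eventual C++, and the bit convention - 1 = branch, 0 followed by a colour bit = leaf - is assumed, with every leaf prefixed for simplicity), the traversal could just yield one rectangle per leaf, with the stream position and the recursion stack as the only state:

```python
def decode_rects(bits, size):
    """Walk the stream depth-first and yield one (x, y, w, h, colour)
    rectangle per leaf; each tuple would become a single fillRect call."""
    pos = 0

    def walk(x, y, s):
        nonlocal pos
        if bits[pos] == 0:        # 0 + colour bit = leaf
            colour = bits[pos + 1]
            pos += 2
            yield (x, y, s, s, colour)
        else:                     # 1 = branch: recurse TL, TR, BL, BR
            pos += 1
            h = s // 2
            for dy in (0, h):
                for dx in (0, h):
                    yield from walk(x + dx, y + dy, h)

    yield from walk(0, 0, size)

# A 2x2 checkerboard: one branch, then four 1x1 leaves.
rects = list(decode_rects([1, 0, 1, 0, 0, 0, 0, 0, 1], 2))
```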

That’s super helpful. When I last looked (way back), the source was far less tidy. Thanks for tidying it up.

When I make some time, I’d like to compile a corpus of real world screens, to evaluate the different compression techniques.


This is why I was asking about the use case. That’s quite different to what I was imagining. I was thinking that you’d have much larger backgrounds that you’d be scrolling through.

If it’s just for 128x64 backgrounds then yes, you can get away with having to decode the entire stream at once.

That also means that depending on the application you may want to not bother clearing the screen so you spend less time re-decoding the background.

That certainly would work for an initial version, however…

If you’re only wanting to do 128x64 static images then you’d probably be best off writing code that will decode into a provided 1024 byte buffer, and then you could just pass it the frame buffer. That would allow you to skip the overheads of fillRect (e.g. repetitive bounds checking).

One other point to mention about VLE versus FLE: FLE allows you to calculate the value for each pixel precisely once, whereas using VLE and a fillRect approach means you’ll inevitably spend cycles rewriting the same pixels.

For your case that may not be a problem, but I’m mentioning it anyway for the sake of completeness, and in case it helps anyone else who may stumble upon this thread in the future.
