Box2D-lite physics engine on Arduboy

I managed to get Box2D-lite, a full 2D rigid-body physics engine, running on the Arduboy! I had to convert all the floating point numbers to fixed points and introduce standard C++ libraries missing in the Arduino IDE. Its by no means optimized. I’m just excited it works at all and wanted to share. Maybe it’ll inspire someone.

ArduBox2D-lite-demo-v1.1.hex (64.8 KB)

This alone is just a proof of concept and a hack on all levels. I have some novel game ideas involving it and the Arduboy. But in its current state, there are plenty of improvements that need to happen first.

If anyone wants to directly help the project or GitHub or make suggestions while I teach myself about programming physics sims, I highly encourage it. It is indeed a fun topic for me.

More info in the GitHub:

Short demo on real hardware:


Planning a Angry Bird-like game for Arduboy?:wink:

Nice work and wow does this start glitching out pretty hardcore once the boxes fall off in the emulator!

Edit: It doesn’t seem like you can control anything?

That’d be cool. I was just thinking about silly block stacking games for now.

Yeah, all the fixed point values wrap from -128 to 128. I’m not checking for overflow, but its nice to know it handles it gracefully.

Haha not yet. I was too excited and had to share. Soon though. there’s a surprising amount of flash left over. RAM is another story.

Update: Added controls to original post: Pause, reset, and reset+1.

This is awesome! If it ever gets a little more performant I think I might use it in my flappy bird clone!

1 Like

A big way to cut down the size would be to implement fixed point versions of sin, cos and sqrt, primarily because that would eliminate the last uses of float and eliminating float would free up the progmem used to implement float in software.

I know a decent way to implement sin and cos using nothing but a 64-element lookup table and a few short functions if switching to brads is an option, but if radians are a requirement then you’re on your own there.

By the way, the Clamp you commented out will be better than Arduino’s constrain. constrain is a macro so it’s liable to unnecessarily create more code (depending on how good the compiler’s common subexpression elimination is and whether the expressions have side effects), whereas the Clamp being a template function will always force the arguments to be evaluated first, and thus is more sane and more likely to do what’s expected.

I was working on some kind of lookup table this morning. But I have a feeling that’s straying away from brads, which seems like a worthy option.

I saw that comment to a feature request on your GitHub. I don’t see why it wouldn’t be an option considering the discrete nature of the 128x64 simulation space. That’s an element of optimization beyond what I currently understand though.

Also yes, I reverted the constrain() back to Clamp() and it tested fine. I originally did that when I was freeing up some memory before I moved away from floats. Seemed to help back then.

The “in the past” I was referring to was likely this.

I did something similar with this old particle demo, which includes some C# code I used to generate some angles.

Of course the downside to brads is that since they aren’t radians any other equation or function expecting radians would have to be adapted.

Brads or implementing the lookup table?

That was probably a fluke or circumstantial. The template version should never be less efficient than the macro version unless it’s not being inlined for some reason.

I was looking for something like this! Nice!

Brads. I think I met him at a party once. :upside_down_face:

1 Like

In hindsight that Wikipedia article probably doesn’t explain them very well.

Brads are easy. They’re like degrees, but instead of dividing a circle into 360, you divide it into 256.

Much like 360 deg = 0 deg, 256 brad = 0 brad.

The main point of using brads is that 256 is the point at which an unsigned 8-bit integer naturally wraps around back to 0, thus allowing the modular arithmetic to be implicit and eliminating the need for an explicit modulo operation.

It also has the handy quality of keeping look-up tables small. For sine and cosine, since the values of quadrants 1 through 3 (0-indexed) are effectively the same as for quadrant 0 with some minor transformation applied, it becomes possible to implement both sine and consine with a mere 64-element lookup table.

As an added bonus, if you have a fixed point format with 8 bits for the fractional part, that fractional part is a number of brads and the whole value thus represents a number of turns. E.g. 1.25 in UQ8x8 format is 1.25 turns, and the fractional part alone (0.25 as a fixed point, 64 as an integer) is 64 brads. (I’m not sure how well that plays with negative values though.)

1 Like

That was the key phrase I was missing.
I got the gist of it, but the “why do we want to use brads” was cloudy for me. This answers that. Thanks for clearing it up!

I’ll try to implement something with that into this physics engine when I have another moment to focus. One thing I’ll have to see is if any negative data types will cause issue; like -19.24rads/sec. I didn’t study how rotational velocity is calculated yet.

I seem to be having trouble converting SQ7x8 radians to uint8_t binary radians so I can input it into @Pharap 's trig functions. I’m getting either all 0.0 or whole numbers greater than 1 when i cast the results back to SQ7x8. Maybe casting fixed points doesn’t work like I expect it to.

I know eventually the whole engine should use brads universally. But for now I just want to test to see if the physics still works with brads.

How exactly are you doing it?

I would presume multiplying the radians by τ (i.e. 2π) to convert it to turns, multiplying that by 256 to get a fractional brad, and then extracting the integer part to get a whole integer brad, but I expect there are other methods too.

What are you expecting?

And what is the actual code you are using?

I can’t really help much if I don’t know the context.

uint8_t binary_rad = static_cast<uint8_t>(round(rads * 256 / (2 * k_PI)));

I figured overflows would help solve for radians greater than 2π, but rads in this example is a SQ7x8 variable. So maybe I should make a new uint16_t variable with the conversion, THEN cast it to a uint8_t? (Just tested this in binSin() below and it still seems to return zero)

Say I have your Trig.h functions adapted here with the rad-to-brad conversion like so:

inline SQ1x14 binCos(SQ7x8 rads)
  // Convert to brads
  uint8_t brads = static_cast<uint8_t>(round(rads * 256 / (2 * k_PI)));  
  // Your quarter, index, and AngleLookup for sin code goes here.
  // Returns a SQ1x14 (-1 to 1)

And I call it while making a new SQ7x8 variable like so:

SQ7x8 c = static_cast<SQ7x8>(binCos(angle))

If binCos(angle) returns say 0.999847 (largest value for SQ1x14?), and I want to cast it to a SQ7x8 called ‘c’. I would expect c to be 1.0 or at the very least the least closest step of 0.99609375.
but I seem to be getting whole numbers less than 4 regardless of angle values.

I’ve done some investigation.

Possibly not enough to explain everything, but certainly enough to identify some issues.

Firstly, Arduino’s round function behaves incorrectly for fixed points.

Arduino’s round is defined as thus:

It has the implicit assumption that (long) rounds towards zero, but with the way casting is defined for SFixed, casting to long rounds towards negative infinity, thus round ends up doing a -1 for negative values.

roundFixed does not have this problem.

Secondly, doing a multiply then a divide causes precision problems. * 256 effectively discards the entire integer part of an SQ7x8.

rads * (128 / k_PI) or rads * (256 / (2 * k_PI)) is better, but if k_PI is SQ7x8 then you’ll end up with a negative value because 128 and 256 are both larger than SQ7x8 can hold.

Thus the sensible thing to do is rads * SQ7x8(128.0 / 3.14159).

(I don’t think the constructor is strictly necessary, but it makes it clear that the floating point constant is being turned into a fixed point constant.)

I’ve uploaded an improved version of Trig here:

It contains the following function which may be of use to you:

template<unsigned Integer, unsigned Fraction>
constexpr SFixed<Integer, Fraction> radiansToBrads(SFixed<Integer, Fraction> value)
	return (value * SFixed<Integer, Fraction>(128.0 / 3.14159265));
1 Like

Thank you so much for your awesome help! I’m definitely learning new things here.

This is amazing! It works great too. The only thing that has been challenging my evening today has been because of issues casting from a SFixed<1, 14> (i.e. sinFixed and cosFixed) to a SQ7x8 for the matrix operations. I think I was running to the same issue this morning too.

I was doing this:

SQ7x8 c = static_cast<SQ7x8>(cosFixed(static_cast<uint8_t>(radiansToBrads(angle))))

Which caused c to apparently read various vales between 0 and 4 when running the default simulation. Should I be doing something else?

Things came back in range and the simulation worked when I let AngleResult = SFixed<7, 8> but I’d like to keep the precision of a SQ1x14 where possible, and cast to the nearest SQ7x8 when needed.

No problem. Helping people is the main reason I visit the forum.

It’s also nice to see my library getting some use.
Most of the time I never know what it gets used for or if it’s even getting used, or if those stars on the library are people who visited once and then never did anything with it.

There was a bug in the SFixed conversion operator. I just fixed it and created a new release of the library. The Arduino library system should notice it within the hour and provide the update for everyone.

If you can’t wait, you can grab a copy from the GitHub repo and update it manually.

(One day I’d like to remake the library from scratch if I can ever find the time and motivation.)

To be honest, if you’re only working with SQ7x8 and aren’t likely to use any other type, the extra precision isn’t likely to be of much use, so changing AngleResult to SQ7x8 is what I’d recommend doing anyway.

Odds are all the calculation results will be exactly the same and your code will be a bit smaller.

Bear in mind that the result still won’t be absolutely perfect because by using brads and fixed points instead of radians and floating points you’re effectively trading away precision and accuracy for smaller and faster code, but the end result is that you might be able to afford to stack a few more boxes.


It’d get even more use if fixed point numbers were more commonly expressed in the Arduino communities. I have a newfound love for fixed points thanks to this library given that the benefits are huge for microcontrollers without an FPU. Looking back on all my previous uses of floats, I can see the precision they gave me was entirely unnecessary.

I think the communities need a ~2 minute video that quickly explains what floats and fixed points are, how and when to want to use each, and their pros/cons.

I got it and it works great! Awesome!

Perhaps my math is wrong, but can’t a SQ7x8 only store about 128 discrete values between -1 and 1? If that’s the case, the physics simulation certainly looks fine with objects that fit on the 128x64px screen (apart from the probably unrelated issue of collisions spazzing out with specific starting variables). Then again, what space savings can you get from kneecapping cosFixed/sinFixed to half the possible outcomes?

I suppose the next thing I’ll work on is making a float-less version of sqrt() . ChatGPT says I should use the Newton-Raphson method, whatever that means. Should be fun!

After that I’ll work on optimizing the heck out of the whole engine. At the moment it can still only support 5-6 bodies. Any more and the whole micro locks up or resets at the first collision.

There might be some videos about fixed point arithmetic out there somewhere, but generally I tend to prefer text to videos.

You may find this worth reading:

If you want any other resources about fixed points I’ve collected a handful of useful links over time.

The integer part goes between -128 and 127 and the fractional part goes between 0/256 and 255/256.

Fixed points are best understood either as mixed fractions or in terms of powers of two.

I’m not sure what you mean by ‘kneecapping’, but if you’re using SQ7x8 throughout I can think of quite a simple way to halve the size of the lookup table without losing any precision from the output of sinFixed and cosFixed. It’ll only save about 64 bytes of progmem though.

I’ve only ever heard it called Newton’s method.

As far as I understand it, the idea is to refine a guess at a square root by iteratively repeating newGuess = (oldGuess + (originalNumber / oldGuess)) / 2. That inner division is going to be expensive though.

(At least I think that’s how it works. Most of the explanations I could find insist on using meaningless variable names like ‘x0’ and ‘r’.)

This looks like it would be useful if I actually understood the meaning of all the vexing mathematical notation.

That sounds like either an attempt to access an invalid memory address or more likely the code has blown the stack.