How to convert floats to fixed point integers?

I’m working on a ray casting engine and would like to experiment with fixed point math. At the moment I’ve got a few thousand floats in pre-computed look-up tables. My thinking is that moving the engine over to fixed point math might help with:

  • memory (if these values can fit in 8:8 or some other 16-bit configuration, I’d cut my PROGMEM consumption by half!)
  • speed (even with LUTs precomputed, there’s a fair amount of floating point calculations that I might be able to get rid of)

I need help with:

  • what is the smallest fixed point type available, able to hold these values? Or more generally: how do I figure out if a floating point value will fit in a given fixed point type?
  • how do I convert these tables to use fixed point types? I’m generating these from a bit of C++ code so it’d be great if there’s a simple syntax to go from float to SFixed (or plain int16_t if possible?).


Below is a sample LUT from my app. The forum limits post-length so I put the full set up over on pastebin in case that helps.

constexpr float tan_table[769] PROGMEM {

For float to SFixed there’s an implicit conversion:

// Implicit conversion
constexpr SFixed<7, 8> fixed = 0.5f;

For the reverse you need an explicit cast:

float floating = static_cast<float>(SFixed<7, 8>(0, 127));

(C-style casting works too, but C++-style is more self-documenting.)

It’s certainly possible, but I’m not sure why you’d want to.

You can retrieve the underlying representation with:

SFixed<7, 8> fixed = 0.5f;
int16_t integer = fixed.getInternal();

(Note that the type returned by getInternal() depends on the size and type of the fixed point.)
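
If you do want plain int16_t tables without going through the library, the usual approach is to scale by 2^8 and round. Here is a minimal sketch, assuming a signed 16-bit value with 8 fractional bits. The names toQ8_8 and fromQ8_8 are made up for illustration and are not part of the library:

```cpp
#include <cstdint>

// Assumption: signed 16-bit storage with 8 fractional bits.
constexpr int FRAC_BITS = 8;
constexpr float SCALE = 1 << FRAC_BITS; // 256

// Hypothetical helper: scale by 2^8 and round to the nearest integer.
// The input must be small enough that the result fits in an int16_t.
constexpr int16_t toQ8_8(float value) {
    return static_cast<int16_t>(value * SCALE + (value < 0.0f ? -0.5f : 0.5f));
}

// Hypothetical helper: the reverse direction, for checking results.
constexpr float fromQ8_8(int16_t raw) {
    return raw / SCALE;
}
```

With this layout the input has to stay within roughly ±128, otherwise the cast overflows, so it only suits tables whose values are known to be small.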

To figure this out you need some statistics about your data:

  • The number with the highest integer part
  • The number with the lowest integer part
  • The number with the most fractional digits

E.g. I can already see a -1222.229004, which implies you’d need at least 12 bits of integer part (11 for ‘1222’, 1 for the sign).

When you know that, you can think about how much precision you need and how much you can afford to lose.
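
The range check can be automated in the table generator. This is a rough sketch in standard C++. The functions integerBitsNeeded and fits are hypothetical helpers written for this post, and they count the sign bit as part of the integer part, as in the 1222 example:

```cpp
#include <cmath>

// Hypothetical helper: number of bits (including one sign bit) needed to hold
// the integer part of every value in [lowest, highest].
// Slightly conservative: an exact negative power of two gets one spare bit.
int integerBitsNeeded(float lowest, float highest) {
    const float magnitude = std::fmax(std::fabs(lowest), std::fabs(highest));
    const long whole = static_cast<long>(magnitude); // drop the fraction
    int bits = 0;
    while ((1L << bits) <= whole) {
        ++bits;
    }
    return bits + 1; // +1 for the sign
}

// Hypothetical helper: does the range fit in totalBits when fractionBits of
// them are reserved for the fraction?
bool fits(float lowest, float highest, int totalBits, int fractionBits) {
    return integerBitsNeeded(lowest, highest) + fractionBits <= totalBits;
}
```

For a table that spans ±1222.229004 this reports 12 bits, so a 16-bit type leaves only 4 fractional bits, i.e. steps of 1/16.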

One thing to remember is that a fixed point is essentially a mixed fraction where the denominator is a power of two.
1.5f as an SFixed<7, 8> is effectively 1 128/256.

Unfortunately the speed benefit varies on AVR due to the lack of a barrel shifter and limitations on multiplication and division.

Simple adding and subtracting is much cheaper with fixed points,
but multiplication tends to be slower.

I don’t know how much you already know about fixed points,
but I recommend reading A Fixed Point Primer as an introduction.

If you have any more questions don’t hesitate to ask.
(By the way, I’m the one who wrote that particular library.)


Sorry for the double post but something just occurred to me…

You might also want to consider using ‘binary radians’ (a.k.a. brads) if possible.
Despite the name, brads are more similar to degrees than radians.
Essentially you say there are 2^8 (256) or 2^16 (65536) ‘brads’ in a circle,
then you no longer have to do any modulo operations because computer integers are implicitly modulo a power of two,
and determining the quadrant becomes a bitwise operation,
which allows you to store a single quadrant worth of sin/cos values,
and determine the operation by inspecting the quadrant.

Here’s an example I helped @Dreamer2345 write a while back :

Using 8-bit brads meant only a 64 entry look-up table was needed.
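
To make the quadrant trick concrete, here is a small sketch of my own (not the example mentioned above). It assumes 8-bit brads and results scaled by 256. The names quarterSine, sinBrads and cosBrads are invented for illustration. On the Arduboy the table would be precomputed and stored in PROGMEM rather than built at run time:

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Quarter-wave table: sin(i * 2*pi / 256) * 256 for i in 0..63.
// Built at run time here only to keep the sketch self-contained.
static const std::array<int16_t, 64>& quarterSine() {
    static const std::array<int16_t, 64> table = [] {
        std::array<int16_t, 64> t{};
        for (int i = 0; i < 64; ++i) {
            t[i] = static_cast<int16_t>(
                std::lround(std::sin(i * 6.283185307179586 / 256.0) * 256.0));
        }
        return t;
    }();
    return table;
}

// Sine of an angle in 8-bit brads (256 brads per circle), scaled by 256.
int16_t sinBrads(uint8_t angle) {
    const uint8_t quadrant = angle >> 6;  // top two bits select the quadrant
    const uint8_t index = angle & 0x3F;   // low six bits index into the quadrant
    switch (quadrant) {
    case 0:  return quarterSine()[index];
    // Mirrored quadrants: index 0 would need entry 64 (the peak), which is not stored.
    case 1:  return (index == 0) ? 256 : quarterSine()[64 - index];
    case 2:  return -quarterSine()[index];
    default: return (index == 0) ? -256 : -quarterSine()[64 - index];
    }
}

// cos(x) == sin(x + 90 degrees); the uint8_t addition wraps around for free.
int16_t cosBrads(uint8_t angle) {
    return sinBrads(static_cast<uint8_t>(angle + 64));
}
```

Note how the angle never needs a modulo: adding 64 to a uint8_t simply wraps past 255.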


Thanks for taking the time! I’ll definitely read up on brads - I’ve never heard of the concept before but it sounds immediately useful!

Most of the math in my program is happening in the ray caster. Thankfully it’s only ~120 LOC. Looking at this, do you think I have anything to gain from going fixed point?

#pragma once
#include "Config.h"
#include "LevelData.h"
#include "Graphics.h"
#include "Utils.h"
#include "LUT.h"
class RayCaster {    
    struct RayStart {
        float intersection = 0.0f; //the first possible intersection point
        int boundary = 0; // the next intersection point
        int delta = 0; // the amount needed to move to get to the next cell position
        int next_cell = 0; //cell delta, to move left / right or up / down
    };
    struct RayEnd {
        float distance = 0.0f; // the distance of intersection from the player
        int boundary = 0; // record intersections with cell boundaries
        int intersection = 0; // used to save exact intersection point with a wall
        bool operator <(const RayEnd& that) const noexcept { return distance < that.distance; };
    };
    static constexpr auto WALL_BOUNDARY_COLOR = BLACK;
    static constexpr auto VERTICAL_WALL_COLOR = WHITE;
    static constexpr auto HORIZONTAL_WALL_COLOR = WHITE;      
    static constexpr auto K = 7000.0f;// think of K as a combination of view distance and aspect ratio. Pick a value that looks good. In my case: that makes the block on screen look square.
    //The MAGIC_CONSTANT must be an even power-of-two >= WORLD_SIZE. Used to quickly round our position to nearest cell wall using bitwise AND.
    static constexpr auto MAGIC_CONSTANT = (Utils::isPowerOfTwo(WORLD_SIZE) ? WORLD_SIZE : Utils::nextPowerOfTwo(WORLD_SIZE)) - Cfg::CELL_SIZE;  
    RayStart initHorizontalRay(const int x, const int y, const int view_angle) const noexcept {
        const auto FACING_RIGHT = (view_angle < ANGLE_90 || view_angle >= ANGLE_270);
        const int x_bound = FACING_RIGHT ? CELL_SIZE + (x & MAGIC_CONSTANT) : (x & MAGIC_CONSTANT); //round x to nearest CELL_WIDTH (power-of-2), this is the first possible intersection point.
        const int x_delta = FACING_RIGHT ? CELL_SIZE : -CELL_SIZE; // the amount needed to move to get to the next vertical line (cell boundary)
        const int next_cell_direction = FACING_RIGHT ? 0 : -1;  //x coordinates increase to the right, and decrease to the left
        const float yi = pgm_read_float(&tan_table[view_angle]) * (x_bound - x) + y; // based on first possible vertical intersection line, compute Y intercept, so that casting can begin
        return RayStart{ yi, x_bound, x_delta, next_cell_direction };
    }

    RayStart initVerticalRay(const int x, const int y, const int view_angle) const noexcept {
        const auto FACING_DOWN = (view_angle >= ANGLE_0 && view_angle < ANGLE_180);
        const int y_bound = FACING_DOWN ? CELL_SIZE  + (y & MAGIC_CONSTANT) : (y & MAGIC_CONSTANT); //Optimization: round y to nearest CELL_HEIGHT (power-of-2)
        const int y_delta = FACING_DOWN ? CELL_SIZE : -CELL_SIZE; // the amount needed to move to get to the next horizontal line (cell boundary)
        const int next_cell_direction = FACING_DOWN ? 0 : -1; //remember: y coordinates increase as we move down (south) in the world, and decrease towards the top (north)
        const float xi = pgm_read_float(&inv_tan_table[view_angle]) * (y_bound - y) + x; // based on first possible horizontal intersection line, compute X intercept, so that casting can begin
        return RayStart{ xi, y_bound, y_delta, next_cell_direction };
    }

    RayEnd findVerticalWall(const int x, const int y, const int view_angle) const noexcept  {
        auto [yi,  x_bound, x_delta, next_x_cell] = initHorizontalRay(x, y, view_angle);//cast a ray horizontally, along the x-axis, to intersect with vertical walls
        RayEnd result;
        while (x_bound > -1 && x_bound < WORLD_SIZE) {
            const int cell_x = ((x_bound + next_x_cell) >> CELL_SIZE_FP);
            const int cell_y = static_cast<int>(yi) >> CELL_SIZE_FP;
            if (!isWall(cell_x, cell_y)) {
                yi += pgm_read_float(&y_step[view_angle]); //"calculate" y-intercept
                x_bound += x_delta;//move to next possible intersection points
                continue;
            }
            result.distance = (yi - y) * pgm_read_float(&inv_sin_table[view_angle]); //distance to hit
            result.boundary = x_bound; //record intersections with cell boundaries
            result.intersection = static_cast<int>(yi);
            return result;
        }
        return result;
    }
    RayEnd findHorizontalWall(const int x, const int y, const int view_angle) const noexcept {
        auto [xi, y_bound, y_delta, next_y_cell] = initVerticalRay(x, y, view_angle);//cast a ray vertically, along the y-axis, to intersect with horizontal walls
        RayEnd result;
        while (y_bound > -1 && y_bound < WORLD_SIZE) {
            const int cell_x = static_cast<int>(xi) >> CELL_SIZE_FP; //the current cell that the ray is in
            const int cell_y = ((y_bound + next_y_cell) >> CELL_SIZE_FP);
            if (!isWall(cell_x, cell_y)) {
                xi += pgm_read_float(&x_step[view_angle]); //compute next X intercept
                y_bound += y_delta;
                continue;
            }
            result.distance = (xi - x) * pgm_read_float(&inv_cos_table[view_angle]);
            result.boundary = y_bound;
            result.intersection = static_cast<int>(xi);
            return result;
        }
        return result;
    }
public:
    RayCaster() {}
    void renderView(Graphics& g, const int x, const int y, int view_angle) const noexcept {
        // This function casts out RAY_COUNT rays from the viewer and builds up the display based on the intersections with the walls.
        // The distance to the first horizontal and vertical edge is recorded. The closest intersection is the one used to draw the display.
        // The inverse of that distance is used to compute the height of the "sliver" of wall that will be drawn on the screen
        if ((view_angle -= HALF_FOV_ANGLE) < 0) { //compute starting angle from player. Field of view is FOV angles, subtract half of that from the current view angle
            view_angle = ANGLE_360 + view_angle;
        }
        for (int ray = 0; ray < RAY_COUNT; ray++) {
            RayEnd xray = findVerticalWall(x, y, view_angle);  //cast a ray along the x-axis to intersect with vertical walls
            RayEnd yray = findHorizontalWall(x, y, view_angle); //cast a ray along the y-axis to intersect with horizontal walls
            auto color = WALL_BOUNDARY_COLOR;
            const float min_dist = (xray < yray) ? xray.distance : yray.distance;
            if (xray < yray) { // there was a vertical wall closer than a horizontal wall
                if (xray.intersection % CELL_SIZE > 1) {
                    color = VERTICAL_WALL_COLOR;
                }
            }
            else { // must have hit a horizontal wall first
                if (yray.intersection % CELL_SIZE > 1) {
                    color = HORIZONTAL_WALL_COLOR;
                }
            }
            // height of the sliver is based on the inverse distance to the intersection. Closer is bigger, so: height = 1/dist. However, 1 is too low a factor to look good.
            // Thus the constant K, which has been pre-multiplied into the view-filter lookup-table (cos_table).
            const int height = static_cast<int>(pgm_read_float(&cos_table[ray]) / min_dist);
            const int clipped_height = (height > Cfg::VIEWPORT_HEIGHT) ? Cfg::VIEWPORT_HEIGHT : height;
            const int top = VIEWPORT_HORIZON - (clipped_height >> 1); //Optimization: height >> 1 == height / 2. slivers are drawn symmetrically around the viewport horizon.
            const int bottom = (top + clipped_height)-1; //we're off by one, overdrawing 1px to the left and bottom of the viewport.
            const int sliver_x = ray;
            g.drawVerticalLine(sliver_x, top, clipped_height - 1);
            if (++view_angle >= ANGLE_360) {
                view_angle = 0;
            }
        }
    }
};

RAY_COUNT == 128 (Arduboy screen width). So just counting the potentially costly stuff (ignoring shifts, adds or subtractions) renderView will execute 128 * (4 multiplications, 1 division), per frame.

For completeness’ sake, here’s the code for generating the LUTs, including a bunch of the constants used throughout the ray caster.

static constexpr auto RAY_COUNT = Cfg::VIEWPORT_WIDTH; //one ray per column of screen space (horizontal resolution)
static constexpr auto FOV_DEGREES = 60; //Field of View, in degrees. We'll need to break these into RAY_COUNT sub-angles and cast a ray for each angle. We'll be using a lookup table for that        	
static constexpr auto TABLE_SIZE = static_cast<int>(VIEWPORT_WIDTH* (360.0f / FOV_DEGREES)); //how many elements we need to store the slope of every possible ray that can be projected.        
static constexpr auto ANGLE_360 = Cfg::TABLE_SIZE; //East (and total number of possible angles in a full rotation)
static constexpr auto ANGLE_90 = ANGLE_360 / 4; //South
static constexpr auto ANGLE_180 = ANGLE_360 / 2; //West
static constexpr auto ANGLE_270 = ANGLE_360 - ANGLE_90; //North
static constexpr auto ANGLE_0 = 0;  //back to East
static constexpr auto TWO_PI = 2.0f * 3.141592654f;
static constexpr auto ANGLE_TO_RADIANS = (TWO_PI / ANGLE_360);
static constexpr auto HALF_FOV_ANGLE = Cfg::VIEWPORT_WIDTH / 2; // FoV/2 in angles (for table lookup) instead of degrees.    
static constexpr auto K = 7000.0f;// think of K as a combination of view distance and aspect ratio. Pick a value that looks good. In my case: that makes the block on screen look square. (p.213)            

// tangent tables equivalent to slopes, used to compute initial intersections with ray
std::array<float, ANGLE_360 + 1> tan_table; 
std::array<float, ANGLE_360 + 1> inv_tan_table;

// step tables used to find next intersections, equivalent to slopes times width and height of cell    
std::array<float, ANGLE_360 + 1> y_step; //x and y steps, used to find intersections after initial one is found
std::array<float, ANGLE_360 + 1> x_step;

// 1/cos and 1/sin tables used to compute distance of intersection very quickly  
// Optimization: cos(X) == sin(X+90), so for cos lookups we can simply re-use the sin-table with an offset of ANGLE_90.    
std::array<float, ANGLE_360 + ANGLE_90> inv_sin_table; //+90 degrees to make room for the tail-end of the offset cos values.    
float* inv_cos_table = &inv_sin_table[ANGLE_90]; //cos(X) == sin(X+90).    

// cos table used to fix view distortion caused by radial projection (eg: cancel out fishbowl effect)
std::array<float, HALF_FOV_ANGLE * 2 + 1> cos_table;

void buildLookupTables() noexcept {
	constexpr auto TENTH_OF_A_RADIAN = ANGLE_TO_RADIANS * 0.1f;
	for (int ang = ANGLE_0; ang <= ANGLE_360; ang++) {
		const auto rad_angle = TENTH_OF_A_RADIAN + (ang * ANGLE_TO_RADIANS); //adding a small offset to avoid edge cases with 0.
		tan_table[ang] = std::tan(rad_angle);
		inv_tan_table[ang] = 1.0f / tan_table[ang];
		// tangent has the incorrect signs in all quadrants except 1, so manually fix the signs of each quadrant.
		if (ang >= ANGLE_0 && ang < ANGLE_180) { //upper half plane (eg. upper right & left quadrants)
			y_step[ang] = std::abs(tan_table[ang] * CELL_SIZE);
		} else {
			y_step[ang] = -std::abs(tan_table[ang] * CELL_SIZE);
		}
		if (ang >= ANGLE_90 && ang < ANGLE_270) { //left half plane (left up and down quads)
			x_step[ang] = -std::abs(inv_tan_table[ang] * CELL_SIZE);
		} else {
			x_step[ang] = std::abs(inv_tan_table[ang] * CELL_SIZE);
		}
		assert(std::fabs(y_step[ang]) != 0.0f && "Potential asymptotic ray on the y-axis produced while building lookup tables.");
		assert(std::fabs(x_step[ang]) != 0.0f && "Potential asymptotic ray on the x-axis produced while building lookup tables.");
		inv_sin_table[ang] = 1.0f / std::sin(rad_angle);
	}
	//duplicate the first 90 sin values at the end of the array, to complete the joint sin & cos lookup table.
	auto end = std::end(inv_sin_table) - ANGLE_90;
	std::copy_n(std::begin(inv_sin_table), ANGLE_90, end);
	// create view filter table. Without this we would see a fishbowl effect. There is a cosine wave modulated on top of the view distance as a side effect of casting from a fixed point.
	// to cancel this effect out, we multiply by the inverse of the cosine and the result is the proper scale.
	// inverse cosine would be 1/cos(rad_angle), but 1 is too small to give us good sized slivers, hence the constant K which is arbitrarily chosen for what looks good.
	for (int ang = -HALF_FOV_ANGLE; ang <= HALF_FOV_ANGLE; ang++) {
		const auto rad_angle = TENTH_OF_A_RADIAN + (ang * ANGLE_TO_RADIANS);
		const auto index = ang + HALF_FOV_ANGLE;
		cos_table[index] = (K / std::cos(rad_angle));
	}
}

It’s hard to say really.

As I say, multiplication tends to be slower for fixed points,
but some things might work out cheaper, for example:

const int cell_x = static_cast<int>(xi) >> CELL_SIZE_FP;

xi is float, so converting to int is probably complicated,
(at the very least I’d guess it’s a function call, but I’m not sure,)
whereas with fixed points it would be a shift and a mask,
so there’s a possibility the compiler might be able to fuse the two shifts.

And similarly I would expect that the fixed point equivalent for:

yi += pgm_read_float(&y_step[view_angle]);

Would be cheaper because it would effectively use integer addition.
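
A rough illustration of both points, using raw integers rather than the library. The format is an assumption made up for this sketch: 6 fractional bits in an int16_t and 64-unit cells (CELL_SIZE_FP = 6). Your real constants may differ, and the helper names are hypothetical:

```cpp
#include <cstdint>

// Assumed values, for illustration only.
constexpr int FRAC_BITS = 6;     // fractional bits of the raw intercept
constexpr int CELL_SIZE_FP = 6;  // log2 of the cell size (64)

// Offline conversion of a float table entry to the raw representation.
constexpr int16_t toRaw(float value) {
    return static_cast<int16_t>(value * (1 << FRAC_BITS) + (value < 0.0f ? -0.5f : 0.5f));
}

// Stepping the intercept becomes a plain 16-bit integer addition.
constexpr int16_t step(int16_t intercept, int16_t delta) {
    return static_cast<int16_t>(intercept + delta);
}

// Dropping the fraction and dividing by the cell size fuse into one shift.
// Written for non-negative intercepts only.
constexpr int cellIndex(int16_t intercept) {
    return intercept >> (FRAC_BITS + CELL_SIZE_FP);
}
```

Here cellIndex(toRaw(100.5f)) gives the same cell as static_cast<int>(100.5f) >> 6, without any float-to-int conversion at run time.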

Fixed points also open up the possibility for manual optimisations.
E.g. if you know a certain SFixed will never be negative then you can do a manual conversion without the sign extension step.

The main reason multiplication is expensive for fixed points (as far as I’m aware) is because you need a type twice as wide as the fixed point to avoid loss of information, and AVR CPUs don’t cope with that very well.
AVR only has a ‘mul’ instruction for smaller types,
so it has to resort to a manual algorithm for larger ones.
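
To show where the wider type comes in, here is a sketch of a multiply on raw values with 8 fractional bits. It does not show how the library implements it, and mulQ8_8 is a made-up name:

```cpp
#include <cstdint>

constexpr int FRAC_BITS = 8; // assumed format: 8 fractional bits in an int16_t

// The product of two values with 8 fractional bits has 16 fractional bits,
// so it needs a 32-bit intermediate before being shifted back down.
// The explicit int32_t cast matters on AVR, where int is only 16 bits wide.
constexpr int16_t mulQ8_8(int16_t a, int16_t b) {
    return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> FRAC_BITS);
}
```

For example 1.5 (raw 384) times 2.25 (raw 576) gives raw 864, which is 3.375. That 32-bit product is the part an 8-bit CPU has to assemble from several smaller multiplications.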

Ultimately I think the only way to know for sure is to try it.

Whether or not fixed points solve your problem,
I think using brads is worth considering.
Or at least finding a means to cut your table sizes to a power of two so you can use more bitwise operations when handling the indices.
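
As a sketch of what the power-of-two table size buys you: the wrap-around checks on view_angle collapse into a single mask. The 1024 here is a hypothetical replacement for the current 768 angles, and changing it would also change how many angles make up the field of view:

```cpp
// Hypothetical: 1024 angles per full rotation instead of 768.
constexpr int TABLE_SIZE = 1024;
constexpr int ANGLE_MASK = TABLE_SIZE - 1;

// Wraps any angle, including negative ones, into 0..TABLE_SIZE-1
// without a branch or a modulo.
constexpr int wrapAngle(int angle) {
    return angle & ANGLE_MASK;
}
```

So wrapAngle(view_angle - HALF_FOV_ANGLE) and wrapAngle(view_angle + 1) would replace the two if statements in renderView.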

Might I ask, are you using something other than the Arduino IDE to compile your code?

I notice you’re using a ‘structured binding declaration’, which is a C++17 feature,
but as far as I’m aware the Arduino IDE uses GNU++11 (C++11 with compiler extensions).

It probably doesn’t make much difference to the problem at hand,
I was just surprised to see that feature used in Arduboy code.

Ah, yeah. I’ve set the Arduino IDE up to compile with C++17. I’m actually playing around with cross platform development so this project is originally written in Visual Studio for Windows. I’m porting to Arduboy manually right now, to figure out what the least common denominator might be. E.g. what my abstraction layers will need to support.

This thread is basically me getting excited and side-tracked with premature optimization. But also - more new things to learn. :smiley:


Is there any way for me to get an assembler listing from the Arduino IDE? Or perhaps a separate tool, to map the source code to compiler output? I’d love to figure out what really works on the Arduboy.

From a command line you can use objdump. You should be able to find the AVR version avr-objdump somewhere in the installed IDE directory tree. On my Ubuntu Linux system it’s:
I usually create a soft link to it in my local bin folder, named arduino-avr-objdump, so I can easily execute it from anywhere.

I believe you want to use the -S option but you can experiment with various switches.

You have to feed it the .elf file that the compiler outputs for your sketch, which should be found in a temporary build folder after compiling your sketch in the IDE (using Verify).
If you set the IDE preferences for:
Show verbose output during: ☑ compilation
the output should contain the location of this temporary folder. Search for arduino_build_xxxxxx where xxxxxx is a unique number.

On my system it appears in the system /tmp directory. I usually switch to this temporary directory and run the command from there. E.g.:
cd /tmp/arduino_build_242045
arduino-avr-objdump -S MyArduboySketch.ino.elf > MyArduboySketch.disasm

The output can be scrambled and a bit confusing, mostly due to the compiler optimisations.