Another FPS style 3D demo

Those two things aren’t mutually exclusive.

Some of my stylistic preferences also use less progmem and/or improve performance.
(As you’ve discovered.)

Others are just a more direct translation of what the compiler will generate anyway,
or just something I generally prefer.
(Pretty much every one of my stylistic preferences has a reason behind it.)

Sometimes it’s not about how clever the compiler is.

Sometimes the compiler is actually legally not allowed to make certain changes because they’d break certain guarantees that the language specification enforces.
(And if it broke them, it would be non-compliant.)

In this case, texPtr++ has a significant side effect (i.e. it alters the value of texPtr),
whereas in my version, everything on the right hand side of the = has no side effects,
(apart from pgm_read_byte, which is implemented in assembly)
so the compiler has more flexibility with how it’s allowed to optimise the statements.

I tend to avoid post increment in general because of its weak ordering.

For an expression x++, there are two parts: x is incremented, and the original value of x is produced.
There is no guarantee about the ordering of those events.
As a result there are a lot of expressions using post-increment that have undefined behaviour.
For example, i = v[++i] is well-defined, but i = v[i++] is undefined,
and i = ++i + 1 iswell-defined but i = i++ + 1 is undefined.

Rather than remembering what is and isn’t defined, I prefer to do a = v[i]; ++i; because then there’s absolutely no ambiguity as to what’s actually happening.

Also in general I tend to write my code to have as few side effects as possible,
which is why I’ll typically introduce new (constant) variables rather than modifying old ones.
(This is actually similar to something the compiler does internally.)

If that’s the DrawScaled function then you may have some clashes with some of my changes.

I don’t know if I’ll get chance to look today.

1 Like

Arduboy3D.ino.hex (37.0 KB)

I’ve pushed some changes including the new sprite scaling routine. There are a couple of compile time options:
set DEV_MODE to 1 in Defines.h to show a CPU usage bar graph on the top of the screen
toggle WITH_SPRITE_OUTLINES in Draw.cpp to enable / disable outline drawing of sprites

With outlines disabled the frame rate stays solid, even when right up close a sprite. With outlines enabled it can still slow down a bit.

@Pharap you will need to fix up some conflicts when you merge. I haven’t taken any of your changes yet

The sprites are still stored as 1 byte per pixel so would be better to do 2 bits per pixel to free up progmem for animations etc. The sprite scaling method would need to change to accommodate, but might even make it slightly more efficient.

edit: oh and the sprite is now a skeleton guy which is approximately 3 / 4 the height of the walls

1 Like

You mean #define DEV_MODE true. :P

I’ll probably either have to cut out my edits to Draw.cpp or redo them completely because this commit is a big change.

I’m not actually sure how the compiler will handle this.
Normally calling functions only once will cause them to be inlined (and template functions are generally more likely to be inlined anyway),
but the question is will the compiler be smart enough to realise the redundancy of the inline code and fold it into a single function block.

If not, this approach will take a significant chunk out of progmem.

Having a quick skim, a few observations:

I doubt if ((up | down | left | right) & 1) is going to be smaller than if ((up | down | left | right) != 0).
Theoretically & 1 should add an extra instruction (an ANDI),
while != 0 should just make use of the status register’s zero flag for the branch (which was probably happening already with if (up | down | left | right), but implicit casting to bool is evil).

DrawScaledTx doesn’t need inline linkage.
(In fact, since this is a .cpp file, none of your functions need inline linkage.)

Also, implementing abs as a template will sometimes result in less code than defining it as a macro.
Remember that macros are just text subtitution, so if you did (for example) ABS(a - b), the compiler would actually see (((a - b) < 0) ? -(a - b) : (a - b)).
A template function on the other hand would force its arguments to be evaluated before its body is evaluated.

1 Like

This is actually a functional change for the outline drawing, not for efficiency. Sprite data is stored as:
0 = transparent
1 = white
2 = black

The outline should only be applied if at least one of the neighbouring pixel is white. So I guess really it should be:

if ( ( (up | down | left | right) & 1) != 0 )

if you want to avoid the implicit cast to bool.

This would have to change anyway if I rewrite the sprite code to more efficiently pack into 2 bits per pixel.

I agree that my abs() implementation should just be a template function though :slight_smile:


Ok, that makes sense.

It depends on what the logic is supposed to be.
Presumably it should evaluate to true if any of up, down, left or right is white?
(Even if one of the other values is black?)

I think it would be worth trying:

if(up == 1 || down == 1 || left == 1 || right == 1)

Just in case it produces smaller code.
(It might not be able to due to || being short circuiting,
or the compiler not knowing how or not being allowed to.)

Either way, you’d be best off employing a scoped enumeration to make the new purpose clearer:

enum class Colour : uint8_t
	Transparent = 0,
	White = 1,
	Black = 2,

// If you do really need the bitwise operators, at least these are typesafe:

constexpr Colour operator |(Colour left, Colour right)
	return static_cast<Colour>(static_cast<uint8_t>(left) | static_cast<uint8_t>(right));

constexpr Colour operator &(Colour left, Colour right)
	return static_cast<Colour>(static_cast<uint8_t>(left) & static_cast<uint8_t>(right));

constexpr Colour operator ^(Colour left, Colour right)
	return static_cast<Colour>(static_cast<uint8_t>(left) ^ static_cast<uint8_t>(right));

Something like this:

class PackedColour
	static constexpr shift0 = 0;
	static constexpr mask0 = 0x03;
	static constexpr shift1 = 2;
	static constexpr mask1 = 0x03;
	static constexpr shift2 = 4;
	static constexpr mask2 = 0x03;
	static constexpr shift3 = 6;
	static constexpr mask3 = 0x03;

	uint8_t value;
	constexpr PackedColour(uint8_t value) :
		value { value }
	constexpr Colour getColour0() const
		return static_cast<Colour>((this->value >> shift0) & mask0);
	constexpr Colour getColour1() const
		return static_cast<Colour>((this->value >> shift1) & mask1);
	constexpr Colour getColour2() const
		return static_cast<Colour>((this->value >> shift2) & mask2);
	constexpr Colour getColour3() const
		return static_cast<Colour>((this->value >> shift3) & mask3);


PackedColour colours[] PROGMEM
	// Dummy data
	0x45, 0x88, 0x65, 0x77,
	// et cetera

PackedColour readPackedColour(const PackedColour * object)
	return { pgm_read_byte(object) };

So you can:

PackedColour colour = readPackedColour(&colours[index]);
1 Like

DAAAAMN! You totally rewrote how sprites are drawn up close too!

A little bit jiggly but maybe worth it for the performance gain. I wonder if it could be filtered some and made a bit better, but incredible work!

This is my favorite part I could watch it forever:

1 Like

Something that’s just ocurred to me:

If you implement abs as (value < 0) ? -value : value) then be careful if you use it with float, double or long double at any point because giving it -0.0 will give you -0.0 back.

It’s probably very unlikely to happen, but it’s something worth being aware of.

specialise the template for floats using

signbit(value) ? -value : value

But I wouldn’t worry about abs(-0.0) returning -0.0
If -0.0 vs 0.0 causes a bug then there’s probably a bigger issue elsewhere. :smile:

I didn’t know zero can be signed.

I’ve often wondered why signbit doesn’t work directly with integer types.
If it did then you wouldn’t need a specialisation, this could be the default implementation of abs for all types.

(There’s an overload that accepts integral types, but it just casts to double and calls signbit(double), which is correct but inefficient.)

It can with IEEE 754 floating points.
(Or with one’s complement integers, but those are rare enough to be practically nonexistant.)

Fun fact: as a side effect this means that −(x − y) and (−x) − (−y) cannot be replaced by y − x.

Shouldn’t be a problem since I’m not using floating point variables anywhere.

I’ve added some new features to the demo:

  • Procedural level generation - generates a random level in 32x24 space
  • Added projectile code so now you can throw fireballs
  • Rewrote the sprite scaling again so that now sprites are packed more efficiently

Arduboy3D.hex (53.9 KB)


This is impressive. Fluid and works nicely. I love the screen shake effect. Can almost hear the fireball sound without the sound😉

And procedural generation is always great.


I just realised that ProjectABE doesn’t support the Arduboy2Base::generateRandomSeed() functionality so in the emulator you always get the same level. If you run on device you’ll get a different level each time you start. Maybe something @FManga can fix?

1 Like

I thought you probably weren’t,
but it’s good to be aware of that issue anyway.

Looks good.

At the moment it looks like more of a rock than a fireball.
Or maybe a dragonball.

In which case it wouldn’t support Arduboy2Base::initRandomSeed() at any point in Arduboy2’s history, since generateRandomSeed() is actually just the former contents of initRandomSeed() adapted to return the seed instead of feeding it to random().

Most likely the problem is with reading ADC.
Arduboy2 is relying on it producing random noise,
and I’m guessing in ProjectABE it’s currently just producing 0 because it’s an unconnected pin.

This is so incredible!!! It looks like it could soon be more than a demo! :smiley: How much space do you have left?

Noticed a small bug where if you stand inside one of the skeletons (just past where it disappears) it will occasionally glitch and draw some squares in the upper left or upper right of the screen.

If your looking to add some entropy without an ADC, you can make the seeed based on the milliseconds since the user presses the first button from a menu screen.


Arduboy2Base::generateRandomSeed(), and thus Arduboy2Base::initRandomSeed(), uses a combination of both the random noise on unconnected Pin A4 and the time since the system started. Therefore if you wait for a button press before calling either, you will still get a different seed even if the value read from the pin is always the same.

@jhhoward I was just reading the Game Engine Black Book about Wolfenstein 3D…

void  scaleTextureTo2(void* src , void* dst) {
	dst [0] = src [16];
	dst [1] = src [48];

void  scaleTextureTo4(void* src , void* dst) {
	dst [0] = src [0];
	dst [1] = src [16];
	dst [2] = src [32];
	dst [3] = src [63];

Seems familiar :P

(Side note: I have no idea how this would actually have compiled. Is a void * dereferenceable in C?)

Although the reason this works for Wolfenstein3D is more complicated than it may seem at first.

Mix of old C compilers being what they were and some language features:

  • sizeof(void) is 1
  • You can ignore the return value of a function.
  • And you could compile & call a function with a return value declared but that didn’t have any return statement in it. It would still work. (with whatever garbage was in the CPU’s AL/AX register as the return value if you tried to use it)

My guess is the compiler internally aliased the void type to the char type (or unsigned char) to save on memory since it pretty much worked just the same. They forgot to special-case disable dereferencing on it.

I just saw that both the wolfenstein 3D and Doom books are both available as free PDFs now:
I have hard copies of both and I would definitely recommend them! Lots of interesting tricks used for the old PC hardware.

I assumed that this was the case. Maybe the emulator can sample random noise when trying to read an unconnected pin?

Maybe! There is still plenty of flash and RAM left available.

Good idea! Alternatively just keep calling the random function each frame until a button press would work too.

I remember making floating pins and/or the ADC return random values at one point, and I think it was visible in the theremin app someone made. Did that break?

Edit: Found it… looks like it still works:
ptheremin.hex (37.7 KB)