Arduboy clone on RISC-V/FPGA

Currently, assembly code is only used in 5 places in the Arduboy2 library:

  • One is just a no-op to provide a delay, so it’s easy to replace or discard as necessary.
  • Three include the equivalent C++ code in a comment block accompanying them.
  • The Sprites class has one large block that is fairly well commented but doesn’t include equivalent C++ code. However, the SpritesB class is equivalent to Sprites, so can be used for both, and contains no assembly.

Therefore, the fact that Arduboy2 contains assembly code shouldn’t be a porting issue. The main difficulties would be its direct interfacing to specific hardware, and (as noted) dealing with the underlying Arduino environment if porting or replacing it is necessary.

Of course, the reason for using assembly code is to gain a speed or size advantage over what the compiler would produce with equivalent C++ code. This fact may have to be considered when porting, depending on the speed and storage resources in the new environment.

Another issue could be code that (perhaps unintentionally) has a reliance on being in an 8/16 bit environment, when porting to a 32 or wider bit environment.

However, many Arduboy sketches use other libraries in addition to Arduboy2. Some of these libraries could present porting difficulties.

Three include the equivalent C++ code in a comment block accompanying them.

I am extremely thankful for this part. The C++ code equivalent was a drop-in replacement. As for the SPRITES_PLUS_MASK routine in Sprites::drawBitmap, I looked at the esp8266_arduboy2 port. However, I am now thinking I should just clean it up to use the SpritesB call, since it’s official.

I hope I didn’t give the impression that it was hard to port the Arduboy2 library :sweat_smile:. It was actually a breeze because of the exact reasons you listed. The code was a pleasure to read. Seriously readable. I simply removed all ASM references for C++. Took half a Sunday, no kidding.

In truth, I designed the SoC with awareness of how the Arduboy2 library interfaces with the underlying hardware. For example, my SPI call is a trivial memory-mapped write, just like in the original, with a small hardware twist: to speed things up, I have a 128B buffer.

// Write to the SPI bus (MOSI pin)
void Arduboy2Core::SPItransfer(uint8_t data)
{
   // Wait until there's space in the TX queue; the write itself returns immediately.
   while ((KRZ_SPIM_STATUS & 0x00ff) >= SPIM_TXQ_SIZE);
   MMPTR8(KRZ_SPIM) = data;
}

Another issue could be code that (perhaps unintentionally) has a reliance on being in an 8/16 bit environment, when porting to a 32 or wider bit environment.

This would only be worrisome if there were serious misaligned memory accesses. But byte access (the most common memory iterator in the lib) can never be misaligned. So, things worked out!

However, many Arduboy sketches use other libraries in addition to Arduboy2. Some of these libraries could present porting difficulties.

This is going to be touch and go, I am afraid. Like Print and WMath for starters. Gonna port relevant parts of the ArduinoCore as I come across them.

The hardest part was the C++ pure virtual functions, which needed retargeting of the system calls (sbrk). I also need to look at calls like srandom() from avr-libc. Right now, I have put workarounds in for all of them in a rush to demo ArduBreakout. I need to clean this part up for real.

@MLXXXp - thanks for the Arduboy2 library. If you ever build another game engine, I’d like to help in any way I can.

No, it’s more a case of overflow and wrap around. For example, with AVR using 16 bits for an int and another architecture using 32 bits for int. I think @Pharap has encountered this problem, but I haven’t looked into it.

jawdrop hadn’t even thought of that!!! Forgot that int isn’t 32b in avr-libc. Gotta look into this deeply. And map basic types from avr-libc. I just saw that typedef signed int int16_t in . Fuark…

Also, if someone has allocated storage with a specific layout on the assumption that an int will occupy 16 bits.

All of these types of issues could occur in sketches as well as libraries.

Might be worth combining your efforts with the maker of this board (also based on the iCE40UP5K):

I painstakingly created the equivalent C++ once (as literally as possible).
It’s here if anyone wants to see it, but I don’t think I’d be prepared to guarantee that it’s bug free.

(On second thought, it was actually this that Adafruit used,
and I never got round to putting this into my other port.)

Indeed. There’s a lot more functions that use hardware-specific code.
Of those, things like generateRandomSeed() are likely to be overlooked.

Interesting… It looks like they’ve used the SpritesB code as a base and modified it.

Those should more or less work out-of-the-box unless long is a different size on your system.

I have ported as much of avr-libc as Arduboy uses,
but there’s probably the odd game that uses something that isn’t covered.

For srandom, if you’re on a 32-bit system (with access to the C or C++ standard libraries) you can cheat by using srand/std::srand.
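A minimal sketch of that shortcut, assuming a hosted C++ standard library is available. The `port` namespace is made up for this example so the names don't collide with any `random`/`srandom` the host libc may already declare; the signatures mirror the avr-libc ones.

```cpp
#include <cstdlib>

namespace port // hypothetical namespace, only for this sketch
{
  // avr-libc declares: void srandom(unsigned long seed);
  inline void srandom(unsigned long seed)
  {
    std::srand(static_cast<unsigned int>(seed));
  }

  // avr-libc declares: long random(void);
  // Note: std::rand() may have a smaller range than avr-libc's random().
  inline long random()
  {
    return std::rand();
  }
}
```

Reseeding with the same value reproduces the same sequence, which is all most games rely on.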

Indeed I have.

There was (and still is) code in Sprites.cpp that depends on 16-bit integer overflow and it breaks when int is suddenly 32-bit.
(Specifically, ofs + WIDTH causes overflow, and the code relies on that.)
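To illustrate the general kind of breakage (this is not the actual Sprites.cpp code; the function names and types here are made up for the example), a sum that wraps when int is 16 bits quietly stops wrapping once int is 32 bits wide. Casting the result back to a fixed-width type restores the old behaviour:

```cpp
#include <cstdint>

constexpr int WIDTH = 128; // display width in pixels

// With a 16-bit int, ofs + WIDTH is computed in 16 bits and can wrap.
// With a 32-bit int, ofs is promoted to 32 bits first, so nothing wraps.
inline long nextOffsetNaive(uint16_t ofs)
{
  return ofs + WIDTH;
}

// Portable version: explicitly truncate the sum to 16 bits.
inline uint16_t nextOffsetPortable(uint16_t ofs)
{
  return static_cast<uint16_t>(ofs + WIDTH);
}
```

With `ofs = 65500`, the portable version yields 92 everywhere, while the naive one yields 65628 wherever int is 32 bits.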

When I finally figured out the problem I wrote a longwinded tirade about solving it because it took me at least a few hours to get to the bottom of the issue.

For future ports I recommend just using the contents of SpritesB instead because it’s less hassle.
(If the CPU is powerful enough that is.)

I’m fairly certain that’s the only case though.
I can’t speak for other libraries of course, but for Arduboy2 that’s all.

I’ve never encountered a case like this, but I wouldn’t be surprised if one exists.

This is one of the reasons I try to encourage people to use the fixed width types instead.


Would you open an issue for this on Github?

running a virtual ATmega32U4 on an FPGA

I am trying to run this natively on risc-v rather than a soft avr core. However, I do see that @lulian has some plans for risc-v on his arduFPGA board too. Couldn’t find a port on the git though. So, perhaps this has not happened, yet?

True, except I was not using avr-libc – and didn’t realize what that meant until @MLXXXp pointed it out. Total noob mistake. Now all those longs in the ArduinoCore make sense. It’s 32b. Need to do some serious cleanup.

I finally realized what my problems were. It was newlib + its crt0. Switching over to picolibc solved my vtable (Print) and C++ constructor (Arduboy) issues, as well as the bloated rand (WMath) problem (compare newlib’s rand vs picolibc’s rand: picolibc has no default dependency on reent, which malloc’d the variable holding the rand state for thread safety). Straight from the author:

PicoLibc is library offering standard C library APIs that targets small embedded systems with limited RAM. PicoLibc was formed by blending code from Newlib and AVR Libc.

Great. Apparently there’s talk of making it the default libc for machine-mode/semi-hosted risc-v systems.

However, I should have looked at this first - - @Pharap, you’ve already done all the work!


Without an ADC, I just gave it the good old cycles since boot (for now). Which is a recipe for RNG hacking (anyone remember Golden Sun?). Or a simple PRNG on the fpga.

For future ports I recommend just using the contents of SpritesB instead because it’s less hassle.
(If the CPU is powerful enough that is.)

Aye. I’ll just use SpritesB and alias Sprites to it (using Sprites = SpritesB – as you mentioned here).
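As a sketch, the alias is a one-liner. The `SpritesB` class below is only a stand-in stub so the snippet compiles on its own; in a real port it would come from the Arduboy2 headers.

```cpp
#include <cstdint>

// Stand-in for the real SpritesB class from Arduboy2 (stubbed for this sketch).
class SpritesB
{
public:
  static void drawOverwrite(int16_t x, int16_t y, const uint8_t *bitmap, uint8_t frame)
  {
    (void)x; (void)y; (void)bitmap; (void)frame; // no-op stub
  }
};

// Every use of Sprites in a sketch now resolves to the assembly-free SpritesB.
using Sprites = SpritesB;
```

Game code that calls `Sprites::drawOverwrite(...)` then compiles unchanged.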

I am a bit concerned about the type width mismatch in games. I’ll have to typedef it or swap out all ambiguous types for fixed types with a script.


Yeah, the only reason random exists in the first place is because rand is defined by the C standard to return an int, and the authors evidently wanted a 32-bit PRNG.

avr-libc is actually open source, so if you really wanted you could use the actual random implementation, but I doubt anyone’s depending on the implementation details in any meaningful way, so any old PRNG should be a suitable replacement.
Even something as crap as a linear congruential generator would probably be fine.

Yikes. std::rand() isn’t supposed to be thread safe,
anyone expecting it to be is being unreasonable as far as I’m concerned.

They could probably find a decent off-the-peg PRNG without too much looking.
Xorshift is particularly good for something small and cheap.
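For reference, a minimal sketch of Marsaglia's 32-bit xorshift with the (13, 17, 5) shift triple. The struct name and the zero-seed fallback are choices made for this example, not anything taken from avr-libc or picolibc.

```cpp
#include <cstdint>

// 32-bit xorshift PRNG. The state must never be zero,
// otherwise every subsequent output is zero as well.
struct Xorshift32
{
  uint32_t state;

  // A zero seed is replaced with 1 to keep the generator alive.
  explicit Xorshift32(uint32_t seed) : state(seed != 0 ? seed : 1u) {}

  uint32_t next()
  {
    uint32_t x = state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    state = x;
    return x;
  }
};
```

It needs four bytes of state and three shifts per number, which is about as cheap as a usable PRNG gets.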

More or less.
The EEPROM code would probably have to be changed.

You don’t necessarily need an ADC,
but you do ideally need some source of nondeterminism.
An unconnected pin would probably be suitable if it measurably has a reasonable degree of noise.

It’s a seed generator, not a full RNG so it doesn’t have to be fast or extremely random, just enough to provide some variance between start ups.

I doubt it will be an issue for most games.
If it is, you can probably fix it with a simple text replacement.
unsigned int -> uint16_t, int -> int16_t et cetera.

For most cases, the sudden size increase shouldn’t be an issue,
it’s only likely to be a problem if someone’s depending on integer overflow or if they’re doing something daft like using a hardcoded value when they should be using sizeof(Type).
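A small illustration of the hardcoded-size pitfall; the function names and the score array are invented for the example. The fragile version clears the whole array only where int is 2 bytes wide.

```cpp
#include <cstring>

// Fragile: the byte count 8 silently assumes that int is 2 bytes wide.
inline void clearScoresFragile(int (&scores)[4])
{
  std::memset(scores, 0, 8);
}

// Portable: let the compiler work out the size of the array.
inline void clearScoresPortable(int (&scores)[4])
{
  std::memset(scores, 0, sizeof(scores));
}
```

On a target with a 4-byte int, the fragile version leaves the last two entries untouched.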

ikr, when I saw the objdump for the rand with newlibc, I was like O_o?. Rand is not supposed to be thread-safe.

Can’t float pins with this FPGA; the pad drivers have a weak pull-up. But I am happy with picolibc’s rand implementation. For the seed, I’ll XOR some of the internal hardware performance counters (cycles ^ instr retired, using some factor of time as a seed).

I am going to clean start the port using your - - as a base. It’s for the best. Should have started with this in the first place. And, now that I know things will work out (as evidenced by my current rushed, dirty port on a breadboard), I’ll tackle this with patience. Can’t wait for the gamepad pcb to come in! It’s gonna look good.

If it is, you can probably fix it with a simple text replacement.
unsigned int -> uint16_t , int -> int16_t et cetera.

That’s the plan! Some python script.

I also need to think how to use the flash for storing state (“eeprom”). Some small and simple flat filesystem, because it will be per game. And store all games! Any thoughts on that? You have on-chip eeprom and separate sdcard on the pokitto.

As long as they’re unlikely to be the same value it should be fine.
x ^ x is always 0 and seeding a PRNG with 0 usually breaks things.

Otherwise have a look around for some hash combining algorithms.
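One such combiner, sketched here in the style of Boost's hash_combine with the 32-bit golden-ratio constant. The two parameters of `makeSeed` are assumed to be whatever counters the hardware offers (for example cycles and instructions retired); the names are hypothetical.

```cpp
#include <cstdint>

// Boost-style combining step: mixes value into seed so that
// identical inputs do not cancel out the way a plain XOR does.
inline uint32_t hashCombine(uint32_t seed, uint32_t value)
{
  return seed ^ (value + 0x9e3779b9u + (seed << 6) + (seed >> 2));
}

// Builds a PRNG seed from two counters and never returns 0.
inline uint32_t makeSeed(uint32_t cycles, uint32_t instructionsRetired)
{
  const uint32_t seed = hashCombine(cycles, instructionsRetired);
  return (seed != 0) ? seed : 1u;
}
```

Unlike `cycles ^ instret`, this stays nonzero even when both counters happen to hold the same value.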

It’s been quite some time since I was last working on it,
but let me know if you have any questions or issues.

If in doubt, an identifier, offset, size allocation table in a fixed location often works alright.

(Though I suppose size would always be 1024?)
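A rough sketch of what such a table could look like. The field widths, the names and the linear lookup are all assumptions made for illustration, not a description of any existing format.

```cpp
#include <cstddef>
#include <cstdint>

// One row of a hypothetical allocation table stored at a fixed flash location.
struct SaveEntry
{
  uint32_t id;     // identifies the game
  uint32_t offset; // where the save starts, in bytes
  uint32_t size;   // save length in bytes (1024 for an Arduboy-style EEPROM image)
};

// Linear search is fine for a table with a handful of entries.
inline const SaveEntry *findSave(const SaveEntry *table, size_t count, uint32_t id)
{
  for (size_t i = 0; i < count; ++i)
  {
    if (table[i].id == id)
      return &table[i];
  }
  return nullptr; // no save stored for this game
}
```

If every save really is 1024 bytes, the size field could be dropped and the offset derived from the row index.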

I just mapped eeprom to eeprom for the sake of simplicity.

There is a so called ‘cookie’ system for Pokitto’s eeprom but there’s no decent documentation for how it works under the hood so I never use it.

As far as I know, the plan is to have the board primarily run on RISC-V, but with the secondary ATmega core for Arduboy compatibility. Running Arduboy games on a RISC-V CPU is fun, but in the end just creates a lot of friction for users having to recompile every game they want to play - that is assuming they can even find the source to recompile from! Having a core with the potential for full compatibility that can just run precompiled Arduboy hex files is ideal…

If you go backwards in the commits to my Arduboy_MiSTer project, at one point before switching to the ATmega core I had also customised the Arduboy2 library to run on RISC-V (FPGArduino’s version), here it is - might not be a lot of use to you though? :sweat_smile:

Everything you said is true. But, I am not trying to make a generic FPGA arduboy solution at all. This endeavour is to show off the Kronos RISC-V core. Hence, I am porting the arduboy2 library and compiling arduboy games for the platform (Kronos powered SoC) with the native risc-v toolchain (+picolibc). This port isn’t quite intended for folks who simply want Arduboy on an FPGA (MiSTer project is the obvious choice for this). This is more for risc-v soc builders (custom socs or litex builds) and people who want to mess around with risc-v.

Yesss! Thanks for the porting reference. I see you got rid of EEPROM and sound entirely?

Not quite - I did some hacky business by re-writing the sound functions to just continually pass across a value for the desired pitch (or zero for off) to a verilog square wave module… and for the EEPROM I just left a 512 byte block at the front of the actual compiled hex file and took advantage of MiSTer’s interface for writing files back to the SD card in 512 byte blocks!

ooooh. Noice! How often do you commit the writes? From my understanding of the Arduino EEPROM, the writes are “immediate write-back”, which makes sense for the integrated EEPROM. However, for NAND flash (your case: SD card) or NOR flash (my case: a typical W25-series chip), the writes are much costlier.

I saw that in ESPBoy, the writes are specifically committed (esp EEPROM lib has a commit()). The author inserted such code in the games itself, after a sequence of EEPROM modifications (say new entry in the scoreboard or something).

I don’t want to do that. Preliminary idea: I am thinking of caching the writes and auto-committing them every second. I’d have the entire 1K of the “eeprom” in RAM anyway, so I’ll just write back the changes once in a while, if any.
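A minimal sketch of that idea, assuming a millisecond tick is available and using a second RAM array as a stand-in for the real flash write. The class and member names are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

class CachedEeprom
{
public:
  static constexpr size_t kSize = 1024;              // emulated EEPROM size
  static constexpr uint32_t kCommitIntervalMs = 1000; // commit at most once per second

  uint8_t read(size_t addr) const { return ram_[addr % kSize]; }

  // Writes only touch RAM and mark the cache dirty if the value changed.
  void write(size_t addr, uint8_t value)
  {
    addr %= kSize;
    if (ram_[addr] != value)
    {
      ram_[addr] = value;
      dirty_ = true;
    }
  }

  // Call once per frame. Returns true when a commit actually happened.
  bool poll(uint32_t nowMs)
  {
    if (!dirty_ || (nowMs - lastCommitMs_) < kCommitIntervalMs)
      return false;
    std::memcpy(backing_, ram_, kSize); // stand-in for the real flash write
    dirty_ = false;
    lastCommitMs_ = nowMs;
    return true;
  }

  // What the backing store currently holds (for inspection).
  uint8_t committed(size_t addr) const { return backing_[addr % kSize]; }

private:
  uint8_t ram_[kSize] = {};
  uint8_t backing_[kSize] = {};
  uint32_t lastCommitMs_ = 0;
  bool dirty_ = false;
};
```

Games keep calling read and write as if the EEPROM were immediate, and nothing reaches the flash unless something actually changed.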

In my system (which has a generous 128KB ram), the games are warm-loaded. The Loader is the main app from which the user picks a game to play, and then the loader copies over the game to a specific bounded region of the ram - and then jumps to it.

Hence, as a second idea, I am thinking of a “start” button which suspends the game and returns to the Loader. At which point the gamer can choose to commit the “save” to memory. And other options - close game, resume, etc.


I have given the problem of saving files on flash some thought, and slept on this idea. FPGA boards almost always use a typical W25 series SPI NOR Flash. The iCEBreaker fpga board that I am using has a generous 16MB W25Q128JV.

I am leaning towards something dead simple like a copy-on-write circular buffer in the flash. The savegames are fixed 1KB, as @Pharap said – since that’s the size of the EEPROM. I am thinking of a scheme where the savegame, if modified, is written back (committed) to the flash at most once every second.

Assume, each “file” takes up one 4KB block (min erase sector on a typical w25-series spi nor flash = 4KB). A certain number of blocks forms the “filesystem” (FS). When starting the game, I’d O(N) search the blocks for its savegame and load it into the ram. No index block required (as you’d expect in CoW or for that matter, most FS), and the search cost is no biggie.

As the game runs, the savegame in the ram is modified. If the savegame remains modified for some time – indicating it is stable, it is copied and a write-back starts. I will search for the next free block starting from the current location in the circular buffer, write the new savegame into the free block, and issue an erase for the current one.
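The search, write and erase sequence can be sketched against a RAM-backed stand-in for the flash. Everything here (the block count, the header layout, the names) is an assumption for illustration; a real version would issue SPI commands and spread the steps across frames.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kBlockCount = 8;         // hypothetical number of blocks in the ring
constexpr size_t kSaveSize = 1024;        // one savegame
constexpr uint32_t kFreeId = 0xFFFFFFFFu; // erased NOR flash reads as all ones

struct Block
{
  uint32_t gameId;         // kFreeId when the block is erased
  uint8_t data[kSaveSize]; // savegame payload
};

struct SaveRing
{
  Block blocks[kBlockCount];

  SaveRing() { for (Block &b : blocks) erase(b); }

  static void erase(Block &b) { std::memset(&b, 0xFF, sizeof(Block)); }

  // O(N) search for the block holding this game's save; -1 if none.
  int find(uint32_t gameId) const
  {
    for (size_t i = 0; i < kBlockCount; ++i)
      if (blocks[i].gameId == gameId) return static_cast<int>(i);
    return -1;
  }

  // Copy-on-write commit: write to the next free block after the current
  // one, then erase the old block. Returns the new index, or -1 if full.
  int commit(uint32_t gameId, const uint8_t *save)
  {
    const int current = find(gameId);
    const size_t start = (current < 0) ? 0 : static_cast<size_t>(current) + 1;
    for (size_t n = 0; n < kBlockCount; ++n)
    {
      const size_t i = (start + n) % kBlockCount;
      if (blocks[i].gameId != kFreeId) continue;
      std::memcpy(blocks[i].data, save, kSaveSize);
      blocks[i].gameId = gameId; // written last, so a half-written block still looks free
      if (current >= 0) erase(blocks[current]);
      return static_cast<int>(i);
    }
    return -1;
  }
};
```

Because each commit lands in the next free block, successive saves walk around the ring, which is where the wear-levelling comes from.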

The find (block allocation), write (0.7~3ms per page of 256B) and erase (45~400ms) steps would be sequenced across game frames or soft timers and would thus be non-blocking (and handled entirely in the background). Since I only need to write back at most once per second, the sequence would be done and the flash would be ready for a new write. Usually, we’d not expect to write back that often.

I think this would be the cheapest code solution for the job. Very application specific, and I get wear-levelling. Power-off resilience can be ensured with some metadata and atomic sequence of the write+erase op (maybe? – need to think about this more). I’d need some code to check the blocks upon boot and ensure the bad blocks (half erased/half written) are cleared out, i.e. some garbage collection. That’s a stretch goal though, I am not strongly aiming for power-resilience. But, overall, this would be cheap.

I can do things a little smarter by committing 3 full writes (256B header + 3 x 1KB data) to the same 4KB block before moving onto a different block and issuing an erase. A bit mask in the header can keep this status. When moving to a new block, the header has to be copied over and started fresh. The important part is that a savegame uses the whole block (4KB) and does not share. In this scheme I can mess with the logical block size to decrease the overhead of the header – though at the cost of erase time (of physical blocks – for example 32KB erase takes 120ms~1.6s). This scheme is a hybrid CoW + Logging FS.

However, if I were to choose a real FS – littleFS is seriously looking to be the best choice. Very informative presentation - - from the author of littleFS.

EDIT: I approached this wrong. I should spin this - Flash cart(ridge)

So, I re-ported Arduboy2. It feels much cleaner this time around - project.

SiNe-DeMo (with learnings from the Arduboy_MiSTer thread). Default sin with doubles was slow, as you folks already figured out.

Sorry for the lame duck questions, but: How open is the MiSTer? Can it be cloned/forked?

Curious about spinning up a dedicated arduboy / mister compatible board?