Adding Hashing Capabilities to EEPROM

Small thing: prefer update to write. I find it reassuring that we aren’t doing unnecessary writes :wink:

Basically, we’re damned regardless of which names the API uses for its read and write operations:

  • If it uses read and update, people unfamiliar with EEPROM’s write cycle limit will forever be asking “Why isn’t it read and write?”
  • If it uses read and write, people familiar with EEPROM’s write cycle limit will forever be asking “But write does update instead of write, doesn’t it?”
  • If it includes all three, we have to go around reminding people to never use the write functions, and then when people ask “Why?”, explaining the write cycle issue.

Choose thy poison. :P

My reasoning for opting for read and write is that people who know about EEPROM’s cycle limit are much fewer in number and hopefully more likely to check the documentation to see if it answers their question (which it will). Even if they don’t check, “Yes, write avoids unnecessary writes.” is a lot easier to explain than to explain why there’s no write function to someone who doesn’t yet understand the write cycle limit.

But if it’s a deal-breaker for anyone else we could vote on it, perhaps?

Hmm… then how about get and put … then everyone’s equally confused :crazy_face:

Looks like xor shift. As you know, the magic numbers are critical for quality. Assuming it will do 32bit maths- the compiler tends to add 100’s of bytes for that. Is that what you found? Not sure if you have the book? Great chapter on PRNG.

It uses xor and shifting, but it doesn’t seem to follow the xorshift pattern.
I think it might technically be an linear feedback shift register, but binary polynomials are above my pay grade (or at least what my brain can manage at the moment).

I published the sizes of all the hash functions I tested several comments ago. Here’s the link again if you missed it:

For comparison:

  • int16_t summation used 6622 bytes of progmem
  • uint16_t summation used 6700 bytes of progmem
  • uint32_t summation used 6706 bytes of progmem
  • The Knuth hash used 6742 bytes of progmem

I have no clue why the difference between int16_t summation and uint16_t summation is so big, but if you take summation of int16_t as a base case and compare that to the Knuth hash, that’s a 120 byte difference. Compared to uint32_t summation, the Knuth hash is only 36 bytes larger. Knuth was the smallest non-trivial hash, everything else was larger. The next smallest was John Skeet’s hash at 6758, which is 16 bytes larger, and it only gets worse from there.

It wasn’t a particularly thorough test, merely:

char object[256] {};

using namespace Hashing;

using Hash = KnuthHash;

// Note: 'static_cast<int>' is to ensure the same overload of println is used regardless of hash's return type
arduboy.println(static_cast<int>(Hash::hash(object)));

If you want to do your own testing I can provide you with the other hash implementations.

Even if I didn’t have those figures though, ‘hundreds’ for addition is unlikely. At worst, 32-bit addition should only be about 4 instructions for the actual addition (one for each byte) and then a small amount for loading and storing the value.

Shifting is more likely to be an issue due to the lack of a barrel shifter, but the above figures seem to suggest it’s not that much of an issue.

Apparently I have a virtual copy. There’s a lot of mathematical notation that would just go completely over my head, so odds are I wouldn’t know the part the hash function was derived from even if I was looking at it.

1 Like

Yes, that is surprising.

Maybe something along the lines of signed int overflow being undefined behavior so the compiler assumes it can’t happen and optimizes weirdly.

A brief update…

I got most of the Doxygen commenting sorted yesterday,
though there’s a few things that need tidying up still (e.g. formatting/grouping text).

I’ve removed beginWrite and endWrite for now.

I’m hoping to replace them with something more suitable when I’ve done a bit more research, but if it takes too long to refine them I’ll publish the API without that feature.

(I’m being intentionally vague. I think some of you will have guessed what they’re supposed to be for, but for those who haven’t guessed it’ll hopefully be a nice surprise.)

2 Likes

I have a feeling I know what you are doing … :slight_smile:

How is the documentation coming along? I am keen to retrofit this to a few of my games to see how it goes.

1 Like

You were partly correct.

Since I’ve had a fair bit of interest, I’ll publish an early draft of the code and the documentation.

The code can be read online:

Unfortunately I’m not sure how to make the documentation viewable online, so you’ll just have to download it. There’s a lot of files, but it’s less than 700KB so it should be quick to download.

I’ve done a few things slightly differently to how the main library is documented, such as including the computational complexity and preconditions for each function.

If you do, I may ask to use your retrofitted code as a guinea pig to see if it’s worth making use of eeprom_read_block and eeprom_upload_block rather than a plain for loop.

I have been busy for most of the last week, but have just had chance to update the documentation of the new API and provide a draft version on a branch of my Arduboy2 fork.

The updated draft API and doxygen documentation are here:

The relevant Arduboy2 fork is here:

All 100% untested of course, but if it actually compiles then it should theoretically work.

The beginWrite and endWrite functions have now been replaced with the notably different functions begin and commit, which are provided to support systems that do not have true EEPROM and must instead provide some kind of simulated EEPROM. Theoretically these functions allow for both flash-simulated EEPROM and proper file systems.

The API is still subject to change for the moment, so if you have any grievances then now is the best time to make them known.

1 Like

Nice!
I made some comments on the first GitHub commit.
My main issue is with naming… :grimacing:

Arduboy2EEPROM:: is quite a mouthful!..
I like Sprites:: - short and memorable. Would you consider Storage::, etc?


Was it a design decision to *not* add guards for addresses out of range? There's lots of comments in the code, rather than protection. Is it code size concerns?

Evidently you’ve never had to write std::is_trivially_default_constructible or a std::enable_if guard. :P

The main problem with Storage is the likelihood of such a simple name conflicting with another library.

You can always add using Storage = Arduboy2EEPROM; to your own code if you want, or even make an Arduboy2EEPROM eeprom object if you aren’t happy with type aliases or the scope resolution operator for some (strange) reason, but a unique name that resembles existing Arduboy2 class names is better from a library standpoint.

As far as I understand, Sprites is an exception because it was originally a separate library that already existed before being incoporated into the Arduboy2 library.


Then if the function names were the same (readByteread), the transition would be simple and painless…

Migrating existing code is going to be relatively trivial in the majority of cases, because existing functions can be changed with a simple (non-regex) find and replace:

  • EEPROM.writeArduboy2EEPROM::writeByte
  • EEPROM.readArduboy2EEPROM::readByte
  • EEPROM.putArduboy2EEPROM::write
  • EEPROM.getArduboy2EEPROM::read

The more complex cases will be cases where the result of get or put are actually used, but I don’t think I’ve ever seen anyone use the return values, so that’s not a major concern.

In addition to those simple find-and-replace changes, any migrated games should add Arduboy2EEPROM::begin(); to their setup function and Arduboy2EEPROM::commit() after a sequence of writes.

This library was never intended as a drag-and-drop replacement for EEPROM, it was intended to be a slimmed-down improvement with added support for devices without native EEPROM and built-in hash computation, storage and retrieval.

What would the guards even do?

  • You can’t throw an exception because exceptions are disabled.
  • You could potentially call abort();, but that’s probably just going to reset the Arduboy without any kind of explanation.
  • You could return early, but then you’re just failing silently, and if you’re going to fail then it should happen loudly and quickly.
  • You could return some kind of success notifier (bool or an enum class), but then readByte would have to either pass a result to a referenced variable or return a struct.

(Realistically abort(); is the least terrible answer for AVR.)

Ultimately for the AVR implementation readByte and writeByte merely forward to avr-libc’s eeprom_read_byte and eeprom_update_byte. If those have some kind of error detection, that’s handy. If they don’t, they don’t.

That’s no less than what Arduino’s EEPROMClass does - EEPROMClass does no bounds checking either, and so far I’ve yet to see anyone complain about that.

(In fact, using eeprom_update_byte is actually an improvement over EEPROMClass, which conditionally calls eeprom_write_byte based on the result of a call to eeprom_read_byte.)

Partly size, partly speed, partly because I don’t think it’s likely that anyone is ever actually going to attempt to write to an invalid address, and partly because I’m expecting avr-libc to handle it.

Any kind of error checking will add both extra code and extra overhead, which could soon mount up if it’s being done once per byte for an object. (Granted the speed probably isn’t too much of an issue given that writing to EEPROM is an infrequent occurance, but a code size wouldn’t go unnoticed.)

I could partition the functions into ‘safe’ and ‘unsafe’ variants, which is usually my preferred solution, but that would more or less double the number of functions and amount of documentation, and possibly increase the maintenance cost.

It would be even better (and theoretically possible) to detect invalid paramters at compile time and refuse to compile, but that could only guard the cases where the address is known at compile time and would require syntax that beginners will probably frown at. E.g. Arduboy2EEPROM::write<1020>(object); could static_assert(((address + sizeof(object)) < 1024), "Error: Object is too large to write to the specified address");.

1 Like

– Huh!? I missed that :sweat_smile: … Guess I need to read the manual…


(Goes to skim read manual again)...

Very confused about commit:

this function is technically unneccessary and performs no work

and

Failure to call commit() **may** result in the discarding of any or all of the data written by previous write operations.

… Erm… so do I need to do this or not!?
I’m guessing at some point in the future @Mr.Blinky may use this for the EEPROM-in-SPI-flash system/ emulation?



– Neat, so it saves some bytes?

– That might be good :slight_smile:

I hope that’s a joke :slight_smile:

I’m still working on some supplemental material as and when I have time, including (hopefully) a simplified explanation of what’s what.

The Doxygen documentation is the most important part because it has to be provided alongside the code and is a prerequisite for the API implementation being included in the library.

For the benefit of anyone else reading, the full note says:

This function exists to support devices that do not have native EEPROM. On devices that do have native EEPROM, such as the Arduboy, this function is technically unneccessary and performs no work, thus allowing it to be optimised away by the compiler.

(Emphasis added.)

I designed this API with the intent that the API itself would be a kind of cross-platform standard that could have different back-ends.

The ‘does nothing’ note only applies to devices that have ‘native’ EEPROM (i.e. actual, genuine hardware EEPROM, or hardware that provides the same qualities as EEPROM). The Arduboy’s version does nothing, but implementations provided by devices that don’t actually have hardware EEPROM are more or less guaranteed to so ‘something’.

That note exists almost entirely to placate the “Is using this going to waste my precious progmem?” people.

I’m expecting that rather than trying to provide flash-simulated on the FX chip as a backend to Arduboy2EEPROM, there would be an FX::EEPROM class that provides the same API and could act as a drop-in replacement. At which point it would be a case of either a simple find-and-replace or, for the people savvy enough to create a type alias, replacing the type alias (using Storage = Arduboy2EEPROM;using Storage = FX::EEPROM;).

I did contact @Mr.Blinky for comment, but received no reply, so rather than wait for one, I decided to presume my assumptions were all fine and announce the current revision.

I also contacted @ESPboy to make sure that begin and commit would be enough to support the ESPBoy, and together we concluded that they would be. (The names begin and commit are actually taken from Arduino’s ESP8266 implementation of EEPROMClass.)

You do if you care about making life easier for people porting games to other devices.

Much of the documentation is written under the assumption that you do care and want to write code that will behave properly on any device.

It’s theoretically likely to save at least some.

At the moment I haven’t actually attempted to compile the code because I’ve been too busy writing the documentation, responding to GitHub messages and dealing with those other pesky responsibilities that life thrusts upon me. :P

It might be, but it’s going to mean an extra template function for every function that excepts an address parameter, and another heap of documentation to go along with it.

I could replace all the functions that take address parameters (except for readByte and writeByte) with versions that accept the address as a template, but I’m sure someone would complain somewhere along the line. (For some strange reason people tend to start screwing their faces up when they have to use angle brackets in addition to round brackets.)


Nope. As mentioned above, I was too busy writing documentation.

But the actual code is really simple, so if I’ve buggered up anything it’s more likely to be a typo than a logic error. (Or if it is a logic error, I’ve probably got the forwarding wrong - trying to improvise perfect forwarding without implementing std::forward or std::remove_reference is a nuisance.)

I’ll look into it later when I’m done scraping together some lunch and trying to figure out what an “invert the colours of this image” icon is supposed to look like.

That is, unless someone else wants to ‘brave’ it.

1 Like

Bear in mind that it’s an opt-in feature.
Useful to have and highly encouraged, but an opt-in nonetheless.

The curse of easily giving in to requests, writing too much, and never being able to resist replying.

Which is why I’m taking it reasonably slowly, giving it quite a bit of thought, and asking for input (even if I end up overruling half of it).

The documentation is definitely thankless.

(When I’ve got some less technical documentation I’m definitely going to tell people to RTFM from time to time purely to justify the time spent writing and rewriting it.)


Another Australianism to add to my collection.


Just to throw it out there, theoretically it would be possible to restrict the range of EEPROM that a game is allowed to write to by having something like:write something like:

Arduboy2EEPROM<start, size> eeprom;

Or

Arduboy2EEPROM<start, object_type> eeprom;

And then have one of eeprom.write<address>(object);, eeprom.write<offset>(object);, eeprom.write(address, object) or eeprom.write(offset, object);, but I’m doubtful anyone would actually use that, and it’s yet more documentation and maintenance.

1 Like

Kindness Good documentation - is its own reward! :stuck_out_tongue_winking_eye:

– So just found this in the wild with the ‘CalendarApp’: it adds EEPROM.begin() and .commit() for optional ESP32 support. Makes a lot more sense to me now. Nice! But… also seems it can write up to address 1029… :grimacing: