Adding Hashing Capabilities to EEPROM

I’m still working on some supplemental material as and when I have time, including (hopefully) a simplified explanation of what’s what.

The Doxygen documentation is the most important part because it has to be provided alongside the code and is a prerequisite for the API implementation being included in the library.

For the benefit of anyone else reading, the full note says:

This function exists to support devices that do not have native EEPROM. On devices that do have native EEPROM, such as the Arduboy, this function is technically unneccessary and performs no work, thus allowing it to be optimised away by the compiler.

(Emphasis added.)

I designed this API with the intent that the API itself would be a kind of cross-platform standard that could have different back-ends.

The ‘does nothing’ note only applies to devices that have ‘native’ EEPROM (i.e. actual, genuine hardware EEPROM, or hardware that provides the same qualities as EEPROM). The Arduboy’s version does nothing, but implementations provided by devices that don’t actually have hardware EEPROM are more or less guaranteed to so ‘something’.

That note exists almost entirely to placate the “Is using this going to waste my precious progmem?” people.

I’m expecting that rather than trying to provide flash-simulated on the FX chip as a backend to Arduboy2EEPROM, there would be an FX::EEPROM class that provides the same API and could act as a drop-in replacement. At which point it would be a case of either a simple find-and-replace or, for the people savvy enough to create a type alias, replacing the type alias (using Storage = Arduboy2EEPROM;using Storage = FX::EEPROM;).

I did contact @Mr.Blinky for comment, but received no reply, so rather than wait for one, I decided to presume my assumptions were all fine and announce the current revision.

I also contacted @ESPboy to make sure that begin and commit would be enough to support the ESPBoy, and together we concluded that they would be. (The names begin and commit are actually taken from Arduino’s ESP8266 implementation of EEPROMClass.)

You do if you care about making life easier for people porting games to other devices.

Much of the documentation is written under the assumption that you do care and want to write code that will behave properly on any device.

It’s theoretically likely to save at least some.

At the moment I haven’t actually attempted to compile the code because I’ve been too busy writing the documentation, responding to GitHub messages and dealing with those other pesky responsibilities that life thrusts upon me. :P

It might be, but it’s going to mean an extra template function for every function that excepts an address parameter, and another heap of documentation to go along with it.

I could replace all the functions that take address parameters (except for readByte and writeByte) with versions that accept the address as a template, but I’m sure someone would complain somewhere along the line. (For some strange reason people tend to start screwing their faces up when they have to use angle brackets in addition to round brackets.)


Nope. As mentioned above, I was too busy writing documentation.

But the actual code is really simple, so if I’ve buggered up anything it’s more likely to be a typo than a logic error. (Or if it is a logic error, I’ve probably got the forwarding wrong - trying to improvise perfect forwarding without implementing std::forward or std::remove_reference is a nuisance.)

I’ll look into it later when I’m done scraping together some lunch and trying to figure out what an “invert the colours of this image” icon is supposed to look like.

That is, unless someone else wants to ‘brave’ it.

1 Like

Bear in mind that it’s an opt-in feature.
Useful to have and highly encouraged, but an opt-in nonetheless.

The curse of easily giving in to requests, writing too much, and never being able to resist replying.

Which is why I’m taking it reasonably slowly, giving it quite a bit of thought, and asking for input (even if I end up overruling half of it).

The documentation is definitely thankless.

(When I’ve got some less technical documentation I’m definitely going to tell people to RTFM from time to time purely to justify the time spent writing and rewriting it.)


Another Australianism to add to my collection.


Just to throw it out there, theoretically it would be possible to restrict the range of EEPROM that a game is allowed to write to by having something like:write something like:

Arduboy2EEPROM<start, size> eeprom;

Or

Arduboy2EEPROM<start, object_type> eeprom;

And then have one of eeprom.write<address>(object);, eeprom.write<offset>(object);, eeprom.write(address, object) or eeprom.write(offset, object);, but I’m doubtful anyone would actually use that, and it’s yet more documentation and maintenance.

1 Like

Kindness Good documentation - is its own reward! :stuck_out_tongue_winking_eye:

– So just found this in the wild with the ‘CalendarApp’: it adds EEPROM.begin() and .commit() for optional ESP32 support. Makes a lot more sense to me now. Nice! But… also seems it can write up to address 1029… :grimacing:

I was going to cover this in more detail in one of the documents I was intending to write…

The names I chose didn’t come from nowhere.

Arduino’s ESP8266 implementation of EEPROMClass defines them:

And the ESP_EEPROM library that ESPBoy switched to uses them:

Yes, those classes have other functions that the Arduboy2EEPROM API doesn’t. Most of them don’t make sense for the Arduboy, and others didn’t seem to be used in any of the games I looked at.

I did actually consider introducing end(), but I found that ESPBoy ports of Arduboy games never seem to use it - they just allocate the memory buffer at the start (with begin) and never free it.

If it is doing that, that’s a definite bug. It only allocates 1024 bytes.

An ESP device can theoretically support more EEPROM addresses than the Arduboy because it simulates EEPROM with flash. (Simulating more than 1KB would be necessary for simulating other boards’/consoles’ environments.)

The ESPBoy actually tends to allocate 2400 because Roman wasn’t sure how much the Arduboy had.

Turns out I was wrong about that.

abort(); doesn’t do a reset…

/* Copyright (c) 2002, Marek Michalkiewicz
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.
   * Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in
     the documentation and/or other materials provided with the
     distribution.
   * Neither the name of the copyright holders nor the names of
     contributors may be used to endorse or promote products derived
     from this software without specific prior written permission.

  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  POSSIBILITY OF SUCH DAMAGE. */

#include <stdlib.h>
#include "sectionname.h"

ATTRIBUTE_CLIB_SECTION
void
abort(void)
{
	for (;;);
}

For those who want to skip past the mandatory copyright header:

void
abort(void)
{
	for (;;);
}

This kind of proves my point about how difficult it is to do error handling when exceptions are disabled - even the C standard library implementation just drops into an infinite loop.

(For the record, a manual while(true); is 12 bytes cheaper than abort();. Though the advantage to abort(); is that it might do something more useful on other systems.)


I decided to test this because after seeing how abort() was handled, I had a suspicion about how this was going to work out, and unfortunately I was right!

I ended up with this little demo showing off what happens:

// This code is released under the CC0 licence.
// For more information see:
// https://creativecommons.org/publicdomain/zero/1.0/

#include <Arduboy2.h>

#include <avr/eeprom.h>

Arduboy2 arduboy;

void setup()
{
	arduboy.begin();
}

uint8_t read(uintptr_t address)
{
	return eeprom_read_byte(reinterpret_cast<const uint8_t *>(address));
}

void test(int value)
{
	arduboy.print(read(value));
	arduboy.print(F(" : "));
	arduboy.print(read(value));
}

void loop()
{
	if(!arduboy.nextFrame())
		return;

	arduboy.pollButtons();

	arduboy.clear();

	for(int i = 0; i < 16; ++i)
	{
		test(1024 + i);
		if(i % 2 == 0)
			arduboy.print(' ');
		else
			arduboy.print('\n');
	}

	arduboy.display();
}

Anyone want to guess what the output is?

Well for me it's...
1 : 1 2 : 2
0 : 0 0 : 0
0 : 0 0 : 0
0 : 0 0 : 0
0 : 0 0 : 0
66 : 66 117 : 117
103 : 103 103 : 103

101 : 101 114 : 114

I can't confirm exactly how it's doing it because the implementation is in assembly, but basically it has the effect of...

Doing arithmetic modulo 1024 on the address, thus an input of 1024 actually accesses address 0, 1025 accesses address 1, 1026 accesses address 2…

eeprom_read_byte(reinterpret_cast<const uint8_t *>(address)) == eeprom_read_byte(reinterpret_cast<const uint8_t *>(address % 1024))

At one point I was actually going to add that to my list of ‘possible ways of handling invalid addresses’, but I purposely left it off because I decided it was too horrible and likely to cause people surprising runtime bugs.

The joke’s on me I guess ¯\_(ツ)_/¯.


I’m gradually leaning towards introducing some ‘safe’ and ‘unsafe’ versions of the functions, or more likely a ‘safe’ and ‘unsafe’ version of the class.

(In the latter case, purely so people can do the using Storage = Arduboy2EEPROM;using Storage = Arduboy2UnsafeEEPROM; trick. It’s unlikely people will need to mix and match, though they would be able to if they wanted.)

I’m also open to the idea of changing the name to Arduboy2Storage instead of Arduboy2EEPROM, but definitely not just Storage. If Arduboy2 had used namespaces from the outset then I’d be happy with Arduboy2::Storage, but I’m not going to risk Storage on its own.


As for the idea of being able to do Arduboy2EEPROM::read<address>(object); and get a compile-time error, it might be better if I offer that as a separate library at a later date since I expect only the really dedicated programmers will care enough to want that feature. (Unless someone wants to try to convince me otherwise?)

Just in case it isn’t obvious: that trick wouldn’t work for cases where people need to calculate addresses at runtime*, so it’s only really going to be useful for people who are saving/loading entire structs/arrays at a constant address.

(* Technically it’s possible, but it would take a large amount of template trickery and boilerplate, and I’m worried I’d end up trying to reinvent <type_traits> (again!), and then lose sight of what I set out to do in the first place (again!).)

My last contribution for the evening:

I have decided to rename hash_type to HashType because it was starting to look out of place. (I’m considering renaming it again to HashValue, to assist beginners.) I updated the doxygen accordingly. (I have not yet updated my Arduboy2 branch, I will wait until I’m sure there aren’t likely to be any more changed.)

I’ve also finished the first draft of my ‘Crash Course’ document that gives an overview of how to use the new API without going into too much detail.

This document is aimed primarily at beginners/newcomers to the Arduboy, so it’s focused more on the ‘how’ than the ‘why’.

(It purposely omits the fact that begin and commit only exist for other devices to make use of. I don’t want to go confusing beginners/newcomers with talk of ‘simulated EEPROM’ when they probably only care about how to get their game data saved. Questions like “What is a struct?” on the other hand are to be expected.)

It includes three (completely untested) partial examples, including a very interesting example of why, if you have multiple save game ‘profiles’, you might want to save each one with a separate hash.

1 Like

Really great document. I’d suggest adding the use of 1–2 char signature, to mitigate some of the issues you’ve mentioned. It might be good to include in the examples’ stuct too.

Also, if it doesn’t bloat the code example, demonstrate the of writing a value to a struct before saving, and then accessing some value after reading.

I’m not sure how HashType would be used…? I think it’s always the size in bytes that would be needed? Perhaps switch it over to ~

constexpr HashSize = sizeof(uint32_t); // 4 bytes

Is it for future proofing?

If the user wants to do that, they can, but I don’t think adding library support would justify the cost (i.e. of planning, implementing, testing, documenting, maintaining…).

If you can find some evidence that indicates it is a more significant issue than I give it credit for then perhaps I’ll reconsider. A list of games that use the exact same range of addresses would suffice. (Not just overlap, the exact same start and end addresses.)

Though as things are, one game can simply switch to using a different hash function. The library already provides support for that.

Ultimately there’s no silver bullet. There’s only so much you can do with 1024 bytes of EEPROM.


Frankly, using 2 chars is a statistically poor approach anyway.

Theoretically it should be 16 bits of extra entropy because a char can have 256 values (on Arduboy), but in practice it won’t achieve that stastic because most people are going to dodge the unprintable characters.

If everyone used only letters, then suddenly instead of 256×256 possibilities it’s just 52×52, or 62×62 with alphanumerics. You know as well as I that most people are just going to say “What two initials would reflect my game’s name?” and suddenly ArduKong is clashing with ArduKarateka.


If someone doesn’t already understand how structs work, a few lines demonstrating member variables being assigned isn’t really going to help much. They’d be much better off going away, learning about structs properly, and then coming back to saving afterwards.

It’s the type of the hash value, the type returned from the hash function. It is used any time that type is useful or necessary.

You might think HashType is actually uint32_t, but perhaps one day it might change, in which case any code using uint32_t instead of HashType would potentially break.

For example, if it was decided that arithmetic operators (+, -, /, *, %) should not work on HashType then that might be achieved by turning HashType into a struct and defining only the operators that were permitted for it (which might be as minimal as == and !=).

(In fact, the only thing putting me off doing that is that it would mean more documentation.)

For calculating the size of save data perhaps, but people may want to do other things with the hash.

There’s no reason why people can’t use hash on its own, without reference to the EEPROM functionality, and if someone wants to use the hash function they need to know the type of the result, and that type is HashType.

Firstly, it would be constexpr size_t hashSize = sizeof(HashType); because:

  • size_t is the return type of sizeof*
  • HashType is not necessarily the same thing as uint32_t

Expecting people to use sizeof(HashType) really shouldn’t be an imposition. If they’re going down the ‘multiple profiles’ route then they’re going to need to know what sizeof is anyway because they’ll need to know how to apply it to their own struct type(s).


* In fact, the very definition of std::size_t is:

the unsigned integer type of the result of the sizeof operator

That’s where it comes from - it’s the type used to measure the size of data.


If I were going to offer anything to assist with size calculation then it would more likely be a template<typename Type> size_t sizeWithHash()/template<typename Type> size_t sizeWithHash(Type) function that returns sizeof(HashType) + sizeof(Type).

I’m hesitant because I’m not sure it justifies the cost (of documenting, testing, maintaining et cetera), and I worry that adding extra support to help out with the ‘multiple profiles’ approach is going to lead to feature creep. If I added that, how long before someone says “Can’t you just add a function that calculates the address of each profile as well?”. (I could, but is that really the Arduboy2EEPROM class’s responsibility?)

Also, I worry people might use it as an excuse to avoid learning about sizeof or any other relevant information.


To be honest, if I were designing this library for a more general programming audience, I would keep the hashing and EEPROM functions in separate classes (as I did in my original demonstration), because that’s the proper way to do it. (See separation of concerns, single responsibility principle and god object.)

They are lumped together here solely for the sake of:

  • Making life a bit easier for beginners (and the people explaining things to them)
  • To avoid worrying about name clashes
    • (If Arduboy2 had an Arduboy2 namespace I would have been much more inclined to keep them separate.)

— Sorry I wasn’t very clear; I just mean add an example to the document. You highlight the unlikely (but possible) issue of two games colliding, with the same address and number of bytes stored. This adds some mitigation.

— Great point. That might be the ‘best practice’ example to give ~ constexpr uint16_t signature = 0xABCD with some note about assigning 4 ‘random’ hexadecimal digits…? (Thinking about it, perhaps just 1 byte is sufficient? The odds are already pretty low…?)

The 2x char is just what has a long history of use in the community. Perhaps an advantage is people are more likely to change it for their game? Agreed, the entropy is pretty low (~26*26 =676, in reality nearer to 2^9, so ~9 bits).


— Fair enough. Was just thinking you’ve explained most of the basics in great detail- so it’s ideal for beginners. Although it can be dangerous, having boilerplate code to copy-paste is a great way for newcomers to get started.


— Perfect justification. Understood and agreed.


Complete aside: It’s only just occurred to me that the first 4 bytes of the stored range (with hash) will be getting updated (almost) every write now. (Assuming most data blocks only have a few changed bytes that get written).I don’t think there’s any way to reduce the impact of this, through wear levelling or reducing the hash size. :person_shrugging:

1 Like

Even then, I’m not sure it’s worth doing.It would take quite a bit of extra work to cover all the relevant information:

  • How to use the technique
  • Why it works
  • When it doesn’t work
  • What the alterantives are (e.g. using a different hash function)
  • When/why the alternatives are better

Also, it’s potentially better to handle the issue when it actually happens, since it’ll draw attention to which games have the incompatibility.


Although I’m not intending to add the information to the document, I’ll discuss the situation a bit here briefly.

(It’s much easier to explain here in terms that people familiar with the scenario will understand than trying to explain it in a more accessible way.)

Consider this…

Presume game A and game B have save data consisting of 10 bytes, and are both storing data at address 40, and are both using the built in hash.

Now, game B inserts 2 characters in front of their data, bringing the data size to 12 bytes. Note that the values of the 2 chars don’t actually matter at this point because suddenly games A and B have different sizes, which is (theoretically) enough to change the hash value.

The author of game A also decides to add 2 characters, bringing the sizes of game A’s data and game B’s data to the same value: 12 bytes. Let’s presume they both chose different values for these identifying char values. All is good.

Game C, however, uses 12 bytes of data and stores that data at address 40. (Can you see where this is going?)
Previously it didn’t clash with anything, and now suddenly it’s now clashing with game A and game B.

Suddenly games are chasing each other’s tails.
(Granted it’s statistically very unlikely, but stastistically unlikely scenarios must be considered.)

Now, rewind back to the start a moment. What would have happened if game B had decided to switch to another hash function?

It would no longer be clashing with A, and as long as A didn’t use the same hash function as B, A would also no longer clash with B. (The odds of two hash functions producing the same hash for the same data is probably quite tiny.)

If both games had switched to the same hash function, one of them would have to change again, but they don’t run the risk of getting game C involved because the size of the data didn’t change.

While I think of it, I should point out that the very act of a game’s data increasing in size will actually invalidate existing save data and will run the risk of clashing with other games that were previously using the same address but a different size of data.

Incidentally, I actually future-proofed Minesweeper by building in a mechanism that would allow save data to grow and shrink as necessary without invalidating the hash value.
(I get the feeling you’re going to ask if that could be added to Arduboy2EEPROM and then I’m going to regret telling you… ( -_-'))


Hopefully I’ve just explained why inserting characters is not a complete solution.
Granted, neither is using a different hash really.

I can think of half a dozen other approaches (my favourite is probably to exclusive or the hash value with a randomly chosen per-game or per-user identifier since that seems to combine the best of both worlds), but pretty much everything is going to fall prey to some stastistical anomaly somewhere down the line, and I’m not really a good enough mathematician to statistically prove which occurances are more likely, I can only prove by logic and example.

More accurately, Filmote started doing it for his games and then taught several other people to do it.

Granted it was better than not doing it because it at least detects whether the game has actually been run before, but in the grand scheme of things it’s not a massive improvement.

It’s also a great way to encourage people to copy and paste code without attempting to understand what the code is doing, which is frankly just setting them up for a fall later on, and it allows the no-effort script kiddies to prosper.

The code I wrote intentionally requires more than just the code presented to actually do anything useful. It showcases the important parts such that someone who understands the features used could work out what’s going on and adapt it, but it won’t work through copy and pasting alone, it needs to be modified and integrated.

Well that was easier than I was expecting.

To clarify, if the size of HashType or the hash implementation ever changed, that would be a breaking change, hence I had to give an example of a change that would technically not be a ‘breaking’ change.

(The more I think about it, the more I think perhaps I should go ahead with the struct approach just so there’s no ambiguity as to what the interface/API is supposed to be…)

This is true, but it’s the price that must be paid.

You could do some crazy attempt at wear levelling by moving the data around each time, but that’ll just end up occupting more of the already scarce EEPROM.

Besides which, some games will have save patterns like that anyway. A simple scoreboard might not, but an RPG is potentially going to be overwriting the player coordinates every time. Anything attempting some kind of time stamp will definitely be overwriting that every time.

It could possibly be mitagated by using 4×8-bit hashes or 2×16-bit hashes, and treating the data as 2-4 streams of parallel bytes, which would theoretically make certain bytes of the hash change less frequently, but I would expect that to weaken the hash (though I have no clue by what factor) .

1 Like

Alternative:
Rather than consuming precious EEPROM, an option to pass a unique ‘seed’ (first byte) to the hash function (i.e. not in the main data struct). Would need to pass the same key on write and read. Default is unused / 0x00, perhaps an overloaded function? Then it just uses PROGMEM / RAM if someone wants to pass their unique seed value. Ownership could be managed with a simple shared document… until all 256 values are claimed… :wink:
I’m staking my claim to 0xAD !!

1 Like

— Me: …weakly raises hand and waves… :grimacing:

It’s possible, but I’m still reluctant if it isn’t actually needed.

It would be backwards compatible, so it could be delayed until a problem has actually been demonstrated to exist.

The problem there would be getting people to register their value, since we have users beyond the forum.


To clarify: the user would pick some 4-byte number and then it would be ^ed with the hash. It would have most of the same properties as your suggestion, with a larger number value. I’m as good as certain that it would guard against the scenario where games A and B use the same address and size by giving them different ids, but I am worried about the potential of one game’s hashValue ^ identifier clashing with another games purely because of the nature of ^.

E.g. if hash A were 0x55555555 before the exor, and it were exored with 0xAAAAAAAA (id A) to get 0xFFFFFFFF, and hash B were 0x44444444 before the exor, and it were exored with 0xBBBBBBBB (id B) then the result would also be 0xFFFFFFFF. Surely the odds must be pretty high, but I don’t have the mathematical skill to say how high or how it compares to the odds of injecting a single byte resulting in the same effect.


Seriously though, there are people out there who just copy and paste code they find online without even attempting to understand it.

On StackOverflow I’ve seen some pretty dour “I copied and pasted this code that I found online and it isn’t working, what’s wrong with it?” questions.

They had so many homework questions that someone wrote an ‘open letter’ begging people to be more considerate and to stop cheating themselves out of the learning experience.

Even as a joke, it’s unfair to compare yourself to those kind of self-sabotaging time wasters.
(Fortunately we don’t really get many here on the forum.)


(Of course, if there comes a time that you want to get more into programming or try some trickier stuff, you know there’s plenty of people here who can help you out.)

1 Like

Fair enough. Here’s some data:
Out of a survey of 287 games, of which 140 used EEPROM, there are 5 address+length collisions, affecting 11 games, so ~3.8% of games surveyed.

Details:

Start: 16 , End: 16 —

  1. Ardulem
  2. Blocks
  3. Harambes Revenge

Start: 16 , End: 19 —

  1. Fatsche
  2. Pyoro

Start: 16 , End: 22 —

  1. Pipes
  2. SFCave

Start: 16 , End: 70 —

  1. Shattered Lands: Towers of Perdition
  2. Shattered Lands 2: Sea of Despair
    – Hmmm… this might be deliberate. Not sure how we should handle that @filmote !?..

Start: 100 , End: 134 —

  1. Asteroids
  2. Snake

I’m not sure if ~3% is a rare problem…?
Personally, I will still use & advocate 1–2 bytes of signature.
There’s also a theoretical benefit of passing in more data, regarding ‘spinning-up’ the state of the hash (relevant when only storing a few bytes).


– Yeah, my instinct is that’s not good, but practically may be ok… :man_shrugging:

Harambe’s Revenge is actually an exceptional case because it fills the entirety of EEPROM due to how it’s designed.

I think I asked if we had permission to change that many moons ago and the answer was ‘yes’. I’ll check the PM tomorrow.

It seems that nearly all of these developers are actually starting from the beginning of (permissible) EEPROM instead of trying to spread their data, which is why every single collision in that list has a start address of 16.

In which case, I’d argue that’s the real problem here.

@filmote could probably bump pipes down a few addresses.

Adding hash support would break backwards compatibility anyway, so the rest of these may as well have their addresses moved if/when they’re updated to the hashing technique.

Shattered Lands wasn’t a @filmote game. That’s @tuxinator2009’s work.

But it probably is intentional. I think it’s supposed to be like LodeRunner - multiple executables reusing the same save data.

(I’m not sure Sea of Despair was ever even finished?)

This I would take to be a genuine example, except I happen to know (without even clicking the links) those are both by the same person (because I remember him making them), thus that’s probably a case of copy & pasted code not properly updated.


I’m uncertain. I ran some numbers but couldn’t get my head around what I actually calculated.

I think I discovered that out of the 18,446,744,073,709,551,616 (232×232) possible (ordered) pairs of 32-bit operands to the exor operator, 576,460,752,303,423,488 (1⁄32) will produce the same value. I.e. that each 32-bit number as 576,460,752,303,423,488 possible (ordered) ‘factors’ (or summands?).

This was based on observing that by drawing the exor tables for 1 bit, 2 bit and 3 bit quantities, a duplicate value occupied 1⁄2, 1⁄4 and 1⁄8 of the table respectively. So actually maybe it’s 1⁄64? I don’t know, I’m not very good at maths.

Even if that’s accurate, I’m not sure whether or not that implies any collision statistics. Even if it does, I don’t know what the statistics of the other technique would be either.

I’ll think about it more when I next have chance.

1 Like

I checked.

Among other things, I said:

One user has already created a version that only uses a fixed amount of EEPROM

And the response was:

I think probably it would be a good idea to update the main branch with it, I’m just wondering how much it was tested.

So I’m presuming Akkera’s version was used?

And I think I found the source for that version this time (I didn’t spot it when responding to the other thread earlier):

In which case, yes, it probably collides with the other two, though it actually already uses a three character guard value, so the collision wouldn’t be a problem.

Also, I seem to calculate it as actually using 11 bytes (3 characters plus an 8 byte HighScoreEntry struct), meaning it spans from 16 to 27.

Hang on, these games use only 1 byte at address 16?
Are you sure about that?

Thinking about them again now that I’m not keeling over at 7 to midnight…

Of those 11 clashing games you identified:

  • Asteroids and snake were written by the same person, and are a case where the code was just reused and the address not changed, so I’m discounting those.
  • Shattered Lands 2 was never finished, so that value either wasn’t supposed to be the final value, or it was intended that the player’s data should carry across to the new game, so I’m ruling those out too.

That leaves 7 games that don’t have mitigating circumstances, and all of them start at address 16, so I’m inclined to declare that this problem only exists here because the developers didn’t attempt to pick a different address for their game.

If all those developers had done the proper thing (pick a random address further along the data pool) then your list wouldn’t be anywhere near as long.

Thus I think it’s safe to rule the clashing problem as uncommon enough (for the moment anyway) that it’s not worth adding any kind of extra library support. If it becomes a problem later down the line then it can be reexamined, but for now I think it’s better to just forget about it.

To put it another way: I could sit down and do a bunch of boring and awkward attempts at calculating some statistics to decide what’s best to fix a problem that only (just about) affects 11 games (out of a possible 287 - i.e. 276 games do not have this problem), or I could spend that time focusing on more realistic issues like writing help documents and deciding whether it’s worth having a ‘safe’ and ‘unsafe’ version of the API.

I think it’s also worth pointing out that there’s nothing the Arduboy2EEPROM library will do that people couldn’t do themselves, so if anyone wanted to attempt any of the techniques discussed (the 2-char value, a uint16_t value, an extra byte inserted into the hash calculation, exoring the hash with a uint32_t) they are completely free to do so, providing that they have the skill/ability to do so.

But a guard - like I use - isn’t enough.

No, but the scenario we’re currently discussing is (hypothetically) if every game were using the hash feature that will be provided by the Arduboy2EEPROM library.

Under that scenario, any two games that use exactly the same start address and size of data would end up determining each other’s hashes to be valid (because they’d find the hash in the same place and be attempting to validate the same amount of data) and thus misinterpreting each other’s data. (Which is why @acedent was highlighting 11 games that fall under that collision criteria.)

Some kind of game identifier value (be it 2-3 characters or a numeric identifier) would be enough for the game to know whether or not the data was its own even if the hash gave a false positive.