Creating a Datafile

Hi, just wanted to open this thread to discuss ideas for creating a data file for a sketch. Just brainstorming for now.

My first reflex regarding this was to put all flash data in a particular data section using the linker and then use relative addressing from there. Something like

uint8_t hugeData[3000] __attribute__((section(".flashdata"), aligned(<if required>)));

and then use the flash functions with something like:

flash_seek(<offset given by flashcart writer> + &hugeData)

If that is achievable, the location of data within the data file does not have to be hardcoded, and the relation between the data file and the code is always ensured. Not sure it can work, but it would be straightforward if we can make this happen without fiddling too much with makefiles and the like.

How about taking data out of the code?
Make a folder for your data, sub-folders for each level, each individual file is an asset.
Run a tool that generates the data.bin and data.h. The header will have structs of pointers to each asset.
To access the data you’d call loadLevel and get a struct pointer in return.
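A rough sketch of what the generated data.h and a loadLevel function might look like. Everything here is an assumption for illustration: the struct layout, the field names and the offset values are made up, and the "pointers" are plain integer offsets into data.bin rather than real C++ pointers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical contents of a generated data.h: one entry per level folder,
// one field per asset file. Offsets are positions within data.bin.
struct LevelAssets
{
	uint32_t tilesOffset;
	uint32_t mapOffset;
};

// Placeholder values; a real tool would compute these while writing data.bin.
constexpr LevelAssets levels[] =
{
	{ 0x000000, 0x000400 },
	{ 0x000800, 0x000C00 },
};

constexpr std::size_t levelCount = sizeof(levels) / sizeof(levels[0]);

// Returns the asset table for a level, or nullptr if the index is out of range.
constexpr const LevelAssets * loadLevel(std::size_t index)
{
	return (index < levelCount) ? &levels[index] : nullptr;
}
```

The game code would then only ever deal with level indices and named fields, never with hand-written offsets.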

You just beat me to creating a thread for it!

My idea is to create a special header/script file that is fed into a parser tool which poops out a header file and bin file to include with your project.

Haven’t given its format much thought yet but it could be something like:

constexpr uint24_t map1 %bitmap% = "map1.png";
constexpr uint24_t map2 %bitmap% = "map1.png";

then afterwards the created header file would be something like:

constexpr uint24_t map1 = 0x000000;
constexpr uint24_t map2 = 0x012340;

Does this work also for arbitrary data? My feeling is that we are missing something. For instance, I also plan to put init data or lookup tables in there. Images and maps might be the main use case but there will be others as well.
For instance, I have Python scripts to generate init structures, map tiles and map data from a level map, and I want to put these in the flash as well. We might also need a way to “align” data in this memory to give users the opportunity to put it in the order that best fits the game’s requirements. E.g. put each texture on a page boundary, effectively wasting some memory per page (just an artificial example of course :wink: ).
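The alignment idea could be sketched as a small helper that rounds an offset up to the next boundary. This is only an illustration: the 256-byte page size and the names alignUp and kPageSize are assumptions, not anything decided in this thread.

```cpp
#include <cstdint>

// Assumed page size in bytes; the real value depends on the flash chip.
constexpr uint32_t kPageSize = 256;

// Rounds offset up to the next multiple of alignment.
// alignment must be non-zero; an offset already on a boundary is unchanged.
constexpr uint32_t alignUp(uint32_t offset, uint32_t alignment)
{
	return ((offset + alignment - 1) / alignment) * alignment;
}
```

With 256-byte pages, a 100-byte texture at offset 0 followed by a second texture would put the second one at alignUp(100, kPageSize), i.e. 0x000100, wasting 156 bytes in between.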

Some kind of boundaries was something I had thought of too, so that the “blocks” could be managed? Dude, I need to go through and read everything you have written…

the idea is that any data will end up in the bin file.

Basically there will be a kind of data pointer which starts at 0 and is incremented after every data byte added to the bin file. A special directive / keyword is replaced with its value by the script.

constexpr u24_t label ;
"imagefile.png", "imagefile.png"
"binaryfile.bin"
"text",0xD,"more text",0x00
<uint 24> 0x123456, LABEL1, …

the parser will/should also support labels. So it would be easy to create tables
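To make the data pointer and label idea concrete, here is a minimal sketch of the bookkeeping such a tool might do. It is not the actual script: the class name DataFileBuilder and its methods are hypothetical, it skips parsing and image conversion entirely, and it only shows how labels could map to running offsets and end up in a generated header.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical packer: appends data to one binary image and records
// a label/offset pair for each block, using the image size as the data pointer.
class DataFileBuilder
{
public:
	// Appends bytes under the given label and returns the offset they start at.
	uint32_t add(const std::string & label, const std::vector<uint8_t> & bytes)
	{
		const uint32_t offset = static_cast<uint32_t>(image.size());
		// The whole image must stay addressable with 24-bit offsets.
		assert(offset + bytes.size() <= 0x1000000u);
		labels.emplace_back(label, offset);
		image.insert(image.end(), bytes.begin(), bytes.end());
		return offset;
	}

	// The bytes that would be written to the bin file.
	const std::vector<uint8_t> & binary() const { return image; }

	// The text that would be written to the generated header file.
	std::string header() const
	{
		std::string text = "#pragma once\n\n";
		for (const auto & entry : labels)
		{
			char value[16];
			std::snprintf(value, sizeof(value), "0x%06lX", static_cast<unsigned long>(entry.second));
			text += "constexpr uint24_t " + entry.first + " = " + value + ";\n";
		}
		return text;
	}

private:
	std::vector<uint8_t> image;
	std::vector<std::pair<std::string, uint32_t>> labels;
};
```

Because the offsets come from the builder itself, the header and the bin file cannot drift apart as long as both are regenerated together.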


This is all still in the idea phase, so nothing’s set in stone about its format.

The scripts sound nice, but writing them could be a long journey with a lot of support requests coming at you. Maybe we can reuse the AVR tools.
I did a quick test with avr-objcopy and this is what I ended up with after searching the web a little bit:


uint8_t huge[3000] __attribute__((section(".flashcard"))) = {0xa5};

later in the code just to make sure:

huge[2000] = 5;

The 0xa5 is just to see if I can put data in it.
Now I just build my .elf file. All good, nothing like out of memory or the like because the .flashcard section is not used anywhere.

Now run the following command:

avr-objcopy -j .flashcard --set-section-flags=.flashcard=alloc,load --change-section-lma .flashcard=0 --no-change-warnings  -O binary <your game>.elf flashcard.bin

et voila:

hexdump flashcard.bin 
0000000 00a5 0000 0000 0000 0000 0000 0000 0000
0000010 0000 0000 0000 0000 0000 0000 0000 0000

Not sure how to use it but maybe something like flash_read( + &huge) would work. Also need to check arrays larger than 64KiB.
If that works we could make a macro out of

#define FLASHCARD __attribute__((section(".flashcard")))

and then

uint8_t huge[3000] FLASHCARD;

tada, like PROGMEM but for the flashcard. So the users could just work as they are used to work but use FLASHCARD instead of PROGMEM and use the new library to access this data. And of course run the one command to create the binary file.

What do you think?

So true.

That’s cool! Didn’t know that could be done. It would make adding data structures easy.
But I worry about the commands an inexperienced user would need to use (locating the .elf file).

Damn it. OK, I hate to admit this, but it looks like this does not work anyway. I forgot that the addressing range of the AVR is limited. It also seems like the section size is limited to 32KiB. So creating huge data arrays with this approach is not feasible. I did more experiments this morning and failed to create big arrays due to the above restrictions. Thinking longer about this, it is kind of obvious that the addressing would have been an issue sooner or later.
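A small illustration of why the addressing gets in the way, assuming 16-bit data pointers as on the AVR in the Arduboy. The function names are made up for this sketch; it only shows that an offset past 64KiB cannot survive a trip through a 16-bit pointer, which is why the data file needs its own 24-bit offsets instead of addresses taken with &.

```cpp
#include <cstdint>

// Models storing a flash chip offset in a 16-bit pointer: the upper bits are lost.
constexpr uint16_t truncateTo16Bits(uint32_t flashOffset)
{
	return static_cast<uint16_t>(flashOffset & 0xFFFF);
}

// True if the offset can be represented by a 16-bit pointer without loss.
constexpr bool fitsIn16Bits(uint32_t flashOffset)
{
	return flashOffset <= 0xFFFF;
}

// True if the offset can be represented by a 24-bit offset without loss.
constexpr bool fitsIn24Bits(uint32_t flashOffset)
{
	return flashOffset <= 0xFFFFFF;
}
```

For example, the offset 0x012340 fits in 24 bits but would come back as 0x2340 after being squeezed into a 16-bit pointer.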


I am curious to try my game logic on the devkit to see if the performance suits my requirements or if I need to improve in some cases. The code is in a way prepared, still needs some hands on in some parts but generally I added delay to required places to simulate long access times. What is open is the datafile and the 24bit pointer.

Do you have an early version of your python script? For now it would be sufficient for me with “binary file” support. Something like:


then getting an .h file I guess with the offsets in my flash section.
Would be great :star_struck:

Ah and:

Do not forget to add support for comments. With huge data files, users might want to add comments to the list to better understand the layout. I might be one of those users.

Shame I didn’t spot this thread sooner, I could have warned you:

I’m confused, is this supposed to be C++ or some new data format?

Frankly I doubt there’s going to be a way other than writing our own tools.
We’re probably breaking new ground here.

I think the easiest way would be to just use XML or JSON to store the description of the data and then use an existing library to process that XML/JSON and bundle the data accordingly into a binary file/FX chip image.

We could do something like:

<?xml version="1.0" encoding="UTF-8"?>
<!-- The root element name is just a placeholder. -->
<data>
	<image name="player">
		<frame index="0" source="/images/player/frame0.png"/>
		<frame index="1" source="/images/player/frame1.png"/>
		<frame index="2" source="/images/player/frame2.png"/>
	</image>
	<image name="enemy">
		<frame index="0" source="/images/enemy/frame0.png"/>
		<frame index="1" source="/images/enemy/frame1.png"/>
		<frame index="2" source="/images/enemy/frame2.png"/>
	</image>
	<text name="dialogue">
		<textblock index="0" source="/text/dialogue/block0.txt"/>
	</text>
	<sound name="backgroundMusic">
		<track index="0" source="/sounds/music/track0.wav"/>
	</sound>
	<raw name="puzzle0">
		0x00, 0xFF, 0x88, 0x92,
	</raw>
</data>

And of course, we could then have all that data stashed in a .zip (a bit like the .arduboy format) to make it easy to move around.

And if anyone needs named offsets in their C++ code, have it also generate a header like this:

#pragma once

#include <uint24.h>

// Image Data Offsets

constexpr uint24_t image0Offset = 0x010000_u24;
constexpr uint24_t image1Offset = 0x011000_u24;
constexpr uint24_t image2Offset = 0x012000_u24;

// Text Data Offsets

constexpr uint24_t text0Offset = 0x013200_u24;

(If anyone’s wondering about the _u24, yes, that’s doable in C++. It’s called a user-defined literal.)
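For the curious, a minimal sketch of how such a literal suffix could be defined. The U24 struct here is a hypothetical stand-in for whatever uint24_t type the library would provide, and this version does no range checking, which a real implementation would probably want.

```cpp
#include <cstdint>

// Hypothetical stand-in for a 24-bit offset type.
struct U24
{
	uint32_t value;
};

// Literal operator: lets integer literals carry a _u24 suffix.
// Only the low 24 bits are kept; there is no error for larger values here.
constexpr U24 operator""_u24(unsigned long long literal)
{
	return U24{ static_cast<uint32_t>(literal & 0xFFFFFF) };
}

// Usage, mirroring the generated header shown earlier.
constexpr U24 image0Offset = 0x010000_u24;
```

The suffix makes the intent visible at the point of use and gives the type a place to hook in checks or conversions later.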

If nobody likes XML or JSON then there are other markup languages to choose from (like YAML) but those two are probably the most ubiquitous.
(I’m biased in favour of XML because C# has XML reading and writing facilities built in.)

By choosing a human-readable format we can write it by hand to start with,
but as tools are developed we can eventually move up to having GUI programs to generate the data with.

Another advantage of XML is that you can have ‘DTD’ files that allow you to assign common values to ‘entities’.
(There’s some examples here.)

Yes I would need them.

As long as the commandline version still exists. I prefer to avoid GUIs.

Maybe something like adding a binary file is still required. I use my images columnwise and convert the images to suit my needs accordingly. To keep the tools simple we can offer binary file tags as well so users can use their own conversion tools on the images and then create the datafile using other tools.

Why? What’s wrong with GUIs?

Pretty much everyone uses the native Arduboy format.

The idea would be that the thing that stuffs all the resources into a binary FX chip image would convert the linked images to Sprites format.
(The same with sound, assuming there’s some tool that auto-converts .wavs.)

Or do you have some sort of custom image format that you use?

We could add format attributes, but I doubt there’s going to be much call for anything beyond .png, .gif and external raw data.

I vote JSON because I like it better.

EDIT: Doesn’t the Arduboy format already use JSON?

Nothing wrong. Just my personal preference. I feel I am faster on the commandline and can avoid the mouse.

I do not use the Sprites class or something like that. Basically it is how the engine of my game works. It works on columns. So to work efficiently I fetch image data columnwise and process it for rendering. Other functions like “drawBitmap” I use rarely. But that’s only for my engine of course. I get your point. Just wanted to say that there might be other ways games would address data so we can be flexible in storing them via the new file format and not excluding them. At least for me something like the raw tag but for binary files would be sufficient.

<raw name="data0" source="data/data0.raw"/>

@Mr.Blinky, do the binary FX image files (or whatever they’re called - the files containing the raw binary data to be flashed onto the chip) have a specific format?

That would suit my needs. Thanks.

Don’t thank me yet, this is still just conceptual.

I’d like to throw together an example, but I don’t know what the output format would be like.

You mean the binary output format or the input format of the file (e.g. JSON, XML)?