Smart Response XE Re-purposed into Arduboy

(serisman) #363

Ahh… that sucks, although having a more accurate progress bar is useful.

I have been able to make a bit of progress on reducing upload time, but it still seems slower than it should be. When I started, it was taking around 52 seconds to erase/write/verify around 11 kB of flash. After removing or reducing a lot of the delays (in the Arduino sketch), I got it down to around 38 seconds.

I think we are running into a limitation on how fast data can be sent over the debug interface. The program writes a page’s worth (2 kB) of data to XDATA memory one 64-byte packet at a time. Then it writes a small CPU code bundle to XDATA memory that reads the page of data from XDATA and writes it to flash. Then it points the CPU at that code in memory and starts it. The CPU halts at the end of the code, and the program waits for the CPU to be halted before continuing with another command.

For each (64 byte) packet, in addition to setting up the address at the beginning of a packet, writing each byte (in a loop) to XDATA memory takes seven 8-bit writes (plus three 8-bit reads). It executes (3) debug/cpu instructions:

  • [0x56,0x74,0x(byte)] - MOV A, #byte; // write byte to accumulator
  • [0x55,0xF0] - MOVX @DPTR, A; // write accumulator to XDATA based on pointer location
  • [0x55,0xA3] - INC DPTR; // increment pointer location
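
To make the byte count concrete, here is a minimal sketch that builds the seven bytes written to the debug interface for one data byte. The helper name `xdataWriteSequence` is made up for illustration and is not taken from the Arduino sketch; it assumes DPTR has already been pointed at the target XDATA address.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not from the actual sketch): returns the seven bytes
// written to the debug interface to store one byte at @DPTR and advance DPTR.
// 0x56 / 0x55 are the debug command bytes listed above; the bytes after them
// are the 8051 opcodes and operand.
inline std::array<std::uint8_t, 7> xdataWriteSequence(std::uint8_t value) {
    return {
        0x56, 0x74, value,  // MOV A, #value
        0x55, 0xF0,         // MOVX @DPTR, A
        0x55, 0xA3          // INC DPTR
    };
}
```

Three commands, seven bytes out. The three reads mentioned above are presumably one response byte per command.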

We are bit-banging the debug interface, which means we are not transferring data as quickly as we could be, but even so, this doesn’t seem like a very efficient way of programming flash.

There is a faster way of doing this on newer CC25xx devices using DMA, but unfortunately it looks like the CC2430 doesn’t support using the debug interface as a DMA source. :frowning:

(serisman) #364

Ok, I figured out how to get a MAJOR speed improvement…

The .NET app has a pretty big issue with its sendCommand method, where it sends each byte separately and sleeps the thread for at least a millisecond between each byte (actual sleep is dependent on the OS scheduler). So, we aren’t even getting the data to the Arduino very rapidly. :man_facepalming:

I checked in a new version to my fork that fixes the sendCommand method. I also refactored the Arduino code to make it easier to follow and remove a bunch of delays.

After both of these changes, I can now erase/write/verify my 11 kB test .bin file in 8.3 seconds! It is under 5 seconds to just erase/write.

I’m happy enough with those results (for now)! :smile:

(GabyPCgeeK) #365

I added on the fly baud rate changing. Been testing and:

64KB file with no empty pages

Old Code:

Write & Verify
115200 baud - 379.3s

115200 baud - 74.5s

New Code:

Write & Verify
115200 baud - 60.7s
250000 baud - 38.9s
500000 baud - 38.8s
1000000 baud - 38.4s
2000000 baud - Error (Seems to be some buffer and timing issue)

115200 baud - 23.3s
250000 baud - 18.3s
500000 baud - 17.8s
1000000 baud - 17.7s
2000000 baud - 17.6s

After 250000 baud there doesn’t seem to be much of an improvement.
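
One way to see why the gains flatten out is to estimate how long the UART alone needs to move the data. This is only a rough sketch: it assumes 8N1 framing (10 bits on the wire per byte) and that each data byte travels as two ASCII hex characters. The function name is made up for the example.

```cpp
#include <cstddef>

// Rough lower bound on UART transfer time, in seconds, for one direction.
// Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per wire byte.
// wireBytesPerDataByte = 2 models sending each byte as two ASCII hex chars.
inline double uartSeconds(std::size_t dataBytes, long baud,
                          int wireBytesPerDataByte = 2) {
    const double bitsPerWireByte = 10.0;
    return dataBytes * wireBytesPerDataByte * bitsPerWireByte / baud;
}
```

Under those assumptions a 64 KB image needs roughly 11.4 s of wire time at 115200 baud, about 5.2 s at 250000 and about 1.3 s at 1000000. That is small compared to the measured totals, which would be consistent with something other than raw serial speed dominating above 250000 baud.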

I’m going to try with direct port manipulation or SPI.

(serisman) #366

Excellent testing. I’m glad to see that the changes are working for you too.

By the way, by using the FastGPIO library, we are already doing direct port manipulation.

I have an idea for how we might be able to make use of the hardware SPI port to stop bit-banging which I will hopefully be able to try out tonight.

Also, I noticed that the protocol sends two bytes (ASCII hex code) for every byte of data. We could probably save some bandwidth by changing that back to sending the actual byte instead of the ASCII/hex version of the byte.
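
To illustrate the overhead, here is a small host-side sketch of that kind of encoding. It is not the actual protocol code; the function name and the choice of uppercase digits are assumptions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encodes each byte as two ASCII hex characters (high nibble first),
// which doubles the number of bytes that have to cross the serial link.
inline std::string toHexAscii(const std::vector<std::uint8_t>& data) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(data.size() * 2);
    for (std::uint8_t b : data) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

A 64-byte packet becomes 128 characters on the wire this way, so sending raw bytes would halve the payload part of each packet.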

(GabyPCgeeK) #367

Made the SPI work using the Arduino SPI library. Connect both MISO and MOSI to DD. For writing, set MOSI as output and do SPI.transfer. For reading, set MOSI as input and get the value from SPI.transfer(0). SPI_MODE1 seems to work, but with errors when SPI speed > 1 MHz (only tested 1 and 4 MHz). It seems to work best when setting SPI_MODE1 when reading and SPI_MODE0 when writing (1, 4 and 8 MHz work).

The Read and Write speeds seem to be the same as with bit-banging.

(serisman) #368

Oh interesting, I would have thought that connecting MISO and MOSI together would have caused conflicts. I have a hardware solution in mind that should reduce the conflict, but maybe it isn’t necessary.

It’s odd that you are getting errors when SPI speed is > 1 MHz. According to the timing diagrams, we couldn’t possibly drive the debug interface faster than it allows (at 16 MHz). Maybe some weird capacitance/resistance is creeping in because of how things are connected? What does your test setup look like?

(serisman) #369

We might still have some bottlenecks with the .NET application talking to the Arduino then. I have a Node.js app (command line) in development that might produce better results.

(GabyPCgeeK) #370

There don’t seem to be conflicts, because I don’t use the received value when writing, and when reading I set MOSI as input and send 0x00, so it’s supposed to be hi-Z and not switch to pull-up.

Could be. It’s not the prettiest thing.

(serisman) #371

I finally threw this on a (cheap) logic analyzer to get a look at the actual timings.

Here is the ‘byte period’, i.e. what it looks like to send one byte into xdata memory:

This is using the currently checked-in bit-banging approach.

The total period (to write 7 bytes and read 3 bytes) is about 80 uS. (again, it takes 7 writes and 3 reads of the debug interface to get one byte into xdata memory).

For writing a bit (i.e. dbg_write), the clock is high for about 0.2 uS, and low for about 0.7 uS, and takes about 7.7 uS total to write the whole byte. (i.e. start of write to start of next write).

For reading a bit (i.e. dbg_read), the clock is high for about 0.4 uS, and low for about 0.3 uS, and takes about 6.4 uS total to read the whole byte. (i.e. the end of previous write to the end of read).

The whole 64-byte packet takes about 5.25 mS to ‘send’ through the debug interface. So, sending a whole 1 kB could in theory be done in 84 mS, and sending all 64 kB would only take 5.4 seconds (plus other overhead). So, bit-banging may not be hurting us too much.

By the way, just using hardware SPI won’t necessarily help with the delay between writes/reads, which is about 1.5 uS of the 7.7 uS.

The bigger issue is that it seems to be taking around 15 mS between each 64-byte packet. So the debug interface is only active about 1/4 of the time.
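
The arithmetic behind those figures can be written out as a short sketch, using only the measured numbers from this post (the constant and function names are made up for the example):

```cpp
// Figures from the logic analyzer measurements above, in milliseconds.
constexpr double kPacketActiveMs = 5.25;  // time to push one 64-byte packet
constexpr double kPacketGapMs = 15.0;     // idle time between packets
constexpr double kPacketBytes = 64.0;

// Time the debug interface itself needs for `bytes` of data, ignoring gaps.
constexpr double activeMs(double bytes) {
    return (bytes / kPacketBytes) * kPacketActiveMs;
}

// Fraction of wall-clock time the debug interface is actually busy.
constexpr double dutyCycle() {
    return kPacketActiveMs / (kPacketActiveMs + kPacketGapMs);
}
```

This gives 84 mS for 1 kB (16 packets), about 5.4 seconds for 64 kB (1024 packets), and a duty cycle of about 0.26. With the 15 mS gaps included, the same 1024 packets would take roughly 20.7 seconds, which is why the gaps matter more than the bit timing.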

So, it may be more beneficial to focus on reducing the latency in the communication before worrying too much more about the speed of toggling bits.

To reduce the latency, I think switching away from the two ascii byte hex code per byte will help, as well as potentially increasing the packet size and uart baudrate.

(serisman) #372

I found another easy optimization…

In the Arduino sketch, in the CMD_XDATA_READ method, it calls printHex(data) for every byte. That method was doing Serial.print(data, HEX); which apparently is pretty slow.

Before (takes ~85 uS before the serial character starts printing after it is read from memory):

I changed the printHex method to use a simple lookup table:

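char nibbleToHex[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

void printHex(unsigned char data) {
  byte nibble1 = data >> 4;   // high nibble
  byte nibble2 = data & 0xF;  // low nibble
  // The rest of the function was cut off in the post; presumably something like:
  Serial.write(nibbleToHex[nibble1]);
  Serial.write(nibbleToHex[nibble2]);
}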

and now it is noticeably faster (character starts printing after less than 5 uS):


This is also after switching to a baudrate of 250000, and increasing my serial buffer to 128 (from 64).

My ~11 kB erase/write/verify test .bin (hangman) is about 6.3 seconds now (roughly 8x faster than the original ~52 seconds!).

A full read of 64 kB is about 15 seconds.

After increasing the baudrate to 500000 and increasing the packet size to 120, I can erase/write/verify (reliably) in ~4.9 seconds now. A full 64 kB read is still ~15 seconds.

(Larry Bank) #373

A quick update about my experiments with the nRF52840 dongle (to use as a wireless hub for OTA sketch uploading to the XE).

The dongle is a great piece of hardware for $10. Nordic’s SDK is a cryptic mess of elephantine proportions. To make use of any subsystem of the SoC (e.g. USB serial port emulation) requires a huge amount of cryptic code that isn’t properly documented. The samples which come with the SDK are mostly for their other boards. By hacking on an existing sample, so far I’m able to blink LEDs and make use of the virtual COM port. Hopefully the 802.15.4 functions are not too complicated to get going. I’m optimistic that I can use this as a hub. I’ll share updates as I make more progress.

(GabyPCgeeK) #374

Made an experimental branch:

  • More of a binary protocol. (Commands and Tokens are still text, but Address, Sizes and Data are binary.)
  • Increased PACKET_SIZE to 192.
  • Changed SERIAL_RX_BUFFER_SIZE to 256. (This is done in the Arduino core and not in CC_Flash.ino.) Baud 2000000 now works.

With those changes:
Using 64KB file with no empty pages

	Packet Size - 128Bytes
		SPI          2000000 baud - 20.5s
		Bit-bang  2000000 baud - 20.5s
	Packet Size - 192Bytes
		SPI          2000000 baud - 15.4s
		Bit-bang  2000000 baud - 23.4s


	Packet Size - 128Bytes
		SPI       128B  2000000 baud - 8.8s
		Bit-bang  128B  2000000 baud - 8.8s
	Packet Size - 192Bytes
		SPI       192B  2000000 baud - 6.2s
		Bit-bang  192B  2000000 baud - 8.8s

The speeds are about the same for 1000000 baud (At least on SPI).


Something tells me that cranking up the baudrate is not a very good idea.
Yes, I like that you knocked a second off the program write time by increasing the baudrate from 250000 to 500000, but that is just one second gained for a 100% increase in rate. That doesn’t sound like a good trade.

My way of making things more reliable.

Awesome software! I like this thing. Fancy stuff with clear graphics. I appreciate the effort you put in. Could it also be used in conjunction with @Crait’s screen buffer being sent over serial?

(serisman) #376

@GabyPCgeeK - Nice. Thanks for taking this further.

I took the night off, but will throw it on the logic analyzer again in a day or two to see if anything else jumps out.

Actually, the other thing I saw that might help is to not wait for the whole packet to be downloaded before starting to process the command on the Arduino. We could wait for data as needed if the debug protocol ends up being faster than the serial protocol, i.e. interleave Serial.available checks with the code down in the individual command methods. This would mean giving up the checksum verification, although does that really matter if we can do a verify after write anyway?

Either way, this is so much better than before, and at the moment I’m not sure how much faster it really needs to get. As fun as the optimization process may be, we are probably approaching the point of diminishing returns.

(Larry Bank) #377

I received a box of SMART Response PE’s and they’re different from the older models. The newer model (03-00220) is based on the CC2533 SoC. Similar, but slightly less RAM. @serisman is working on changes to the toolchain to get them working with SDCC+CC.Flasher. I don’t need 32, so if anyone would like a few, I’ll sell them for $2 each + shipping. 2-3 can fit well in a USPS Priority mail small box ($7.20 for 2-day shipping).

(serisman) #378

UPDATE: I got it working!

It looks like the onboard ATmega16U2 was not totally dead, but just had messed up fuses.

I was able to whip up a high voltage parallel programmer using a spare Arduino Uno, and it indicated that the high byte fuse somehow had become 0x06, which means that (among other things) the reset pin was disabled (hence the ISP interface was no longer working). I’m still not sure how the fuse got messed up. It was either a bad firmware update from the original firmware on the ATmega128RFA1 or possibly a weird quirk with the chip erase command on this IC. But, it was easy enough to change it back to a reasonable value (with the HVP) and commence with normal ISP programming.
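
For anyone who wants to check what a high fuse value means, here is a small decoding sketch. The bit positions are my reading of the ATmega16U2 datasheet and should be verified there before relying on them; the enum and function names are made up for the example.

```cpp
#include <cstdint>

// Assumed bit positions in the ATmega16U2 high fuse byte (verify against
// the datasheet; this layout is an assumption for illustration).
enum class HighFuse : std::uint8_t {
    BOOTRST = 0, BOOTSZ0 = 1, BOOTSZ1 = 2, EESAVE = 3,
    WDTON = 4, SPIEN = 5, RSTDISBL = 6, DWEN = 7
};

// AVR fuses are active low: a 0 bit means the fuse is programmed.
inline bool isProgrammed(std::uint8_t fuseByte, HighFuse bit) {
    return ((fuseByte >> static_cast<unsigned>(bit)) & 1u) == 0;
}
```

With that layout, 0x06 has RSTDISBL programmed (reset pin disabled), which matches the symptom, while a value like 0xD9 leaves RSTDISBL unprogrammed and SPIEN programmed.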

I then changed out the ATmega16U2’s 8 MHz crystal for a 16 MHz one and uploaded the standard UNO R3 ‘bootloader’ (Arduino-COMBINED-dfu-usbserial-atmega16u2-Uno-Rev3.hex) to the ATmega16U2 along with these fuse settings: lfuse: 0xFF, hfuse: 0xD9, efuse: 0xF4, lock: 0x0F

This allows the computer to see the ATmega16U2 as a USB CDC serial device (it now looks like an Arduino Uno to the computer).

I also uploaded a custom compile of the Optiboot bootloader to the ATmega128RFA1 (using UART1), and connected the ATmega16U2’s PD7 pin to a 1uF capacitor and then to the ATmega128RFA1’s reset pin (not sure why, but a 100nF didn’t give a long enough reset pulse). This allows me to upload programs directly to the ATmega128RFA1 through the Arduino IDE just like you would do for an Arduino Uno (or pretty much any other Arduino) without any other external programmer.

This makes it much easier to ‘hack’ on these Smart Response XE Receivers. I have a few extra receivers available if anyone wants one (already converted) for a reasonable price. Send me a message if interested.

Hardware mods:

(Larry Bank) #379

I have a bunch of $1 CP2102 usb->serial adapter boards, so bypassing the 16U2 and getting easy Arduino IDE access isn’t very difficult.

(GabyPCgeeK) #380

@serisman how can I send a message? I have looked around but can’t find any send message option.

(serisman) #381

Yep, that certainly is an option as well. I just wanted to make use of the hardware that was already there, and wasn’t happy to leave it broken. Technically, the ATmega16U2 could be used for more advanced purposes as well, like USB HID interfaces. It now has the standard DFU bootloader on it, making re-programming it fairly easy even without an ISP programmer.

If you click on my username it should take you to my profile page where there should be a ‘Message’ button in the upper right corner. If the button isn’t there, maybe you don’t have enough badges?

(Larry Bank) #382

I just opened the receiver that came with my set of PE’s and it’s the same as the XE receiver (ATmega128RFA1 + 16U2). I have no use for it, so if anyone wants it, make me a reasonable offer and it’s yours.