I haven’t read through the whole thing yet, but I feel the need to point out that the ‘cycles’ as quoted by the article are meaningless for the Arduboy.
Cycles are an architecture-specific (and sometimes CPU-specific) measurement and can’t be generalised.
Likewise the table of machine code instructions are only relevant for the architecture that’s being analysed and may not be comparable to what would be produced/required for AVR.
I haven’t found a place where the article mentions which architecutre is being used, but without further information I’m guessing it’s MOS 6502 purely based on the presence of instructions like
txa et cetera, which I take to refer to X, Y, and A registers, which would match the MOS 6502’s register set.
Modulo is division by definition - it’s the remainder of the division rather than the quotient.
Some CPUs actually have an instruction that produces both (often in different registers, sometimes in dedicated or special purpose registers), which allows the compiler to optimise a division and a modulo performed on the same operands in close proximity to each other.
digit = value / 10;
value = value % 10;
Some programming languages even include a standard library function for getting both components (quotient and remainder) in a single operation. E.g. C++ has
std::div et cetera, which returns a simple
struct containing the quotient and remainder. Haskell has
divMod. (The difference between the two is rounding characteristics - the former truncates integer division towards zero, the latter truncates towards negative infinity.)
Unfortunately AVR is not one of those. It doesn’t have a division instruction at all - division must be performed by algorithm.
I.e. the compiler actually inserts a sequence of instructions that implement a division algorithm. Sometimes it can optimise if the operands have suitable characteristics, other times it can’t and just inserts the whole thing. (I believe it has the sense to generate and call a function after a certain point though.)
I read the whole thing. (Seems I was right about the 6502.)
Unfortunately the Python source code at the end doesn’t appear to be the whole line drawing algorithm, only part of it, and as far as I can tell there’s no source code for the demo either, only precompiled code.
sorted(bitreverse[y1:y2])[(x2-x1) % (y2 - y1)] in the article makes some sense, but I’m not sure what that value is actually being used for.
One thing I’m certain about is that the technique depends on a 256 element lookup table, so it’s likely to consume a lot more progmem than the existing line drawing function.
Unless there’s an actual demand for faster line drawing, I’m not sure it’s worth trying to decipher what the complete code should actually look like. If the article were more comprehensive I’d have been tempted to give it a try, but as it stands there’s too little information and I for one can’t be bothered to do the work to fill in the gaps.