A couple of things I thought about when I looked into 2-bit-per-pixel greyscale on the Arduboy a while ago:
Because perceived pixel intensity is the result of pixel PWM at frame rate, missing a frame is baaaad, so trying to keep up with rendering from the application’s main loop, like most (all?) games do, is fragile and probably too restrictive. With 1-bit color, no one notices if you miss a frame, but when you alternate ON-OFF to get shades of grey, a missed frame changes the shade of grey the user sees. One way to solve this is to decouple and prioritise, i.e. give display rendering priority over game logic (the simplest way is to use an interrupt handler).
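To make the missed-frame problem concrete, here is a rough sketch of the ON-OFF alternation, not code from any Arduboy library. It assumes a 3-frame cycle in which a pixel of shade 0..3 is lit for that many frames; the names `pixelLit`, `litFrames` and `kCycleFrames` are made up for illustration. A skipped refresh is modelled as the panel simply keeping whatever it showed in the previous frame.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one greyscale cycle is 3 display frames long.
constexpr uint8_t kCycleFrames = 3;

// True when a pixel with the given 2-bit shade (0..3) is ON in this phase.
// Shade n is lit during the first n phases of the cycle.
bool pixelLit(uint8_t shade, uint8_t phase) {
    assert(shade <= 3 && phase < kCycleFrames);
    return phase < shade;
}

// Number of frames (out of kCycleFrames) in which the pixel is actually lit.
// skippedPhase >= 0 simulates a missed refresh: the panel keeps showing the
// previous frame's state during that phase.
uint8_t litFrames(uint8_t shade, int skippedPhase = -1) {
    // State left on the panel by the last phase of the previous cycle.
    bool shown = pixelLit(shade, kCycleFrames - 1);
    uint8_t lit = 0;
    for (uint8_t phase = 0; phase < kCycleFrames; ++phase) {
        if (phase != skippedPhase) {
            shown = pixelLit(shade, phase);  // refresh happened on time
        }
        if (shown) {
            ++lit;
        }
    }
    return lit;
}
```

With every refresh on time, `litFrames(1)` is 1 frame out of 3. If the refresh for phase 1 is missed, `litFrames(1, 1)` is 2, so a dark grey pixel is displayed like a light grey one for that cycle. On real hardware the phase would presumably be advanced from a timer interrupt rather than from the game loop, which is the point of giving rendering priority.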
A full-screen 2-bit frame buffer is likely too large for most games (it leaves only ~500 bytes for stack and game state). Games could restrict shades of grey to a portion of the screen to reduce the frame buffer size, but… if rendering is decoupled from the game loop as suggested in the previous paragraph, double buffering is needed, which doubles the RAM cost of whatever buffer is left.
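The RAM arithmetic behind that can be sketched as follows. The figures are my assumptions, not something measured: a 128x64 display and 2560 bytes of SRAM (ATmega32U4). `frameBufferBytes` and `ramLeft` are illustrative helper names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed Arduboy figures: 2560 bytes of SRAM on the ATmega32U4.
constexpr int kSramBytes = 2560;

// Size in bytes of a packed frame buffer of w x h pixels.
constexpr uint32_t frameBufferBytes(uint32_t w, uint32_t h, uint32_t bitsPerPixel) {
    return w * h * bitsPerPixel / 8;
}

// Bytes left for stack, globals and game state after allocating
// `buffers` frame buffers of the given size. Negative means it does not fit.
int ramLeft(uint32_t w, uint32_t h, uint32_t bitsPerPixel, uint32_t buffers) {
    assert(buffers >= 1);
    return kSramBytes - static_cast<int>(buffers * frameBufferBytes(w, h, bitsPerPixel));
}
```

Under those assumptions, `ramLeft(128, 64, 2, 1)` is 512 bytes, which matches the ~500 bytes above. Double buffering the full screen, `ramLeft(128, 64, 2, 2)`, is negative, and even a double-buffered half screen, `ramLeft(128, 32, 2, 2)`, only gets back to the same 512 bytes.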
My guess is that greyscale applications would end up with special render loops that fill (portions of) the screen straight from program memory (sprites, for instance) and use other tricks to reduce frame buffer size, or do away with the frame buffer completely and use tile + sprite based rendering loops with double-buffered tile/sprite maps, like most games for personal computers from the 80s. The bottom line is that writing greyscale games with so little RAM is not as easy as filling a frame buffer and pushing it to the screen every once in a while.
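As a rough idea of what a frame-buffer-free render loop could look like, here is a minimal sketch of the tile approach, not a finished renderer. It assumes 8x8 tiles, a display addressed in 8-pixel-high pages (one byte per column), and the same 3-phase shade scheme where shade n is lit for n frames out of 3. `Tile`, `kTiles`, `tileMap` and `columnByte` are hypothetical names; on the AVR the tile data would live in program memory (PROGMEM), here it is a plain `const` array so the sketch compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint8_t kTileSize = 8;   // tiles are 8x8 pixels
constexpr uint8_t kMapCols = 16;   // 128 / 8
constexpr uint8_t kMapRows = 8;    // 64 / 8, one tile row per display page
constexpr uint8_t kPhases = 3;     // assumed greyscale cycle length

// A tile stored as two bit planes; each byte is one column of 8 pixels.
struct Tile {
    uint8_t lo[kTileSize];  // low bit of the 2-bit shade
    uint8_t hi[kTileSize];  // high bit of the 2-bit shade
};

// Tile graphics: would be PROGMEM on the real device (assumption).
const Tile kTiles[] = {
    {{0, 0, 0, 0, 0, 0, 0, 0}, {0, 0, 0, 0, 0, 0, 0, 0}},  // shade 0
    {{0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF},
     {0, 0, 0, 0, 0, 0, 0, 0}},                            // shade 1
    {{0, 0, 0, 0, 0, 0, 0, 0},
     {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}},    // shade 2
    {{0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF},
     {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}},    // shade 3
};

// The only per-frame state in RAM: one tile index per 8x8 cell (128 bytes).
uint8_t tileMap[kMapRows][kMapCols];

// Byte to send to the display for column x of the given page in this phase.
uint8_t columnByte(uint8_t page, uint8_t x, uint8_t phase) {
    assert(page < kMapRows && x < kMapCols * kTileSize && phase < kPhases);
    const Tile& tile = kTiles[tileMap[page][x / kTileSize]];
    const uint8_t lo = tile.lo[x % kTileSize];
    const uint8_t hi = tile.hi[x % kTileSize];
    switch (phase) {
        case 0:  return lo | hi;  // lit for shades 1, 2, 3
        case 1:  return hi;       // lit for shades 2, 3
        default: return lo & hi;  // lit for shade 3 only
    }
}
```

The render loop would call `columnByte` for each page and column and stream the result to the display, so the RAM cost is the 128-byte map instead of a 2048-byte buffer. Keeping a second copy of the map for double buffering would cost another 128 bytes. Sprites would need an extra overlay step that this sketch leaves out.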