Although I like @drummyfish’s idea, I wonder how much code it would generate?
RLE is not very efficient if you cannot expand the data out and have to walk through the process every time you want a value.
You could look at RLE for each row (or col) and have an array of these encoded values represeting the cols. In this way, you are decoding a relatively small amount each time.
You seem to have 43 tile (6 bits). A simple RLE could use the top two bits to indicate length with 00 = 1, 01 = 2 … 11 = 4. However you have some longer runs so I wonder if you could have byte with its high bit set indicating that this is a run of more than one tile followed by the run length. Where the byte’s high bit is low, it indicates a single run.