I am trying to implement fast 256-bit integer addition with overflow detection. The fastest code I have so far is:

```
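#include <stdint.h>

// Adds b to a in place; each is a 256-bit integer stored as eight 32-bit limbs.
void add256b(uint32_t* a, uint32_t* b){
    uint64_t sum_carry = 0; // holds the limb sum, then the carry (0 or 1)
    for (int8_t i=0; i<8; i++){ // calculate a + b
        sum_carry += (uint64_t)a[i] + b[i];
        a[i] = (uint32_t)sum_carry; // store the low 32 bits
        if (sum_carry > UINT32_MAX) sum_carry = 1; // sum did not fit: carry out
        else sum_carry = 0;
    }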
...
}
```

I am looking for ideas on how to make it faster. I tried to use the hardware carry flag, which is bit 0 of SREG. With it, one should in principle be able to do something like:

```
a[i] += b[i] + carry;
carry = SREG & 0x01;
```

This would be helpful, because then the carry only needs to be a one-byte integer and a[] and b[] could be 64-bit integers, reducing the number of passes through the loop. However, this does not work, because SREG may no longer hold the flags from the preceding addition by the time it is read. The C optimizer can reorder things, because it doesn't understand that SREG depends on the prior addition. Any ideas?
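For reference, here is a minimal sketch of the 64-bit version without reading SREG at all. It relies on the fact that unsigned addition in C wraps around, so a carry happened exactly when the result is smaller than one of the operands. The function name `add256_limbs64` and the limb layout (four 64-bit limbs, least significant first) are my own assumptions for illustration, and I have not measured whether this is actually faster than the 32-bit loop above.

```c
#include <stdint.h>

/* Hypothetical helper: adds b to a in place, where each is a 256-bit
   integer stored as four 64-bit limbs, least significant limb first.
   Returns 1 if the 256-bit addition overflowed, 0 otherwise. */
uint8_t add256_limbs64(uint64_t *a, const uint64_t *b)
{
    uint8_t carry = 0;
    for (uint8_t i = 0; i < 4; i++) {
        uint64_t t = a[i] + b[i];   /* wraps modulo 2^64 on overflow */
        uint8_t c1 = (t < b[i]);    /* wrapped iff result is below an operand */
        uint64_t s = t + carry;     /* add the incoming carry */
        uint8_t c2 = (s < t);       /* wrapped iff adding the carry overflowed */
        a[i] = s;
        carry = c1 | c2;            /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

The carry stays a one-byte value and the loop runs four times instead of eight, but the comparisons are done in plain C, so the compiler is free to reorder whatever it likes without breaking the result.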