Reversing Bits In A Byte

I am working on a small project that requires interfacing a HD44780 20×4 LCD display to an ATTINY 2313 chip. While designing the PCB for this project, I needed to route the lower nibble of PORTB (PB0-PB3) to the high nibble of the LCD data (D4-D7). Keeping the port mapping straight (PB0->;D4, PB1->;D5, etc) required the addition of at least 6 vias. Reversing the order of the bits ( PB0->;d7, PB1->;D6,etc) required no additional vias, but the content of the lower nibble in PORTB would have to be reverse in code in order to align the data properly. This post documents the analysis I did on different bit reversing routines to determine which one was the most efficient for the task.

I found two viable C implementations to reverse the bits in a byte. Listing 1, illustrates the C code for the reverseByteWithForLoop implementation and the assembly code generated by the compiler. This functions loops through every bit in a byte and reverses every bit that is set. The function is 28 bytes long and takes 92 cycles to run when the data is 0x00 and 100 cycles when the data is 0xFF. It represents 1.37% of the 2K of flash memory in the ATTINY2313.

uint8_t reverseByteWithForLoop( uint8_t num )
{
uint8_t bit;
uint8_t output;
for( int count=1;count>;1;
output = output<;if(bit==1)
output = output+1;
}
return output;
}
/* ===============ASSEMBLY CODE ==================
 ldi	r18, 0x08			; 1 cycle
 ldi	r19, 0x00			; 1 cycle

 mov	r20, r24			; 1 cycle
 andi	r20, 0x01			; 1 cycle
 lsr	r24					; 1 cycle
 add	r25, r25			; 1 cycle
 cpi	r20, 0x01			; 1 cycle
 brne	.+2      			; 1 or 2 cycles
 subi	r25, 0xFF			; 1 cycle
 subi	r18, 0x01			; 1 cycle
 sbci	r19, 0x00			; 1 cycle
 brne	.-20     			; 2 cycles

 mov	r24, r25			; 1 cycle
 ret					; 4 cycle
*/

Listing 2, contains the source code and the assembly code generated by the compiler for the function reverseByteWithShifts. This function is 32 bytes long and takes 19 cycles to run independently of the data. The size of the function represents 1.56% of the available flash memory in the ATTINY2313.

uint8_t reverseByteWithShifts( uint8_t x )
{
x = ((x >;>; 1) &amp; 0x55) | ((x <;<; 1) &amp; 0xaa);
x = ((x >;>; 2) &amp; 0x33) | ((x <;<; 2) &amp; 0xcc);
x = ((x >;>; 4) &amp; 0x0f) | ((x <;<; 4) &amp; 0xf0);
return x;
}
/* ===============ASSEMBLY CODE ==================
 mov    r25, r24            ; 1 cycle
add    r25, r25            ; 1 cycle
andi   r25, 0xAA           ; 1 cycle
lsr    r24                 ; 1 cycle
andi   r24, 0x55           ; 1 cycle
or     r25, r24            ; 1 cycle
mov    r24, r25            ; 1 cycle
add    r24, r24            ; 1 cycle
add    r24, r24            ; 1 cycle
andi   r24, 0xCC           ; 1 cycle
lsr    r25                 ; 1 cycle
lsr    r25                 ; 1 cycle
andi   r25, 0x33           ; 1 cycle
or     r24, r25            ; 1 cycle
swap   r24                 ; 1 cycle
ret                        ; 4 cycle
*/

Listing 3, contains the last function which is my own implementation in assembly language. It is 12 bytes long and uses a loop to reverse all the bits in a byte. It takes this function 37 cycles to complete the task. The size of the function represents 0.59% of the available flash memory in the ATTINY2313.

#if (__GNUC__ * 100 + __GNUC_MINOR__) <; 303
#error &quot;This library requires AVR-GCC 3.3 or later, update to newer AVR-GCC compiler !&quot;
#endif

#include <;avr/io.h>;
.global reverseByte
.func   reverseByte
reverseByte:
ldi        r25, 0x80
rotate_bit:
rol        r24
ror        r25
brcc       rotate_bit
mov        r24, r25
ret
.endfunc

Table 1

Table 1, illustrates the characteristics previously outlined for each one of the functions. This analysis begs the question, which one of these functions is better? Well, it depends. If execution time is the primary concern, reverseByteWithShifts is the best choice at the expense of 1.56% of flash memory. On the other hand, if code size is the driving concern, reverseByte is the best candidate, but it takes twice as long as to run. ReverseByteWithForLoop is smaller than reverseByteWithShifts, but takes the longest to run. Its execution time is dependent on the input data, which makes it a bit more complicated to use in an project that requires accurate timing. To see other alternatives and how this topic evolved over time, visit AVRFreaks. As far as my project goes, reverseByte is the way to go since space concern is more important than execution timing.