I am working on a small project that requires interfacing an HD44780 20×4 LCD to an ATTINY2313 chip. While designing the PCB for this project, I needed to route the lower nibble of PORTB (PB0-PB3) to the high nibble of the LCD data bus (D4-D7). Keeping the port mapping straight (PB0->D4, PB1->D5, etc.) required the addition of at least 6 vias. Reversing the order of the bits (PB0->D7, PB1->D6, etc.) required no additional vias, but the content of the lower nibble in PORTB would have to be reversed in code in order to align the data properly. This post documents the analysis I did on different bit-reversing routines to determine which one was the most efficient for the task.
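To make the data fix-up concrete, here is a minimal sketch of mirroring a 4-bit value so that bit 0 ends up on D7 and bit 3 on D4 with the reversed wiring. The helper name `reverseLowNibble` is hypothetical and not part of the project code. The actual write to PORTB is left out so the snippet builds with any C compiler.

```c
#include <stdint.h>

/* Hypothetical helper: mirror bits 0..3 of `value` and clear bits 4..7.
 * With the reversed wiring (PB0->D7 ... PB3->D4), the result is what
 * would be placed in the low nibble of PORTB. */
uint8_t reverseLowNibble(uint8_t value)
{
    uint8_t out = 0;

    for (uint8_t i = 0; i < 4; i++) {
        out = (uint8_t)(out << 1);        /* make room for the next bit  */
        out |= (uint8_t)(value & 0x01);   /* copy the current lowest bit */
        value >>= 1;                      /* advance to the next bit     */
    }
    return out;
}
```

For example, the nibble 0x1 (only the bit destined for D4 set) becomes 0x8, which drives PB3, the pin wired to D4.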
I found two viable C implementations to reverse the bits in a byte. Listing 1 shows the C code for the reverseByteWithForLoop implementation and the assembly code generated by the compiler. This function loops through every bit in the byte and shifts each one into the output in reverse order. The function is 28 bytes long and takes 92 cycles to run when the data is 0x00 and 100 cycles when the data is 0xFF. It occupies 1.37% of the 2K of flash memory in the ATTINY2313.
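    uint8_t reverseByteWithForLoop( uint8_t num )
    {
        uint8_t bit;
        uint8_t output;  /* left uninitialized, as in the listing below: the 8 left shifts push out any initial value */

        for( int count=1; count<=8; count++ )
        {
            bit = num & 0x01;        /* isolate the lowest input bit  */
            num = num>>1;            /* advance to the next input bit */
            output = output<<1;      /* make room in the output       */
            if(bit==1)
                output = output+1;   /* copy the bit into the output  */
        }
        return output;
    }

    /* ===============ASSEMBLY CODE ==================
    ldi  r18, 0x08   ; 1 cycle
    ldi  r19, 0x00   ; 1 cycle
    mov  r20, r24    ; 1 cycle
    andi r20, 0x01   ; 1 cycle
    lsr  r24         ; 1 cycle
    add  r25, r25    ; 1 cycle
    cpi  r20, 0x01   ; 1 cycle
    brne .+2         ; 1 or 2 cycles
    subi r25, 0xFF   ; 1 cycle
    subi r18, 0x01   ; 1 cycle
    sbci r19, 0x00   ; 1 cycle
    brne .-20        ; 2 cycles
    mov  r24, r25    ; 1 cycle
    ret              ; 4 cycle
    */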
Listing 2 contains the source code and the assembly code generated by the compiler for the function reverseByteWithShifts. It swaps adjacent bits, then adjacent bit pairs, then the two nibbles. This function is 32 bytes long and takes 19 cycles to run regardless of the data. It occupies 1.56% of the available flash memory in the ATTINY2313.
    uint8_t reverseByteWithShifts( uint8_t x )
    {
        x = ((x >> 1) & 0x55) | ((x << 1) & 0xaa);
        x = ((x >> 2) & 0x33) | ((x << 2) & 0xcc);
        x = ((x >> 4) & 0x0f) | ((x << 4) & 0xf0);
        return x;
    }

    /* ===============ASSEMBLY CODE ==================
    mov  r25, r24    ; 1 cycle
    add  r25, r25    ; 1 cycle
    andi r25, 0xAA   ; 1 cycle
    lsr  r24         ; 1 cycle
    andi r24, 0x55   ; 1 cycle
    or   r25, r24    ; 1 cycle
    mov  r24, r25    ; 1 cycle
    add  r24, r24    ; 1 cycle
    add  r24, r24    ; 1 cycle
    andi r24, 0xCC   ; 1 cycle
    lsr  r25         ; 1 cycle
    lsr  r25         ; 1 cycle
    andi r25, 0x33   ; 1 cycle
    or   r24, r25    ; 1 cycle
    swap r24         ; 1 cycle
    ret              ; 4 cycle
    */
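Since a byte has only 256 possible values, one way to gain confidence that the two C routines are interchangeable is to compare them exhaustively on a PC before flashing anything. The sketch below assumes a host build. The helper `countMismatches` is a hypothetical name, and `output` is initialized to 0 here, which is an addition for the host build.

```c
#include <stdint.h>

/* Loop version; `output` starts at 0 in this host-side copy. */
uint8_t reverseByteWithForLoop(uint8_t num)
{
    uint8_t output = 0;

    for (int count = 1; count <= 8; count++) {
        uint8_t bit = num & 0x01;         /* lowest remaining input bit */
        num = (uint8_t)(num >> 1);
        output = (uint8_t)(output << 1);
        if (bit == 1)
            output = (uint8_t)(output + 1);
    }
    return output;
}

/* Shift-and-mask version: swap bits, then bit pairs, then nibbles. */
uint8_t reverseByteWithShifts(uint8_t x)
{
    x = (uint8_t)(((x >> 1) & 0x55) | ((x << 1) & 0xaa));
    x = (uint8_t)(((x >> 2) & 0x33) | ((x << 2) & 0xcc));
    x = (uint8_t)(((x >> 4) & 0x0f) | ((x << 4) & 0xf0));
    return x;
}

/* Hypothetical helper: number of inputs on which the two versions differ. */
int countMismatches(void)
{
    int mismatches = 0;

    for (uint16_t i = 0; i < 256; i++) {
        if (reverseByteWithForLoop((uint8_t)i) != reverseByteWithShifts((uint8_t)i))
            mismatches++;
    }
    return mismatches;
}
```

If `countMismatches()` returns 0, the two functions agree on every input and the choice between them comes down to size and timing only.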
Listing 3 contains the last function, which is my own implementation in assembly language. It is 12 bytes long and uses a loop to reverse all the bits in a byte. The function takes 37 cycles to complete the task. It occupies 0.59% of the available flash memory in the ATTINY2313.
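    #if (__GNUC__ * 100 + __GNUC_MINOR__) < 303
    #error "This library requires AVR-GCC 3.3 or later, update to newer AVR-GCC compiler !"
    #endif

    #include <avr/io.h>

    .global reverseByte
    .func reverseByte
    reverseByte:
        ldi  r25, 0x80      ; marker bit: falls into carry after 8 rotations
    rotate_bit:
        rol  r24            ; top bit of the input -> carry
        ror  r25            ; carry -> top bit of the result
        brcc rotate_bit     ; repeat until the marker bit reaches carry
        mov  r24, r25       ; return the result in r24
        ret
    .endfunc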
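To follow how the rotate loop terminates without a counter, it can help to model it in C. The sketch below is an illustration only: `reverseByteModel` is a hypothetical name, the `carry` variable stands in for the AVR carry flag, and the model says nothing about cycle counts.

```c
#include <stdint.h>

/* C model of the rotate-through-carry loop in the assembly listing. */
uint8_t reverseByteModel(uint8_t r24)
{
    uint8_t r25 = 0x80;  /* ldi r25, 0x80: marker bit in bit 7 */
    uint8_t carry = 0;   /* on hardware the initial carry is arbitrary; it only
                            enters r24, whose final value is discarded */
    uint8_t next;

    do {
        /* rol r24: bit 7 -> carry, old carry -> bit 0 */
        next = (uint8_t)(r24 >> 7);
        r24 = (uint8_t)((r24 << 1) | carry);
        carry = next;

        /* ror r25: bit 0 -> carry, old carry -> bit 7 */
        next = (uint8_t)(r25 & 0x01);
        r25 = (uint8_t)((r25 >> 1) | (carry << 7));
        carry = next;
    } while (carry == 0); /* brcc rotate_bit: exit once the marker bit falls out */

    return r25;           /* mov r24, r25 */
}
```

The marker bit loaded into r25 needs exactly eight `ror` steps to reach the carry flag, so the loop runs eight times and r25 doubles as both the result and the loop counter. That is what lets the routine fit in 12 bytes.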
Table 1 summarizes the characteristics previously outlined for each one of the functions.

    Table 1
    Function                  Size (bytes)   Flash used (of 2K)   Cycles
    reverseByteWithForLoop    28             1.37%                92 (0x00), 100 (0xFF)
    reverseByteWithShifts     32             1.56%                19
    reverseByte               12             0.59%                37

This analysis raises the question: which one of these functions is better? Well, it depends. If execution time is the primary concern, reverseByteWithShifts is the best choice, at the expense of 1.56% of flash memory. On the other hand, if code size is the driving concern, reverseByte is the best candidate, but it takes about twice as long to run (37 cycles versus 19). reverseByteWithForLoop is smaller than reverseByteWithShifts, but takes the longest to run. Its execution time also depends on the input data, which makes it a bit more complicated to use in a project that requires accurate timing. To see other alternatives and how this topic evolved over time, visit AVRFreaks.

As far as my project goes, reverseByte is the way to go, since code size is more important than execution time.
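For anyone who wants to redo the arithmetic behind these figures, the following sketch recomputes the flash percentages and the 37-cycle count of reverseByte. The function names are hypothetical, and the per-instruction cycle counts are the ones assumed in this post (1 cycle per instruction, 2 for a taken branch, 4 for `ret`).

```c
#include <stdint.h>

#define FLASH_BYTES 2048u  /* 2K of flash in the ATTINY2313 */

/* Flash usage in hundredths of a percent, rounded to nearest
 * (for example, 137 means 1.37%). */
uint16_t flashUsageHundredths(uint16_t codeBytes)
{
    uint32_t scaled = (uint32_t)codeBytes * 10000u + FLASH_BYTES / 2u;
    return (uint16_t)(scaled / FLASH_BYTES);
}

/* Cycle count of the reverseByte loop: 8 passes, of which the first 7
 * end with a taken brcc and the last with a not-taken brcc. */
uint8_t reverseByteCycles(void)
{
    const uint8_t ldi = 1, rol = 1, ror = 1, mov = 1, ret = 4;
    const uint8_t brccTaken = 2, brccNotTaken = 1;

    return (uint8_t)(ldi
                     + 7 * (rol + ror + brccTaken)
                     + (rol + ror + brccNotTaken)
                     + mov + ret);
}
```

With these assumptions, 28, 32 and 12 bytes come out as 1.37%, 1.56% and 0.59% of flash, and the loop total is 1 + 28 + 3 + 1 + 4 = 37 cycles.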