I am working on a small project that requires interfacing a HD44780 20×4 LCD display to an ATTINY 2313 chip. While designing the PCB for this project, I needed to route the lower nibble of PORTB (PB0-PB3) to the high nibble of the LCD data (D4-D7). Keeping the port mapping straight (PB0->;D4, PB1->;D5, etc) required the addition of at least 6 vias. Reversing the order of the bits ( PB0->;d7, PB1->;D6,etc) required no additional vias, but the content of the lower nibble in PORTB would have to be reverse in code in order to align the data properly. This post documents the analysis I did on different bit reversing routines to determine which one was the most efficient for the task.
I found two viable C implementations to reverse the bits in a byte. Listing 1, illustrates the C code for the reverseByteWithForLoop implementation and the assembly code generated by the compiler. This functions loops through every bit in a byte and reverses every bit that is set. The function is 28 bytes long and takes 92 cycles to run when the data is 0x00 and 100 cycles when the data is 0xFF. It represents 1.37% of the 2K of flash memory in the ATTINY2313.
uint8_t reverseByteWithForLoop( uint8_t num ) { uint8_t bit; uint8_t output; for( int count=1;count>;1; output = output<;if(bit==1) output = output+1; } return output; } /* ===============ASSEMBLY CODE ================== ldi r18, 0x08 ; 1 cycle ldi r19, 0x00 ; 1 cycle mov r20, r24 ; 1 cycle andi r20, 0x01 ; 1 cycle lsr r24 ; 1 cycle add r25, r25 ; 1 cycle cpi r20, 0x01 ; 1 cycle brne .+2 ; 1 or 2 cycles subi r25, 0xFF ; 1 cycle subi r18, 0x01 ; 1 cycle sbci r19, 0x00 ; 1 cycle brne .-20 ; 2 cycles mov r24, r25 ; 1 cycle ret ; 4 cycle */
Listing 2, contains the source code and the assembly code generated by the compiler for the function reverseByteWithShifts. This function is 32 bytes long and takes 19 cycles to run independently of the data. The size of the function represents 1.56% of the available flash memory in the ATTINY2313.
uint8_t reverseByteWithShifts( uint8_t x ) { x = ((x >;>; 1) & 0x55) | ((x <;<; 1) & 0xaa); x = ((x >;>; 2) & 0x33) | ((x <;<; 2) & 0xcc); x = ((x >;>; 4) & 0x0f) | ((x <;<; 4) & 0xf0); return x; } /* ===============ASSEMBLY CODE ================== mov r25, r24 ; 1 cycle add r25, r25 ; 1 cycle andi r25, 0xAA ; 1 cycle lsr r24 ; 1 cycle andi r24, 0x55 ; 1 cycle or r25, r24 ; 1 cycle mov r24, r25 ; 1 cycle add r24, r24 ; 1 cycle add r24, r24 ; 1 cycle andi r24, 0xCC ; 1 cycle lsr r25 ; 1 cycle lsr r25 ; 1 cycle andi r25, 0x33 ; 1 cycle or r24, r25 ; 1 cycle swap r24 ; 1 cycle ret ; 4 cycle */
Listing 3, contains the last function which is my own implementation in assembly language. It is 12 bytes long and uses a loop to reverse all the bits in a byte. It takes this function 37 cycles to complete the task. The size of the function represents 0.59% of the available flash memory in the ATTINY2313.
#if (__GNUC__ * 100 + __GNUC_MINOR__) <; 303 #error "This library requires AVR-GCC 3.3 or later, update to newer AVR-GCC compiler !" #endif #include <;avr/io.h>; .global reverseByte .func reverseByte reverseByte: ldi r25, 0x80 rotate_bit: rol r24 ror r25 brcc rotate_bit mov r24, r25 ret .endfunc
Table 1, illustrates the characteristics previously outlined for each one of the functions. This analysis begs the question, which one of these functions is better? Well, it depends. If execution time is the primary concern, reverseByteWithShifts is the best choice at the expense of 1.56% of flash memory. On the other hand, if code size is the driving concern, reverseByte is the best candidate, but it takes twice as long as to run. ReverseByteWithForLoop is smaller than reverseByteWithShifts, but takes the longest to run. Its execution time is dependent on the input data, which makes it a bit more complicated to use in an project that requires accurate timing. To see other alternatives and how this topic evolved over time, visit AVRFreaks. As far as my project goes, reverseByte is the way to go since space concern is more important than execution timing.
Great information! I came across your post while checking for suggestions on the fastest and shortest assembly code for a routine to form a mirror image of a byte.
I had to reverse the order of the 4 bits in the lower nibble of a byte while simultaneously putting the reversed bits in the upper nibble. (In my case, the bits in the lower nibble after the routine were not relevant.) This was for an embedded 8051 assembly code program requiring both the shortest code and the fastest execution. I used the following:
RRA C ; carry holds original bit 0
RRA ; bit 7 now has original bit 1
RRA C ; bit 7 now has original bit 0, bit 6 now has original bit 1, and carry holds original bit 2
MOV BIT5,C ; bit 5 now has original bit 2
MOV C,BIT0 ; carry holds original bit 3
MOV BIT4,C ;bit 4 now has original bit 3
Note that the 8051 can shift/rotate only the accumulator, thru or without the carry bit.
The requirement was to convert the lower nibble of a register that had been incremented or decremented as needed for the brightness level to be serially sent to an LED driver (MAX7221). Unfortunately, the LED driver required that all the serial data it received be in the reverse order (MSB first) from how the UART of the 8051 shifts out the data (LSB first).
Admittedly, the routine which adjusted the register with the LED brightness level could have been changed to something other than just an increment/decrement routine, but doing so was slower and took more code.
Form posting also does not encode correctly.
I meant “& a m p ;”, but had to include spaces here, to prevent HTML encoding interpretation of this content.